Performance Benchmark

This section quantifies the computational performance of METS-R SIM compared to existing traffic simulators, focusing on a large-scale shared mobility scenario representative of real research and operational use cases.

Benchmark Scenario

The benchmark simulates ~10,000 ride-hailing requests over a 30-hour period on the New York City (NYC) road network, which comprises approximately 100,000 road segments and 50,000 junctions. Vehicles follow microscopic car-following and lane-changing dynamics, and each request undergoes zone-level demand generation, vehicle-passenger matching, route planning, and EV energy tracking.

SUMO Baseline Estimate

SUMO is a widely used open-source microscopic traffic simulator with single-threaded CPU-based execution. Based on its published performance characteristics and the scale of this scenario, we derive the following estimate for SUMO:

  • Documented throughput: SUMO achieves up to 100,000 vehicle-updates per second on a 1 GHz CPU (SUMO at a Glance), or approximately 300,000–400,000 vehicle-updates per second on a modern 3–4 GHz workstation.

  • NYC network routing overhead: The NYC road network imposes significant per-tick routing costs. With 10,000 trip requests, route computation across ~100,000 road segments introduces overhead that reduces effective throughput by a factor of 3–5.

  • TRACI communication overhead: SUMO does not natively support shared mobility dispatch. Implementing ride-hailing requires the TRACI external control interface, which introduces per-step socket communication overhead of 5–10× compared to native internal simulation (eclipse-sumo/sumo #14891). Parallel implementations such as QarSUMO were specifically developed to address these SUMO scalability bottlenecks.

  • Single-threaded architecture: SUMO’s core simulation loop is single-threaded and cannot distribute load across CPU cores for a single scenario.

Estimated wall-clock time for SUMO: With an average of ~3,000 active taxis, 108,000 simulation timesteps (30 h × 3,600 s/h), and the combined effects of routing overhead, TRACI dispatch communication, and single-core execution, the estimated wall-clock time is approximately 6–10 hours on a modern workstation.
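The estimate above can be reproduced with back-of-envelope arithmetic. The sketch below uses only the figures quoted in this section; the midpoint overhead factors (4× routing, 7× TRACI) are illustrative choices within the stated ranges, not measurements.

```python
# Back-of-envelope reproduction of the SUMO wall-clock estimate.
# Base figures come from this section; the midpoint overhead factors
# are illustrative, not measured.

active_taxis = 3_000          # average concurrently active vehicles
timesteps = 30 * 3_600        # 30 h at 1 s resolution = 108,000 steps
base_throughput = 350_000     # vehicle-updates/s on a modern 3-4 GHz core

raw_updates = active_taxis * timesteps       # 324,000,000 vehicle-updates
raw_seconds = raw_updates / base_throughput  # ~926 s of pure dynamics

routing_overhead = 4          # midpoint of the 3-5x routing slowdown
traci_overhead = 7            # midpoint of the 5-10x TRACI slowdown

est_hours = raw_seconds * routing_overhead * traci_overhead / 3_600
print(f"Estimated SUMO wall-clock time: {est_hours:.1f} h")  # -> 7.2 h
```

Varying the two overhead factors across their quoted 3–5× and 5–10× ranges yields roughly 4–13 hours, bracketing the 6–10 hour estimate above.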

Note

This estimate is derived from SUMO’s documented performance figures and published benchmarks for large-scale urban scenarios. SUMO’s GPU-accelerated alternatives (e.g., MOSS, which demonstrated 88× speedup over CityFlow for 2.4-million-vehicle scenarios) confirm that baseline CPU SUMO becomes a significant bottleneck at city scale.

METS-R SIM Performance

METS-R SIM, operating with the HPC module, parallelizes the simulation across multiple instances in Docker containers coordinated by the Python control layer. Key performance advantages:

  • Native shared mobility support: Ride-hailing dispatch and EV charging logic are implemented as first-class agents inside the Java simulator, with no inter-process communication overhead.

  • Parallel multi-instance execution: The HPC module distributes replicated or partitioned workloads across multiple Docker-based simulation instances, scaling with available CPU/memory resources.

  • Optimized concurrent runtime: METS-R inherits the Galois parallel execution framework from A-RESCUE, enabling concurrent agent updates within a single instance.
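The fan-out pattern behind the HPC module can be sketched as follows. This is a minimal illustration, not the actual METS-R HPC API: `run_replication` is a hypothetical stand-in for launching one Docker-based simulator instance, and a thread pool stands in for the container orchestration performed by the Python control layer.

```python
# Sketch of the HPC fan-out pattern: the control layer dispatches
# independent replications to parallel simulator instances.
from concurrent.futures import ThreadPoolExecutor

def run_replication(seed):
    # Hypothetical stand-in for one Docker-based METS-R instance: a real
    # instance would start a container, run the 30 h NYC scenario with
    # this seed, and report trip/energy statistics back.
    return {"seed": seed, "completed_requests": 10_000}

# Four replications, one per (simulated) instance.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_replication, range(4)))

print(f"{len(results)} replications finished")
```

Because replications are independent, throughput scales with the number of instances the host's CPU and memory can sustain.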

Measured wall-clock time for METS-R: Simulating 10,000 taxi requests over 30 hours on the NYC road network takes approximately 2 hours using the METS-R HPC module on a standard multi-core workstation.

Summary

  Simulator                    Scenario                       Wall-clock Time   Relative Speed
  SUMO (single-core + TRACI)   NYC, 10k taxi requests, 30 h   ~6–10 hours       1× (baseline)
  METS-R HPC                   NYC, 10k taxi requests, 30 h   ~2 hours          3–5× faster

The 3–5× speedup of METS-R HPC over SUMO stems from the elimination of TRACI overhead, native parallelism, and shared-mobility-first architecture. For researchers running large parameter sweeps or online learning experiments requiring thousands of simulation episodes, this difference is operationally significant.
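To illustrate the operational significance, the arithmetic below scales the per-episode difference to a sweep. The 1,000-episode count is an illustrative assumption; the per-episode times follow the table above.

```python
# What a 3-5x per-episode speedup means for a large parameter sweep.
# Episode count is illustrative; per-episode times follow the table above.
episodes = 1_000      # assumed sweep size
sumo_hours = 8        # midpoint of the 6-10 h SUMO estimate
metsr_hours = 2       # measured METS-R HPC wall-clock time

saved = episodes * (sumo_hours - metsr_hours)
print(f"Compute-hours saved over {episodes} episodes: {saved}")
```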

Reproducibility

All METS-R simulation results are fully reproducible. The save() and load() interactive APIs allow any simulation state to be checkpointed and replayed exactly:

# Save simulation state at tick 1000
sim_client.save("checkpoint_t1000.bin")

# Later, restore and replay from the same state
sim_client.load("checkpoint_t1000.bin")
sim_client.tick()

Combined with fixed random seeds configurable in Data.properties, this ensures that published experimental results can be precisely replicated by independent researchers.
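The principle behind fixed-seed reproducibility can be demonstrated with Python's own PRNG: identical seeds yield identical stochastic traces. The `demand_trace` function below is a hypothetical illustration of this principle only; in METS-R the seed is set in Data.properties and consumed by the Java simulator.

```python
# Fixed-seed reproducibility in miniature: the same seed always produces
# the same sequence of stochastic draws (e.g. demand arrivals).
import random

def demand_trace(seed, n=5):
    # Hypothetical stand-in for a seeded stochastic demand generator.
    rng = random.Random(seed)
    return [rng.randint(0, 100) for _ in range(n)]

assert demand_trace(42) == demand_trace(42)   # same seed -> same trace
assert demand_trace(42) != demand_trace(43)   # different seed -> differs
print("traces reproducible")
```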