.. _performance:

Performance Benchmark
---------------------

This section compares the computational performance of METS-R SIM with that of SUMO, a
widely used traffic simulator, on a large-scale shared mobility scenario representative of
real research and operational use cases.

Benchmark Scenario
~~~~~~~~~~~~~~~~~~

The benchmark simulates **~10,000 ride-hailing requests** over a **30-hour period** on
the **New York City (NYC) road network**, which comprises approximately 100,000 road segments
and 50,000 junctions. Vehicles follow microscopic car-following and lane-changing dynamics,
and each request undergoes zone-level demand generation, vehicle-passenger matching, route
planning, and EV energy tracking.

SUMO Baseline Estimate
~~~~~~~~~~~~~~~~~~~~~~

SUMO is a widely used open-source microscopic traffic simulator with single-threaded
CPU-based execution. Based on its published performance characteristics and the scale of
this scenario, we derive the following estimate for SUMO:

- **Documented throughput**: SUMO reports up to 100,000 vehicle updates per second on a
  1 GHz CPU (`SUMO at a Glance <https://sumo.dlr.de/docs/SUMO_at_a_Glance.html>`__).
  Assuming throughput scales roughly linearly with clock speed, this corresponds to
  approximately 300,000–400,000 vehicle updates per second on a modern 3–4 GHz workstation.

- **NYC network routing overhead**: The NYC road network imposes significant routing
  costs. With 10,000 trip requests, route computation across ~100,000 road segments is
  assumed to reduce effective throughput by 3–5×.

- **TraCI communication overhead**: SUMO's built-in taxi device offers only a small set of
  dispatch algorithms, so custom ride-hailing dispatch is typically implemented through the
  TraCI external control interface, which introduces per-step socket communication overhead
  of 5–10× compared to native internal simulation
  (`eclipse-sumo/sumo #14891 <https://github.com/eclipse-sumo/sumo/issues/14891>`__).
  Parallel implementations such as `QarSUMO <https://openreview.net/forum?id=xhHv2PaHfd>`__
  were specifically developed to address these SUMO scalability bottlenecks.

- **Single-threaded architecture**: SUMO's core simulation loop is single-threaded and cannot
  distribute load across CPU cores for a single scenario.

**Estimated wall-clock time for SUMO**: With an average of ~3,000 active taxis, 108,000
simulation timesteps (30 h × 3,600 s/h at a 1 s step), and the combined effects of routing
overhead, TraCI dispatch communication, and single-core execution, the estimated wall-clock
time is **approximately 6–10 hours** on a modern workstation.
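
The arithmetic behind this estimate can be reconstructed as a rough sketch. The inputs
below are the figures quoted in this section; treating the routing and TraCI factors as
multiplicative and using a 1 s time step are assumptions of this sketch, not a documented
method.

.. code-block:: python

   # Back-of-envelope reconstruction of the SUMO estimate.
   # Inputs are the figures quoted in this section; multiplying the
   # overhead factors together is an assumption of this sketch.

   ACTIVE_VEHICLES = 3_000      # average number of active taxis
   TIMESTEPS = 30 * 3_600       # 30 h at an assumed 1 s step -> 108,000


   def wall_clock_hours(updates_per_second, routing_factor, traci_factor):
       """Estimated wall-clock hours to simulate the whole scenario."""
       vehicle_updates = ACTIVE_VEHICLES * TIMESTEPS
       ideal_seconds = vehicle_updates / updates_per_second
       return ideal_seconds * routing_factor * traci_factor / 3_600


   # Most favorable combination of the quoted ranges.
   best_case = wall_clock_hours(400_000, routing_factor=3, traci_factor=5)
   # Least favorable combination of the quoted ranges.
   worst_case = wall_clock_hours(300_000, routing_factor=5, traci_factor=10)

   print(f"best case:  {best_case:.1f} h")
   print(f"worst case: {worst_case:.1f} h")

Under these assumptions the extremes come out at roughly 3.4 h and 15 h, so the quoted
6–10 hours sits in the middle of the envelope implied by the individual factors.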

.. note::

   This figure is an estimate derived from SUMO's documented performance figures and
   published benchmarks for large-scale urban scenarios, not a measurement. Results from
   GPU-accelerated simulators (e.g., `MOSS <https://arxiv.org/abs/2406.10661>`__, which
   demonstrated 88× speedup over CityFlow for 2.4-million-vehicle scenarios) are
   consistent with CPU-bound simulation becoming a significant bottleneck at city scale.

METS-R SIM Performance
~~~~~~~~~~~~~~~~~~~~~~~

METS-R SIM, operating with the :ref:`HPC module <Interactive APIs>`, parallelizes the
simulation across multiple instances in Docker containers coordinated by the Python
control layer. Key performance advantages:

- **Native shared mobility support**: Ride-hailing dispatch and EV charging logic are
  implemented as first-class agents inside the Java simulator, with no inter-process
  communication overhead.

- **Parallel multi-instance execution**: The HPC module distributes replicated or
  partitioned workloads across multiple Docker-based simulation instances, scaling
  with available CPU/memory resources.

- **Optimized concurrent runtime**: METS-R inherits the Galois parallel execution
  framework from A-RESCUE, enabling concurrent agent updates within a single instance.

**Measured wall-clock time for METS-R**: Simulating 10,000 taxi requests over 30 hours
on the NYC road network takes approximately **2 hours** using the METS-R HPC module on a
standard multi-core workstation.

Summary
~~~~~~~

.. list-table::
   :widths: 30 25 25 20
   :header-rows: 1

   * - Simulator
     - Scenario
     - Wall-clock Time
     - Relative Speed
   * - SUMO (single-core + TraCI)
     - NYC, 10k taxi requests, 30 h
     - ~6–10 hours (estimated)
     - 1× (baseline)
   * - **METS-R HPC**
     - NYC, 10k taxi requests, 30 h
     - **~2 hours** (measured)
     - **3–5× faster**

The estimated 3–5× speedup of METS-R HPC over SUMO stems from the elimination of TraCI
overhead, native parallelism, and a shared-mobility-first architecture. For researchers
running large parameter sweeps or online learning experiments requiring thousands of
simulation episodes, this difference is operationally significant.
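
To make the operational impact concrete, the following sketch derives the speedup range
from the table and scales it to a parameter sweep. The sweep size of 1,000 episodes and
the assumption that episodes run strictly one after another are hypothetical, chosen only
for illustration.

.. code-block:: python

   # Wall-clock figures from the summary table (hours per 30 h NYC episode).
   SUMO_HOURS = (6.0, 10.0)   # estimated range
   METSR_HOURS = 2.0          # measured

   EPISODES = 1_000           # hypothetical sweep size


   def speedup_range(baseline_hours, candidate_hours):
       """Return (low, high) speedup of the candidate over the baseline."""
       low, high = baseline_hours
       return low / candidate_hours, high / candidate_hours


   def sweep_days(hours_per_episode, episodes):
       """Days needed to run all episodes back to back."""
       return hours_per_episode * episodes / 24


   speedup = speedup_range(SUMO_HOURS, METSR_HOURS)
   metsr_days = sweep_days(METSR_HOURS, EPISODES)
   sumo_days = tuple(sweep_days(h, EPISODES) for h in SUMO_HOURS)

   print(f"speedup: {speedup[0]:.0f}x to {speedup[1]:.0f}x")
   print(f"METS-R sweep: {metsr_days:.0f} days")
   print(f"SUMO sweep:   {sumo_days[0]:.0f} to {sumo_days[1]:.0f} days")

With these inputs, a sequential 1,000-episode sweep would take about 83 days with METS-R
HPC versus roughly 250 to 417 days with SUMO.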

Reproducibility
~~~~~~~~~~~~~~~

All METS-R simulation results are fully reproducible. The ``save()`` and ``load()``
interactive APIs allow any simulation state to be checkpointed and replayed exactly:

.. code-block:: python

   # Save simulation state at tick 1000
   sim_client.save("checkpoint_t1000.bin")

   # Later, restore and replay from the same state
   sim_client.load("checkpoint_t1000.bin")
   sim_client.tick()

Combined with fixed random seeds configurable in ``Data.properties``, this ensures that
published experimental results can be precisely replicated by independent researchers.
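
Why checkpointing and fixed seeds together give exact replay can be illustrated with a toy
example. ``ToySim`` below is a hypothetical stand-in written for this sketch; it is not
the METS-R client and says nothing about how METS-R stores its state. The point it shows
is that a checkpoint has to capture the random number generator state alongside the model
state.

.. code-block:: python

   import pickle
   import random


   class ToySim:
       """Toy stand-in for a stochastic simulator (NOT the METS-R API)."""

       def __init__(self, seed):
           self.rng = random.Random(seed)  # fixed seed -> repeatable draws
           self.tick_count = 0
           self.position = 0.0

       def tick(self):
           # One stochastic update, standing in for a simulation step.
           self.position += self.rng.uniform(0.0, 1.0)
           self.tick_count += 1

       def save(self):
           # The checkpoint captures the RNG state as well as the model state.
           return pickle.dumps(
               (self.rng.getstate(), self.tick_count, self.position)
           )

       def load(self, blob):
           rng_state, self.tick_count, self.position = pickle.loads(blob)
           self.rng.setstate(rng_state)


   sim = ToySim(seed=42)
   for _ in range(1000):
       sim.tick()
   checkpoint = sim.save()

   # Continue the original run for 100 more ticks.
   for _ in range(100):
       sim.tick()
   original = sim.position

   # Restore the checkpoint and replay the same 100 ticks.
   sim.load(checkpoint)
   for _ in range(100):
       sim.tick()
   replayed = sim.position

   assert replayed == original  # the replay matches the original run exactly

If the generator state were left out of the checkpoint, the replay would draw different
random numbers and diverge from the original run.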