This tutorial makes three assumptions:
- It targets an AlpaSim user rather than an AlpaSim developer.
- It treats docker compose as the primary execution environment.
- It focuses on letting the user do simple things quickly and leaves detail for later. This is reflected in its subdivision into three levels of complexity.
In level 1 we run a default simulation with the VaVAM driver policy, learn how to interpret the results, and perform basic debugging.
AlpaSim consists of multiple networked microservices (renderer, physics simulation, runtime, controller, driver, traffic simulation). The AlpaSim runtime requests observed video frames from the renderer and egomotion history from the controller, communicates with the physics microservice to constrain actors to the road surface, and provides the information to the driver, with the expectation of receiving driving decisions in return to close the loop.
This repository contains the implementations of a subset of the services needed to execute the simulation, as well as the config files and infra code necessary to bring the microservices up via docker/enroot.
Let's start by executing a run with default settings.
- Follow onboarding to ensure necessary dependencies have been installed.
- Set up your environment with `source setup_local_env.sh`. This will compile protos, download an example driver model, ensure you have a valid Hugging Face token, and install the `alpasim_wizard` command line tool.
  - If you are on a slurm cluster, this step may need to be done on a compute node rather than a login node, depending on how your cluster is set up.
- Run the following one-time setup steps required by the default Docker workflow. If you need to create a Hugging Face token, see the Hugging Face access section in onboarding.

```bash
# 1) Ensure HF env is present (needed for scene/model downloads)
export HF_TOKEN="<your_hf_token>"

# 2) Download VaVAM assets required by default driver
bash data/download_vavam_assets.sh --model vavam-b
```
- Run the wizard to create the necessary config files, download the scene (if necessary), and run a simulation:

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=vavam wizard.log_dir=$PWD/tutorial
```

This will create a `tutorial/` directory with all necessary config files and run the simulation.
The simulation logs/output will be in the created tutorial directory. For a visualization of the results, an mp4 file is created in `tutorial/eval/videos/clipgt-026d..._0.mp4`. The full results should look something like:
tutorial/
├── aggregate
│ ├── metrics_results.png
│ ├── metrics_results.txt
│ ├── metrics_unprocessed.parquet
│ └── videos
│ ├── all
│ │ └── clipgt-026d6a39-bd8f-4175-bc61-fe50ed0403a3_814f3c22-bb78-11f0-a5f3-2f64b47b8685_0.mp4
│ └── violations
│ ├── collision_at_fault
│ ├── collision_rear
│ ├── dist_to_gt_trajectory
│ │ └── clipgt-026d6a39-bd8f-4175-bc61-fe50ed0403a3_814f3c22-bb78-11f0-a5f3-2f64b47b8685_0.mp4 -> ../../all/clipgt-026d6a39-bd8f-4175-bc61-fe50ed0403a3_814f3c22-bb78-11f0-a5f3-2f64b47b8685_0.mp4
│ └── offroad
├── rollouts
│ ├── clipgt-f7020b3e-3d61-4cb6-b157-2f4aac1a7d8d
│ │ └── 86513b18-96c5-11ef-8b6f-b83fd26d88f0
│ │ ├── rollout.asl
│ │ ├── rollout_indexed
│ │ │ ├── camera_front_tele_30fov.mp4
│ │ │ ├── camera_front_wide_120fov.mp4
│ │ │ ├── manifest.json
│ │ │ ├── rclog-all.index
│ │ │ └── rclog-all-indexed.log
│ │ ├── rollout.rclog
│ │ ├── metrics.parquet
│ │ ├── {clipgt_id}_{batch_id}_{rollout_id}.mp4
│ │ └── _complete
│ ├── clipgt-fa369408-2787-41cb-b629-a7885d7c46e2
│ │ └── 8656ecb6-96c5-11ef-8b6f-b83fd26d88f0
│ │ ├── rollout.asl
│ │ ├── rollout_indexed
│ │ │ ├── camera_front_tele_30fov.mp4
│ │ │ ├── camera_front_wide_120fov.mp4
│ │ │ ├── manifest.json
│ │ │ ├── rclog-all.index
│ │ │ └── rclog-all-indexed.log
│ │ ├── rollout.rclog
│ │ ├── metrics.parquet
│ │ ├── {clipgt_id}_{batch_id}_{rollout_id}.mp4
│ │ └── _complete
│ └── clipgt-fe127c3f-8b06-4c4f-9933-1e5089a1a731
│ └── 864da3ea-96c5-11ef-8b6f-b83fd26d88f0
│ ├── rollout.asl
│ ├── rollout_indexed
│ │ ├── camera_front_tele_30fov.mp4
│ │ ├── camera_front_wide_120fov.mp4
│ │ ├── manifest.json
│ │ ├── rclog-all.index
│ │ └── rclog-all-indexed.log
│ ├── rollout.rclog
│ ├── metrics.parquet
│ ├── {clipgt_id}_{batch_id}_{rollout_id}.mp4
│ └── _complete
├── driver
│ └── vam-driver.yaml
├── driver-config.yaml
├── eval
│ ├── metrics_unprocessed.parquet
│ └── videos
│ └── clipgt-026d6a39-bd8f-4175-bc61-fe50ed0403a3_814f3c22-bb78-11f0-a5f3-2f64b47b8685_0.mp4
├── eval-config.yaml
├── generated-network-config.yaml
├── generated-user-config-0.yaml
├── metrics
├── run_metadata.yaml
├── run.sh
├── trafficsim-config.yaml
├── txt-logs
├── wizard-config-loadable.yaml
└── wizard-config.yaml
Some noteworthy files and directories:
- `rollouts/` contains logs of simulation behavior in each rollout, used to analyze AV behavior and calculate metrics. The logs are organized into `rollouts/{scenario.scene_id}/{batch_uuid}/rollout.*` - in this case we have 3 scenes with one batch of a single rollout each. The subdivision into batches is historical and can be ignored for most purposes.
  - `.asl` files record the messages exchanged within the simulation. These are useful for debugging the simulator behavior and replaying events. More in the asl log format section.
  - `metrics.parquet` contains per-rollout evaluation metrics.
  - `{clipgt_id}_{batch_id}_{rollout_id}.mp4` - evaluation videos (when video rendering is enabled), where `clipgt_id=f"clipgt-{scene_id}"`, `batch_id="0"`, `rollout_id=rollout_uuid`.
  - `_complete` is a marker file created when a rollout finishes successfully. It is used by the autoresume feature to track which rollouts completed and to remove incomplete rollout directories on restart.
- `aggregate/` contains aggregated results across all rollouts:
  - `metrics_results.txt` - Formatted table of driving scores (mean, std, quantiles)
  - `metrics_results.png` - Visual summary of driving quality metrics
  - `metrics_unprocessed.parquet` - Combined metrics from all rollouts
  - `videos/` - Videos organized by violation type (collision_at_fault, offroad, etc.)
- `metrics/` contains performance profiling data (see OPERATIONS.md for details):
  - `metrics.prom` - Prometheus metrics from the simulation
  - `metrics_plot.png` - Performance visualization (CPU/GPU/RPC metrics)
- `driver/` is a directory with logs written by the driver service, useful for debugging policy-internal problems.
- `wizard-config.yaml` contains the config the wizard used for this run after applying hydra inheritance. This is useful for debugging configuration issues.
- `generated-user-config-{ARRAY_ID}.yaml` contains an expanded version of the simulation config provided by the user, possibly split into chunks when simulating on multiple nodes.
- `trafficsim-config.yaml` is a copy of the traffic simulation config used for the run, useful for debugging traffic simulation.
- `generated-network-config.yaml` describes which services listen on which ports during simulation. Not useful unless debugging the simulator itself.
If everything went correctly, `rollouts` and `aggregate` are usually the only results of interest.
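As a quick sanity check after a run, you can confirm which rollouts finished and look at the aggregated scores directly from the shell (paths follow the layout shown above):

```bash
# Each successfully finished rollout leaves a _complete marker file
find tutorial/rollouts -name _complete
# Formatted table of aggregated driving scores
cat tutorial/aggregate/metrics_results.txt
```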
For understanding driving quality metrics and performance tuning, see the
Operations Guide.
⚠️ This section is about debugging the configuration of the simulator itself (not of vehicle behavior within simulation)
The console contains logs from all microservices, and is the first place one should look when
something goes wrong. When an error happens (for example the rollouts directory does not appear),
it's best to consult that log to see where the first errors occurred. The microservices may produce
additional logs that can be useful for debugging, but that is not covered here.
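If the console scrollback is gone, the run directory's `txt-logs/` directory (visible in the tree above) is a reasonable place to look for per-service output; for example, assuming plain-text log files are written there:

```bash
ls tutorial/txt-logs/
grep -ri error tutorial/txt-logs/ | head
```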
The wizard requires three config groups:
| Group | Purpose | Examples |
|---|---|---|
| `deploy=` | Where to run (filesystem paths, SLURM vs Docker) | `local`, `local_external_driver` |
| `topology=` | How many GPUs, replicas, and workers | `1gpu`, `2gpu`, `8gpu_64rollouts` |
| `driver=` | Which driving model to use | `vavam`, `alpamayo1`, `alpamayo1_5`, `manual` |
Additionally, service-specific config groups can override the default images and launch behavior, for example `physics=disabled`.
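For example, a run that keeps the level 1 defaults but disables the physics service could be launched as follows (the log directory name is arbitrary):

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=vavam physics=disabled wizard.log_dir=$PWD/tutorial_nophysics
```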
In level 2 we learn to customize the simulation (e.g. change the driver policy, change the simulated scenes, etc.) and understand the architecture in more depth.
AlpaSim wizard is configured via hydra and takes in a .yaml
configuration file and arbitrary command line overrides. Example config files are in
src/wizard/configs/. We suggest reading base_config.yaml,
which has detailed comments on the configuration fields.
Under the top-level `runtime` item in the `base_config.yaml`, we describe the details of the simulation to be performed (as opposed to deployment settings under `wizard.*` and `services.*`). The important configurable fields of `runtime` are:
- `save_dir` - the name of the directory where `asl` logs are saved. It needs to be kept in sync with the wizard mount points of certain modules.
- `endpoints` - used to configure simulator scaling properties.
- `simulation_config` - specifies all the simulation parameters (e.g. timing, cameras, vehicle configuration, etc.).
For example, one might change the number of rollouts generated per scene by passing an override when running the wizard:

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=vavam wizard.log_dir=<dir> runtime.simulation_config.n_rollouts=8
```

You can choose which video layouts to render via `eval.video.video_layouts`. Available layouts are `DEFAULT` (BEV map, camera, metrics) and `REASONING_OVERLAY` (first-person camera with reasoning text overlay and trajectory chart). To generate reasoning-overlay videos only, override when invoking the wizard, for example:

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=alpamayo1 wizard.log_dir=$PWD/tutorial eval.video.video_layouts=[REASONING_OVERLAY]
```

You can also set `eval.video.video_layouts=[DEFAULT,REASONING_OVERLAY]` to render both layouts per rollout.
The driver in AlpaSim is a policy for the ego vehicle that takes in sensor inputs and optional navigation commands, and outputs a trajectory for the ego vehicle to follow, along with other optional outputs, such as chain-of-causation reasoning text.
The driver is specified by a pair of config files under `src/wizard/configs/`: one for the driver service itself, and one for the runtime (so that it provides the inputs required for the specific driver).
The wizard uses VaVAM as the default driver. To explicitly define the driver config, one can use:
```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=vavam wizard.log_dir=$PWD/tutorial_alpamayo
```

Both Alpamayo 1 and Alpamayo 1.5 are 10B-parameter driving models that share the same runtime config (alpamayo_configs). Download the weights from HuggingFace before running:
```bash
# Alpamayo 1
huggingface-cli download nvidia/Alpamayo-R1-10B

# Alpamayo 1.5: both the model and its VLM backbone are gated;
# accept the license agreements first, then authenticate:
#   https://huggingface.co/nvidia/Alpamayo-1.5-10B
#   https://huggingface.co/nvidia/Cosmos-Reason2-8B
huggingface-cli login  # paste your HF token when prompted
huggingface-cli download nvidia/Alpamayo-1.5-10B
huggingface-cli download nvidia/Cosmos-Reason2-8B
```

The wizard will use the `HF_HOME` environment variable to find the system HuggingFace cache (`~/.cache/huggingface` by default). If the model weights do not exist locally, the driver service will automatically download them, but the download may time out, requiring you to re-run. Alternatively, you can specify the path to the model directory by setting the `model.checkpoint_path` configuration field.
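For instance, if you keep the weights outside the HF cache, an override along the following lines should work; the exact key prefix is assumed to follow the `driver.model.*` pattern used elsewhere in this tutorial, and the path is only an example:

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=alpamayo1_5 \
  wizard.log_dir=$PWD/tutorial_alpamayo \
  driver.model.checkpoint_path=/path/to/Alpamayo-1.5-10B
```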
Run with Alpamayo 1:

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=alpamayo1 wizard.log_dir=$PWD/tutorial_alpamayo
```

Run with Alpamayo 1.5:

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=alpamayo1_5 wizard.log_dir=$PWD/tutorial_alpamayo
```
⚠️ Both models are large (10B parameters). Alpamayo 1 requires ~40 GB VRAM; Alpamayo 1.5 standard inference also requires ~40 GB VRAM.
To enable classifier-free guidance navigation for Alpamayo 1.5 (requires ~60 GB VRAM):

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=alpamayo1_5 wizard.log_dir=$PWD/tutorial_alpamayo driver.model.use_classifier_free_guidance_nav=true
```

To visualize the predicted chain-of-causation reasoning, you can change the generated video layout:

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=alpamayo1 wizard.log_dir=$PWD/tutorial_alpamayo eval.video.video_layouts=[REASONING_OVERLAY]
```

As an example of how to integrate a different driver model, we provide a provisional integration for the Transfuser policy, specifically the Latent TransFuser v6 (LTFv6) model developed for NAVSIM.
To run with the Transfuser model, use `driver=transfuser`. First, download the Transfuser model weights/config from HuggingFace:

```bash
huggingface-cli download longpollehn/tfv6_navsim model_0060.pth --local-dir=data/drivers/transfuser/
huggingface-cli download longpollehn/tfv6_navsim config.json --local-dir=data/drivers/transfuser/
```

Then, run the wizard with the following command:

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=transfuser wizard.log_dir=$PWD/tutorial_transfuser
```

If you would like to force the ego vehicle to follow its recorded trajectory, instead of following the predictions of a policy, you can set `runtime.endpoints.{physics,trafficsim,controller}.skip: true`, `runtime.simulation_config.physics_update_mode: NONE`, and `runtime.simulation_config.force_gt_duration_us` to a very high value (20s+).
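Spelled out as wizard command line overrides (brace expansion written out; the duration value below is illustrative, 30 s expressed in microseconds):

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=vavam wizard.log_dir=$PWD/tutorial_gt \
  runtime.endpoints.physics.skip=true \
  runtime.endpoints.trafficsim.skip=true \
  runtime.endpoints.controller.skip=true \
  runtime.simulation_config.physics_update_mode=NONE \
  runtime.simulation_config.force_gt_duration_us=30000000
```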
The scene in AlpaSim is a NuRec reconstruction of a real-world driving log.
Publicly available NuRec scenes are stored on
Hugging Face
and, once downloaded, are placed under data/nre-artifacts/all-usdzs. The scenes are identified by
their uuid, rather than their filenames, to prevent versioning issues. The list of currently
available scenes exists in scenes set and the set of available suites
exists in scene suites.
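To see which scene artifacts are already present locally:

```bash
ls data/nre-artifacts/all-usdzs
```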
For custom scene selection, you can specify scenes manually using `scenes.scene_ids`:

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=vavam wizard.log_dir=$PWD/tutorial_2 scenes.scene_ids=['clipgt-02eadd92-02f1-46d8-86fe-a9e338fed0b6']
```

If necessary, the scene will automatically be downloaded from Hugging Face to your local `data/nre-artifacts/all-usdzs` directory. If the download is necessary, ensure you have set your Hugging Face token in the `HF_TOKEN` environment variable as described in the onboarding instructions.
📗 Scene ids are defined/viewable in `data/scenes/sim_scenes.csv`.

⚠️ A scene id does not uniquely identify the `usdz` file, as the scene id comes from the `metadata.yaml` file inside the `usdz` zip file. The proper artifact file will be chosen to satisfy the NRE version requirements.
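To look up candidate scene ids, the CSV can be inspected with standard shell tools (the exact column layout is not shown here):

```bash
head -n 5 data/scenes/sim_scenes.csv
grep 026d6a39 data/scenes/sim_scenes.csv
```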
Scene suites provide pre-validated collections of scenes for testing. To use the public scene set with 910 validated scenes (⚠️ this will download all the scenes):

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=vavam scenes.test_suite_id=public_2507 wizard.log_dir=$PWD/tutorial_suite
```

This will run simulations across all 910 scenes in the `public_2507` suite from the 25.07 release dataset.
Code changes in the repo are automatically mounted into the docker containers at runtime, with the
exception that the virtual environment of the container is not synced, so changes that rely on new
dependencies will require rebuilding the container image. To try this out, one can add some logging
statements to the driver code in src/driver/src/alpasim_driver/ and rerun the wizard.
The simulation is split into multiple microservices, each running in its own docker container. The
primary requirement for a custom container image is that it exposes a gRPC endpoint compatible with
the expected service interface. Default images are defined directly in
base_config.yaml, and plugin-provided config groups can
override individual services. You can also override any service directly by setting
services.<service>.image to the desired image name and updating the relevant service command
services.<service>.command. For more information about the service interfaces, please see the
protocol buffer definitions.
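As a sketch only (the image name below is a placeholder, not a published artifact), overriding the physics service image could look like this; a matching `services.physics.command` override may also be needed, as noted above:

```bash
uv run alpasim_wizard deploy=local topology=1gpu driver=vavam wizard.log_dir=$PWD/tutorial_custom \
  services.physics.image=my-registry/my-physics:dev
```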
An `.asl` file contains most of the messages exchanged in the course of a batch simulation as size-delimited
protobuf messages. These files can be read to access detailed information about the course of the
simulation. Aside from being used for evaluation, they can also be useful for debugging model or
simulation behavior. The script at src/tools/log_replay/replay_logs_to_driver.py shows an
example of reading an asl log and "replaying the stimuli" on a driver instance, allowing for
reproducing behavior with your favorite debugger attached.
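A reasonable starting point is to locate the `.asl` logs from the level 1 run and check the replay script's arguments (assuming it exposes a standard `--help`):

```bash
ls tutorial/rollouts/*/*/rollout.asl
uv run python src/tools/log_replay/replay_logs_to_driver.py --help
```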
In level 3 we show how to circumvent the alpasim_wizard-defined components: this enables use cases such as enabling breakpoint debugging in components or even replacing components entirely. The basic idea behind the approach is to:
- Use the `alpasim_wizard` to generate config files without actually running the simulation
- Manually start the desired components with the generated config files
- Use the `alpasim_wizard`-generated config files to run the rest of the simulation as normal
For interactive control of the ego vehicle with keyboard input, see MANUAL_DRIVER.md. This allows you to drive through scenarios manually while visualizing the camera feed in real-time.
The following steps show how to debug the controller component with breakpoints in the context of a full simulation.
1. (Terminal 1) Run the wizard to generate config files without running the simulation:

   ```bash
   uv run alpasim_wizard deploy=local topology=1gpu driver=vavam wizard.log_dir=$PWD/tutorial_dbg wizard.run_method=NONE wizard.debug_flags.use_localhost=True
   ```

2. (Terminal 1) `cd` to the generated directory (`tutorial_dbg`) and note the command/port of the component to be replaced in `docker-compose.yaml`. For the simulation case, we are looking for components in the `sim` profile, which includes `controller-0`, `driver-0`, `physics-0`, `runtime-0`, and `sensorsim-0`. Here we will replace `controller-0`, which in this case has been allocated port 6003.

3. (Terminal 2) `cd` into the controller src directory (`<repo_root>/src/controller/`) and prepare to start the controller. Note that there are various ways to accomplish this, including through an IDE. Add breakpoints as desired in the controller code and then start the controller with:

   ```bash
   cd <repo_root>/src/controller/
   mkdir my_controller_log_dir
   # Note: port (6003 in this case) must match the port allocated in docker-compose.yaml
   uv run python -m alpasim_controller.server --port=6003 --log_dir=my_controller_log_dir --log-level=INFO
   ```

4. (Terminal 1) Start the rest of the simulation with docker compose:

   ```bash
   docker compose -f docker-compose.yaml --profile sim up runtime-0 driver-0 physics-0 sensorsim-0
   ```
For VSCode users, instead of running the controller from the command line (step 3), you can use the built-in debugger:
1. Create or update `.vscode/launch.json` with:

   ```json
   {
     "version": "0.2.0",
     "configurations": [
       {
         "name": "Debug Controller (Level 3 Tutorial)",
         "type": "debugpy",
         "request": "launch",
         "module": "alpasim_controller.server",
         "justMyCode": false,
         "cwd": "${workspaceFolder}/src/controller",
         "args": ["--port=6003", "--log_dir=my_controller_log_dir", "--log-level=INFO"],
         "console": "integratedTerminal"
       }
     ]
   }
   ```

2. Set breakpoints in the controller code
3. Press F5 (or go to Run and Debug → "Debug Controller")
4. Your breakpoints will hit as the simulation runs!

Note: Make sure the `--port` argument matches the port allocated in `docker-compose.yaml`.
If the runtime is the service being debugged, a few things change. For one, the other services are expected to be up and running before the runtime is brought up, so the ordering of steps changes. Additionally, one can speed up iteration by preventing the docker containers from being shut down after each simulation by setting `runtime.endpoints.do_shutdown=False` on the wizard command line.
1. (Terminal 1) Run the wizard to generate config files without running the simulation:

   ```bash
   uv run alpasim_wizard deploy=local topology=1gpu driver=vavam \
     wizard.log_dir=$PWD/tutorial_dbg_runtime \
     wizard.run_method=NONE \
     wizard.debug_flags.use_localhost=True \
     runtime.endpoints.do_shutdown=False
   ```

2. (Terminal 1) `cd` to the generated directory (`tutorial_dbg_runtime`) and start the non-runtime services:

   ```bash
   docker compose -f docker-compose.yaml --profile sim up driver-0 controller-0 physics-0 sensorsim-0
   ```

3. (Terminal 2) `cd` into the runtime src directory (`<repo_root>/src/runtime/`) and prepare to start the runtime. The exact command paths will vary, but, to use the configuration generated in the earlier steps, an example command would be:

   ```bash
   cd <repo_root>/src/runtime/
   # Following command is based on the docker-compose.yaml generated by the wizard
   uv run python -m alpasim_runtime.simulate \
     --usdz-glob=../../data/nre-artifacts/all-usdzs/**/*.usdz \
     --user-config=../../tutorial_dbg_runtime/generated-user-config-0.yaml \
     --network-config=../../tutorial_dbg_runtime/generated-network-config.yaml \
     --log-dir=../../tutorial_dbg_runtime \
     --log-level=INFO
   ```
For VSCode users, instead of running the runtime from the command line (step 3), you can use the built-in debugger:
1. Add this configuration to `.vscode/launch.json`:

   ```json
   {
     "name": "Debug Runtime (Level 3 Tutorial)",
     "type": "debugpy",
     "request": "launch",
     "module": "alpasim_runtime.simulate",
     "justMyCode": false,
     "cwd": "${workspaceFolder}/src/runtime",
     "args": [
       "--usdz-glob=../../data/nre-artifacts/all-usdzs/**/*.usdz",
       "--user-config=../../tutorial_dbg_runtime/generated-user-config-0.yaml",
       "--network-config=../../tutorial_dbg_runtime/generated-network-config.yaml",
       "--log-dir=../../tutorial_dbg_runtime",
       "--log-level=INFO"
     ],
     "console": "integratedTerminal"
   }
   ```

2. Set breakpoints in the runtime code
3. Press F5 (or go to Run and Debug → "Debug Runtime")
4. Your breakpoints will hit as the simulation runs!