IL Baselines¶
CostNav provides learning-based (imitation learning) navigation baselines for benchmarking sidewalk robot policies. For rule-based navigation, see Nav2 Baseline.
Imitation Learning Baselines¶
Supported Algorithms¶
| Algorithm | Type | Framework | Goal Support |
|---|---|---|---|
| NavDP | Diffusion Policy | NavDP | Point, Image, Pixel, No-Goal, Mixed |
| ViNT | Visual Transformer | visualnav-transformer | Image, No-Goal |
| NoMaD | Diffusion Model | visualnav-transformer | Image, No-Goal |
| GNM | General Navigation | visualnav-transformer | Image, No-Goal |
| CANVAS | Sketch-based | canvas | Point Goal |
Architecture¶
ViNT / NoMaD / GNM / NavDP¶
Two-node ROS2 architecture with local model inference:
```mermaid
graph LR
    subgraph Isaac["Isaac Sim Container"]
        SIM["Physics Sim\nRobot + ROS2 Bridge"]
    end
    subgraph IL["IL Container (ROS2 Jazzy)"]
        PN["Policy Node\n~10 Hz\n(model inference)"]
        TF["Trajectory Follower\n~20 Hz\n(MPC controller)"]
        PN -->|/model_trajectory| TF
    end
    SIM -->|"/camera, /odom"| PN
    TF -->|/cmd_vel| SIM
    style Isaac fill:#76B900,color:#fff
    style IL fill:#1565c0,color:#fff
```
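The trajectory follower consumes waypoints from `/model_trajectory` and emits `/cmd_vel`. A minimal sketch of one control tick, using a simple proportional controller as a stand-in for the actual MPC (gains and velocity limits here are illustrative, not CostNav's tuned parameters):

```python
import math

def follow_step(pose, waypoint, k_lin=0.5, k_ang=1.5, v_max=0.6):
    """One control tick: steer toward the next /model_trajectory waypoint.

    pose: (x, y, yaw) from odometry; waypoint: (x, y) in the same frame.
    Returns (linear, angular) velocities for a geometry_msgs/Twist.
    """
    x, y, yaw = pose
    dx, dy = waypoint[0] - x, waypoint[1] - y
    dist = math.hypot(dx, dy)
    heading_err = math.atan2(dy, dx) - yaw
    # Wrap the heading error to [-pi, pi] so the robot turns the short way.
    heading_err = math.atan2(math.sin(heading_err), math.cos(heading_err))
    linear = min(k_lin * dist, v_max)   # slow down near the waypoint
    angular = k_ang * heading_err       # turn toward the waypoint
    return linear, angular
```

Running this at ~20 Hz against the latest policy output approximates the follower's role; the real node additionally interpolates along the full path rather than chasing a single waypoint.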
CANVAS¶
Three-node architecture with remote VLA inference:
```mermaid
graph LR
    subgraph Isaac["Isaac Sim Container"]
        SIM["Physics Sim\nRobot + ROS2 Bridge"]
    end
    subgraph Canvas["Canvas Container (ROS2 Jazzy)"]
        NP["Neural Planner\n(sends HTTP requests)"]
        CV["cmd_vel_publisher\n(action → velocity)"]
        NP -->|/vel_predict| CV
    end
    subgraph GPU["GPU Server"]
        MW["Model Worker\n(VLA inference)\nport 8200"]
    end
    SIM -->|"/camera, /odom"| NP
    NP <-->|HTTP| MW
    CV -->|/cmd_vel| SIM
    style Isaac fill:#76B900,color:#fff
    style Canvas fill:#1565c0,color:#fff
    style GPU fill:#e65100,color:#fff
```
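The `cmd_vel_publisher` node's job is the "action → velocity" translation shown above. A minimal sketch, assuming a hypothetical discrete action vocabulary and velocity scales (the real CANVAS action space may differ):

```python
# Hypothetical action → (linear, angular) velocity table; the actual
# CANVAS action vocabulary and magnitudes are not specified here.
ACTION_TO_VEL = {
    "forward": (0.5, 0.0),
    "left":    (0.3, 0.5),
    "right":   (0.3, -0.5),
    "stop":    (0.0, 0.0),
}

def action_to_cmd_vel(action: str):
    """Translate a /vel_predict action token into (linear, angular)
    values suitable for a geometry_msgs/Twist on /cmd_vel."""
    if action not in ACTION_TO_VEL:
        raise ValueError(f"unknown action: {action!r}")
    return ACTION_TO_VEL[action]
```

This keeps the VLA model's output space small and discrete, leaving smooth velocity generation to the local node.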
Key ROS2 Topics:
| Topic | Type | Direction | Description |
|---|---|---|---|
| `/front_stereo_camera/left/image_raw` | `sensor_msgs/Image` | Isaac Sim → Policy | Camera images |
| `/chassis/odom` | `nav_msgs/Odometry` | Isaac Sim → Policy | Robot odometry |
| `/cmd_vel` | `geometry_msgs/Twist` | Policy → Isaac Sim | Velocity commands |
| `/model_trajectory` | `nav_msgs/Path` | Internal | Policy → Trajectory Follower |
Running IL Baselines¶
```bash
# Build shared ROS2 + PyTorch image (first time only)
make build-ros2-torch

# Run a specific baseline
make run-vint
make run-nomad
make run-gnm
make run-navdp
make run-canvas
```
To override the navigation mode, switch between `image_goal` and `topomap`:

```bash
# Image-goal mode
GOAL_TYPE=image_goal MODEL_CHECKPOINT=checkpoints/nomad.pth make run-nomad

# Topomap mode
GOAL_TYPE=topomap MODEL_CHECKPOINT=checkpoints/nomad.pth make run-nomad
```
Automated Evaluation¶
```bash
# Run evaluation with default settings
make run-eval-vint
make run-eval-gnm
make run-eval-nomad
make run-eval-navdp
make run-eval-canvas

# Custom parameters
make run-eval-vint TIMEOUT=120 NUM_MISSIONS=10
```
Logs are saved to `./logs/<baseline>_evaluation_<timestamp>.log`.
Data Collection & Training¶
Data Collection¶
Teleoperation data is collected via:
Data Processing Pipeline¶
```bash
cd CostNav/costnav_isaacsim

# Install dependencies
uv sync

# Step 1: Convert ROS bags to MediaRef
uv run python -m il_training.data_processing.converters.ray_batch_convert \
    --config data_processing/configs/processing_config.yaml

# Step 2: Convert to ViNT format
uv run python -m il_training.data_processing.process_data.process_mediaref_bags \
    --config data_processing/configs/vint_processing_config.yaml
```
Training¶
Download pretrained checkpoints first:
Train models (ViNT, NoMaD, GNM, NavDP):
```bash
cd CostNav/costnav_isaacsim

# Train ViNT
uv run python -m il_training.training.train_vint \
    --config il_training/training/visualnav_transformer/configs/vint_costnav.yaml

# Train NoMaD (shares the train_vint entry point, different config)
uv run python -m il_training.training.train_vint \
    --config il_training/training/visualnav_transformer/configs/nomad_costnav.yaml

# Train NavDP
uv run python -m il_training.training.train_navdp \
    --config il_training/training/configs/navdp_costnav.yaml
```
For SLURM cluster training:
```bash
cd costnav_isaacsim/il_training/scripts/
sbatch train_vint.sbatch
sbatch train_nomad.sbatch
sbatch train_navdp.sbatch
```
**CANVAS Training Not Open-Sourced**
CANVAS training code is not included in this repository. Only pretrained checkpoints and the inference pipeline (model worker + agent) are provided.
NavDP Data Processing¶
NavDP requires a separate data processing step to convert teleoperation data into its LeRobot-compatible format with DepthAnything depth estimation:
```bash
cd CostNav/costnav_isaacsim

# Convert MediaRef bags to NavDP format (with DepthAnything depth)
uv run python -m il_training.data_processing.process_data.process_mediaref_bags \
    --config data_processing/configs/navdp_processing_config.yaml
```
NavDP training uses point+image fusion with DepthAnything-generated depth maps, 24-step trajectory prediction, and 8-frame context windows.
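The tensor shapes implied above (8-frame context, 24-step trajectory prediction, RGB plus estimated depth) can be sketched as follows. The image resolution, channel layout, and stub inference function are assumptions for illustration, not NavDP's actual preprocessing:

```python
import numpy as np

CONTEXT_FRAMES = 8   # context window length (from the text above)
PRED_STEPS = 24      # predicted trajectory length (from the text above)
H, W = 224, 224      # assumed image resolution

# One inference sample: RGB context + DepthAnything-style depth context.
rgb_context = np.zeros((CONTEXT_FRAMES, 3, H, W), dtype=np.float32)
depth_context = np.zeros((CONTEXT_FRAMES, 1, H, W), dtype=np.float32)
point_goal = np.zeros((2,), dtype=np.float32)  # (x, y) goal in robot frame

def predict_trajectory(rgb, depth, goal):
    """Stand-in for the diffusion policy: returns a (PRED_STEPS, 2) array
    of (x, y) waypoints. Real inference denoises this output iteratively."""
    assert rgb.shape[0] == depth.shape[0] == CONTEXT_FRAMES
    return np.zeros((PRED_STEPS, 2), dtype=np.float32)

traj = predict_trajectory(rgb_context, depth_context, point_goal)
```

The point+image fusion means the goal enters as coordinates alongside the visual context, which is what enables the Point, Image, and Pixel goal modes listed earlier.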
Baseline Comparison¶
| Feature | ViNT | NoMaD | GNM | NavDP |
|---|---|---|---|---|
| Architecture | Transformer | Diffusion | CNN | Diffusion + Critic |
| Goal Support | Image, NoGoal | Image, NoGoal | Image, NoGoal | Point, Image, Pixel |
| Trajectory Length | 8 waypoints | 8 waypoints | 5 waypoints | 24 waypoints |
| Context Frames | 5 | 5 | 5 | 8 |
Heading Alignment¶
**IL baselines require heading alignment**
IL baselines (ViNT, GNM, NoMaD, NavDP) default to `ALIGN_HEADING=True`. Disabling it degrades performance: these models are trained on forward-moving demonstration trajectories, so arbitrary initial headings are out-of-distribution observations.
This aligns the robot's initial heading with the first topomap waypoint direction before navigation begins.
| Method | `ALIGN_HEADING` default | Reason |
|---|---|---|
| Nav2 | False | Classical planner handles arbitrary headings |
| Canvas | False | Learned policy handles arbitrary headings |
| ViNT | True | IL model requires aligned initial heading |
| GNM | True | IL model requires aligned initial heading |
| NoMaD | True | IL model requires aligned initial heading |
| NavDP | True | IL model requires aligned initial heading |
```bash
# Disable for testing
ALIGN_HEADING=False MODEL_CHECKPOINT=checkpoints/baseline-vint.pth make run-vint
```
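Concretely, aligning the heading means pointing the robot at the first topomap waypoint before the policy takes over. A minimal sketch of that computation (frame conventions assumed: x forward, y left, yaw counter-clockwise):

```python
import math

def alignment_yaw(robot_xy, first_waypoint_xy):
    """Yaw (rad) that points the robot at the first topomap waypoint."""
    dx = first_waypoint_xy[0] - robot_xy[0]
    dy = first_waypoint_xy[1] - robot_xy[1]
    return math.atan2(dy, dx)

def yaw_error(current_yaw, target_yaw):
    """Shortest signed rotation from current to target, wrapped to [-pi, pi]."""
    e = target_yaw - current_yaw
    return math.atan2(math.sin(e), math.cos(e))
```

Rotating in place until `yaw_error` is near zero, then starting the policy, reproduces the behavior `ALIGN_HEADING=True` provides.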
Evaluation¶
See Evaluation for the unified eval script, collected metrics, and log output.
References¶
Research Papers¶
- GNM: Shah et al., "GNM: A General Navigation Model to Drive Any Robot", ICRA 2023
- ViNT: Shah et al., "ViNT: A Foundation Model for Visual Navigation", CoRL 2023
- NoMaD: Sridhar et al., "NoMaD: Goal Masking Diffusion Policies for Navigation and Exploration", ICRA 2025
- NavDP: Cai et al., "NavDP: Learning Sim-to-Real Navigation Diffusion Policy", 2025
- CANVAS: Choi et al., "CANVAS: Commonsense-Aware Navigation System", ICRA 2025
Code¶
- visualnav-transformer — ViNT, NoMaD, GNM
- NavDP — Navigation diffusion policy
- MediaRef — Lightweight media references