Skip to content

🧭 IL Baselines

CostNav provides learning-based (imitation learning) navigation baselines for benchmarking sidewalk robot policies. For rule-based navigation, see Nav2 Baseline.


🧠 Imitation Learning Baselines

Supported Algorithms

Algorithm Type Framework Goal Support
NavDP Diffusion Policy NavDP Point, Image, Pixel, No-Goal, Mixed
ViNT Visual Transformer visualnav-transformer Image, No-Goal
NoMaD Diffusion Model visualnav-transformer Image, No-Goal
GNM General Navigation visualnav-transformer Image, No-Goal
CANVAS Sketch-based canvas Point Goal

Architecture

ViNT / NoMaD / GNM / NavDP

Two-node ROS2 architecture with local model inference:

graph LR
    subgraph Isaac["Isaac Sim Container"]
        SIM["Physics Sim\nRobot + ROS2 Bridge"]
    end

    subgraph IL["IL Container (ROS2 Jazzy)"]
        PN["Policy Node\n~10 Hz\n(model inference)"]
        TF["Trajectory Follower\n~20 Hz\n(MPC controller)"]
        PN -->|/model_trajectory| TF
    end

    SIM -->|"/camera, /odom"| PN
    TF -->|/cmd_vel| SIM

    style Isaac fill:#76B900,color:#fff
    style IL fill:#1565c0,color:#fff

CANVAS

Three-node architecture with remote VLA inference:

graph LR
    subgraph Isaac["Isaac Sim Container"]
        SIM["Physics Sim\nRobot + ROS2 Bridge"]
    end

    subgraph Canvas["Canvas Container (ROS2 Jazzy)"]
        NP["Neural Planner\n(sends HTTP requests)"]
        CV["cmd_vel_publisher\n(action → velocity)"]
        NP -->|/vel_predict| CV
    end

    subgraph GPU["GPU Server"]
        MW["Model Worker\n(VLA inference)\nport 8200"]
    end

    SIM -->|"/camera, /odom"| NP
    NP <-->|HTTP| MW
    CV -->|/cmd_vel| SIM

    style Isaac fill:#76B900,color:#fff
    style Canvas fill:#1565c0,color:#fff
    style GPU fill:#e65100,color:#fff

Key ROS2 Topics:

Topic Type Direction Description
/front_stereo_camera/left/image_raw sensor_msgs/Image Isaac Sim → Policy Camera images
/chassis/odom nav_msgs/Odometry Isaac Sim → Policy Robot odometry
/cmd_vel geometry_msgs/Twist Policy → Isaac Sim Velocity commands
/model_trajectory nav_msgs/Path Internal Policy → Trajectory Follower

Running IL Baselines

# Build shared ROS2 + PyTorch image (first time only)
make build-ros2-torch

# Run a specific baseline
make run-vint
make run-nomad
make run-gnm
make run-navdp
make run-canvas

Navigation mode override — switch between image_goal and topomap:

# Image-goal mode
GOAL_TYPE=image_goal MODEL_CHECKPOINT=checkpoints/nomad.pth make run-nomad

# Topomap mode
GOAL_TYPE=topomap MODEL_CHECKPOINT=checkpoints/nomad.pth make run-nomad

Automated Evaluation

# Run evaluation with default settings
make run-eval-vint
make run-eval-gnm
make run-eval-nomad
make run-eval-navdp
make run-eval-canvas

# Custom parameters
make run-eval-vint TIMEOUT=120 NUM_MISSIONS=10

Logs saved to ./logs/<baseline>_evaluation_<timestamp>.log.

Data Collection & Training

Data Collection

Teleoperation data is collected via:

make teleop

Data Processing Pipeline

ROS2 Bags → MediaRef → Processing → ViNT Format
cd CostNav/costnav_isaacsim

# Install dependencies
uv sync

# Step 1: Convert ROS bags to MediaRef
uv run python -m il_training.data_processing.converters.ray_batch_convert \
    --config data_processing/configs/processing_config.yaml

# Step 2: Convert to ViNT format
uv run python -m il_training.data_processing.process_data.process_mediaref_bags \
    --config data_processing/configs/vint_processing_config.yaml

Training

Download pretrained checkpoints first:

make download-baseline-checkpoints-hf

Train models (ViNT, NoMaD, GNM, NavDP):

cd CostNav/costnav_isaacsim

# Train ViNT
uv run python -m il_training.training.train_vint \
    --config il_training/training/visualnav_transformer/configs/vint_costnav.yaml

# Train NoMaD
uv run python -m il_training.training.train_vint \
    --config il_training/training/visualnav_transformer/configs/nomad_costnav.yaml

# Train NavDP
uv run python -m il_training.training.train_navdp \
    --config il_training/training/configs/navdp_costnav.yaml

For SLURM cluster training:

cd costnav_isaacsim/il_training/scripts/
sbatch train_vint.sbatch
sbatch train_nomad.sbatch
sbatch train_navdp.sbatch

CANVAS Training Not Open-Sourced

CANVAS training code is not included in this repository. Only pretrained checkpoints and the inference pipeline (model worker + agent) are provided.

NavDP requires a separate data processing step to convert teleoperation data into its LeRobot-compatible format with DepthAnything depth estimation:

cd CostNav/costnav_isaacsim

# Convert MediaRef bags to NavDP format (with DepthAnything depth)
uv run python -m il_training.data_processing.process_data.process_mediaref_bags \
    --config data_processing/configs/navdp_processing_config.yaml

NavDP training uses point+image fusion with DepthAnything-generated depth maps, 24-step trajectory prediction, and 8-frame context windows.

Baseline Comparison

Feature ViNT NoMaD GNM NavDP
Architecture Transformer Diffusion CNN Diffusion + Critic
Goal Support Image, NoGoal Image, NoGoal Image, NoGoal Point, Image, Pixel
Trajectory Length 8 waypoints 8 waypoints 5 waypoints 24 waypoints
Context Frames 5 5 5 8

Heading Alignment

IL baselines require heading alignment

IL baselines (ViNT, GNM, NoMaD, NavDP) default to ALIGN_HEADING=True. Disabling this will cause poor performance — these models are trained on forward-moving demonstration trajectories and cannot handle arbitrary initial headings (out-of-distribution observations).

This aligns the robot's initial heading with the first topomap waypoint direction before navigation begins.

Method ALIGN_HEADING default Reason
Nav2 False Classical planner handles arbitrary headings
Canvas False Learned policy handles arbitrary headings
ViNT True IL model requires aligned initial heading
GNM True IL model requires aligned initial heading
NoMaD True IL model requires aligned initial heading
NavDP True IL model requires aligned initial heading
# Disable for testing
ALIGN_HEADING=False MODEL_CHECKPOINT=checkpoints/baseline-vint.pth make run-vint

📊 Evaluation

See Evaluation for the unified eval script, collected metrics, and log output.


🔗 References

Research Papers

  1. GNM: Shah et al., "GNM: A General Navigation Model to Drive Any Robot", ICRA 2023
  2. ViNT: Shah et al., "ViNT: A Foundation Model for Visual Navigation", CoRL 2023
  3. NoMaD: Sridhar et al., "NoMaD: Goal Masking Diffusion Policies for Navigation and Exploration", ICRA 2025
  4. NavDP: Cai et al., "NavDP: Learning Sim-to-Real Navigation Diffusion Policy", 2025
  5. CANVAS: Choi et al., "CANVAS: Commonsense-Aware Navigation System", ICRA 2025

Code