CostNav

MDP Components

This document provides detailed explanations of the Markov Decision Process (MDP) components that define the CostNav navigation task.

Overview

The MDP is defined by six components, each described in its own section below:

• Observations: what the agent perceives at each step
• Actions: how the agent controls the robot
• Commands: the navigation goals issued to the agent
• Rewards: the learning signal that shapes behavior
• Terminations: the conditions that end an episode
• Events: resets that initialize each episode

Observations

The observation space combines vector observations and visual observations.
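
As an overview, here is a minimal sketch of how these terms might be wired into an Isaac Lab observation group; the group name, the command name "pose_command", and the assumption that CostNav's mdp package re-exports Isaac Lab's built-in terms are illustrative (the individual functions are described below):

from isaaclab.managers import ObservationGroupCfg as ObsGroup
from isaaclab.managers import ObservationTermCfg as ObsTerm
from isaaclab.managers import SceneEntityCfg
from isaaclab.utils import configclass

from . import mdp  # pose_command_2d, rgbd_processed, base_lin_vel, base_ang_vel

@configclass
class PolicyObsCfg(ObsGroup):
    """Observation group combining vector and visual terms (illustrative)."""

    pose_command = ObsTerm(func=mdp.pose_command_2d, params={"command_name": "pose_command"})
    base_lin_vel = ObsTerm(func=mdp.base_lin_vel)
    base_ang_vel = ObsTerm(func=mdp.base_ang_vel)
    rgbd = ObsTerm(
        func=mdp.rgbd_processed,
        params={"sensor_cfg": SceneEntityCfg("tiled_camera"), "normalize": True},
    )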

Vector Observations

1. Pose Command (2D)

Function: pose_command_2d in mdp/observations.py

def pose_command_2d(env: ManagerBasedEnv, command_name: str) -> torch.Tensor:
    """2D goal position in robot's base frame."""
    # The first two entries of the pose command are (x, y) in the base frame
    return env.command_manager.get_command(command_name)[:, :2]

2. Base Linear Velocity

Source: Isaac Lab’s built-in base_lin_vel observation (root linear velocity expressed in the robot’s base frame)

3. Base Angular Velocity

Source: Isaac Lab’s built-in base_ang_vel observation (root angular velocity expressed in the robot’s base frame)

Visual Observations

RGB-D Images

Function: rgbd_processed in mdp/observations.py

def rgbd_processed(
    env: ManagerBasedEnv,
    sensor_cfg: SceneEntityCfg = SceneEntityCfg("tiled_camera"),
    convert_perspective_to_orthogonal: bool = False,
    normalize: bool = True,
    flatten: bool = False,
) -> torch.Tensor:

Purpose: Visual perception of the environment (obstacles, terrain, etc.)
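
A hedged sketch of what the body might look like, assuming Isaac Lab's TiledCamera output keys ("rgb" and "distance_to_image_plane"); the exact processing, and the handling of convert_perspective_to_orthogonal, is simplified here and may differ from CostNav's implementation:

import torch
from isaaclab.managers import SceneEntityCfg

def rgbd_processed_sketch(
    env, sensor_cfg=SceneEntityCfg("tiled_camera"), normalize=True, flatten=False
):
    camera = env.scene.sensors[sensor_cfg.name]
    rgb = camera.data.output["rgb"].float()                # (N, H, W, 3)
    depth = camera.data.output["distance_to_image_plane"]  # (N, H, W, 1)
    if normalize:
        rgb = rgb / 255.0  # scale 8-bit color values to [0, 1]
    rgbd = torch.cat([rgb, depth], dim=-1)                 # (N, H, W, 4)
    return rgbd.flatten(start_dim=1) if flatten else rgbd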

Actions

Action Space: Restricted Car Action

Class: RestrictedCarAction in coco_robot_cfg.py

The action space is 2-dimensional:

action = [velocity, steering_angle]

Action Processing

  1. Input: Raw actions from policy
    • velocity: Forward velocity (m/s)
    • steering_angle: Front axle steering angle (radians)
  2. Clamping (see the sketch after this list):
    • velocity: [0, 4.0] m/s (forward only)
    • steering_angle: [-40°, +40°] (±0.698 radians)
  3. Conversion to Joint Commands:
    • Wheel velocities: Computed using Ackermann steering geometry
    • Axle position: Steering angle applied to front axle joint
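
A minimal sketch of the clamping step, assuming raw actions arrive as a batched torch tensor of shape (num_envs, 2); the constants mirror the limits listed above:

import torch

MAX_SPEED = 4.0    # m/s, forward only
MAX_STEER = 0.698  # radians (about 40 degrees)

def clamp_actions(raw_actions: torch.Tensor) -> torch.Tensor:
    # Column 0: velocity, clamped to the forward-only range
    velocity = raw_actions[:, 0].clamp(0.0, MAX_SPEED)
    # Column 1: steering angle, clamped to the mechanical limits
    steering = raw_actions[:, 1].clamp(-MAX_STEER, MAX_STEER)
    return torch.stack((velocity, steering), dim=-1)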

Ackermann Steering Geometry

import math

# Vehicle geometry (meters)
WHEEL_BASE = 1.5    # distance between front and rear axles
TRACK_WIDTH = 1.8   # distance between left and right wheels
WHEEL_RADIUS = 0.3  # rear wheel radius

def ackermann(v: float, alpha: float) -> tuple[float, float, float]:
    """Map velocity v (m/s) and steering angle alpha (rad) to
    per-wheel steering angles and a wheel angular velocity."""
    if abs(alpha) < 1e-6:
        # Driving straight: infinite turning radius, wheels stay centered
        left_angle = right_angle = 0.0
    else:
        R = WHEEL_BASE / math.tan(alpha)  # turning radius
        # The inner wheel turns tighter than the outer wheel (proper Ackermann steering)
        left_angle = math.atan(WHEEL_BASE / (R - 0.5 * TRACK_WIDTH))
        right_angle = math.atan(WHEEL_BASE / (R + 0.5 * TRACK_WIDTH))
    # All wheels are commanded at the same angular speed for simplicity
    wheel_velocity = v / WHEEL_RADIUS
    return left_angle, right_angle, wheel_velocity
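
For example, ackermann(2.0, 0.35) yields a turning radius R ≈ 1.5 / tan(0.35) ≈ 4.11 m, a left (inner) wheel angle of atan(1.5 / 3.21) ≈ 0.438 rad (25.1°), a right (outer) wheel angle of atan(1.5 / 5.01) ≈ 0.291 rad (16.7°), and a wheel velocity of 2.0 / 0.3 ≈ 6.67 rad/s.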

Action Interval

Actions are applied every 4 simulation steps (ACTION_INTERVAL = 4), i.e., the policy runs at one quarter of the physics rate. This action decimation lowers the control frequency and gives each command time to take effect before the next one is issued.
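
Schematically (apply_action and sim_step are hypothetical stand-ins for the environment's stepping API):

ACTION_INTERVAL = 4

for step in range(max_episode_steps):
    if step % ACTION_INTERVAL == 0:
        # Query the policy only once every ACTION_INTERVAL physics steps
        action = policy(observation)
    apply_action(action)  # the same action is held between policy queries
    sim_step()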

Commands

Safe Position Pose 2D Command

Class: SafePositionPose2dCommand in mdp/commands.py

Generates navigation goals from pre-validated safe positions.

Safe Position Validation

Safe positions are generated offline using raycasting, which verifies that candidate spawn and goal locations are free of obstacles.
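
A hypothetical sketch of what such a check could look like; raycast_fn, the ray pattern, and the clearance value are illustrative stand-ins, not CostNav's actual procedure:

import torch

def is_safe_position(pos: torch.Tensor, raycast_fn, clearance: float = 1.0) -> bool:
    # Cast rays horizontally outward from the candidate position
    angles = torch.arange(16) * (2.0 * torch.pi / 16.0)
    directions = torch.stack(
        [torch.cos(angles), torch.sin(angles), torch.zeros_like(angles)], dim=-1
    )
    # raycast_fn returns per-ray hit distances (inf where nothing is hit)
    distances = raycast_fn(pos, directions)
    # Safe only if no obstacle lies within the clearance radius
    return bool((distances > clearance).all())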

Goal Sampling

def _resample(self, env_ids: Sequence[int]):
    # Sample random goals from the pre-validated safe positions
    indices = torch.randint(0, self.safe_positions.shape[0], (len(env_ids),), device=self.device)
    self.pos_command_w[env_ids] = self.safe_positions[indices]

    # Set the goal heading (two modes)
    if self.cfg.simple_heading:
        # Point the desired heading from the robot towards the goal
        target_vec = self.pos_command_w[env_ids] - self.robot.data.root_pos_w[env_ids]
        self.heading_command_w[env_ids] = torch.atan2(target_vec[:, 1], target_vec[:, 0])
    else:
        # Uniformly random heading in [-pi, pi)
        r = torch.empty(len(env_ids), device=self.device)
        self.heading_command_w[env_ids] = r.uniform_(-torch.pi, torch.pi)

Command Update

Every step, the command is transformed to the robot’s base frame:

def _update_command(self):
    # Express the goal position in the robot's base frame
    target_vec = self.pos_command_w - self.robot.data.root_pos_w
    self.pos_command_b[:] = quat_apply_inverse(self.robot.data.root_quat_w, target_vec)
    # Express the goal heading relative to the robot's current heading
    self.heading_command_b[:] = wrap_to_pi(self.heading_command_w - self.robot.data.heading_w)

This ensures the observation is always relative to the robot’s current pose.

Rewards

The reward function is a weighted sum of multiple components.

1. Arrival Reward

Weight: +20,000

arrived_reward = is_terminated_term(term_keys="arrive") * 20000.0

Large positive reward for successfully reaching the goal.

2. Collision Penalty

Weight: -200

collision_penalty = is_terminated_term(term_keys="collision") * -200.0

Penalty for colliding with obstacles.

3. Position Command Error (Tanh)

Weight: +1.0

def position_command_error_tanh(env, std: float, command_name: str) -> torch.Tensor:
    command = env.command_manager.get_command(command_name)
    # Planar distance to the goal in the robot's base frame
    distance = torch.norm(command[:, :2], dim=1)
    return 1.0 - torch.tanh(distance / std)
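
The term rises towards 1 as the robot approaches the goal and decays smoothly to 0 with distance. For example, with std = 2.0 (an illustrative value), the term is 1.0 at the goal, about 0.24 at 2 m, and about 0.04 at 4 m.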

4. Heading Command Error

Weight: -0.5

def heading_command_error_abs(env, command_name: str) -> torch.Tensor:
    command = env.command_manager.get_command(command_name)
    heading_error = command[:, 3]  # heading component of the base-frame command
    return heading_error.abs()

Penalty for not facing the goal direction.

5. Distance Progress Reward

Weight: +100.0

def distance_to_goal_progress(env, command_name: str, slack_penalty: float) -> torch.Tensor:
    # Positive when the robot closed distance to the goal since the previous step
    progress = previous_distance - current_distance
    # The slack penalty makes standing still strictly unprofitable
    return progress - slack_penalty

6. Moving Towards Goal

Weight: +1.0

def moving_towards_goal_reward(env, command_name: str) -> torch.Tensor:
    # Unit vector pointing from the robot towards the goal
    vel_direction = target_pos / distance
    # Velocity component along the goal direction (dot product)
    movement_reward = (velocity * vel_direction).sum(dim=-1)
    # Only active once the episode is at least 10 steps old
    return movement_reward * (episode_length >= 10)
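
Taken together, the per-step reward is, schematically (std and slack_penalty are configuration values not shown here):

total_reward = (
    20000.0 * arrived                      # termination bonus
    - 200.0 * collided                     # collision penalty
    + 1.0 * (1 - tanh(distance / std))     # proximity shaping
    - 0.5 * abs(heading_error)             # heading alignment
    + 100.0 * (progress - slack_penalty)   # distance progress
    + 1.0 * velocity_towards_goal          # movement shaping
)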

Terminations

1. Arrive (Success)

Function: arrive in mdp/terminations.py

def arrive(env, threshold: float, command_name: str) -> torch.Tensor:
    command = env.command_manager.get_command(command_name)
    # Planar (x, y) distance to the goal in the robot's base frame
    distance = torch.norm(command[:, :2], dim=1)
    return distance <= threshold

2. Collision (Failure)

Function: Isaac Lab’s illegal_contact

collision = illegal_contact(
    sensor_cfg=SceneEntityCfg("contact_forces", body_names="body_link"),
    threshold=1.0,
)
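
The episode terminates when the contact sensor on body_link reports a net contact force whose magnitude exceeds the threshold of 1.0 (Newtons, assuming Isaac Lab's default contact-force units).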

3. Time Out

Function: Isaac Lab’s time_out

time_out = time_out(time_out=True)

Ends the episode when the maximum episode length is reached; the time_out=True flag marks this as a truncation rather than a failure.

Events

Reset Event

Function: reset_root_state_from_safe_positions in mdp/events.py

def reset_root_state_from_safe_positions(env, env_ids, safe_positions, velocity_range):
    # Sample a random pre-validated safe position for each reset environment
    indices = torch.randint(0, safe_positions.shape[0], (len(env_ids),))
    positions = safe_positions[indices]

    # Random yaw orientation, converted to a quaternion (roll = pitch = 0);
    # quat_from_euler_xyz comes from isaaclab.utils.math
    yaw = torch.empty(len(env_ids)).uniform_(-torch.pi, torch.pi)
    zeros = torch.zeros_like(yaw)
    orientations = quat_from_euler_xyz(zeros, zeros, yaw)

    # Start from rest: zero linear and angular velocity
    velocities = torch.zeros(len(env_ids), 6)

    # Write the full root state (position, quaternion, velocities) to the simulator
    root_state = torch.cat([positions, orientations, velocities], dim=-1)
    env.scene["robot"].write_root_state_to_sim(root_state, env_ids=env_ids)

Ensures the robot starts from a valid, obstacle-free position at each episode reset.

Summary

The MDP components work together to create a navigation task where:

• the robot observes its goal in its own base frame, its base velocities, and RGB-D images;
• it acts through a restricted car interface (forward-only velocity plus steering angle);
• it is rewarded for progress towards and arrival at the goal, and penalized for collisions and heading misalignment;
• episodes end on arrival (success), collision (failure), or time out (truncation).

This design enables learning cost-effective navigation policies that balance speed, safety, and goal achievement.