This document provides detailed explanations of the Markov Decision Process (MDP) components that define the CostNav navigation task.
The MDP is defined by the observation space, the action space, the goal command generator, the reward function, the termination conditions, and the reset events described below.
The observation space combines vector observations and visual observations.
Function: pose_command_2d in mdp/observations.py
def pose_command_2d(env: ManagerBasedEnv, command_name: str) -> torch.Tensor:
"""2D goal position in robot's base frame."""
Shape: (num_envs, 2). Description: [x, y] position of the goal relative to the robot.
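The observation can be read directly from the command manager, since the command term already expresses the goal in the robot's base frame (see the command transform below). A minimal sketch, assuming the command tensor layout [x, y, z, heading]:

def pose_command_2d(env: ManagerBasedEnv, command_name: str) -> torch.Tensor:
    """2D goal position in robot's base frame."""
    # Sketch only: the command is already in the base frame, so slice out [x, y].
    command = env.command_manager.get_command(command_name)  # (num_envs, 4)
    return command[:, :2]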
Source: Isaac Lab's built-in base_lin_vel observation. Shape: (num_envs, 3). Description: [vx, vy, vz] in the robot's base frame.
Source: Isaac Lab's built-in base_ang_vel observation. Shape: (num_envs, 3). Description: [wx, wy, wz] in the robot's base frame.

Function: rgbd_processed in mdp/observations.py
def rgbd_processed(
    env: ManagerBasedEnv,
    sensor_cfg: SceneEntityCfg = SceneEntityCfg("tiled_camera"),
    convert_perspective_to_orthogonal: bool = False,
    normalize: bool = True,
    flatten: bool = False,
) -> torch.Tensor:
Shape: (num_envs, 4, height, width), or (num_envs, 4 * height * width) if flattened. Channels: [R, G, B, D]. Purpose: visual perception of the environment (obstacles, terrain, etc.).
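A minimal sketch of how such a term can be built on Isaac Lab's tiled camera output; the buffer keys ("rgb", "distance_to_image_plane") and the 8-bit RGB assumption are Isaac Lab conventions but are assumptions here, and the perspective-to-orthogonal depth conversion is omitted:

def rgbd_processed(
    env: ManagerBasedEnv,
    sensor_cfg: SceneEntityCfg = SceneEntityCfg("tiled_camera"),
    convert_perspective_to_orthogonal: bool = False,
    normalize: bool = True,
    flatten: bool = False,
) -> torch.Tensor:
    """Sketch: stack RGB and depth into a single (num_envs, 4, H, W) tensor."""
    camera = env.scene.sensors[sensor_cfg.name]
    rgb = camera.data.output["rgb"].float()                # (num_envs, H, W, 3), assumed 8-bit
    depth = camera.data.output["distance_to_image_plane"]  # (num_envs, H, W, 1)
    if normalize:
        rgb = rgb / 255.0
    rgbd = torch.cat([rgb, depth], dim=-1).permute(0, 3, 1, 2)  # (num_envs, 4, H, W)
    return rgbd.flatten(start_dim=1) if flatten else rgbd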
Class: RestrictedCarAction in coco_robot_cfg.py
The action space is 2-dimensional:
action = [velocity, steering_angle]
velocity: forward velocity (m/s), limited to [0, 4.0] m/s (forward only).
steering_angle: front axle steering angle (radians), limited to [-40°, +40°] (±0.698 radians).

The (velocity, steering_angle) pair is converted to individual wheel commands using Ackermann steering geometry:

# Given velocity v and steering angle α
wheel_base = 1.5 # Distance between front and rear axles
track_width = 1.8 # Distance between left and right wheels
radius_rear = 0.3 # Wheel radius
# Turning radius
R = wheel_base / tan(α)
# Individual wheel angles (for proper Ackermann steering)
left_wheel_angle = arctan(wheel_base / (R - 0.5 * track_width))
right_wheel_angle = arctan(wheel_base / (R + 0.5 * track_width))
# Wheel velocities (all wheels same speed for simplicity)
wheel_velocity = v / radius_rear
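As a standalone sketch (plain Python; the actual RestrictedCarAction operates on batched torch tensors), the conversion for a single action looks like this, with a special case for a zero steering angle where the turning radius is infinite:

import math

def ackermann_wheel_commands(v: float, alpha: float,
                             wheel_base: float = 1.5,
                             track_width: float = 1.8,
                             radius_rear: float = 0.3):
    """Convert (velocity, steering_angle) into front wheel angles and a wheel speed."""
    if abs(alpha) < 1e-6:
        # Driving straight: no steering offset between the front wheels.
        left_angle = right_angle = 0.0
    else:
        turn_radius = wheel_base / math.tan(alpha)
        left_angle = math.atan(wheel_base / (turn_radius - 0.5 * track_width))
        right_angle = math.atan(wheel_base / (turn_radius + 0.5 * track_width))
    wheel_velocity = v / radius_rear  # rad/s, same for all wheels
    return left_angle, right_angle, wheel_velocity

# Example: 2 m/s forward with a 20 degree steering command
print(ackermann_wheel_commands(2.0, math.radians(20.0)))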
Actions are recomputed and applied to the steering and wheel joints every 4 simulation steps (ACTION_INTERVAL = 4).
Class: SafePositionPose2dCommand in mdp/commands.py
Generates navigation goals from pre-validated safe positions.
Safe positions are generated offline using raycasting and stored in safe_positions_auto_generated.py:

def _resample(self, env_ids: Sequence[int]):
    # Sample random indices from safe positions
    indices = torch.randint(0, num_safe_positions, (num_resets,))
    sampled_positions = self.safe_positions[indices]
    # Set goal position
    self.pos_command_w[env_ids] = sampled_positions
    # Set heading (two modes)
    if self.cfg.simple_heading:
        # Point towards goal
        target_vec = goal - robot_pos
        heading = atan2(target_vec.y, target_vec.x)
    else:
        # Random heading
        heading = uniform(-π, π)
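In batched form, the same logic maps onto standard torch operations. A sketch (attribute names follow Isaac Lab's Pose2dCommand conventions and may differ from the actual class):

def _resample(self, env_ids: Sequence[int]):
    # Pick a random pre-validated safe position for each resampled environment.
    indices = torch.randint(0, self.safe_positions.shape[0], (len(env_ids),), device=self.device)
    self.pos_command_w[env_ids] = self.safe_positions[indices]
    if self.cfg.simple_heading:
        # Command a heading that points from the robot towards the goal.
        target_vec = self.pos_command_w[env_ids] - self.robot.data.root_pos_w[env_ids]
        self.heading_command_w[env_ids] = torch.atan2(target_vec[:, 1], target_vec[:, 0])
    else:
        # Uniform random heading in [-pi, pi).
        rand = torch.empty(len(env_ids), device=self.device)
        self.heading_command_w[env_ids] = rand.uniform_(-math.pi, math.pi)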
Every step, the command is transformed to the robot’s base frame:
def _update_command(self):
    # Transform goal from world frame to base frame
    target_vec = pos_command_w - robot_pos_w
    pos_command_b = quat_apply_inverse(robot_quat, target_vec)
    # Transform heading
    heading_command_b = wrap_to_pi(heading_w - robot_heading_w)
This ensures the observation is always relative to the robot’s current pose.
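For example, a robot at the world origin with a yaw of +90° (facing +y) and a goal at world position (0, 2) observes the goal at (2, 0) in its base frame, i.e. 2 m straight ahead; if the commanded world heading is also 90°, the base-frame heading error is 0.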
The reward function is a weighted sum of multiple components.
Weight: +20,000
arrived_reward = is_terminated_term(term_keys="arrive") * 20000.0
Large positive reward for successfully reaching the goal.
Weight: -200
collision_penalty = is_terminated_term(term_keys="collision") * -200.0
Penalty for colliding with obstacles.
Weight: +1.0
def position_command_error_tanh(env, std: float, command_name: str):
    distance = norm(goal_position_base_frame)
    return 1 - tanh(distance / std)
The std parameter controls the falloff rate.
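For instance, with std = 2.0 (a placeholder value), this term yields 1 - tanh(1) ≈ 0.24 at 2 m from the goal and 1 - tanh(0.25) ≈ 0.76 at 0.5 m, so the reward ramps up smoothly as the robot approaches.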
Weight: -0.5

def heading_command_error_abs(env, command_name: str):
    heading_error = command[:, 3]  # Heading in base frame
    return abs(heading_error)
Penalty for not facing the goal direction.
Weight: +100.0
def distance_to_goal_progress(env, command_name: str, slack_penalty: float):
    progress = previous_distance - current_distance
    return progress - slack_penalty
The slack_penalty is subtracted every step, so only progress larger than the slack is net-positive; this prevents reward hacking.
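One way to implement such a stateful term is to cache the previous distance on the environment between steps. A sketch (the cache attribute is hypothetical; the repository may track this differently, e.g. in a class-based reward term):

def distance_to_goal_progress(env, command_name: str, slack_penalty: float) -> torch.Tensor:
    """Sketch: reward the per-step decrease in goal distance, minus a constant slack."""
    command = env.command_manager.get_command(command_name)  # goal in base frame
    current_distance = torch.norm(command[:, :2], dim=1)
    # Hypothetical cache of the previous step's distance on the env object.
    if not hasattr(env, "_prev_goal_distance"):
        env._prev_goal_distance = current_distance.clone()
    progress = env._prev_goal_distance - current_distance
    env._prev_goal_distance = current_distance.clone()
    return progress - slack_penalty

A complete implementation would also reset the cached distance at episode resets, otherwise the first step after a reset reports spurious progress.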
Weight: +1.0

def moving_towards_goal_reward(env, command_name: str):
    vel_direction = target_pos / distance
    movement_reward = (velocity * vel_direction).sum()
    return movement_reward * (episode_length >= 10)
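This term rewards velocity aligned with the goal direction and is gated off for the first 10 steps of each episode. Put together, the reward terms might be registered in the environment configuration roughly as follows (a sketch: term names, the command name, the std and slack_penalty values, and the recent-Isaac-Lab import paths are assumptions, and the project's mdp package is assumed to re-export Isaac Lab's built-in terms):

from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.utils import configclass

from . import mdp  # project mdp package (observations, rewards, terminations, ...)


@configclass
class RewardsCfg:
    arrived_reward = RewTerm(func=mdp.is_terminated_term, weight=20000.0,
                             params={"term_keys": "arrive"})
    collision_penalty = RewTerm(func=mdp.is_terminated_term, weight=-200.0,
                                params={"term_keys": "collision"})
    position_tracking = RewTerm(func=mdp.position_command_error_tanh, weight=1.0,
                                params={"std": 2.0, "command_name": "pose_command"})
    heading_tracking = RewTerm(func=mdp.heading_command_error_abs, weight=-0.5,
                               params={"command_name": "pose_command"})
    goal_progress = RewTerm(func=mdp.distance_to_goal_progress, weight=100.0,
                            params={"command_name": "pose_command", "slack_penalty": 0.01})
    moving_towards_goal = RewTerm(func=mdp.moving_towards_goal_reward, weight=1.0,
                                  params={"command_name": "pose_command"})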
Function: arrive in mdp/terminations.py
def arrive(env, threshold: float, command_name: str):
    distance = norm(goal_position_base_frame[:, :2])
    return distance <= threshold
Function: Isaac Lab’s illegal_contact
collision = illegal_contact(
sensor_cfg=SceneEntityCfg("contact_forces", body_names="body_link"),
threshold=1.0,
)
Function: Isaac Lab’s time_out
time_out = time_out(time_out=True)
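For reference, a sketch of the corresponding termination configuration (the arrival threshold and command name are placeholder values):

from isaaclab.managers import SceneEntityCfg
from isaaclab.managers import TerminationTermCfg as DoneTerm
from isaaclab.utils import configclass

from . import mdp  # assumed to re-export Isaac Lab's built-in terms


@configclass
class TerminationsCfg:
    time_out = DoneTerm(func=mdp.time_out, time_out=True)
    collision = DoneTerm(
        func=mdp.illegal_contact,
        params={"sensor_cfg": SceneEntityCfg("contact_forces", body_names="body_link"),
                "threshold": 1.0},
    )
    arrive = DoneTerm(
        func=mdp.arrive,
        params={"threshold": 0.5, "command_name": "pose_command"},  # placeholder values
    )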
Function: reset_root_state_from_safe_positions in mdp/events.py
def reset_root_state_from_safe_positions(env, env_ids, safe_positions, velocity_range):
    # Sample a random safe position for each reset environment
    position = random_choice(safe_positions)
    # Random yaw orientation
    yaw = uniform(-π, π)
    orientation = yaw_to_quaternion(yaw)
    # Zero velocity
    velocity = [0, 0, 0]
    angular_velocity = [0, 0, 0]
    # Set robot state (Isaac Lab takes a single 13-dim root state: pos, quat, lin vel, ang vel)
    root_state = concat([position, orientation, velocity, angular_velocity])
    env.scene["robot"].write_root_state_to_sim(root_state, env_ids)
Ensures robot starts in valid positions at episode reset.
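A sketch of how this event might be registered in the environment config (the SAFE_POSITIONS export and the empty velocity_range are placeholders):

from isaaclab.managers import EventTermCfg as EventTerm
from isaaclab.utils import configclass

from . import mdp
from .safe_positions_auto_generated import SAFE_POSITIONS  # hypothetical export name


@configclass
class EventCfg:
    reset_robot = EventTerm(
        func=mdp.reset_root_state_from_safe_positions,
        mode="reset",
        params={"safe_positions": SAFE_POSITIONS, "velocity_range": {}},
    )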
The MDP components work together to create a navigation task in which the robot must drive from a randomly sampled safe start pose to a randomly sampled goal pose, avoid collisions with obstacles, and arrive before the episode times out.
This design enables learning cost-effective navigation policies that balance speed, safety, and goal achievement.