CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents

Haebin Seong*1, Sungmin Kim*1, Yongjun Cho*1, Myunchul Joe1, Geunwoo Kim2, Yubeen Park1, Sunhoo Kim1, Yoonshik Kim1, Suhwan Choi1, Jaeyoon Jung1, Jiyong Youn3,
Jinmyung Kwak3, Sunghee Ahn4, Jaemin Lee4, Younggil Do4, Seungyeop Yi4, Woojin Cheong4, Minhyeok Oh4, Minchan Kim1, Seongjae Kang3, Samwoo Seong1, Youngjae Yu4, Yunsung Lee1
* Equal contribution, 1 MAUM.AI, 2 University of California, Irvine, 3 KAIST, 4 Seoul National University
Under Review
Presented at CES 2026
costnav logo

CostNav is the first navigation benchmark that evaluates robots the way businesses actually evaluate them: by profit per run. Instead of just measuring success rates, CostNav asks: How much did that navigation cost? How much revenue did it generate? When will this system become profitable?

Abstract

While current navigation benchmarks prioritize task success in simplified settings, they neglect the multidimensional economic constraints essential for the real-world commercialization of autonomous delivery systems. We introduce CostNav, an Economic Navigation Benchmark that evaluates physical AI agents through comprehensive economic cost-revenue analysis aligned with real-world business operations. By integrating industry-standard data—such as Securities and Exchange Commission (SEC) filings and Abbreviated Injury Scale (AIS) injury reports—with Isaac Sim's detailed collision and cargo dynamics, CostNav transcends simple task completion to accurately evaluate business value in complex, real-world scenarios. To our knowledge, CostNav is the first physics-grounded economic benchmark that uses industry-standard regulatory and financial data to quantitatively expose the gap between navigation research metrics and commercial viability, revealing that optimizing for task success on a simplified task fundamentally differs from optimizing for real-world economic deployment. Evaluating seven baselines—two rule-based and five imitation-learning—we find that no current method is economically viable: all yield negative contribution margins. The best-performing method, CANVAS (−$27.36/run), equipped with only an RGB camera and GPS, outperforms the LiDAR-equipped Nav2 w/ GPS (−$35.46/run). We challenge the community to develop navigation policies that achieve economic viability on CostNav. The benchmark is method-agnostic: success is judged solely on economic outcome, not on the underlying architecture. All resources are available at https://github.com/worv-ai/CostNav.

The Problem: Optimizing for the Wrong Things

problem illustration
This scene is captured directly from CostNav's simulation environment, with lighting enhancements.

Traditional navigation benchmarks celebrate high success rates and path efficiency, and the field keeps pushing those numbers higher.

But these metrics don't answer the questions that keep entrepreneurs awake at night:

  • Do I invest $3,000 in LiDAR for a classical Nav2 pipeline, or $400 in RGB-D for a learning-based approach?
  • What compensation cost should I budget when the robot spills food even if it reaches the destination?
  • What is the true cost per delivery after energy, maintenance, crashes, and customer service compensation?
  • How many deliveries are required to break even on the upfront investment?

Prior benchmarks don't answer these questions. CostNav does.

How CostNav Differs from Existing Benchmarks

Comparison of navigation benchmarks

Existing navigation benchmarks (UnrealZoo, OpenBench, Arena-RosNav, Urban-Sim, DeliveryBench) evaluate agents primarily on task-oriented metrics such as success rate and path efficiency, without modeling cost-revenue dynamics essential for commercial deployment. CostNav is the first benchmark to combine high-fidelity physics simulation (Isaac Sim with full collision dynamics, delivery package dynamics, and cargo intactness modeling) with a comprehensive economic cost model grounded in real-world data sources—including energy costs, pedestrian safety costs, property damage, service compensation, repair costs, and break-even point analysis.

CostNav: Measuring the Real-World Profitability of Physical AI

economic model

We introduce CostNav, an Economic Navigation Benchmark for Physical AI. It links navigation performance to business value by measuring profit per run: delivery revenue minus the full cost of operation.

High-Fidelity Physics Simulation for Real-World Economic Scenarios

CostNav uses Isaac Sim (PhysX 5 + Newton) and puts a price on physical interactions. Collisions are evaluated via impulse and delta-v mapped to the AIS injury scale, while jerk-induced cargo spoilage and hardware wear capture non-obvious costs beyond geometric success.
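The delta-v-to-cost mapping can be sketched as a simple lookup. Note that the thresholds and per-level dollar costs below are hypothetical placeholders for illustration only, not CostNav's calibrated AIS parameters:

```python
# Illustrative sketch: price a pedestrian collision from its delta-v.
# Thresholds and costs are HYPOTHETICAL, not CostNav's actual values.

AIS_COST_USD = {0: 0.0, 1: 4_000.0, 2: 50_000.0, 3: 200_000.0}  # hypothetical

def ais_level(delta_v_mps: float) -> int:
    """Map collision delta-v (m/s) to a coarse AIS severity level."""
    if delta_v_mps < 0.5:
        return 0  # no injury
    if delta_v_mps < 1.5:
        return 1  # minor
    if delta_v_mps < 3.0:
        return 2  # moderate
    return 3      # serious

def collision_cost(delta_v_mps: float) -> float:
    """Look up the injury cost charged to a run for one collision."""
    return AIS_COST_USD[ais_level(delta_v_mps)]
```

In the benchmark itself, delta-v and impulse come from the simulator's contact reports rather than being supplied by hand.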

Cost-revenue model and break-even analysis

The framework separates CAPEX from per-run OPEX (energy, maintenance, collisions, and service compensation). Revenue follows real-world pricing and SLA compliance: timeouts earn zero revenue, and cargo intactness determines service completeness. Together these yield a break-even point (BEP): the number of deliveries required to recover fixed costs.
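The logic above can be sketched in a few lines. The base price, the linear intactness-to-revenue rule, and the field names are illustrative assumptions, not the benchmark's exact formulas:

```python
# Minimal sketch of the cost-revenue model; prices and the linear
# intactness rule are ASSUMPTIONS for illustration.
from dataclasses import dataclass

@dataclass
class RunOutcome:
    on_time: bool           # SLA: delivered within the timeout
    intact_fraction: float  # cargo intactness in [0, 1]
    opex_usd: float         # per-run operating cost (energy, repairs, ...)

def revenue(outcome: RunOutcome, base_price_usd: float = 7.0) -> float:
    """Timeouts earn zero; intactness scales service completeness."""
    if not outcome.on_time:
        return 0.0
    return base_price_usd * outcome.intact_fraction

def contribution_margin(outcome: RunOutcome, base_price_usd: float = 7.0) -> float:
    """Per-run profit before fixed costs: revenue minus OPEX."""
    return revenue(outcome, base_price_usd) - outcome.opex_usd

def break_even_runs(capex_usd: float, margin_per_run_usd: float) -> float:
    """Runs needed to recover CAPEX; no finite BEP when margin <= 0."""
    if margin_per_run_usd <= 0:
        return float("inf")
    return capex_usd / margin_per_run_usd
```

A negative contribution margin, as observed for all seven baselines, makes `break_even_runs` infinite: no number of deliveries recovers the investment.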

Real-World-Grounded Parameters Combined with Detailed Physics Simulation Outcomes

Real-world parameters (pricing, labor, energy, maintenance, incident costs, ops ratios, and failure rates) are paired with simulation outputs (runtime, power, distance, collisions, assistance, and spoilage) to compute per-run costs, revenue, profit, and BEP. Sources include manufacturer pricing/spec sheets for robots and sensors, SEC filings for fleet throughput and assistance rates, U.S. wage and electricity statistics, delivery-platform pricing and refund data, AIS/NHTSA injury cost reports, and municipal repair cost records.

CostNav's high-fidelity physics simulation enables the modeling of real-world economic scenarios, including critical failures like food spoilage and robot rollovers.

Results

We evaluate seven baselines—two rule-based (Nav2 w/ AMCL, Nav2 w/ GPS) and five learning-based (GNM, ViNT, NoMaD, NavDP, CANVAS)—over 100 delivery episodes on urban sidewalk scenarios. Learning-based methods rely solely on RGB cameras (CANVAS additionally uses GPS), while rule-based methods are equipped with a 360° 3D LiDAR.

Overall Economic Performance

No evaluated method is economically viable. All seven methods yield negative contribution margins, resulting in no finite break-even point (BEP). The best-performing method is CANVAS at −$27.36/run, equipped with only an RGB camera and GPS, outperforming LiDAR-equipped Nav2 w/ GPS (−$35.46/run). Pedestrian safety costs are the dominant operational cost across all methods, ranging from $9.30/run (NavDP) to $29.89/run (ViNT).

Overall economic performance results

Cost Breakdown Analysis

The cost breakdown reveals that pedestrian injury costs and service compensation dominate OPEX across all methods. Even CANVAS, the best learning-based method, incurs $14.38/run in pedestrian costs. This underscores the critical importance of robust pedestrian avoidance for commercial viability.

Cost breakdown analysis

Comparison Video

Side-by-side comparison of rule-based and learning-based navigation methods in CostNav's urban sidewalk environment.

Why This Matters

For Entrepreneurs

CostNav gives you data-driven answers to deployment decisions. No more guessing whether expensive sensors are worth it—you'll see the break-even analysis.

For Engineers & Researchers

CostNav lets you optimize for what actually matters in deployment. Explore cost-aware reward functions, evaluate trade-offs between sensor cost and performance, and publish work that directly translates to commercial value.

For the Future of Autonomous Systems

CostNav bridges the gap between impressive demos and sustainable businesses. A robot that's technically impressive but economically unviable won't change the world. A robot that's profitable at scale will.

The Vision

Imagine a world where navigation research papers include a "profitability" section alongside accuracy metrics. Where we optimize for dollars per delivery, not just success rates. Where choosing between navigation approaches is guided by break-even analysis, not just technical performance.

That's the world CostNav is building.

We're not saying traditional metrics don't matter—they absolutely do. But on their own, they're incomplete.

Get Involved

CostNav is not just a benchmark—it is a community-driven platform.

We are releasing everything:

  • The technical report and benchmark framework
  • Real-world-referenced cost and revenue formulas for break-even-point analysis
  • An Isaac Sim environment featuring realistic physics, collision dynamics, and a Segway E1 delivery robot navigating an urban setting with pedestrians
  • A human-collected teleoperation dataset
  • Evaluation code and our baselines

We invite the community to build on this foundation.


Coming Soon

  • Diverse maps and robots reflecting the breadth of real-world deployment choices
  • Expanded scenarios testing robustness under challenging conditions
  • Open challenges for the community to beat our baselines

The Physical AI field has made incredible technical progress.
Now it's time to make it economically viable.

It's time to talk about money. It's time for CostNav.

BibTeX

@misc{seong2026costnavnavigationbenchmarkrealworld,
      title={CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents}, 
      author={Haebin Seong and Sungmin Kim and Yongjun Cho and Myunchul Joe and Geunwoo Kim and Yubeen Park and Sunhoo Kim and Yoonshik Kim and Suhwan Choi and Jaeyoon Jung and Jiyong Youn and Jinmyung Kwak and Sunghee Ahn and Jaemin Lee and Younggil Do and Seungyeop Yi and Woojin Cheong and Minhyeok Oh and Minchan Kim and Seongjae Kang and Samwoo Seong and Youngjae Yu and Yunsung Lee},
      year={2026},
      eprint={2511.20216},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2511.20216}, 
}