While current navigation benchmarks prioritize task success in simplified settings, they neglect the multidimensional economic constraints essential for the real-world commercialization of autonomous delivery systems. We introduce CostNav, an Economic Navigation Benchmark that evaluates physical AI agents through comprehensive economic cost-revenue analysis aligned with real-world business operations. By integrating industry-standard data - such as SEC filings and AIS injury reports - with Isaac Sim's detailed collision and cargo dynamics, CostNav transcends simple task completion to accurately evaluate business value in complex, real-world scenarios. To our knowledge, CostNav is the first work to quantitatively expose the gap between navigation research metrics and commercial viability, revealing that optimizing for task success on a simplified task fundamentally differs from optimizing for real-world economic deployment. Our evaluation of rule-based Nav2 navigation shows that current approaches are not economically viable: the contribution margin is -22.81/run (AMCL) and -12.87/run (GPS), resulting in no break-even point. We challenge the community to develop navigation policies that achieve economic viability on CostNav. We remain method-agnostic, evaluating success solely on the metric of cost rather than the underlying architecture. All resources are available at https://github.com/worv-ai/CostNav.
Traditional navigation benchmarks celebrate high success rates and path efficiency, and much research effort goes into improving these metrics.
But these metrics don't answer the questions that keep entrepreneurs awake at night:
Prior benchmarks don't answer these questions. CostNav does.
We introduce CostNav, an Economic Navigation Benchmark for Physical AI. It links navigation performance to business value by measuring profit per run: delivery revenue minus the full cost of operation.
CostNav uses Isaac Sim (PhysX 5 + Newton) and prices physical interactions. Collisions are evaluated via impulse and delta-v mapped to the AIS injury scale, while jerk-induced spoilage and hardware wear capture non-obvious costs beyond geometric success.
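To make the collision pricing concrete, here is a minimal Python sketch of the first step: turning a contact impulse into a delta-v and binning it by severity. The function names and the cut-off values are hypothetical placeholders chosen for illustration. They are not CostNav's calibrated AIS mapping, which should be taken from the repository.

```python
from bisect import bisect_right

# Hypothetical delta-v cut-offs (m/s) separating severity bins 0..3.
# Placeholders for illustration only, NOT CostNav's calibrated AIS mapping.
PLACEHOLDER_DELTA_V_BOUNDS = (0.5, 2.0, 5.0)


def delta_v(impulse_ns: float, struck_mass_kg: float) -> float:
    """Velocity change of the struck body: delta-v = J / m."""
    if struck_mass_kg <= 0:
        raise ValueError("struck_mass_kg must be positive")
    return abs(impulse_ns) / struck_mass_kg


def severity_bin(dv: float, bounds=PLACEHOLDER_DELTA_V_BOUNDS) -> int:
    """Count how many cut-offs dv meets or exceeds (0 = mildest bin)."""
    return bisect_right(bounds, dv)


# Example: a 150 N*s impulse on a 75 kg body gives delta-v = 2.0 m/s.
dv = delta_v(impulse_ns=150.0, struck_mass_kg=75.0)
print(dv, severity_bin(dv))
```

Each severity bin would then be multiplied by an incident cost taken from the injury cost reports to give a per-collision charge.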
The framework separates CAPEX from per-run OPEX (energy, maintenance, collisions, and service compensation). Revenue follows real pricing and SLA compliance: timeouts earn zero, and food intactness determines service completeness. This yields break-even analysis (BEP) for how many deliveries recover fixed costs.
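The revenue rule and the break-even calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the benchmark's implementation: the function names are hypothetical, and scaling revenue linearly by the intact fraction of the food is one possible reading of "service completeness".

```python
import math
from typing import Optional


def run_revenue(price: float, on_time: bool, intact_fraction: float) -> float:
    """Revenue for one delivery: zero on timeout, else scaled by intactness.

    The linear scaling is an assumption made for this sketch.
    """
    if not on_time:
        return 0.0
    return price * min(max(intact_fraction, 0.0), 1.0)


def break_even_runs(capex: float, margin_per_run: float) -> Optional[int]:
    """Deliveries needed to recover fixed costs.

    Returns None when the per-run margin is not positive, because fixed
    costs are then never recovered.
    """
    if margin_per_run <= 0:
        return None
    return math.ceil(capex / margin_per_run)


# The CAPEX figure below is a made-up example value.
print(break_even_runs(capex=10_000.0, margin_per_run=2.5))     # 4000
print(break_even_runs(capex=10_000.0, margin_per_run=-22.81))  # None
```

The second call mirrors the reported Nav2 result: with a negative contribution margin per run, no number of deliveries reaches break-even.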
Real-world parameters (pricing, labor, energy, maintenance, incident costs, ops ratios, and failure rates) are paired with simulation outputs (runtime, power, distance, collisions, assistance, and spoilage) to compute per-run costs, revenue, profit, and BEP. Sources include manufacturer pricing/spec sheets for robots and sensors, SEC filings for fleet throughput and assistance rates, U.S. wage and electricity statistics, delivery-platform pricing and refund data, AIS/NHTSA injury cost reports, and municipal repair cost records.
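As a rough illustration of how real-world parameters and simulation outputs combine into a per-run cost, consider the sketch below. All class names, field names, and numbers are hypothetical and chosen for readability. The actual cost model and its calibrated values are defined in the CostNav repository.

```python
from dataclasses import dataclass


@dataclass
class RunLog:
    """Simulation outputs for one run (field names are hypothetical)."""
    runtime_s: float
    mean_power_w: float
    distance_km: float
    collision_cost: float   # collisions already priced for this run
    assist_minutes: float   # time a remote operator spent assisting


@dataclass
class Params:
    """Real-world parameters (example values only, not CostNav's)."""
    electricity_per_kwh: float
    maintenance_per_km: float
    operator_wage_per_hour: float


def run_cost(log: RunLog, p: Params) -> dict:
    """Break one run's operating cost into its components."""
    energy_kwh = log.mean_power_w * log.runtime_s / 3.6e6  # W*s -> kWh
    cost = {
        "energy": energy_kwh * p.electricity_per_kwh,
        "maintenance": log.distance_km * p.maintenance_per_km,
        "assistance": log.assist_minutes / 60.0 * p.operator_wage_per_hour,
        "collisions": log.collision_cost,
    }
    cost["total"] = sum(cost.values())
    return cost


log = RunLog(runtime_s=3600, mean_power_w=100, distance_km=2.0,
             collision_cost=1.0, assist_minutes=6)
params = Params(electricity_per_kwh=0.2, maintenance_per_km=0.5,
                operator_wage_per_hour=20.0)
print(run_cost(log, params))
```

Keeping the components separate, rather than returning only the total, makes it easy to see which cost source dominates a given navigation policy.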
CostNav's high-fidelity physics simulation enables the modeling of real-world economic scenarios, including critical failures like food spoilage and robot rollovers.
CostNav gives you data-driven answers to deployment decisions. No more guessing whether expensive sensors are worth it—you'll see the break-even analysis.
CostNav lets you optimize for what actually matters in deployment. Explore cost-aware reward functions, evaluate trade-offs between sensor cost and performance, and publish work that directly translates to commercial value.
CostNav bridges the gap between impressive demos and sustainable businesses. A robot that's technically impressive but economically unviable won't change the world. A robot that's profitable at scale will.
Imagine a world where navigation research papers include a "profitability" section alongside accuracy metrics. Where we optimize for dollars per delivery, not just success rates. Where choosing between navigation approaches is guided by break-even analysis, not just technical performance.
That's the world CostNav is building.
We're not saying traditional metrics don't matter; they absolutely do. But they're incomplete.
CostNav is not just a benchmark—it is a community-driven platform.
We are releasing everything:
We invite the community to build on this foundation.
The Physical AI field has made incredible technical progress.
Now it's time to make it economically viable.
It's time to talk about money. It's time for CostNav.
@misc{seong2026costnavnavigationbenchmarkrealworld,
title={CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents},
author={Haebin Seong and Sungmin Kim and Yongjun Cho and Myunchul Joe and Geunwoo Kim and Yubeen Park and Sunhoo Kim and Yoonshik Kim and Suhwan Choi and Jaeyoon Jung and Jiyong Youn and Jinmyung Kwak and Sunghee Ahn and Jaemin Lee and Younggil Do and Seungyeop Yi and Woojin Cheong and Minhyeok Oh and Minchan Kim and Seongjae Kang and Samwoo Seong and Youngjae Yu and Yunsung Lee},
year={2026},
eprint={2511.20216},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2511.20216},
}