AstraNav-World ✨
World Model for Foresight Control and Consistency
Abstract
Embodied navigation in open, dynamic environments demands accurate foresight of how the world will evolve and how actions will unfold over time. We introduce a unified generative world model that jointly reasons about future visual states and action sequences within a single probabilistic framework. Our approach integrates a diffusion-based video generator with a vision-language policy, enabling synchronized rollouts in which predicted scenes and planned actions are updated together. Training optimizes two complementary objectives: generating action-conditioned multi-step visual predictions and deriving trajectories conditioned on those predicted visuals. At inference, the model alternates between forecasting plausible future frames and refining the action plan given both language instructions and the evolving visual rollouts. This bidirectional constraint makes visual predictions executable and keeps decisions grounded in physically consistent, task-relevant futures, mitigating the cumulative errors common in decoupled “predict-then-plan” pipelines. Experiments across diverse embodied navigation benchmarks show improved trajectory accuracy and higher success rates. Ablations confirm the necessity of tight vision–action coupling and unified training: removing either branch degrades both prediction quality and policy reliability. The model further provides interpretable, step-by-step future visualizations that expose planning rationale and uncertainties, facilitating diagnosis and robust deployment. Overall, by unifying foresight and control within a single generative model, we move closer to reliable, interpretable, and general-purpose embodied agents that operate robustly in open-ended real-world settings.
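The alternating inference procedure described above can be sketched as a simple loop that interleaves the two branches. This is a minimal illustration only: the function names, action dimensions, and stand-in models below are assumptions for exposition, not the authors' actual architecture or API (the real system uses a diffusion video generator and a vision-language policy).

```python
# Hedged sketch of the alternating "forecast frames -> refine actions" loop.
# Both branch models are replaced with toy stand-ins; only the control flow
# mirrors the procedure described in the abstract.
import numpy as np

def forecast_frames(obs, actions, rng):
    """Stand-in for the diffusion-based video generator: produces one
    predicted future frame per planned action (here, noisy copies of obs)."""
    return [obs + rng.normal(scale=0.01, size=obs.shape) for _ in actions]

def refine_actions(instruction, obs, frames, actions, rng):
    """Stand-in for the vision-language policy: updates the action plan
    conditioned on the instruction and the predicted visual rollout."""
    return [a + rng.normal(scale=0.01, size=a.shape) for a in actions]

def alternating_rollout(instruction, obs, horizon=8, n_rounds=3, seed=0):
    """Alternate between the vision branch and the action branch so that
    predicted scenes and planned actions are updated together."""
    rng = np.random.default_rng(seed)
    actions = [np.zeros(2) for _ in range(horizon)]  # trivial initial plan
    frames = []
    for _ in range(n_rounds):
        frames = forecast_frames(obs, actions, rng)                       # foresight
        actions = refine_actions(instruction, obs, frames, actions, rng)  # control
    return frames, actions

frames, actions = alternating_rollout("go to the kitchen", np.zeros((4, 4)))
print(len(frames), len(actions))  # one predicted frame per planned action
```

Because each round conditions the plan on the latest visual forecast and vice versa, errors from a single one-shot prediction do not silently propagate, which is the coupling the ablations test by removing one branch.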
Consistency Visualization (Zero-Shot)
Approach
Experiment
Real-World Visualization (Zero-Shot)
Habitat Visualization
BibTeX Citation
@misc{hu2025astranavworldworldmodelforesight,
  title={AstraNav-World: World Model for Foresight Control and Consistency},
  author={Junjun Hu and Jintao Chen and Haochen Bai and Minghua Luo and Shichao Xie and Ziyi Chen and Fei Liu and Zedong Chu and Xinda Xue and Botao Ren and Xiaolong Wu and Mu Xu and Shanghang Zhang},
  year={2025},
  eprint={2512.21714},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.21714},
}