NVIDIA has unveiled Cosmos 3, a new open-source AI model built for machines that need to understand and interact with the real world. The release brings together scene understanding, physical reasoning, and action generation in one system, giving developers a single platform for building robotics and autonomous AI applications.
The company has released two versions of the model: Cosmos 3 Nano with 8 billion parameters and Cosmos 3 Super with 32 billion parameters. Alongside the models, developers will get access to training scripts, deployment tools, model checkpoints, and synthetic datasets designed for industries such as robotics, autonomous driving, and warehouse automation.
Previous Cosmos releases handled different tasks through separate models. Cosmos 3 takes a different approach by combining those capabilities into one architecture. At its core are two connected systems: a Reasoner that understands what’s happening in a scene and a Generator that predicts what could happen next and what actions should follow.
The Reasoner processes information from text, images, videos, audio, and action data to build an understanding of physical environments. The Generator then uses that context to create future observations, simulate scenarios, and generate action sequences.
NVIDIA says this setup simplifies development by reducing the need to connect multiple AI models together. For tasks focused on perception and analysis, the Reasoner can work on its own. More advanced workloads, such as simulation and action planning, make use of both components.
The two model sizes are aimed at different computing environments. Cosmos 3 Nano is intended for workstation-class hardware and real-time robotics deployments. Cosmos 3 Super is designed for larger-scale operations running on Hopper and Blackwell GPU platforms, where synthetic data generation and complex reasoning tasks require more computing power.
The model supports a broad range of capabilities, including image generation, video prediction, video understanding, action-guided world modeling, and robot policy training. NVIDIA believes these features can help developers build systems for robotic manipulation, autonomous vehicles, warehouse management, smart environments, and embodied AI agents.
To support training and testing, NVIDIA has also released six synthetic datasets through Hugging Face. The datasets cover robot interactions, physical simulations, spatial reasoning challenges, digital human environments, driving scenarios, and warehouse operations. Several of them include physics-related annotations such as object movement, velocity measurements, and semantic segmentation data.
Developers can further customize Cosmos 3 using newly released post-training workflows. These tools support supervised fine-tuning as well as specialized training methods for robotics tasks, including forward-dynamics prediction, inverse-dynamics modeling, and policy generation.
For deployment, NVIDIA is making Cosmos 3 available through its NIM microservices platform. The first release includes a Reasoner service, while support for the Generator component is expected in a future update.
The company also introduced Cosmos Human Evaluation (HUE), an open-source framework created to assess AI-generated videos. Rather than relying solely on benchmark scores, HUE evaluates outputs through fact-based questions covering visual quality, physical consistency, geometric accuracy, and semantic correctness.
NVIDIA reports that Cosmos 3 performs strongly across several public benchmarks focused on physical AI and video generation, including VANTAGE-Bench, PAI-Bench, R-Bench, Physics-IQ, and RoboLab. The company also says the model ranks among the top open-source image and video generation systems tracked by industry analysis platforms.
Also Read: NVIDIA Launches Ising Open-Source AI Models for Quantum Systems








