The next wave of AI-powered robots, such as humanoids and self-driving vehicles, needs high-quality, physics-based training data. If their datasets lack diversity and realism, these systems may train poorly and struggle with unexpected situations. Gathering large real-world datasets is costly, time-consuming, and often limited by practical constraints.  

NVIDIA Cosmos addresses this problem by accelerating the development of world foundation models (WFMs). Cosmos WFMs enable faster synthetic data generation and provide a foundation for training specialized physical AI models. In this post, we’ll look at the newest Cosmos WFMs, their main features for advancing physical AI, and how you can use them.  

Cosmos World Foundation Model Updates 

NVIDIA Cosmos world foundation models are improving rapidly, making high-quality synthetic data and accelerated physical AI development easier to access. Just one year in, the latest updates give users faster, more flexible, and more realistic data generation.  

  • Cosmos Transfer 2.5: delivers faster, more scalable data augmentation. It creates varied data by altering existing simulation outputs and 3D spatial inputs, providing greater variety in environments, lighting, and scene setups.  
  • Cosmos Predict 2.5: improves generation of rare scenarios for sequences up to 30 seconds, attaining up to 10 times higher accuracy when post-trained on custom or sector-specific data. It also supports multi-view outputs, custom camera setups, and varied policy outputs, such as action and simulation outputs.  
  • Cosmos Reason 2: offers advanced physical AI reasoning with better spatio-temporal understanding (interpreting relationships across space and time) and more precise timestamps. It adds object detection, 2D and 3D point localization (finding locations in flat and 3D space), bounding box coordinates (boxes marking object positions), reasoning explanations, and labels. It now supports a longer context, with inputs up to 256,000 tokens (a token is a unit of text, such as a word or part of a word).  

Cosmos Transfer Creates Photorealistic Videos That Adhere To Real-World Physics 

Cosmos Transfer creates detailed world scenes from structural inputs, ensuring accurate spatial alignment and composition.  

Cosmos Transfer uses the ControlNet architecture to retain pre-trained knowledge, producing structured, consistent outputs. It uses spatio-temporal control maps to match synthetic and real-world scenes, giving detailed control over:  

  • scene layout  
  • object placement and movement  
  • keypoints  
  • LiDAR scans  
  • trajectories  
  • HD maps  
  • 3D bounding boxes  
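To make the idea of combining several spatio-temporal control signals concrete, here is a minimal NumPy sketch that blends per-modality control maps into one conditioning signal. The `blend_control_maps` function and the fixed weights are assumptions for illustration only; Cosmos Transfer's actual ControlNet conditioning is learned, not a hand-set blend.

```python
import numpy as np

def blend_control_maps(maps, weights):
    """Blend per-modality spatio-temporal control maps (T, H, W) into one signal.

    Illustrative only: a real ControlNet learns how each modality steers
    generation rather than applying fixed scalar weights.
    """
    assert len(maps) == len(weights)
    total = sum(weights)
    blended = np.zeros_like(maps[0], dtype=np.float64)
    for m, w in zip(maps, weights):
        blended += (w / total) * m  # normalized weighted sum
    return blended

# Toy 8-frame, 64x64 control sequences for two modalities
depth = np.random.rand(8, 64, 64)  # e.g., a depth-map sequence
seg = np.random.rand(8, 64, 64)    # e.g., a segmentation-map sequence
control = blend_control_maps([depth, seg], weights=[0.7, 0.3])
print(control.shape)  # (8, 64, 64)
```

The point of the sketch is the shape contract: every control modality is a per-frame spatial map aligned with the video being generated, so they can be composed into a single conditioning tensor.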

Input: ground-truth annotations, high-fidelity references for exact alignment.  

Output: photorealistic video sequences with controlled layout, object placement, and motion.  

Key Capabilities 

  • Generate scalable, photorealistic synthetic data that aligns with real-world physics, allowing users to train more reliable AI and robotics models.  
  • Control object interactions and scene composition with structured multi-modal input, giving users precise customization and more relevant training data for their specific use cases.  

Using Cosmos Transfer for Controllable Synthetic Data 

NVIDIA Omniverse, with its generative AI APIs and SDKs, enables users to create accurate 3D simulations for real-world training and testing. These simulations provide ground-truth video inputs for Cosmos Transfer, improving photorealism and diversifying datasets to match user-specific conditions, so AI agents are better prepared for real-world deployment.  

This process speeds up the generation of high-quality data, enabling users’ AI agents to learn more efficiently from simulation to real-world applications, reducing development cycles and boosting performance in practical tasks.  

As a result, Cosmos Transfer helps users train robots and AI for diverse environments and conditions by adding realistic lighting and textures. This improves model robustness and makes it easier for users to transition from simulation to real-world use, especially for robotics platforms like GR00T-N1.1.  

Cosmos Predict for Generating Future World States 

Cosmos Predict WFM enables users to generate predictive video sequences for future scenarios using varied inputs such as text, video, and image sequences. Its smooth, accurate video generation helps users test and refine how AI systems might respond in real-world situations.  

Cosmos Predict offers the following key capabilities:  

  • Generates realistic video scenes directly from text prompts  

  • Predicts subsequent events in a video by generating missing frames or continuing motion  
  • Generates multiple frames (intermediate images) between a starting and ending image to create a smooth, complete video sequence.  
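The frame-interpolation capability above can be made concrete with a minimal sketch that produces evenly spaced in-between frames by linear blending. This is a stand-in for illustration only: Cosmos Predict synthesizes intermediate frames with a learned generative model, while this sketch merely shows the input/output relationship.

```python
import numpy as np

def interpolate_frames(start, end, n_intermediate):
    """Return n_intermediate linearly blended frames between start and end.

    Illustrative stand-in: Cosmos Predict generates in-betweens with a
    learned model, not a pixel-wise linear blend.
    """
    frames = []
    for i in range(1, n_intermediate + 1):
        t = i / (n_intermediate + 1)  # fraction of the way from start to end
        frames.append((1 - t) * start + t * end)
    return frames

# Toy 2x2 grayscale "images": black start frame, white end frame
start = np.zeros((2, 2))
end = np.ones((2, 2))
mids = interpolate_frames(start, end, 3)
print([float(m[0, 0]) for m in mids])  # [0.25, 0.5, 0.75]
```

A generative interpolator replaces the linear blend with frames that respect scene dynamics, but the contract is the same: given a start and an end image, produce a smooth sequence connecting them.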

Cosmos Predict WFM is a solid starting point for training world models, AI systems that simulate environments used in robotics and self-driving vehicles. After initial training, you can teach these models to generate actions rather than videos for policy modelling and AI decision-making, or adapt them for visual language tasks to build custom AI perception models (systems that understand visual information).  

Cosmos Reason: Designed to Perceive, Reason, and Respond Intelligently 

Cosmos Reason is a flexible AI model designed to understand motion, how objects interact, and relationships over time and space. It uses chain-of-thought reasoning to examine visual input, predict outcomes from prompts, and choose the best actions. Unlike text-only models, it bases its reasoning on actual physics and provides clear natural-language context for its answers.  

Input: video observations along with a text question or instruction (prompt).  

Output: a text response created using long chain-of-thought reasoning (step-by-step analysis over time).  

  • Understands how objects move, interact, and change  
  • Predicts and selects optimal next actions based on observations  
  • Continuously refines its decision-making ability over time  
  • Designed for further training to help build perception AI and embodied AI models  
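Given the input/output description above, here is a hedged sketch of how a request to such a video reasoning model might be assembled. The message schema below follows a common OpenAI-style multimodal chat format and is an assumption; the actual Cosmos Reason serving interface may differ, and the file name and question are placeholders.

```python
def build_reasoning_prompt(video_path, question):
    """Assemble a chat-style request for a video reasoning model.

    Assumption: an OpenAI-style multimodal message list; the real
    Cosmos Reason API may use a different schema.
    """
    return [
        {
            "role": "system",
            # Elicit chain-of-thought grounded in scene physics
            "content": "Think step by step about the physics of the scene "
                       "before answering.",
        },
        {
            "role": "user",
            "content": [
                {"type": "video", "video": video_path},
                {"type": "text", "text": question},
            ],
        },
    ]

messages = build_reasoning_prompt("clip.mp4", "Will the ball roll off the table?")
print(messages[1]["content"][1]["text"])
```

Pairing the video observation with an explicit step-by-step instruction is what lets the model return a reasoning trace alongside its answer, rather than a bare label.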

Let’s Get Started 

Explore our Cosmos Cookbook for step-by-step guidance, technical tips, and examples that help you streamline and accelerate your Cosmos WFM projects.  

Access open Cosmos models and datasets on Hugging Face and GitHub to quickly enhance your projects or evaluate models, making experimentation and implementation faster and easier for users.  

Join our Cosmos Discord community to connect with peers, get real-time support, and share your experiences.  

Be inspired: watch the GTC keynote from NVIDIA founder and CEO Jensen Huang, then explore Cosmos sessions at https://www.nvidia.com/gtc/sessions/physical-AI-days/ to kick-start your own projects. 

Source: Scale Synthetic Data and Physical AI Reasoning with NVIDIA Cosmos World Foundation Models 

Tags: Artificial Intelligence, AI Infrastructure, Robotics, Autonomous Systems, Tech Innovation 
