The Cosmos 3 prompt generator gives you 20 free, copy-ready prompts for NVIDIA's physics-aware omnimodel — #1 open-source on both Text-to-Image and Image-to-Video leaderboards. Product shots, landscapes, robotics, architecture. No signup needed.
The Cosmos 3 prompt generator on this page gives you 20 professionally written, copy-ready prompts for NVIDIA Cosmos 3 — the 64-billion-parameter open-source omnimodel released on May 31, 2026 at GTC Taipei. Cosmos 3 is the first AI model to unify physical reasoning, image generation, video generation, and robotic action planning in a single architecture, and it currently holds the #1 open-source position on both the Artificial Analysis Text-to-Image and Image-to-Video leaderboards.
What sets Cosmos 3 apart is its physics-aware generation. Built on a Mixture-of-Transformers (MoT) architecture with two complementary towers — an autoregressive transformer for text and a diffusion transformer for images and video — it understands gravity, light behaviour, material properties, and spatial relationships at a level no other open-source model matches. The result: shadows that actually match the light source, reflections that follow Fresnel equations, and motion that obeys real-world physics.
The 20 prompts below are structured to exploit Cosmos 3's physics engine — using precise lighting directions, material descriptions, spatial relationships, and realistic motion cues. They work with the free Cosmos 3 Super model on Hugging Face, fal.ai, cosmos3.app, or any local deployment via the NVIDIA Cosmos framework.
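Use this structure to get physics-accurate, photorealistic results: subject and exact position, action or state, environment, lighting direction and quality, physically grounded details (materials, reflections, motion blur), camera perspective, then a shot type and "4K".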
Why physics language matters with Cosmos 3:
Cosmos 3 was trained on physical reasoning — it models gravity, optics, fluid dynamics, and material interactions natively. When you describe "accurate shadow angles matching the low sun position" or "realistic motion blur on rotating blades," the model's physics engine produces results that other generators approximate statistically. Precise physical descriptions unlock Cosmos 3's core advantage over pure aesthetic models.
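To keep shadow descriptions in a prompt physically consistent, it can help to check the geometry before writing them. The sketch below uses the standard flat-ground relation (shadow length = object height / tan(sun elevation)). The helper name `shadow_length` is made up for this example, and the calculation says nothing about how Cosmos 3 works internally.

```python
import math


def shadow_length(object_height_m: float, sun_elevation_deg: float) -> float:
    """Length of a shadow on flat ground for a sun at the given elevation angle."""
    if not 0 < sun_elevation_deg < 90:
        raise ValueError("sun elevation must be between 0 and 90 degrees (exclusive)")
    return object_height_m / math.tan(math.radians(sun_elevation_deg))


# A 1.8 m pedestrian under a low evening sun versus a high midday sun.
low_sun = shadow_length(1.8, 10)   # long shadow, several times the person's height
high_sun = shadow_length(1.8, 60)  # short shadow, shorter than the person
print(f"10 deg sun: {low_sun:.1f} m, 60 deg sun: {high_sun:.1f} m")
```

A sun 10 degrees above the horizon gives a shadow roughly five to six times the object's height, so "long shadows" is the consistent wording for a low-sun scene, while a midday scene calls for short ones.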
Each of the 20 prompts below is optimised for Cosmos 3 Super's physics-aware engine and can be copied as-is:
A luxury watch falling through empty space against a matte black background, captured at the exact moment of free-fall with realistic motion physics — the watch face tilted 15 degrees, metal bracelet links separating slightly under gravity, soft caustic reflections on the sapphire crystal from a single overhead studio light, microsecond freeze-frame, commercial product photography, 4K.
A self-driving electric sedan navigating a busy urban intersection at dusk, accurate traffic light reflections on wet asphalt, three pedestrians mid-crosswalk with realistic stride poses, a cyclist waiting at the kerb, accurate shadow angles matching the low sun position, street lamps just switching on, dashboard-camera perspective, photojournalistic realism, 4K.
A robotic arm in a clean-room warehouse precisely placing a microchip onto a circuit board, overhead fluorescent lighting, realistic joint articulation at 45-degree bend, the gripper fingers holding a 2cm chip with visible golden pin contacts, shallow depth of field on the placement point, industrial automation photography, 4K.
A sweeping aerial view of a coastal city at golden hour, the sun setting behind a mountain range to the west, long shadows cast eastward across a grid of streets, a harbour with twelve sailboats at anchor, accurate water reflections of the orange sky, a freeway overpass with tiny vehicles maintaining realistic spacing, drone photography at 200m altitude, 4K.
A single drop of deep crimson wine hitting the surface of a full glass, frozen at the exact instant of crown splash formation, twelve micro-droplets ejected symmetrically outward, the surrounding wine surface showing concentric ripple waves, black background, macro photography at 1/50,000 s shutter speed, physically accurate fluid dynamics, 4K.
The interior of a Victorian glass greenhouse at midday, iron-frame structure with condensation droplets on the glass panes, tropical plants arranged on tiered wooden shelves — monstera, ferns, orchids — dappled sunlight creating accurate shadow patterns through the glass roof grid, warm humid atmosphere with subtle light haze, architectural interior photography, 4K.
A professional chef flambéing a pan of crêpes suzette in a restaurant kitchen, the flame rising 30cm with realistic blue-orange gradient, the chef's face lit by the flame from below, copper pans hanging on the wall behind, stainless steel countertops reflecting the fire, motion blur on the chef's wrist tilt, editorial food photography, 4K.
A turquoise glacial lake at the base of a snow-capped mountain range, early morning mist hovering 2 metres above the water surface, a single kayaker in a red jacket paddling toward the centre, accurate mirror reflections of the peaks in the still water, subtle warm light on the eastern ridgeline, landscape photography with 24mm wide-angle perspective, 4K.
A basketball player at the peak of a slam dunk, body fully extended with realistic muscle tension, the ball 5cm from the rim, sweat droplets frozen mid-air under arena flood lights, motion blur on the crowd in the background, the net displaced backward from the player's approach, sports photography with 200mm telephoto compression, 4K.
A close-up portrait of an elderly fisherman on a wooden dock at dawn, deeply weathered skin with visible pore detail, grey stubble, squinting into low morning sun from the left, holding a coiled rope in calloused hands, soft harbour bokeh behind, natural light with no fill flash, 85mm prime lens rendering, documentary portrait photography, 4K.
A wide shot of an automotive assembly line with six robotic welding arms working simultaneously, bright orange welding sparks frozen in arcs from each arm, a partially assembled car chassis on the conveyor belt, accurate overhead industrial lighting with sodium-vapour colour cast, factory floor photography, ultra-wide 16mm perspective, 4K.
An underwater photograph of a coral reef at 10 metres depth, realistic light caustics filtering through the ocean surface above, a school of twenty clownfish swimming in formation through branching staghorn coral, accurate colour shift toward blue-green in the deeper background, a sea turtle gliding through the upper-right third of the frame, underwater wildlife photography, 4K.
A matte white electric SUV plugged into a sleek charging station at night, the charging port emitting a subtle green LED glow, rain-wet parking lot with accurate reflections of the station's display screen, a modern glass-fronted convenience store in the background with warm interior light, low-angle automotive photography, 4K.
A fashion editorial shot on a concrete rooftop at blue hour, a model in a tailored ivory suit standing at the roof edge, city skyline silhouetted behind, two off-camera LED panels creating cross-lighting — warm from the left, cool from the right — wind catching the suit jacket slightly, medium-format camera aesthetic with shallow depth of field, 4K.
A delivery drone hovering 3 metres above a suburban front lawn, four rotors with realistic motion blur, a cardboard package suspended from the release mechanism, accurate downwash flattening the grass below in a circular pattern, late afternoon sun casting the drone's shadow on the lawn, photorealistic concept photography, 4K.
A rustic sourdough bread loaf on a flour-dusted wooden cutting board, freshly sliced with a serrated knife resting beside it, steam still rising from the exposed crumb structure, scattered wheat stalks and a small ceramic pot of butter to the left, warm window light from the right creating a long shadow across a linen tablecloth, artisan food photography, 4K.
A high-rise construction site photographed from an adjacent building, a tower crane swinging a steel beam across the frame, workers in orange vests visible on the 12th floor scaffolding, accurate perspective convergence on the vertical concrete columns, clear blue sky with two small clouds, morning light hitting the east facade, architectural construction documentation photography, 4K.
A bustling Asian night market from eye level, a vendor grilling skewers over charcoal with rising smoke lit from below by the grill's orange glow, string lights overhead creating bokeh circles, shoppers walking past with natural motion blur, signage in neon colours reflected on the wet cobblestone path, street photography with 35mm lens, 4K.
A photorealistic satellite view of a hurricane system over the Caribbean Sea, the eye wall clearly defined with spiral cloud bands radiating outward, the curvature of the Earth visible at the frame edges, deep black space above, sunlight illuminating the cloud tops from the upper-left creating accurate shadow depth in the eye, earth observation photography, 4K.
A minimalist penthouse living room at sunset, floor-to-ceiling windows revealing a city skyline, the warm sunset light casting long parallelogram shadows across a polished concrete floor, a low-profile charcoal sofa centred in the room, a single abstract bronze sculpture on a marble side table, the glass showing the skyline beyond with a faint reflection of the interior, architectural interior photography, 4K.
How NVIDIA Cosmos 3 Super compares across the capabilities that matter most:
| Model | Physics Accuracy | Open Source / Access | Best For |
|---|---|---|---|
| NVIDIA Cosmos 3 | ★★★★★ Physics-native engine | Yes — OpenMDW-1.1 | Product shots, robotics, automotive, architecture |
| GPT Image 2 | ★★★★☆ Statistical realism | No — ChatGPT Plus/Pro | Complex scenes, human portraits, creative concepts |
| Reve 2.0 | ★★★☆☆ Layout-first, not physics | No — reve.art free tier | Editorial layouts, precise element placement |
| FLUX.2 | ★★★☆☆ Standard diffusion | Yes — open-weight Dev | Multi-reference compositions, fine-tuning |
| Imagen 4 | ★★★★☆ Strong natural light | No — Google API | Outdoor landscapes, natural photo realism |
| Ideogram 4 | ★★☆☆☆ Design-focused | Yes — open-weight 9.3B | Typography, logos, posters, transparent backgrounds |
NVIDIA Cosmos 3 is an open-source omnimodal AI foundation model released on May 31, 2026 at GTC Taipei / Computex. Built on a 64-billion-parameter Mixture-of-Transformers (MoT) architecture, it unifies physical reasoning, image generation, video generation, and action generation in a single model. Cosmos 3 supports text-to-image, image-to-video, text-to-video, and even robotic action planning. The Super variant targets datacenter GPUs (Hopper and Blackwell) while lighter variants run on consumer hardware. It's currently #1 open-source on both the Artificial Analysis Text-to-Image and Image-to-Video leaderboards.
Cosmos 3 is free to use. NVIDIA released it under the OpenMDW-1.1 license, meaning the model weights, six synthetic data generation datasets, training recipes, and the HUE benchmark are all freely available. You can run it locally via Hugging Face Diffusers or the official NVIDIA Cosmos framework on GitHub. For browser-based access without a GPU, online tools like cosmos3.app and fal.ai host the model with free tiers. The output you create is yours to use commercially.
Cosmos 3's defining feature is its physics-aware generation. Unlike pure diffusion models (FLUX.2, Stable Diffusion) or token-prediction models (GPT Image 2), Cosmos 3 was trained on physical reasoning — it understands gravity, light physics, material properties, and spatial relationships natively. This makes it exceptionally strong for scenes involving motion, reflections, shadows, fluid dynamics, and realistic environmental lighting. Its two-tower architecture (autoregressive for text + diffusion for images/video) also means it handles both modalities in a single inference pass.
Cosmos 3 excels at physically realistic scenes: product photography with accurate reflections and shadows, architectural visualisation with correct perspective and lighting, automotive and robotics imagery, nature scenes with realistic water and atmosphere, and any composition where physics accuracy matters. It's also the strongest open-source option for text-to-video with synchronised physics. For stylised art, fantasy scenes, or text-in-image work, models like Midjourney V8.1 or Ideogram 4 may be better choices.
Cosmos 3 responds best to detailed, narrative-style prompts that describe the physical scene precisely. Name the subject, its exact position, the action or state, the environment, the lighting direction and quality, and the camera perspective. Use physically grounded language — describe shadow angles relative to the light source, specify material properties (matte, glossy, translucent), and mention realistic details like reflections, motion blur, or atmospheric effects. End with a shot type and '4K' for full resolution. The 20 prompts on this page demonstrate all of these techniques.
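The structure described above can be sketched as a small helper that assembles a prompt from its parts. This is only an illustration: the `ScenePrompt` class and its field names are hypothetical and not part of any NVIDIA tooling. The field order simply follows the advice above (subject, action, environment, lighting, physical details, camera, shot type, then "4K").

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScenePrompt:
    """Hypothetical container for the prompt parts named above (not an NVIDIA API)."""
    subject: str      # what the image shows, with its exact position
    action: str       # the action or state of the subject
    environment: str  # where the scene takes place
    lighting: str     # light direction and quality
    camera: str       # camera perspective or lens
    shot_type: str    # e.g. "automotive photography"
    physical_details: List[str] = field(default_factory=list)  # reflections, blur, materials

    def build(self) -> str:
        # Join the non-empty parts in a fixed order, then close with the shot type and "4K".
        parts = [self.subject, self.action, self.environment, self.lighting,
                 *self.physical_details, self.camera, self.shot_type, "4K"]
        cleaned = [p.strip().rstrip(".,") for p in parts if p and p.strip()]
        return ", ".join(cleaned) + "."


prompt = ScenePrompt(
    subject="A matte white electric SUV plugged into a sleek charging station",
    action="charging port emitting a subtle green LED glow",
    environment="rain-wet parking lot at night",
    lighting="warm interior light from a glass-fronted store behind",
    physical_details=["accurate reflections of the station's display screen"],
    camera="low-angle perspective",
    shot_type="automotive photography",
).build()
print(prompt)
```

Keeping the parts separate makes it easy to swap one element, such as the lighting direction, and see how the result changes while the rest of the scene stays fixed.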
GPT Image 2 ranks #1 overall on the Arena T2I leaderboard (Elo 1339) and excels at complex scenes with many elements, human portraits, and creative/artistic interpretations. Cosmos 3 Super ranks lower overall but leads among open-source models and is superior for physics-accurate scenes: realistic lighting, material interactions, motion dynamics, and spatial consistency. GPT Image 2 requires ChatGPT Plus/Pro; Cosmos 3 is fully open-source and can run locally or through free online tools. For commercial product shots, automotive imagery, and robotics visualisation, Cosmos 3's physics grounding gives it an edge.
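For readers unfamiliar with Elo-style scores, the gap between two ratings maps to an expected head-to-head preference rate. The sketch below uses the standard Elo expected-score formula. It assumes the leaderboard's ratings sit on the usual 400-point logistic scale, and the second rating (1250) is a made-up placeholder, not Cosmos 3's actual score.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (a value between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


top_rating = 1339          # Elo quoted above for GPT Image 2
placeholder_rating = 1250  # hypothetical value, for illustration only
p = elo_expected_score(top_rating, placeholder_rating)
print(f"Expected preference rate for the higher-rated model: {p:.2f}")
```

Under these assumptions, a gap of under 100 points still leaves the lower-rated model preferred in a substantial share of comparisons, which is why a lower overall rank can coexist with an edge on specific scene types.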