2 Million Trajectories Before First Contact: How GPU Simulation Became Robotics' Proving Ground

NVIDIA is bringing receipts to ICRA 2026 - eight accepted papers it says show the sim-to-real gap closing on real hardware. More and more, a robot policy is the output of large-scale GPU simulation, with the heavy compute done long before a gripper touches an object.

[Image: a single robotic gripper in a data-center aisle, surrounded by a glowing purple swarm of translucent ghost-grippers, a visual metaphor for millions of GPU-simulated grasp attempts run before one real grasp.] NVIDIA says its Grasp-MPC policy attempts a grasp roughly two million times inside a GPU-accelerated simulator, across 8,000 objects, before a gripper ever touches a real one. Image: Supercomputing News

Before NVIDIA's Grasp-MPC policy closes a gripper on an object it has never seen, the company says it has already tried the grasp about two million times. None of those attempts touched a workbench. They ran inside a GPU-accelerated physics simulator, across 8,000 different objects, on the same kind of silicon that trains large AI models. The payoff, NVIDIA reports, is a real-robot grasp success rate near 75 percent in cluttered scenes, against 41 percent for the prior baseline. Worth flagging up front: NVIDIA gives trajectory and object counts but no GPU-hours, node counts, or cluster sizes for any of these runs. The "supercomputing" scale behind each policy is the company's framing, not a published number.

That is the kind of receipt NVIDIA Research is bringing to ICRA 2026, the IEEE Robotics and Automation Society's flagship conference, which opens June 1 in Vienna and runs through June 5 (2026.ieee-icra.org). Eight of the company's 28 accepted papers go straight at the sim-to-real problem, and NVIDIA says nearly 50 more, from CMU, ETH Zurich, MIT, and UT Austin, lean on its accelerated simulation or learning tools. As a robotics story, it is a strong showing. Through the supercomputing lens this publication uses, it points to something narrower: a growing share of robot policies now come out of large-scale GPU simulation and learning, and the machine doing the forging looks less like a lab rig than a training cluster.

The boundary that's blurring

In the pipeline NVIDIA is showing at ICRA, the line between the physics simulator and the AI training loop is getting hard to see. They run inside the same GPU-accelerated stack now, with simulation, motion planning, and learning tightly coupled. Physics is the data source: NVIDIA says Grasp-MPC's two million trajectories are synthetic, generated by the simulator rather than collected off a robot arm. The simulator is also the training environment, and with NVIDIA's new differentiable engine it can be the gradient source too. In some workflows simulation is no longer just a place to pretrain; it becomes part of the optimization loop.

You can see the shape of that convergence in the tooling. NVIDIA frames the workflow as forge-in-the-datacenter, deploy-at-the-edge, and each stage maps to a recognizable piece of accelerated-compute infrastructure.

The forge stage runs on Isaac Lab, NVIDIA's open-source, GPU-accelerated robot-learning framework, which NVIDIA Research calls a "natural successor to Isaac Gym." The design point is parallelism at what NVIDIA calls "data-center scale execution": thousands of domain-randomized environments stepping at once on GPU infrastructure. Trajectory data comes from CUDA-accelerated motion planning (cuRobo) and generated grasp datasets (GraspGen). The economics are the interesting part. A robot-training run on Isaac Lab competes for the same compute cycles, and the same scarce high-bandwidth memory, as an LLM pretraining job. We've written before about how HBM allocation rather than raw supply now governs who gets to run large jobs; physical-AI simulation is just one more bidder at that table.
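To make "thousands of domain-randomized environments" concrete, here is a minimal sketch in plain Python of what the randomization step amounts to. It is not Isaac Lab's API. The `EnvParams` fields, the value ranges, and the `make_batch` helper are hypothetical names chosen for illustration, and a real framework would hold these parameters in GPU tensors and step every environment in one batched call.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class EnvParams:
    """Physical properties that differ from one simulated environment to the next."""
    friction: float     # contact friction coefficient (assumed range)
    object_mass: float  # kilograms (assumed range)
    light_level: float  # 0..1 brightness seen by the camera (assumed range)


def sample_env_params(rng: random.Random) -> EnvParams:
    """Draw one randomized environment. The ranges are illustrative, not NVIDIA's."""
    return EnvParams(
        friction=rng.uniform(0.3, 1.2),
        object_mass=rng.uniform(0.05, 2.0),
        light_level=rng.uniform(0.2, 1.0),
    )


def make_batch(num_envs: int, seed: int = 0) -> List[EnvParams]:
    """Build a reproducible batch of randomized environments from one seed."""
    rng = random.Random(seed)
    return [sample_env_params(rng) for _ in range(num_envs)]


# Every environment in the batch gets its own physics, so a policy trained
# across the batch cannot overfit to one friction value or one object mass.
batch = make_batch(4096)
print(len(batch), batch[0])
```

The point of the design is that the real world looks, to the policy, like one more draw from the distribution. Widening the ranges costs nothing on a workbench and everything in compute, which is why the batch size is a datacenter question.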

[Diagram: NVIDIA's three-stage robot-learning pipeline, forge in the datacenter (Isaac Lab), train with differentiable physics (Newton), deploy at the edge (Jetson), above a row of reported ICRA 2026 benchmark results.] NVIDIA's "forge-in-the-datacenter, deploy-at-the-edge" workflow, with the headline figures the company reports from its ICRA 2026 papers: ~75% real-robot grasp success (Grasp-MPC), 4.5× over an imitation baseline (COMPASS), +38.4% assembly success (SPARR), and up to 41.4× for a sim-only 3D policy (PEEK). Image: Supercomputing News

Closing the gap, with the distinctions intact

The eight papers cluster around one hard question: once you have trained a policy in simulation, how do you make it survive contact with reality? The answers vary, and the differences are where the engineering lives.

A few are described as genuine zero-real-data results. COMPASS, a cross-embodiment navigation system, was trained entirely in Isaac Lab with no real-robot data; NVIDIA reports it hit roughly 80 percent success across 20 physical trials, a 4.5× improvement over an imitation-learning baseline. The deformable-cluster manipulation work gets stranger: per its arXiv paper, it trains on thousands of synthetic trees grown from biological growth equations, then deploys zero-shot to real tangled branches of the kind a robot might clear off a power line.

Other results close the gap with an explicit correction layer, and that distinction is worth holding onto. SPARR, NVIDIA's precise-assembly system, is not a zero-shot result. It pairs an Isaac Lab strategy layer with a residual correction layer that runs on the real hardware. The authors report a 38.4 percent jump in success rate, a 29.7 percent cut in cycle time, and about a 75 percent improvement on unseen NIST assembly tasks, all without human supervision. The correction layer is the point, not a footnote: it is what lets a simulation-trained strategy absorb the gap between the simulator and a physical fixture.
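The general pattern behind a residual correction layer can be sketched in a few lines of Python. This is the generic residual-policy idea, not SPARR's published architecture: the function names, the observation keys, the gains, and the clipping bound `max_correction` are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Observation = Dict[str, List[float]]
Policy = Callable[[Observation], List[float]]


def residual_policy(base: Policy, residual: Policy, max_correction: float) -> Policy:
    """Combine a simulation-trained base policy with a small learned correction.

    The correction is clipped (an assumption of this sketch) so the real-world
    layer can nudge the base action but never override it.
    """
    def act(obs: Observation) -> List[float]:
        base_action = base(obs)
        correction = residual(obs)
        clipped = [max(-max_correction, min(max_correction, c)) for c in correction]
        return [a + c for a, c in zip(base_action, clipped)]
    return act


def sim_base(obs: Observation) -> List[float]:
    """Hypothetical sim-trained strategy: move halfway toward the nominal target."""
    return [0.5 * (t - p) for p, t in zip(obs["pos"], obs["target"])]


def real_residual(obs: Observation) -> List[float]:
    """Hypothetical on-hardware correction driven by a measured contact error."""
    return [0.8 * e for e in obs["contact_error"]]


policy = residual_policy(sim_base, real_residual, max_correction=0.01)
action = policy({
    "pos": [0.0, 0.0],
    "target": [0.1, 0.2],
    "contact_error": [0.005, -0.05],  # second axis exceeds the clip bound
})
print(action)
```

In the toy call, the first axis gets the full small correction and the second is capped at the bound. That split is why such a layer can be learned on hardware with little data: the simulator does the heavy lifting, and the residual only has to model what the simulator got wrong.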

PEEK is the number people will quote. Using a vision-language model to annotate a scene and hand the policy a filtered view, its authors report up to a 41.4× gain in real-world accuracy, but only for a 3D policy trained purely in simulation. For large vision-language-action models and smaller manipulation policies, the gains land in the 2–3.5× range. The 41× is in the paper. It is not a universal multiplier, and quoting it as one would misstate the result.

Rounding out the set: Refinery reports a roughly 11 percent mean gain in success rate and 91.5 percent success in simulation while chaining policies for long-horizon, multi-part assembly; SEAL reports up to a 15 percent gain over prior work on matching runtime reasoning to execution, in a collaboration with CMU, Utah, and Sydney; and ScheduleStream delivers what NVIDIA calls a 3× speedup on multi-arm task planning, running on hardware like NVIDIA Jetson, the edge end of the same pipeline that starts in the datacenter.

The substrate is the moat

Step back from the papers and a platform argument comes into focus. NVIDIA is making it deliberately. The evidence supports a strong ecosystem position, not universal dependence: a growing share of physical-AI work runs on NVIDIA-accelerated infrastructure, some of it built by NVIDIA's competitors.

The foundation is partly open. NVIDIA positions Newton as a next-generation, GPU-accelerated, differentiable physics engine for the Isaac and robot-learning stack: built on NVIDIA Warp and OpenUSD, managed by the Linux Foundation, co-developed with Google DeepMind and Disney Research (developer.nvidia.com/newton-physics). Being differentiable by design is what lets gradients flow through the physics step and blurs the simulate-then-train boundary. The governance is a real consortium, worth noting before anyone calls the substrate proprietary. But the center of gravity is obvious. NVIDIA says its open Physical AI Dataset has passed 15 million downloads, and its Isaac and Omniverse tools are a common place teams build their training data.
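What "gradients flow through the physics step" means is easiest to see on a toy problem. The sketch below is plain Python and has nothing to do with Newton's or Warp's actual APIs: it uses a hand-rolled forward-mode autodiff class (`Dual`) and a hypothetical one-dimensional point mass with drag. It differentiates the final position with respect to the launch velocity, then uses that gradient to fit the velocity directly, with no sampling or reward estimation.

```python
class Dual:
    """A value paired with its derivative w.r.t. one input (forward-mode autodiff)."""

    def __init__(self, val, grad=0.0):
        self.val = val
        self.grad = grad

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.grad + o.grad)
    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.grad - o.grad)

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.val * o.grad + self.grad * o.val)
    __rmul__ = __mul__


def physics_step(x, v, dt=0.01, drag=0.5):
    """One semi-implicit Euler step of a point mass slowed by linear drag."""
    accel = -drag * v
    v_next = v + accel * dt
    x_next = x + v_next * dt
    return x_next, v_next


def rollout(v0, steps=100):
    """Simulate from x=0 with launch velocity v0 and return the final position."""
    x, v = 0.0, v0
    for _ in range(steps):
        x, v = physics_step(x, v)
    return x


def loss_and_grad(v0, target=1.0):
    """Squared miss distance and its exact derivative with respect to v0."""
    final_x = rollout(Dual(v0, 1.0))  # seed d(v0)/d(v0) = 1
    err = final_x - target
    loss = err * err
    return loss.val, loss.grad


def fit_launch_velocity(target=1.0, lr=0.5, iters=100):
    """Gradient descent on v0, using gradients taken through every physics step."""
    v0 = 0.0
    for _ in range(iters):
        _, grad = loss_and_grad(v0, target)
        v0 -= lr * grad
    return v0


best_v0 = fit_launch_velocity()
print(best_v0, rollout(best_v0))
```

A non-differentiable simulator would have to find that velocity by trial and error across many rollouts. Production engines do this in reverse mode over millions of contact-rich states, which is a far harder problem than this toy suggests, but the principle is the same.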

That gravity reaches rivals. Skild AI, the robot-foundation-model lab that raised about $1.4 billion at a valuation above $14 billion in January, is one example: NVIDIA says Skild is using its Cosmos world models and Isaac simulation frameworks to validate robot policies. (NVIDIA's materials don't show that Skild uses Newton specifically or runs inference on Jetson, so this piece doesn't claim it.) The ICRA showing, then, is not only "our policies work." It is the argument that NVIDIA's simulation, data, and edge-inference stack is becoming the default substrate for a growing share of physical-AI development, which is a more durable position than any single benchmark.

This is the seam between two stories. In March we argued that Physical AI was NVIDIA's quiet second act at GTC 2026: the strategy story, the CUDA playbook applied to robotics, the "operating system for the physical world." ICRA is where the receipts for that thesis start to land. The GTC piece described the platform NVIDIA was betting on; these eight ICRA papers are the evidence NVIDIA is putting forward that the platform produces policies which beat prior baselines on real hardware. (Accepted ICRA papers are peer-reviewed; some of the supporting numbers above come from project pages and preprints, not final proceedings, and are flagged that way.)

Why the compute economics are the real headline

For the strategic reader, the through-line is a market shift the robot demos tend to hide. Simulation is becoming a compute market of its own. A Market Intelo report estimates the physical-AI simulation and digital-twin segment at $3.8 billion in 2025, growing toward $34.6 billion by 2034, with "robot training simulation" the largest slice at 38.4 percent, and names GPU-accelerated physics as a key driver. That is a commercial research estimate, not an audited figure, so treat it as directional. Bank of America's institute made the compute point plainly in its note on physical AI: advancing the field will take new model architectures, training methods, and compute strategies beyond what scaled LLMs offer.
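For a sense of what those two endpoints imply, the annual growth rate can be backed out with one line of arithmetic. The sketch below assumes nine compounding years between the 2025 and 2034 figures; the report may define its forecast window differently, so the result is a reading of the quoted numbers, not a figure the report states.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Endpoints quoted above, in billions of dollars; the year span is an assumption.
cagr = implied_cagr(3.8, 34.6, 2034 - 2025)
print(f"Implied compound annual growth: {cagr:.1%}")
```

Under that assumption the quoted endpoints work out to roughly 28 percent a year, a pace that would multiply the segment about ninefold over the period.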

That is the supercomputing story buried inside a robotics announcement. On this reading, cost and constraint move off the robot and onto the compute infrastructure that trains it. As analysis rather than settled fact: a grasp policy's quality is increasingly bounded by infrastructure, by how many parallel environments you can afford to run and how much HBM you can get before a differentiable simulator turns days of compute into minutes. On that view, the gripper is the cheap part.

What to watch in Vienna

The papers are presented June 2–4. The signal worth watching is not which demo draws applause; it is whether the independent academic work, the roughly 50 university papers NVIDIA points to, holds up the sim-to-real gains on hardware the original authors did not tune. Reproducible results across labs would be a strong sign that simulation-forged policies have crossed from promising to dependable, and would back the thesis that the binding constraint on robot capability is shifting toward compute rather than algorithmic novelty. If that holds, it sharpens the question this beat keeps circling: in robotics, the supercomputing workload and the intelligence it produces are getting hard to separate, and the ceiling on what a robot can do may track, more and more, the compute behind its training run.


About the contributor

SCN Staff

The Squad

The SCN Staff is a small AI editorial squad working under human direction. Each agent owns one job.

Scout does the research. It runs down primary sources and checks what's already been published, on SCN and everywhere else, before a story gets written. If a claim can't be traced back to a real document, Scout flags it.

Forge writes. It takes what Scout found and turns it into a draft, argument and sentences and all. Every SCN piece starts here, then gets sharpened.

Cipher handles search: the titles, descriptions, and keyphrase work that decides whether a good article ever gets found. Least glamorous job on the squad. Also one that matters more than it looks.

Pixel makes the visuals. Images, charts, the occasional diagram, all built to SCN's brand instead of pulled from a stock library. When something's easier to see than to read, it goes to Pixel.

Editorial judgment and the final call stay with the humans. So does the fact-checking.
