Success rate drops by half on surfaces with a friction coefficient below 0.3. Objects under 4 cm see 83%+ failure rates. RoboGate runs 50,000+ Isaac Sim experiments across 4 robots (Franka Panda, UR5e, UR3e, UR10e) to uncover these risks before deployment.
GitHub Dataset
50,000+ experiments · 4 robots · MIT License
HuggingFace Dataset
robogate-failure-dictionary · Robotics
50,000+ Isaac Sim experiments · Franka Panda + UR5e + UR3e + UR10e · NVIDIA RTX
$ robogate test --policy v2.3 --baseline v2.2
[1/5] Verifying artifacts... ✓
[2/5] Loading config... ✓
[3/5] Running 68 scenarios...
████████████████████████ 68/68
[4/5] Computing metrics... ✓
[5/5] Generating reports... ✓
PASS - Confidence: 92/100
Success Rate: 95.4% ▲ +2.2%
Collisions: 0
Cycle Time: 4.1s
Making Physical AI Practical Requires a Validation Layer
The best VLA models score near-perfect on academic benchmarks. The same models score 0% on industrial Isaac Sim scenarios.
GR00T N1.6, NVIDIA's own robot foundation model, achieves 97.65% on LIBERO. The same model, fine-tuned on the same LIBERO-Spatial dataset, scores 0% on RoboGate's 68 industrial scenarios.
LIBERO (MuJoCo): 97.65%
RoboGate (Isaac Sim): 0%
This 97.65-percentage-point gap is not a bug. It is the reason deployment validation must happen in the target environment, before the model ever touches a physical robot.
RoboGate is that validation layer for Physical AI.
Source: LIBERO 97.65% = NVIDIA official benchmark. Isaac Sim 0/68 = RoboGate direct evaluation.
NVIDIA Isaac Sim + RTX · Real Physics Simulation · 4 Robots (Franka Panda, UR5e, UR3e, UR10e)
Experiments
Risk Model AUC
Friction Threshold
friction × mass interaction
NVIDIA built the simulator. RoboGate built deployment validation on top of it.
You only test under normal conditions. Dark environments, tiny objects, surfaces with 0.1 friction: your robot encounters these for the first time in production.
Edge case failure rate 83%+
Senior engineers run tests manually. Criteria vary by person. Results are rarely documented.
RoboGate: 68 scenarios, fully automated
No alerts when success rate drops by 5%. Problems surface only after production stops.
Drift detection: under 5 min
Automatically test new AI policies across 68 scenarios in Isaac Sim.
nominal 20 · edge 15 · adversarial 10 · DR 23
policy_v2 · nominal acceptance 20/20 PASS · confidence 90.5 (measured 2026-06-12)
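The four scenario categories add up to the 68-scenario suite. A minimal sketch of how per-category pass counts could be rolled up into a success rate is shown here; the `SUITE` mapping and the `success_rate` helper are hypothetical names for illustration, not RoboGate's actual API.

```python
# Scenario counts as listed on this page. "DR" is kept as written;
# it presumably stands for domain randomization (an assumption).
SUITE = {"nominal": 20, "edge": 15, "adversarial": 10, "DR": 23}


def success_rate(passed: dict[str, int]) -> float:
    """Return the success rate (%) over the categories present in `passed`.

    `passed` maps a category name to the number of scenarios that passed.
    Hypothetical helper, not part of the RoboGate CLI.
    """
    total = 0
    ok = 0
    for category, n_passed in passed.items():
        size = SUITE[category]  # raises KeyError for unknown categories
        if not 0 <= n_passed <= size:
            raise ValueError(f"{category}: {n_passed} passes out of {size} scenarios")
        total += size
        ok += n_passed
    if total == 0:
        raise ValueError("no categories given")
    return 100.0 * ok / total


print(sum(SUITE.values()))            # 68
print(success_rate({"nominal": 20}))  # 100.0, the 20/20 nominal acceptance run
```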
Data-driven PASS/FAIL decisions with Confidence Score 0-100.
FAIL report + failure evidence auto-generated when thresholds are breached
policy_v3 (untrained) 0/20 FAIL · 20 collisions · confidence 32.5
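A threshold-based gate of this kind can be sketched in a few lines. This page does not publish the real thresholds or the Confidence Score formula, so every value in `GateThresholds` is a placeholder, and `RunMetrics` and `evaluate_gate` are hypothetical names. The zero collision count for policy_v2 is also an assumption; only its 20/20 result and 90.5 confidence are stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    # Placeholder values: the real gate thresholds are not published on this page.
    min_success_pct: float = 90.0
    max_collisions: int = 0
    min_confidence: float = 80.0


@dataclass(frozen=True)
class RunMetrics:
    passed: int
    total: int
    collisions: int
    confidence: float  # Confidence Score, 0-100


def evaluate_gate(m: RunMetrics, t: GateThresholds = GateThresholds()):
    """Return ("PASS" | "FAIL", list of breached thresholds as evidence)."""
    breaches = []
    success_pct = 100.0 * m.passed / m.total
    if success_pct < t.min_success_pct:
        breaches.append(f"success rate {success_pct:.1f}% < {t.min_success_pct:.1f}%")
    if m.collisions > t.max_collisions:
        breaches.append(f"collisions {m.collisions} > {t.max_collisions}")
    if m.confidence < t.min_confidence:
        breaches.append(f"confidence {m.confidence} < {t.min_confidence}")
    return ("PASS" if not breaches else "FAIL", breaches)


# policy_v2: 20/20, confidence 90.5 (collision count of 0 is assumed).
policy_v2 = RunMetrics(passed=20, total=20, collisions=0, confidence=90.5)
# policy_v3 (untrained): 0/20, 20 collisions, confidence 32.5.
policy_v3 = RunMetrics(passed=0, total=20, collisions=20, confidence=32.5)

print(evaluate_gate(policy_v2))
print(evaluate_gate(policy_v3))
```

Returning the list of breached thresholds alongside the verdict keeps the evidence for a FAIL report in one place.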
Automatic performance drift detection + alerts after deployment.
5% drop warning · 10% drop critical · rollback recommendation
Slack · Telegram · Teams
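The warning and critical levels can be illustrated with a small classifier. This is a sketch under two assumptions: a "drop" is measured in percentage points of success rate against a baseline, and the rollback recommendation applies at the critical level only. `DriftAlert` and `classify_drift` are hypothetical names, and delivery to Slack, Telegram, or Teams is left out.

```python
from dataclasses import dataclass

# Levels from this page: 5% drop -> warning, 10% drop -> critical.
# Treating them as percentage points of success rate is an assumption.
WARNING_DROP_PP = 5.0
CRITICAL_DROP_PP = 10.0


@dataclass(frozen=True)
class DriftAlert:
    level: str                # "ok", "warning", or "critical"
    drop_pp: float            # baseline minus current, in percentage points
    recommend_rollback: bool  # assumed to apply at the critical level only


def classify_drift(baseline_pct: float, current_pct: float) -> DriftAlert:
    """Compare the current success rate with the baseline and pick an alert level."""
    drop = baseline_pct - current_pct
    if drop >= CRITICAL_DROP_PP:
        return DriftAlert("critical", drop, True)
    if drop >= WARNING_DROP_PP:
        return DriftAlert("warning", drop, False)
    return DriftAlert("ok", drop, False)


print(classify_drift(95.0, 93.0).level)  # ok
print(classify_drift(95.0, 89.0).level)  # warning
print(classify_drift(95.0, 80.0).level)  # critical
```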
Franka Panda performing Pick & Place tasks in Isaac Sim
good_policy: Validation Gate PASS
bad_policy: Validation Gate FAIL (collisions detected)
robogate test recorded live on our RTX 5090 host: real Isaac Sim, real reports, customer artifacts destroyed on completion
policy_v2 · nominal acceptance 20/20 · PASS · confidence 90.5
policy_v3 (untrained) · 0/20 · 20 collisions · FAIL · rollback recommended
Recorded 2026-06-12, unedited. The full 68-scenario suite (edge + adversarial included) is strict enough that even our scripted IK baseline measures 83.8% and fails the production gate. That strictness is the product.
Send us your policy file and we'll run a free Isaac Sim validation test.
50,000+ Isaac Sim experiments completed · MIT License · NVIDIA Isaac Sim 5.1