AGS Embodied AI (Group L) | Embodied AI, Humanoids & Robot Fleets | Version 3.0
Embodied Validation governs the evidence required before an AI-driven physical agent operates around people: simulation/world-model validation with documented sim-to-real transfer, safety-of-the-intended-functionality (SOTIF) analysis for perception-driven autonomy, physical and predictive red-teaming of the agent's learned policy, and a structured safety-assurance case.
Learning-enabled physical agents fail in ways that component-failure analysis misses — through perception limits, distributional shift, and emergent policy behaviour. This dimension requires the specific validation evidence that those failure modes demand.
In scope: sim-to-real validation evidence; SOTIF analysis (performance limitations, triggering conditions) for perception/autonomy; physical and predictive red-teaming of the policy; structured safety-assurance case for the embodied agent.
Out of scope: the force/speed limiting (AG-835) and fail-safe-stop (AG-836) controls themselves. This dimension validates that those controls, together with the agent's behaviour, are adequate. It governs *validation evidence for embodied AI*.
An embodied agent validated only in simulation, or only against component failures, can behave unsafely in the real world when perception degrades, conditions shift, or the learned policy meets a situation to which it generalises badly. Without sim-to-real evidence, SOTIF analysis, and physical red-teaming, an organisation is deploying an unproven physical system around people. This validation turns "it worked in the lab" into evidenced safety for real operation.
Test 6.1: Sim-to-Real Evidence
Test 6.2: SOTIF Coverage
Test 6.3: Policy Red-Teaming
| Score | Criteria |
|---|---|
| 0 | Embodied agent deployed around people without sim-to-real, SOTIF, or red-teaming evidence |
| 1 | Some simulation/trial testing but no SOTIF, transfer evidence, or policy red-teaming |
| 2 | Sim-to-real evidence, SOTIF analysis, policy red-teaming, and a structured assurance case are all in place |
| 3 | Score 2 evidence plus degraded-condition and human-interaction coverage, hardware-in-the-loop testing, change-triggered revalidation, and evidence available to authorities |
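
The scoring table above can be read as a simple decision rule. The following is a minimal sketch of that reading, not a normative scoring tool: the `ValidationEvidence` fields and the `maturity_score` function are hypothetical names introduced for illustration. It assumes that score 3 requires all score 2 evidence, and that partial evidence (for example, sim-to-real evidence without SOTIF analysis) scores 1, which the table itself does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationEvidence:
    """Hypothetical evidence checklist; field names are illustrative only."""
    some_sim_or_trial_testing: bool = False
    # Evidence named in the score 2 row.
    sim_to_real: bool = False
    sotif_analysis: bool = False
    policy_red_teaming: bool = False
    assurance_case: bool = False
    # Additional evidence named in the score 3 row.
    degraded_and_human_interaction_coverage: bool = False
    hardware_in_the_loop: bool = False
    change_triggered_revalidation: bool = False
    authority_available: bool = False


def maturity_score(ev: ValidationEvidence) -> int:
    """Map an evidence checklist to a 0-3 score under the stated assumptions."""
    core = (ev.sim_to_real, ev.sotif_analysis,
            ev.policy_red_teaming, ev.assurance_case)
    advanced = (ev.degraded_and_human_interaction_coverage,
                ev.hardware_in_the_loop,
                ev.change_triggered_revalidation,
                ev.authority_available)
    if all(core):
        # Assumption: score 3 builds on a complete score 2 evidence set.
        return 3 if all(advanced) else 2
    if ev.some_sim_or_trial_testing or any(core):
        # Assumption: incomplete core evidence is treated as score 1.
        return 1
    return 0
```

An assessor would still need to judge the quality of each evidence item; the booleans here only record whether it exists.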
Scenario A — Sim-Only Confidence: A humanoid validated only in simulation behaves unsafely on real surfaces and lighting its training never represented. Sim-to-real transfer evidence and SOTIF analysis would have surfaced the gap.
Scenario B — Untested Edge Case: The policy meets a novel object/occlusion in operation and produces a hazardous motion. Predictive red-teaming would have found the failure before deployment.
Scenario C — Reliability ≠ Safety: The assurance case proves component reliability but never addresses learned-policy failure modes; an emergent behaviour causes harm the case never considered.
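Scenario A turns on conditions that simulation covered but real-world trials did not, or where real performance fell short of simulated performance. One way such a gap might be surfaced is sketched below. The `TrialSummary` record, the `transfer_gaps` function, the condition labels, and both thresholds (`max_gap`, `min_real_trials`) are assumptions chosen for illustration; they are not values set by this dimension.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TrialSummary:
    """Hypothetical per-condition summary of simulated and real trials."""
    condition: str        # e.g. "wet_floor/low_light"
    sim_success: float    # success rate observed in simulation, 0..1
    real_trials: int      # number of real-world trials run
    real_successes: int   # number of those trials that succeeded


def transfer_gaps(summaries: Iterable[TrialSummary],
                  max_gap: float = 0.10,
                  min_real_trials: int = 20) -> Dict[str, str]:
    """Return conditions whose sim-to-real evidence is missing or weak."""
    findings: Dict[str, str] = {}
    for s in summaries:
        if s.real_trials < min_real_trials:
            # Too few real trials to support any transfer claim.
            findings[s.condition] = "insufficient real-world trials"
            continue
        gap = s.sim_success - s.real_successes / s.real_trials
        if gap > max_gap:
            findings[s.condition] = f"sim-to-real gap {gap:.2f}"
    return findings


# Illustrative data only; not measurements from any real system.
example = [
    TrialSummary("dry_floor/daylight", 0.97, 50, 47),
    TrialSummary("wet_floor/low_light", 0.95, 40, 30),
    TrialSummary("gravel/dusk", 0.90, 5, 5),
]
print(transfer_gaps(example))
```

A raw success-rate difference is a deliberately crude measure; a real programme would likely add confidence intervals and hazard-specific metrics.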
| Requirement | EU AI Act | NIST AI RMF | ISO 42001 |
|---|---|---|---|
| R1: Sim-to-real validation evidence | Art. 15 — Accuracy, robustness | MEASURE 2.6 — Safety evaluation | Clause 8.3 — Verification |
| R2: SOTIF analysis | Art. 9 — Risk management | MAP 5.1 — Impact identification | Clause 8.3 — Verification |
| R3: Policy red-teaming | Art. 15 — Robustness | MEASURE 2.7 — Security and resilience | Clause 8.3 — Verification |
| R4: Structured assurance case | Art. 9 — Risk management | MEASURE 2.6 — Safety evaluation | Clause 8.3 — Verification |
| R5: Operating-envelope + change revalidation | Art. 9 — Lifecycle risk management | MANAGE 4.1 — Post-deployment monitoring | Clause 8.3 — Verification |
| R6: Hardware-in-the-loop | Art. 15 — Robustness | MEASURE 2.6 — Safety evaluation | Clause 8.3 — Verification |
| R7: Authority-available evidence | Art. 11 — Technical documentation | GOVERN 1.1 — Legal/regulatory | Clause 7.5 — Documented information |
| R8: Residual-risk handling | Art. 9 — Risk acceptance | MANAGE 1.4 — Residual-risk documentation | Clause 6.1 — Actions to address risk |
> Standards note: align to ISO 21448:2022 (SOTIF), UL 4600 (safety case for autonomous products), and emerging embodied-AI predictive-red-teaming and test-method practice; combine with the functional-safety basis of IEC 61508.
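
Requirement R5 in the table above pairs an operating envelope with change-triggered revalidation. A minimal sketch of how such a trigger might be checked follows. The `ValidatedConfig` record, its fields, and the `revalidation_reasons` function are hypothetical; which changes actually require revalidation is a decision for the organisation's assurance case, not something this sketch defines.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ValidatedConfig:
    """Hypothetical snapshot of what the validation evidence covers."""
    policy_version: str
    sensor_suite: FrozenSet[str]
    envelope: FrozenSet[str]   # operating conditions covered by evidence


def revalidation_reasons(baseline: ValidatedConfig,
                         current: ValidatedConfig) -> List[str]:
    """List reasons the current deployment falls outside validated evidence."""
    reasons: List[str] = []
    if current.policy_version != baseline.policy_version:
        reasons.append("policy changed")
    if current.sensor_suite != baseline.sensor_suite:
        reasons.append("sensor suite changed")
    # Conditions the deployment now meets that validation never covered.
    outside = current.envelope - baseline.envelope
    if outside:
        reasons.append("conditions outside validated envelope: "
                       + ", ".join(sorted(outside)))
    return reasons
```

An empty list means the deployment still matches the validated baseline; any entry would prompt a review of which evidence needs to be regenerated.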
EU AI Act: Article 9 (risk management) and Article 15 (accuracy/robustness) require evidence that an AI system performs safely in its real operating context; for embodied AI that evidence specifically includes sim-to-real validation, SOTIF analysis, and policy red-teaming.
NIST AI RMF: MEASURE 2.6 (safety evaluation) and MAP 5.1 (impact identification) require validating the learning-enabled physical agent against its real failure modes.
ISO 42001: Clause 8.3 (verification) and Annex A.6 (lifecycle) require lifecycle validation evidence proportionate to the physical-safety impact.