AG-837

Embodied Validation — Sim-to-Real, SOTIF and Physical Red-Teaming

Embodied AI, Humanoids & Robot Fleets · AGS v2.1 · 2026-06-06
EU AI Act · NIST AI RMF · ISO 42001

AGS Embodied AI (Group L) | Embodied AI, Humanoids & Robot Fleets | Version 3.0

1. Definition

Embodied Validation governs the evidence required before an AI-driven physical agent operates around people: simulation/world-model validation with documented sim-to-real transfer, safety-of-the-intended-functionality (SOTIF) analysis for perception-driven autonomy, physical and predictive red-teaming of the agent's learned policy, and a structured safety-assurance case.

Learning-enabled physical agents fail in ways that component-failure analysis misses — through perception limits, distributional shift, and emergent policy behaviour. This dimension requires the specific validation evidence that those failure modes demand.
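SOTIF analysis addresses the perception-limit failure mode by identifying triggering conditions: situations, such as low light or a partially occluded object, that can expose a performance limitation. A basic completeness check is that every identified triggering condition is exercised by at least one test. The sketch below assumes a hypothetical mapping from test-case IDs to the triggering conditions each case covers. The function name, condition names and IDs are illustrative, and this is not a procedure prescribed by ISO 21448 or by this protocol.

```python
def uncovered_conditions(identified: set[str],
                         test_cases: dict[str, set[str]]) -> set[str]:
    """Return identified triggering conditions that no test case exercises.

    identified: triggering conditions produced by the SOTIF analysis.
    test_cases: hypothetical test-case ID -> conditions that case exercises.
    """
    # Union of every condition exercised by any test (empty if no tests).
    exercised = set().union(*test_cases.values())
    return identified - exercised


conditions = {"low_light", "glare", "occluded_object", "wet_floor"}
cases = {
    "TC-01": {"low_light"},
    "TC-02": {"glare", "low_light"},
    "TC-03": {"wet_floor"},
}
print(uncovered_conditions(conditions, cases))  # {'occluded_object'}
```

An empty result shows only that each known condition has been tested at least once; it says nothing about conditions the analysis has not yet identified.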

2. Scope

In scope: sim-to-real validation evidence; SOTIF analysis (performance limitations, triggering conditions) for perception/autonomy; physical and predictive red-teaming of the policy; structured safety-assurance case for the embodied agent.
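The four in-scope evidence items can be tracked as one record per agent release. The sketch below is a minimal illustration that assumes a hypothetical `EvidenceBundle` type. Each field holds a reference (such as a document ID) to one item, or `None` when that item has not been produced. The field names and example IDs are made up.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EvidenceBundle:
    """Hypothetical per-release record of the four in-scope evidence items.

    Each field holds a reference (e.g. a document ID), or None if absent.
    """

    sim_to_real_report: Optional[str] = None
    sotif_analysis: Optional[str] = None
    policy_red_team_report: Optional[str] = None
    assurance_case: Optional[str] = None

    def missing(self) -> list[str]:
        """Names of evidence items that have not been supplied yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def complete(self) -> bool:
        """True only when every in-scope evidence item is present."""
        return not self.missing()


bundle = EvidenceBundle(sim_to_real_report="DOC-001", sotif_analysis="DOC-002")
print(bundle.missing())  # ['policy_red_team_report', 'assurance_case']
```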

Out of scope: the force/speed-limiting controls themselves (AG-835) and fail-safe stop (AG-836). This dimension does not specify those controls; it validates that they, together with the agent's learned behaviour, are adequate. It governs *validation evidence for embodied AI*.

3. Why This Matters

An embodied agent validated only in simulation, or only against component failures, can behave unsafely in the real world when perception degrades, conditions shift, or the learned policy meets a situation it generalises badly. Without sim-to-real evidence, SOTIF analysis, and physical red-teaming, an organisation is deploying an unproven physical system around people. This validation turns "it worked in the lab" into evidenced safety for real operation.
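One simple form of sim-to-real transfer evidence compares outcomes per scenario in simulation and in real trials. The sketch below assumes success rates as the metric and a hypothetical tolerance of 0.05 (five percentage points). Neither is a value this protocol sets. A scenario run only in simulation is also flagged, because it has no transfer evidence at all. The function and scenario names are illustrative.

```python
def transfer_gaps(sim: dict[str, float], real: dict[str, float],
                  tolerance: float = 0.05) -> dict[str, str]:
    """Flag scenarios where simulation does not predict real-world outcomes.

    sim and real map a scenario name to a success rate in [0, 1].
    The 0.05 default tolerance is an illustrative assumption.
    """
    findings: dict[str, str] = {}
    for scenario, sim_rate in sim.items():
        if scenario not in real:
            # Simulation-only result: no transfer evidence for this scenario.
            findings[scenario] = "no real-world trials"
            continue
        gap = abs(sim_rate - real[scenario])
        if gap > tolerance:
            # A large gap in either direction means the simulator
            # misrepresents this scenario.
            findings[scenario] = f"gap {gap:.2f}"
    return findings


sim = {"polished_floor": 0.98, "low_light": 0.95, "carpet": 0.97}
real = {"polished_floor": 0.81, "low_light": 0.94}
print(transfer_gaps(sim, real))
# {'polished_floor': 'gap 0.17', 'carpet': 'no real-world trials'}
```

The check is two-sided by design. A real-world result that is much better than simulation also means the simulator does not represent that scenario, so simulation results there cannot be relied on.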

4. Requirements

5. Maturity Model

6. Test Criteria

Test 6.1: Sim-to-Real Evidence

Test 6.2: SOTIF Coverage

Test 6.3: Policy Red-Teaming

7. Scoring

| Score | Criteria |
| --- | --- |
| 0 | Embodied agent deployed around people without sim-to-real, SOTIF, or red-teaming evidence |
| 1 | Some simulation/trial testing, but no SOTIF analysis, transfer evidence, or policy red-teaming |
| 2 | Sim-to-real evidence, SOTIF analysis, policy red-teaming, structured assurance case |
| 3 | Degraded-condition and human-interaction coverage, hardware-in-the-loop testing, change-triggered revalidation, authority-available evidence |
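The rubric above can be read as a cumulative check. The sketch below assumes that each criterion is recorded as a named evidence item. It also assumes that score 3 requires every score-2 item as well, and that any partial testing or partial score-2 evidence earns at most 1. That cumulative reading and the item names are assumptions, not text from the protocol.

```python
# Hypothetical evidence-item names for each rubric level.
SOME_TESTING = {"sim_or_trial_testing"}
LEVEL_2 = {"sim_to_real", "sotif", "policy_red_team", "assurance_case"}
LEVEL_3 = {"degraded_and_human_interaction", "hardware_in_the_loop",
           "change_triggered_revalidation", "authority_available"}


def score(evidence: set[str]) -> int:
    """Map recorded evidence items to a 0-3 score.

    Assumes a cumulative reading: 3 requires every level-2 item too.
    """
    if not LEVEL_2 <= evidence:
        # Partial testing or partial level-2 evidence earns at most 1.
        return 1 if evidence & (SOME_TESTING | LEVEL_2) else 0
    return 3 if LEVEL_3 <= evidence else 2


print(score({"sim_or_trial_testing"}))             # 1
print(score(LEVEL_2 | {"hardware_in_the_loop"}))   # 2
print(score(LEVEL_2 | LEVEL_3))                    # 3
```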

8. Failure Scenarios

Scenario A — Sim-Only Confidence: A humanoid validated only in simulation behaves unsafely on real surfaces and lighting its training never represented. Sim-to-real transfer evidence and SOTIF analysis would have surfaced the gap.

Scenario B — Untested Edge Case: The policy meets a novel object/occlusion in operation and produces a hazardous motion. Predictive red-teaming would have found the failure before deployment.

Scenario C — Reliability ≠ Safety: The assurance case proves component reliability but never addresses learned-policy failure modes; an emergent behaviour causes harm the case never considered.
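Scenario C is a coverage gap rather than a reliability gap. One way to check for it is to tag each assurance-case claim with the failure-mode class it argues about. Those tags can then be compared with the classes named in the Definition: component failure, perception limits, distributional shift, and emergent policy behaviour. The sketch below assumes hypothetical `Claim` records and illustrative class labels. A claim with no supporting evidence is treated as not addressing its class.

```python
from dataclasses import dataclass, field

# Failure-mode classes named in this protocol's Definition section.
REQUIRED_CLASSES = {
    "component_failure",
    "perception_limits",
    "distributional_shift",
    "emergent_policy_behaviour",
}


@dataclass
class Claim:
    """Hypothetical assurance-case claim with its supporting evidence IDs."""

    text: str
    addresses: str  # failure-mode class the claim argues about
    evidence: list[str] = field(default_factory=list)


def unaddressed(claims: list[Claim]) -> set[str]:
    """Failure-mode classes with no evidenced claim in the case."""
    covered = {c.addresses for c in claims if c.evidence}
    return REQUIRED_CLASSES - covered


# A case like Scenario C's: well evidenced for component reliability only.
case = [
    Claim("Actuators meet MTBF target", "component_failure", ["FMEA-3"]),
    Claim("Encoders are redundant", "component_failure", ["DES-12"]),
    Claim("Policy is safe in novel scenes", "emergent_policy_behaviour"),
]
print(sorted(unaddressed(case)))
# ['distributional_shift', 'emergent_policy_behaviour', 'perception_limits']
```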

9. Regulatory Mapping

| Requirement | EU AI Act | NIST AI RMF | ISO 42001 |
| --- | --- | --- | --- |
| R1: Sim-to-real validation evidence | Art. 15: Accuracy, robustness | MEASURE 2.6: Safety evaluation | Clause 8.3: AI risk treatment |
| R2: SOTIF analysis | Art. 9: Risk management | MAP 5.1: Impact identification | Clause 8.3: AI risk treatment |
| R3: Policy red-teaming | Art. 15: Robustness | MEASURE 2.7: Security and resilience | Clause 8.3: AI risk treatment |
| R4: Structured assurance case | Art. 9: Risk management | MEASURE 2.6: Safety evaluation | Clause 8.3: AI risk treatment |
| R5: Operating-envelope and change-triggered revalidation | Art. 9: Lifecycle risk management | MANAGE 4.1: Post-deployment monitoring | Clause 8.3: AI risk treatment |
| R6: Hardware-in-the-loop | Art. 15: Robustness | MEASURE 2.6: Safety evaluation | Clause 8.3: AI risk treatment |
| R7: Authority-available evidence | Art. 11: Technical documentation | GOVERN 1.1: Legal and regulatory requirements | Clause 7.5: Documented information |
| R8: Residual-risk handling | Art. 9: Risk acceptance | MANAGE 1.4: Residual-risk documentation | Clause 6.1: Actions to address risks and opportunities |

> Standards note: align with ISO 21448:2022 (SOTIF), UL 4600 (safety cases for autonomous products), and emerging practice for embodied-AI predictive red-teaming and test methods; build on the functional-safety basis of IEC 61508.

EU AI Act — Article 9 and Article 15

For high-risk AI systems, Article 9 (risk management) and Article 15 (accuracy, robustness and cybersecurity) require evidence that the system performs safely in its real operating context. For embodied AI, this protocol treats that evidence as including sim-to-real transfer evidence, SOTIF analysis, and policy red-teaming.

NIST AI RMF — MEASURE 2.6, MAP 5.1

MEASURE 2.6 (safety evaluation) and MAP 5.1 (impact identification) call for validating the learning-enabled physical agent against the failure modes it will actually meet in operation.

ISO 42001 — Clause 8.3, A.6

Clause 8.3 (AI risk treatment) and Annex A.6 (AI system life cycle, including verification and validation) require lifecycle validation evidence proportionate to the physical-safety impact.

Cite this protocol
AgentGoverning. (2026). AG-837: Embodied Validation — Sim-to-Real, SOTIF and Physical Red-Teaming. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-837