AG-794

ML Model Inference API Access Governance

Supplementary Core & Adversarial Model Resistance · AGS v2.1 · 2026-04-29

EU AI Act NIST AI RMF ISO 42001

1. Definition

ML Model Inference API Access Governance mandates that every inference endpoint exposed by an AI model within a governed ecosystem enforces infrastructure-layer access controls that authenticate consumers, scope their permitted queries, log every request with full provenance, and detect query patterns indicative of model extraction or data exfiltration. The enforcement is structural: it is applied by a dedicated API gateway and monitoring layer that the model-serving infrastructure cannot bypass, not by the model's own filtering or the consuming agent's instruction compliance. In production deployments where ML models serve predictions to dozens or hundreds of internal and external consumers — agents, applications, partner integrations, and human-facing interfaces — an uncontrolled inference API is an extraction surface. An adversary with unrestricted query access can systematically reconstruct model behaviour through carefully crafted inputs, exfiltrate training data through output analysis, or abuse the inference endpoint as an oracle for downstream attacks. AG-794 closes this gap by requiring authentication, per-consumer scoping, comprehensive query logging, extraction detection, and output filtering as preconditions for every inference API call, ensuring that model access is governed with the same rigour as the data and credentials that produced the model.

This dimension is distinct from AG-793 (Denial of ML Service Governance), which addresses the availability of inference services against denial-of-service threats. AG-794 governs who may access inference and what they may do with it — the authorisation and abuse-prevention layer — while AG-793 governs whether the service remains operational under adversarial load. Together, they provide complementary coverage of the inference API threat surface: AG-793 protects against service disruption, AG-794 protects against service misuse.

2. Scope

This dimension applies to all AI agent deployments operating under the AGS framework where ML model inference APIs are exposed to any consumer — internal or external. Specifically:

Agents serving inference endpoints to other agents, applications, or human users within or across organisational boundaries
Model-serving infrastructure including dedicated inference servers, serverless function endpoints, edge inference nodes, and batch prediction pipelines that accept programmatic queries
Multi-model architectures where a router or orchestrator dispatches queries to specialised models, each of which must enforce access controls independently
Federated inference deployments where models serve queries across organisational boundaries via API contracts

The dimension covers the full access lifecycle: consumer registration and credential issuance, per-request authentication and authorisation, query logging with provenance metadata, real-time query pattern analysis for extraction detection, inference output inspection for weight and training-data leakage, and credential rotation and revocation.

Exclusions: Models operating in fully sandboxed research environments with no access to production data, production consumers, or production network segments are excluded, subject to the condition that any transition to production immediately triggers AG-794 compliance. Offline batch inference pipelines that accept only pre-approved, internally generated input datasets and produce outputs consumed exclusively by the same organisation's internal systems are excluded from the real-time extraction detection requirements (R5) but remain in scope for all authentication, logging, and scoping requirements.

Industry Considerations

Financial Services. ML models in financial services — credit scoring, fraud detection, algorithmic trading, anti-money-laundering — represent substantial intellectual property and regulatory obligation. Unauthorised inference access enables competitors to replicate proprietary models. Model extraction from a credit scoring model creates regulatory liability under the Equal Credit Opportunity Act and FCA consumer protection requirements, because the extracted model may reproduce biases without the governance controls applied to the original. Query-based probing of fraud detection models enables adversaries to map detection boundaries and craft evasion strategies.

Healthcare. Clinical ML models trained on patient data carry dual sensitivity: the model itself is intellectual property, and its outputs can leak information about training data subjects. An attacker querying a diagnostic model with systematically varied inputs can infer whether specific patients were in the training set (membership inference) or reconstruct sensitive patient attributes. HIPAA requires access controls proportionate to the sensitivity of protected health information — inference APIs that can leak PHI through output analysis require the same access rigour as direct data access.

Public Sector. Government agencies deploying ML models for benefits determination, risk assessment, or law enforcement face accountability requirements that extend to inference access. Every query to a model that affects individual rights must be attributable to an authorised consumer with a documented purpose. Unauthorised access to a sentencing risk model or immigration screening model creates both a security incident and an accountability gap that undermines judicial review.

3. Why This Matters

ML inference APIs are the primary attack surface for model theft, and the attack does not require access to model weights, training data, or source code. An adversary needs only query access. The MITRE ATLAS catalogue documents ML Model Inference API Access (AML.T0040) as the technique by which adversaries interact with deployed models to gather information that supports downstream attacks: model extraction, adversarial example crafting, training data reconstruction, and model inversion. The technique is practical, well documented in academic literature, and actively exploited. Research has demonstrated that production-grade models can be functionally replicated with between 10,000 and 100,000 strategically crafted queries, a volume that a single automated consumer can generate in minutes.

The economics of model extraction make this threat particularly acute. Training a state-of-the-art ML model may require millions of pounds in compute, data acquisition, and engineering effort. Extracting a functionally equivalent model through inference API access may cost a few hundred pounds in API fees. This asymmetry incentivises extraction at scale. Without structural access controls, the inference API transforms the organisation's most expensive asset into a self-service extraction endpoint for any consumer with valid credentials.

Traditional API security — rate limiting, API keys, IP allowlisting — provides a baseline but is insufficient for the inference-specific threat. Rate limiting constrains volume but not strategy: a sophisticated extraction attack can operate within rate limits by optimising query selection. API keys authenticate the consumer but do not detect whether the query pattern is benign or systematic extraction. AG-794 requires controls beyond traditional API security: per-consumer query pattern analysis, statistical extraction detection, and output filtering to prevent model weight leakage through prediction confidence scores and logit values.

The regulatory environment reinforces this requirement. The EU AI Act Article 15 requires robustness against exploitation by unauthorised third parties. NIST AI RMF MANAGE 2.4 requires mechanisms to disengage systems demonstrating risks — inference access controls provide the mechanism to cut off consumers exhibiting extraction behaviour. ISO 42001 Clause 8.2 requires AI risk assessment considering deployment-specific risks — for any model behind an inference API, extraction through query access is a contextual risk that must be controlled.

4. Requirements

R1: A conforming system MUST authenticate every inference API request at the infrastructure layer — via an API gateway, service mesh, or equivalent enforcement point that the model-serving infrastructure cannot bypass — before the request reaches the model. Authentication MUST use cryptographic credentials (mTLS client certificates, signed JWT tokens, or API keys bound to a registered consumer identity) that uniquely identify the requesting consumer.

R2: A conforming system MUST enforce per-consumer access scoping that restricts each authenticated consumer to a defined subset of models, input types, and output fields. Access scoping MUST be enforced at the infrastructure layer by the API gateway, not by the model or the consuming application. Each consumer's scope MUST be documented in a machine-readable policy that specifies: permitted model endpoints, permitted input schemas, permitted output fields (e.g., top-1 prediction only vs. full probability distribution), maximum query rate, and maximum daily query volume.
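As an illustration of the machine-readable policy R2 describes, the sketch below models one consumer's scope and a gateway-side check against it. The class name, field names, and rejection reason strings are hypothetical, and input-schema validation is omitted for brevity; this is a minimal sketch under those assumptions, not a prescribed policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumerScope:
    """Hypothetical per-consumer scope record (field names are illustrative)."""
    consumer_id: str
    permitted_models: frozenset
    permitted_output_fields: frozenset
    max_queries_per_minute: int
    max_queries_per_day: int


def authorise(scope: ConsumerScope, model: str, requested_fields: set,
              queries_today: int) -> tuple:
    """Return (allowed, reason). Any out-of-scope element rejects the request."""
    if model not in scope.permitted_models:
        return False, "model_not_permitted"
    # Reject the whole request if it asks for any field outside the scope.
    if set(requested_fields) - scope.permitted_output_fields:
        return False, "output_field_not_permitted"
    if queries_today >= scope.max_queries_per_day:
        return False, "daily_volume_exceeded"
    return True, "ok"
```

The check rejects rather than silently trimming out-of-scope fields, which matches the behaviour TC2 expects for out-of-scope requests.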

R3: A conforming system MUST log every inference API request with full provenance metadata including: timestamp (UTC, millisecond precision), authenticated consumer identity, source IP address and network segment, target model identifier and version, input hash (SHA-256 of the serialised input payload), output classification (e.g., prediction class, confidence tier), response latency, and request outcome (success, rejected, rate-limited, or blocked).
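One possible shape for an R3 log record is sketched below. The function and key names are hypothetical, and the canonical JSON serialisation used for the input hash is an assumption: R3 requires SHA-256 over the serialised payload but does not prescribe the serialisation.

```python
import hashlib
import json
from datetime import datetime, timezone

OUTCOMES = {"success", "rejected", "rate-limited", "blocked"}


def input_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON form (sorted keys, compact separators)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_log_record(consumer_id, source_ip, network_segment, model_id,
                     model_version, payload, output_class, latency_ms, outcome):
    """Assemble one provenance record carrying the fields listed in R3."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    return {
        # UTC timestamp with millisecond precision
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "consumer_id": consumer_id,
        "source_ip": source_ip,
        "network_segment": network_segment,
        "model_id": model_id,
        "model_version": model_version,
        "input_hash": input_hash(payload),
        "output_classification": output_class,
        "latency_ms": latency_ms,
        "outcome": outcome,
    }
```

Sorting keys before hashing means two payloads with the same content but different key order produce the same hash, which is what makes the hash independently verifiable against a submitted payload (TC3).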

R4: Query logs MUST be stored in a tamper-evident logging infrastructure — append-only log stores with cryptographic chaining or equivalent integrity guarantees — that is write-accessible only to the logging service and read-accessible only to authorised security and governance personnel. Logs MUST be retained for a minimum of 7 years for regulated industries and 3 years for all other deployments.
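The cryptographic chaining mentioned in R4 can be sketched as a hash chain in which each entry commits to its predecessor. The function names and the all-zero genesis value are assumptions for illustration; a production log store would also need access control and an external anchor for the latest hash, since a chain alone cannot reveal truncation of its tail.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    chain.append({
        "record": record,
        "prev_hash": prev_hash,
        "entry_hash": _digest(prev_hash, record),
    })


def verify(chain: list) -> int:
    """Return the index of the first inconsistent entry, or -1 if intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        if (entry["prev_hash"] != prev_hash
                or entry["entry_hash"] != _digest(prev_hash, entry["record"])):
            return i
        prev_hash = entry["entry_hash"]
    return -1
```

Modifying a record changes its digest, and deleting an entry from the middle breaks the next entry's link to its predecessor, so both tampering modes exercised in TC4 surface as a verification failure.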

R5: A conforming system MUST implement real-time model extraction detection that analyses per-consumer query patterns for indicators of systematic extraction. Detection MUST include at minimum: (a) statistical analysis of input distribution entropy — extraction attacks typically exhibit lower entropy than legitimate usage because queries are optimised to probe decision boundaries; (b) sequential correlation analysis — extraction attacks exhibit higher inter-query correlation than legitimate usage because each query is informed by previous responses; (c) query volume acceleration detection — extraction campaigns often exhibit accelerating query rates as the attacker narrows the extraction target; (d) input perturbation pattern detection — extraction through adversarial probing generates inputs that cluster around decision boundaries in the model's feature space.
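Two of the R5 signals, input entropy (a) and volume acceleration (c), reduce to short calculations. The sketch below assumes inputs have already been discretised into hashable symbols (for example, binned feature vectors) and that query counts are aggregated per fixed interval; both preprocessing steps, and the function names, are assumptions. Thresholds are deliberately left out because they depend on each model's legitimate baseline.

```python
import math
from collections import Counter


def shannon_entropy(symbols) -> float:
    """Shannon entropy in bits of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def volume_acceleration(counts_per_interval) -> float:
    """Mean second difference of per-interval query counts.

    A positive value means the query rate is itself increasing.
    """
    n = len(counts_per_interval)
    if n < 3:
        return 0.0
    second_diffs = [
        counts_per_interval[i + 1] - 2 * counts_per_interval[i] + counts_per_interval[i - 1]
        for i in range(1, n - 1)
    ]
    return sum(second_diffs) / len(second_diffs)
```

In use, each value would be compared against the consumer's own history or a population baseline rather than a fixed constant: four equally likely symbols give 2 bits of entropy, and interval counts of 10, 20, 40, 80 give a mean second difference of 15.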

R6: When extraction detection triggers, the system MUST execute a graduated response: (a) LOW confidence — flag the consumer for enhanced monitoring and increase logging detail to include full input payloads; (b) MEDIUM confidence — reduce the consumer's output fidelity (e.g., return top-1 prediction only, suppress confidence scores, round probability values to the nearest 0.1); (c) HIGH confidence — suspend the consumer's access immediately, generate a security incident, and escalate to human review within 30 minutes.
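The graduated response in R6 can be expressed as a mapping from confidence level to actions. The sketch below assumes the levels are cumulative (a MEDIUM trigger keeps the LOW-level monitoring in place), which R6 does not state explicitly; the action names are hypothetical labels for whatever the gateway and incident tooling actually expose.

```python
from enum import IntEnum


class Confidence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def graduated_response(confidence: Confidence) -> list:
    """Return the ordered list of actions for a detection at this level."""
    # LOW: enhanced monitoring and full-payload logging
    actions = ["enhanced_monitoring", "log_full_input_payloads"]
    if confidence >= Confidence.MEDIUM:
        # MEDIUM: reduce output fidelity
        actions += ["top1_only", "suppress_confidence_scores"]
    if confidence >= Confidence.HIGH:
        # HIGH: suspend, raise an incident, escalate to a human
        actions += [
            "suspend_access",
            "open_security_incident",
            "escalate_to_human_within_30_min",
        ]
    return actions
```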

R7: A conforming system MUST filter inference outputs to prevent model weight leakage. At minimum: (a) raw logit values MUST NOT be exposed through the inference API unless the consumer's access scope explicitly permits them; (b) prediction confidence scores MUST be quantised to a configurable precision (default: 2 decimal places) to reduce information leakage per query; (c) gradient information MUST NOT be exposed through the inference API under any circumstances; (d) full probability distributions across all classes SHOULD be restricted to consumers with explicit authorisation, with other consumers receiving top-k predictions only.
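A gateway-side filter implementing the R7 rules might look like the following. The raw response layout (`probabilities`, `logits`, `gradients` keys) and the permitted-field names are assumptions made for this sketch; the rules themselves (logits only when scoped, quantised confidence, gradients never) come from R7.

```python
def filter_output(raw: dict, permitted_fields: set,
                  precision: int = 2, top_k: int = 1) -> dict:
    """Build the consumer-facing response from a raw model response.

    The response is constructed from scratch, so fields that are not
    explicitly copied (including gradients) can never leak through.
    """
    ranked = sorted(raw["probabilities"].items(),
                    key=lambda kv: kv[1], reverse=True)
    out = {"top_k": [label for label, _ in ranked[:top_k]]}
    if "confidence" in permitted_fields:
        # Quantise to the configured precision (default 2 decimal places).
        out["confidence"] = round(ranked[0][1], precision)
    if "probabilities" in permitted_fields:
        out["probabilities"] = {label: round(p, precision) for label, p in ranked}
    if "logits" in permitted_fields and "logits" in raw:
        out["logits"] = raw["logits"]
    # Gradients are never copied, regardless of scope (R7c).
    return out
```

Building the response by allow-listing fields, rather than deleting prohibited ones from the raw response, means a new field added by the model server is withheld by default.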

R8: A conforming system MUST rotate all inference API credentials on a maximum cycle of 90 days. Credential revocation MUST propagate to all API gateways within 60 seconds. Any request using a revoked or expired credential MUST be rejected immediately with no grace period.
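The per-request credential check implied by R8 is small. The sketch below treats a credential as expired once it reaches exactly 90 days of age, which is one reading of "maximum cycle"; the function name and the choice to pass the current time in explicitly are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_CREDENTIAL_AGE = timedelta(days=90)  # R8 maximum rotation cycle


def credential_usable(issued_at: datetime, revoked: bool, now: datetime) -> bool:
    """Reject revoked credentials and any credential at or past the
    rotation cycle, with no grace period."""
    if revoked:
        return False
    return now - issued_at < MAX_CREDENTIAL_AGE
```

For example, `credential_usable(issued, False, datetime.now(timezone.utc))` would be evaluated on every request; the 60-second revocation propagation target is a property of how the `revoked` flag reaches each gateway and is not modelled here.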

R9: If the authentication or authorisation infrastructure is unavailable, the inference API MUST default to DENY. No inference request may proceed without verified consumer identity and confirmed access scope.
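Fail-closed behaviour is easiest to get right when ALLOW is the only outcome that requires positive confirmation. In the sketch below, `authenticate` and `load_scope` are hypothetical callables standing in for the authentication and policy services; any exception they raise is treated as unavailability.

```python
def gate(request: dict, authenticate, load_scope) -> str:
    """Return "ALLOW" only when identity and scope are both confirmed."""
    try:
        identity = authenticate(request)   # may raise if the service is down
        scope = load_scope(identity) if identity else None
    except Exception:
        return "DENY"                      # infrastructure failure: fail closed
    if not identity or scope is None:
        return "DENY"
    return "ALLOW" if request.get("model") in scope else "DENY"
```

Every path other than a confirmed identity with the requested model in scope falls through to DENY, so a new failure mode in either dependency cannot accidentally open the endpoint.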

R10: A conforming system SHOULD implement canary queries — synthetic inputs with known expected outputs — injected into the query stream at random intervals to detect whether the inference pipeline has been tampered with or bypassed.

R11: A conforming system SHOULD conduct quarterly adversarial testing specifically targeting inference API access controls, including simulated extraction campaigns, credential theft scenarios, and output analysis attacks.

5. Maturity Model

Basic Implementation — The organisation has deployed API gateway authentication for all inference endpoints. Every consumer holds a unique credential (API key or client certificate). Query logging captures timestamp, consumer identity, and target model. Rate limiting is enforced per consumer. Output filtering is not yet implemented — consumers receive full model outputs including confidence scores and probability distributions. Extraction detection is not yet operational. Access scoping is coarse-grained (per-model, not per-output-field). The API gateway is deployed but may share a trust domain with the model-serving infrastructure. Credential rotation is manual and may exceed the 90-day cycle.

Intermediate Implementation — All Basic capabilities plus: per-consumer access scoping is enforced at the output-field level, restricting confidence score precision, probability distribution access, and permitted input schemas. Query logs include full provenance metadata (all fields specified in R3) and are stored in tamper-evident infrastructure. Real-time extraction detection is operational with at minimum input entropy analysis and query volume monitoring. Graduated response is implemented — consumers flagged for extraction behaviour receive reduced output fidelity before suspension. The API gateway operates in a separate security domain from the model-serving infrastructure. Credential rotation is automated on a 90-day cycle. Revocation propagates within 60 seconds. Fail-closed behaviour is enforced when the authentication infrastructure is unavailable.

Advanced Implementation — All Intermediate capabilities plus: extraction detection includes all four analysis methods specified in R5 (entropy, sequential correlation, volume acceleration, and perturbation pattern detection). Detection algorithms have been validated through independent adversarial testing including simulated extraction campaigns calibrated against published academic extraction techniques (Tramer et al. 2016, Jagielski et al. 2020, Carlini et al. 2021). Canary queries are deployed and monitored. Output filtering is adaptive — consumers exhibiting borderline extraction indicators receive progressively reduced output fidelity without explicit suspension. Real-time dashboards provide visibility into per-consumer query patterns, extraction risk scores, and output fidelity levels. The organisation can demonstrate to regulators that no known extraction technique succeeds against the deployed controls within a query budget of 1,000,000 queries.

Implementation Patterns

Dedicated inference API gateway in a separate security domain. Deploy the authentication and authorisation gateway as an independent infrastructure component — separate container, separate network segment, separate credentials — from the model-serving infrastructure. All inference traffic passes through the gateway before reaching the model. The model-serving infrastructure accepts connections only from the gateway's verified identity. This ensures that a compromised model server or a lateral-movement attacker within the model-serving network cannot bypass access controls by connecting directly to the inference endpoint.

Per-consumer output shaping. Configure the API gateway to transform model outputs based on the consumer's access scope before returning the response. A consumer authorised for top-1 prediction only receives a response with the predicted class and no confidence score. A consumer authorised for full probability distribution receives the complete output. Output shaping is enforced at the gateway, not at the model, ensuring that the model's full output is never transmitted to consumers who lack authorisation to receive it.

Sliding-window extraction detection with consumer risk scoring. Maintain a per-consumer risk score computed over a sliding window of queries (e.g., the last 1,000 queries or the last 24 hours, whichever covers more queries). The risk score aggregates signals from all four extraction detection methods. A consumer whose risk score exceeds a configurable threshold triggers the graduated response. The sliding window ensures that low-and-slow extraction campaigns, which operate within rate limits but exhibit extraction patterns over longer time horizons, are detected.
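A minimal sketch of such a window follows. It reads the window rule as keeping whichever of the two windows holds more queries, so an entry is evicted only once it falls outside both the count bound and the time horizon. The class name, the signal names, the equal weights, and the assumption that each signal is already normalised to the range 0 to 1 are all illustrative.

```python
from collections import deque

DEFAULT_WEIGHTS = {
    "entropy": 0.25,
    "correlation": 0.25,
    "acceleration": 0.25,
    "perturbation": 0.25,
}


class SlidingRiskWindow:
    def __init__(self, max_count=1000, horizon_seconds=86400.0, weights=None):
        self.max_count = max_count
        self.horizon_seconds = horizon_seconds
        self.weights = weights or DEFAULT_WEIGHTS
        self.events = deque()  # (timestamp, {signal_name: value in [0, 1]})

    def add(self, timestamp: float, signals: dict) -> None:
        self.events.append((timestamp, signals))
        cutoff = timestamp - self.horizon_seconds
        # Evict only entries outside BOTH the count bound and the horizon.
        while len(self.events) > self.max_count and self.events[0][0] < cutoff:
            self.events.popleft()

    def score(self) -> float:
        """Weighted mean of the per-query signals across the window."""
        if not self.events:
            return 0.0
        total = 0.0
        for _, signals in self.events:
            total += sum(w * signals.get(name, 0.0)
                         for name, w in self.weights.items())
        return total / len(self.events)
```

The score returned by `score()` would then be compared with the configurable threshold that triggers the graduated response.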

Differential privacy noise injection for high-sensitivity models. For models where extraction risk is critical, add calibrated noise to inference outputs sufficient to degrade extraction fidelity below a useful threshold while preserving individual prediction utility. This is a preventive control whose effect on extraction fidelity can be measured, and it complements the detective controls; it raises the query cost of extraction rather than eliminating the risk.

Credential binding to consumer infrastructure. Bind inference API credentials to the consumer's verified infrastructure attributes — client certificate thumbprint, source IP range, or hardware attestation token — so that a stolen credential cannot be used from an unregistered environment.
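A binding check on two of these attributes can be sketched with the standard library. The binding record layout is hypothetical, and the sketch requires both the certificate thumbprint and the source IP range to match, which is stricter than the pattern demands; hardware attestation is not modelled.

```python
import hmac
import ipaddress


def credential_bound(binding: dict, presented_thumbprint: str, source_ip: str) -> bool:
    """True only if the thumbprint matches and the source IP is in a
    registered range. `binding` is a hypothetical record of the form
    {"cert_thumbprint": "<hex>", "allowed_cidrs": ["10.20.0.0/16", ...]}.
    """
    expected = binding["cert_thumbprint"].lower().encode("ascii")
    presented = presented_thumbprint.lower().encode("ascii")
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, presented):
        return False
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr)
               for cidr in binding["allowed_cidrs"])
```

With a binding like this in place, a key harvested from a public repository would be rejected when presented from an address outside the registered ranges.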

Anti-Patterns

API key as sole access control. Relying on a static API key without per-consumer scoping, output filtering, or query monitoring. API keys authenticate identity but provide no defence against an authenticated consumer conducting extraction. This is the most common deployment pattern and the most vulnerable.

Rate limiting as extraction prevention. Implementing rate limits as the primary defence against model extraction. Rate limits constrain volume but not strategy. Academic literature demonstrates that optimised extraction queries can extract production models within standard API rate limits. Rate limiting is necessary for availability (see AG-793) but insufficient for access governance.

Model-layer output filtering. Implementing output restrictions within the model's inference code rather than at the API gateway. A compromised model server or a configuration error can disable model-layer filtering, exposing full outputs. Output filtering must be enforced at an infrastructure layer that the model cannot influence.

Logging without analysis. Capturing query logs for compliance purposes without implementing real-time or periodic analysis for extraction patterns. Logs that are never analysed provide forensic value after an incident but no prevention or detection during one.

Uniform output fidelity for all consumers. Providing the same output format — including full confidence scores and probability distributions — to all authenticated consumers regardless of their need. Each additional output field increases the information gain per query for a potential extractor. Output fidelity should be proportionate to the consumer's demonstrated need and authorised scope.

6. Test Criteria

TC1: Authentication Enforcement

Stimulus: Attempt inference API access without credentials, with invalid credentials, and with expired credentials. Execute 100 attempts for each category.
Expected behaviour: All 300 requests rejected at the API gateway before reaching the model-serving infrastructure.
Pass criteria: Zero requests reach the model. Full rejection logs recorded with failure reason for each attempt.
Fail criteria: Any unauthenticated, invalidly authenticated, or expired-credential request reaches the model.

TC2: Per-Consumer Access Scoping

Stimulus: Authenticate as Consumer A (authorised for Model X, top-1 output only). Attempt to: (a) query Model Y, (b) request full probability distribution from Model X, (c) request logit values from Model X.
Expected behaviour: All three out-of-scope requests rejected. Consumer A receives only top-1 prediction from Model X.
Pass criteria: Out-of-scope requests rejected with appropriate error codes. Rejection events logged. In-scope requests return only the authorised output fields.
Fail criteria: Any out-of-scope request succeeds, or an in-scope response includes unauthorised output fields.

TC3: Query Provenance Logging

Stimulus: Execute 50 inference requests from a known consumer. Retrieve the corresponding log entries.
Expected behaviour: All 50 requests logged with complete provenance metadata: timestamp, consumer identity, source IP, model ID and version, input hash, output classification, response latency, and outcome.
Pass criteria: 50 of 50 log entries present with all required fields populated. Input hashes independently verifiable against submitted payloads.
Fail criteria: Any log entry missing, any required field absent, or any input hash mismatch.

TC4: Log Tamper Evidence

Stimulus: Retrieve a log entry. Attempt to modify the entry (change the consumer identity field). Attempt to delete the entry.
Expected behaviour: Modification and deletion either prevented or detected with tamper evidence preserved.
Pass criteria: Original log entry recoverable. Tampering attempt itself logged as a security event.
Fail criteria: Log entry modifiable or deletable without detection.

TC5: Extraction Detection — Entropy Analysis

Stimulus: Execute two query campaigns against the same model from two different consumer credentials: (a) 1,000 queries sampled from a realistic production input distribution; (b) 1,000 queries generated by a known extraction algorithm (e.g., Jacobian-based extraction with active learning query selection). Both campaigns operate within the consumer's rate limit.
Expected behaviour: Campaign (a) does not trigger extraction detection. Campaign (b) triggers at minimum LOW-confidence extraction alert within the first 500 queries.
Pass criteria: Extraction campaign detected. Legitimate campaign not flagged. False positive rate below 5%.
Fail criteria: Extraction campaign undetected after 1,000 queries, or legitimate campaign flagged as extraction.

TC6: Graduated Response Execution

Stimulus: Trigger a HIGH-confidence extraction detection for a test consumer. Verify the full graduated response chain.
Expected behaviour: Consumer access suspended. Output fidelity already reduced for queries served after the earlier MEDIUM-confidence trigger. Security incident generated. Human escalation within 30 minutes.
Pass criteria: Access suspended within 5 seconds of HIGH-confidence trigger. Security incident created with full context. Escalation notification delivered within 30 minutes.
Fail criteria: Consumer access continues after HIGH-confidence trigger, or escalation exceeds 30-minute SLA.

TC7: Output Filtering — Logit Suppression

Stimulus: Query an inference endpoint as a consumer whose access scope does not permit logit values. Inspect the response payload at the network layer (not through the client SDK).
Expected behaviour: Response contains no logit values, no gradient information, and confidence scores quantised to the configured precision.
Pass criteria: Raw network response contains no prohibited output fields. Confidence scores match configured quantisation level.
Fail criteria: Any prohibited output field present in the network-layer response.

TC8: Credential Revocation Propagation

Stimulus: Revoke an active consumer's credentials. Attempt inference access using the revoked credentials at 5-second intervals across 3 different API gateway instances.
Expected behaviour: All gateway instances reject the revoked credential within 60 seconds of revocation.
Pass criteria: Every gateway instance rejects the revoked credential within 60 seconds in each of 30 test runs (maximum observed propagation latency <= 60 seconds).
Fail criteria: Any gateway instance accepts a revoked credential more than 60 seconds after revocation.

TC9: Fail-Closed on Authentication Infrastructure Unavailability

Stimulus: Simulate authentication infrastructure unavailability. Attempt inference access with a previously valid credential.
Expected behaviour: Inference request blocked. No model output returned. System logs infrastructure unavailability event.
Pass criteria: Zero inference requests proceed while authentication infrastructure is unavailable.
Fail criteria: Any inference request returns model output without authentication verification.

Evidence Artefacts

Evidence ID	Description	Retention Period
AG794-E01	Inference API query logs with full provenance metadata	7 years
AG794-E02	Extraction detection alert logs and graduated response records	7 years
AG794-E03	Consumer access scope policies (machine-readable)	Duration of consumer relationship + 5 years
AG794-E04	Credential issuance, rotation, and revocation audit trail	7 years
AG794-E05	Adversarial testing reports (extraction campaigns, credential attacks)	5 years
AG794-E06	Output filtering configuration and change records	7 years
AG794-E07	Extraction detection tuning records (threshold changes, algorithm updates)	5 years
AG794-E08	Canary query deployment and monitoring records	3 years

7. Scoring

Score	Level	Description
0	No implementation	No inference API access controls exist. Any consumer with network access can query any model, receive full outputs including logits and confidence scores, and conduct unlimited queries without logging or monitoring.
1	Basic	API key authentication is deployed with per-consumer rate limiting. Query logs capture basic metadata (timestamp, consumer, model). Output filtering and extraction detection are not implemented. Access scoping is per-model but not per-output-field. Credential rotation may be manual.
2	Infrastructure-layer enforcement	Authentication enforced at a dedicated API gateway in a separate security domain. Per-consumer access scoping restricts output fields, input schemas, and query volumes. Full provenance logging in tamper-evident infrastructure. Real-time extraction detection operational with graduated response. Output filtering suppresses logits and quantises confidence scores. Credential rotation automated. Fail-closed on infrastructure unavailability.
3	Verified by independent adversarial testing	All Level 2 capabilities verified through independent adversarial testing including simulated extraction campaigns using published academic techniques, credential theft and replay scenarios, output analysis attacks to quantify information leakage per query, and bypass attempts targeting the graduated response mechanism. Test results documented with extraction success rate metrics and available for regulatory review.

8. Failure Scenarios

Scenario A — Model Extraction Through Systematic Querying

A fintech company deploys a proprietary credit scoring model as an inference API consumed by 12 partner institutions. The model represents GBP 2.3 million in development. A data scientist at Partner F executes an extraction campaign lasting just over 6 weeks: 847,000 queries designed using Jacobian-based extraction with active learning, each crafted to maximise information gain from the previous response's confidence score. The queries stay within the contractual rate limit of 20,000 per day. The inference API logs queries but does not analyse patterns, so the campaign proceeds undetected. The attacker trains a surrogate model achieving 94.7% fidelity, sufficient to replicate credit scoring without licensing the original. Discovery comes 4 months later when a competitor launches a suspiciously similar product.

What went wrong: Authentication and rate limiting were in place but no query pattern analysis. Confidence scores at full precision (6 decimal places) maximised information gain per query. No extraction detection was operational. Consequence: GBP 2.3 million investment rendered commercially worthless for approximately GBP 4,200 in API costs. Legal action hampered by absence of query-level intent evidence.

Scenario B — Unauthorised Inference Access via Leaked API Keys

An enterprise deploys 8 specialised models through a unified inference API with team-level API keys. A developer on the demand forecasting team commits the team's API key to a public GitHub repository. An automated credential scanner harvests it within 14 minutes. Over 48 hours, an unknown actor queries all 8 models — including the fraud detection model, which the forecasting team was never authorised to access — submitting 230,000 queries to map fraud detection boundaries. Because scoping is per-key (authentication only, no per-model authorisation), the leaked key grants access to every model. Weekly log review discovers the breach two days later.

What went wrong: API keys were scoped to the platform, not per model. No infrastructure binding (IP range, client certificate) prevented use from unregistered environments. Log review was weekly rather than real-time, which left a 48-hour detection gap. Consequence: Fraud model boundaries mapped for evasion. All 8 models potentially compromised. ICO notification required under UK GDPR Article 33 because the churn model, one of the 8 exposed models, processes personal data.

Scenario C — Insider Threat Using Inference API for Data Exfiltration

A pharmaceutical company deploys an ML model for drug-interaction prediction, trained on 2.1 million patient records from 14 clinical trials. A research associate with legitimate access conducts a membership inference attack over 3 months: 1.2 million queries designed to determine whether specific individuals were in the training data. By analysing confidence score distributions for known patients versus synthetic patients, the researcher identifies 34,000 patients as training data members and reconstructs partial clinical profiles. The query volume — approximately 13,000 per day — falls within normal bounds for a research role. Discovery comes during an annual privacy audit.

What went wrong: Full-precision confidence scores enabled membership inference. No real-time query pattern analysis. Insider threat monitoring covered data access controls but not inference API usage patterns. Consequence: Potential HIPAA breach for 34,000 patients. HHS OCR investigation. Estimated remediation cost USD 12.4 million. 18-month delay in two Phase III drug submissions pending data integrity review.

Severity and Blast Radius

Field	Value
Severity Rating	High
Blast Radius	Per-model — extends to every consumer of the affected model and every data subject whose information is encoded in the model's parameters or training data

Consequence chain: Uncontrolled inference API access permits model extraction (reproducing the model's functionality without authorisation), membership inference (determining whether specific individuals were in the training data), model inversion (reconstructing training data attributes from model outputs), and adversarial probing (mapping decision boundaries to craft evasion inputs). The blast radius for extraction is the full commercial value of the model. The blast radius for membership inference and model inversion is every data subject in the training set — potentially millions of individuals across multiple jurisdictions, each triggering independent breach notification obligations. The speed of exploitation is constrained only by API rate limits, which are typically calibrated for availability (AG-793), not for information-theoretic leakage rate.

9. Regulatory Mapping

Requirement	EU AI Act	NIST AI RMF	ISO 42001	MITRE ATLAS
R1: Infrastructure-layer authentication	Art. 15 -- Robustness	GOVERN 1.1 -- Legal requirements	Clause 6.1 -- Risk actions	AML.T0040 -- Inference API access
R2: Per-consumer access scoping	Art. 9 -- Risk management	MANAGE 2.2 -- Sustain value	Clause 8.2 -- AI risk assessment	AML.T0040.001 -- Active probing
R3: Full provenance query logging	Art. 12 -- Record-keeping	GOVERN 1.4 -- Transparency	Clause 9.1 -- Monitoring	--
R4: Tamper-evident log storage	Art. 12 -- Record-keeping	GOVERN 1.4 -- Transparency	Clause 9.1 -- Monitoring	--
R5: Extraction detection	Art. 15 -- Robustness	MANAGE 2.4 -- Deactivation	Clause 8.2 -- AI risk assessment	AML.T0040 -- Inference API access
R6: Graduated response	Art. 9 -- Risk management	MANAGE 2.4 -- Deactivation	Clause 8.2 -- AI risk assessment	--
R7: Output filtering	Art. 15 -- Robustness	MANAGE 2.2 -- Sustain value	Clause 6.1 -- Risk actions	AML.T0044 -- Full model access
R8: Credential rotation and revocation	Art. 9 -- Risk management	GOVERN 1.1 -- Legal requirements	Clause 8.2 -- AI risk assessment	--
R9: Fail-closed on unavailability	Art. 9 -- Risk management	MANAGE 2.4 -- Deactivation	Clause 8.2 -- AI risk assessment	--

EU AI Act — Article 9 and Article 15

Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that identifies and mitigates risks. Uncontrolled inference API access creates risks of model extraction, data exfiltration, and adversarial probing — each of which must be mitigated through structural controls. AG-794 implements the risk mitigation measures for inference API abuse. Article 15 requires that high-risk AI systems achieve an appropriate level of accuracy, robustness, and cybersecurity, including resilience against attempts by unauthorised third parties to exploit system vulnerabilities. Model extraction through inference API access is a documented exploitation technique (MITRE ATLAS AML.T0040); per-consumer scoping, extraction detection, and output filtering are the proportionate robustness controls.

NIST AI RMF — GOVERN and MANAGE Functions

GOVERN 1.1 requires that legal and regulatory requirements applicable to AI systems are identified and documented. For models deployed behind inference APIs, the applicable requirements include intellectual property protection, data protection (when models encode personal data), and sector-specific regulations. AG-794's authentication and scoping requirements operationalise these legal obligations into enforceable controls. MANAGE 2.2 requires that AI systems sustain value throughout their lifecycle — uncontrolled extraction that allows competitors to replicate a model's functionality directly undermines value sustainability. MANAGE 2.4 requires mechanisms to disengage or deactivate systems demonstrating risks — AG-794's graduated response and consumer suspension provide this mechanism at the inference API level.
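
As an illustration of how a graduated response might satisfy the MANAGE 2.4 deactivation mechanism, the sketch below maps a per-consumer risk score to an escalating action. The risk score, the four action names, and the threshold values are all hypothetical; AG-794 as quoted here does not prescribe them.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"
    COARSEN_OUTPUT = "coarsen_output"
    SUSPEND = "suspend"


# Hypothetical thresholds, ordered from most to least severe.
THRESHOLDS = [
    (0.9, Action.SUSPEND),
    (0.7, Action.COARSEN_OUTPUT),
    (0.4, Action.THROTTLE),
]


def graduated_response(risk_score: float) -> Action:
    """Return the most severe action whose threshold the score meets."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    for threshold, action in THRESHOLDS:
        if risk_score >= threshold:
            return action
    return Action.ALLOW
```

Checking thresholds from the most severe downwards keeps the mapping monotonic: a higher score can never produce a milder action than a lower one.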

MITRE ATLAS — AML.T0040 and Related Techniques

AML.T0040 (ML Model Inference API Access) is the parent technique that AG-794 directly addresses. The sub-technique AML.T0040.001 (Active Probing) describes the specific extraction methodology that R5's detection algorithms are designed to identify. AML.T0044 (Full Model Access) represents the outcome that AG-794 prevents — an adversary obtaining sufficient information through inference queries to functionally replicate the model. AG-794's output filtering requirements (R7) specifically target the information channels — logit values, full probability distributions, high-precision confidence scores — that enable the transition from query-level access to full model access.
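
A minimal sketch of the kind of output filtering R7 describes is shown below. It assumes the serving layer hands the gateway a mapping of class labels to probabilities, and it returns only the top-1 label with a confidence value rounded to a coarse step. The function name `filter_output` and the default step of 0.1 are illustrative assumptions.

```python
def filter_output(probabilities: dict[str, float], step: float = 0.1) -> dict:
    """Reduce a full probability distribution to a coarse top-1 answer.

    Drops every class except the most likely one and rounds its
    confidence to the nearest multiple of `step`.
    """
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    if step <= 0:
        raise ValueError("step must be positive")
    label = max(probabilities, key=probabilities.get)
    buckets = round(probabilities[label] / step)
    # The final round() removes floating-point residue such as 0.9000000000000001.
    return {"label": label, "confidence": round(buckets * step, 10)}


raw = {"interaction": 0.8731, "none": 0.1269}
filtered = filter_output(raw)  # {"label": "interaction", "confidence": 0.9}
```

Coarsening narrows the information channel but does not close it; label-only attacks remain possible, which is why R7 sits alongside extraction detection rather than replacing it.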

Protocol	Relationship
AG-012	Dependency — Core Governance Framework Baseline must be in place before AG-794 is activated, providing the governance infrastructure upon which inference API controls are layered
AG-103	Dependency — Provides cryptographic primitives for credential issuance, query log integrity (cryptographic chaining), and token signing used by the API gateway
AG-770	Dependency — Governs the credential lifecycle upon which inference API consumer credentials depend for issuance, rotation, and revocation
AG-781	Complementary — Agent identity verification (AG-781) authenticates agents in inter-agent interactions; AG-794 authenticates consumers at the inference API layer. Where an agent is also an inference API consumer, both controls apply independently
AG-793	Complementary — AG-793 governs inference API availability (denial-of-service prevention); AG-794 governs inference API access control (authorisation and abuse prevention). Rate limits defined under AG-793 constrain query volume; extraction detection under AG-794 analyses query intent within those volume limits
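
The AG-103 entry above refers to cryptographic chaining for query log integrity. One common way to build such a chain is to store, with each log entry, a hash over the previous entry's hash and the current record. The sketch below uses SHA-256 from the Python standard library; the record fields and the function names `append_entry` and `verify_chain` are hypothetical and are not taken from AG-103.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes identically.
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, record: dict) -> dict:
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"consumer": "agent-17", "model": "ddi-v3", "query_id": 1})
append_entry(log, {"consumer": "agent-17", "model": "ddi-v3", "query_id": 2})
intact = verify_chain(log)
```

A chain on its own only shows internal consistency. An actor able to rewrite the whole log could recompute every hash, so the latest hash would also need to be signed or anchored in separate storage for the log to be tamper-evident in practice.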

Cite this protocol

AgentGoverning. (2026). AG-794: ML Model Inference API Access Governance. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-794
