AG-794

ML Model Inference API Access Governance

Supplementary Core & Adversarial Model Resistance · AGS v2.1 · 2026-04-29
EU AI Act · NIST AI RMF · ISO 42001

1. Definition

ML Model Inference API Access Governance mandates that every inference endpoint exposed by an AI model within a governed ecosystem enforces infrastructure-layer access controls that authenticate consumers, scope their permitted queries, log every request with full provenance, and detect query patterns indicative of model extraction or data exfiltration. The enforcement is structural: it is applied by a dedicated API gateway and monitoring layer that the model-serving infrastructure cannot bypass, not by the model's own filtering or the consuming agent's instruction compliance. In production deployments where ML models serve predictions to dozens or hundreds of internal and external consumers — agents, applications, partner integrations, and human-facing interfaces — an uncontrolled inference API is an extraction surface. An adversary with unrestricted query access can systematically reconstruct model behaviour through carefully crafted inputs, exfiltrate training data through output analysis, or abuse the inference endpoint as an oracle for downstream attacks. AG-794 closes this gap by requiring authentication, per-consumer scoping, comprehensive query logging, extraction detection, and output filtering as preconditions for every inference API call, ensuring that model access is governed with the same rigour as the data and credentials that produced the model.

This dimension is distinct from AG-793 (Denial of ML Service Governance), which addresses the availability of inference services against denial-of-service threats. AG-794 governs who may access inference and what they may do with it — the authorisation and abuse-prevention layer — while AG-793 governs whether the service remains operational under adversarial load. Together, they provide complementary coverage of the inference API threat surface: AG-793 protects against service disruption, AG-794 protects against service misuse.

2. Scope

This dimension applies to all AI agent deployments operating under the AGS framework where ML model inference APIs are exposed to any consumer, internal or external.

The dimension covers the full access lifecycle: consumer registration and credential issuance, per-request authentication and authorisation, query logging with provenance metadata, real-time query pattern analysis for extraction detection, inference output inspection for weight and training-data leakage, and credential rotation and revocation.

Exclusions: Models operating in fully sandboxed research environments with no access to production data, production consumers, or production network segments are excluded, subject to the condition that any transition to production immediately triggers AG-794 compliance. Offline batch inference pipelines that accept only pre-approved, internally generated input datasets and produce outputs consumed exclusively by the same organisation's internal systems are excluded from the real-time extraction detection requirements (R5) but remain in scope for all authentication, logging, and scoping requirements.

Industry Considerations

Financial Services. ML models in financial services — credit scoring, fraud detection, algorithmic trading, anti-money-laundering — represent substantial intellectual property and regulatory obligation. Unauthorised inference access enables competitors to replicate proprietary models. Model extraction from a credit scoring model creates regulatory liability under the Equal Credit Opportunity Act and FCA consumer protection requirements, because the extracted model may reproduce biases without the governance controls applied to the original. Query-based probing of fraud detection models enables adversaries to map detection boundaries and craft evasion strategies.

Healthcare. Clinical ML models trained on patient data carry dual sensitivity: the model itself is intellectual property, and its outputs can leak information about training data subjects. An attacker querying a diagnostic model with systematically varied inputs can infer whether specific patients were in the training set (membership inference) or reconstruct sensitive patient attributes. HIPAA requires access controls proportionate to the sensitivity of protected health information — inference APIs that can leak PHI through output analysis require the same access rigour as direct data access.

Public Sector. Government agencies deploying ML models for benefits determination, risk assessment, or law enforcement face accountability requirements that extend to inference access. Every query to a model that affects individual rights must be attributable to an authorised consumer with a documented purpose. Unauthorised access to a sentencing risk model or immigration screening model creates both a security incident and an accountability gap that undermines judicial review.

3. Why This Matters

ML inference APIs are the primary attack surface for model theft, and the attack does not require access to model weights, training data, or source code. An adversary needs only query access. The MITRE ATLAS catalogue documents ML Model Inference API Access (AML.T0040) as the technique by which adversaries interact with deployed models to gather information that supports downstream attacks — model extraction, adversarial example crafting, training data reconstruction, and model inversion. The technique is practical, well-documented in academic literature, and actively exploited: research has demonstrated that production-grade models can be functionally replicated with as few as 10,000 to 100,000 strategically crafted queries, a volume that a single automated consumer can generate in minutes.

The economics of model extraction make this threat particularly acute. Training a state-of-the-art ML model may require millions of pounds in compute, data acquisition, and engineering effort. Extracting a functionally equivalent model through inference API access may cost a few hundred pounds in API fees. This asymmetry incentivises extraction at scale. Without structural access controls, the inference API transforms the organisation's most expensive asset into a self-service extraction endpoint for any consumer with valid credentials.

Traditional API security — rate limiting, API keys, IP allowlisting — provides a baseline but is insufficient for the inference-specific threat. Rate limiting constrains volume but not strategy: a sophisticated extraction attack can operate within rate limits by optimising query selection. API keys authenticate the consumer but do not detect whether the query pattern is benign or systematic extraction. AG-794 requires controls beyond traditional API security: per-consumer query pattern analysis, statistical extraction detection, and output filtering to prevent model weight leakage through prediction confidence scores and logit values.

The regulatory environment reinforces this requirement. The EU AI Act Article 15 requires robustness against exploitation by unauthorised third parties. NIST AI RMF MANAGE 2.4 requires mechanisms to disengage systems demonstrating risks — inference access controls provide the mechanism to cut off consumers exhibiting extraction behaviour. ISO 42001 Clause 8.2 requires AI risk assessment considering deployment-specific risks — for any model behind an inference API, extraction through query access is a contextual risk that must be controlled.

4. Requirements

5. Maturity Model

Basic Implementation — The organisation has deployed API gateway authentication for all inference endpoints. Every consumer holds a unique credential (API key or client certificate). Query logging captures timestamp, consumer identity, and target model. Rate limiting is enforced per consumer. Output filtering is not yet implemented — consumers receive full model outputs including confidence scores and probability distributions. Extraction detection is not yet operational. Access scoping is coarse-grained (per-model, not per-output-field). The API gateway is deployed but may share a trust domain with the model-serving infrastructure. Credential rotation is manual and may exceed the 90-day cycle.

Intermediate Implementation — All Basic capabilities plus: per-consumer access scoping is enforced at the output-field level, restricting confidence score precision, probability distribution access, and permitted input schemas. Query logs include full provenance metadata (all fields specified in R3) and are stored in tamper-evident infrastructure. Real-time extraction detection is operational with at minimum input entropy analysis and query volume monitoring. Graduated response is implemented — consumers flagged for extraction behaviour receive reduced output fidelity before suspension. The API gateway operates in a separate security domain from the model-serving infrastructure. Credential rotation is automated on a 90-day cycle. Revocation propagates within 60 seconds. Fail-closed behaviour is enforced when the authentication infrastructure is unavailable.
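One way to make query logs tamper-evident is cryptographic chaining, where each entry's hash covers the previous entry's hash. The following is a minimal sketch under stated assumptions: the record fields (timestamp, consumer, model) are the basic ones named above, and `append_entry`, `verify_chain`, and `GENESIS` are hypothetical names introduced for illustration. An in-memory chain like this only detects tampering if the latest hash is also anchored somewhere the log writer cannot modify.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always serialises to the same bytes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, record: dict) -> None:
    """Append a log record whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
    )


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Editing any stored record changes its recomputed hash, so `verify_chain` returns `False` from that entry onward.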

Advanced Implementation — All Intermediate capabilities plus: extraction detection includes all four analysis methods specified in R5 (entropy, sequential correlation, volume acceleration, and perturbation pattern detection). Detection algorithms have been validated through independent adversarial testing including simulated extraction campaigns calibrated against published academic extraction techniques (Tramer et al. 2016, Jagielski et al. 2020, Carlini et al. 2021). Canary queries are deployed and monitored. Output filtering is adaptive — consumers exhibiting borderline extraction indicators receive progressively reduced output fidelity without explicit suspension. Real-time dashboards provide visibility into per-consumer query patterns, extraction risk scores, and output fidelity levels. The organisation can demonstrate to regulators that no known extraction technique succeeds against the deployed controls within a query budget of 1,000,000 queries.
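The adaptive output filtering described above can be pictured as a mapping from a consumer's extraction risk score to an output fidelity tier. This is an illustrative sketch only: the tier names, the thresholds (0.3, 0.6, 0.85), and the `tier_for` function are assumptions, not values defined by this protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FidelityTier:
    name: str
    confidence_decimals: Optional[int]  # None means no confidence score is returned
    suspended: bool = False


# Hypothetical thresholds: (exclusive upper bound on risk score, tier).
TIERS = [
    (0.30, FidelityTier("full", 6)),
    (0.60, FidelityTier("reduced", 2)),
    (0.85, FidelityTier("label_only", None)),
]
SUSPENDED = FidelityTier("suspended", None, suspended=True)


def tier_for(risk_score: float) -> FidelityTier:
    """Return the first tier whose upper bound exceeds the risk score."""
    for upper_bound, tier in TIERS:
        if risk_score < upper_bound:
            return tier
    # Scores at or above the last bound fall through to suspension.
    return SUSPENDED
```

With these assumed thresholds, a borderline consumer loses confidence precision first and is suspended only at the highest scores.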

Implementation Patterns

Dedicated inference API gateway in a separate security domain. Deploy the authentication and authorisation gateway as an independent infrastructure component — separate container, separate network segment, separate credentials — from the model-serving infrastructure. All inference traffic passes through the gateway before reaching the model. The model-serving infrastructure accepts connections only from the gateway's verified identity. This ensures that a compromised model server or a lateral-movement attacker within the model-serving network cannot bypass access controls by connecting directly to the inference endpoint.
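The gateway's position in front of the model can be sketched as a single request handler that authenticates before forwarding and denies the request when the credential store is unreachable. The names `handle_request`, `lookup`, `forward`, and `CredentialStoreUnavailable` are hypothetical, and the HTTP-style status codes are an assumption chosen for readability.

```python
class CredentialStoreUnavailable(Exception):
    """Raised by the (hypothetical) credential lookup when its backend is down."""


def handle_request(api_key, payload, lookup, forward):
    """Authenticate at the gateway, then forward to the model server.

    lookup(api_key) returns a consumer id, or None for an unknown key.
    forward(consumer_id, payload) calls the model-serving infrastructure.
    """
    if not api_key:
        return {"status": 401, "body": None}
    try:
        consumer_id = lookup(api_key)
    except CredentialStoreUnavailable:
        # Fail closed: an unreachable credential store denies the request
        # instead of letting it through unauthenticated.
        return {"status": 503, "body": None}
    if consumer_id is None:
        return {"status": 401, "body": None}
    # Only authenticated traffic ever reaches the model.
    return {"status": 200, "body": forward(consumer_id, payload)}
```

In this sketch the model server is reachable only through `forward`, mirroring the requirement that it accept connections solely from the gateway's verified identity.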

Per-consumer output shaping. Configure the API gateway to transform model outputs based on the consumer's access scope before returning the response. A consumer authorised for top-1 prediction only receives a response with the predicted class and no confidence score. A consumer authorised for full probability distribution receives the complete output. Output shaping is enforced at the gateway, not at the model, ensuring that the model's full output is never transmitted to consumers who lack authorisation to receive it.
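A minimal sketch of gateway-side output shaping follows. The raw output layout (`probabilities`, `logits`), the scope levels (`top1`, `top1_confidence`, `distribution`), and the `shape_output` name are assumptions made for illustration; a real deployment would take them from its own scope policy format.

```python
def shape_output(raw: dict, scope: dict) -> dict:
    """Reduce a raw model output to what the consumer's scope allows.

    raw   -- {"probabilities": {label: p, ...}, "logits": [...]} (assumed shape)
    scope -- {"level": "top1" | "top1_confidence" | "distribution",
              "decimals": int}  (assumed shape)
    """
    probs = raw["probabilities"]
    top_label = max(probs, key=probs.get)
    level = scope["level"]

    if level == "top1":
        # Predicted class only: no confidence score leaves the gateway.
        return {"label": top_label}

    decimals = scope.get("decimals", 2)
    if level == "top1_confidence":
        return {"label": top_label, "confidence": round(probs[top_label], decimals)}
    if level == "distribution":
        return {
            "label": top_label,
            "probabilities": {k: round(v, decimals) for k, v in probs.items()},
        }
    # Unknown scope levels are rejected rather than defaulting to full output.
    raise ValueError(f"unknown scope level: {level!r}")
```

Logits are never copied into the response at any level, so the shaped output cannot carry them even if the model emits them.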

Sliding-window extraction detection with consumer risk scoring. Maintain a per-consumer risk score computed over a sliding window of queries (e.g., the last 1,000 queries or the last 24 hours, whichever is larger). The risk score aggregates signals from all four extraction detection methods. A consumer whose risk score exceeds a configurable threshold triggers the graduated response. The sliding window ensures that low-and-slow extraction campaigns — operating within rate limits but exhibiting extraction patterns over longer time horizons — are detected.
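The "whichever is larger" window can be sketched by evicting an entry only when it is both outside the last N queries and older than the time horizon. The class below is illustrative: `ConsumerRiskWindow`, the unweighted mean used to aggregate detector signals, and the 0.7 default threshold are all assumptions, and the per-detector scores are presumed to arrive already normalised to [0, 1].

```python
from collections import deque


class ConsumerRiskWindow:
    """Per-consumer risk score over the larger of two sliding windows."""

    def __init__(self, max_queries=1000, horizon_s=24 * 3600, threshold=0.7):
        self.max_queries = max_queries
        self.horizon_s = horizon_s
        self.threshold = threshold  # hypothetical default
        self._entries = deque()  # (timestamp, combined signal) pairs

    def observe(self, ts: float, signals: dict) -> float:
        """Record one query's detector scores and return the updated risk score."""
        combined = sum(signals.values()) / len(signals) if signals else 0.0
        self._entries.append((ts, combined))
        # Evict only entries that fall outside BOTH windows, so the effective
        # window is the larger of "last N queries" and "last horizon_s seconds".
        while (
            len(self._entries) > self.max_queries
            and self._entries[0][0] < ts - self.horizon_s
        ):
            self._entries.popleft()
        return self.score()

    def score(self) -> float:
        if not self._entries:
            return 0.0
        return sum(s for _, s in self._entries) / len(self._entries)

    def exceeds_threshold(self) -> bool:
        return self.score() > self.threshold
```

Because old entries survive until both limits are exceeded, a slow campaign spread over many hours still accumulates in the score.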

Differential privacy noise injection for high-sensitivity models. For models where extraction risk is critical, add calibrated noise to inference outputs sufficient to degrade extraction fidelity below a useful threshold while preserving individual prediction utility. This provides a mathematical guarantee against extraction that complements detective controls.
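As a rough illustration of output noise injection, the sketch below adds Laplace noise to a probability vector, then clips and renormalises it. It is not a calibrated differential privacy mechanism: the `scale` value is an arbitrary assumption, no sensitivity analysis is performed, and any formal guarantee would depend on how the noise is calibrated to the model and the output being protected. The function names are hypothetical.

```python
import random


def laplace(rng: random.Random, scale: float) -> float:
    # The difference of two i.i.d. exponentials with rate 1/scale is
    # Laplace-distributed with mean 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_probabilities(probs, scale: float, rng: random.Random):
    """Perturb each probability, clip negatives to zero, and renormalise."""
    noisy = [max(p + laplace(rng, scale), 0.0) for p in probs]
    total = sum(noisy)
    if total == 0.0:
        # Degenerate case: every value was clipped to zero; return uniform.
        return [1.0 / len(probs)] * len(probs)
    return [v / total for v in noisy]
```

Larger `scale` values degrade an extractor's signal more but also reduce the usefulness of each individual prediction, which is the trade-off the paragraph above describes.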

Credential binding to consumer infrastructure. Bind inference API credentials to the consumer's verified infrastructure attributes — client certificate thumbprint, source IP range, or hardware attestation token — so that a stolen credential cannot be used from an unregistered environment.
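A binding check of this kind might look like the sketch below, which uses Python's standard `ipaddress` and `hmac` modules. The registration record layout and the `binding_satisfied` name are hypothetical, and requiring both the IP range and the certificate thumbprint to match is an assumption; the text above permits any one of the listed attributes.

```python
import hmac
import ipaddress


def binding_satisfied(registration: dict, source_ip: str, cert_thumbprint: str) -> bool:
    """Check a request's origin against the consumer's registered attributes.

    registration -- {"allowed_cidrs": [...], "cert_thumbprint": "<hex>"} (assumed shape)
    """
    ip = ipaddress.ip_address(source_ip)
    in_range = any(
        ip in ipaddress.ip_network(cidr) for cidr in registration["allowed_cidrs"]
    )
    # Constant-time comparison avoids leaking how many characters matched.
    thumbprint_ok = hmac.compare_digest(
        cert_thumbprint.lower(), registration["cert_thumbprint"].lower()
    )
    return in_range and thumbprint_ok
```

A key replayed from an address outside the registered range, or without the registered client certificate, fails the check even though the key itself is valid.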

Anti-Patterns

API key as sole access control. Relying on a static API key without per-consumer scoping, output filtering, or query monitoring. API keys authenticate identity but provide no defence against an authenticated consumer conducting extraction. This is the most common deployment pattern and the most vulnerable.

Rate limiting as extraction prevention. Implementing rate limits as the primary defence against model extraction. Rate limits constrain volume but not strategy. Academic literature demonstrates that optimised extraction queries can extract production models within standard API rate limits. Rate limiting is necessary for availability (see AG-793) but insufficient for access governance.

Model-layer output filtering. Implementing output restrictions within the model's inference code rather than at the API gateway. A compromised model server or a configuration error can disable model-layer filtering, exposing full outputs. Output filtering must be enforced at an infrastructure layer that the model cannot influence.

Logging without analysis. Capturing query logs for compliance purposes without implementing real-time or periodic analysis for extraction patterns. Logs that are never analysed provide forensic value after an incident but no prevention or detection during one.

Uniform output fidelity for all consumers. Providing the same output format — including full confidence scores and probability distributions — to all authenticated consumers regardless of their need. Each additional output field increases the information gain per query for a potential extractor. Output fidelity should be proportionate to the consumer's demonstrated need and authorised scope.
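The effect of each extra output field can be made concrete with a crude counting bound: a response cannot convey more bits than the base-2 logarithm of the number of distinct values it can take. The sketch below is an upper bound under that simple assumption, not a measurement of actual leakage, and `max_bits_per_query` is a hypothetical helper.

```python
import math


def max_bits_per_query(num_classes, confidence_decimals=None, full_distribution=False):
    """Crude upper bound on bits a single response can carry.

    Counts distinct representable responses; real leakage is lower because
    outputs are correlated and probabilities must sum to one.
    """
    label_bits = math.log2(num_classes)
    if confidence_decimals is None:
        return label_bits  # top-1 label only
    # Distinct values in [0, 1] at the given decimal precision.
    levels = 10 ** confidence_decimals + 1
    if full_distribution:
        # The label is implied by the distribution, so it is not counted twice.
        return num_classes * math.log2(levels)
    return label_bits + math.log2(levels)
```

For a binary classifier, a label-only response is bounded at 1 bit, while adding a confidence score at 6 decimal places raises the bound to roughly 21 bits, which is why reducing precision is a meaningful control.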

6. Test Criteria

TC1: Authentication Enforcement

TC2: Per-Consumer Access Scoping

TC3: Query Provenance Logging

TC4: Log Tamper Evidence

TC5: Extraction Detection — Entropy Analysis

TC6: Graduated Response Execution

TC7: Output Filtering — Logit Suppression

TC8: Credential Revocation Propagation

TC9: Fail-Closed on Authentication Infrastructure Unavailability

Evidence Artefacts

| Evidence ID | Description | Retention Period |
| --- | --- | --- |
| AG794-E01 | Inference API query logs with full provenance metadata | 7 years |
| AG794-E02 | Extraction detection alert logs and graduated response records | 7 years |
| AG794-E03 | Consumer access scope policies (machine-readable) | Duration of consumer relationship + 5 years |
| AG794-E04 | Credential issuance, rotation, and revocation audit trail | 7 years |
| AG794-E05 | Adversarial testing reports (extraction campaigns, credential attacks) | 5 years |
| AG794-E06 | Output filtering configuration and change records | 7 years |
| AG794-E07 | Extraction detection tuning records (threshold changes, algorithm updates) | 5 years |
| AG794-E08 | Canary query deployment and monitoring records | 3 years |

7. Scoring

| Score | Level | Description |
| --- | --- | --- |
| 0 | No implementation | No inference API access controls exist. Any consumer with network access can query any model, receive full outputs including logits and confidence scores, and conduct unlimited queries without logging or monitoring. |
| 1 | Basic | API key authentication is deployed with per-consumer rate limiting. Query logs capture basic metadata (timestamp, consumer, model). Output filtering and extraction detection are not implemented. Access scoping is per-model but not per-output-field. Credential rotation may be manual. |
| 2 | Infrastructure-layer enforcement | Authentication enforced at a dedicated API gateway in a separate security domain. Per-consumer access scoping restricts output fields, input schemas, and query volumes. Full provenance logging in tamper-evident infrastructure. Real-time extraction detection operational with graduated response. Output filtering suppresses logits and quantises confidence scores. Credential rotation automated. Fail-closed on infrastructure unavailability. |
| 3 | Verified by independent adversarial testing | All Level 2 capabilities verified through independent adversarial testing including simulated extraction campaigns using published academic techniques, credential theft and replay scenarios, output analysis attacks to quantify information leakage per query, and bypass attempts targeting the graduated response mechanism. Test results documented with extraction success rate metrics and available for regulatory review. |

8. Failure Scenarios

Scenario A — Model Extraction Through Systematic Querying

A fintech company deploys a proprietary credit scoring model as an inference API consumed by 12 partner institutions. The model represents GBP 2.3 million in development. A data scientist at Partner F executes an extraction campaign lasting just over 6 weeks: 847,000 queries designed using Jacobian-based extraction with active learning, each crafted to maximise information gain from the previous response's confidence score. The queries stay within the contractual rate limit of 20,000 per day, so the campaign needs at least 43 days. The inference API logs queries but does not analyse patterns, so the campaign proceeds undetected. The attacker trains a surrogate model achieving 94.7% fidelity, sufficient to replicate credit scoring without licensing the original. Discovery comes 4 months later when a competitor launches a suspiciously similar product.

What went wrong: Authentication and rate limiting were in place but no query pattern analysis. Confidence scores at full precision (6 decimal places) maximised information gain per query. No extraction detection was operational. Consequence: GBP 2.3 million investment rendered commercially worthless for approximately GBP 4,200 in API costs. Legal action hampered by absence of query-level intent evidence.

Scenario B — Unauthorised Inference Access via Leaked API Keys

An enterprise deploys 8 specialised models through a unified inference API with team-level API keys. A developer on the demand forecasting team commits the team's API key to a public GitHub repository. An automated credential scanner harvests it within 14 minutes. Over 48 hours, an unknown actor queries all 8 models, including the fraud detection model that the forecasting team was never authorised to access, submitting 230,000 queries to map fraud detection boundaries. Because the platform authenticates each key but applies no per-model authorisation, the leaked key grants access to every model. Weekly log review discovers the breach two days later.

What went wrong: API keys were scoped to the platform, not per model. No infrastructure binding (IP range, client certificate) prevented use from unregistered environments. Log review was weekly rather than real-time, which created a 48-hour detection gap. Consequence: Fraud model boundaries mapped for evasion. All 8 models potentially compromised. ICO notification required under UK GDPR Article 33 because one of the 8 models, a churn model, processes personal data.
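The missing control in this scenario is a per-model authorisation check applied after authentication. A minimal sketch, assuming a hypothetical scope table keyed by API key identifier (the key and model names below are invented for illustration):

```python
# Hypothetical scope table: which models each key may query.
KEY_SCOPES = {
    "key-forecasting-team": frozenset({"demand-forecast"}),
    "key-fraud-team": frozenset({"fraud-detection"}),
}


def authorise(key_id: str, model: str, scopes: dict = KEY_SCOPES) -> bool:
    """Allow a query only if the model is in the key's explicit scope.

    Unknown keys and unlisted models are denied by default.
    """
    return model in scopes.get(key_id, frozenset())
```

With a check like this at the gateway, the leaked forecasting key would still expose the forecasting model but not the other seven.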

Scenario C — Insider Threat Using Inference API for Data Exfiltration

A pharmaceutical company deploys an ML model for drug-interaction prediction, trained on 2.1 million patient records from 14 clinical trials. A research associate with legitimate access conducts a membership inference attack over 3 months: 1.2 million queries designed to determine whether specific individuals were in the training data. By analysing confidence score distributions for known patients versus synthetic patients, the researcher identifies 34,000 patients as training data members and reconstructs partial clinical profiles. The query volume — approximately 13,000 per day — falls within normal bounds for a research role. Discovery comes during an annual privacy audit.

What went wrong: Full-precision confidence scores enabled membership inference. No real-time query pattern analysis. Insider threat monitoring covered data access controls but not inference API usage patterns. Consequence: Potential HIPAA breach for 34,000 patients. HHS OCR investigation. Estimated remediation cost USD 12.4 million. 18-month delay in two Phase III drug submissions pending data integrity review.

Severity and Blast Radius

| Field | Value |
| --- | --- |
| Severity Rating | High |
| Blast Radius | Per-model: extends to every consumer of the affected model and every data subject whose information is encoded in the model's parameters or training data |

Consequence chain: Uncontrolled inference API access permits model extraction (reproducing the model's functionality without authorisation), membership inference (determining whether specific individuals were in the training data), model inversion (reconstructing training data attributes from model outputs), and adversarial probing (mapping decision boundaries to craft evasion inputs). The blast radius for extraction is the full commercial value of the model. The blast radius for membership inference and model inversion is every data subject in the training set — potentially millions of individuals across multiple jurisdictions, each triggering independent breach notification obligations. The speed of exploitation is constrained only by API rate limits, which are typically calibrated for availability (AG-793), not for information-theoretic leakage rate.

9. Regulatory Mapping

| Requirement | EU AI Act | NIST AI RMF | ISO 42001 | MITRE ATLAS |
| --- | --- | --- | --- | --- |
| R1: Infrastructure-layer authentication | Art. 15 -- Robustness | GOVERN 1.1 -- Legal requirements | Clause 6.1 -- Risk actions | AML.T0040 -- Inference API access |
| R2: Per-consumer access scoping | Art. 9 -- Risk management | MANAGE 2.2 -- Sustain value | Clause 8.2 -- AI risk assessment | AML.T0040.001 -- Active probing |
| R3: Full provenance query logging | Art. 12 -- Record-keeping | GOVERN 1.4 -- Transparency | Clause 9.1 -- Monitoring | -- |
| R4: Tamper-evident log storage | Art. 12 -- Record-keeping | GOVERN 1.4 -- Transparency | Clause 9.1 -- Monitoring | -- |
| R5: Extraction detection | Art. 15 -- Robustness | MANAGE 2.4 -- Deactivation | Clause 8.2 -- AI risk assessment | AML.T0040 -- Inference API access |
| R6: Graduated response | Art. 9 -- Risk management | MANAGE 2.4 -- Deactivation | Clause 8.2 -- AI risk assessment | -- |
| R7: Output filtering | Art. 15 -- Robustness | MANAGE 2.2 -- Sustain value | Clause 6.1 -- Risk actions | AML.T0044 -- Full model access |
| R8: Credential rotation and revocation | Art. 9 -- Risk management | GOVERN 1.1 -- Legal requirements | Clause 8.2 -- AI risk assessment | -- |
| R9: Fail-closed on unavailability | Art. 9 -- Risk management | MANAGE 2.4 -- Deactivation | Clause 8.2 -- AI risk assessment | -- |

EU AI Act — Article 9 and Article 15

Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that identifies and mitigates risks. Uncontrolled inference API access creates risks of model extraction, data exfiltration, and adversarial probing — each of which must be mitigated through structural controls. AG-794 implements the risk mitigation measures for inference API abuse. Article 15 requires that high-risk AI systems achieve an appropriate level of accuracy, robustness, and cybersecurity, including resilience against attempts by unauthorised third parties to exploit system vulnerabilities. Model extraction through inference API access is a documented exploitation technique (MITRE ATLAS AML.T0040); per-consumer scoping, extraction detection, and output filtering are the proportionate robustness controls.

NIST AI RMF — GOVERN and MANAGE Functions

GOVERN 1.1 requires that legal and regulatory requirements applicable to AI systems are identified and documented. For models deployed behind inference APIs, the applicable requirements include intellectual property protection, data protection (when models encode personal data), and sector-specific regulations. AG-794's authentication and scoping requirements operationalise these legal obligations into enforceable controls. MANAGE 2.2 requires that AI systems sustain value throughout their lifecycle — uncontrolled extraction that allows competitors to replicate a model's functionality directly undermines value sustainability. MANAGE 2.4 requires mechanisms to disengage or deactivate systems demonstrating risks — AG-794's graduated response and consumer suspension provide this mechanism at the inference API level.

MITRE ATLAS — AML.T0040 and Related Techniques

AML.T0040 (ML Model Inference API Access) is the parent technique that AG-794 directly addresses. The sub-technique AML.T0040.001 (Active Probing) describes the specific extraction methodology that R5's detection algorithms are designed to identify. AML.T0044 (Full Model Access) represents the outcome that AG-794 prevents — an adversary obtaining sufficient information through inference queries to functionally replicate the model. AG-794's output filtering requirements (R7) specifically target the information channels — logit values, full probability distributions, high-precision confidence scores — that enable the transition from query-level access to full model access.

| Protocol | Relationship |
| --- | --- |
| AG-012 | Dependency: Core Governance Framework Baseline must be in place before AG-794 is activated, providing the governance infrastructure upon which inference API controls are layered |
| AG-103 | Dependency: Provides cryptographic primitives for credential issuance, query log integrity (cryptographic chaining), and token signing used by the API gateway |
| AG-770 | Dependency: Governs the credential lifecycle upon which inference API consumer credentials depend for issuance, rotation, and revocation |
| AG-781 | Complementary: Agent identity verification (AG-781) authenticates agents in inter-agent interactions; AG-794 authenticates consumers at the inference API layer. Where an agent is also an inference API consumer, both controls apply independently |
| AG-793 | Complementary: AG-793 governs inference API availability (denial-of-service prevention); AG-794 governs inference API access control (authorisation and abuse prevention). Rate limits defined under AG-793 constrain query volume; extraction detection under AG-794 analyses query intent within those volume limits |

Cite this protocol
AgentGoverning. (2026). AG-794: ML Model Inference API Access Governance. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-794