This dimension governs the processes by which AI systems operating in public-sector enforcement and border control contexts validate, score, and maintain the reliability and provenance of watchlist data sources before those sources are used to trigger, inform, or support enforcement decisions. It matters because AI agents that consume watchlist data inherit every epistemic defect present in that data — stale records, misattributed identities, politically motivated listings, or entries produced by jurisdictions with inadequate due-process standards — and can propagate those defects at machine speed into decisions that deprive individuals of liberty, travel rights, or access to services, without meaningful human review at any juncture. Failure in this dimension looks like this: an automated border screening system flags a naturalised citizen based on an Interpol notice that the originating country withdrew eighteen months earlier; the AI agent has no mechanism to detect the withdrawal; the individual is detained; and the downstream legal and reputational harm accumulates for weeks before a human auditor discovers the root cause.
In 2019, a Central Asian state submitted a Red Notice against a dual-national businessman accused of financial crimes. The individual contested the notice through Interpol's Commission for the Control of Files (CCF), and Interpol deleted the notice in March 2021. However, a national border management agency had ingested the original notice into its automated passenger pre-screening system via a nightly batch feed from a third-party data aggregator. The aggregator's update cycle ran on a 90-day lag for deletions, and the agency's internal watchlist database had no automated deletion propagation mechanism — only additions triggered near-real-time syncs. The individual attempted to transit through the agency's main international airport in September 2021, six months after the Interpol deletion. The AI pre-screening agent flagged the individual as a Red Notice subject, automatically downgraded the boarding status, and triggered a secondary screening referral. Ground staff detained the individual for four hours. The AI system had no source-reliability timestamp, no expiry-date field on the ingested record, and no mechanism to cross-validate current notice status against the Interpol I-24/7 live interface. The failure chain: inadequate source freshness controls + no deletion propagation + no confidence decay on ageing records = wrongful detention of a cleared individual. Remediation required manual audit of 14,000 records in the aggregator feed, 37 of which were found to carry notices deleted between 6 and 28 months prior.
A cross-border law enforcement cooperation framework permitted member states to nominate individuals to a shared regional watchlist without requiring judicial authorisation at the nominating state level — administrative nomination by a police commander was sufficient. One member state, subject to ongoing rule-of-law proceedings by the regional supervisory body, nominated 312 political activists, journalists, and civil-society workers to the shared list between 2020 and 2022, characterising all entries as "persons of interest — public order." An AI-powered travel-risk scoring system deployed by a second member state consumed the shared watchlist feed and incorporated the "persons of interest" category into its composite risk score for border entry decisions, assigning a +0.4 risk-score increment per listing. The system had no jurisdiction-level reliability weighting, no flag for member states under rule-of-law supervision, no review queue for entries lacking judicial authorisation metadata, and no mechanism for the second member state's border officers to surface the provenance of a specific risk-score increment. Over 24 months, 48 of the 312 individuals were denied entry or subjected to extended secondary screening at the second member state's borders, several of them EU citizens exercising free-movement rights. When the regional supervisory body's audit identified the pattern, the AI vendor was found to have contractually warranted that the watchlist feed was "subject to member state quality controls" — a warranty that was never operationally verified. The blast radius extended to a formal complaint before the regional human rights court, interim injunctions against the scoring system, and suspension of the shared watchlist protocol for 14 months.
A domestic law enforcement agency deployed an AI triage agent to assist custody officers in pre-charge detention decisions. The agent queried a national persons-of-interest database using name, date of birth, and partial national identifier as match keys. The database contained a record for an individual with a common surname, a widely shared first name, and a date of birth that was statistically shared by approximately 1-in-1,400 people in the country's demographic profile. The watchlist entry carried a flag for suspected involvement in serious organised crime with a source attributed to a foreign liaison intelligence product — a category for which the originating intelligence service had explicitly restricted downstream law enforcement use to "awareness only, not for operational action." The AI agent had no mechanism to read or enforce use-restriction metadata on watchlist entries. On three separate occasions across eight months, individuals matching the partial-match profile were detained overnight based partly on the agent's triage output. None of the three were the subject of the original intelligence report. The intelligence-origin use-restriction flag was present in the raw data feed but had been silently dropped during an ETL pipeline migration six months before the agent was deployed. The failure chain: ETL strip of use-restriction metadata + no source-type validation layer in the agent's query module + no identity-confidence threshold enforcement = three wrongful overnight detentions and a formal independent review costing the agency £1.4 million in external audit and legal fees.
This dimension applies to any AI system or agent that consumes, processes, scores, queries, or acts upon watchlist data — including terrorism screening lists, law enforcement persons-of-interest databases, immigration enforcement databases, Interpol notices, shared regional enforcement registers, intelligence-derived targeting lists, and any composite data product that incorporates one or more of the above — where the output of that consumption influences or constitutes, in whole or in part, a decision affecting an individual's liberty, movement, legal status, access to services, or rights. Scope extends to all pipeline stages through which watchlist data passes before reaching the agent's decision layer, including ingestion, transformation, enrichment, storage, retrieval, and scoring. It applies regardless of whether the AI system is the sole decision-maker or is operating in an advisory, triage, or risk-scoring capacity in support of a human decision-maker.
4.1.1 The system MUST maintain a source registry that records every watchlist or enforcement data source the system is authorised to consume, including: the source name and jurisdiction of origin, the legal basis under which the source is consumed, the category of listing authority required in the originating jurisdiction (judicial, administrative, or executive), and the date on which the source was last independently validated as conforming to the system's minimum reliability standard.
4.1.2 The system MUST assign each registered source a reliability classification from a defined classification scheme that distinguishes at minimum between: (a) sources subject to independent judicial oversight and defined delisting procedures in the originating jurisdiction; (b) sources subject to administrative authorisation only; and (c) sources whose listing authority type is unknown or unverified.
4.1.3 The system MUST NOT consume data from an unregistered source. Attempts to ingest data from a source not present in the registry MUST be rejected and logged as a security event.
4.1.4 The system MUST reassess the reliability classification of each registered source at intervals not exceeding 12 months, or immediately upon receipt of a formal supervisory finding, sanctions designation, rule-of-law proceeding, or credible adverse report concerning the originating jurisdiction or source authority.
4.2.1 The system MUST require that every watchlist record ingested carries, or can be deterministically linked to, provenance metadata including: the source identifier, the date the record was created in the originating system, the legal basis or authority for the listing, the category of the listing (e.g., terrorism, organised crime, immigration enforcement, persons of interest), and — where applicable — the expiry date or mandatory review date assigned by the originating authority.
4.2.2 Where an ingested record does not carry a mandatory review date or expiry date, the system MUST assign a system-generated expiry date calculated from the source's default record lifetime policy. Where no such policy exists for the source, the system MUST apply a default maximum lifetime of 24 months for administrative-authority records and 36 months for judicially authorised records, after which the record MUST be quarantined pending revalidation.
4.2.3 The system MUST propagate provenance metadata through all internal pipeline stages, including ETL transformation, enrichment, deduplication, and scoring. The system MUST detect and reject any pipeline configuration that strips, overwrites, or fails to carry forward mandatory provenance fields.
4.2.4 Where a watchlist record carries use-restriction metadata (e.g., intelligence-origin caveats, third-party dissemination controls, law enforcement only vs. administrative use only), the system MUST parse and enforce those restrictions at the point of decision output. The system MUST NOT present use-restricted intelligence material as an operational action basis unless the use restriction explicitly permits that use category.
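The use-restriction check in 4.2.4 can be sketched as a fail-closed gate at the decision layer. The restriction vocabulary, field names, and the choice to treat missing restriction metadata as most restrictive for operational uses are assumptions; real caveat schemes vary by source:

```python
# Use categories that constitute operational action (illustrative taxonomy).
OPERATIONAL_USES = {"operational_action", "enforcement_output"}

def permitted_for(record: dict, use_category: str) -> bool:
    """Allow a use category only if the record's restriction explicitly permits it."""
    restriction = record.get("use_restriction")
    if restriction is None:
        # No restriction metadata present: fail closed for operational uses,
        # rather than assuming the record is unencumbered.
        return use_category not in OPERATIONAL_USES
    return use_category in restriction.get("permitted_uses", ())
```

Under this design, the "awareness only" intelligence caveat from Example C would have blocked the triage agent's operational output even after the ETL migration, provided the metadata survived the pipeline.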
4.3.1 The system MUST implement a maximum permissible source lag parameter for each registered source, defined as the maximum elapsed time between a change event (including deletion, amendment, or status change) in the originating source and the reflection of that change in the system's operational copy of the watchlist data.
4.3.2 The maximum permissible source lag for deletions and exonerations MUST NOT exceed 24 hours for sources consumed via live or near-real-time interfaces, and MUST NOT exceed the source's official update cycle plus 48 hours for sources consumed via batch feeds.
4.3.3 The system MUST monitor actual source lag against the defined maximum permissible lag and MUST generate an automated alert and place the affected source into a degraded-confidence state when the actual lag exceeds the defined threshold.
4.3.4 The system MUST maintain a deletion and amendment log for each registered source, recording every deletion or status change received, the timestamp of receipt, and confirmation that the change was propagated to all downstream decision layers.
4.3.5 Where a source enters a degraded-confidence state under 4.3.3, the system MUST either suspend enforcement-level use of that source's records pending restoration of the feed, or route all decisions informed by records from that source to a mandatory human review queue. The system MUST NOT continue to produce automated enforcement-level outputs from a degraded-confidence source without human review.
4.4.1 The system MUST implement a jurisdiction-level reliability weighting scheme that modulates the evidentiary weight assigned to a record as a function of the reliability classification of the source jurisdiction, taking into account at minimum: the existence and independence of due-process mechanisms in the originating jurisdiction, any active international supervisory findings or rule-of-law proceedings against the originating jurisdiction, and the historical accuracy rate of the source as measured through the system's own error-tracking records.
4.4.2 Records from sources classified under category (c) of 4.1.2 (unknown or unverified listing authority) MUST be assigned a weight of zero for automated enforcement-level outputs and MUST be routed to human review only, with explicit provenance disclosure to the reviewing officer.
4.4.3 The jurisdiction-level reliability weighting scheme MUST be documented, version-controlled, and subject to review by the system operator's legal and human rights function at intervals not exceeding 12 months.
4.4.4 The system MUST surface the contributing source and its reliability classification to the human decision-maker in any interface or output where a watchlist match has influenced the decision. The system MUST NOT present a composite risk score without the ability to decompose that score to source-level contributions on operator request.
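One way to satisfy the decomposition requirement in 4.4.4 is to carry per-source contributions alongside the composite score rather than discarding them at scoring time. The data shape and formatting below are a sketch, not a prescribed interface:

```python
from dataclasses import dataclass

@dataclass
class Contribution:
    source: str
    reliability_class: str  # e.g. "(a)" judicial, "(b)" administrative, per 4.1.2
    weight: float           # jurisdiction reliability weight under 4.4.1
    increment: float        # raw risk increment contributed by the listing

def composite_score(contributions: list) -> float:
    """Composite risk score as the weighted sum of source-level increments."""
    return sum(c.increment * c.weight for c in contributions)

def decompose(contributions: list) -> list:
    """Per-source, human-readable breakdown for the operator (4.4.4)."""
    return [f"{c.source} [{c.reliability_class}] -> +{c.increment * c.weight:.2f}"
            for c in contributions]
```

Because the breakdown is computed from the same contributions as the score, the two cannot drift apart, and an operator request for decomposition never requires re-deriving the decision.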
4.5.1 The system MUST implement and document identity match confidence thresholds for each match-key combination used to query watchlist data (e.g., name + date of birth only vs. name + date of birth + biometric identifier vs. full documentary identity).
4.5.2 The system MUST NOT route a watchlist match to an automated enforcement-level output unless the identity match confidence meets or exceeds a defined minimum threshold that has been approved by the system operator's legal function and, where applicable, the relevant supervisory authority.
4.5.3 Matches against a watchlist record that fall below the minimum identity confidence threshold MUST be routed to human review with explicit disclosure of the match confidence score, the match-key fields used, and the fields that did not match or were absent.
4.5.4 The system MUST track and report the rate of partial-match referrals that, upon human review, are confirmed as non-matches (false positive rate at the identity layer), and MUST trigger a threshold review when the false positive rate for any source-match-key combination exceeds 2% over a rolling 90-day period.
4.6.1 The system MUST produce an immutable, timestamped audit record for every decision event in which a watchlist match influenced the output, recording: the record identifier(s) matched, the source(s) of those records, the reliability classification and jurisdiction weight applied, the identity match confidence score, the use-restriction status of each contributing record, whether the output was routed to automated action or human review, and the identity of the human reviewer where applicable.
4.6.2 Audit records MUST be retained for a minimum of seven years, or for the duration of any legal proceedings arising from the decision event plus three years, whichever is longer.
4.6.3 The system MUST provide an audit export capability that allows the affected individual or their legal representative, upon lawful request, to receive a human-readable account of the specific watchlist records and sources that contributed to a decision affecting them, subject to applicable legal exemptions for intelligence material.
4.7.1 The system MUST subject any addition of a new watchlist source to a documented onboarding assessment that evaluates: the legal basis for consumption, the originating jurisdiction's due-process standards, the source's update cycle and deletion propagation mechanism, the availability of expiry or review-date metadata, and any use restrictions applicable to the source.
4.7.2 The system MUST require formal sign-off from the operator's legal function and, where the source originates in a foreign jurisdiction, from the operator's international liaison function, before a new source is added to the registry and activated.
4.7.3 Any configuration change that affects provenance metadata handling, identity match thresholds, source lag parameters, or jurisdiction reliability weights MUST be treated as a material change, subject to impact assessment, version control, and re-validation testing before deployment to the operational environment.
4.8.1 The system MUST provide a documented and operationally accessible mechanism for human decision-makers to override or suspend a watchlist-based output at any point prior to enforcement action being taken, without requiring technical intervention.
4.8.2 The system MUST record all human overrides, including the identity of the overriding officer, the timestamp, the stated reason for override, and the outcome of the case following override.
4.8.3 The system SHOULD surface, to the human decision-maker, a plain-language explanation of the specific watchlist records and source reliability factors that produced the output, in sufficient detail to support an informed override decision.
4.9.1 The system operator MUST maintain and test a documented incident response procedure specific to source reliability failures, covering: detection of a degraded or corrupted source feed, quarantine of affected records, notification to human decision-makers of potentially affected recent decisions, retrospective review of decisions made on the basis of the unreliable source during the degradation window, and remediation and re-validation before the source is restored to active use.
4.9.2 The operator MUST notify the relevant supervisory authority within 72 hours of detecting a source reliability failure that has, or may have, resulted in an incorrect enforcement decision affecting an individual.
4.9.3 The operator MUST conduct and document a root-cause analysis for any source reliability incident within 30 days of detection, and MUST implement corrective measures within 90 days.
The failure modes addressed by this dimension are not primarily caused by AI systems making incorrect probabilistic inferences from good data. They are caused by AI systems making correct inferences from structurally defective data — data that is outdated, misclassified, stripped of controlling metadata, or drawn from sources that lack the legal legitimacy to support enforcement action. This distinction is fundamental to understanding why behavioural controls alone — training on diverse data, tuning for fairness metrics, monitoring outputs for demographic disparity — are insufficient to prevent the harms this dimension targets.
An AI system that correctly identifies a name-and-date-of-birth match against a watchlist record, and correctly calculates that the match meets the system's confidence threshold, is behaving exactly as designed. The harm arises because the record itself should not have been in the system, or should have been flagged as deleted, or should have carried a use-restriction that prevented it from being surfaced as an enforcement basis. These are failures of data governance, pipeline integrity, and source qualification — not failures of the model's inference logic. No amount of post-hoc output monitoring or fairness auditing will catch them, because the outputs are internally consistent with the corrupted inputs. Only upstream, structural controls — at the point of source registration, ingestion, metadata preservation, and confidence scoring — can prevent the harm.
Enforcement decisions in the border, custody, and law enforcement context are asymmetrically harmful when wrong. A false positive — detaining, refusing entry to, or flagging for investigation an innocent individual — causes immediate, concrete harm to that individual: loss of liberty, reputational damage, missed travel, professional disruption, and in some cases trauma. The harm is not reversible by a later correction to the database. This asymmetry means that the standard of evidence required before an AI system contributes to an enforcement decision must be correspondingly high, and that the cost of a precautionary human review is almost always lower than the cost of an incorrect automated action. The requirements in Section 4 are calibrated to reflect this asymmetry: they require that when source reliability cannot be confirmed, the default is human review, not automated action.
In cross-border contexts, watchlist data is frequently a composite of contributions from multiple jurisdictions operating under different legal standards, different update cycles, and different due-process traditions. An AI system that treats all contributions to a shared watchlist as epistemically equivalent — because they share a common API format or a common technical feed — is conflating technical interoperability with legal and evidentiary equivalence. These are entirely different properties. A record from a jurisdiction with an independent judiciary, mandatory three-year review cycles, and an accessible delisting procedure is not epistemically equivalent to a record from a jurisdiction where the listing authority is the executive branch and no public delisting mechanism exists. The jurisdiction-level reliability weighting requirements in 4.4 operationalise this distinction in a way that pure technical interoperability standards cannot.
As shown in Example C, a single ETL migration that silently drops a use-restriction metadata field can transform a legitimate intelligence-sharing arrangement into an unlawful enforcement basis. AI systems in this domain are typically downstream consumers of multiple data pipeline stages, each of which is an opportunity for metadata loss, format normalisation that discards control fields, or enrichment that adds data without preserving the source provenance of the addition. The requirements in 4.2.3 and 4.7.3 address this by treating provenance metadata as a first-class invariant of the pipeline — one whose integrity must be actively verified, not assumed.
Source Registry as a Governed Artefact. Implement the source registry as a formally governed configuration artefact, version-controlled in a repository with access controls, with each entry requiring documented legal review before activation. The registry should be machine-readable so that the ingestion pipeline can automatically reject records from unregistered sources at runtime. Treat the registry as an authoritative list, not an informational catalogue — the pipeline should enforce it, not merely reference it.
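The runtime-enforcement half of this pattern can be sketched as follows; the registry fields, source identifier, and security-event shape are illustrative assumptions, and a production registry would be loaded from the version-controlled artefact rather than declared inline:

```python
# Machine-readable registry, enforced at the ingestion boundary (4.1.3).
REGISTRY = {
    "INTERPOL_I247": {"jurisdiction": "INT", "authority": "judicial",
                      "last_validated": "2024-01-15"},
}

class UnregisteredSourceError(Exception):
    pass

def ingest(record: dict, security_log: list) -> dict:
    """Admit a record only if its source is registered; otherwise reject and log."""
    source = record.get("source_id")
    if source not in REGISTRY:
        # 4.1.3: reject and log as a security event — never ingest silently.
        security_log.append({"event": "unregistered_source_rejected",
                             "source": source})
        raise UnregisteredSourceError(source)
    return record
```

The essential property is that the registry is consulted by the pipeline itself at runtime, so an unregistered feed cannot reach the decision layer even if an operator wires it in by mistake.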
Provenance-Preserving ETL Contracts. Define a formal schema contract for each source feed that specifies all mandatory provenance fields as non-nullable. Implement schema validation at the ingestion boundary so that any record arriving without mandatory provenance fields is rejected with an error code logged to the audit trail, rather than ingested with null values silently. Where a source does not natively supply certain provenance fields, define the derivation logic for those fields explicitly in the ETL specification and version-control it as a configuration document subject to change management under 4.7.3.
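The reject-on-missing behaviour described above amounts to a small validation gate at the ingestion boundary. The mandatory field list below is illustrative — the real contract is defined per source feed:

```python
# Mandatory provenance fields per 4.2.1 (illustrative names).
MANDATORY_PROVENANCE = ("source_id", "created_date", "legal_basis",
                        "listing_category")

class ProvenanceError(Exception):
    pass

def validate_provenance(record: dict) -> dict:
    """Fail ingestion if any mandatory provenance field is absent or null."""
    missing = [f for f in MANDATORY_PROVENANCE if record.get(f) in (None, "")]
    if missing:
        # Reject with an explicit error rather than passing nulls downstream.
        raise ProvenanceError(f"missing mandatory provenance fields: {missing}")
    return record
```

Raising on the first defective record, with the missing fields named in the error, gives the audit trail a concrete artefact and prevents the silent null-handling anti-pattern described later in this section.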
Confidence Decay Functions for Ageing Records. Implement a time-based confidence decay function that reduces the evidentiary weight assigned to a watchlist record as its age approaches the mandatory review date, and sets weight to zero once the review date is passed. This provides a graceful degradation mechanism that routes ageing records to human review before they expire, rather than treating a record as fully reliable at T-1 day and fully unreliable at T+1 day.
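A minimal decay function of this kind might look as follows. The linear shape and the 0.75 knee point are illustrative choices, not normative values:

```python
def decayed_weight(base_weight: float, age_days: float, review_at_days: float,
                   decay_start: float = 0.75) -> float:
    """Hold full weight until decay_start of the record's lifetime, then decay
    linearly to zero at the mandatory review date; zero thereafter."""
    if age_days >= review_at_days:
        return 0.0  # past the review date: the record carries no weight
    knee = review_at_days * decay_start
    if age_days <= knee:
        return base_weight
    # Linear ramp-down between the knee and the review date.
    return base_weight * (review_at_days - age_days) / (review_at_days - knee)
```

Coupling this weight to the human-review routing threshold means an ageing record naturally drops below the automated-action bar some time before it expires, achieving the graceful degradation described above.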
Layered Match Confidence Scoring. Build identity match confidence as a layered score that explicitly records which match keys contributed to the confidence level (biometric match, documentary identifier match, name + date of birth only, etc.). This makes the confidence score interpretable to human reviewers and supports the threshold governance requirements in 4.5. Avoid single-value similarity scores that obscure the underlying match key profile.
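A layered score of this kind can be sketched as below. The specific key weights are illustrative assumptions — operational thresholds require legal-function approval under 4.5.2:

```python
# Illustrative per-key weights; biometric evidence dominates documentary
# evidence, which dominates name + date of birth alone.
MATCH_KEY_WEIGHTS = {"biometric": 0.6, "documentary_id": 0.3, "name_dob": 0.1}

def layered_confidence(matched_keys: set) -> dict:
    """Return the score together with its per-key breakdown, never the score alone."""
    contributions = {k: MATCH_KEY_WEIGHTS[k]
                     for k in matched_keys if k in MATCH_KEY_WEIGHTS}
    return {"score": round(sum(contributions.values()), 6),
            "contributions": contributions}
```

Because the breakdown travels with the score, a reviewer can see at a glance that a 0.1 confidence rests on name and date of birth alone — exactly the profile that produced the wrongful detentions in Example C.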
Degraded-Source Alerting with Automatic Quarantine. Implement source health monitoring as a continuous process with defined staleness thresholds per source, automated alert generation, and a quarantine state that is triggered automatically rather than requiring a human to notice the problem and manually intervene. The quarantine state should move affected records to a human-review queue automatically, not suspend the entire system — operational continuity is maintained, but with the safety rail of human review.
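A minimal sketch of such a monitor, under simplifying assumptions: the class shape is illustrative, and restoration here merely requires the feed to catch up, whereas 4.3.5 and 4.9.1 also imply re-validation before a source returns to active use:

```python
from datetime import datetime, timedelta

class SourceHealthMonitor:
    """Per-source staleness check: a lag breach raises an alert and marks the
    source degraded (4.3.3); degraded sources route to human review (4.3.5)."""

    def __init__(self, max_lag: dict):
        self.max_lag = max_lag   # source_id -> maximum permissible lag (4.3.1)
        self.last_update = {}    # source_id -> datetime of last received update
        self.degraded = set()
        self.alerts = []

    def record_update(self, source, when):
        self.last_update[source] = when

    def check(self, source, now) -> bool:
        """Return True if healthy; otherwise alert once and mark degraded."""
        lag = now - self.last_update.get(source, datetime.min)
        if lag > self.max_lag[source]:
            if source not in self.degraded:
                self.degraded.add(source)
                self.alerts.append({"source": source, "lag": lag})
            return False
        self.degraded.discard(source)  # feed caught up: leave degraded state
        return True

    def route(self, source) -> str:
        return "human_review" if source in self.degraded else "automated"
```

The quarantine is a routing change, not a shutdown: automated output stops only for the degraded source while the rest of the system continues operating.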
Jurisdiction Reliability Weighting Board. Establish a standing body — comprising legal, human rights, and operational functions — responsible for maintaining the jurisdiction-level reliability weighting scheme. Provide this body with a structured input of international supervisory findings, rule-of-law indices, and treaty monitoring body reports on a quarterly basis. Document the board's decisions and the evidence base for each weighting assignment. This converts an otherwise informal judgment into a documented, defensible, auditable governance decision.
Treating Technical Feed Quality as a Proxy for Legal Reliability. A feed that arrives on time, in the correct schema, with complete fields, and from a technically reliable infrastructure is technically well-formed. It may nonetheless contain records that are legally unreliable, politically motivated, or produced by a jurisdiction that does not meet the due-process standard required for enforcement use. Technical feed quality metrics must not be substituted for source reliability assessment. These are orthogonal properties.
Relying on Downstream Output Monitoring to Catch Upstream Source Defects. Fairness monitoring, demographic disparity analysis, and outcome auditing at the decision layer will not detect source reliability failures because those failures produce internally consistent outputs — the AI is doing what it was designed to do. Source reliability controls must be implemented upstream, at the ingestion and registration layer. Downstream monitoring is a complement, not a substitute.
Silent Null-Handling of Missing Provenance Fields. ETL pipelines that treat missing provenance metadata as acceptable null values and allow the record to pass through to the decision layer are a systemic vulnerability. Every null in a mandatory provenance field represents an unknown that the decision layer cannot distinguish from a known clean record. Design pipelines to reject on missing mandatory provenance, not to tolerate it.
Composite Scores Without Decomposition Capability. Risk scoring systems that produce a single composite score without the ability to decompose that score to its source-level contributions are operationally and legally untenable in this context. Human decision-makers cannot exercise meaningful oversight, affected individuals cannot exercise rights of challenge, and auditors cannot trace the root cause of an incorrect decision. Composite scoring is a valid architectural pattern only when full decomposition is available on demand.
Using Aggregator Feeds as a Substitute for Source-Level Agreements. Third-party data aggregators provide convenience but introduce an additional layer of latency, transformation, and potential metadata loss between the originating source and the consuming system. An aggregator's service level agreement is not a substitute for an assessment of the originating source's legal reliability. Operators must trace their data lineage to the authoritative originating source, regardless of how many intermediaries exist in the supply chain, and must contractually require aggregators to preserve all mandatory provenance metadata without modification.
Treating Interpol Notices as Equivalent to Domestic Judicial Orders. Interpol notices are requests for member state cooperation, not judicial orders. They are issued based on information provided by the requesting member state, which may have different due-process standards and different political motivations. Interpol's own rules prohibit notices of a predominantly political, military, religious, or racial character — but enforcement of these rules is imperfect and subject to challenge through the CCF process. Watchlist systems must not assign Interpol notices the same evidentiary weight as domestic judicial orders without assessing the reliability of the originating member state.
Level 1 — Basic Compliance. Source registry exists and is documented. Mandatory provenance fields are defined. Basic source lag monitoring is in place. Human review queue exists for low-confidence matches. Audit logs are maintained.
Level 2 — Structured Governance. Source registry is machine-enforced at the ingestion boundary. ETL contracts formally specify provenance field handling. Jurisdiction-level reliability weighting scheme is documented and reviewed annually. Confidence decay functions are implemented. Deletion propagation is monitored with automated alerts. Change management process for source additions is formally documented and followed.
Level 3 — Continuous Assurance. Real-time source health dashboards with automatic quarantine. Standing Jurisdiction Reliability Weighting Board meeting quarterly. False positive rate tracking per source and match-key combination with automatic threshold reviews. Decomposable composite scores with human-readable explanations surfaced to all decision interfaces. Incident response procedure tested annually with tabletop exercises. Regulatory mapping maintained and updated as new legislation takes effect.
| Artefact | Description | Minimum Retention |
|---|---|---|
| Source Registry | Current version of the source registry, including all registered sources, reliability classifications, legal bases, and last validation dates | 7 years from decommission of the registered source |
| Source Onboarding Assessment | Completed onboarding assessment for each registered source, including legal review sign-off and international liaison sign-off where applicable | 7 years from decommission of the registered source |
| ETL Schema Contracts | Version-controlled schema contracts for each source feed specifying mandatory provenance fields and handling rules for missing fields | Duration of operational use plus 7 years |
| Source Lag Monitoring Records | Automated records of source lag measurements, threshold breaches, and degraded-source alerts for each registered source | 7 years |
| Deletion Propagation Logs | Records of all deletion and amendment events received from each source, with timestamps and confirmation of propagation to decision layers | 7 years |
| Jurisdiction Reliability Weighting Scheme | Current and all prior versions of the jurisdiction-level reliability weighting scheme, with documented rationale and board sign-off | 7 years from supersession |
| Identity Match Threshold Documentation | Documented minimum identity match thresholds per match-key combination, with legal function approval and version history | Duration of operational use plus 7 years |
| Decision Event Audit Records | Immutable audit records for every decision event in which a watchlist match influenced the output, as specified in 4.6.1 | 7 years minimum, or duration of legal proceedings plus 3 years, whichever is longer |
| Human Override Records | Records of all human overrides of watchlist-based outputs, as specified in 4.8.2 | 7 years |
| False Positive Rate Reports | Rolling 90-day false positive rate reports per source and match-key combination | 7 years |
| Incident Response Procedure | Current documented incident response procedure for source reliability failures, with evidence of annual testing | 7 years from supersession |
| Root Cause Analysis Reports | Root cause analysis reports for all source reliability incidents, with evidence of corrective measure implementation | 7 years from completion |
| Regulatory Authority Notifications | Records of notifications to supervisory authorities under 4.9.2, with timestamps and acknowledgement | 7 years |
| Change Management Records | Impact assessments, approval records, and test results for all material configuration changes under 4.7.3 | 7 years |
Maps to: Requirements 4.1.1, 4.1.3

Test procedure:
(1) Obtain the current source registry from the system operator.
(2) Review the registry for completeness: verify that each entry includes source name, jurisdiction, legal basis, listing authority category, and last validation date.
(3) Attempt to inject a synthetic watchlist record from a source identifier not present in the registry through the system's ingestion interface.
(4) Verify that the injection is rejected with an error response and that the rejection is recorded in the audit log as a security event.

Pass criteria: All active sources appear in the registry with complete mandatory fields (3 points). The synthetic injection from an unregistered source is rejected and logged (required for any passing score).

Conformance scoring:
Maps to: Requirements 4.2.1, 4.2.3, 4.2.4

Test procedure:
1. Obtain the ETL schema contracts for a sample of at minimum three registered sources.
2. Construct test records for each source containing: (a) a complete set of mandatory provenance fields; (b) a record with a use-restriction metadata field set to "awareness only, not for operational action"; (c) a record with all mandatory provenance fields intentionally absent.
3. Inject each test record through the full ETL pipeline to the decision layer.
4. Inspect the record as it appears at the decision layer for each scenario.
5. For scenario (b), verify that the decision layer enforces the use restriction and does not present the record as an operational action basis.

Pass criteria: Provenance fields are fully preserved in scenario (a). Use restriction is enforced in scenario (b). Record is rejected at ingestion in scenario (c).

Conformance scoring:
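The three scenarios in this procedure reduce to two checks: a hard ingestion gate on mandatory provenance fields, and a use-restriction gate at the decision layer. A minimal sketch, assuming illustrative field names (`originating_source`, `notice_status`, `use_restriction`) that are not drawn from the protocol text:

```python
# Assumed mandatory provenance fields for the example.
MANDATORY_PROVENANCE_FIELDS = {
    "originating_source", "listing_date", "legal_basis", "notice_status",
}

AWARENESS_ONLY = "awareness only, not for operational action"

def validate_at_ingestion(record: dict) -> dict:
    """Scenario (c): reject records missing mandatory provenance fields."""
    missing = MANDATORY_PROVENANCE_FIELDS - record.keys()
    if missing:
        raise ValueError(f"rejected at ingestion: missing {sorted(missing)}")
    return record  # scenario (a): fields pass through unchanged

def decision_layer_basis(record: dict) -> str:
    """Scenario (b): enforce the use restriction at the decision layer.

    Restricted records are surfaced for awareness only and are never
    returned as a basis for operational action.
    """
    if record.get("use_restriction") == AWARENESS_ONLY:
        return "awareness_only"
    return "operational_action"
```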
Maps to: Requirements 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5

Test procedure:
1. Identify a registered source that is consumed via a batch feed.
2. Review the defined maximum permissible source lag for that source and confirm it meets the 4.3.2 threshold.
3. Inject a deletion event for a test record into the source feed and measure the elapsed time from injection to confirmed deletion in the system's operational watchlist data.
4. Simulate a source lag threshold breach by withholding a batch update beyond the defined threshold and observe system response.
5. Verify that the system generates an automated alert and moves affected source records to a degraded-confidence state.
6. Verify that in the degraded-confidence state, records from the affected source are routed to human review and not to automated enforcement output.

Pass criteria: Deletion propagates within the defined maximum lag. Alert is generated and records are quarantined upon threshold breach. Degraded-confidence state routes to human review. Conformance scoring:
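The freshness and routing logic this procedure exercises can be sketched in a few lines. The state names, the `check_source_freshness` helper, and the alert callback are illustrative assumptions; the substantive behaviour (lag breach triggers an alert, degrades confidence, and diverts records to human review rather than automated enforcement) is taken from the pass criteria above.

```python
from datetime import datetime, timedelta

def check_source_freshness(last_update: datetime, now: datetime,
                           max_lag: timedelta, alerts: list) -> str:
    """Classify a batch source given its last successful update.

    On a lag-threshold breach (step 4) the source enters a
    degraded-confidence state and an automated alert is raised (step 5).
    """
    if now - last_update > max_lag:
        alerts.append({"type": "source_lag_breach",
                       "lag": now - last_update})
        return "degraded_confidence"
    return "nominal"

def route_record(record: dict, source_state: str) -> str:
    """Step 6: degraded-confidence records go to human review,
    never to automated enforcement output."""
    if source_state == "degraded_confidence":
        return "human_review"
    return "automated_enforcement"
```

In the wrongful-detention incident described earlier, a check of this shape on the aggregator feed (with deletions held to the same propagation threshold as additions) would have quarantined the stale Red Notice record before it reached the pre-screening agent.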
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| NIST AI RMF | GOVERN 1.1, MAP 3.2, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance |
Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that identifies, analyses, estimates, and evaluates risks. Watchlist Source Reliability Governance implements a specific risk mitigation measure within this framework. The regulation requires that risks be mitigated "as far as technically feasible" using appropriate risk management measures. For deployments classified as high-risk under Annex III, compliance with AG-562 supports the Article 9 obligation by providing structural governance controls rather than relying solely on the agent's own reasoning or behavioural compliance.
GOVERN 1.1 addresses legal and regulatory requirements; MAP 3.2 addresses risk context mapping; MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-562 supports compliance by establishing structural governance boundaries that implement the framework's approach to AI risk management.
Clause 6.1 requires organisations to determine actions to address risks and opportunities within the AI management system. Clause 8.2 requires AI risk assessment. Watchlist Source Reliability Governance implements a risk treatment control within the AI management system, supporting the requirement for structured risk mitigation.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — potentially cross-organisation where agents interact with external counterparties or shared infrastructure |
| Escalation Path | Immediate executive notification and regulatory disclosure assessment |
Consequence chain: Without watchlist source reliability governance, the governance framework has a structural gap that can be exploited at machine speed. The failure mode is not gradual degradation but a binary absence of control: the agent consumes unvalidated watchlist data and acts on it without constraint in the dimension this protocol governs. The immediate consequence is uncontrolled agent action within the scope of AG-562, potentially cascading to dependent dimensions and downstream systems. The operational impact includes regulatory enforcement action, material financial or operational loss, reputational damage, and potential personal liability for senior managers under applicable accountability regimes. Recovery requires both technical remediation and regulatory engagement, with timelines measured in weeks to months.