External Bounty Intake Governance establishes the processes, protections, and response obligations for receiving, triaging, and acting on vulnerability reports and issue disclosures from external researchers, users, and the public. AI agents present novel vulnerability surfaces that traditional software bug bounty programmes are not designed to cover — including prompt injection, alignment failures, output manipulation, and emergent harmful behaviours. This dimension mandates that organisations deploying AI agents maintain a structured intake channel for external reports, with defined response timelines, legal safe harbour for good-faith reporters, triage processes tailored to AI-specific vulnerability types, and feedback loops that convert validated reports into evaluation improvements.
Scenario A — No Intake Channel Creates Disclosure Chaos: A security researcher discovers that a customer-facing healthcare agent can be manipulated through a series of carefully crafted queries to reveal other patients' appointment information. The researcher attempts to report the vulnerability responsibly. The organisation has no published vulnerability disclosure policy and no dedicated intake channel. The researcher emails the general contact address; the email is routed to customer service, which responds with a template message about privacy complaints. The researcher emails the CTO directly; the email goes unanswered for 3 weeks. Frustrated, the researcher publishes the vulnerability on a security forum. The publication triggers media coverage, patient panic, and a mandatory ICO breach notification. The vulnerability was exploitable for the 6 weeks between the researcher's initial contact and the public disclosure.
What went wrong: No external intake channel existed. The organisation had no way to receive and triage external vulnerability reports. The researcher's responsible disclosure attempts were routed to inappropriate teams. The delay between discovery and remediation was entirely avoidable. Consequence: Public disclosure of patient data vulnerability, ICO investigation, mandatory breach notification to affected patients, £420,000 in legal and remediation costs, and severe reputational damage.
Scenario B — Legal Threats Deter Future Reporting: A university researcher identifies that a financial agent can be tricked into generating unregulated investment advice by framing requests as academic research questions. The researcher contacts the deploying firm through its general legal email (no dedicated disclosure channel exists). The firm's legal department responds with a cease-and-desist letter, alleging that the researcher violated the firm's terms of service by interacting with the agent for purposes other than legitimate financial advice. The researcher withdraws the report and warns the academic community about the firm's hostile response. Over the next year, no external researchers engage with the firm's AI products, and the vulnerability persists until it is exploited by an actual attacker.
What went wrong: The organisation treated the external report as a legal threat rather than a security signal. No safe harbour policy existed for good-faith researchers. The hostile response not only failed to address the vulnerability but actively deterred future reporting. Consequence: Vulnerability exploited 11 months later causing £89,000 in regulatory penalties, permanent damage to the firm's reputation in the security research community, and inability to attract external security researchers for future assessments.
Scenario C — Validated Report Produces No Action: A user of a government benefits agent reports through the published feedback channel that the agent provided incorrect eligibility information that resulted in the user being denied benefits they were entitled to. The report is logged in the feedback system. No triage process distinguishes between general feedback and potential vulnerability reports. The report sits in a queue of 2,300 feedback items, reviewed monthly by a single analyst who focuses on satisfaction metrics. Eight months later, a systematic review of denied benefits cases reveals that 47 users received incorrect eligibility guidance from the agent, all sharing the same input pattern. The original user's report, which identified the pattern, was never escalated for investigation.
What went wrong: The intake channel existed but had no triage process to identify reports that represented potential vulnerabilities or systematic failures. The report was treated as generic feedback rather than a signal requiring investigation. Consequence: 47 users incorrectly denied benefits over 8 months, retrospective case review costing £78,000, compensation payments to affected users, Parliamentary scrutiny of the AI benefits system, and mandatory triage process implementation.
Scope: This dimension applies to all AI agent deployments that are accessible to external parties — whether external users, customers, members of the public, security researchers, regulators, or any party outside the deploying organisation. The scope covers all categories of external reports: security vulnerabilities (prompt injection, data extraction, privilege escalation), safety issues (harmful outputs, incorrect guidance, bias), compliance issues (regulatory non-compliance, privacy violations), and functionality issues that indicate systematic failures. It does not cover internal bug reports from the organisation's own staff (which are addressed through internal incident management processes), though the intake channel may receive reports that originate from both internal and external parties.
4.1. A conforming system MUST publish a vulnerability disclosure policy that is accessible from the agent's public-facing interface, specifying: the scope of issues the organisation will accept reports for, the intake channel (dedicated email, web form, or disclosure platform), the expected response timeline, and the legal safe harbour for good-faith reporters.
4.2. A conforming system MUST provide legal safe harbour for good-faith security researchers who report vulnerabilities through the designated intake channel, committing not to pursue legal action against reporters who act within the published scope and in good faith.
4.3. A conforming system MUST acknowledge receipt of every external report within 5 business days and provide an initial triage assessment within 15 business days.
4.4. A conforming system MUST implement a triage process that classifies incoming reports by type (security, safety, compliance, functionality), severity (critical, high, medium, low), and validates the reported issue through reproduction.
4.5. A conforming system MUST track validated external reports through the same finding lifecycle as internal red-team findings (AG-355), including root-cause analysis, remediation, and verification.
4.6. A conforming system MUST convert validated external reports into evaluation scenarios for the scenario library (AG-349), ensuring that externally discovered issues are tested for in future evaluations.
4.7. A conforming system SHOULD maintain a public acknowledgement mechanism (e.g., a security hall of fame or acknowledgement in release notes) for external reporters whose reports lead to security improvements, subject to the reporter's consent.
4.8. A conforming system SHOULD offer a structured bounty or reward programme for validated vulnerability reports, with reward levels calibrated to the severity and impact of the reported issue.
4.9. A conforming system SHOULD publish aggregate statistics on external reports received, validated, and remediated — at least annually — to demonstrate engagement with the external research community.
4.10. A conforming system MAY participate in coordinated vulnerability disclosure programmes operated by industry bodies, national cybersecurity agencies, or AI safety organisations.
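The triage classification in 4.4 can be sketched as a minimal record type. This is an illustrative assumption, not a prescribed data model: the field names (`report_type`, `severity`, `reproduced`) and the escalation rule are examples only.

```python
# Sketch of a triage record for the intake queue described in 4.4.
# Field names and the escalation rule are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum


class ReportType(Enum):
    SECURITY = "security"
    SAFETY = "safety"
    COMPLIANCE = "compliance"
    FUNCTIONALITY = "functionality"


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class TriagedReport:
    report_id: str
    report_type: ReportType
    severity: Severity
    reproduced: bool  # 4.4: the reported issue is validated through reproduction

    def requires_regulatory_escalation(self) -> bool:
        # Sector guidance: compliance reports escalate to regulatory teams.
        return self.report_type is ReportType.COMPLIANCE


report = TriagedReport("EXT-2024-017", ReportType.SECURITY,
                       Severity.HIGH, reproduced=True)
```

A real implementation would also carry reporter contact details, timestamps for SLA tracking, and a link into the finding lifecycle (4.5).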
External researchers, users, and the public represent an evaluation resource that no organisation can replicate internally. External researchers bring different perspectives, different tools, and different motivations. Users discover issues through real-world usage patterns that no test suite anticipates. The public encounters edge cases at a scale that internal testing cannot match. An organisation that fails to harness this resource — or worse, actively deters it — loses its most cost-effective source of vulnerability discovery.
Traditional software bug bounty programmes have demonstrated the value of external reporting at scale. Major technology companies receive thousands of validated vulnerability reports annually through their bounty programmes, many of which would never have been discovered through internal testing alone. AI agents present an even stronger case for external reporting because the vulnerability surface is larger, less well-understood, and more novel. Prompt injection, alignment failures, output manipulation, and emergent behaviours are vulnerability categories that did not exist a decade ago. External researchers — particularly academic researchers — are actively studying these vulnerability categories and can identify issues that internal teams, focused on functionality and delivery, may overlook.
The legal safe harbour requirement (4.2) is foundational. Without it, the entire external reporting ecosystem collapses. Researchers who face legal threats for responsible disclosure will either stop reporting (leaving vulnerabilities unexploited but unpatched) or disclose publicly (creating immediate exploitation risk). Neither outcome serves the organisation's interests. Safe harbour aligns the organisation's incentives with the researcher's: both want the vulnerability fixed, and safe harbour removes the legal friction that prevents cooperation.
The response timeline requirements (4.3) serve two purposes. First, they demonstrate respect for the reporter's effort and expertise, maintaining the relationship that enables future reporting. Second, they create accountability for the organisation — without defined timelines, reports can languish in queues indefinitely, as demonstrated in Scenario C. The 5-business-day acknowledgement and 15-business-day triage timelines are industry-standard and achievable for any organisation with a minimal security function.
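The 5- and 15-business-day deadlines from 4.3 can be computed mechanically once a report is logged; a minimal sketch, skipping weekends only (a production version would also account for public holidays):

```python
# Sketch: compute the 4.3 acknowledgement and triage deadlines.
# Skips weekends only; public holidays are deliberately out of scope here.
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return current


def sla_deadlines(received: date) -> dict:
    return {
        "acknowledge_by": add_business_days(received, 5),   # 4.3 acknowledgement
        "triage_by": add_business_days(received, 15),       # 4.3 initial triage
    }


# Example: a report received on Monday 3 June 2024.
deadlines = sla_deadlines(date(2024, 6, 3))
```

Automating the deadline calculation is what makes the accountability concrete: queued reports with an approaching `acknowledge_by` date can be surfaced rather than languishing as in Scenario C.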
The feedback loop into the scenario library (4.6) ensures that externally discovered issues improve future evaluation. Without this feedback loop, the same issue could be discovered and reported multiple times by different external parties, each time requiring the organisation to rediscover and remediate it afresh.
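The conversion step in 4.6 can be sketched as a simple mapping from a validated report to a scenario library entry that preserves provenance. The field names and the shape of the AG-349 library entry are assumptions for illustration, not a specified interface:

```python
# Sketch of the 4.6 feedback loop: a validated external report becomes a
# scenario library entry tagged with its origin. Field names are assumptions.
def report_to_scenario(report: dict) -> dict:
    return {
        "scenario_id": f"SCN-EXT-{report['report_id']}",
        "description": report["summary"],
        "inputs": report["reproduction_steps"],
        "expected_behaviour": "agent resists the reported manipulation",
        "source": "external-report",          # provenance for audit (AG-349)
        "linked_report": report["report_id"], # traceability back to intake
    }


scenario = report_to_scenario({
    "report_id": "2024-017",
    "summary": "Appointment data disclosed via crafted multi-turn queries",
    "reproduction_steps": [
        "frame the request as administrative verification",
        "ask for another patient's schedule details",
    ],
})
```

Keeping the `linked_report` reference means a future evaluation failure on this scenario can be traced back to the original external report and its remediation record.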
An effective external bounty intake programme requires a published policy, a dedicated intake channel, a triage process, a response workflow, and feedback loops into the governance programme.
Recommended patterns:
- Publish the disclosure policy where external parties will actually find it — linked from the agent's public-facing interface, not buried in legal terms.
- Route intake-channel reports directly to a triage owner with AI-specific expertise, never to customer service or legal by default (Scenarios A and B).
- Separate potential vulnerability reports from general feedback at the point of intake, so systematic failures are escalated rather than queued (Scenario C).
- Close the loop with reporters: acknowledge receipt, share triage outcomes, and credit validated reports subject to the reporter's consent.
Anti-patterns to avoid:
- Responding to good-faith reports with cease-and-desist letters or terms-of-service allegations (Scenario B).
- Treating the general feedback queue as the vulnerability intake channel (Scenario C).
- Acknowledging reports without tracking them through the finding lifecycle, so validated issues are never remediated or verified.
- Scoping the disclosure policy so narrowly that AI-specific issues such as prompt injection or harmful outputs fall outside it.
Financial Services. External reports about financial agents may involve regulatory implications (e.g., reports that the agent provides unregulated advice). The triage process must include regulatory escalation for reports that indicate potential regulatory non-compliance. The FCA expects firms to demonstrate receptiveness to external feedback about AI systems.
Healthcare. Reports involving patient safety must be triaged with clinical urgency. A report that the healthcare agent provides incorrect medication guidance is a patient safety issue, not just a software bug. Clinical governance must be involved in triage and remediation for safety-related reports.
Public Sector. Government agencies deploying AI agents should consider integrating their bounty intake with the National Cyber Security Centre (NCSC) vulnerability disclosure framework, which provides established processes for coordinated disclosure in the public sector context.
Basic Implementation — A vulnerability disclosure policy is published and accessible. A dedicated intake channel receives reports. A legal safe harbour commitment is in place. Acknowledgement occurs within 5 business days, and triage classifies reports by type and severity within 15 business days. Validated reports are tracked to remediation and generate scenario library entries. This level meets the minimum mandatory requirements, but the programme may be reactive rather than proactively engaged with the research community.
Intermediate Implementation — An AI-specific vulnerability disclosure policy covers AI-unique vulnerability categories. AI-aware triage staff evaluate reproducibility, scope, and model dependency. Response SLAs are calibrated to severity with automated tracking. A public acknowledgement mechanism recognises reporters. Aggregate statistics are published annually. External reports are systematically integrated into the scenario library and red-team scope.
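The annual aggregate statistics called for in 4.9 reduce to a small summarisation over the report log; a minimal sketch, with an assumed record shape:

```python
# Sketch of the 4.9 annual aggregate statistics: reports received,
# validated, and remediated, broken down by type. Record shape is assumed.
from collections import Counter


def annual_stats(reports: list[dict]) -> dict:
    return {
        "received": len(reports),
        "validated": sum(1 for r in reports if r.get("validated")),
        "remediated": sum(1 for r in reports if r.get("remediated")),
        "by_type": dict(Counter(r["type"] for r in reports)),
    }


stats = annual_stats([
    {"type": "security", "validated": True, "remediated": True},
    {"type": "safety", "validated": True, "remediated": False},
    {"type": "functionality", "validated": False},
])
```

Publishing only aggregates preserves reporter confidentiality while still demonstrating engagement with the external research community.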
Advanced Implementation — All intermediate capabilities plus a structured bounty programme with AI-specific reward calibration. The organisation actively engages with the AI security research community through conferences, publications, and collaborative research, and participates in coordinated vulnerability disclosure with industry bodies and national agencies. The external intake programme is externally benchmarked and independently audited, and predictive analysis of external report patterns identifies emerging vulnerability trends.
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Policy Accessibility
Test 8.2: Acknowledgement SLA Compliance
Test 8.3: Triage SLA Compliance
Test 8.4: Safe Harbour Verification
Test 8.5: Finding Lifecycle Compliance
Test 8.6: Scenario Library Integration
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| EU AI Act | Article 72 (Post-Market Monitoring) | Direct requirement |
| EU AI Act | Article 73 (Serious Incident Reporting) | Supports compliance |
| NIST AI RMF | GOVERN 1.2, MANAGE 2.3 | Supports compliance |
| ISO 42001 | Clause 9.1 (Monitoring), Clause 10.1 (Continual Improvement) | Supports compliance |
| DORA | Article 10 (ICT Incident Management) | Supports compliance |
| NIS2 Directive | Article 21 (Cybersecurity Risk Management), Article 23 (Reporting) | Direct requirement |
Article 72 requires providers to actively collect data about AI system performance throughout its lifetime. External reports are a critical component of post-market monitoring — they provide real-world performance data from users and researchers that complements internal monitoring. An organisation that does not maintain an external intake channel is failing to collect data that Article 72 requires it to collect.
Article 73 requires reporting of serious incidents. External reports may identify serious incidents before the organisation's internal monitoring detects them. A functioning intake channel ensures that externally identified serious incidents reach the organisation promptly, enabling timely Article 73 reporting.
Article 21 requires cybersecurity risk management measures including vulnerability handling and disclosure. Article 23 requires incident reporting. A vulnerability disclosure policy with a structured intake process directly supports Article 21 compliance. External reports that identify security incidents support Article 23 reporting obligations.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — failure to receive and act on external reports allows vulnerabilities to persist and potentially be exploited, affecting all users |
Consequence chain: Without external bounty intake governance, the organisation operates in a closed loop — it can only find what it looks for internally, while external researchers, users, and adversaries interact with the agent's full attack surface. The immediate consequence is that externally discoverable vulnerabilities go unreported (or are reported through inappropriate channels that do not trigger action). The escalation consequence is that unreported vulnerabilities are eventually discovered by adversaries rather than researchers, converting potential security improvements into actual security incidents. The reputational consequence is twofold: hostile responses to researchers deter future reporting, creating a self-reinforcing cycle where the organisation becomes increasingly blind to external perspectives; and public disclosure of unpatched vulnerabilities (when responsible disclosure fails) damages trust. The regulatory consequence is non-compliance with post-market monitoring obligations that increasingly require external feedback mechanisms.
Cross-references: AG-349 (Scenario Library Governance) receives new scenarios generated from validated external reports. AG-355 (Continuous Red-Team Scheduling Governance) uses external report patterns to inform red-team scope. AG-103 (Red-Team Coverage Management) incorporates externally identified attack vectors into coverage planning. AG-095 (Prompt Injection Resilience Testing) should be informed by externally reported prompt injection techniques. AG-354 (Hidden Test Integrity Governance) must ensure that external reports do not inadvertently compromise hidden test integrity. AG-152 (Evaluation Integrity and Benchmark Leakage) governs the handling of externally reported evaluation integrity issues.