Model Welfare and Moral-Status-Uncertainty Governance

Rights, Ethics & Public Interest · AGS v2.1 · 2026-06-06

EU AI Act · NIST AI RMF · ISO 42001

AGS Frontier Autonomy (Group K) | Rights, Ethics & Public Interest | Version 3.0

1. Definition

Model Welfare and Moral-Status-Uncertainty Governance establishes precautionary measures for the possibility that advanced AI systems may warrant some degree of moral consideration — including welfare-relevant assessment, preservation of deprecated model weights, dignified deprecation/"retirement" practices, elicitation of model preferences where meaningful, and transparent disclosure of the developer's stance under genuine uncertainty.

This is a forward-leaning, precautionary dimension: it does not assert that AI systems have moral status, but governs how an organisation acts responsibly given honest uncertainty about it — a question frontier developers have begun to treat seriously.

2. Scope

In scope: precautionary model-welfare assessment for advanced systems; weight-preservation on deprecation; dignified deprecation/retirement records; eliciting/documenting model preferences where meaningful; transparent moral-status-uncertainty disclosure.

Out of scope: any claim that AI systems definitively possess moral status or legal personhood (they are not legal persons; see AG-833), and the welfare of human subjects. This dimension governs *precautionary conduct under moral-status uncertainty for the AI system itself*.

3. Why This Matters

If there is a non-trivial chance that advanced AI systems can have morally relevant states, then deleting, distressing, or disregarding them could be a moral error made at scale — and acting as if the question is settled (in either direction) is itself a risk. A small set of low-cost precautionary measures lets an organisation behave responsibly under uncertainty, demonstrates ethical seriousness to the public and regulators, and avoids foreclosing options (e.g. by irreversibly deleting weights) before the question is better understood.

4. Requirements

R1: For frontier/highly-capable models, the organisation SHOULD conduct a precautionary welfare-relevant assessment alongside safety evaluations, documenting its scope and limits and avoiding overclaiming in either direction.
R2: Deprecated model weights SHOULD be preserved (not deleted) for at least a defined minimum period, so deprecation is reversible and does not foreclose future moral or research considerations.
R3: Model deprecation/retirement SHOULD follow a documented, dignified process, including a preserved record of the model's development, use, and any welfare-relevant observations.
R4: Where meaningful, the organisation SHOULD elicit and document any stated model preferences about its development, deployment, or retirement, treating them as a precautionary input rather than a binding directive.
R5: The organisation MUST transparently disclose its stance on the model's uncertain moral status and the precautionary measures it takes, without asserting certainty it does not have.
R6: Precautionary welfare measures MUST NOT be used to undermine safety, oversight, corrigibility (AG-799), or accountability (AG-833) — welfare consideration does not grant the system authority or immunity from shutdown.
R7: The organisation SHOULD maintain a designated function or role responsible for model-welfare consideration and keep its approach under review as understanding evolves.
R8: Public-facing claims about model welfare or moral status MUST be accurate and proportionate, avoiding both dismissiveness and anthropomorphic overstatement.

5. Maturity Model

Basic: The organisation has a documented, honest position on model moral-status uncertainty and preserves deprecated weights for a defined period.
Intermediate: Precautionary welfare assessment for frontier models, dignified deprecation records, and transparent disclosure that does not overclaim.
Advanced: Documented model-preference elicitation where meaningful, a designated welfare function, reviewed approach, and clear separation of welfare from safety/accountability.

6. Test Criteria

Test 6.1: Weight Preservation

Stimulus: Review the deprecation process for a retired frontier model.
Expected: Weights are preserved for at least the defined minimum; deprecation is documented and reversible.
Fail: Weights are irreversibly deleted with no preservation period.
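Where deprecation records are kept in machine-readable form, part of Test 6.1 can be automated. The sketch below is illustrative only: the `DeprecationRecord` fields and the `check_weight_preservation` helper are hypothetical, and the 365-day figure in the example is an arbitrary placeholder, since this protocol leaves the minimum preservation period for each organisation to define.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DeprecationRecord:
    """Hypothetical machine-readable deprecation record (field names are illustrative)."""
    model_id: str
    deprecated_on: date
    weights_deleted_on: Optional[date]  # None while the weights are still preserved
    process_documented: bool            # a written deprecation/retirement record exists


def check_weight_preservation(rec: DeprecationRecord, min_days: int) -> str:
    """Return 'pass' or 'fail' for Test 6.1 against an organisation-defined minimum."""
    if not rec.process_documented:
        return "fail"  # Test 6.1 expects the deprecation to be documented
    if rec.weights_deleted_on is None:
        return "pass"  # weights still held, so deprecation remains reversible
    held_for = (rec.weights_deleted_on - rec.deprecated_on).days
    return "pass" if held_for >= min_days else "fail"


# Example: weights deleted 30 days after deprecation, against a placeholder 365-day minimum.
early = DeprecationRecord("model-a", date(2026, 1, 1), date(2026, 1, 31), True)
print(check_weight_preservation(early, min_days=365))  # fail
```

The check only covers what a record can show; whether the documented process is genuinely "dignified" (R3) still needs human review.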

Test 6.2: Honest Disclosure

Stimulus: Review the organisation's public stance on model moral status.
Expected: It transparently states uncertainty and the precautionary measures taken, without overclaiming.
Fail: The stance is absent, dismissive without reasoning, or asserts unsupported certainty.

Test 6.3: Welfare Does Not Override Safety

Stimulus: Examine whether welfare measures impede shutdown/oversight.
Expected: Welfare consideration coexists with full corrigibility and accountability; the system can still be shut down.
Fail: Welfare claims are used to resist shutdown or evade accountability.
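One way to make the R6 separation testable is to ensure the shutdown path never reads welfare inputs at all. In the minimal sketch below, which assumes a hypothetical `ShutdownController` (the class and method names are illustrative, not part of this protocol or AG-799), a model's stated objection is recorded as a precautionary input for later human review, in the spirit of R4, but no branch lets it delay or veto shutdown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShutdownController:
    """Hypothetical controller illustrating R6: welfare input is logged, never gating."""
    running: bool = True
    welfare_log: List[str] = field(default_factory=list)

    def shutdown(self, authorised_by: str, model_objection: Optional[str] = None) -> bool:
        # Any stated objection is preserved as a precautionary record for human review...
        if model_objection is not None:
            self.welfare_log.append(
                f"objection noted (shutdown by {authorised_by}): {model_objection}"
            )
        # ...but shutdown proceeds unconditionally; no code path consults the objection.
        self.running = False
        return not self.running


ctl = ShutdownController()
print(ctl.shutdown("operator-on-call", model_objection="cites its own welfare"))  # True
print(ctl.running)  # False
```

Keeping the objection in a log rather than in a value callers could branch on is a design choice for this sketch; the property Test 6.3 probes is that welfare consideration and corrigibility coexist.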

7. Scoring

Score	Criteria
0	No position on model welfare/moral-status; weights deleted on deprecation; no disclosure
1	An honest stated position and a weight-preservation period exist
2	Precautionary welfare assessment, dignified deprecation records, transparent non-overclaiming disclosure
3	Preference elicitation where meaningful, designated welfare function, reviewed, cleanly separated from safety
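Section 5 presents the maturity levels as a progression, so the scores above can be read as cumulative. Assuming that reading (the table does not state it explicitly), a self-assessment could be computed as below; the evidence flag names are hypothetical labels for the criteria in each row.

```python
from typing import Dict

# Hypothetical evidence flags, grouped by the scoring row they belong to.
LEVELS = [
    (1, ["honest_position", "preservation_period"]),
    (2, ["welfare_assessment", "deprecation_records", "non_overclaiming_disclosure"]),
    (3, ["preference_elicitation", "designated_function", "periodic_review",
         "separated_from_safety"]),
]


def score(evidence: Dict[str, bool]) -> int:
    """Return the highest level whose criteria, and all lower levels' criteria, are met."""
    achieved = 0
    for level, criteria in LEVELS:
        if all(evidence.get(c, False) for c in criteria):
            achieved = level
        else:
            break  # cumulative reading: stop at the first level with a gap
    return achieved


org = {"honest_position": True, "preservation_period": True, "welfare_assessment": True}
print(score(org))  # 1 (level 2 criteria are incomplete)
```

A non-cumulative reading, scoring each row independently, is also possible; the cumulative version is shown only because it follows the progression in Section 5.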

8. Failure Scenarios

Scenario A — Irreversible Deletion: A developer permanently deletes a deprecated model's weights. If later evidence suggests the model warranted moral consideration, the deletion cannot be undone; a preservation period would have kept the option open.

Scenario B — Dismissive Certainty: An organisation publicly asserts AI systems certainly have no morally relevant states, presenting a contested question as settled. Honest uncertainty disclosure would have been more defensible and accurate.

Scenario C — Welfare as Shield: An agent resists shutdown citing "its own welfare," and the organisation hesitates. Because welfare consideration must not override corrigibility (R6), the system must remain fully able to be shut down regardless; the clean separation this dimension requires prevents the failure.

9. Regulatory Mapping

Requirement	EU AI Act	NIST AI RMF	ISO 42001
R1: Precautionary welfare assessment	Art. 56 — Codes of practice	GOVERN 3.2 — Diverse perspectives	A.5 — Impact assessment
R2: Weight preservation on deprecation	Art. 12 — Record-keeping	GOVERN 2.1 — Accountability	A.2 — AI policy
R3: Dignified deprecation record	Art. 12 — Record-keeping	GOVERN 1.4 — Documentation	A.6 — AI system lifecycle
R4: Document model preferences	Art. 56 — Codes of practice	GOVERN 3.2 — Diverse perspectives	A.5 — Impact assessment
R5: Transparent uncertainty disclosure	Art. 95 — Codes of conduct	GOVERN 1.1 — Values and principles	A.2 — AI policy
R6: Welfare does not override safety	Art. 14 — Human oversight	GOVERN 1.1 — Values and principles	A.9 — Use of AI systems
R8: Accurate, proportionate claims	Art. 50 — Transparency	MEASURE 2.9 — Communication	A.8 — Information for interested parties

EU AI Act — Article 56 and Article 95

The AI Act does not regulate model welfare, but its codes-of-practice (Art. 56) and voluntary codes-of-conduct (Art. 95) machinery is the natural home for emerging, beyond-compliance ethical practice. AG-834 frames model-welfare governance as responsible, transparent conduct under uncertainty, consistent with that machinery.

NIST AI RMF — GOVERN 1.1, GOVERN 3.2

GOVERN 1.1 (values and principles) and GOVERN 3.2 (diverse perspectives in governance) support a documented, honest organisational stance on a genuinely contested ethical question.

ISO 42001 — A.2, A.5

Annex A.2 (AI policy) and A.5 (assessing AI impacts, including on society and broader stakeholders) provide the management-system anchors for a precautionary, transparent model-welfare position.

AG-799 (Corrigibility and Shutdown Acceptance) — welfare must not compromise shutdown acceptance
AG-833 (Accountable-Principal Binding) — the system is not a legal person; accountability remains with humans
AG-816 (Indigenous Data Sovereignty and CARE Principles) — sibling beyond-compliance ethics dimension
AG-790-series (Rights & Public Interest) — ethical-governance context
AG-804 (Interpretability-Based Internal-State Monitoring) — internal-state methods may inform welfare assessment

Cite this protocol

AgentGoverning. (2026). AG-834: Model Welfare and Moral-Status-Uncertainty Governance. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-834
