Publications - Gillian K. Hadfield

Normativity and AI Alignment

Building AI that understands and operates within human normative systems

2026

2026 arXiv

Legal Alignment for Safe and Ethical AI

arXiv:2601.04175

With Noam Kolt, Nicholas Caputo, Jack Boeglin, Cullen O'Keefe, and others

Develops the concept of "legal alignment"—training AI systems to understand and operate within legal frameworks as a path to broader normative alignment.

arXiv

2026 Contemporary Debates in the Ethics of Artificial Intelligence

Can AI Be Governed? Only if We Build Normatively Competent AI

Contemporary Debates in the Ethics of Artificial Intelligence

Edited by Sven Nyholm, Atoosa Kasirzadeh, and John Zerilli. Wiley-Blackwell

Argues that effective AI governance requires AI systems capable of understanding and reasoning about human normative systems—not just following explicit rules—to truly participate in the complex equilibrium of human values and norms.

Read on Wiley

2025

2025 arXiv

Societal Alignment Frameworks Can Improve LLM Alignment

arXiv:2503.00069

With Karolina Stańczak, Nicholas Meade, Timothy P. Lillicrap, and others

Demonstrates that grounding LLM alignment in frameworks developed for societal-level coordination can improve alignment outcomes.

arXiv

2025 Phil Trans B

Metanorms Generate Stable Yet Adaptable Normative Social Order in a Politically Decentralized Society

Philosophical Transactions of the Royal Society B

With Sarah Mathew, Danson Mwangi, Samir Reynolds

Based on vignette experiments with 369 Turkana participants in Kenya, demonstrates how metanorms—rules that govern the process by which norms are interpreted, changed and enforced—enable societies to balance normative stability and adaptability through their dispute resolution institutions.

Read on Phil Trans B

2024

2024 arXiv

Normative Modules: A Generative Agent Architecture for Learning Norms That Supports Multi-Agent Cooperation

Transactions on Machine Learning Research

With Atrisha Sarkar, Andrei Muresanu, and others

Proposes an architecture enabling AI agents to learn, represent, and reason about social norms in ways that support cooperation in multi-agent environments.

arXiv

2023

2023 Science Advances

Judging Facts, Judging Norms: Training Machine Learning Models to Judge Humans Requires a Modified Approach to Labeling Data

Science Advances

With Aparna Balagopalan, David Madras, Dylan Hadfield-Menell, and others

Shows that standard ML data labeling practices are inadequate when models are used to make normative judgments about humans, and proposes alternative approaches.

Read on Science

2022

2022 PNAS

Spurious Normativity Enhances Learning of Compliance and Enforcement Behavior in Artificial Agents

PNAS, Vol. 119(3)

With Raphael Köster, Dylan Hadfield-Menell, Joel Z. Leibo, and others

Demonstrates experimentally that arbitrary ("silly") rules help artificial agents learn to comply with and enforce norms more effectively—a key insight for building normatively competent AI.

Read on PNAS | arXiv

2019

2019 AIES

Incomplete Contracting and AI Alignment

AAAI/ACM Conference on AI, Ethics, and Society

With Dylan Hadfield-Menell

Reframes the AI alignment problem through the lens of incomplete contract theory, showing how legal and economic insights about managing incomplete specifications apply to aligning AI with human values.

ACM Digital Library | arXiv

2019 AIES

Legible Normativity for AI Alignment: The Value of Silly Rules

AAAI/ACM Conference on AI, Ethics, and Society

With Dylan Hadfield-Menell, McKane Andrus

Argues that training AI in environments with clearly legible (even arbitrary) normative structure helps develop the general capacity to recognize and follow norms.

arXiv

Cooperative AI

How to make AI agents that interact, cooperate, and coordinate

2025

2025 Nature Human Behaviour

The Impact of Advanced AI Systems on Democracy

Nature Human Behaviour

With Christopher Summerfield, Lisa Argyle, and others

Assesses the potential impacts—both positive and negative—of advanced AI systems on democratic institutions, processes, and participation.

Read on Nature

2025 arXiv

Multi-Agent Risks from Advanced AI

arXiv:2502.14143

With Lewis Hammond, Alan Chan, and others

Analyzes risks that emerge specifically from interactions among multiple advanced AI systems, including coordination failures, conflicts, and emergent behaviors.

arXiv

2025 arXiv

Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate

arXiv:2509.05396

With Andrea Wynn, Harsh Satija

Investigates how capability diversity influences multi-agent interactions, demonstrating that debate can decrease accuracy when agents favor agreement over challenging flawed reasoning.

arXiv

2025 NBER

An Economy of AI Agents

Economics of Transformative AI, NBER

With Andrew Koh

Examines how economic principles apply to a world where AI agents transact, cooperate, and compete, and what governance structures such an economy requires.

Read on NBER

2025 arXiv

Infrastructure for AI Agents

arXiv:2501.10114

With Alan Chan, Kevin Wei, and others

Proposes the technical and institutional infrastructure needed to support safe and beneficial deployment of autonomous AI agents at scale.

arXiv

2021

2021 Nature

Cooperative AI: Machines Must Learn to Find Common Ground

Nature, Vol. 593

With Allan Dafoe, Yoram Bachrach, Eric Horvitz, Kate Larson, Thore Graepel

Establishes the research agenda for "Cooperative AI"—building AI systems capable of cooperating with humans and other AI agents to solve shared problems.

Read on Nature

AI Governance

Regulatory frameworks and institutions for advanced AI

2025

2025 Jurimetrics

Regulatory Markets: The Future of AI Governance

Jurimetrics: The Journal of Law, Science, and Technology, Winter 2026

With Jack Clark

Proposes regulatory markets—a governance mechanism where governments require AI companies to purchase regulatory services from government-licensed private regulators—to overcome limitations of both command-and-control regulation and industry self-regulation.

Jurimetrics | arXiv

2024

2024 Science

Managing Extreme AI Risks Amid Rapid Progress

Science, Vol. 384

With Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, and 20+ others

A landmark call for urgent governance measures to address catastrophic risks from rapidly advancing AI systems.

Read on Science

2024 Science

Regulating Advanced Artificial Agents

Science, Vol. 384

With Michael K. Cohen, Noam Kolt, Yoshua Bengio, Stuart Russell

Addresses the challenge of governing AI agents that can act autonomously in the world, proposing regulatory approaches tailored to agentic AI systems.

Read on Science

2024 AIES

Responsible Reporting for Frontier AI Development

AAAI/ACM Conference on AI, Ethics, and Society

With Noam Kolt, Markus Anderljung, Joslyn Barnhart, and others

Proposes standards and practices for how frontier AI developers should report capabilities, risks, and safety measures to regulators and the public.

arXiv

2024 arXiv

Computing Power and the Governance of Artificial Intelligence

arXiv:2402.08797

With Girish Sastry, Lennart Heim, Markus Anderljung, Miles Brundage, and others

Analyzes how compute resources can serve as a lever for AI governance, examining tracking, allocation, and control mechanisms for computational infrastructure.

arXiv

2024 arXiv

AI Model Registries: A Foundational Tool for AI Governance

arXiv:2410.09645

With Elliot McKernon, Gwyn Glasser, Deric Cheng

Proposes national registries for frontier AI models to enhance governance, drawing parallels to analogous industries while balancing safety oversight with innovation support.

arXiv

2023

2023 arXiv

Frontier AI Regulation: Managing Emerging Risks to Public Safety

arXiv:2307.03718

With Markus Anderljung, Anton Korinek, and others

Outlines a framework for regulating frontier AI systems based on their potential risks to public safety, drawing parallels to regulation in other high-risk industries.

arXiv

2023 arXiv

International Institutions for Advanced AI

arXiv:2307.04699

With Lewis Ho, Robert Trager, Yoshua Bengio, Miles Brundage, and others

Explores the design of international governance institutions needed to manage risks from advanced AI, drawing on lessons from nuclear nonproliferation and other domains.

arXiv

2022

2022 Stanford

Gathering Strength, Gathering Storms: The AI100 2021 Study Panel Report

Stanford University

With Michael L. Littman, Ifeoma Ajunwa, Finale Doshi-Velez, and others

A comprehensive assessment of the state of AI and its societal implications as part of Stanford's 100-year study on artificial intelligence.

arXiv

2020

2020 arXiv

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

arXiv:2004.07213

With Miles Brundage, Shahar Avin, and 50+ others

Proposes institutional and technical mechanisms that would allow AI developers to make credible, verifiable claims about the safety and capabilities of their systems.

arXiv

2019

2019 arXiv

The Role of Cooperation in Responsible AI Development

arXiv:1907.04534

With Amanda Askell, Miles Brundage

Argues that cooperation among AI developers is essential for responsible development and proposes mechanisms to support collaborative safety efforts.

arXiv