With eds. Sven Nyholm, Atoosa Kasirzadeh, John Zerilli
Wiley-Blackwell
Argues that effective AI governance requires AI systems capable of understanding and reasoning about human normative systems—not just following explicit rules—to truly participate in the complex equilibrium of human values and norms.
2025
arXiv
arXiv:2503.00069
With Karolina Stańczak, Nicholas Meade, Timothy P. Lillicrap, and others
Demonstrates that grounding LLM alignment in frameworks developed for societal-level coordination can improve alignment outcomes.
2025
Phil Trans B
Philosophical Transactions of the Royal Society B
With Sarah Mathew, Danson Mwangi, Samir Reynolds
Based on vignette experiments with 369 Turkana participants in Kenya, demonstrates how metanorms—rules that govern the process by which norms are interpreted, changed and enforced—enable societies to balance normative stability and adaptability through their dispute resolution institutions.
2024
arXiv
Transactions on Machine Learning Research
With Atrisha Sarkar, Andrei Muresanu, and others
Proposes an architecture enabling AI agents to learn, represent, and reason about social norms in ways that support cooperation in multi-agent environments.
2023
Science Advances
Science Advances
With Aparna Balagopalan, David Madras, Dylan Hadfield-Menell, and others
Shows that standard ML data labeling practices are inadequate when models are used to make normative judgments about humans, and proposes alternative approaches.
2022
PNAS
PNAS, Vol. 119(3)
With Raphael Köster, Dylan Hadfield-Menell, Joel Z. Leibo, and others
Demonstrates experimentally that arbitrary ("silly") rules help artificial agents learn to comply with and enforce norms more effectively—a key insight for building normatively competent AI.
2019
AIES
AAAI/ACM Conference on AI, Ethics, and Society
With Dylan Hadfield-Menell
Reframes the AI alignment problem through the lens of incomplete contract theory, showing how legal and economic insights about managing incomplete specifications apply to aligning AI with human values.
2019
AIES
AAAI/ACM Conference on AI, Ethics, and Society
With Dylan Hadfield-Menell, McKane Andrus
Argues that training AI in environments with clearly legible (even arbitrary) normative structure helps develop the general capacity to recognize and follow norms.
2025
Nature Human Behaviour
Nature Human Behaviour
With Christopher Summerfield, Lisa Argyle, and others
Assesses the potential impacts—both positive and negative—of advanced AI systems on democratic institutions, processes, and participation.
2025
arXiv
arXiv:2502.14143
With Lewis Hammond, Alan Chan, and others
Analyzes risks that emerge specifically from interactions among multiple advanced AI systems, including coordination failures, conflicts, and emergent behaviors.
2025
arXiv
arXiv:2509.05396
With Andrea Wynn, Harsh Satija
Investigates how capability diversity influences multi-agent interactions, demonstrating that debate can decrease accuracy when agents favor agreement over challenging flawed reasoning.
2025
NBER
Economics of Transformative AI, NBER
With Andrew Koh
Examines how economic principles apply to a world where AI agents transact, cooperate, and compete, and what governance structures such an economy requires.
2025
arXiv
arXiv:2501.10114
With Alan Chan, Kevin Wei, and others
Proposes the technical and institutional infrastructure needed to support safe and beneficial deployment of autonomous AI agents at scale.
2021
Nature
Nature, Vol. 593
With Allan Dafoe, Yoram Bachrach, Eric Horvitz, Kate Larson, Thore Graepel
Establishes the research agenda for "Cooperative AI"—building AI systems capable of cooperating with humans and other AI agents to solve shared problems.
2025
Jurimetrics
Jurimetrics: The Journal of Law, Science, and Technology, Winter 2026
With Jack Clark
Proposes regulatory markets—a governance mechanism where governments require AI companies to purchase regulatory services from government-licensed private regulators—to overcome limitations of both command-and-control regulation and industry self-regulation.
2024
Science
Science, Vol. 384
With Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, and 20+ others
A landmark call for urgent governance measures to address catastrophic risks from rapidly advancing AI systems.
2024
Science
Science, Vol. 384
With Michael K. Cohen, Noam Kolt, Yoshua Bengio, Stuart Russell
Addresses the challenge of governing AI agents that can act autonomously in the world, proposing regulatory approaches tailored to agentic AI systems.
2024
AIES
AAAI/ACM Conference on AI, Ethics, and Society
With Noam Kolt, Markus Anderljung, Joslyn Barnhart, and others
Proposes standards and practices for how frontier AI developers should report capabilities, risks, and safety measures to regulators and the public.
2024
arXiv
arXiv:2402.08797
With Girish Sastry, Lennart Heim, Markus Anderljung, Miles Brundage, and others
Analyzes how compute resources can serve as a lever for AI governance, examining tracking, allocation, and control mechanisms for computational infrastructure.
2024
arXiv
arXiv:2410.09645
With Elliot McKernon, Gwyn Glasser, Deric Cheng
Proposes national registries for frontier AI models to enhance governance, drawing parallels to analogous industries while balancing safety oversight with innovation support.
2023
arXiv
arXiv:2307.03718
With Markus Anderljung, Anton Korinek, and others
Outlines a framework for regulating frontier AI systems based on their potential risks to public safety, drawing parallels to regulation in other high-risk industries.
2023
arXiv
arXiv:2307.04699
With Lewis Ho, Robert Trager, Yoshua Bengio, Miles Brundage, and others
Explores the design of international governance institutions needed to manage risks from advanced AI, drawing on lessons from nuclear nonproliferation and other domains.
2022
Stanford
Stanford University
With Michael L. Littman, Ifeoma Ajunwa, Finale Doshi-Velez, and others
A comprehensive assessment of the state of AI and its societal implications as part of Stanford's 100-year study on artificial intelligence.
2020
arXiv
arXiv:2004.07213
With Miles Brundage, Shahar Avin, and 50+ others
Proposes institutional and technical mechanisms that would allow AI developers to make credible, verifiable claims about the safety and capabilities of their systems.
2019
arXiv
arXiv:1907.04534
With Amanda Askell, Miles Brundage
Argues that cooperation among AI developers is essential for responsible development and proposes mechanisms to support collaborative safety efforts.