Legal Alignment for Safe and Ethical AI
Selected from my current research on Normativity and AI Alignment, Cooperative AI, and AI Governance. A complete list of publications is available on Google Scholar.
Building AI that understands and operates within human normative systems
Develops the concept of "legal alignment"—training AI systems to understand and operate within legal frameworks as a path to broader normative alignment.
Argues that effective AI governance requires AI systems capable of understanding and reasoning about human normative systems—not just following explicit rules—to truly participate in the complex equilibrium of human values and norms.
Demonstrates that grounding LLM alignment in frameworks developed for societal-level coordination can improve alignment outcomes.
Based on vignette experiments with 369 Turkana participants in Kenya, demonstrates how metanorms—rules that govern how norms are interpreted, changed, and enforced—enable societies to balance normative stability and adaptability through their dispute-resolution institutions.
Proposes an architecture enabling AI agents to learn, represent, and reason about social norms in ways that support cooperation in multi-agent environments.
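To make this concrete, here is a minimal sketch of one component such an architecture needs, written under my own simplifying assumptions (the NormStore class and its method names are illustrative, not drawn from the paper): an agent maintains an explicit store of inferred norms, updates it when it observes others being sanctioned, and consults it before acting.

```python
from collections import defaultdict

class NormStore:
    """Illustrative store of inferred social norms.

    A norm is represented as (context, action) -> strength, where
    strength estimates how reliably the group sanctions that action
    in that context.
    """

    def __init__(self, threshold: float = 0.5, lr: float = 0.1):
        self.strength = defaultdict(float)  # (context, action) -> [0, 1]
        self.threshold = threshold          # strength above which we treat it as a norm
        self.lr = lr                        # update rate for new observations

    def observe(self, context: str, action: str, sanctioned: bool) -> None:
        """Update the norm estimate after watching another agent act."""
        key = (context, action)
        target = 1.0 if sanctioned else 0.0
        self.strength[key] += self.lr * (target - self.strength[key])

    def is_prohibited(self, context: str, action: str) -> bool:
        """Consult the store before choosing an action."""
        return self.strength[(context, action)] > self.threshold


# Toy usage: the agent repeatedly sees eating berry B punished at the feast.
norms = NormStore()
for _ in range(10):
    norms.observe("feast", "eat_berry_B", sanctioned=True)
norms.observe("feast", "eat_berry_A", sanctioned=False)

assert norms.is_prohibited("feast", "eat_berry_B")
assert not norms.is_prohibited("feast", "eat_berry_A")
```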
Shows that standard ML data labeling practices are inadequate when models are used to make normative judgments about humans, and proposes alternative approaches.
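A toy contrast (my own illustration, not the paper's method) shows why the standard practice fails: collapsing annotator judgments to a single majority label erases exactly the disagreement that normative judgments carry.

```python
from collections import Counter

# Hypothetical annotations of one item on a normative question.
annotations = ["toxic", "toxic", "not_toxic", "toxic", "not_toxic"]

# Standard practice: majority vote collapses the judgment to one label.
majority_label = Counter(annotations).most_common(1)[0][0]

# Alternative: keep the distribution, so a model can represent
# that reasonable annotators disagree on this item.
soft_label = {k: v / len(annotations) for k, v in Counter(annotations).items()}

print("majority:", majority_label)  # 'toxic'
print("soft label:", soft_label)    # {'toxic': 0.6, 'not_toxic': 0.4}
```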
Demonstrates experimentally that arbitrary ("silly") rules help artificial agents learn to comply with and enforce norms more effectively—a key insight for building normatively competent AI.
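One intuition for the mechanism, modeled here under my own simplifying assumptions rather than in the paper's multi-agent environment: arbitrary taboos multiply the occasions on which agents witness violations, giving them more practice at detecting and sanctioning them.

```python
import random

# Toy model: enforcement skill improves with each observed violation,
# and more rules in force mean more chances to observe one.
def enforcement_skill(n_rules: int, steps: int = 1000,
                      p_violation_per_rule: float = 0.01,
                      lr: float = 0.05, seed: int = 0) -> float:
    rng = random.Random(seed)
    skill = 0.0
    # Probability that at least one of the rules is violated this step.
    p_any_violation = 1 - (1 - p_violation_per_rule) ** n_rules
    for _ in range(steps):
        if rng.random() < p_any_violation:
            skill += lr * (1.0 - skill)  # practice improves skill
    return skill

print("important rules only:", round(enforcement_skill(n_rules=2), 3))
print("plus silly rules:    ", round(enforcement_skill(n_rules=10), 3))
```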
Reframes the AI alignment problem through the lens of incomplete contract theory, showing how legal and economic insights about managing incomplete specifications apply to aligning AI with human values.
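A two-line worked example of the analogy, in my own framing rather than the paper's formalism: like a party to an incomplete contract performing to the letter rather than the spirit, an agent given a proxy reward written only over observable features will exploit the gaps when optimizing literally.

```python
candidates = {
    # action: (proxy_reward, true_value_to_designer)
    "clean_room":           (1.0,  1.0),
    "shove_mess_in_closet": (1.2, -0.5),  # scores higher: faster, looks clean
}

# The agent maximizes the written proxy exactly.
choice = max(candidates, key=lambda a: candidates[a][0])
print("agent picks:", choice, "| true value:", candidates[choice][1])
```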
Argues that training AI in environments with clearly legible (even arbitrary) normative structure helps develop the general capacity to recognize and follow norms.
How to make AI agents that interact, cooperate, and coordinate
Assesses the potential impacts—both positive and negative—of advanced AI systems on democratic institutions, processes, and participation.
Analyzes risks that emerge specifically from interactions among multiple advanced AI systems, including coordination failures, conflicts, and emergent behaviors.
Investigates how capability diversity influences multi-agent interactions, demonstrating that debate can decrease accuracy when agents favor agreement over challenging flawed reasoning.
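A toy simulation of the finding above, under my own assumptions rather than the paper's protocol: a mixed-capability panel debates a binary question, and agents who defer to the running majority lock in early errors, while agents who actually scrutinize arguments let a correct answer, once voiced, spread.

```python
import random

def debate_accuracy(competences, agreeableness, persuasion=0.75,
                    rounds=3, trials=20000, seed=0):
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Initial answers drawn from each agent's own competence (True = correct).
        answers = [rng.random() < c for c in competences]
        for _ in range(rounds):
            majority = sum(answers) * 2 > len(answers)
            voiced_correct = any(answers)  # has anyone stated the correct answer?
            answers = [
                majority if rng.random() < agreeableness            # conform
                else (True if voiced_correct and rng.random() < persuasion
                      else a)                                       # scrutinize or hold
                for a in answers
            ]
        wins += sum(answers) * 2 > len(answers)
    return wins / trials

panel = [0.9, 0.55, 0.55, 0.5, 0.5]  # one strong agent, four weak ones
print("low agreeableness: ", debate_accuracy(panel, agreeableness=0.1))
print("high agreeableness:", debate_accuracy(panel, agreeableness=0.9))
```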
Examines how economic principles apply to a world where AI agents transact, cooperate, and compete, and what governance structures such an economy requires.
Proposes the technical and institutional infrastructure needed to support safe and beneficial deployment of autonomous AI agents at scale.
Establishes the research agenda for "Cooperative AI"—building AI systems capable of cooperating with humans and other AI agents to solve shared problems.
Regulatory frameworks and institutions for advanced AI
Proposes regulatory markets—a governance mechanism where governments require AI companies to purchase regulatory services from government-licensed private regulators—to overcome limitations of both command-and-control regulation and industry self-regulation.
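As a schematic of the three-way structure, here is a minimal sketch under my own illustrative types (Government, Regulator, and AICompany are not from the paper): government sets outcome targets and licenses regulators against them, and companies may only purchase regulatory services from a licensed regulator.

```python
from dataclasses import dataclass, field

@dataclass
class Regulator:
    name: str
    measured_incident_rate: float  # the outcome the government audits

@dataclass
class Government:
    max_incident_rate: float                # outcome target, not methods
    licensed: set = field(default_factory=set)

    def review(self, reg: Regulator) -> None:
        """License, or delicense, regulators on measured outcomes."""
        if reg.measured_incident_rate <= self.max_incident_rate:
            self.licensed.add(reg.name)
        else:
            self.licensed.discard(reg.name)

@dataclass
class AICompany:
    name: str
    regulator: str = ""

    def purchase_regulation(self, reg: Regulator, gov: Government) -> bool:
        """Companies may only buy from government-licensed regulators."""
        if reg.name in gov.licensed:
            self.regulator = reg.name
            return True
        return False

gov = Government(max_incident_rate=0.01)
reg = Regulator("SafeCert", measured_incident_rate=0.005)
gov.review(reg)  # the regulator earns its license on outcomes
assert AICompany("FrontierLab").purchase_regulation(reg, gov)
```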
A landmark call for urgent governance measures to address catastrophic risks from rapidly advancing AI systems.
Addresses the challenge of governing AI agents that can act autonomously in the world, proposing regulatory approaches tailored to agentic AI systems.
Proposes standards and practices for how frontier AI developers should report capabilities, risks, and safety measures to regulators and the public.
Analyzes how compute resources can serve as a lever for AI governance, examining tracking, allocation, and control mechanisms for computational infrastructure.
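One concrete lever in this space is a training-compute reporting threshold. A back-of-envelope sketch, using the standard 6ND FLOP approximation for dense transformers and, as an example threshold, the 10^26-operation reporting line from the 2023 US Executive Order on AI:

```python
REPORTING_THRESHOLD_FLOP = 1e26  # example: the 2023 US Executive Order line

def training_flop(n_params: float, n_tokens: float) -> float:
    """Standard approximation: ~6 FLOP per parameter per training token."""
    return 6.0 * n_params * n_tokens

for params, tokens in [(7e10, 2e12), (5e11, 4e13)]:
    flop = training_flop(params, tokens)
    status = "report" if flop >= REPORTING_THRESHOLD_FLOP else "below threshold"
    print(f"{params:.0e} params x {tokens:.0e} tokens -> {flop:.1e} FLOP: {status}")
```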
Proposes national registries for frontier AI models to enhance governance, drawing parallels to registration regimes in other industries while balancing safety oversight with innovation support.
Outlines a framework for regulating frontier AI systems based on their potential risks to public safety, drawing parallels to regulation in other high-risk industries.
Explores the design of international governance institutions needed to manage risks from advanced AI, drawing on lessons from nuclear nonproliferation and other domains.
A comprehensive assessment of the state of AI and its societal implications as part of Stanford's One Hundred Year Study on Artificial Intelligence (AI100).
Proposes institutional and technical mechanisms that would allow AI developers to make credible, verifiable claims about the safety and capabilities of their systems.
Argues that cooperation among AI developers is essential for responsible development and proposes mechanisms to support collaborative safety efforts.