Research
Advancing AI alignment, multi-agent cooperation, and governance frameworks
for frontier AI systems—bridging computer science, law, economics, and social theory.
Research Overview
My research addresses fundamental questions about how to build AI systems that can operate safely,
ethically, and beneficially within human normative frameworks. This work spans three interconnected areas:
normative AI alignment, multi-agent cooperation, and governance mechanisms for advanced AI.
I draw on legal theory, economics, game theory, and computer science to develop both theoretical frameworks
and practical solutions. Central to this work is the insight that effective AI governance and alignment
require understanding how humans construct and maintain normative order—from informal social norms to
formal legal systems.
Normative AI Alignment
How do we build AI systems that can navigate the complex, context-dependent normative systems that
govern human behavior? I explore how insights from economics and evolutionary theory can inform AI
alignment—focusing on how AI can learn to recognize, reason about, and operate within human rules, norms,
and values. This includes developing training environments and architectures that enable AI to become
"normatively competent."
Key Questions
- How can AI systems learn to understand and operate in human normative frameworks?
- What role can legal reasoning and contract theory play in AI alignment?
- How do we train AI to recognize normative infrastructure in its environment?
- What makes rules "legible" to learning agents, and why do silly rules help?
TMLR
2024
With Atrisha Sarkar, Andrei Muresanu, and others
Proposes an architecture enabling AI agents to learn, represent, and reason about social
norms in ways that support cooperation in multi-agent environments.
Science Advances
2023
With Aparna Balagopalan, David Madras, David H. Yang, Dylan Hadfield-Menell, Julia Stoyanovich
Shows that standard ML data labeling practices are inadequate when models are used to make
normative judgments about humans, and demonstrates the need to distinguish factual from
normative labels, with implications for fairness.
PNAS
2022
With Raphael Köster, Dylan Hadfield-Menell, Joel Z. Leibo, and others
Demonstrates experimentally that arbitrary ("silly") rules help artificial agents learn to
comply with and enforce norms more effectively—a key insight for building normatively
competent AI.
Multi-Agent Cooperation
As AI systems become more numerous and autonomous, understanding how they interact becomes critical. I study cooperation, coordination, and conflict among AI agents. This work draws on law, game theory, and mechanism design to move beyond current approaches that lead to misaligned agent behavior, preventing harmful collusion while enabling productive cooperation in multi-agent systems.
Key Questions
- How can we build AI systems capable of grounded argumentation?
- How can we build AI systems that cooperate effectively with each other and with humans?
- What risks emerge from multi-agent interactions, and how can we mitigate them?
arXiv
2025
With Lewis Hammond, Alan Chan, and others
Analyzes risks that emerge specifically from interactions among multiple advanced AI
systems, including coordination failures, conflicts, and emergent behaviors.
TMLR
2025
With Alan Chan, Kevin Wei, and others
Proposes the technical and institutional infrastructure needed to support safe and
beneficial deployment of autonomous AI agents at scale.
NBER
2025
With Andrew Koh
Examines how economic principles apply to a world where AI agents transact, cooperate,
and compete, and what governance structures such an economy requires.
AI Governance
How do we govern AI systems that evolve faster than traditional regulatory frameworks can adapt? I develop novel governance mechanisms, including regulatory markets, that can keep pace with rapid AI advancement while managing risks. This work addresses the shortcomings of brittle regulatory approaches that cannot match the pace of frontier AI model development.
Key Questions
- How can regulatory markets enable adaptive governance of rapidly evolving AI?
- What international institutions are needed to coordinate AI safety efforts globally?
- What technical infrastructure is necessary to enable safe, democratic AI development?
arXiv
2026
With Noam Kolt, Nicholas Caputo, Jonathan Zittrain, and others
Explores how legal rules, principles, and methods can address AI alignment challenges
through compliance mechanisms, interpretive frameworks, and structural blueprints for
reliable behavior.
Jurimetrics
2025
With Jack Clark
Proposes regulatory markets—where governments require AI companies to purchase regulatory
services from licensed private regulators—as a solution that overcomes the limitations
of both command-and-control regulation and industry self-regulation.
Science
2024
With Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, and 20+ others
A landmark consensus paper describing extreme risks from advanced AI systems and outlining
a comprehensive plan combining technical research with adaptive governance mechanisms.
Science
2024
With Michael K. Cohen, Noam Kolt, Yoshua Bengio, Stuart Russell
Addresses the unique governance challenges posed by AI agents that can act autonomously
in the world, proposing regulatory approaches tailored to agentic systems.
arXiv
2023
With Markus Anderljung, Anton Korinek, and others
Proposes three building blocks for frontier AI regulation: standard-setting, registration
and reporting requirements, and compliance mechanisms tailored to AI's unique characteristics.
Explore Publications
View publications organized by research area, with full abstracts, citation information, and links to papers.