Workshop
Models of Human Feedback for AI Alignment
Thomas Kleine Buening · Harshit Sikchi · Christos Dimitrakakis · Scott Niekum · Constantin Rothkopf · Aadirupa Saha · Lirong Xia
Schubert 4 - 6
Fri 26 Jul, midnight PDT
Aligning AI agents with human intentions and values is one of the main obstacles to the safe and ethical deployment of AI systems in the real world. Current approaches mostly rely on highly questionable assumptions about the meaning of observed human feedback or interactions. These include assumptions about rationality in decision-making and belief formation, homogeneity of the population, and other restrictive assumptions about the feedback itself. However, the role of such modeling assumptions has mostly been neglected in the literature on AI alignment. In this workshop, we want to bring together researchers from various disciplines besides ML, including computational social choice, behavioral psychology, and economics, to share experiences and perspectives on models of human feedback and their importance for human-AI alignment and collaboration.