As AI systems gain prominence in society, it becomes crucial to address concerns about their safety. There have been repeated calls to align powerful AI systems with human morality. However, attempts to do so have relied on black-box systems that cannot be interpreted or explained. In response, we introduce a methodology that combines the natural language processing abilities of large language models (LLMs) with the interpretability of symbolic models to form competitive neuro-symbolic models for predicting human moral judgment. Our method uses LLMs to extract morally relevant features from a stimulus and then passes those features through a cognitive model that predicts human moral judgment. This approach achieves state-of-the-art performance on the MoralExceptQA benchmark, improving on the previous best F1 score by 20 points and accuracy by 18 points, while also enhancing interpretability by exposing all key features in the model's computation. We propose future directions for harnessing LLMs to develop more capable and interpretable neuro-symbolic models, emphasizing the critical role of interpretability in facilitating the safe integration of AI systems into society.
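The two-stage pipeline described in the abstract (an LLM extracts features, then an interpretable model turns them into a judgment) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature names, the keyword-based `stub_feature_extractor` (a hypothetical stand-in for a real LLM call), the logistic `LogisticJudgmentModel` and its hand-picked weights are all hypothetical, and treating "permissible" as the positive class in the accuracy/F1 helper is also an assumption. What the sketch does show is the structural point: because the judgment is computed from a small set of named features, each feature's contribution to a prediction can be read off directly.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical feature names; the abstract does not list the actual features.
FEATURES = ("harm_to_others", "benefit_to_agent", "rule_purpose_violated")


def stub_feature_extractor(scenario: str) -> Dict[str, float]:
    """Hypothetical stand-in for the LLM stage: maps a scenario to feature values.

    A real system would prompt an LLM; keyword matching keeps this sketch offline.
    """
    text = scenario.lower()
    return {
        "harm_to_others": 1.0 if ("hurt" in text or "damage" in text) else 0.0,
        "benefit_to_agent": 1.0 if ("faster" in text or "save" in text) else 0.0,
        "rule_purpose_violated": 1.0 if "defeats the purpose" in text else 0.0,
    }


@dataclass
class LogisticJudgmentModel:
    """Interpretable stand-in for the symbolic model: one weight per named feature."""
    weights: Dict[str, float]
    bias: float

    def prob_permissible(self, features: Dict[str, float]) -> float:
        # Weighted sum of features passed through a logistic (sigmoid) function.
        z = self.bias + sum(self.weights[n] * features[n] for n in FEATURES)
        return 1.0 / (1.0 + math.exp(-z))

    def explain(self, features: Dict[str, float]) -> Dict[str, float]:
        # Per-feature contribution to the logit: the "bared" computation.
        return {n: self.weights[n] * features[n] for n in FEATURES}


# Hypothetical weights chosen for illustration only.
MODEL = LogisticJudgmentModel(
    weights={"harm_to_others": -3.0, "benefit_to_agent": 1.0, "rule_purpose_violated": -2.0},
    bias=0.5,
)


def predict(scenario: str,
            extractor: Callable[[str], Dict[str, float]],
            model: LogisticJudgmentModel,
            threshold: float = 0.5) -> Tuple[bool, float, Dict[str, float]]:
    """Run extractor then model; return (judged permissible, probability, contributions)."""
    features = extractor(scenario)
    p = model.prob_permissible(features)
    return p >= threshold, p, model.explain(features)


def accuracy_and_f1(preds: List[bool], labels: List[bool]) -> Tuple[float, float]:
    """Accuracy and F1, treating True ("permissible") as the positive class."""
    tp = sum(1 for p, l in zip(preds, labels) if p and l)
    fp = sum(1 for p, l in zip(preds, labels) if p and not l)
    fn = sum(1 for p, l in zip(preds, labels) if not p and l)
    acc = sum(1 for p, l in zip(preds, labels) if p == l) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


if __name__ == "__main__":
    for s in ("A man skips the queue to get his coffee faster.",
              "Cutting the line would damage trust and defeats the purpose of the line."):
        ok, p, contrib = predict(s, stub_feature_extractor, MODEL)
        print(f"{ok} p={p:.2f} {contrib}")
```

With these hypothetical weights, the first scenario (benefit to the agent only) is judged permissible with probability of about 0.82, while the second (damage plus a defeated rule purpose) is not; `explain` shows which features drove each prediction. Swapping the stub for an actual LLM call would leave the interpretable second stage unchanged.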
Author Information
Joseph Kwon (MIT)
Sydney Levine (Massachusetts Institute of Technology)
Josh Tenenbaum (MIT)
Joshua Brett Tenenbaum is Professor of Cognitive Science and Computation at the Massachusetts Institute of Technology. He is known for contributions to mathematical psychology and Bayesian cognitive science. He previously taught at Stanford University, where he was the Wasow Visiting Fellow from October 2010 to January 2011. Tenenbaum received his undergraduate degree in physics from Yale University in 1993, and his Ph.D. from MIT in 1999. His work primarily focuses on analyzing probabilistic inference as the engine of human cognition and as a means to develop machine learning.
More from the Same Authors
2023: Neuro-Symbolic Models of Human Moral Judgment: LLMs as Automatic Feature Extractors
Joseph Kwon · Sydney Levine · Josh Tenenbaum
2023: Building Community Driven Libraries of Natural Programs
Leonardo Hernandez Cano · Yewen Pu · Robert Hawkins · Josh Tenenbaum · Armando Solar-Lezama
2023: Inferring the Future by Imagining the Past
Kartik Chandra · Tony Chen · Tzu-Mao Li · Jonathan Ragan-Kelley · Josh Tenenbaum
2023: Inferring the Goals of Communicating Agents from Actions and Instructions
Lance Ying · Tan Zhi-Xuan · Vikash Mansinghka · Josh Tenenbaum
2023: The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling probabilistic social inferences from linguistic inputs
Lance Ying · Katie Collins · Megan Wei · Cedegao Zhang · Tan Zhi-Xuan · Adrian Weller · Josh Tenenbaum · Catherine Wong
2023 Oral: Inferring Relational Potentials in Interacting Systems
Armand Comas · Yilun Du · Christian Fernandez Lopez · Sandesh Ghimire · Mario Sznaier · Josh Tenenbaum · Octavia Camps
2023 Poster: On the Complexity of Bayesian Generalization
Yu-Zhe Shi · Manjie Xu · John Hopcroft · Kun He · Josh Tenenbaum · Song-Chun Zhu · Ying Nian Wu · Wenjuan Han · Yixin Zhu
2023 Poster: Inferring Relational Potentials in Interacting Systems
Armand Comas · Yilun Du · Christian Fernandez Lopez · Sandesh Ghimire · Mario Sznaier · Josh Tenenbaum · Octavia Camps
2023 Poster: Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC
Yilun Du · Conor Durkan · Robin Strudel · Josh Tenenbaum · Sander Dieleman · Rob Fergus · Jascha Sohl-Dickstein · Arnaud Doucet · Will Grathwohl
2023 Poster: Learning Neural Constitutive Laws from Motion Observations for Generalizable PDE Dynamics
Pingchuan Ma · Peter Yichen Chen · Bolei Deng · Josh Tenenbaum · Tao Du · Chuang Gan · Wojciech Matusik
2022 Poster: Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning
Aviv Netanyahu · Tianmin Shu · Josh Tenenbaum · Pulkit Agrawal
2022 Spotlight: Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning
Aviv Netanyahu · Tianmin Shu · Josh Tenenbaum · Pulkit Agrawal
2022 Poster: Planning with Diffusion for Flexible Behavior Synthesis
Michael Janner · Yilun Du · Josh Tenenbaum · Sergey Levine
2022 Oral: Planning with Diffusion for Flexible Behavior Synthesis
Michael Janner · Yilun Du · Josh Tenenbaum · Sergey Levine
2022 Poster: Learning Iterative Reasoning through Energy Minimization
Yilun Du · Shuang Li · Josh Tenenbaum · Igor Mordatch
2022 Poster: Prompting Decision Transformer for Few-Shot Policy Generalization
Mengdi Xu · Yikang Shen · Shun Zhang · Yuchen Lu · Ding Zhao · Josh Tenenbaum · Chuang Gan
2022 Spotlight: Learning Iterative Reasoning through Energy Minimization
Yilun Du · Shuang Li · Josh Tenenbaum · Igor Mordatch
2022 Spotlight: Prompting Decision Transformer for Few-Shot Policy Generalization
Mengdi Xu · Yikang Shen · Shun Zhang · Yuchen Lu · Ding Zhao · Josh Tenenbaum · Chuang Gan
2021 Poster: A large-scale benchmark for few-shot program induction and synthesis
Ferran Alet · Javier Lopez-Contreras · James Koppel · Maxwell Nye · Armando Solar-Lezama · Tomas Lozano-Perez · Leslie Kaelbling · Josh Tenenbaum
2021 Spotlight: A large-scale benchmark for few-shot program induction and synthesis
Ferran Alet · Javier Lopez-Contreras · James Koppel · Maxwell Nye · Armando Solar-Lezama · Tomas Lozano-Perez · Leslie Kaelbling · Josh Tenenbaum
2021 Poster: AGENT: A Benchmark for Core Psychological Reasoning
Tianmin Shu · Abhishek Bhandwaldar · Chuang Gan · Kevin Smith · Shari Liu · Dan Gutfreund · Elizabeth Spelke · Josh Tenenbaum · Tomer Ullman
2021 Spotlight: AGENT: A Benchmark for Core Psychological Reasoning
Tianmin Shu · Abhishek Bhandwaldar · Chuang Gan · Kevin Smith · Shari Liu · Dan Gutfreund · Elizabeth Spelke · Josh Tenenbaum · Tomer Ullman
2021 Poster: Improved Contrastive Divergence Training of Energy-Based Models
Yilun Du · Shuang Li · Josh Tenenbaum · Igor Mordatch
2021 Poster: Leveraging Language to Learn Program Abstractions and Search Heuristics
Catherine Wong · Kevin Ellis · Josh Tenenbaum · Jacob Andreas
2021 Spotlight: Leveraging Language to Learn Program Abstractions and Search Heuristics
Catherine Wong · Kevin Ellis · Josh Tenenbaum · Jacob Andreas
2021 Spotlight: Improved Contrastive Divergence Training of Energy-Based Models
Yilun Du · Shuang Li · Josh Tenenbaum · Igor Mordatch
2020 Poster: Visual Grounding of Learned Physical Models
Yunzhu Li · Toru Lin · Kexin Yi · Daniel Bear · Daniel Yamins · Jiajun Wu · Josh Tenenbaum · Antonio Torralba
2019 Poster: Learning to Infer Program Sketches
Maxwell Nye · Luke Hewitt · Josh Tenenbaum · Armando Solar-Lezama
2019 Oral: Learning to Infer Program Sketches
Maxwell Nye · Luke Hewitt · Josh Tenenbaum · Armando Solar-Lezama
2019 Poster: Infinite Mixture Prototypes for Few-shot Learning
Kelsey Allen · Evan Shelhamer · Hanul Shin · Josh Tenenbaum
2019 Oral: Infinite Mixture Prototypes for Few-shot Learning
Kelsey Allen · Evan Shelhamer · Hanul Shin · Josh Tenenbaum
2019 Poster: Neurally-Guided Structure Inference
Sidi Lu · Jiayuan Mao · Josh Tenenbaum · Jiajun Wu
2019 Oral: Neurally-Guided Structure Inference
Sidi Lu · Jiayuan Mao · Josh Tenenbaum · Jiajun Wu
2018 Invited Talk: Building Machines that Learn and Think Like People
Josh Tenenbaum