Neuro-Symbolic Models of Human Moral Judgment: LLMs as Automatic Feature Extractors
Joseph Kwon · Sydney Levine · Josh Tenenbaum
Event URL: https://openreview.net/forum?id=KKzm2S1Pfl

As AI systems gain prominence in society, concerns about their safety become crucial to address. There have been repeated calls to align powerful AI systems with human morality. However, attempts to do this have relied on black-box systems that cannot be interpreted or explained. In response, we introduce a methodology that leverages the natural language processing abilities of large language models (LLMs) and the interpretability of symbolic models to form competitive neuro-symbolic models for predicting human moral judgment. Our method uses LLMs to extract morally-relevant features from a stimulus and then passes those features through a cognitive model that predicts human moral judgment. This approach achieves state-of-the-art performance on the MoralExceptQA benchmark, improving on the previous F1 score by 20 points and accuracy by 18 points, while also enhancing interpretability by exposing every key feature in the model's computation.
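The two-stage pipeline described above can be sketched in a few lines. Everything here is a hypothetical illustration, not the authors' actual model: the feature names, weights, and the stubbed LLM call are all assumptions made for the sketch, and the "cognitive model" is reduced to a simple weighted-sum rule to show how interpretability falls out of the structure.

```python
# Hypothetical sketch of the neuro-symbolic pipeline:
# Stage 1 (neural): an LLM extracts morally-relevant features from a scenario.
# Stage 2 (symbolic): an interpretable model maps those features to a judgment.
# Feature names, weights, and scores below are illustrative assumptions.

def extract_features(scenario: str) -> dict:
    """Stage 1: score morally-relevant features of a scenario.
    In practice each score would come from prompting an LLM
    (e.g. "On a scale of 0 to 1, how much harm does this action cause?");
    here the call is stubbed with fixed scores for illustration."""
    return {"harm": 0.2, "benefit": 0.9, "rule_violation": 0.7}

def predict_judgment(features: dict, weights: dict) -> float:
    """Stage 2: an interpretable symbolic model (here, a weighted sum
    clamped to [0, 1]) predicts the probability that people judge
    the action morally permissible."""
    score = sum(weights[name] * value for name, value in features.items())
    return max(0.0, min(1.0, score))

# Illustrative weights: harm and rule violations lower permissibility,
# benefit raises it. Every feature's contribution is visible by inspection.
weights = {"harm": -0.5, "benefit": 0.6, "rule_violation": -0.3}
features = extract_features("Cutting in line to help someone in urgent need")
print(round(predict_judgment(features, weights), 2))
```

Because the symbolic stage is a transparent function of named features, each prediction can be explained by pointing at the feature scores and their weights, which is the interpretability benefit the abstract highlights.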

Author Information

Joseph Kwon (MIT)
Sydney Levine (Massachusetts Institute of Technology)
Josh Tenenbaum (MIT)

Joshua Brett Tenenbaum is Professor of Cognitive Science and Computation at the Massachusetts Institute of Technology. He is known for contributions to mathematical psychology and Bayesian cognitive science. He previously taught at Stanford University, where he was the Wasow Visiting Fellow from October 2010 to January 2011. Tenenbaum received his undergraduate degree in physics from Yale University in 1993 and his Ph.D. from MIT in 1999. His work focuses on analyzing probabilistic inference as the engine of human cognition and as a means of developing machine learning.