Timezone: »
Large language models have developed at a breathtaking pace, quickly advancing in their ability to generate, summarize, and work with long and short-form text. As these advances become further integrated into society, however, it becomes necessary to question and evaluate how well these models are actually capable of true reasoning, rather than simply mimicking their large training corpora. We argue that eliciting reasoning from language models is the new "explainability method" and introduce CReDETS, a novel and first-of-its-kind causal reasoning dataset with annotated hand-written explanations. We benchmark the latest and most powerful generation of transformer neural network models GPT-3, GPT-3.5 (chatGPT), and GPT-4 and discuss their accuracy, coherence, and consistency. Our staggering results show that even the most recent LLMs have stark weaknesses in reasoning ability that must be ameliorated before they can be integrated with public-facing applications worldwide.
Author Information
Isha Puri (MIT)
Hima Lakkaraju (Harvard)
More from the Same Authors
-
2021 : Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
· Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju -
2021 : On the Connections between Counterfactual Explanations and Adversarial Examples »
· Martin Pawelczyk · Shalmali Joshi · Chirag Agarwal · Sohini Upadhyay · Hima Lakkaraju -
2021 : Towards a Rigorous Theoretical Analysis and Evaluation of GNN Explanations »
· Chirag Agarwal · Marinka Zitnik · Hima Lakkaraju -
2021 : What will it take to generate fairness-preserving explanations? »
· Jessica Dai · Sohini Upadhyay · Hima Lakkaraju -
2021 : Feature Attributions and Counterfactual Explanations Can Be Manipulated »
· Dylan Slack · Sophie Hilgard · Sameer Singh · Hima Lakkaraju -
2021 : On the Connections between Counterfactual Explanations and Adversarial Examples »
Martin Pawelczyk · Shalmali Joshi · Chirag Agarwal · Sohini Upadhyay · Hima Lakkaraju -
2021 : Towards Robust and Reliable Algorithmic Recourse »
Sohini Upadhyay · Shalmali Joshi · Hima Lakkaraju -
2021 : Reliable Post hoc Explanations: Modeling Uncertainty in Explainability »
Dylan Slack · Sophie Hilgard · Sameer Singh · Hima Lakkaraju -
2021 : Towards a Unified Framework for Fair and Stable Graph Representation Learning »
Chirag Agarwal · Hima Lakkaraju · Marinka Zitnik -
2021 : Reliable Post hoc Explanations: Modeling Uncertainty in Explainability »
Dylan Slack · Sophie Hilgard · Sameer Singh · Hima Lakkaraju -
2023 : Fair Machine Unlearning: Data Removal while Mitigating Disparities »
Alex Oesterling · Jiaqi Ma · Flavio Calmon · Hima Lakkaraju -
2023 : Himabindu Lakkaraju - Regulating Explainable AI: Technical Challenges and Opportunities »
Hima Lakkaraju -
2023 : Efficient Estimation of Local Robustness of Machine Learning Models »
Tessa Han · Suraj Srinivas · Hima Lakkaraju -
2023 Tutorial: Responsible AI for Generative AI in Practice: Lessons Learned and Open Challenges »
Krishnaram Kenthapadi · Hima Lakkaraju · Nazneen Rajani -
2022 Workshop: New Frontiers in Adversarial Machine Learning »
Sijia Liu · Pin-Yu Chen · Dongxiao Zhu · Eric Wong · Kathrin Grosse · Hima Lakkaraju · Sanmi Koyejo -
2021 Workshop: ICML Workshop on Algorithmic Recourse »
Stratis Tsirtsis · Amir-Hossein Karimi · Ana Lucic · Manuel Gomez-Rodriguez · Isabel Valera · Hima Lakkaraju -
2021 : Towards Robust and Reliable Model Explanations for Healthcare »
Hima Lakkaraju -
2021 Poster: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju -
2021 Spotlight: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju -
2020 Poster: Robust and Stable Black Box Explanations »
Hima Lakkaraju · Nino Arsov · Osbert Bastani