ICML 2023

12 Results

Workshop	Fri 12:40	Prof. Gagandeep Singh (UIUC): Trust and Safety with Certified AI Gagandeep Singh
Workshop	Sat 14:15	Using Causality to Improve Safety Throughout the AI Lifecycle Suchi Saria · Adarsh Subbaswamy
Workshop		Do Users Write More Insecure Code with AI Assistants? Neil Perry · Megha Srivastava · Deepak Kumar · Dan Boneh
Workshop		On feasibility of intent obfuscating attacks
Workshop		How vulnerable are doctors to unsafe hallucinatory AI suggestions? A framework for evaluation of safety in clinical human-AI cooperation Paul Festor · Myura Nagendran · Anthony Gordon · Matthieu Komorowski · Aldo Faisal
Workshop	Sat 18:30	SCIS 2023 Panel, The Future of Generalization: Scale, Safety and Beyond Maggie Makar · Samuel Bowman · Zachary Lipton · Adam Gleave
Workshop		On feasibility of intent obfuscating attacks ZhaoBin Li · Patrick Shafto
Workshop		Neuro-Symbolic Models of Human Moral Judgment: LLMs as Automatic Feature Extractors Joseph Kwon · Sydney Levine · Josh Tenenbaum
Workshop		Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness? Manuel Brack · Felix Friedrich · Patrick Schramowski · Kristian Kersting
Workshop		Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models Sanghyun Kim · Seohyeon Jung · Balhae Kim · Moonseok Choi · Jinwoo Shin · Juho Lee
Poster	Thu 13:30	Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark Alexander Pan · Jun Shern Chan · Andy Zou · Nathaniel Li · Steven Basart · Thomas Woodside · Hanlin Zhang · Scott Emmons · Dan Hendrycks
Oral	Tue 20:30	Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark Alexander Pan · Jun Shern Chan · Andy Zou · Nathaniel Li · Steven Basart · Thomas Woodside · Hanlin Zhang · Scott Emmons · Dan Hendrycks