Search All 2023 Events
 

12 Results

Workshop
Fri 12:40 Prof. Gagandeep Singh (UIUC): Trust and Safety with Certified AI
Gagandeep Singh
Workshop
Sat 14:15 Using Causality to Improve Safety Throughout the AI Lifecycle
Suchi Saria · Adarsh Subbaswamy
Workshop
Do Users Write More Insecure Code with AI Assistants?
Neil Perry · Megha Srivastava · Deepak Kumar · Dan Boneh
Workshop
On feasibility of intent obfuscating attacks
Workshop
How vulnerable are doctors to unsafe hallucinatory AI suggestions? A framework for evaluation of safety in clinical human-AI cooperation
Paul Festor · Myura Nagendran · Anthony Gordon · Matthieu Komorowski · Aldo Faisal
Workshop
Sat 18:30 SCIS 2023 Panel, The Future of Generalization: Scale, Safety and Beyond
Maggie Makar · Samuel Bowman · Zachary Lipton · Adam Gleave
Workshop
On feasibility of intent obfuscating attacks
ZhaoBin Li · Patrick Shafto
Workshop
Neuro-Symbolic Models of Human Moral Judgment: LLMs as Automatic Feature Extractors
Joseph Kwon · Sydney Levine · Josh Tenenbaum
Workshop
Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness?
Manuel Brack · Felix Friedrich · Patrick Schramowski · Kristian Kersting
Workshop
Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models
Sanghyun Kim · Seohyeon Jung · Balhae Kim · Moonseok Choi · Jinwoo Shin · Juho Lee
Poster
Thu 13:30 Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark
Alexander Pan · Jun Shern Chan · Andy Zou · Nathaniel Li · Steven Basart · Thomas Woodside · Hanlin Zhang · Scott Emmons · Dan Hendrycks
Oral
Tue 20:30 Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark
Alexander Pan · Jun Shern Chan · Andy Zou · Nathaniel Li · Steven Basart · Thomas Woodside · Hanlin Zhang · Scott Emmons · Dan Hendrycks