Type | Time | Title | Authors
Workshop | Fri 12:40 | Prof. Gagandeep Singh (UIUC): Trust and Safety with Certified AI | Gagandeep Singh
Workshop | Sat 14:15 | Using Causality to Improve Safety Throughout the AI Lifecycle | Suchi Saria · Adarsh Subbaswamy
Workshop | | Do Users Write More Insecure Code with AI Assistants? | Neil Perry · Megha Srivastava · Deepak Kumar · Dan Boneh
Workshop | | How vulnerable are doctors to unsafe hallucinatory AI suggestions? A framework for evaluation of safety in clinical human-AI cooperation | Paul Festor · Myura Nagendran · Anthony Gordon · Matthieu Komorowski · Aldo Faisal
Workshop | Sat 18:30 | SCIS 2023 Panel, The Future of Generalization: Scale, Safety and Beyond | Maggie Makar · Samuel Bowman · Zachary Lipton · Adam Gleave
Workshop | | On feasibility of intent obfuscating attacks | ZhaoBin Li · Patrick Shafto
Workshop | | Neuro-Symbolic Models of Human Moral Judgment: LLMs as Automatic Feature Extractors | Joseph Kwon · Sydney Levine · Josh Tenenbaum
Workshop | | Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness? | Manuel Brack · Felix Friedrich · Patrick Schramowski · Kristian Kersting
Workshop | | Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models | Sanghyun Kim · Seohyeon Jung · Balhae Kim · Moonseok Choi · Jinwoo Shin · Juho Lee
Poster | Thu 13:30 | Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark | Alexander Pan · Jun Shern Chan · Andy Zou · Nathaniel Li · Steven Basart · Thomas Woodside · Hanlin Zhang · Scott Emmons · Dan Hendrycks
Oral | Tue 20:30 | Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark | Alexander Pan · Jun Shern Chan · Andy Zou · Nathaniel Li · Steven Basart · Thomas Woodside · Hanlin Zhang · Scott Emmons · Dan Hendrycks