Search All 2024 Events
98 Results

Workshop
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko · Francesco Croce · Nicolas Flammarion
Workshop
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An · Sicheng Zhu · Ruiyi Zhang · Michael-Andrei Panaitescu-Liess · Yuancheng Xu · Furong Huang
Poster
Thu 4:30 Scalable AI Safety via Doubly-Efficient Debate
Jonah Brown-Cohen · Geoffrey Irving · Georgios Piliouras
Oral Session
Thu 7:30 Oral 6E Robustness and Safety
Oral
Tue 7:45 Position: AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research
Riley Simmons-Edler · Ryan Badman · Shayne Longpre · Kanaka Rajan
Oral
Wed 8:00 AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt · Buck Shlegeris · Kshitij Sachan · Fabien Roger
Poster
Thu 2:30 Fair Data Representation for Machine Learning at the Pareto Frontier
Shizhou Xu · Thomas Strohmer
Tue 8:30 AI Safety Social: Navigating Misuse, Ethical Challenges, and Systemic Risks
Poster
Wed 4:30 Monotone Individual Fairness
Yahav Bechavod
Poster
Thu 2:30 Standardized Interpretable Fairness Measures for Continuous Risk Scores
Ann-Kristin Becker · Oana Dumitrasc · Klaus Broelemann
Poster
Wed 2:30 Position: Machine Learning-powered Assessments of the EU Digital Services Act Aid Quantify Policy Impacts on Online Harms
Eleonora Bonel · Luca Nannini · Davide Bassi · Michele Maggini
Oral
Thu 8:15 Scalable AI Safety via Doubly-Efficient Debate
Jonah Brown-Cohen · Geoffrey Irving · Georgios Piliouras