Search All 2024 Events

22 Results

Page 1 of 2
Workshop
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko · Francesco Croce · Nicolas Flammarion

Workshop
Can Editing LLMs Inject Harm?
Canyu Chen · Baixiang Huang · Zekun Li · Zhaorun Chen · Shiyang Lai · Xiongxiao Xu · Jia-Chen Gu · Jindong Gu · Huaxiu Yao · Chaowei Xiao · Xifeng Yan · William Wang · Phil Torr · Dawn Song · Kai Shu

Workshop
Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs
Swanand Kadhe · Farhan Ahmed · Dennis Wei · Nathalie Baracaldo · Inkit Padhi

Workshop
PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing
Blazej Manczak · Eric Lin · Eliott Zemour · Vaikkunth Mugunthan

Poster
Wed 4:30 Position: Intent-aligned AI Systems Must Optimize for Agency Preservation
Catalin Mitelut · Benjamin Smith · Peter Vamplew

Workshop
Sat 7:30 Boyi Li - Leveraging LLMs to Imagine Like Humans by Aligning Representations from Vision and Language
Boyi Li

Workshop
Fri 8:00 Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping
Haoyu Wang · Guozheng Ma · Ziqiao Meng · Zeyu Qin · Li Shen · Zhong Zhang · Bingzhe Wu · Liu Liu · Yatao Bian · Tingyang Xu · Xueqian Wang · Peilin Zhao

Workshop
Graph2Token: Make LLMs Understand Molecule Graphs
Runze Wang · Mingqi Yang · Yanming Shen

Workshop
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
Ashwinee Panda · Berivan Isik · Xiangyu Qi · Sanmi Koyejo · Tsachy Weissman · Prateek Mittal

Workshop
Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations
Zilin Ma · Susannah (Cheng) Su · Nathan Zhao · Linn Bieske · Blake Bullwinkel · Jinglun Gao · Gekai Liao · Siyao Li · Ziqing Luo · Boxiang Wang · Zihan Wen · Yanrui Yang · Yanyi Zhang · Claude Bruderlein · Weiwei Pan

Workshop
Fri 8:00 Distributional Preference Alignment of LLMs via Optimal Transport
Igor Melnyk · Youssef Mroueh · Brian Belgodere · Mattia Rigotti · Apoorva Nitsure · Mikhail Yurochkin · Kristjan Greenewald · Jiri Navratil · Jarret Ross

Workshop
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
Xinmeng Huang · Shuo Li · Edgar Dobriban · Osbert Bastani · Hamed Hassani · Dongsheng Ding