22 Results
Workshop | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Maksym Andriushchenko · Francesco Croce · Nicolas Flammarion

Workshop | Can Editing LLMs Inject Harm? | Canyu Chen · Baixiang Huang · Zekun Li · Zhaorun Chen · Shiyang Lai · Xiongxiao Xu · Jia-Chen Gu · Jindong Gu · Huaxiu Yao · Chaowei Xiao · Xifeng Yan · William Wang · Phil Torr · Dawn Song · Kai Shu

Workshop | Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs | Swanand Kadhe · Farhan Ahmed · Dennis Wei · Nathalie Baracaldo · Inkit Padhi

Workshop | PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing | Blazej Manczak · Eric Lin · Eliott Zemour · Vaikkunth Mugunthan

Poster | Wed 4:30 | Position: Intent-aligned AI Systems Must Optimize for Agency Preservation | Catalin Mitelut · Benjamin Smith · Peter Vamplew

Workshop | Sat 7:30 | Boyi Li - Leveraging LLMs to Imagine Like Humans by Aligning Representations from Vision and Language | Boyi Li

Workshop | Fri 8:00 | Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping | Haoyu Wang · Guozheng Ma · Ziqiao Meng · Zeyu Qin · Li Shen · Zhong Zhang · Bingzhe Wu · Liu Liu · Yatao Bian · Tingyang Xu · Xueqian Wang · Peilin Zhao

Workshop | Graph2Token: Make LLMs Understand Molecule Graphs | Runze Wang · Mingqi Yang · Yanming Shen

Workshop | Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Ashwinee Panda · Berivan Isik · Xiangyu Qi · Sanmi Koyejo · Tsachy Weissman · Prateek Mittal

Workshop | Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations | Zilin Ma · Susannah (Cheng) Su · Nathan Zhao · Linn Bieske · Blake Bullwinkel · Jinglun Gao · Gekai Liao · Siyao Li · Ziqing Luo · Boxiang Wang · Zihan Wen · Yanrui Yang · Yanyi Zhang · Claude Bruderlein · Weiwei Pan

Workshop | Fri 8:00 | Distributional Preference Alignment of LLMs via Optimal Transport | Igor Melnyk · Youssef Mroueh · Brian Belgodere · Mattia Rigotti · Apoorva Nitsure · Mikhail Yurochkin · Kristjan Greenewald · Jiri Navratil · Jarret Ross

Workshop | One-Shot Safety Alignment for Large Language Models via Optimal Dualization | Xinmeng Huang · Shuo Li · Edgar Dobriban · Osbert Bastani · Hamed Hassani · Dongsheng Ding