Oral
Tue Jul 23 11:30 PM -- 11:45 PM (KST) @ Straus 1-3
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
Oral
Tue Jul 23 11:45 PM -- 12:00 AM (KST) @ Straus 1-3
I/O Complexity of Attention, or How Optimal is FlashAttention?
Oral
Wed Jul 24 12:00 AM -- 12:15 AM (KST) @ Straus 1-3
Improving Transformers with Dynamically Composable Multi-Head Attention
Oral
Wed Jul 24 12:15 AM -- 12:30 AM (KST) @ Straus 1-3
Less is More: on the Over-Globalizing Problem in Graph Transformers