Oral
Tue Jul 23 07:30 AM -- 07:45 AM (PDT) @ Straus 1-3
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
Oral
Tue Jul 23 07:45 AM -- 08:00 AM (PDT) @ Straus 1-3
I/O Complexity of Attention, or How Optimal is FlashAttention?
[Slides]
Oral
Tue Jul 23 08:00 AM -- 08:15 AM (PDT) @ Straus 1-3
Improving Transformers with Dynamically Composable Multi-Head Attention
Oral
Tue Jul 23 08:15 AM -- 08:30 AM (PDT) @ Straus 1-3
Less is More: on the Over-Globalizing Problem in Graph Transformers