

Poster in Workshop: HiLD: High-dimensional Learning Dynamics Workshop

Characterizing and Improving Transformer Solutions for Dyck Grammars

Kaiyue Wen · Yuchen Li · Bingbin Liu · Andrej Risteski


Abstract:

Transformer-based models are capable of solving many complex tasks. Prior works formally justified these capabilities by providing, for each task, a small set of constructions showing that Transformers can implement certain classic algorithms. However, do Transformers trained with common optimization approaches actually converge to those proposed solutions? We tackle this question through the lens of a formal language called Dyck. We prove that even in this simple setting, the set of Transformer solutions is qualitatively rich, and most of these solutions do not match the intuitive constructions proposed in prior works. Moreover, our analysis inspires a modified pre-training objective that guides Transformers toward better generalization to longer Dyck sequences unseen during training. Extensive controlled experiments corroborate our findings.
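For readers unfamiliar with the Dyck formal language studied in the abstract: a Dyck-k sequence is a string of balanced brackets drawn from k bracket types. The sketch below is a minimal illustration of membership checking with the standard stack algorithm; it is not code from the paper, and the three-type bracket vocabulary is an assumption chosen for the example.

```python
# Illustrative sketch (not from the paper): checking membership in the Dyck
# language with k bracket types, using the standard stack algorithm.
# The bracket vocabulary below is an assumed example (Dyck-3).

def is_dyck(sequence, pairs={")": "(", "]": "[", "}": "{"}):
    """Return True if `sequence` is a balanced (Dyck) bracket string."""
    openers = set(pairs.values())
    stack = []
    for token in sequence:
        if token in openers:
            stack.append(token)          # push every opening bracket
        elif token in pairs:
            if not stack or stack.pop() != pairs[token]:
                return False             # mismatched or unmatched closing bracket
        else:
            return False                 # token outside the bracket vocabulary
    return not stack                     # every opener must have been closed

# Example: "([])" is a valid Dyck sequence, "([)]" is not.
assert is_dyck("([])") and not is_dyck("([)]")
```

Length generalization in this setting means a model trained on short balanced strings should still classify or continue longer strings correctly, which is the behavior the modified pre-training objective targets.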
