Poster in Workshop: Next Generation of Sequence Modeling Architectures

When can transformers compositionally generalize in-context?

Seijin Kobayashi · Simon Schug · Yassir Akram · Florian Redhardt · Johannes Von Oswald · Razvan Pascanu · Guillaume Lajoie · Joao Sacramento


Abstract:

Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being, in principle, expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.
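One way to read the bottleneck idea is as a two-stage model: a task-inference network compresses the in-context examples into a low-dimensional task code, and a separate execution network applies that code to the query. The sketch below is a minimal PyTorch illustration of this separation under that assumption; it is not the authors' architecture, and all names (e.g. `BottleneckedInContextLearner`, the dimensions) are hypothetical.

```python
import torch
import torch.nn as nn

class BottleneckedInContextLearner(nn.Module):
    """Hypothetical sketch: separate task inference from task execution.

    A task-inference encoder pools the in-context (x, y) pairs into a
    low-dimensional task code (the bottleneck); an execution network then
    applies that inferred task to the query input.
    """

    def __init__(self, input_dim=16, bottleneck_dim=4, hidden_dim=64):
        super().__init__()
        # Task inference: map each context pair to a code, then pool.
        self.infer = nn.Sequential(
            nn.Linear(2 * input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, bottleneck_dim),  # explicit bottleneck
        )
        # Task execution: apply the inferred task code to the query.
        self.execute = nn.Sequential(
            nn.Linear(input_dim + bottleneck_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
        )

    def forward(self, ctx_x, ctx_y, query_x):
        # ctx_x, ctx_y: (batch, n_context, input_dim); query_x: (batch, input_dim)
        pairs = torch.cat([ctx_x, ctx_y], dim=-1)
        task_code = self.infer(pairs).mean(dim=1)  # permutation-invariant pooling
        return self.execute(torch.cat([query_x, task_code], dim=-1))

# Usage: infer a task from 8 context pairs, then execute it on a query.
model = BottleneckedInContextLearner()
ctx_x, ctx_y = torch.randn(2, 8, 16), torch.randn(2, 8, 16)
pred = model(ctx_x, ctx_y, torch.randn(2, 16))
print(pred.shape)  # torch.Size([2, 16])
```

The design choice being illustrated is that the execution network never sees the raw context, only the pooled task code, so all task-identity information must pass through the low-dimensional bottleneck.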
