Poster in Workshop: Dynamic Neural Networks

Dynamic Transformer Networks

Amanuel Mersha


Abstract:

Deep neural networks have been highly successful in recent years, and some of that success can be attributed to the introduction of Transformers. Dynamic neural networks, on the other hand, are being studied to improve efficiency in settings such as resource-constrained environments. Making Transformers dynamic allows them to execute only the layers needed for a given input. In this work, we present a simple formulation of an oracle function that lets the model determine the dependencies among Transformer layers, in a manner similar to soft attention. This oracle can then serve as a strategy for skipping layers without a reinforcement-learning agent. We show that such a model learns to skip, on average, half of its layers for each sample in a batch.
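The abstract does not specify the oracle's exact form, so the following is only a minimal sketch of the general idea: each layer is wrapped with a hypothetical per-sample gate (here, a linear scorer over the mean-pooled input followed by a sigmoid) that softly blends the layer's output with an identity skip connection. The class names, the pooling choice, and the gating function are all illustrative assumptions, not the authors' implementation.

```python
# Sketch of soft, per-sample layer skipping in a Transformer (PyTorch).
# Assumptions (not from the paper): a linear "oracle" scores the mean-pooled
# input of each layer, and a sigmoid gate interpolates between applying the
# layer and passing the input through unchanged.
import torch
import torch.nn as nn


class GatedEncoderLayer(nn.Module):
    """Wraps a standard encoder layer with a learned per-sample gate."""

    def __init__(self, d_model: int, nhead: int):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Hypothetical oracle: decides how much this layer is needed.
        self.oracle = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gate = torch.sigmoid(self.oracle(x.mean(dim=1)))  # (batch, 1)
        gate = gate.unsqueeze(-1)                         # (batch, 1, 1)
        # Gate near 0 reduces the layer to an identity, i.e. a skip.
        return gate * self.layer(x) + (1.0 - gate) * x


class DynamicTransformer(nn.Module):
    """Stack of gated layers; each sample can softly skip any layer."""

    def __init__(self, d_model: int = 64, nhead: int = 4, num_layers: int = 6):
        super().__init__()
        self.layers = nn.ModuleList(
            GatedEncoderLayer(d_model, nhead) for _ in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    model = DynamicTransformer()
    out = model(torch.randn(8, 16, 64))  # batch=8, seq=16, d_model=64
    print(out.shape)  # torch.Size([8, 16, 64])
```

With hard thresholding at inference time (e.g., bypassing a layer entirely when its gate falls below 0.5), a model whose gates average around one half would execute roughly half of its layers per sample, consistent with the result the abstract reports.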
