Poster
Efficient Training of Language Models using Few-Shot Learning
Sashank Jakkam Reddi · Sobhan Miryoosefi · Stefani Karp · Shankar Krishnan · Satyen Kale · Seungyeon Kim · Sanjiv Kumar

Tue Jul 25 02:00 PM -- 04:30 PM (PDT) @ Exhibit Hall 1 #505

Large deep learning models have achieved state-of-the-art performance across various natural language processing (NLP) tasks and demonstrated remarkable few-shot learning performance. However, training them is often challenging and resource-intensive. In this paper, we study an efficient approach to training language models using few-shot learners. We show that, by leveraging the fast learning nature of few-shot learners, one can train language models efficiently in a stagewise manner. Our main insight is that stacking a good few-shot learner on a good small language model provides a good initializer for a larger language model. Using this insight and building upon progressive stacking approaches, we develop novel methods for training such networks in a stagewise manner. Furthermore, we provide a theoretical framework and accompanying empirical studies to support our insights, thereby creating a theoretical foundation for progressive stacking. Finally, we present empirical results demonstrating the effectiveness of our approach in reducing the training time of few-shot learners.
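The stagewise recipe described above builds on progressive stacking, in which the parameters of a trained shallow model are reused to initialize a deeper one before further training. The following is a minimal sketch of that general idea only; the layer parameterization, the names init_layer and stack_init, the toy shapes, and the numpy usage are illustrative assumptions and are not taken from the authors' implementation.

    # Sketch of progressive stacking: copy a trained L-layer stack to
    # initialize a 2L-layer model, which is then trained further.
    # All names and shapes here are illustrative assumptions.
    import copy
    import numpy as np

    def init_layer(hidden_dim, rng):
        # Toy stand-in for one transformer block's parameters.
        return {
            "attn_w": rng.normal(size=(hidden_dim, hidden_dim)) * 0.02,
            "ffn_w": rng.normal(size=(hidden_dim, 4 * hidden_dim)) * 0.02,
        }

    def stack_init(small_layers, growth_factor=2):
        # Initialize a deeper model by repeating the trained shallow stack.
        large_layers = []
        for _ in range(growth_factor):
            large_layers.extend(copy.deepcopy(small_layers))
        return large_layers

    rng = np.random.default_rng(0)
    small_model = [init_layer(128, rng) for _ in range(6)]   # assume this is already trained
    large_model = stack_init(small_model, growth_factor=2)   # 12-layer initializer
    assert len(large_model) == 12

In this sketch, the larger model starts from the shallow model's weights rather than a random initialization, which is the sense in which a good small model (together with a good few-shot learner, per the abstract) serves as an initializer for a larger one.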

Author Information

Sashank Jakkam Reddi (Google)
Sobhan Miryoosefi (Google)
Stefani Karp (Google)
Shankar Krishnan (Google)
Satyen Kale (Google Research)
Seungyeon Kim (Google)
Sanjiv Kumar (Google Research, NY)
