

Poster in Workshop: 2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)

Resource-constrained Neural Architecture Search on Language Models: A Case Study

Andreas Paraskeva · Joao Reis · Suzan Verberne · Jan Rijn


Abstract: Transformer-based language models have achieved milestones in natural language processing, but they come with challenges, mainly due to their computational footprint. Applying automated machine learning to these models can democratize their use and foster further research and development. We present a case study using neural architecture search (NAS) to optimize DistilBERT in a resource-constrained environment with a $4\,000$ GPU-hour budget. We employ an evolutionary algorithm that uses a two-level hierarchical search space and a segmented pipeline for component enhancement. Although obtaining state-of-the-art results would require a larger compute budget, our results show efficient exploration and a strong correlation between pre-training and downstream performance. This suggests that pre-training validation performance can serve as a cutoff criterion during model training. Finally, our learning curve analysis highlights the potential for efficient resource allocation through an epoch-level stopping strategy, directing resources towards more promising candidate models. Future work should focus on scaling these insights to larger language models and more diverse tasks.
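
To illustrate the general idea of an evolutionary NAS loop combined with an epoch-level stopping rule driven by pre-training validation loss, here is a minimal, self-contained sketch. It is not the authors' implementation: the search space, the mutation scheme, the cutoff value, and all function names (`sample_architecture`, `train_one_epoch`, `evaluate`, `evolutionary_search`) are hypothetical placeholders, and training is simulated with synthetic losses so the snippet runs on its own.

```python
# Hypothetical sketch of evolutionary NAS with an epoch-level stopping rule
# based on pre-training validation loss. All names and values are illustrative.

import random

SEARCH_SPACE = {                      # toy two-level hierarchical space:
    "num_layers": [4, 6],             #   macro level: network depth
    "hidden_dim": [384, 512, 768],    #   micro level: per-block settings
    "num_heads": [6, 8, 12],
}

def sample_architecture():
    """Draw a random candidate from the (toy) hierarchical search space."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(arch):
    """Perturb one hyperparameter of an existing candidate."""
    child = dict(arch)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def train_one_epoch(arch, epoch):
    """Placeholder: return a simulated pre-training validation loss."""
    base = 6.0 - 0.02 * arch["hidden_dim"] / 96 - 0.1 * arch["num_layers"]
    return max(0.5, base - 0.3 * epoch + random.gauss(0, 0.05))

def evaluate(arch, max_epochs=5, cutoff=4.0):
    """Train with an epoch-level stopping rule: abandon a candidate whose
    pre-training validation loss is still above the cutoff after epoch 1."""
    best = float("inf")
    for epoch in range(max_epochs):
        best = min(best, train_one_epoch(arch, epoch))
        if epoch >= 1 and best > cutoff:   # unpromising: stop early
            break
    return best

def evolutionary_search(population_size=8, generations=5):
    """Simple (mu + lambda)-style loop: keep the fittest half, mutate it."""
    population = [(evaluate(a), a) for a in
                  (sample_architecture() for _ in range(population_size))]
    for _ in range(generations):
        population.sort(key=lambda x: x[0])
        parents = population[: population_size // 2]
        children = [mutate(a) for _, a in parents]
        population = parents + [(evaluate(c), c) for c in children]
    return min(population, key=lambda x: x[0])

if __name__ == "__main__":
    score, arch = evolutionary_search()
    print(f"best pre-training validation loss {score:.2f} for {arch}")
```

The early break in `evaluate` is where a pre-training-validation cutoff would redirect budget away from weak candidates; in a real setting the simulated losses would be replaced by actual pre-training epochs of the candidate model.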
