Poster in Workshop: ICML Workshop on Human in the Loop Learning (HILL)
Less is more: An Empirical Analysis of Model Compression for Dialogue
Ahmed Baruwa
Large language models have achieved near-human performance across a wide range of Natural Language Generation tasks such as Question Answering and Open-Domain Conversation. These models, however, have large memory footprints and long inference times. Compressed models with fewer parameters are more easily deployable on FPGAs and low-end devices with limited storage and processing power. In this work, we carry out an empirical evaluation of three model compression techniques on conversational agents built on large pre-trained transformer language models. Using the OpenAI GPT-2 transformer network, we evaluate and compare the performance of open-domain dialogue models before and after compression. When trained and tested on the DailyDialog corpus, the compressed models achieve state-of-the-art results on the corpus while maintaining human likeness.
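The abstract does not name the three compression techniques evaluated. As a minimal illustrative sketch of the size/latency trade-off it describes, the snippet below contrasts GPT-2 with a distilled variant (DistilGPT2) on a short dialogue prompt; the model choices and the Hugging Face `transformers` dependency are assumptions, not the paper's exact pipeline.

```python
# Sketch: compare parameter count and generation latency of GPT-2 vs. a
# distilled (compressed) variant on a short dialogue prompt.
# Assumes `torch` and `transformers` are installed; not the paper's setup.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def param_count_millions(model: torch.nn.Module) -> float:
    """Total number of parameters, in millions."""
    return sum(p.numel() for p in model.parameters()) / 1e6


prompt = "Hello, how was your day?"

for name in ("gpt2", "distilgpt2"):
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    start = time.perf_counter()
    with torch.no_grad():
        output = model.generate(
            **inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id
        )
    elapsed = time.perf_counter() - start

    reply = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f"{name}: {param_count_millions(model):.0f}M params, "
          f"{elapsed:.2f}s to generate -> {reply!r}")
```

In this sketch, distillation stands in for compression in general; a full evaluation like the paper's would also fine-tune the models on the dialogue corpus and score the generated responses.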