Large language models have achieved near-human performance on a wide range of natural language generation tasks, such as question answering and open-domain conversation. However, these models have large memory footprints and long inference times. Compressed models with fewer parameters are more easily deployable on FPGAs and low-end devices with limited storage and processing power. In this work, we carry out an empirical evaluation of three model compression techniques on conversational agents pre-trained with large transformer language models. Using the OpenAI GPT-2 network, we evaluate and compare the performance of open-domain dialogue models before and after compression. When trained and tested on the DailyDialog corpus, the compressed models achieve state-of-the-art results on the corpus while maintaining human-likeness.
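The abstract does not name the three compression techniques evaluated. As one illustrative possibility, magnitude pruning is a common way to shrink transformer weights; the sketch below is a framework-free, hypothetical example of the general idea, not the paper's actual method:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    weights:  list of floats (a flattened weight tensor)
    sparsity: fraction of weights to set to zero, in [0.0, 1.0]
    """
    k = int(len(weights) * sparsity)  # number of weights to zero out
    if k == 0:
        return list(weights)
    # Threshold = magnitude of the k-th smallest |w|
    threshold = sorted(abs(w) for w in weights)[k - 1]
    # Keep weights strictly above the threshold; zero the rest
    return [0.0 if abs(w) <= threshold else w for w in weights]


# Hypothetical flattened weight slice at 50% sparsity
w = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(w, 0.5)  # the 3 smallest-magnitude weights become 0.0
```

Zeroed weights can then be stored in a sparse format, reducing the memory footprint on storage-constrained devices.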
Ahmed Baruwa (InstaDeep)