An Empirical Analysis Towards Replacing Vocabulary-Rigid Embeddings by a Vocabulary-Free Mechanism
Alejandro Rodriguez Perez ⋅ Korn Sooksatra ⋅ Pablo Rivas ⋅ Ernesto Quevedo Caballero ⋅ Javier Turek ⋅ Gisela Bichler ⋅ Tomas Cerny ⋅ Laurie Giddens ⋅ Stacie Petter
Keywords:
BERT
transformers
model distillation
word embeddings
Natural Language Processing
transfer learning
2023 Poster
in
Affinity Event: LatinX in AI (LXAI) Workshop
Abstract
This paper addresses the limitations of subword-based models in NLP by aligning the word embedding layer of a vocabulary-rigid transformer model with a vocabulary-free one. To do so, a CNN is trained to mimic the word embedding layer of a BERT model, taking a sequence of byte tokens as input. The study compares cosine-based and Euclidean-based loss functions for training the student network and finds better results with cosine-based metrics. The research contributes techniques for re-training transformer embedding layers and provides insights into loss function selection. The findings have implications for developing more flexible and robust NLP models.
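To illustrate the setup described in the abstract, below is a minimal sketch (in PyTorch) of a byte-level CNN student distilled against a teacher embedding layer with a cosine-based loss. The class and function names, layer sizes, kernel widths, and maximum byte length are illustrative assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch: a byte-level CNN "student" trained to mimic a teacher
# word embedding layer, using a cosine-based distillation loss.
import torch
import torch.nn as nn


class ByteCNNEmbedder(nn.Module):
    """Maps a word, given as a padded sequence of byte IDs, to a dense vector."""

    def __init__(self, hidden_size=768, byte_vocab=256, channels=256):
        super().__init__()
        # One extra index reserved for padding.
        self.byte_emb = nn.Embedding(byte_vocab + 1, channels, padding_idx=byte_vocab)
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        )
        self.proj = nn.Linear(len(self.convs) * channels, hidden_size)

    def forward(self, byte_ids):                       # (batch, max_bytes)
        x = self.byte_emb(byte_ids).transpose(1, 2)    # (batch, channels, max_bytes)
        pooled = [conv(x).amax(dim=2) for conv in self.convs]  # max-pool over positions
        return self.proj(torch.cat(pooled, dim=1))     # (batch, hidden_size)


def cosine_distill_loss(student_vecs, teacher_vecs):
    # Cosine-based objective: 1 - cos(student, teacher), averaged over the batch.
    sim = nn.functional.cosine_similarity(student_vecs, teacher_vecs, dim=-1)
    return (1.0 - sim).mean()
```

In this sketch the teacher targets would be rows of BERT's word embedding matrix (e.g. `bert.embeddings.word_embeddings(token_ids)` in Hugging Face Transformers), and the Euclidean-based alternative compared in the paper could be implemented with `nn.functional.mse_loss` in place of the cosine objective.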