Poster
in
Affinity Workshop: LatinX in AI (LXAI) Workshop
An Empirical Analysis Towards Replacing Vocabulary-Rigid Embeddings by a Vocabulary-Free Mechanism
Alejandro Rodriguez Perez · Korn Sooksatra · Pablo Rivas · Ernesto Quevedo Caballero · Javier Turek · Gisela Bichler · Tomas Cerny · Laurie Giddens · Stacie Petter
Keywords: [ BERT ] [ transformers ] [ model distillation ] [ word embeddings ] [ Natural Language Processing ] [ transfer learning ]
This paper addresses the limitations of subword-based models in NLP by aligning the word embedding layer of a vocabulary-rigid transformer model with a vocabulary-free one. To do so, a CNN is trained to mimic the word embedding layer of a BERT model, taking a sequence of byte tokens as input. The study compares cosine-based and Euclidean-based loss functions for training the student network and finds better results with cosine-based metrics. The research contributes techniques for re-training transformer embedding layers and provides insights into loss function selection. The findings have implications for developing flexible and robust NLP models.
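To make the setup concrete, below is a minimal PyTorch sketch of this kind of embedding-layer distillation: a byte-level CNN student is trained to reproduce the vectors of a frozen BERT word-embedding table, with a cosine-based loss (which the paper reports works better) shown alongside a Euclidean (MSE) alternative. The architecture sizes, kernel widths, example words, and the use of `bert-base-uncased` are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class ByteCNNStudent(nn.Module):
    """CNN that maps a sequence of byte tokens to a single word embedding."""

    def __init__(self, embed_dim=768, byte_vocab=256, channels=256, max_bytes=32):
        super().__init__()
        self.byte_embed = nn.Embedding(byte_vocab, channels)
        # Parallel convolutions over the byte sequence with different receptive fields.
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=k, padding=k // 2) for k in (3, 5, 7)
        )
        self.proj = nn.Linear(len(self.convs) * channels, embed_dim)

    def forward(self, byte_ids):                        # byte_ids: (batch, max_bytes)
        x = self.byte_embed(byte_ids).transpose(1, 2)   # (batch, channels, max_bytes)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.proj(torch.cat(pooled, dim=1))      # (batch, embed_dim)


def to_bytes(word, max_bytes=32):
    """Encode a word as a fixed-length, zero-padded sequence of UTF-8 byte ids."""
    b = list(word.encode("utf-8"))[:max_bytes]
    return torch.tensor(b + [0] * (max_bytes - len(b)))


# Teacher: the frozen word-embedding lookup table of a pretrained BERT model.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
teacher_embeddings = BertModel.from_pretrained("bert-base-uncased").get_input_embeddings()
teacher_embeddings.weight.requires_grad_(False)

student = ByteCNNStudent()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

# Toy batch of words that exist as single tokens in the BERT vocabulary.
words = ["language", "model", "word", "network"]
byte_ids = torch.stack([to_bytes(w) for w in words])
token_ids = torch.tensor([tokenizer.convert_tokens_to_ids(w) for w in words])
targets = teacher_embeddings(token_ids)                 # (batch, 768) teacher vectors

pred = student(byte_ids)

# Cosine-based loss (reported to perform better) vs. a Euclidean (MSE) alternative.
cosine_loss = (1 - nn.functional.cosine_similarity(pred, targets, dim=1)).mean()
# euclidean_loss = nn.functional.mse_loss(pred, targets)

cosine_loss.backward()
optimizer.step()
```

Once trained, a student of this form can embed any byte sequence, not just words in the teacher's subword vocabulary, which is what makes the resulting embedding mechanism vocabulary-free.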