Poster in Affinity Workshop: LatinX in AI (LXAI) Workshop

An Empirical Analysis Towards Replacing Vocabulary-Rigid Embeddings by a Vocabulary-Free Mechanism

Alejandro Rodriguez Perez · Korn Sooksatra · Pablo Rivas · Ernesto Quevedo Caballero · Javier Turek · Gisela Bichler · Tomas Cerny · Laurie Giddens · Stacie Petter

Keywords: [ BERT ] [ transformers ] [ model distillation ] [ word embeddings ] [ Natural Language Processing ] [ transfer learning ]


Abstract:

This paper addresses the limitations of subword-based models in NLP by aligning the word embedding layer of a vocabulary-rigid transformer model with a vocabulary-free one. To do so, a CNN is trained to mimic the word embedding layer of a BERT model, taking a sequence of byte tokens as input. The study compares cosine-based and Euclidean-based loss functions for training the student network and finds that cosine-based metrics yield better results. The research contributes techniques for re-training transformer embedding layers and provides insights into loss function selection. The findings have implications for developing more flexible and robust NLP models.
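The distillation setup described above can be sketched as follows. This is a minimal illustration in PyTorch, assuming a byte-level CNN student and a frozen BERT embedding layer as the teacher; the architecture details (kernel sizes, dimensions) and names such as `ByteCNNEmbedder` are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of the byte-level CNN student distilling BERT's
# word embedding layer. Dimensions and kernel sizes are assumptions.
import torch
import torch.nn as nn


class ByteCNNEmbedder(nn.Module):
    """Maps a sequence of byte tokens (values 0-255) to a single word vector."""

    def __init__(self, emb_dim: int = 768, byte_dim: int = 64):
        super().__init__()
        self.byte_emb = nn.Embedding(256, byte_dim)
        # Several kernel widths capture character n-gram patterns of different sizes.
        self.convs = nn.ModuleList(
            nn.Conv1d(byte_dim, emb_dim, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        )
        self.proj = nn.Linear(3 * emb_dim, emb_dim)

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        # byte_ids: (batch, max_bytes) -> (batch, byte_dim, max_bytes)
        x = self.byte_emb(byte_ids).transpose(1, 2)
        # Max-pool each convolution over the byte dimension, then project.
        pooled = [conv(x).max(dim=-1).values for conv in self.convs]
        return self.proj(torch.cat(pooled, dim=-1))  # (batch, emb_dim)


def cosine_loss(student_vec: torch.Tensor, teacher_vec: torch.Tensor) -> torch.Tensor:
    """Cosine-based objective: 1 - cosine similarity, averaged over the batch."""
    sim = nn.functional.cosine_similarity(student_vec, teacher_vec, dim=-1)
    return (1.0 - sim).mean()


def euclidean_loss(student_vec: torch.Tensor, teacher_vec: torch.Tensor) -> torch.Tensor:
    """Euclidean-based objective (mean squared error) for comparison."""
    return nn.functional.mse_loss(student_vec, teacher_vec)


# In a training step, teacher targets would come from the frozen BERT
# embedding layer, e.g.:
#   teacher_vec = bert_model.get_input_embeddings()(token_ids).detach()
#   loss = cosine_loss(student(byte_ids), teacher_vec)
```

The comparison reported in the abstract would then amount to training the same student once with `cosine_loss` and once with `euclidean_loss` and evaluating how closely the resulting vectors match the teacher embeddings.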
