
Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward

Learning Large-scale Universal User Representation with Sparse Mixture of Experts

Caigao Jiang · Siqiao Xue · James Zhang · Lingyue Liu · Zhibo Zhu · Hongyan Hao


Learning embeddings of user behaviour sequences is challenging because of complicated feature interactions over time and the high dimensionality of user features. Recently emerging foundation models, e.g. BERT and its variants, have encouraged a large body of researchers to investigate this field. However, unlike in natural language processing (NLP) tasks, the parameters of a user behaviour model come mostly from the user embedding layer, which makes most existing works fail to train a universal user embedding at large scale. Furthermore, user representations are learned from multiple downstream tasks, and past research did not address the seesaw phenomenon. In this paper, we propose SUPERMOE, a generic framework for obtaining high-quality user representations from multiple tasks. Specifically, the user behaviour sequences are encoded by an MoE transformer, so we can scale the model capacity to billions or even trillions of parameters. To deal with the seesaw phenomenon when learning across multiple tasks, we design a new loss function with task indicators. We perform extensive offline experiments on public datasets and online experiments on private real-world business scenarios. Our approach outperforms state-of-the-art models, and the results demonstrate the effectiveness of our user behaviour representation framework using the MoE transformer.
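The abstract does not spell out the routing scheme or the form of the loss, so the following is only a minimal sketch of the two general ideas it names, under stated assumptions. It assumes standard top-k softmax gating for the sparse MoE layer and a simple indicator-masked average for the multi-task loss; the function names (`top_k_route`, `sparse_moe`, `masked_multitask_loss`) and the toy experts are hypothetical and are not taken from SUPERMOE.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k):
    """Pick the k experts with the highest gate logits (assumed top-k gating).

    Returns (expert_index, weight) pairs; weights are a softmax over the
    selected logits only, so they sum to 1.
    """
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def sparse_moe(x, gate_logits, experts, k=2):
    """Weighted sum of the selected experts' outputs.

    Only k experts are evaluated per input, so adding experts grows the
    parameter count without growing per-input compute.
    """
    out = [0.0] * len(x)
    for idx, w in top_k_route(gate_logits, k):
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


def masked_multitask_loss(per_task_losses, task_indicators):
    """Average the losses of the tasks an example is labeled for.

    task_indicators[i] is 1 if task i applies to the example, else 0
    (an assumed form of a 'loss with task indicators').
    """
    active = sum(task_indicators)
    if active == 0:
        return 0.0
    masked = sum(l * t for l, t in zip(per_task_losses, task_indicators))
    return masked / active


# Toy experts (hypothetical): each scales its input by a constant.
experts = [lambda v, s=s: [s * a for a in v] for s in (1.0, 2.0, 3.0, 4.0)]
routed = sparse_moe([1.0, 2.0], [0.1, 2.0, -1.0, 2.0], experts, k=2)
loss = masked_multitask_loss([0.2, 0.8, 0.5], [1, 0, 1])
```

In this toy run, experts 1 and 3 tie for the highest gate logit and each receives weight 0.5, while the second task's loss is excluded from the average because its indicator is 0.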
