Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

Jonathan Hayase · Alisa Liu · Yejin Choi · Sewoong Oh · Noah Smith

Abstract