Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

Jonathan Hayase ⋅ Alisa Liu ⋅ Yejin Choi ⋅ Sewoong Oh ⋅ Noah Smith

Abstract
