Poster in Workshop: Tokenization Workshop (TokShop)
Fri, Jul 18, 2025 • 10:50 AM – 12:00 PM PDT

Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8

Preston Firestone · Shubham Ugare · Gagandeep Singh · Sasa Misailovic
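The sketch below shows the kind of output the title describes. It assumes a hypothetical toy vocabulary (`VOCAB`) that, like byte-level BPE vocabularies, contains every single byte as its own token, plus one merged token. The names `VOCAB`, `detokenize`, and `is_well_formed_utf8` are illustrative and are not taken from the paper. Single-byte tokens can split a multi-byte character, so some token sequences concatenate to bytes that are not well-formed UTF-8.

```python
# Hypothetical toy byte-level vocabulary (an assumption for illustration):
# every single byte is a token, plus one merged multi-byte token.
VOCAB: dict[int, bytes] = {i: bytes([i]) for i in range(256)}
VOCAB[256] = "€".encode("utf-8")  # b"\xe2\x82\xac", one complete 3-byte character


def detokenize(token_ids: list[int]) -> bytes:
    """Concatenate the byte strings of the given token ids."""
    return b"".join(VOCAB[t] for t in token_ids)


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 with strict error handling."""
    try:
        data.decode("utf-8")  # default errors="strict" rejects ill-formed input
        return True
    except UnicodeDecodeError:
        return False


good = [256, ord("A")]        # "€A": merged token, then an ASCII byte
bad = [0xE2, 0x82, ord("A")]  # lead byte + one continuation byte, then "A"
truncated = [0xE2]            # sequence ends in the middle of a character

print(is_well_formed_utf8(detokenize(good)))       # True
print(is_well_formed_utf8(detokenize(bad)))        # False
print(is_well_formed_utf8(detokenize(truncated)))  # False
```

Strict decoding is used deliberately here: with `errors="replace"`, Python would silently substitute U+FFFD for the bad bytes and hide the problem.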

Abstract
