WoTok: Workshop on Tokenization at ICML 2025
Tomasz Limisiewicz · Valentin Hofmann · Sachin Kumar · Farhan Samir · Jindřich Libovický · Orevaoghene Ahia · Elizabeth Salesky
West Meeting Room 111-112
Fri 18 Jul, 8:30 a.m. PDT
Tokenization defines how data are represented as input and output for many current machine learning systems, including language models. It has been shown to significantly affect the utility and effectiveness of these models (Mielke et al., 2021). This finding has stirred considerable interest in tokenization as a research direction in machine learning and its subfields, such as natural language processing, yet there is currently no venue specifically dedicated to it. Our initiative, WoTok (Workshop on Tokenization), aims to fill this gap and will focus on tokenization in a broad sense.