Invited Talk in Workshop: Tokenization Workshop (TokShop)

Learning Dynamic Segmentation and Compression of Sequences in Transformer LLMs

Adrian Łańcucki
2025

Abstract

Speaker

Adrian Łańcucki

I'm a senior research engineer at NVIDIA, currently working on LLM optimization for inference. This includes teaching models to compress KV cache, finding and pruning redundancies, and architecture search. My previous research focused on representation learning and generative modeling for text and speech. I hold a Ph.D. in machine learning and remain in active collaboration with academia.
