(160 events)
Sat Jul 19 08:30 AM -- 08:40 AM (PDT) None
Opening Remarks
Invited Talk
Sat Jul 19 08:40 AM -- 09:10 AM (PDT) None
Hagay Lupesko: Zero to 50 ExaFLOPS in under a year - lessons from the trenches
Hagay Lupesko
Invited Talk
Sat Jul 19 09:10 AM -- 09:40 AM (PDT) None
Wanchao Liang: TorchTitan
Sat Jul 19 09:40 AM -- 10:00 AM (PDT) None
Break
Invited Talk
Sat Jul 19 10:00 AM -- 10:30 AM (PDT) None
Baris Kasikci: The Quest For Blazingly Fast LLM Serving
Oral
Sat Jul 19 10:30 AM -- 10:45 AM (PDT) None
FPTQuant: Function-Preserving Transforms for LLM Quantization
Boris van Breugel · Yelysei Bondarenko · Paul Whatmough · Markus Nagel
[ OpenReview
Oral
Sat Jul 19 10:45 AM -- 11:00 AM (PDT) None
Cartridges: Lightweight and general-purpose long context representations via self-study
Sabri Eyuboglu · Ryan Ehrlich · Simran Arora · Neel Guha · Dylan Zinsley · Emily Liu · Atri Rudra · James Zou · Azalia Mirhoseini · Christopher Re
[ OpenReview
Oral
Sat Jul 19 11:00 AM -- 11:15 AM (PDT) None
zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression
Saibo Geng · Nathan Thomas Elian Ranchin · Yunzhen Yao · Maxime Peyrard · Chris Wendler · Michael Gastpar · Robert West
[ OpenReview
Sat Jul 19 11:15 AM -- 11:30 AM (PDT) None
Spotlight Lightning Talks
Sat Jul 19 11:30 AM -- 01:00 PM (PDT) None
Lunch break
Poster Session
Sat Jul 19 01:00 PM -- 02:30 PM (PDT) None
Poster Session
Invited Talk
Sat Jul 19 02:30 PM -- 03:00 PM (PDT) None
Avanika Narayan: Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models
Avanika Narayan
Sat Jul 19 03:00 PM -- 03:30 PM (PDT) None
Break
Oral
Sat Jul 19 03:30 PM -- 03:45 PM (PDT) None
Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture
Shuchen Xue · Tianyu Xie · Tianyang Hu · Zijin Feng · Jiacheng Sun · Kenji Kawaguchi · Zhenguo Li · Zhi-Ming Ma
[ Slides [ OpenReview
Oral
Sat Jul 19 03:45 PM -- 04:00 PM (PDT) None
Hardware-Efficient Attention for Fast Decoding
Ted Zadouri · Hubert Strauss · Tri Dao
[ OpenReview
Invited Talk
Sat Jul 19 04:00 PM -- 04:30 PM (PDT) None
Zachary Charles
Invited Talk
Sat Jul 19 04:30 PM -- 05:00 PM (PDT) None
Albert Gu: H-Nets
Sat Jul 19 05:00 PM -- 05:10 PM (PDT) None
Closing Remarks / Awards
Poster
None
SpecCoT: Accelerating Chain-of-Thought Reasoning through Speculative Exploration
Junhan Shi · Yijia Zhu · Zhenning Shi · Dan Zhao · Qing Li · Yong Jiang
[ OpenReview
Spotlight
None
Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas
Austin Silveria · Soham Govande · Daniel Y Fu
[ OpenReview
Poster
None
Zero-Shot Conversion to Monarch-Structured Attention
Can Yaras · Alec Xu · Pierre Abillama · Changwoo Lee · Laura Balzano
[ OpenReview
Poster
None
Compressing Large Language Models to Any Size Without Re-Computation
Martin Genzel · Patrick Putzky · Pengfei Zhao · Sebastian Schulze · Mattes Mollenhauer · Robert Seidel · Stefan Dietzel · Thomas Wollmann
[ Slides [ OpenReview
Spotlight
None
Learning to Discover Abstractions for LLM Reasoning
Yuxiao Qu · Anikait Singh · Yoonho Lee · Amrith Setlur · Russ Salakhutdinov · Chelsea Finn · Aviral Kumar
[ OpenReview
Poster
None
Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile
Hangliang Ding · Dacheng Li · Runlong Su · Peiyuan Zhang · Zhijie Deng · Ion Stoica · Hao Zhang
[ OpenReview
Poster
None
ConMeZO: Adaptive Directional Sampling for Gradient-Free Finetuning of Language Models
Lejs Behric · Liang Zhang · Bingcong Li · Kiran Thekumparampil
[ OpenReview
Poster
None
Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis
Gholamali Aminian · Idan Shenfeld · Amir R. Asadi · Ahmad Beirami · Youssef Mroueh
[ OpenReview
Poster
None
Ultra-Efficient and Effective Large Language Models with Multi-Boolean Architectures
Ba-Hien Tran · Van Minh NGUYEN
[ OpenReview
Poster
None
Cache Saver: A Modular Framework for Efficient, Affordable, and Reproducible LLM Inference
Nearchos Potamitis · Lars Klein · Chongyang Xu · Attreyee Mukherjee · Bardia Mohammadi · Niket Tandon · Laurent Bindschaedler · Akhil Arora
[ OpenReview
Poster
None
TORCHSIM: High Fidelity Runtime and Memory Estimation for Distributed Training
Sanket Jayant Purandare · Emma Yang · Andrew Zhao · Qitong Wang · Wei Feng · Alban Desmaison · Andrew Gu · Tianyu Liu · Less Wright · Gokul Nadathur · Stratos Idreos
[ OpenReview
Poster
None
Private Zeroth-Order Optimization with Public Data
Xuchen Gong · Tian Li
[ OpenReview
Poster
None
Toward Dataset Distillation for Regression Problems
Jamie Mahowald · Ravi Srinivasan · Zhangyang “Atlas” Wang
[ OpenReview
Poster
None
PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion · Liang Zhang · Bingcong Li · Niao He
[ OpenReview
Poster
None
HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations
Marco Federici · Riccardo Del Chiaro · Boris van Breugel · Paul Whatmough · Markus Nagel
[ OpenReview
Poster
None
Exchangeability in Neural Network Architectures and its Application to Dynamic Pruning
lukeyi Yi · Tianlang Chen · Yifan Yang · Sara Achour
[ OpenReview
Poster
None
LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs
Reza Arabpour · Haitz Sáez de Ocáriz Borde · Anastasis Kratsios
[ OpenReview
Poster
None
CoDM: A Co-design Framework for Efficient Sparse Diffusion Models
Xiaolong Wu · Xiang Gao · Xiyun Song · Zongfang Lin · Heather Yu · Xianfeng GU
[ OpenReview
Poster
None
Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention
Zhihao Zhan · Jianan Zhao · Zhaocheng Zhu · Jian Tang
[ OpenReview
Poster
None
Model Parallelism With Subnetwork Data Parallelism
Vaibhav Singh · Zafir Khalid · Eugene Belilovsky · Edouard Oyallon
[ OpenReview
Poster
None
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
Vaibhav Singh · Paul Janson · Paria Mehrbod · Adam Ibrahim · Irina Rish · Eugene Belilovsky · Benjamin Thérien
[ OpenReview
Poster
None
Tensor Product Attention Is All You Need
Yifan Zhang · Yifeng Liu · Huizhuo Yuan · Zhen Qin · Yang Yuan · Quanquan Gu · Andrew Yao
[ Slides [ Poster [ OpenReview
Poster
None
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
Mugilan Ganesan · Shane Segal · Ankur Aggarwal · Nish Sinnadurai · Sean Lie · Vithursan Thangarasa
[ OpenReview
Poster
None
Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers
Woomin Song · Sai Muralidhar Jayanthi · Srikanth Ronanki · Kanthashree Sathyendra · Jinwoo Shin · Aram Galstyan · Shubham Katiyar · Sravan Babu Bodapati
[ OpenReview
Poster
None
Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights
Jakub Krajewski · Marcin Chochowski · Daniel Korzekwa
[ OpenReview
Poster
None
Mu-Parametrization for Mixture of Experts
Jan Małaśnicki · Kamil Ciebiera · Mateusz Boruń · Maciej Pióro · Jan Ludziejewski · Maciej Stefaniak · Michał Krutul · Sebastian Jaszczur · Marek Cygan · Kamil Adamczewski · Jakub Krajewski
[ OpenReview
Poster
None
Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs
Beomsik Cho · Jaehyung Kim
[ OpenReview
Poster
None
Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection
Shivam Chandhok · Qian Yang · Oscar Mañas · Kanishk Jain · Aishwarya Agrawal · Leonid Sigal
[ OpenReview
Poster
None
Mamba Drafters for Speculative Decoding
Daewon Choi · Seunghyuk Oh · Saket Dingliwal · Jihoon Tack · Kyuyoung Kim · Woomin Song · Seojin Kim · Insu Han · Jinwoo Shin · Aram Galstyan · Shubham Katiyar · Sravan Babu Bodapati
[ OpenReview
Spotlight
None
Resource-efficient Inference with Foundation Model Programs
Lunyiu Nie · Zhimin Ding · Kevin Yu · Marco Cheung · Chris Jermaine · Swarat Chaudhuri
[ OpenReview
Poster
None
VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
Raghavv Goel · Sudhanshu Agrawal · Mukul Gagrani · Junyoung Park · Yifan Zao · He Zhang · Tian Liu · Yiping Yang · Xin Yuan · Jiuyuan Lu · Christopher Lott · Mingu Lee
[ OpenReview
Poster
None
Kevin: Multi-Turn RL for Generating CUDA Kernels
Carlo Baronio · Pietro Marsella · Ben Pan · Simon Guo · Silas Alberti
[ Poster [ OpenReview
Poster
None
Multi-student Diffusion Distillation for Better One-step Generators
Yanke Song · Jonathan Lorraine · Weili Nie · Karsten Kreis · James Lucas
[ Poster [ OpenReview
Poster
None
A Survey on Prompt Tuning
Zongqian Li · Yixuan Su · Nigel Collier
[ OpenReview
Poster
None
Flexi-LoRA: Efficient LoRA Finetuning with Input-Adaptive Dynamic Ranks
Zongqian Li · Yixuan Su · Han Zhou · Zihao Fu · Nigel Collier
[ OpenReview
Poster
None
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning
Zongqian Li · Yixuan Su · Nigel Collier
[ OpenReview
Poster
None
DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
Yuheng Wu · Jianwen Xie · Denghui Zhang · Zhaozhuo Xu
[ OpenReview
Poster
None
Continuous Autoregressive Generation with Mixture of Gaussians
Alex Quach · Johnson Tsun-Hsuan Wang · Ramin Hasani · Mathias Lechner · Alexander Amini
[ OpenReview
Spotlight
None
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
Roberto Castro · Andrei Panferov · Soroush Tabesh · Jiale Chen · Oliver Sieberling · Mahdi Nikdan · Saleh Ashkboos · Dan Alistarh
[ OpenReview
Poster
None
Context-lite Multi-turn Reinforcement Learning for LLM Agents
Chen · Jiayu Chen · Hao Zhu · Jeff Schneider
[ OpenReview
Poster
None
FrugalRAG: Learning to retrieve and reason for multi-hop QA
Abhinav Java · Srivathsan Koundinyan · Nagarajan Natarajan · Amit Sharma
[ OpenReview
Spotlight
None
ABBA: Highly Expressive Hadamard Product Adaptation for Large Language Models
Raghav Singhal · Kaustubh Ponkshe · Rohit Vartak · Praneeth Vepakomma
[ OpenReview
Poster
None
Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning
Raghav Singhal · Kaustubh Ponkshe · Rohit Vartak · Lav Varshney · Praneeth Vepakomma
[ OpenReview
Spotlight
None
Training-free LLM Verification via Recycling Few-shot Examples
Dongseok Lee · JIMYUNG HONG · Dongyoung Kim · Jaehyung Kim
[ OpenReview
Poster
None
The Road Not Taken: Hindsight Exploration for LLMs in Multi-Turn RL
Huaxiaoyue Wang · Sanjiban Choudhury
[ OpenReview
Spotlight
None
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Jonas Geiping · Sean McLeish · Neel Jain · John Kirchenbauer · Siddharth Singh · Brian Bartoldson · Bhavya Kailkhura · Abhinav Bhatele · Tom Goldstein
[ OpenReview
Poster
None
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
Guinan Su · Li Shen · Lu Yin · Shiwei Liu · Yanwu Yang · Jonas Geiping
[ OpenReview
Poster
None
SPECS: Faster Test-Time Scaling through Speculative Drafts
Mert Cemri · Nived Rajaraman · Rishabh Tiwari · Xiaoxuan Liu · Kurt Keutzer · Ion Stoica · Kannan Ramchandran · Ahmad Beirami · Ziteng Sun
[ OpenReview
Poster
None
InterLoRA: An Adaptive LoRA Structure Based on The Mechanistic Interpretability of Transformer
Jihao Gu · Zelin Wang · Yibo Zhang · Ping Gong · Zhisong Bie
[ OpenReview
Poster
None
Towards Large Scale Training on Apple Silicon
Tycho van der Ouderaa · Mohamed Baioumy · Matt Beton · Seth Howes · Gelu Vrabie · Alex Cheema
[ OpenReview
Poster
None
How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?
Tuan Tran · Duy Nguyen · Hoai-Chau Tran · Michael Barz · Khoa Doan · Roger Wattenhofer · Vien Ngo · Mathias Niepert · Daniel Sonntag · Paul Swoboda
[ OpenReview
Poster
None
Byzantine-Resilient Zero-Order Optimization for Scalable Federated Fine-Tuning of Large Language Models
Maximilian Egger · Mayank Bakshi · Rawad Bitar
[ OpenReview
Poster
None
Exploring Diffusion Transformer Designs via Grafting
Keshigeyan Chandrasegaran · Michael Poli · Daniel Y Fu · Dongjun Kim · Lea Hadzic · Manling Li · Agrim Gupta · Stefano Massaroli · Azalia Mirhoseini · Juan Carlos Niebles · Stefano Ermon · Li Fei-Fei
[ OpenReview
Poster
None
Shrinking the Generation-Verification Gap with Weak Verifiers
Jon Saad-Falcon · Estefany Kelly Buchanan · Mayee Chen · Tzu-Heng Huang · Brendan McLaughlin · Tanvir Bhathal · Shang Zhu · Ben Athiwaratkun · Frederic Sala · Scott Linderman · Azalia Mirhoseini · Christopher Re
[ OpenReview
Poster
None
One-Pass to Reason: Token Duplication and Block-Sparse Mask for Efficient Fine-Tuning on Multi-Turn Reasoning
Ritesh Goru · Shanay Mehta · Prateek Jain
[ OpenReview
Poster
None
Foreign Sparse Attention: Effective Distillation into Sparse Attention
Vijaykaarti Sundarapandiyan · Tom Goldstein · Ashwinee Panda
[ OpenReview
Poster
None
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
Sihan Chen · Dan Zhao · Jongwoo Ko · Colby Banbury · HUIPING ZHUANG · Luming Liang · Tianyi Chen
[ OpenReview
Spotlight
None
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
Jonathan Geuter · Youssef Mroueh · David Alvarez-Melis
[ OpenReview
Poster
None
Radio: Rate–Distortion Optimization for Large Language Model Compression
Sean I. Young
[ OpenReview
Poster
None
Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models
Aneesh Muppidi · Katrina Brown · Rana Shahout
[ OpenReview
Poster
None
Adaptive Self-improvement LLM Agentic System for ML Library Development
Genghan Zhang · Weixin Liang · Olivia Hsu · Kunle Olukotun
[ OpenReview
Poster
None
Balancing LoRA Performance and Efficiency with Simple Shard Sharing
Jiale Kang · Qingyu Yin
[ OpenReview
Poster
None
TMA-Adaptive FP8 Grouped GEMM: Eliminating Padding Requirements in Low-Precision Training and Inference on Hopper
zhongling su · Rong Fu · Weihan Cao · Jianfei Gao · Minxi Jin · PeiZhilin · Hui Wang
[ OpenReview
Poster
None
MTraining: Efficient Distributed Training for Ultra-Long Contexts via Dynamic Sparse Attention
Wenxuan Li · Chengruidong Zhang · Huiqiang Jiang · Yucheng Li · Yuqing Yang · Lili Qiu
[ OpenReview
Poster
None
Language System: A Lightweight Ranking Framework for Language Models
Chenheng Zhang · Tianqi Du · Jizhe Zhang · Mingqing Xiao · Yifei Wang · Yisen Wang · Zhouchen Lin
[ OpenReview
Poster
None
Training Language Models to Reason Efficiently
Daman Arora · Andrea Zanette
[ OpenReview
Poster
None
PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts
Zeman Li · Yuan Deng · Peilin Zhong · Meisam Razaviyayn · Vahab Mirrokni
[ OpenReview
Poster
None
MatMuls are Enough for Efficient and Performant Linear-Time Attention
Andrew Argatkiny · Ilya Makarov
[ OpenReview
Poster
None
Thinformer: Guaranteed Attention Approximation via Low-Rank Thinning
Annabelle Carrell · Albert Gong · Abhishek Shetty · Raaz Dwivedi · Lester Mackey
[ OpenReview
Poster
None
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao · Devaansh Gupta · Qinqing Zheng · Aditya Grover
[ OpenReview
Poster
None
QuarterMap: Efficient Post-Training Token Pruning for Visual State Space Models
Tien-Yu Chi · Hung-Yueh Chiang · Diana Marculescu · Kai-Chiang Wu
[ OpenReview
Poster
None
Mitigating Over-Smoothing in Mamba2 via Spectral Domain Analysis
Seojin Kim · Yehjin Shin · Noseong Park
[ OpenReview
Poster
None
pLSTM: parallelizable Linear Source Transition Mark networks
Korbinian Pöppel · Richard Freinschlag · Thomas Schmied · Wei Lin · Sepp Hochreiter
[ OpenReview
Poster
None
$\mu$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
Toshiaki Koike-Akino · Jing Liu · Ye Wang
[ OpenReview
Poster
None
Towards Understanding Orthogonalization in Muon
Valentyn Boreiko · Zhiqi Bu · Sheng Zha
[ OpenReview
Poster
None
LoRA Merging with SVD: Understanding Interference and Preserving Performance
Dennis Tang · Prateek Yadav · Yi-Lin Sung · Jaehong Yoon · Mohit Bansal
[ OpenReview
Poster
None
Partition Generative Modeling: Masked Modeling Without Masks
Justin Deschenaux · Lan Tran · Caglar Gulcehre
[ OpenReview
Poster
None
Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective
Léo Gagnon · Eric Elmoznino · Sarthak Mittal · Tom Marty · Tejas Kasetty · Dhanya Sridhar · Guillaume Lajoie
[ OpenReview
Poster
None
SortedRL: Accelerating RL Training for LLMs through Online Length-aware Scheduling
Yiqi Zhang · Huiqiang Jiang · Xufang Luo · Zhihe Yang · Chengruidong Zhang · Yifei Shen · Dongsheng Li · Yuqing Yang · Lili Qiu · Yang You
[ OpenReview
Poster
None
LOGAH: Initialize Large Transformers via Small Graph HyperNetworks
xinyu Zhou · Boris Knyazev · Alexia Jolicoeur-Martineau · Jie Fu
[ OpenReview
Poster
None
Unbounded Memory and Consistent Imagination via Unified Diffusion–SSM World Models
Jia-Hua Lee · Bor Jiun Lin · Wei-Fang Sun · Chun-Yi Lee
[ OpenReview
Poster
None
Vision Language Model Distillation Using Partial Information Decomposition
Stephen Liang
[ OpenReview
Poster
None
Optimal Formats for Weight Quantisation
Douglas Orr · Luka Ribar · Carlo Luschi
[ OpenReview
Poster
None
Steering LLM Reasoning Through Bias-Only Adaptation
Viacheslav Sinii · Alexey Gorbatovski · Artem Cherepanov · Boris Shaposhnikov · Nikita Balagansky · Daniil Gavrilov
[ OpenReview
Poster
None
Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs
Guoliang HE · Youhe Jiang · Wencong Xiao · Jiang Kaihua · Shuguang Wang · Jun Wang · Du Zixian · Zhuo Jiang · Xinlei Zhang · Binhang Yuan · Eiko Yoneki
[ OpenReview
Poster
None
SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression
Yiqiao Jin · Kartik Sharma · Vineeth Rakesh · Yingtong Dou · Menghai Pan · Mahashweta Das · Srijan Kumar
[ OpenReview
Poster
None
Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers
Sarthak Mittal · Divyat Mahajan · Guillaume Lajoie · Mohammad Pezeshki
[ OpenReview
Poster
None
A Minimalist Optimizer Design for LLM Pretraining
Athanasios Glentis · Jiaxiang Li · Andi Han · Mingyi Hong
[ OpenReview
Poster
None
Demystifying Language Model Forgetting with Low-rank Example Associations
Xisen Jin · Xiang Ren
[ OpenReview
Poster
None
Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation
Yehjin Shin · Seojin Kim · Noseong Park
[ OpenReview
Poster
None
Adaptive Backbone Selection for Efficient and Real-Time Vision Inference
Syed Amir Hamza · Alexander Jesser
[ OpenReview
Poster
None
Accelerating Linear Attention Design by Unifying Forward & Backward Propagation
Zhen Qin · Xuyang Shen · Dong Li · Yiran Zhong
[ OpenReview
Poster
None
Towards Efficient Pre-training: Exploring FP4 Precision in Large Language Models
Zhou Jiecheng · DING TANG · Rong Fu · Boni Hu · Haoran Xu · Yi Wang · zhongling su · Liang Liu · PeiZhilin · Hengjie Li · Xingcheng ZHANG · Weiming Zhang
[ OpenReview
Poster
None
SageAttention2++: A More Efficient Implementation of SageAttention2
Jintao Zhang · Xiaoming Xu · Jia wei · Haofeng Huang · Pengle Zhang · Chendong Xiang · Jun Zhu · Jianfei Chen
[ OpenReview
Poster
None
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach
Ayeong Lee · Ethan Che · Tianyi Peng
[ OpenReview
Poster
None
CarbonGearRL: Precision-Elastic, Carbon-Aware Scheduling for Foundation-Model Training
Thomas Chen
[ OpenReview
Poster
None
Unified Scaling Laws for Compressed Representations
Andrei Panferov · Alexandra Volkova · Ionut-Vlad Modoranu · Vage Egiazarian · Mher Safaryan · Dan Alistarh
[ OpenReview
Poster
None
LATTICE: Learning to Efficiently Compress the Memory
Mahdi Karami · Vahab Mirrokni
[ OpenReview
Poster
None
Multi-stream Sequence Learning
Mohamed Elsayed · Rupam Mahmood
[ OpenReview
Poster
None
GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization
Martin Andrews · Sam Witteveen
[ OpenReview
Poster
None
Efficient and Accurate KV-cache Management for Long-Sequence LLMs
Yuzhen Mao · Qitong Wang · Martin Ester · Ke Li
[ OpenReview
Poster
None
Autoregressive Language Modeling by Compressed Sequence Mixing
Jatin Prakash · Aahlad Puli · Rajesh Ranganath
[ OpenReview
Poster
None
SD$^2$: Self-Distilled Sparse Drafters
Mike Lasby · Nish Sinnadurai · Valavan Manohararajah · Sean Lie · Yani Ioannou · Vithursan Thangarasa
[ Slides [ OpenReview
Poster
None
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness
Ahmad Mohammadshirazi · Pinaki Prasad Guha Neogi · Ser-Nam Lim · Rajiv Ramnath
[ OpenReview
Poster
None
Proof-of-Concept for Private Local-to-Cloud LLM Chat via Trusted Execution Environments
Avanika Narayan · Dan Biderman · Christopher Re
[ OpenReview
Poster
None
BlockBPE: Parallel BPE Tokenization
Amos You
[ OpenReview
Poster
None
TinyServe: Query-Aware Cache Selection for Efficient LLM Inference
Dong Liu · Yanxuan Yu
[ OpenReview
Poster
None
Learning Adaptive Parallel Reasoning with Language Models
Jiayi Pan · Xiuyu Li · Long (Tony) Lian · Charlie Snell · Yifei Zhou · Adam Yala · Trevor Darrell · Kurt Keutzer · Alane Suhr
[ OpenReview
Poster
None
Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling
Mónika Farsang · Ramin Hasani · Radu Grosu
[ OpenReview
Poster
None
Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching
Qizheng Zhang · Michael Wornow · Kunle Olukotun
[ OpenReview
Poster
None
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
Liliang Ren · Congcong Chen · Haoran Xu · Young Jin Kim · Adam Atkinson · Zheng Zhan · Jiankai Sun · Baolin Peng · Liyuan Liu · Shuohang Wang · Hao Cheng · Jianfeng Gao · Weizhu Chen · Yelong Shen
[ OpenReview
Poster
None
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
Jang-Hyun Kim · Jinuk Kim · Sangwoo Kwon · Jae W. Lee · Sangdoo Yun · Hyun Oh Song
[ OpenReview
Poster
None
Tail-Optimized Caching for LLM Inference
Wenxin Zhang · Yueying Li · Tianyi Peng · Ciamac Moallemi
[ OpenReview
Poster
None
An Efficient Row-Based Sparse Fine-Tuning with Low Quantization Error
Cen-Jhih Li · Aditya Bhaskara
[ OpenReview
Poster
None
Accelerated Test-Time Scaling with Model-Free Speculative Sampling
Woomin Song · Saket Dingliwal · Sai Muralidhar Jayanthi · Bhavana Ganesh · Jinwoo Shin · Aram Galstyan · Sravan Babu Bodapati
[ OpenReview
Poster
None
Think Clearly: Improving Reasoning via Redundant Token Pruning
Daewon Choi · Jimin Lee · Jihoon Tack · Woomin Song · Saket Dingliwal · Sai Muralidhar Jayanthi · Bhavana Ganesh · Jinwoo Shin · Aram Galstyan · Sravan Babu Bodapati
[ OpenReview
Poster
None
VScan: A Two-Stage Visual Token Reduction Framework for Accelerating Large Vision-Language Models
Ce Zhang · Kaixin Ma · Tianqing Fang · Wenhao Yu · Hongming ZHANG · Zhisong Zhang · Yaqi Xie · Katia Sycara · Haitao Mi · Dong Yu
[ OpenReview
Poster
None
Q-Adam-mini: Memory-Efficient 8-bit Quantized Optimizer for Large Language Model Training
Yizhou Han · Chaohao Yang · Congliang Chen · Xingjian Wang · Ruoyu Sun
[ OpenReview
Poster
None
Outlier-Free Genomic Foundation Models for Resource-Efficient Training and Low-Bit Inference
Chenghao Qiu · Haozheng Luo · Maojiang Su · Zhihan Zhou · Zoe Mehta · Guo Ye · Jerry Yao-Chieh Hu · Han Liu
[ OpenReview
Poster
None
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
Penghui Yang · Cunxiao Du · Fengzhuo Zhang · Haonan Wang · Tianyu Pang · Chao Du · Bo An
[ OpenReview
Poster
None
Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts
Haizhong Zheng · Yang Zhou · Brian Bartoldson · Bhavya Kailkhura · Fan Lai · Jiawei Zhao · Beidi Chen
[ OpenReview
Poster
None
Efficient Temporal Tokenization for Mobility Prediction with Large Language Models
Haoyu He · Haozheng Luo · Yan Chen · Qi Wang
[ OpenReview
Poster
None
JSONSchemaBench: Evaluating Constrained Decoding with LLMs on Efficiency, Coverage and Quality
Saibo Geng · Hudson Cooper · Michal Moskal · Samuel Jenkins · Julian Berman · Nathan Thomas Elian Ranchin · Robert West · Eric Horvitz · Harsha Nori
[ OpenReview
Poster
None
ThinkingViT: Nested Thinking Vision Transformer for Elastic Inference
Ali Hojjat · Janek Haberer · Soeren Pirk · Olaf Landsiedel
[ OpenReview
Poster
None
Early Attentive Sparsification Accelerates Neural Speech Transcription
Zifei Xu · Sayeh Sharify · Hesham Mostafa · Tristan Webb · Wanzin Yazar · Xin Wang
[ OpenReview
Poster
None
AWP: Activation-aware Weight Pruning and Quantization with Projected Gradient Descent
Jing Liu · Toshiaki Koike-Akino · Ye Wang · Hassan Mansour · Matthew Brand
[ OpenReview
Poster
None
Speeding up Speculative Decoding via Sequential Approximate Verification
Meiyu Zhong · Noel Teku · Ravi Tandon
[ Poster [ OpenReview
Poster
None
MuLoCo: Muon is a practical inner optimizer for DiLoCo
Benjamin Thérien · Xiaolong Huang · Irina Rish · Eugene Belilovsky
[ OpenReview
Poster
None
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Thinking
Sangmin Bae · Yujin Kim · Reza Bayat · Sungnyun Kim · Jiyoun Ha · Tal Schuster · Adam Fisch · Hrayr Harutyunyan · Ziwei Ji · Aaron Courville · Se-Young Yun
[ OpenReview
Poster
None
PiKV: KV Cache Management System for MoE Architecture
Dong Liu · Yanxuan Yu · Ben Lengerich · Ying Nian Wu · Xuhong Wang
[ OpenReview
Poster
None
Large Reasoning Models Know How to Think Efficiently
Zeyu Xing · Xing Li · Huiling Zhen · Xianzhi Yu · Mingxuan Yuan · Sinno Jialin Pan
[ OpenReview
Poster
None
Batch-Max: Higher LLM Throughput using Larger Batch Sizes and KV Cache Compression
Michael R. Metel · Boxing Chen · Mehdi Rezagholizadeh
[ OpenReview
Poster
None
Is Visual Prompting the Right Setup for Knowledge Transfer in new Foundation Models?
Niclas Hergenröther · Antonio Orvieto
[ OpenReview
Poster
None
Towards Understanding Self-Pretraining for Sequence Classification
Omar Coser · Antonio Orvieto
[ OpenReview
Spotlight
None
AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Wei Fu · Jiaxuan Gao · Shusheng Xu · Zhiyu Mei · Chen Zhu · Xujie Shen · Chuyi He · Guo Wei · Jun Mei · Jiashu Wang · Tongkai Yang · Binhang Yuan · Yi Wu
[ OpenReview
Poster
None
Privacy Isn’t Free: Benchmarking the Systems Cost of Privacy-Preserving ML
Nnaemeka Obiefuna · Samuel Oyeneye · Similoluwa Odunaiya · Iremide Oyelaja · Steven Kolawole
[ OpenReview
Poster
None
PoTPTQ: A Two-step Power-of-Two Post-training for LLMs
Xinyu Wang · Vahid Partovi Nia · Peng Lu · Jerry Huang · Xiao-Wen Chang · Boxing Chen · Yufei Cui
[ OpenReview
Poster
None
BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning
Xuechen Zhang · Zijian Huang · Yingcong Li · Chenshun Ni · Jiasi Chen · Samet Oymak
[ OpenReview
Poster
None
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Xuechen Zhang · Zijian Huang · Chenshun Ni · Ziyang Xiong · Jiasi Chen · Samet Oymak
[ OpenReview
Poster
None
Training-Free Semantic Deferrals for Open-Ended LLM Cascades
Duncan Soiffer · Steven Kolawole · Virginia Smith
[ OpenReview