Sat Jul 19 08:30 AM -- 08:40 AM (PDT)
Opening Remarks
Invited Talk
Sat Jul 19 08:40 AM -- 09:10 AM (PDT)
Hagay Lupesko: Zero to 50 ExaFLOPS in under a year - lessons from the trenches
Invited Talk
Sat Jul 19 09:10 AM -- 09:40 AM (PDT)
Wanchao Liang: TorchTitan
Sat Jul 19 09:40 AM -- 10:00 AM (PDT)
Break
Invited Talk
Sat Jul 19 10:00 AM -- 10:30 AM (PDT)
Baris Kasikci: The Quest For Blazingly Fast LLM Serving
Oral
Sat Jul 19 10:30 AM -- 10:45 AM (PDT)
FPTQuant: Function-Preserving Transforms for LLM Quantization
[OpenReview]
Oral
Sat Jul 19 10:45 AM -- 11:00 AM (PDT)
Cartridges: Lightweight and general-purpose long context representations via self-study
[OpenReview]
Oral
Sat Jul 19 11:00 AM -- 11:15 AM (PDT)
zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression
[OpenReview]
Sat Jul 19 11:15 AM -- 11:30 AM (PDT)
Spotlight Lightning Talks
Sat Jul 19 11:30 AM -- 01:00 PM (PDT)
Lunch break
Poster Session
Sat Jul 19 01:00 PM -- 02:30 PM (PDT)
Poster Session
Invited Talk
Sat Jul 19 02:30 PM -- 03:00 PM (PDT)
Avanika Narayan: Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models
Sat Jul 19 03:00 PM -- 03:30 PM (PDT)
Break
Oral
Sat Jul 19 03:30 PM -- 03:45 PM (PDT)
Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture
[Slides] [OpenReview]
Oral
Sat Jul 19 03:45 PM -- 04:00 PM (PDT)
Hardware-Efficient Attention for Fast Decoding
[OpenReview]
Invited Talk
Sat Jul 19 04:00 PM -- 04:30 PM (PDT)
Zachary Charles
Invited Talk
Sat Jul 19 04:30 PM -- 05:00 PM (PDT)
Albert Gu: H-Nets
Sat Jul 19 05:00 PM -- 05:10 PM (PDT)
Closing Remarks / Awards
Poster
SpecCoT: Accelerating Chain-of-Thought Reasoning through Speculative Exploration
[OpenReview]
Spotlight
Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas
[OpenReview]
Poster
Compressing Large Language Models to Any Size Without Re-Computation
[Slides] [OpenReview]
Poster
Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile
[OpenReview]
Poster
ConMeZO: Adaptive Directional Sampling for Gradient-Free Finetuning of Language Models
[OpenReview]
Poster
Ultra-Efficient and Effective Large Language Models with Multi-Boolean Architectures
[OpenReview]
Poster
Cache Saver: A Modular Framework for Efficient, Affordable, and Reproducible LLM Inference
[OpenReview]
Poster
TORCHSIM: High Fidelity Runtime and Memory Estimation for Distributed Training
[OpenReview]
Poster
HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations
[OpenReview]
Poster
Exchangeability in Neural Network Architectures and its Application to Dynamic Pruning
[OpenReview]
Poster
LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs
[OpenReview]
Poster
Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention
[OpenReview]
Poster
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
[OpenReview]
Poster
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
[OpenReview]
Poster
Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers
[OpenReview]
Poster
Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights
[OpenReview]
Poster
Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs
[OpenReview]
Poster
Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection
[OpenReview]
Poster
Multi-student Diffusion Distillation for Better One-step Generators
[Poster] [OpenReview]
Poster
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning
[OpenReview]
Poster
DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
[OpenReview]
Spotlight
ABBA: Highly Expressive Hadamard Product Adaptation for Large Language Models
[OpenReview]
Poster
Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning
[OpenReview]
Spotlight
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
[OpenReview]
Poster
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
[OpenReview]
Poster
InterLoRA: An Adaptive LoRA Structure Based on The Mechanistic Interpretability of Transformer
[OpenReview]
Poster
Byzantine-Resilient Zero-Order Optimization for Scalable Federated Fine-Tuning of Large Language Models
[OpenReview]
Poster
One-Pass to Reason: Token Duplication and Block-Sparse Mask for Efficient Fine-Tuning on Multi-Turn Reasoning
[OpenReview]
Poster
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
[OpenReview]
Poster
Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models
[OpenReview]
Poster
TMA-Adaptive FP8 Grouped GEMM: Eliminating Padding Requirements in Low-Precision Training and Inference on Hopper
[OpenReview]
Poster
MTraining: Efficient Distributed Training for Ultra-Long Contexts via Dynamic Sparse Attention
[OpenReview]
Poster
PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts
[OpenReview]
Poster
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
[OpenReview]
Poster
QuarterMap: Efficient Post-Training Token Pruning for Visual State Space Models
[OpenReview]
Poster
LoRA Merging with SVD: Understanding Interference and Preserving Performance
[OpenReview]
Poster
Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective
[OpenReview]
Poster
SortedRL: Accelerating RL Training for LLMs through Online Length-aware Scheduling
[OpenReview]
Poster
Unbounded Memory and Consistent Imagination via Unified Diffusion–SSM World Models
[OpenReview]
Poster
Vision Language Model Distillation Using Partial Information Decomposition
[OpenReview]
Poster
Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs
[OpenReview]
Poster
SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression
[OpenReview]
Poster
Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers
[OpenReview]
Poster
Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation
[OpenReview]
Poster
Accelerating Linear Attention Design by Unifying Forward & Backward Propagation
[OpenReview]
Poster
Towards Efficient Pre-training: Exploring FP4 Precision in Large Language Models
[OpenReview]
Poster
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach
[OpenReview]
Poster
CarbonGearRL: Precision-Elastic, Carbon-Aware Scheduling for Foundation-Model Training
[OpenReview]
Poster
GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization
[OpenReview]
Poster
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness
[OpenReview]
Poster
Proof-of-Concept for Private Local-to-Cloud LLM Chat via Trusted Execution Environments
[OpenReview]
Poster
Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling
[OpenReview]
Poster
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
[OpenReview]
Poster
VScan: A Two-Stage Visual Token Reduction Framework for Accelerating Large Vision-Language Models
[OpenReview]
Poster
Q-Adam-mini: Memory-Efficient 8-bit Quantized Optimizer for Large Language Model Training
[OpenReview]
Poster
Outlier-Free Genomic Foundation Models for Resource-Efficient Training and Low-Bit Inference
[OpenReview]
Poster
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
[OpenReview]
Poster
Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts
[OpenReview]
Poster
Efficient Temporal Tokenization for Mobility Prediction with Large Language Models
[OpenReview]
Poster
JSONSchemaBench: Evaluating Constrained Decoding with LLMs on Efficiency, Coverage and Quality
[OpenReview]
Poster
AWP: Activation-aware Weight Pruning and Quantization with Projected Gradient Descent
[OpenReview]
Poster
Speeding up Speculative Decoding via Sequential Approximate Verification
[Poster] [OpenReview]
Poster
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Thinking
[OpenReview]
Poster
Batch-Max: Higher LLM Throughput using Larger Batch Sizes and KV Cache Compression
[OpenReview]
Poster
Is Visual Prompting the Right Setup for Knowledge Transfer in new Foundation Models?
[OpenReview]
Spotlight
AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
[OpenReview]
Poster
Privacy Isn’t Free: Benchmarking the Systems Cost of Privacy-Preserving ML
[OpenReview]
Poster
BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning
[OpenReview]
Poster
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
[OpenReview]