Downloads 2025
Number of events: 3389

 - $\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
 - $K^2$VAE: A Koopman-Kalman Enhanced Variational AutoEncoder for Probabilistic Time Series Forecasting
 - $\mathcal{V}ista\mathcal{DPO}$: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
 - $S^2$FGL: Spatial Spectral Federated Graph Learning
 - $\texttt{I$^2$MoE}$: Interpretable Multimodal Interaction-aware Mixture-of-Experts
 - 1st Workshop on Foundation Models for Structured Data (FMSD)
 - 2nd AI for Math Workshop @ ICML 2025
 - 2nd Generative AI for Biology Workshop
 - 2nd Workshop on Models of Human Feedback for AI Alignment (MoFA)
 - 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)
 - 3D-LMVIC: Learning-based Multi-View Image Compression with 3D Gaussian Geometric Priors
 - 3D Question Answering via only 2D Vision-Language Models
 - 3rd Workshop on High-dimensional Learning Dynamics (HiLD)
 - AAAR-1.0: Assessing AI’s Potential to Assist Research
 - A Bayesian Model Selection Criterion for Selecting Pretraining Checkpoints
 - Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large $p$
 - ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $\alpha$-$\beta$-Divergence
 - ABNet: Adaptive explicit-Barrier Net for Safe and Scalable Robot Learning
 - A Bregman Proximal Viewpoint on Neural Operators
 - A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment
 - Accelerated Diffusion Models via Speculative Sampling
 - Accelerating Large Language Model Reasoning via Speculative Search
 - Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity
 - Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
 - Accelerating PDE-Constrained Optimization by the Derivative of Neural Operators
 - Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach
 - Accelerating Spectral Clustering under Fairness Constraints
 - Accelerating Unbiased LLM Evaluation via Synthetic Feedback
 - Accurate and Efficient World Modeling with Masked Latent Transformers
 - Accurate Identification of Communication Between Multiple Interacting Neural Populations
 - A Certified Unlearning Approach without Access to Source Data
 - A Chaotic Dynamics Framework Inspired by Dorsal Stream for Event Signal Processing
 - A Checks-and-Balances Framework for Context-Aware Ethical AI Alignment
 - Achieving Linear Speedup and Near-Optimal Complexity for Decentralized Optimization over Row-stochastic Networks
 - A Classification View on Meta Learning Bandits
 - A Closer Look at Backdoor Attacks on CLIP
 - A Closer Look at Generalized BH Algorithm for Out-of-Distribution Detection
 - A Closer Look at Multimodal Representation Collapse
 - A Closer Look at Transformers for Time Series Forecasting: Understanding Why They Work and Where They Struggle
 - A Cognac Shot To Forget Bad Memories: Corrective Unlearning for Graph Neural Networks
 - A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD
 - A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs
 - A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features
 - Actionable Interpretability
 - Action-Constrained Imitation Learning
 - Action-Dependent Optimality-Preserving Reward Shaping
 - Action Dubber: Timing Audible Actions via Inflectional Flow
 - Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional
 - ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation
 - Activation by Interval-wise Dropout: A Simple Way to Prevent Neural Networks from Plasticity Loss
 - Activation Space Interventions Can Be Transferred Between Large Language Models
 - Active Evaluation Acquisition for Efficient LLM Benchmarking
 - Active feature acquisition via explainability-driven ranking
 - Active Fine-Tuning of Multi-Task Policies
 - Active Learning for Efficient Discovery of Optimal Combinatorial Perturbations
 - Active Learning of Deep Neural Networks via Gradient-Free Cutting Planes
 - Active Learning with Selective Time-Step Acquisition for PDEs
 - Active Reward Modeling: Adaptive Preference Labeling for Large Language Model Alignment
 - Active Treatment Effect Estimation via Limited Samples
 - Actor-Critics Can Achieve Optimal Sample Efficiency
 - AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism
 - Adapter Naturally Serves as Decoupler for Cross-Domain Few-Shot Semantic Segmentation
 - Adapting Precomputed Features for Efficient Graph Condensation
 - Adapting to Evolving Adversaries with Regularized Continual Robust Training
 - Adapting to Linear Separable Subsets with Large-Margin in Differentially Private Learning
 - Adapting While Learning: Grounding LLMs for Scientific Problems with Tool Usage Adaptation
 - Adaptive Alignment: Designing AI for a Changing World - Frauke Kreuter
 - Adaptive Data Collection for Robust Learning Across Multiple Distributions
 - Adaptive Elicitation of Latent Information Using Natural Language
 - Adaptive Estimation and Learning under Temporal Distribution Shift
 - Adaptive Exploration for Multi-Reward Multi-Policy Evaluation
 - Adaptive Flow Matching for Resolving Small-Scale Physics
 - Adaptive kernel predictors from feature-learning infinite limits of neural networks
 - Adaptive Learn-then-Test: Statistically Valid and Efficient Hyperparameter Selection
 - Adaptive Localization of Knowledge Negation for Continual LLM Unlearning
 - Adaptive Median Smoothing: Adversarial Defense for Unlearned Text-to-Image Diffusion Models at Inference Time
 - Adaptive Message Passing: A General Framework to Mitigate Oversmoothing, Oversquashing, and Underreaching
 - Adaptive Multi-prompt Contrastive Network for Few-shot Out-of-distribution Detection
 - Adaptive Partitioning Schemes for Optimistic Optimization
 - Adaptive Sample Sharing for Multi Agent Linear Bandits
 - Adaptive Self-improvement LLM Agentic System for ML Library Development
 - Adaptive Sensitivity Analysis for Robust Augmentation against Natural Corruptions in Image Segmentation
 - AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
 - AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting
 - AdaSplash: Adaptive Sparse Flash Attention
 - AdaWorld: Learning Adaptable World Models with Latent Actions
 - ADDQ: Adaptive distributional double Q-learning
 - Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
 - Addressing Imbalanced Domain-Incremental Learning through Dual-Balance Collaborative Experts
 - Addressing Misspecification in Simulation-based Inference through Data-driven Calibration
 - ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization
 - Ad-Hoc Human-AI Coordination Challenge
 - Ad Hoc Teamwork via Offline Goal-Based Decision Transformers
 - ADIOS: Antibody Development via Opponent Shaping
 - Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching
 - Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough?
 - Adjustment for Confounding using Pre-Trained Representations
 - AdvAgent: Controllable Blackbox Red-teaming on Web Agents
 - Advancing Constrained Monotonic Neural Networks: Achieving Universal Approximation Beyond Bounded Activations
 - Advancing Personalized Learning with Neural Collapse for Long-Tail Challenge
 - Adversarial Combinatorial Semi-bandits with Graph Feedback
 - Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets
 - Adversarial Inception Backdoor Attacks against Reinforcement Learning
 - Adversarial Inputs for Linear Algebra Backends
 - Adversarial Perturbations Are Formed by Iteratively Learning Linear Combinations of the Right Singular Vectors of the Adversarial Jacobian
 - Adversarial Reasoning at Jailbreaking Time
 - Adversarial Robust Generalization of Graph Neural Networks
 - Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees
 - Adversarial Robustness via Deformable Convolution with Stochasticity
 - Adversaries Can Misuse Combinations of Safe Models
 - AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion Models
 - AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
 - A Dynamical Systems-Inspired Pruning Strategy for Addressing Oversmoothing in Graph Attention Networks
 - AEQA-NAT: Adaptive End-to-end Quantization Alignment Training Framework for Non-autoregressive Machine Translation
 - Aequa: Fair Model Rewards in Collaborative Learning via Slimmable Networks
 - AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models
 - AffinityFlow: Guided Flows for Antibody Affinity Maturation
 - A First-order Generative Bilevel Optimization Framework for Diffusion Models
 - A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control
 - AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment
 - A General Framework for Inference-time Scaling and Steering of Diffusion Models
 - A General Graph Spectral Wavelet Convolution via Chebyshev Order Decomposition
 - A Generalizable Physics-Enhanced State Space Model for Long-Term Dynamics Forecasting in Complex Environments
 - A Generalization Result for Convergence in Learning-to-Optimize
 - A Generalization Theory for Zero-Shot Prediction
 - A General Representation-Based Approach to Multi-Source Domain Adaptation
 - A Generic Family of Graphical Models: Diversity, Efficiency, and Heterogeneity
 - Agent-as-a-Judge: Evaluate Agents with Agents
 - Agent-Centric Actor-Critic for Asynchronous Multi-Agent Reinforcement Learning
 - Agent Reviewers: Domain-specific Multimodal Agents with Shared Memory for Paper Review
 - Agent Workflow Memory
 - A Geometric Approach to Personalized Recommendation with Set-Theoretic Constraints Using Box Embeddings
 - Aggregation Buffer: Revisiting DropEdge with a New Parameter Block
 - Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders
 - Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
 - A Hitchhiker's Guide to Scaling Law Estimation
 - AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N
 - AI Heard That! ICML 2025 Workshop on Machine Learning for Audio
 - AI's Models of the World, and Ours
 - AKORN: Adaptive Knots generated Online for RegressioN splines
 - AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings
 - A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
 - Alberta Wells Dataset: Pinpointing Oil and Gas Wells from Satellite Imagery
 - A Lens into Interpretable Transformer Mistakes via Semantic Dependency
 - Algorithm Development in Neural Networks: Insights from the Streaming Parity Task
 - Algorithmic Recourse for Long-Term Improvement
 - Algorithms and Hardness for Active Learning on Graphs
 - Algorithms with Calibrated Machine Learning Predictions
 - Aligned Multi Objective Optimization
 - Aligning LLMs by Predicting Preferences from User Writing Samples
 - Aligning Multimodal Representations through an Information Bottleneck
 - Aligning Protein Conformation Ensemble Generation with Physical Feedback
 - Aligning Spoken Dialogue Models from User Interactions
 - Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models
 - A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models
 - All-atom Diffusion Transformers: Unified generative modelling of molecules and materials
 - All-atom inverse protein folding through discrete flow matching
 - All-Purpose Mean Estimation over R: Optimal Sub-Gaussianity with Outlier Robustness and Low Moments Performance
 - Almost Optimal Fully Dynamic $k$-Center Clustering with Recourse
 - ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
 - AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
 - AlphaPO: Reward Shape Matters for LLM Alignment
 - AlphaQCM: Alpha Discovery in Finance with Distributional Reinforcement Learning
 - Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search
 - AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement
 - A Machine Learning Approach to Duality in Statistical Physics
 - A Manifold Perspective on the Statistical Generalization of Graph Neural Networks
 - A Market for Accuracy: Classification Under Competition
 - A Mathematical Framework for AI-Human Integration in Work
 - am-ELO: A Stable Framework for Arena-based LLM Evaluation
 - A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models
 - A Meta-learner for Heterogeneous Effects in Difference-in-Differences
 - A Mixed-Curvature based Pre-training Paradigm for Multi-Task Vehicle Routing Solver
 - A Mixture-Based Framework for Guiding Diffusion Models
 - A Model of Place Field Reorganization During Reward Maximization
 - AMPO: Active Multi Preference Optimization for Self-play Preference Selection
 - A Multi-Region Brain Model to Elucidate the Role of Hippocampus in Spatially Embedded Decision-Making
 - An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures
 - An All-Atom Generative Model for Designing Protein Complexes
 - AnalogGenie-Lite: Enhancing Scalability and Precision in Circuit Topology Discovery through Lightweight Graph Modeling
 - Analytical Construction on Geometric Architectures: Transitioning from Static to Temporal Link Prediction
 - Analytical Lyapunov Function Discovery: An RL-based Generative Approach
 - Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
 - An Analysis for Reasoning Bias of Language Models with Small Initialization
 - An Analysis of Quantile Temporal-Difference Learning
 - An analytic theory of creativity in convolutional diffusion models
 - An Architecture Search Framework for Inference-Time Techniques
 - An Asymptotically Optimal Approximation Algorithm for Multiobjective Submodular Maximization at Scale
 - An Augmentation-Aware Theory for Self-Supervised Contrastive Learning
 - A Near Linear Query Lower Bound for Submodular Maximization
 - A Near-Optimal Single-Loop Stochastic Algorithm for Convex Finite-Sum Coupled Compositional Optimization
 - An Effective and Secure Federated Multi-View Clustering Method with Information-Theoretic Perspective
 - An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks
 - An Efficient Private GPT Never Autoregressively Decodes
 - An Efficient Pruner for Large Language Model with Theoretical Guarantee
 - An Efficient Search-and-Score Algorithm for Ancestral Graphs using Multivariate Information Scores for Complex Non-linear and Categorical Data
 - An Empirical Study on Configuring In-Context Learning Demonstrations for Unleashing MLLMs' Sentimental Perception Capability
 - An End-to-End Model for Logits-Based Large Language Models Watermarking
 - An Entropy-Based Model for Hierarchical Learning
 - An Error Analysis of Flow Matching for Deep Generative Modeling
 - A New Approach to Backtracking Counterfactual Explanations: A Unified Causal Framework for Efficient Model Interpretability
 - A New Concentration Inequality for Sampling Without Replacement and Its Application for Transductive Learning
 - An Expressive and Self-Adaptive Dynamical System for Efficient Function Learning
 - Angle Domain Guidance: Latent Diffusion Requires Rotation Rather Than Extrapolation
 - An Improved Clique-Picking Algorithm for Counting Markov Equivalent DAGs via Super Cliques Transfer
 - An in depth look at the Procrustes-Wasserstein distance: properties and barycenters
 - An Instrumental Value for Data Production and its Application to Data Pricing
 - An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks
 - Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions
 - A Non-Asymptotic Convergent Analysis for Scored-Based Graph Generative Model via a System of Stochastic Differential Equations
 - A Non-isotropic Time Series Diffusion Model with Moving Average Transitions
 - An Online Adaptive Sampling Algorithm for Stochastic Difference-of-convex Optimization with Time-varying Distributions
 - An Online Learning Approach to Prompt-based Selection of Generative Models and LLMs
 - An Optimistic Algorithm for online CMDPS with Anytime Adversarial Constraints
 - A Novel Characterization of the Population Area Under the Risk Coverage Curve (AURC) and Rates of Finite Sample Estimators
 - Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
 - any4: Learned 4-bit Numeric Representation for LLMs
 - AnyEdit: Edit Any Knowledge Encoded in Language Models
 - Anytime-Constrained Equilibria in Polynomial Time
 - A Online Statistical Framework for Out-of-Distribution Detection
 - A Parameter-Free and Near-Optimal Zeroth-Order Algorithm for Stochastic Convex Optimization
 - A Parametric Contextual Online Learning Theory of Brokerage
 - A Peer-review Look on Multi-modal Clustering: An Information Bottleneck Realization Method
 - A Physics-Augmented Deep Learning Framework for Classifying Single Molecule Force Spectroscopy Data
 - A Physics-Informed Machine Learning Framework for Safe and Optimal Control of Autonomous Systems
 - Approximate Differential Privacy of the $\ell_2$ Mechanism
 - Approximate Forest Completion and Learning-Augmented Algorithms for Metric Minimum Spanning Trees
 - Approximately Correct Label Distribution Learning
 - Approximating Latent Manifolds in Neural Networks via Vanishing Ideals
 - Approximation to Smooth Functions by Low-Rank Swish Networks
 - A-PSRO: A Unified Strategy Learning Method with Advantage Metric for Normal-form Games
 - Arbitrarily-Conditioned Multi-Functional Diffusion for Multi-Physics Emulation
 - Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
 - A Reasoning-Based Approach to Cryptic Crossword Clue Solving
 - A Recipe for Causal Graph Regression: Confounding Effects Revisited
 - A Reduction Framework for Distributionally Robust Reinforcement Learning under Average Reward
 - A Reductions Approach to Risk-Sensitive Reinforcement Learning with Optimized Certainty Equivalents
 - Are High-Quality AI-Generated Images More Difficult for Models to Detect?
 - Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-Tuning
 - Are Large Language Models Ready for Multi-Turn Tabular Data Analysis?
 - Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle
 - A Rescaling-Invariant Lipschitz Bound Based on Path-Metrics for Modern ReLU Network Parameterizations
 - Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
 - Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster
 - ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior
 - Arrow: Accelerator for Time Series Causal Discovery with Time Weaving
 - ARS: Adaptive Reward Scaling for Multi-Task Reinforcement Learning
 - A Sample Efficient Conditional Independence Test in the Presence of Discretization
 - A Selective Learning Method for Temporal Graph Continual Learning
 - A Sharper Global Convergence Analysis for Average Reward Reinforcement Learning via an Actor-Critic Approach
 - A Simple Model of Inference Scaling Laws
 - A Square Peg in a Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning
 - Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
 - Assessing World Models: Methods and Metrics for Evaluating Understanding
 - AssistanceZero: Scalably Solving Assistance Games
 - A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models
 - A Sub-Problem Quantum Alternating Operator Ansatz for Correlation Clustering
 - Asymmetric Decision-Making in Online Knowledge Distillation: Unifying Consensus and Divergence
 - AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
 - ATA: Adaptive Task Allocation for Efficient Resource Management in Distributed Machine Learning
 - A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language?
 - A Theoretical Framework For Overfitting In Energy-based Modeling
 - A Theoretical Justification for Asymmetric Actor-Critic Algorithms
 - A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization
 - A Theory for Conditional Generative Modeling on Multiple Data Sources
 - AtlasD: Automatic Local Symmetry Discovery
 - A Trichotomy for List Transductive Online Learning
 - Attention-Level Speculation
 - Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data
 - Attention-Only Transformers via Unrolled Subspace Denoising
 - Attributes Shape the Embedding Space of Face Recognition Models
 - A Two-Stage Learning-to-Defer Approach for Multi-Task Learning
 - Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
 - Auditing $f$-differential privacy in one run
 - Auditing Prompt Caching in Language Model APIs
 - A Unified Approach to Routing and Cascading for LLMs
 - A Unified Comparative Study with Generalized Conformity Scores for Multi-Output Conformal Regression
 - A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization
 - A Unified Framework for Generalization Error Analysis of Learning with Arbitrary Discrete Weak Features
 - A Unified Theoretical Analysis of Private and Robust Offline Alignment: from RLHF to DPO
 - A Unified View on Learning Unnormalized Distributions via Noise-Contrastive Estimation
 - AuPair: Golden Example Pairs for Code Repair
 - AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses
 - AutoAL: Automated Active Learning with Differentiable Query Strategy Search
 - AutoCATE: End-to-End, Automated Treatment Effect Estimation
 - AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation
 - AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling
 - Autoencoder-Based Hybrid Replay for Class-Incremental Learning
 - AutoEval Done Right: Using Synthetic Data for Model Evaluation
 - Autoformulation of Mathematical Optimization Models Using LLMs
 - AutoGFM: Automated Graph Foundation Model with Adaptive Architecture Customization
 - Automated Benchmark Generation for Repository-Level Coding Tasks
 - Automated Hypothesis Validation with Agentic Sequential Falsifications
 - Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
 - Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios
 - Automatically Interpreting Millions of Features in Large Language Models
 - Automatic Differentiation of Optimization Algorithms with Time-Varying Updates
 - Automatic Reward Shaping from Confounded Offline Data
 - AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML
 - Autonomy-of-Experts Models
 - Auto-reconfiguration for Latency Minimization in CPU-based DNN Serving
 - AutoStep: Locally adaptive involutive MCMC
 - A Variational Framework for Improving Naturalness in Generative Spoken Language Models
 - A Variational Information Theoretic Approach to Out-of-Distribution Detection
 - A Variational Perspective on Generative Protein Fitness Optimization
 - Average Certified Radius is a Poor Metric for Randomized Smoothing
 - Average Sensitivity of Hierarchical $k$-Median Clustering
 - A Versatile Influence Function for Data Attribution with Non-Decomposable Loss
 - Avoiding Catastrophe in Online Learning by Asking for Help
 - Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts
 - Avoiding spurious sharpness minimization broadens applicability of SAM
 - AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
 - Backdoor Attacks in Token Selection of Attention Mechanism
 - BackSlash: Rate Constrained Optimized Training of Large Language Models
 - BalancEdit: Dynamically Balancing the Generality-Locality Trade-off in Multi-modal Model Editing
 - Balanced Learning for Domain Adaptive Semantic Segmentation
 - Balancing Efficiency and Expressiveness: Subgraph GNNs with Walk-Based Centrality
 - Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach
 - Balancing Model Efficiency and Performance: Adaptive Pruner for Long-tailed Data
 - Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing
 - Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data
 - BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training
 - BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms
 - BAnG: Bidirectional Anchored Generation for Conditional RNA Design
 - Banyan: Improved Representation Learning with Explicit Structure
 - BARK: A Fully Bayesian Tree Kernel for Black-box Optimization
 - BARNN: A Bayesian Autoregressive and Recurrent Neural Network
 - Batch List-Decodable Linear Regression via Higher Moments
 - BaWA: Automatic Optimizing Pruning Metric for Large Language Models with Balanced Weight and Activation
 - BaxBench: Can LLMs Generate Correct and Secure Backends?
 - Bayesian Active Learning for Bivariate Causal Discovery
 - Bayesian Basis Function Approximation for Scalable Gaussian Process Priors in Deep Generative Models
 - Bayesian Inference for Correlated Human Experts and Classifiers
 - Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks
 - Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
 - Bayesian Weight Enhancement with Steady-State Adaptation for Test-time Adaptation in Dynamic Environments
 - BCE vs. CE in Deep Feature Learning
 - BDC-CLIP: Brownian Distance Covariance for Adapting CLIP to Action Recognition
 - Be a Goldfish: Forgetting Bad Conditioning in Sparse Linear Regression via Variational Autoencoders
 - BECAME: Bayesian Continual Learning with Adaptive Model Merging
 - Be Confident: Uncovering Overfitting in MLLM Multi-Task Tuning
 - Behavior-agnostic Task Inference for Robust Offline In-context Reinforcement Learning
 - Behavioral Exploration: Learning to Explore via In-Context Adaptation
 - Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning
 - Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation
 - Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective
 - Benchmarking Quantum Reinforcement Learning
 - Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression
 - Benign Overfitting in Token Selection of Attention Mechanism
 - Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety
 - Best of Both Worlds: Advantages of Hybrid Graph Sequence Models
 - Best of Both Worlds: Regret Minimization versus Minimax Play
 - BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute
 - Best Subset Selection: Optimal Pursuit for Feature Selection and Elimination
 - Better to Teach than to Give: Domain Generalized Semantic Segmentation via Agent Queries with Diffusion Model Guidance
 - Beyond Atoms: Enhancing Molecular Pretrained Representations with 3D Space Modeling
 - Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
 - Beyond Communication Overhead: A Multilevel Monte Carlo Approach for Mitigating Compression Bias in Distributed Learning
 - Beyond Confidence: Exploiting Homogeneous Pattern for Semi-Supervised Semantic Segmentation
 - Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts
 - Beyond CVaR: Leveraging Static Spectral Risk Measures for Enhanced Decision-Making in Distributional Reinforcement Learning
 - Beyond Entropy: Region Confidence Proxy for Wild Test-Time Adaptation
 - Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence
 - Beyond Log-Concavity and Score Regularity: Improved Convergence Bounds for Score-Based Generative Models in W2-distance
 - Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning
 - Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
 - Beyond Message Passing: Neural Graph Pattern Machine
 - Beyond Minimax Rates in Group Distributionally Robust Optimization via a Novel Notion of Sparsity
 - Beyond One-Hot Labels: Semantic Mixing for Model Calibration
 - Beyond Self-Interest: How Group Strategies Reshape Content Creation in Recommendation Platforms?
 - Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs
 - Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions
 - Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning
 - Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
 - Beyond The Rainbow: High Performance Deep Reinforcement Learning on a Desktop PC
 - Beyond Topological Self-Explainable GNNs: A Formal Explainability Perspective
 - Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics
 - BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly
 - Bifurcate then Alienate: Incomplete Multi-view Clustering via Coupled Distribution Learning with Linear Overhead
 - BILBO: BILevel Bayesian Optimization
 - BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution
 - BiMark: Unbiased Multilayer Watermarking for Large Language Models
 - Binary Hypothesis Testing for Softmax Models and Leverage Score Models
 - BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models
 - Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation
 - Bi-perspective Splitting Defense: Achieving Clean-Seed-Free Backdoor Security
 - Bivariate Causal Discovery with Proxy Variables: Integral Solving and Beyond
 - Black-Box Adversarial Attacks on LLM-Based Code Completion
 - Blink of an eye: a simple theory for feature localization in generative models
 - BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference
 - BoA: Attention-aware Post-training Quantization without Backpropagation
 - Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
 - BOOD: Boundary-based Out-Of-Distribution Data Generation
 - Boost-and-Skip: A Simple Guidance-Free Diffusion for Minority Generation
 - Boosting Adversarial Robustness with CLAT: Criticality Leveraged Adversarial Training
 - Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners
 - Boosting Multi-Domain Fine-Tuning of Large Language Models through Evolving Interactions between Samples
 - Boosting Protein Graph Representations through Static-Dynamic Fusion
 - Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark
 - Bootstrapping Self-Improvement of Language Model Programs for Zero-Shot Schema Matching
 - BOPO: Neural Combinatorial Optimization via Best-anchored and Objective-guided Preference Optimization
 - Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time
 - BounDr.E: Predicting Drug-likeness via Biomedical Knowledge Alignment and EM-like One-Class Boundary Optimization
 - BoxLM: Unifying Structures and Semantics of Medical Concepts for Diagnosis Prediction in Healthcare
 - Branches: Efficiently Seeking Optimal Sparse Decision Trees via AO*
 - Breaking Barriers: Combinatorial Algorithms for Non-Monotone Submodular Maximization with Sublinear Adaptivity and $1/e$ Approximation
 - Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting
 - Breaking the $n^{1.5}$ Additive Error Barrier for Private and Efficient Graph Sparsification via Private Expander Decomposition
 - Breaking the Barrier of Hard Samples: A Data-Centric Approach to Synthetic Data for Medical Tasks
 - Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning
 - Breaking the Quadratic Barrier: Robust Cardinality Sketches for Adaptive Queries
 - BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling
 - Bridging Fairness and Efficiency in Conformal Inference: A Surrogate-Assisted Group-Clustered Approach
 - Bridging Layout and RTL: Knowledge Distillation based Timing Prediction
 - Bridging Protein Sequences and Microscopy Images with Unified Diffusion Models
 - Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging
 - BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning
 - Broadband Ground Motion Synthesis by Diffusion Model with Minimal Condition
 - B-score: Detecting biases in large language models using response history
 - BSemiFL: Semi-supervised Federated Learning via a Bayesian Approach
 - BSLoRA: Enhancing the Parameter Efficiency of LoRA with Intra-Layer and Inter-Layer Sharing
 - BSO: Binary Spiking Online Optimization Algorithm
 - Building Physically Plausible World Models
 - Byzantine-Resilient Federated Alternating Gradient Descent and Minimization for Partly-Decoupled Low Rank Matrix Learning
 - C2IQL: Constraint-Conditioned Implicit Q-learning for Safe Offline Reinforcement Learning
 - C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
 - Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
 - CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging
 - Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models
 - CACTI: Leveraging Copy Masking and Contextual Information to Improve Tabular Data Imputation
 - CaDA: Cross-Problem Routing Solver with Constraint-Aware Dual-Attention
 - CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing
 - Calibrated Language Models and How to Find Them with Label Smoothing
 - Calibrated Physics-Informed Uncertainty Quantification
 - Calibrated Value-Aware Model Learning with Probabilistic Environment Models
 - Calibrating Video Watch-time Predictions with Credible Prototype Alignment
 - Calibration and Bias in Algorithms, Data, and Models: a tutorial on metrics and plots for measuring calibration, bias, fairness, reliability, and robustness
 - CALM: Consensus-Aware Localized Merging for Multi-Task Learning
 - Can Biologically Plausible Temporal Credit Assignment Rules Match BPTT for Neural Similarity? E-prop as an Example
 - CANCELED: Alignment Methods for Large Language Models
 - Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence
 - Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
 - Can DBNNs Robust to Environmental Noise for Resource-constrained Scenarios?
 - Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?
 - Can Large Language Models Understand Intermediate Representations in Compilers?
 - CAN: Leveraging Clients As Navigators for Generative Replay in Federated Continual Learning
 - Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
 - Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
 - Canonical Rank Adaptation: An Efficient Fine-Tuning Strategy for Vision Transformers
 - Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
 - Can Transformers Learn Full Bayesian Inference in Context?
 - Can Transformers Reason Logically? A Study in SAT Solving
 - Can We Predict Performance of Large Models across Vision-Language Tasks?
 - Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy
 - Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation
 - CASE-Bench: Context-Aware SafEty Benchmark for Large Language Models
 - Catching Two Birds with One Stone: Reward Shaping with Dual Random Networks for Balancing Exploration and Exploitation
 - Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models
 - CAT: Contrastive Adversarial Training for Evaluating the Robustness of Protective Perturbations in Latent Diffusion Models
 - Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics
 - Categorical Schrödinger Bridge Matching
 - CateKV: On Sequential Consistency for Long-Context LLM Inference Acceleration
 - CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
 - Catoni Contextual Bandits are Robust to Heavy-tailed Rewards
 - Causal Abstraction Inference under Lossy Representations
 - Causal Abstraction Learning based on the Semantic Embedding Principle
 - Causal Attribution Analysis for Continuous Outcomes
 - Causal Discovery from Conditionally Stationary Time Series
 - Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants
 - Causal Invariance-aware Augmentation for Brain Graph Contrastive Learning
 - Causality-Aware Contrastive Learning for Robust Multivariate Time-Series Anomaly Detection
 - Causality Inspired Federated Learning for OOD Generalization
 - Causal Logistic Bandits with Counterfactual Fairness Constraints
 - Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel
 - Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
 - CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition
 - CellFlux: Simulating Cellular Morphology Changes via Flow Matching
 - Censor Dependent Variational Inference
 - CERTAIN: Context Uncertainty-aware One-Shot Adaptation for Context-based Offline Meta Reinforcement Learning
 - Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts
 - Certification for Differentially Private Prediction in Gradient-Based Training
 - Certified Unlearning for Neural Networks
 - CFP-Gen: Combinatorial Functional Protein Generation via Diffusion Language Models
 - CFPT: Empowering Time Series Forecasting through Cross-Frequency Interaction and Periodic-Aware Timestamp Modeling
 - Chameleon: A Flexible Data-mixing Framework for Language Model Pretraining and Finetuning
 - Channel Normalization for Time Series Channel Identification
 - Chaos Meets Attention: Transformers for Large-Scale Dynamical Prediction
 - CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation
 - Chip Placement with Diffusion Models
 - Circumventing Backdoor Space via Weight Symmetry
 - CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries
 - Clients Collaborate: Flexible Differentially Private Federated Learning with Guaranteed Improvement of Utility-Privacy Trade-off
 - CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models
 - Clipped SGD Algorithms for Performative Prediction: Tight Bounds for Stochastic Bias and Remedies
 - Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed
 - Clone-Robust AI Alignment
 - Closed-form Solutions: A New Perspective on Solving Differential Equations
 - Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling
 - Closing the Loop: Machine Learning for Optimization and Discovery
 - CLOVER: Cross-Layer Orthogonal Vectors Pruning
 - Clustering Items through Bandit Feedback: Finding the Right Feature out of Many
 - Clustering Properties of Self-Supervised Learning
 - Clustering via Self-Supervised Diffusion
 - CMoS: Rethinking Time Series Prediction Through the Lens of Chunk-wise Spatial Correlations
 - CoastalBench: A Decade-Long High-Resolution Dataset to Emulate Complex Coastal Processes
 - CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization
 - Code-Generated Graph Representations Using Multiple LLM Agents for Material Properties Prediction
 - CodeIO: Condensing Reasoning Patterns via Code Input-Output Prediction
 - CODEML: Championing Open-source DEvelopment in Machine Learning
 - CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
 - CodeSync: Synchronizing Large Language Models with Dynamic Code Evolution at Scale
 - CoDy: Counterfactual Explainers for Dynamic Graphs
 - COExpander: Adaptive Solution Expansion for Combinatorial Optimization
 - CogMath: Assessing LLMs' Authentic Mathematical Ability from a Human Cognitive Perspective
 - COGNATE: Acceleration of Sparse Tensor Programs on Emerging Hardware using Transfer Learning
 - CogReact: A Reinforced Framework to Model Human Cognitive Reaction Modulated by Dynamic Intervention
 - COKE: Core Kernel for More Efficient Approximation of Kernel Weights in Multiple Kernel Clustering
 - CollabLLM: From Passive Responders to Active Collaborators
 - Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution
 - Collapse or Thrive: Perils and Promises of Synthetic Data in a Self-Generating World
 - Collapse-Proof Non-Contrastive Self-Supervised Learning
 - CombiMOTS: Combinatorial Multi-Objective Tree Search for Dual-Target Molecule Generation
 - Combinatorial Reinforcement Learning with Preference Feedback
 - CoMemo: LVLMs Need Image Context with Image Memory
 - Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation
 - Communicating Activations Between Language Model Agents
 - Commute Graph Neural Networks
 - CommVQ: Commutative Vector Quantization for KV Cache Compression
 - Compact Matrix Quantum Group Equivariant Neural Networks
 - Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries
 - Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-Wolfe Method
 - Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training
 - Competing Bandits in Matching Markets via Super Stability
 - Competitively Consistent Clustering
 - Complete-Tree Space Favors Data-Efficient Link Prediction
 - Complex Wavelet Mutual Information Loss: A Multi-Scale Loss Function for Semantic Segmentation
 - Componential Prompt-Knowledge Alignment for Domain Incremental Learning
 - Compositional Causal Reasoning Evaluation in Language Models
 - Compositional Condition Question Answering in Tabular Understanding
 - Compositional Flows for 3D Molecule and Synthesis Pathway Co-design
 - Compositional Generalization via Forced Rendering of Disentangled Latents
 - Compositional Risk Minimization
 - Compositional Scene Understanding through Inverse Generative Modeling
 - Compressed and distributed least-squares regression: convergence rates with applications to federated learning
 - Compressed Image Generation with Denoising Diffusion Codebook Models
 - Compressing tree ensembles through Level-wise Optimization and Pruning
 - Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data
 - Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
 - Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders
 - Compute or Load KV Cache? Why Not Both?
 - Computing Optimal Transport Maps and Wasserstein Barycenters Using Conditional Normalizing Flows
 - Computing Voting Rules with Improvement Feedback
 - COMRECGC: Global Graph Counterfactual Explainer through Common Recourse
 - Concentration Distribution Learning from Label Distributions
 - ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
 - Concept-Based Unsupervised Domain Adaptation
 - Concept-Centric Token Interpretation for Vector-Quantized Generative Models
 - Concept Reachability in Diffusion Models: Beyond Dataset Constraints
 - Concurrent Reinforcement Learning with Aggregated States via Randomized Least Squares Value Iteration
 - Conditional Diffusion Model with Nonlinear Data Transformation for Time Series Forecasting
 - Conditioning Diffusions Using Malliavin Calculus
 - Confidence Difference Reflects Various Supervised Signals in Confidence-Difference Classification
 - Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention
 - Conformal Anomaly Detection in Event Sequences
 - Conformal Prediction as Bayesian Quadrature
 - Conformal Prediction with Cellwise Outliers: A Detect-then-Impute Approach
 - Conformal Tail Risk Control for Large Language Model Alignment
 - Conformity Score Averaging for Classification
 - Confounder-Free Continual Learning via Recursive Feature Normalization
 - ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization
 - Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret
 - Consensus Based Stochastic Optimal Control
 - Consensus Is All You Get: The Role of Attention in Transformers
 - Conservative Offline Goal-Conditioned Implicit V-Learning
 - Constant Stepsize Local GD for Logistic Regression: Acceleration by Instability
 - Constrain Alignment with Sparse Autoencoders
 - Constrained Belief Updates Explain Geometric Structures in Transformer Representations
 - Constrained Exploitability Descent: An Offline Reinforcement Learning Method for Finding Mixed-Strategy Nash Equilibrium
 - Constrained Online Convex Optimization with Polyak Feasibility Steps
 - Constrained Pareto Set Identification with Bandit Feedback
 - ConText: Driving In-context Learning for Text Removal and Segmentation
 - Context-Informed Neural ODEs Unexpectedly Identify Broken Symmetries: Insights from the Poincaré–Hopf Theorem
 - Context is Key: A Benchmark for Forecasting with Essential Textual Information
 - Context Matters: Query-aware Dynamic Long Sequence Modeling of Gigapixel Images
 - Contextual Bandits for Unbounded Context Distributions
 - Contextual Linear Bandits with Delay as Payoff
 - Contextual Online Decision Making with Infinite-Dimensional Functional Regression
 - Contextual Optimization Under Model Misspecification: A Tractable and Generalizable Approach
 - Contextures: Representations from Contexts
 - Continual Generalized Category Discovery: Learning and Forgetting from a Bayesian Perspective
 - Continual Reinforcement Learning by Planning with Online World Models
 - Continuous Bayesian Model Selection for Multivariate Causal Discovery
 - Continuously Updating Digital Twins using Large Language Models
 - Continuous Semi-Implicit Models
 - Continuous-Time Analysis of Heavy Ball Momentum in Min-Max Games
 - Continuous Visual Autoregressive Generation via Score Maximization
 - Contour Integration Underlies Human-Like Vision
 - Contract Design Under Approximate Best Responses
 - Contradiction Retrieval via Contrastive Learning with Sparsity
 - Contrastive Learning with Simplicial Convolutional Networks for Short-Text Classification
 - Contrastive Localized Language-Image Pre-Training
 - Contrastive Private Data Synthesis via Weighted Multi-PLM Fusion
 - Contrastive Visual Data Augmentation
 - Control and Realism: Best of Both Worlds in Layout-to-Image without Training
 - Controllable Data Generation with Hierarchical Neural Representations
 - Controlled Generation with Equivariant Variational Flow Matching
 - Controlling Large Language Model with Latent Action
 - Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning
 - Controlling Underestimation Bias in Constrained Reinforcement Learning for Safe Exploration
 - Convergence Analysis of Policy Gradient Methods with Dynamic Stochasticity
 - Convergence of Consistency Model with Multistep Sampling under General Data Assumptions
 - Convergence of Mean-Field Langevin Stochastic Descent-Ascent for Distributional Minimax Optimization
 - Convergence of Policy Mirror Descent Beyond Compatible Function Approximation
 - Convex Markov Games: A New Frontier for Multi-Agent Reinforcement Learning
 - Cooperation of Experts: Fusing Heterogeneous Information with Large Margin
 - Copilot Arena: A Platform for Code LLM Evaluation in the Wild
 - CoPINN: Cognitive Physics-Informed Neural Networks
 - Core Context Aware Transformers for Long Context Language Modeling
 - Core Knowledge Deficits in Multi-Modal Language Models
 - CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models
 - Correlated Errors in Large Language Models
 - Correlation Clustering Beyond the Pivot Algorithm
 - COSDA: Counterfactual-based Susceptibility Risk Framework for Open-Set Domain Adaptation
 - CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
 - Cost-efficient Collaboration between On-device and Cloud Language Models
 - CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering
 - Counterfactual Contrastive Learning with Normalizing Flows for Robust Treatment Effect Estimation
 - Counterfactual Effect Decomposition in Multi-Agent Sequential Decision Making
 - Counterfactual Graphical Models: Constraints and Inference
 - Counterfactual Voting Adjustment for Quality Assessment and Fairer Voting in Online Platforms with Helpfulness Evaluation
 - Counting atoms faster: policy-based nuclear magnetic resonance pulse sequencing for atomic abundance measurement
 - Counting in Small Transformers: The Delicate Interplay between Attention and Feed-Forward Layers
 - Covered Forest: Fine-grained generalization analysis of graph neural networks
 - Cover learning for large-scale topology representation
 - Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems
 - CPCF: A Cross-Prompt Contrastive Framework for Referring Multimodal Large Language Models
 - Cradle: Empowering Foundation Agents towards General Computer Control
 - Craftium: Bridging Flexibility and Efficiency for Rich 3D Single- and Multi-Agent Environments
 - CRANE: Reasoning with constrained LLM generation
 - Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability
 - Cross-City Latent Space Alignment for Consistency Region Embedding
 - Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination
 - Cross-Modal Alignment via Variational Copula Modelling
 - Cross-regularization: Adaptive Model Complexity through Validation Gradients
 - CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
 - CSG-ODE: ControlSynth Graph ODE For Modeling Complex Evolution of Dynamic Graphs
 - CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features
 - CSV-Occ: Fusing Multi-frame Alignment for Occupancy Prediction with Temporal Cross State Space Model and Central Voting Mechanism
 - CTBench: A Library and Benchmark for Certified Training
 - CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
 - CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty
 - Curriculum Learning for Biological Sequence Prediction: The Case of De Novo Peptide Sequencing
 - Curse of High Dimensionality Issue in Transformer for Long Context Modeling
 - CursorCore: Assist Programming through Aligning Anything
 - Curvature-aware Graph Attention for PDEs on Manifolds
 - Curvature Enhanced Data Augmentation for Regression
 - CurvGAD: Leveraging Curvature for Enhanced Graph Anomaly Detection
 - Customizing the Inductive Biases of Softmax Attention using Structured Matrices
 - Cut out and Replay: A Simple yet Versatile Strategy for Multi-Label Online Continual Learning
 - CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
 - DA-KD: Difficulty-Aware Knowledge Distillation for Efficient Large Language Models
 - DAMA: Data- and Model-aware Alignment of Multi-modal LLMs
 - DANCE: Dual Unbiased Expansion with Group-acquired Alignment for Out-of-distribution Graph Fairness Learning
 - DataDecide: How to Predict Best Pretraining Data with Small Experiments
 - Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects
 - Data-Driven Selection of Instrumental Variables for Additive Nonlinear, Constant Effects Models
 - Dataflow-Guided Neuro-Symbolic Language Models for Type Inference
 - Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development
 - Data Mixing Optimization for Supervised Fine-Tuning of Large Language Models
 - DataWorld: Unifying data curation frameworks across domains
 - David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training
 - DCBM: Data-Efficient Visual Concept Bottleneck Models
 - DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space
 - DEALing with Image Reconstruction: Deep Attentive Least Squares
 - De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks
 - Decision-aware Training of Spatiotemporal Forecasting Models to Select a Top-K Subset of Sites for Intervention
 - Decision Making under the Exponential Family: Distributionally Robust Optimisation with Bayesian Ambiguity Sets
 - Decision Mixer: Integrating Long-term and Local Dependencies via Dynamic Token Selection for Decision-Making
 - Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents
 - Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization
 - Decomposition of Graphic Design with Unified Multimodal Model
 - De-coupled NeuroGF for Shortest Path Distance Approximations on Large Terrain Graphs
 - Decoupled SGDA for Games with Intermittent Strategy Communication
 - Deep Bayesian Filter for Bayes-Faithful Data Assimilation
 - DeepCrossAttention: Supercharging Transformer Residual Connections
 - Deep Electromagnetic Structure Design Under Limited Evaluation Budgets
 - Deep Fuzzy Multi-view Learning for Reliable Classification
 - DeepLayout: Learning Neural Representations of Circuit Placement Layout
 - Deep Linear Network Training Dynamics from Random Initialization: Data, Width, Depth, and Hyperparameter Transfer
 - Deep Neural Cellular Potts Models
 - Deep Principal Support Vector Machines for Nonlinear Sufficient Dimension Reduction
 - Deep Reinforcement Learning from Hierarchical Preference Design
 - Deep Ridgelet Transform and Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines
 - Deep Streaming View Clustering
 - Deep Sturm–Liouville: From Sample-Based to 1D Regularization with Learnable Orthogonal Basis Functions
 - Deep Unsupervised Hashing via External Guidance
 - DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
 - Defending LVLMs Against Vision Attacks Through Partial-Perception Supervision
 - DeFoG: Discrete Flow Matching for Graph Generation
 - Delay-DSGN: A Dynamic Spiking Graph Neural Network with Delay Mechanisms for Evolving Graph
 - Deliberation in Latent Space via Differentiable Cache Augmentation
 - Delta Decompression for MoE-based LLMs Compression
 - De-mark: Watermark Removal in Large Language Models
 - Demeaned Sparse: Efficient Anomaly Detection by Residual Estimate
 - Demonstration Selection for In-Context Learning via Reinforcement Learning
 - Demystifying Catastrophic Forgetting in Two-Stage Incremental Object Detector
 - Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs
 - Demystifying Long Chain-of-Thought Reasoning
 - Demystifying Singular Defects in Large Language Models
 - Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
 - Dendritic Localized Learning: Toward Biologically Plausible Algorithm
 - Density Ratio Estimation-based Bayesian Optimization with Semi-Supervised Learning
 - Density Ratio Estimation with Conditional Probability Paths
 - Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization
 - Dequantified Diffusion-Schrödinger Bridge for Density Ratio Estimation
 - Design Considerations in Offline Preference-based RL
 - Designing Cyclic Peptides via Harmonic SDE with Atom-Bond Modeling
 - Detecting Strategic Deception with Linear Probes
 - Determinant Estimation under Memory Constraints and Neural Scaling Laws
 - Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective
 - Deterministic Sparse Fourier Transform for Continuous Signals with Frequency Gap
 - Devil is in the Details: Density Guidance for Detail-Aware Generation with Flow Models
 - DexScale: Automating Data Scaling for Sim2Real Generalizable Robot Control
 - D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples
 - Diagonal Symmetrization of Neural Network Solvers for the Many-Electron Schrödinger Equation
 - Dialogue Without Limits: Constant-Sized KV Caches for Extended Response in LLMs
 - DiffAdvMAP: Flexible Diffusion-Based Framework for Generating Natural Unrestricted Adversarial Examples
 - Differentiable Quadratic Optimization For the Maximum Independent Set Problem
 - Differentiable Solver Search for Fast Diffusion Sampling
 - Differentiable Structure Learning with Ancestral Constraints
 - Differential Coding for Training-Free ANN-to-SNN Conversion
 - Differentially Private Analysis for Binary Response Models: Optimality, Estimation, and Inference
 - Differentially Private Boxplots
 - Differentially Private Federated $k$-Means Clustering with Server-Side Data
 - Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model
 - Differential Privacy Guarantees of Markov Chain Monte Carlo Algorithms
 - Differential Privacy Under Class Imbalance: Methods and Empirical Insights
 - Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts
 - DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra
 - Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces
 - Diffusion Adversarial Post-Training for One-Step Video Generation
 - Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain
 - Diffusion Counterfactual Generation with Semantic Abduction
 - Diffusion Instruction Tuning
 - Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Auto Speculation
 - Diffusion models for Gaussian distributions: Exact solutions and Wasserstein errors
 - Diffusion on Language Model Encodings for Protein Sequence Generation
 - Diffusion Sampling Correction via Approximately 10 Parameters
 - DiffusionVLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
 - DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats)
 - DiLQR: Differentiable Iterative Linear Quadratic Regulator via Implicit Differentiation
 - DiMa: Understanding the Hardness of Online Matching Problems via Diffusion Models
 - DIME: Diffusion-Based Maximum Entropy Reinforcement Learning
 - Dimensionality Reduction on Complex Vector Spaces for Euclidean Distance with Dynamic Weights
 - Dimension-Free Adaptive Subgradient Methods with Frequent Directions
 - Dimension-Independent Rates for Structured Neural Density Estimation
 - DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
 - DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy
 - Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
 - Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
 - Directed Graph Grammars for Sequence-based Learning
 - Directly Forecasting Belief for Reinforcement Learning with Delays
 - Direct Motion Models for Assessing Generated Videos
 - Direct Prediction Set Minimization via Bilevel Conformal Classifier Training
 - DIS-CO: Discovering Copyrighted Content in VLMs Training Data
 - DISCO: learning to DISCover an evolution Operator for multi-physics-agnostic prediction
 - Discovering a Zero (Zero-Vector Class of Machine Learning)
 - Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning
 - Discovering Latent Causal Graphs from Spatiotemporal Data
 - Discovering Physics Laws of Dynamical Systems via Invariant Function Learning
 - Discovering Spoofing Attempts on Language Model Watermarks
 - Discovering Symbolic Cognitive Models from Human and Animal Behavior
 - Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension
 - Discrepancy Minimization in Input-Sparsity Time
 - Discrete and Continuous Difference of Submodular Minimization
 - Discrete Markov Probabilistic Models: An Improved Discrete Score-Based Framework with sharp convergence bounds under minimal assumptions
 - Discrete Neural Algorithmic Reasoning
 - Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data
 - Discriminative Policy Optimization for Token-Level Reward Models
 - Disentangled Graph Spectral Domain Adaptation
 - Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
 - Disentangling Invariant Subgraph via Variance Contrastive Estimation under Distribution Shifts
 - Disparate Conditional Prediction in Multiclass Classifiers
 - Dissecting Submission Limit in Desk-Rejections: A Mathematical Analysis of Fairness in AI Conference Policies
 - Diss-l-ECT: Dissecting Graph Data with Local Euler Characteristic Transforms
 - Distillation of Discrete Diffusion through Dimensional Correlations
 - Distillation Scaling Laws
 - Distilling the Knowledge in Data Pruning
 - DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
 - Distinguishing Cause from Effect with Causal Velocity Models
 - Distributed Conformal Prediction via Message Passing
 - Distributed Differentially Private Data Analytics via Secure Sketching
 - Distributed Event-Based Learning via ADMM
 - Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal
 - Distributed Parallel Gradient Stacking (DPGS): Solving Whole Slide Image Stacking Challenge in Multi-Instance Learning
 - Distributed Retraction-Free and Communication-Efficient Optimization on the Stiefel Manifold
 - Distributional Diffusion Models with Scoring Rules
 - Distributionally Robust Active Learning for Gaussian Process Regression
 - Distributionally Robust Multi-Agent Reinforcement Learning for Dynamic Chute Mapping
 - Distributionally Robust Policy Learning under Concept Drifts
 - Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective
 - DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation
 - Diverging Preferences: When do Annotators Disagree and do Models Know?
 - Diverse Prototypical Ensembles Improve Robustness to Subpopulation Shift
 - Diversified Flow Matching with Translation Identifiability
 - Diversifying Robot Locomotion Behaviors with Extrinsic Behavioral Curiosity
 - Diversity By Design: Leveraging Distribution Matching for Offline Model-Based Optimization
 - Divide and Conquer: Exploring Language-centric Tree Reasoning for Video Question-Answering
 - Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
 - Divide and Conquer: Learning Label Distribution with Subtasks
 - Diving into Self-Evolving Training for Multimodal Reasoning
 - DLP: Dynamic Layerwise Pruning in Large Language Models
 - DMM: Distributed Matrix Mechanism for Differentially-Private Federated Learning Based on Constant-Overhead Linear Secret Resharing
 - DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
 - Do Bayesian Neural Networks Actually Behave Like Bayesian Models?
 - DocKS-RAG: Optimizing Document-Level Relation Extraction through LLM-Enhanced Hybrid Prompt Tuning
 - DocVXQA: Context-Aware Visual Explanations for Document Question Answering
 - Does Data Scaling Lead to Visual Compositional Generalization?
 - Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion
 - Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis
 - Does learning the right latent variables necessarily improve in-context learning?
 - Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks?
 - Does One-shot Give the Best Shot? Mitigating Model Inconsistency in One-shot Federated Learning
 - DOLPHIN: A Programmable Framework for Scalable Neurosymbolic Learning
 - Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training
 - Domain-Adapted Diffusion Model for PROTAC Linker Design Through the Lens of Density Ratio in Chemical Space
 - Do Multiple Instance Learning Models Transfer?
 - Do Not Mimic My Voice: Speaker Identity Unlearning for Zero-Shot Text-to-Speech
 - Do NOT Think That Much for 2+3=? On the Overthinking of Long Reasoning Models
 - Don't Restart, Just Reuse: Reoptimizing MILPs with Dynamic Parameters
 - Double-Filter: Efficient Fine-tuning of Pre-trained Vision-Language Models via Patch&Layer Filtering
 - Double Machine Learning for Causal Inference under Shared-State Interference
 - Doubly Protected Estimation for Survival Outcomes Utilizing External Controls for Randomized Clinical Trials
 - Doubly Robust Conformalized Survival Analysis with Right-Censored Data
 - Doubly Robust Fusion of Many Treatments for Policy Learning
 - Do Vision-Language Models Really Understand Visual Language?
 - Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective
 - Do We Really Need Message Passing in Brain Network Modeling?
 - DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation
 - DP-fy your DATA: How to (and why) synthesize Differentially Private Synthetic Data
 - DPO Meets PPO: Reinforced Token Optimization for RLHF
 - DRAG: Data Reconstruction Attack using Guided Diffusion
 - DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model
 - DragSolver: A Multi-Scale Transformer for Real-World Automotive Drag Coefficient Estimation
 - DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization
 - DriveGPT: Scaling Autoregressive Behavior Models for Driving
 - Drug-TTA: Test-Time Adaptation for Drug Virtual Screening via Multi-task Meta-Auxiliary Learning
 - DSBRouter: End-to-end Global Routing via Diffusion Schrödinger Bridge
 - DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers
 - DS-VLM: Diffusion Supervision Vision Language Model
 - DTZO: Distributed Trilevel Zeroth Order Learning with Provable Non-Asymptotic Convergence
 - Dual Feature Reduction for the Sparse-group Lasso and its Adaptive Variant
 - Dueling Convex Optimization with General Preferences
 - DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
 - DVI: A Derivative-based Vision Network for INR
 - DyCodeEval: Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination
 - Dynamical Modeling of Behaviorally Relevant Spatiotemporal Patterns in Neural Imaging Data
 - Dynamical phases of short-term memory mechanisms in RNNs
 - Dynamic Mixture of Curriculum LoRA Experts for Continual Multimodal Instruction Tuning
 - Dynamic Similarity Graph Construction with Kernel Density Estimation
 - Dynamic Sparse Training of Diagonally Sparse Networks
 - DynaMind: Reasoning over Abstract Video Dynamics for Embodied Decision-Making
 - DyPolySeg: Taylor Series-Inspired Dynamic Polynomial Fitting Network for Few-shot Point Cloud Semantic Segmentation
 - EAGLES: Towards Effective, Efficient, and Economical Federated Graph Learning via Unified Sparsification
 - EARL-BO: Reinforcement Learning for Multi-Step Lookahead, High-Dimensional Bayesian Optimization
 - Earley-Driven Dynamic Pruning for Efficient Structured Decoding
 - EARTH: Epidemiology-Aware Neural ODE with Continuous Disease Transmission Graph
 - EasyInv: Toward Fast and Better DDIM Inversion
 - EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
 - EcoMapper: Generative Modeling for Climate-Aware Satellite Imagery
 - Edge-Colored Clustering in Hypergraphs: Beyond Minimizing Unsatisfied Edges
 - Editable Concept Bottleneck Models
 - Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation
 - EditLord: Learning Code Transformation Rules for Code Editing
 - EduLLM: Leveraging Large Language Models and Framelet-Based Signed Hypergraph Neural Networks for Student Performance Prediction
 - EEG-Language Pretraining for Highly Label-Efficient Clinical Phenotyping
 - EFDTR: Learnable Elliptical Fourier Descriptor Transformer for Instance Segmentation
 - Effective and Efficient Masked Image Generation Models
 - Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs
 - Efficient and Scalable Density Functional Theory Hamiltonian Prediction through Adaptive Sparsity
 - Efficient and Separate Authentication Image Steganography Network
 - Efficient ANN-SNN Conversion with Error Compensation Learning
 - Efficient Bisection Projection to Ensure Neural-Network Solution Feasibility for Optimization over General Set
 - Efficient Core-set Selection for Deep Learning Through Squared Loss Minimization
 - Efficient Curvature-Aware Hypergradient Approximation for Bilevel Optimization
 - Efficient Diffusion Models for Symmetric Manifolds
 - Efficient Distributed Optimization under Heavy-Tailed Noise
 - Efficient Federated Incomplete Multi-View Clustering
 - Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation
 - Efficient First-Order Optimization on the Pareto Set for Multi-Objective Learning under Preference Guidance
 - Efficient Generative Modeling with Residual Vector Quantization-Based Tokens
 - Efficient Graph Continual Learning via Lightweight Graph Neural Tangent Kernels-based Dataset Distillation
 - Efficient Heterogeneity-Aware Federated Active Data Selection
 - Efficient Length-Generalizable Attention via Causal Retrieval for Long-Context Language Modeling
 - Efficient LiDAR Reflectance Compression via Scanning Serialization
 - Efficient Logit-based Knowledge Distillation of Deep Spiking Neural Networks for Full-Range Timestep Deployment
 - Efficient Long Context Fine-tuning with Chunk Flow
 - Efficiently Access Diffusion Fisher: Within the Outer Product Span Space
 - Efficiently Serving Large Multimodal Models Using EPD Disaggregation
 - Efficiently Vectorized MCMC on Modern Accelerators
 - Efficient Molecular Conformer Generation with SO(3)-Averaged Flow Matching and Reflow
 - Efficient Motion Prompt Learning for Robust Visual Tracking
 - Efficient Multi-modal Long Context Learning for Training-free Adaptation
 - Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination
 - Efficient Network Automatic Relevance Determination
 - Efficient Noise Calculation in Deep Learning-based MRI Reconstructions
 - Efficient Online Reinforcement Learning for Diffusion Policy
 - Efficient Optimization with Orthogonality Constraint: a Randomized Riemannian Submanifold Method
 - Efficient Parallel Training Methods for Spiking Neural Networks with Constant Time Complexity
 - Efficient Personalized Adaptation for Physiological Signal Foundation Model
 - Efficient Quantification of Multimodal Interaction at Sample Level
 - Efficient Robotic Policy Learning via Latent Space Backward Planning
 - Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks
 - Efficient Skill Discovery via Regret-Aware Optimization
 - Efficient Source-free Unlearning via Energy-Guided Data Synthesis and Discrimination-Aware Multitask Optimization
 - Efficient Time Series Processing for Transformers and State-Space Models through Token Merging
 - EffiCoder: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning
 - e-GAI: e-value-based Generalized $\alpha$-Investing for Online False Discovery Rate Control
 - EgoPrivacy: What Your First-Person Camera Says About You?
 - EGPlace: An Efficient Macro Placement Method via Evolutionary Search with Greedy Repositioning Guided Mutation
 - Ehrenfeucht-Haussler Rank and Chain of Thought
 - Eigen Analysis of Conjugate Kernel and Neural Tangent Kernel
 - Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
 - E-LDA: Toward Interpretable LDA Topic Models with Strong Guarantees in Logarithmic Parallel Time
 - ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics
 - Eliciting Language Model Behaviors with Investigator Agents
 - ELITE: Enhanced Language-Image Toxicity Evaluation for Safety
 - ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces
 - ELoRA: Low-Rank Adaptation for Equivariant GNNs
 - Elucidating Flow Matching ODE Dynamics via Data Geometry and Denoisers
 - Elucidating the design space of language models for image generation
 - Elucidating the Design Space of Multimodal Protein Language Models
 - Embedding Safety into RL: A New Take on Trust Region Methods
 - EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
 - Emergence and Effectiveness of Task Vectors in In-Context Learning: An Encoder Decoder Perspective
 - Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
 - Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
 - Emergent Response Planning in LLMs
 - Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models
 - EmoGrowth: Incremental Multi-label Emotion Decoding with Augmented Emotional Relation Graph
 - Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
 - Emotional Face-to-Speech
 - Empirical Design in Reinforcement Learning
 - Empirical Privacy Variance
 - Empowering World Models with Reflection for Embodied Video Prediction
 - Empower Structure-Based Molecule Optimization with Gradient Guided Bayesian Flow Networks
 - Enabling Optimal Decisions in Rehearsal Learning under CARE Condition
 - ENAHPool: The Edge-Node Attention-based Hierarchical Pooling for Graph Neural Networks
 - EncryptedLLM: Privacy-Preserving Large Language Model Inference via GPU-Accelerated Fully Homomorphic Encryption
 - End-to-End Learning Framework for Solving Non-Markovian Optimal Control
 - Energy-Based Flow Matching for Generating 3D Molecular Structure
 - Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
 - Enforcing Idempotency in Neural Networks
 - Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation
 - Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability
 - Enhancing Certified Robustness via Block Reflector Orthogonal Layers and Logit Annealing Loss
 - Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration
 - Enhancing Decision-Making of Large Language Models via Actor-Critic
 - Enhancing Diversity In Parallel Agents: A Maximum State Entropy Exploration Story
 - Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
 - Enhancing Foundation Models with Federated Domain Knowledge Infusion
 - Enhancing Graph Contrastive Learning for Protein Graphs from Perspective of Invariance
 - Enhancing Graph Invariant Learning from a Negative Inference Perspective
 - Enhancing Ligand Validity and Affinity in Structure-Based Drug Design with Multi-Reward Optimization
 - Enhancing Logits Distillation with Plug&Play Kendall's $\tau$ Ranking Loss
 - Enhancing Parallelism in Decentralized Stochastic Convex Optimization
 - Enhancing Performance of Explainable AI Models with Constrained Concept Refinement
 - Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models
 - Enhancing Spectral GNNs: From Topology and Perturbation Perspectives
 - Enhancing Statistical Validity and Power in Hybrid Controlled Trials: A Randomization Inference Approach with Conformal Selective Borrowing
 - Enhancing Target-unspecific Tasks through a Features Matrix
 - Enhancing the Influence of Labels on Unlabeled Nodes in Graph Convolutional Networks
 - Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective
 - Enhancing Visual Localization with Cross-Domain Image Generation
 - EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
 - Ensemble Distribution Distillation via Flow Matching
 - Ensemble Learned Bloom Filters: Two Oracles are Better than One
 - EnsLoss: Stochastic Calibrated Loss Ensembles for Preventing Overfitting in Classification
 - ENSUR: Equitable and Statistically Unbiased Recommendation
 - EPIC: Efficient Position-Independent Caching for Serving Large Language Models
 - EpiCoder: Encompassing Diversity and Complexity in Code Generation
 - Epsilon-VAE: Denoising as Visual Decoding
 - Equivalence is All: A Unified View for Self-supervised Graph Learning
 - EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations
 - Equivariant Neural Tangent Kernels
 - Equivariant Polynomial Functional Networks
 - EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
 - EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers
 - Ergodic Generative Flows
 - ERICT: Enhancing Robustness by Identifying Concept Tokens in Zero-Shot Vision Language Models
 - Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems
 - ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models
 - ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans
 - ETTA: Elucidating the Design Space of Text-to-Audio Models
 - Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
 - Evaluating LLMs Across Multi-Cognitive Levels: From Medical Knowledge Mastery to Scenario-Based Problem Solving
 - Evaluating Neuron Explanations: A Unified Framework with Sanity Checks
 - Event-Customized Image Generation
 - Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
 - EvFocus: Learning to Reconstruct Sharp Images from Out-of-Focus Event Streams
 - EvoControl: Multi-Frequency Bi-Level Control for High-Frequency Continuous Control
 - EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration
 - Evolving Minds: Logic-Informed Inference from Temporal Action Patterns
 - Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
 - EvoMesh: Adaptive Physical Simulation with Hierarchical Graph Evolutions
 - EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
 - Exactly Tight Information-theoretic Generalization Bounds via Binary Jensen-Shannon Divergence
 - Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements
 - Exact risk curves of signSGD in High-Dimensions: quantifying preconditioning and noise-compression effects
 - Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs
 - ExLM: Rethinking the Impact of $\texttt{[MASK]}$ Tokens in Masked Language Models
 - Exogenous Isomorphism for Counterfactual Identifiability
 - Expected Variational Inequalities
 - Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
 - Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks' Internal Representations
 - Explaining, Fast and Slow: Abstraction and Refinement of Provable Explanations
 - Explaining the role of Intrinsic Dimensionality in Adversarial Training
 - Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization
 - Explicit Discovery of Nonlinear Symmetries from Dynamic Data
 - Explicit Exploration for High-Welfare Equilibria in Game-Theoretic Multiagent Reinforcement Learning
 - Explicit Preference Optimization: No Need for an Implicit Reward Model
 - Exploiting Curvature in Online Convex Optimization with Delayed Feedback
 - Exploiting Presentative Feature Distributions for Parameter-Efficient Continual Learning of Large Language Models
 - Exploiting Similarity for Computation and Communication-Efficient Decentralized Optimization
 - ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts
 - Exploration in AI Today (EXAIT)
 - Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards
 - Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
 - Exploring Invariance in Images through One-way Wave Equations
 - Exploring Large Action Sets with Hyperspherical Embeddings using von Mises-Fisher Sampling
 - Exploring Representations and Interventions in Time Series Foundation Models
 - Exploring Vision Semantic Prompt for Efficient Point Cloud Understanding
 - Exponential Family Variational Flow Matching for Tabular Data Generation
 - ExpProof: Operationalizing Explanations for Confidential Models with ZKPs
 - Expressive Power of Graph Neural Networks for (Mixed-Integer) Quadratic Programs
 - Expressive Score-Based Priors for Distribution Matching with Geometry-Preserving Regularization
 - ExtPose: Robust and Coherent Pose Estimation by Extending ViTs
 - Extracting Rare Dependence Patterns via Adaptive Sample Reweighting
 - Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts
 - Extreme Value Policy Optimization for Safe Reinforcement Learning
 - Ex-VAD: Explainable Fine-grained Video Anomaly Detection Based on Visual-Language Models
 - FAB-PPI: Frequentist, Assisted by Bayes, Prediction-Powered Inference
 - FACTER: Fairness-Aware Conformal Thresholding and Prompt Engineering for Enabling Fair LLM-Based Recommender Systems
 - FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees
 - Fair Clustering via Alignment
 - FairICP: Encouraging Equalized Odds via Inverse Conditional Permutation
 - Fairness on Principal Stratum: A New Perspective on Counterfactual Fairness
 - Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective
 - FairPFN: A Tabular Foundation Model for Causal Fairness
 - Falcon: Fast Visuomotor Policies via Partial Denoising
 - False Coverage Proportion Control for Conformal Prediction
 - Falsification of Unconfoundedness by Testing Independence of Causal Mechanisms
 - Fast, Accurate Manifold Denoising by Tunneling Riemannian Optimization
 - Fast and Low-Cost Genomic Foundation Models via Outlier Removal
 - Fast and Provable Algorithms for Sparse PCA with Improved Sample Complexity
 - Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments
 - FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks
 - Faster and Stronger: When ANN-SNN Conversion Meets Parallel Spiking Calculation
 - Faster Approximation Algorithms for k-Center via Data Reduction
 - Faster Global Minimum Cut with Predictions
 - Faster Rates for Private Adversarial Bandits
 - Faster Stochastic Optimization with Arbitrary Delays via Adaptive Asynchronous Mini-Batching
 - Fast Estimation of Partial Dependence Functions using Trees
 - Fast Exact Unlearning for In-Context Learning Data for LLMs
 - Fast Incomplete Multi-view Clustering by Flexible Anchor Learning
 - Fast Inference with Kronecker-Sparse Matrices
 - Fast Large Language Model Collaborative Decoding via Speculation
 - Fast Min-$\epsilon$ Segmented Regression using Constant-Time Segment Merging
 - Fast Tensor Completion via Approximate Richardson Iteration
 - Fast Video Generation with Sliding Tile Attention
 - FDGen: A Fairness-Aware Graph Generation Model
 - Feasible Action Search for Bandit Linear Programs via Thompson Sampling
 - FEAT-KD: Learning Concise Representations for Single and Multi-Target Regression via TabNet Knowledge Distillation
 - FeatSharp: Your Vision Model Features, Sharper
 - Feature Importance Metrics in the Presence of Missing Data
 - Feature Learning beyond the Lazy-Rich Dichotomy: Insights from Representational Geometry
 - Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions
 - Feature-Mapping Topology Optimization with Neural Heaviside Signed Distance Functions
 - Feature out! Let Raw Image as Your Condition for Blind Face Restoration
 - Features are fate: a theory of transfer learning in high-dimensional regression
 - Feature Shift Localization Network
 - FedBEns: One-Shot Federated Learning based on Bayesian Ensemble
 - FedClean: A General Robust Label Noise Correction for Federated Learning
 - FedECADO: A Dynamical System Model of Federated Learning
 - Federated Causal Structure Learning with Non-identical Variable Sets
 - Federated Disentangled Tuning with Textual Prior Decoupling and Visual Dynamic Adaptation
 - Federated Generalised Variational Inference: A Robust Probabilistic Federated Learning Framework
 - Federated Incomplete Multi-view Clustering with Globally Fused Graph Guidance
 - Federated In-Context Learning: Iterative Refinement for Improved Answer Quality
 - Federated Learning for Feature Generalization with Convex Constraints
 - Federated Node-Level Clustering Network with Cross-Subgraph Link Mending
 - Federated Oriented Learning: A Practical One-Shot Personalized Federated Learning Framework
 - FedOne: Query-Efficient Federated Learning for Black-box Discrete Prompt Learning
 - FedPHA: Federated Prompt Learning for Heterogeneous Client Adaptation
 - FedSMU: Communication-Efficient and Generalization-Enhanced Federated Learning through Symbolic Model Updates
 - FedSSI: Rehearsal-Free Continual Federated Learning with Synergistic Synaptic Intelligence
 - Feedforward Few-shot Species Range Estimation
 - Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models
 - Few-Shot Learner Generalizes Across AI-Generated Image Detection
 - Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts
 - FG-CLIP: Fine-Grained Visual and Textual Alignment
 - FicGCN: Unveiling the Homomorphic Encryption Efficiency from Irregular Graph Convolutional Networks
 - FIC-TSC: Learning Time Series Classification with Fisher Information Constraint
 - Field Matching: an Electrostatic Paradigm to Generate and Transfer Data
 - Finding Wasserstein Ball Center: Efficient Algorithm and The Applications in Fairness
 - Fine-Grained Captioning of Long Videos through Scene Graph Consolidation
 - Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean Field Games
 - Finite-Time Analysis of Discrete-Time Stochastic Interpolants
 - Finite-Time Convergence Rates in Stochastic Stackelberg Games with Smooth Algorithmic Agents
 - Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methods for Decentralized Multi-Agent Reinforcement Learning
 - FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
 - Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator
 - FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain
 - Fixed-Confidence Multiple Change Point Identification under Bandit Feedback
 - Fixing the Double Penalty in Data-Driven Weather Forecasting Through a Modified Spherical Harmonic Loss Function
 - Fixing the Loose Brake: Exponential-Tailed Stopping Time in Best Arm Identification
 - FLAM: Frame-Wise Language-Audio Modeling
 - FlashTP: Fused, Sparsity-Aware Tensor Product for Machine Learning Interatomic Potentials
 - Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape
 - FlatQuant: Flatness Matters for LLM Quantization
 - Fleet of Agents: Coordinated Problem Solving with Large Language Models
 - Flex3D: Feed-Forward 3D Generation with Flexible Reconstruction Model and Input View Curation
 - FlexControl: Computation-Aware Conditional Control with Differentiable Router for Text-to-Image Generation
 - Flexibility-conditioned protein structure design with flow matching
 - Flexible and Efficient Grammar-Constrained Decoding
 - Flexible, Efficient, and Stable Adversarial Attacks on Machine Unlearning
 - Flexible Tails for Normalizing Flows
 - FlexiClip: Locality-Preserving Free-Form Character Animation
 - FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification
 - FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
 - FlipAttack: Jailbreak LLMs via Flipping
 - Floating-Point Neural Networks Can Represent Almost All Floating-Point Functions
 - FloE: On-the-Fly MoE Inference on Memory-constrained GPU
 - Flopping for FLOPs: Leveraging Equivariance for Computational Efficiency
 - FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching
 - Flow-based Domain Randomization for Learning and Sequencing Robotic Skills
 - FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields
 - Flow-field inference from neural data using deep recurrent networks
 - Flowing Datasets with Wasserstein over Wasserstein Gradient Flows
 - Flowing Through Continuous-Time Generative Models: A Clear and Systematic Tour
 - Flow Matching for Denoised Social Recommendation
 - Flow Matching for Few-Trial Neural Adaptation with Stable Latent Dynamics
 - Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options
 - Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
 - Flow Q-Learning
 - Fluctuations of the largest eigenvalues of transformed spiked Wigner matrices
 - Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification
 - FOCoOp: Enhancing Out-of-Distribution Robustness in Federated Prompt Learning for Vision-Language Models
 - Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
 - Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
 - Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection
 - Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages
 - FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making
 - FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining
 - Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
 - Fragments to Facts: Partial-Information Fragment Inference from LLMs
 - FrameBridge: Improving Image-to-Video Generation with Bridge Models
 - Fraud-Proof Revenue Division on Subscription Platforms
 - FreeMesh: Boosting Mesh Generation with Coordinates Merging
 - Free Process Rewards without Process Labels
 - Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
 - From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models
 - From Complex to Atomic: Enhancing Augmented Generation via Knowledge-Aware Dual Rewriting and Reasoning
 - From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline
 - From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium
 - From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models
 - From Individual Experience to Collective Evidence: A Reporting-Based Framework for Identifying Systemic Harms
 - From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
 - From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning
 - From Language Models over Tokens to Language Models over Characters
 - From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection
 - From Logits to Hierarchies: Hierarchical Clustering made Simple
 - From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
 - From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models
 - From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?
 - From Pixels to Perception: Interpretable Predictions via Instance-wise Grouped Feature Selection
 - From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
 - From Spectrum-free towards Baseline-view-free: Double-track Proximity Driven Multi-view Clustering
 - From Theory to Practice: Rethinking Green and Martin Kernels for Unleashing Graph Transformers
 - From Thousands to Billions: 3D Visual Language Grounding via Render-Supervised Distillation from 2D VLMs
 - From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining
 - From Uncertain to Safe: Conformal Adaptation of Diffusion Models for Safe PDE Control
 - From Weight-Based to State-Based Fine-Tuning: Further Memory Reduction on LoRA with Parallel Control
 - FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
 - FSL-SAGE: Accelerating Federated Split Learning via Smashed Activation Gradient Estimation
 - FSTLLM: Spatio-Temporal LLM for Few Shot Time Series Forecasting
 - Fully Dynamic Embedding into $\ell_p$ Spaces
 - Fully Dynamic Euclidean Bi-Chromatic Matching in Sublinear Update Time
 - Fully Heteroscedastic Count Regression with Deep Double Poisson Networks
 - FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch
 - Functional Alignment Can Mislead: Examining Model Stitching
 - Function Encoders: A Principled Approach to Transfer Learning in Hilbert Spaces
 - Function-Space Learning Rates
 - Function-to-Style Guidance of LLMs for Code Translation
 - Fundamental Bias in Inverting Random Sampling Matrices with Application to Sub-sampled Newton
 - Fundamental limits of learning in sequence multi-index models and deep attention networks: high-dimensional asymptotics and sharp thresholds
 - Fundamental Limits of Visual Autoregressive Transformers: Universal Approximation Abilities
 - FuseUNet: A Multi-Scale Feature Fusion Method for U-like Networks
 - Fusing Reward and Dueling Feedback in Stochastic Bandits
 - G-Adaptivity: optimised graph-based mesh relocation for finite element methods
 - Galileo: Learning Global & Local Features of Many Remote Sensing Modalities
 - Game-theoretic Statistics and Sequential Anytime-Valid Inference
 - Gamma Distribution PCA-Enhanced Feature Learning for Angle-Robust SAR Target Recognition
 - Gandalf the Red: Adaptive Security for LLMs
 - GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models
 - Gap-Dependent Bounds for Federated $Q$-Learning
 - GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model
 - Gaussian Mixture Flow Matching Models
 - GaussMark: A Practical Approach for Structural Watermarking of Language Models
 - GaussMarker: Robust Dual-Domain Watermark for Diffusion Models
 - GCAL: Adapting Graph Models to Evolving Domain Shifts
 - G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks
 - GEFA: A General Feature Attribution Framework Using Proxy Gradient Estimation
 - General agents need world models
 - General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization
 - Generalists vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks
 - Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning
 - Generalization Analysis for Controllable Learning
 - Generalization Analysis for Supervised Contrastive Representation Learning under Non-IID Settings
 - Generalization and Robustness of the Tilted Empirical Risk
 - Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks
 - Generalization in Federated Learning: A Conditional Mutual Information Framework
 - Generalization of noisy SGD in unbounded non-convex settings
 - Generalization Performance of Ensemble Clustering: From Theory to Algorithm
 - Generalization Principles for Inference over Text-Attributed Graphs with Large Language Models
 - Generalized additive models via direct optimization of regularized decision stump forests
 - Generalized Category Discovery via Reciprocal Learning and Class-Wise Distribution Regularization
 - Generalized Interpolating Discrete Diffusion
 - Generalized Random Forests Using Fixed-Point Trees
 - Generalized Smooth Bilevel Optimization with Nonconvex Lower-Level
 - Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction
 - Generalizing Causal Effects from Randomized Controlled Trials to Target Populations across Diverse Environments
 - Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
 - Generating Hypotheses of Dynamic Causal Graphs in Neuroscience: Leveraging Generative Factor Models of Observed Time Series
 - Generation from Noisy Examples
 - Generative AI Meets Reinforcement Learning
 - Generative AI's Collision with Copyright Law
 - Generative Audio Language Modeling with Continuous-valued Tokens and Masked Next-Token Prediction
 - Generative Data Mining with Longtail-Guided Diffusion
 - Generative Human Trajectory Recovery via Embedding-Space Conditional Diffusion
 - Generative Intervention Models for Causal Perturbation Modeling
 - Generative Modeling Reinvents Supervised Learning: Label Repurposing with Predictive Consistency Learning
 - Generative Point Cloud Registration
 - Generative Social Choice: The Next Generation
 - GenMol: A Drug Discovery Generalist with Discrete Diffusion
 - GenZSL: Generative Zero-Shot Learning Via Inductive Variational Autoencoder
 - Geometric Algebra Planes: Convex Implicit Neural Volumes
 - Geometric and Physical Constraints Synergistically Enhance Neural PDE Surrogates
 - Geometric Contact Flows: Contactomorphisms for Dynamics and Control
 - Geometric Feature Embedding for Effective 3D Few-Shot Class Incremental Learning
 - Geometric Generative Modeling with Noise-Conditioned Graph Networks
 - Geometric Hyena Networks for Large-scale Equivariant Learning
 - Geometric Median (GM) Matching for Robust k-Subset Selection from Noisy Data
 - Geometric Representation Condition Improves Equivariant Molecule Generation
 - Geometric Resampling in Nearly Linear Time for Follow-the-Perturbed-Leader with Best-of-Both-Worlds Guarantee in Bandit Problems
 - Geometry-Informed Neural Networks
 - Geometry Informed Tokenization of Molecules for Language Model Generation
 - GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing
 - GHOST: Generalizable One-Shot Federated Graph Learning with Proxy-Based Topology Knowledge Retention
 - GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation
 - GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras
 - Global Context-aware Representation Learning for Spatially Resolved Transcriptomics
 - Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $\mu$ Parametrization
 - Global curvature for second-order optimization of neural networks
 - Global-Local Dirichlet Processes for Clustering Grouped Data in the Presence of Group-Specific Idiosyncratic Variables
 - Global Optimization with a Power-Transformed Objective and Gaussian Smoothing
 - GMAIL: Generative Modality Alignment for generated Image Learning
 - Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning
 - Goal-Space Planning with Subgoal Models
 - Going Deeper into Locally Differentially Private Graph Neural Networks
 - GoIRL: Graph-Oriented Inverse Reinforcement Learning for Multimodal Trajectory Prediction
 - GPEN: Global Position Encoding Network for Enhanced Subgraph Representation Learning
 - GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration
 - GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning
 - Gradient Aligned Regression via Pairwise Losses
 - Gradient-based Explanations for Deep Learning Survival Models
 - Gradient Boosting Reinforcement Learning
 - Gradient Descent Converges Arbitrarily Fast for Logistic Regression via Large and Adaptive Stepsizes
 - Gradient Flow Provably Learns Robust Classifiers for Orthonormal GMMs
 - Gradient Inversion of Multimodal Models
 - GradPS: Resolving Futile Neurons in Parameter Sharing Network for Multi-Agent Reinforcement Learning
 - Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning
 - GRAIL: Graph Edit Distance and Node Alignment using LLM-Generated Code
 - GRAM: A Generative Foundation Reward Model for Reward Generalization
 - Grammar-Forced Translation of Natural Language to Temporal Logic using LLMs
 - Graph4MM: Weaving Multimodal Learning with Structural Information
 - Graph Adaptive Autoregressive Moving Average Models
 - Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning
 - Graph Attention is Not Always Beneficial: A Theoretical Analysis of Graph Attention Mechanisms via Contextual Stochastic Block Models
 - Graph-Based Algorithms for Diverse Similarity Search
 - GraphCL: Graph-based Clustering for Semi-Supervised Medical Image Segmentation
 - Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models
 - Graph Diffusion for Robust Multi-Agent Coordination
 - Graph Generative Pre-trained Transformer
 - GraphGPT: Generative Pre-trained Graph Eulerian Transformer
 - Graph Inverse Style Transfer for Counterfactual Explainability
 - Graph Minimum Factor Distance and Its Application to Large-Scale Graph Data Clustering
 - Graph Neural Network Generalization With Gaussian Mixture Model Based Augmentation
 - Graph-Supported Dynamic Algorithm Configuration for Multi-Objective Combinatorial Optimization
 - Graph World Model
 - Gravity-Bench-v1: A Benchmark on Gravitational Physics Discovery for Agents
 - Great Models Think Alike and this Undermines AI Oversight
 - Gridded Transformer Neural Processes for Spatio-Temporal Data
 - Griffin: Towards a Graph-Centric Relational Database Foundation Model
 - GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers
 - Grokking at the Edge of Linear Separability
 - Grokking Beyond the Euclidean Norm of Model Parameters
 - Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
 - GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs
 - GS-Bias: Global-Spatial Bias Learner for Single-Image Test-Time Adaptation of Vision-Language Models
 - G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration
 - GSM-$\infty$: How Do your LLMs Behave over Infinitely Increasing Reasoning Complexity and Context Length?
 - GTR: A General, Multi-View, and Dynamic Framework for Trajectory Representation Learning
 - Guarantees of a Preconditioned Subgradient Algorithm for Overparameterized Asymmetric Low-rank Matrix Recovery
 - GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning
 - Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics
 - GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
 - Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents
 - Guided Structural Inference: Leveraging Priors with Soft Gating Mechanisms
 - Guided Zeroth-Order Methods for Stochastic Non-convex Problems with Decision-Dependent Distributions
 - Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
 - Habitizing Diffusion Planning for Efficient and Effective Decision Making
 - HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training
 - Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin
 - HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
 - Hardware and Software Platform Inference
 - HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration
 - Harmonizing Geometry and Uncertainty: Diffusion with Hyperspheres
 - Harnessing Heterogeneous Statistical Strength for Personalized Federated Learning via Hierarchical Bayesian Inference
 - Harnessing Low Dimensionality in Diffusion Models: From Theory to Practice
 - HashAttention: Semantic Sparsity for Faster Inference
 - Haste Makes Waste: A Simple Approach for Scaling Graph Neural Networks
 - Heads up! Large Language Models Can Perform Tasks Without Your Instruction via Selective Attention Head Masking
 - HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
 - HEAP: Hyper Extended A-PDHG Operator for Constrained High-dim PDEs
 - Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update
 - Hessian Geometry of Latent Space in Generative Models
 - Heterogeneous Data Game: Characterizing the Model Competition Across Multiple Data Sources
 - Heterogeneous Label Shift: Theory and Algorithm
 - Heterogeneous Sufficient Dimension Reduction and Subspace Clustering
 - Heterogeneous Treatment Effect in Time-to-Event Outcomes: Harnessing Censored Data with Recursively Imputed Trees
 - HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion
 - Hgformer: Hyperbolic Graph Transformer for Collaborative Filtering
 - HGOT: Self-supervised Heterogeneous Graph Neural Network with Optimal Transport
 - Hidden No More: Attacking and Defending Private Third-Party LLM Inference
 - Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It
 - Hierarchical Equivariant Policy via Frame Transfer
 - Hierarchical Graph Tokenization for Molecule-Language Alignment
 - Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots
 - Hierarchical Overlapping Clustering on Graphs: Cost Function, Algorithm and Scalability
 - Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification
 - Hierarchical Refinement: Optimal Transport to Infinity and Beyond
 - Hierarchical Reinforcement Learning with Targeted Causal Interventions
 - Hierarchical Reinforcement Learning with Uncertainty-Guided Diffusional Subgoals
 - High-Dimensional Prediction for Sequential Decision Making
 - High-Dimensional Tensor Regression With Oracle Properties
 - High Dynamic Range Novel View Synthesis with Single Exposure
 - High-Fidelity Simultaneous Speech-To-Speech Translation
 - Highly Compressed Tokenizer Can Generate Without Training
 - High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions
 - Hi-Patch: Hierarchical Patch GNN for Irregular Multivariate Time Series
 - HiRemate: Hierarchical Approach for Efficient Re-materialization of Neural Networks
 - Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
 - History-Guided Video Diffusion
 - Holistic Physics Solver: Learning PDEs in a Unified Spectral-Physical Space
 - Homophily Enhanced Graph Domain Adaptation
 - (How) Can Transformers Predict Pseudo-Random Numbers?
 - How Compositional Generalization and Creativity Improve as Diffusion Models are Trained
 - How Contaminated Is Your Benchmark? Measuring Dataset Leakage in Large Language Models with Kernel Divergence
 - How Distributed Collaboration Influences the Diffusion Model Training? A Theoretical Perspective
 - How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction
 - How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation
 - (How) Do Language Models Track State?
 - How Do Large Language Monkeys Get Their Power (Laws)?
 - How Do Transformers Learn Variable Binding in Symbolic Programs?
 - How Effective Can Dropout Be in Multiple Instance Learning?
 - How Expressive are Knowledge Graph Foundation Models?
 - How Far Is Video Generation from World Model: A Physical Law Perspective
 - How Much Can Transfer? BRIDGE: Bounded Multi-Domain Graph Foundation Model with Generalization Guarantees
 - How Much Can We Forget about Data Contamination?
 - How to Evaluate and Mitigate IP Infringement in Visual Generative AI?
 - How to Move Your Dragon: Text-to-Motion Synthesis for Large-Vocabulary Objects
 - How to set AdamW's weight decay as you scale model and dataset size
 - How to Synthesize Text Data without Model Collapse?
 - How to Train Your Multi-Exit Model? Analyzing the Impact of Training Strategies
 - How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
 - How Transformers Learn Structured Data: Insights From Hierarchical Filtering
 - HPS: Hard Preference Sampling for Human Preference Alignment
 - H-Tuning: Toward Low-Cost and Efficient ECG-based Cardiovascular Disease Detection with Pre-Trained Models
 - Human-Aligned Image Models Improve Visual Decoding from the Brain
 - Human Body Restoration with One-Step Diffusion Model and A New Benchmark
 - Human Cognition-Inspired Hierarchical Fuzzy Learning Machine
 - Hybrid Batch Normalisation: Resolving the Dilemma of Batch Normalisation in Federated Learning
 - HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder
 - Hybrid Quantum-Classical Multi-Agent Pathfinding
 - Hybrid Spiking Vision Transformer for Object Detection with Event Cameras
 - HYGMA: Hypergraph Coordination Networks with Dynamic Grouping for Multi-Agent Reinforcement Learning
 - Hyperband-based Bayesian Optimization for Black-box Prompt Selection
 - Hyperbolic-PDE GNN: Spectral Graph Neural Networks in the Perspective of A System of Hyperbolic Partial Differential Equations
 - Hyper: Hyperparameter Robust Efficient Exploration in Reinforcement Learning
 - HyperIMTS: Hypergraph Neural Network for Irregular Multivariate Time Series Forecasting
 - HyperIV: Real-time Implied Volatility Smoothing
 - HyperNear: Unnoticeable Node Injection Attacks on Hypergraph Neural Networks
 - Hyperspherical Normalization for Scalable Deep Reinforcement Learning
 - Hyper-Transforming Latent Diffusion Models
 - HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
 - Hypo3D: Exploring Hypothetical Reasoning in 3D
 - Hypothesis Testing for Generalized Thurstone Models
 - IBCircuit: Towards Holistic Circuit Discovery with Information Bottleneck
 - ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks
 - ICML 2025 Workshop on Collaborative and Federated Agentic Workflows (CFAgentic @ ICML'25)
 - ICML 2025 Workshop on Computational Optimization of Buildings (CO-BUILD)
 - Identifiable Object Representations under Spatial Ambiguities
 - Identification of Latent Confounders via Investigating the Tensor Ranks of the Nonlinear Observations
 - Identifying and Understanding Cross-Class Features in Adversarial Training
 - Identifying biological perturbation targets through causal differential networks
 - Identifying Causal Direction via Variational Bayesian Compression
 - Identifying Metric Structures of Deep Latent Variable Models
 - Identifying Neural Dynamics Using Interventional State Space Models
 - Idiosyncrasies in Large Language Models
 - iDPA: Instance Decoupled Prompt Attention for Incremental Medical Object Detection
 - IL-SOAR: Imitation Learning with Soft Optimistic Actor cRitic
 - Imagine While Reasoning in Space: Multimodal Visualization-of-Thought
 - Imitation Learning from a Single Temporally Misaligned Video
 - IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
 - Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks
 - Implicit degree bias in the link prediction task
 - Implicit Language Models are RNNs: Balancing Parallelization and Expressivity
 - Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent
 - Implicit Riemannian Optimism with Applications to Min-Max Problems
 - Implicit Subgraph Neural Network
 - Importance Corrected Neural JKO Sampling
 - Importance Sampling for Nonlinear Models
 - Impossible Videos
 - Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation
 - Improved and Oracle-Efficient Online $\ell_1$-Multicalibration
 - Improved Approximations for Hard Graph Problems using Predictions
 - Improved Coresets for Vertical Federated Learning: Regularized Linear and Logistic Regressions
 - Improved Discretization Complexity Analysis of Consistency Models: Variance Exploding Forward Process and Decay Discretization Scheme
 - Improved Expressivity of Hypergraph Neural Networks through High-Dimensional Generalized Weisfeiler-Leman Algorithms
 - Improved Last-Iterate Convergence of Shuffling Gradient Methods for Nonsmooth Convex Optimization
 - Improved Learning via k-DTW: A Novel Dissimilarity Measure for Curves
 - Improved Lower Bounds for First-order Stochastic Non-convex Optimization under Markov Sampling
 - Improved Off-policy Reinforcement Learning in Biological Sequence Design
 - Improved Online Confidence Bounds for Multinomial Logistic Bandits
 - Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance
 - Improved Sample Complexity for Private Nonsmooth Nonconvex Optimization
 - Improved Theoretically-Grounded Evolutionary Algorithms for Subset Selection with a Linear Cost Constraint
 - Improving Compositional Generation with Diffusion Models Using Lift Scores
 - Improving Consistency Models with Generator-Augmented Flows
 - Improving Continual Learning Performance and Efficiency with Auxiliary Classifiers
 - Improving Diversity in Language Models: When Temperature Fails, Change the Loss
 - Improving Flow Matching by Aligning Flow Divergence
 - Improving Generalization in Federated Learning with Highly Heterogeneous Data via Momentum-Based Stochastic Controlled Weight Averaging
 - Improving Generalization with Flat Hilbert Bayesian Inference
 - Improving LLM Safety Alignment with Dual-Objective Optimization
 - Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens
 - Improving LLM Video Understanding with 16 Frames Per Second
 - Improving Memory Efficiency for Training KANs via Meta Learning
 - Improving Model Alignment Through Collective Intelligence of Open-Source Models
 - Improving Multi-Class Calibration through Normalization-Aware Isotonic Techniques
 - Improving Multimodal Learning Balance and Sufficiency through Data Remixing
 - Improving Out-of-Distribution Detection via Dynamic Covariance Calibration
 - Improving Out-of-Distribution Detection with Markov Logic Networks
 - Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces
 - Improving Rationality in the Reasoning Process of Language Models through Self-playing Game
 - Improving Reward Model Generalization from Adversarial Process Enhanced Preferences
 - Improving Soft Unification with Knowledge Graph Embedding Methods
 - Improving the Continuity of Goal-Achievement Ability via Policy Self-Regularization for Goal-Conditioned Reinforcement Learning
 - Improving the Diffusability of Autoencoders
 - Improving the Effective Receptive Field of Message-Passing Neural Networks
 - Improving the Scaling Laws of Synthetic Data with Deliberate Practice
 - Improving the Statistical Efficiency of Cross-Conformal Prediction
 - Improving the Variance of Differentially Private Randomized Experiments through Clustering
 - Improving Transformer World Models for Data-Efficient RL
 - Improving Value Estimation Critically Enhances Vanilla Policy Gradient
 - Improving Your Model Ranking on Chatbot Arena by Vote Rigging
 - Improving Zero-Shot Adversarial Robustness in Vision-Language Models by Closed-form Alignment of Adversarial Path Simplices
 - IMTS is Worth Time $\times$ Channel Patches: Visual Masked Autoencoders for Irregular Multivariate Time Series Prediction
 - iN2V: Bringing Transductive Node Embeddings to Inductive Graphs
 - Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
 - In-Context Adaptation to Concept Drift for Learned Database Operations
 - In-Context Deep Learning via Transformer Models
 - In-Context Denoising with One-Layer Transformers: Connections between Attention and Associative Memory Retrieval
 - In-Context Fine-Tuning for Time-Series Foundation Models
 - In-Context Learning and Occam's Razor
 - In-Context Learning as Conditioned Associative Memory Retrieval
 - In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention
 - In-Context Reinforcement Learning From Suboptimal Historical Data
 - Incorporating Arbitrary Matrix Group Equivariance into KANs
 - Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems
 - Independence Tests for Language Models
 - Inducing, Detecting and Characterising Neural Modules: A Pipeline for Functional Interpretability in Reinforcement Learning
 - Inductive Gradient Adjustment for Spectral Bias in Implicit Neural Representations
 - Inductive Moment Matching
 - InfAlign: Inference-aware language model alignment
 - Inference-Time Alignment of Diffusion Models with Direct Noise Optimization
 - Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models
 - Info-Coevolution: An Efficient Framework for Data Model Coevolution
 - InfoCons: Identifying Interpretable Critical Concepts in Point Clouds via Information Theory
 - Information Bottleneck-guided MLPs for Robust Spatial-temporal Forecasting
 - InfoSAM: Fine-Tuning the Segment Anything Model from An Information-Theoretic Perspective
 - InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference
 - INRFlow: Flow Matching for INRs in Ambient Space
 - Instance Correlation Graph-based Naive Bayes
 - Instance-Optimal Pure Exploration for Linear Bandits on Continuous Arms
 - Instruct2See: Learning to Remove Any Obstructions Across Distributions
 - Instruction-Following Pruning for Large Language Models
 - Integer Programming for Generalized Causal Bootstrap Designs
 - Integrating Intermediate Layer Optimization and Projected Gradient Descent for Solving Inverse Problems with Diffusion Models
 - Integration-free Kernels for Equivariant Gaussian Process Modelling
 - Interaction-Aware Gaussian Weighting for Clustered Federated Learning
 - Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence
 - Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors
 - Interpolating Neural Network-Tensor Decomposition (INN-TD): a scalable and interpretable approach for large-scale physics-based problems
 - Interpreting CLIP with Hierarchical Sparse Autoencoders
 - Interpreting the Repeated Token Phenomenon in Large Language Models
 - Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces
 - IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models
 - Introducing 3D Representation for Dense Volume-to-Volume Translation via Score Fusion
 - Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning
 - Invariant Deep Uplift Modeling for Incentive Assignment in Online Marketing via Probability of Necessity and Sufficiency
 - Inverse Bridge Matching Distillation
 - Inverse Flow and Consistency Models
 - Inverse Optimization via Learning Feasible Regions
 - Inverse Problem Sampling in Latent Space Using Sequential Monte Carlo
 - Inverse problems with experiment-guided AlphaFold
 - Inverse Reinforcement Learning with Switching Rewards and History Dependency for Characterizing Animal Behaviors
 - Investigating Non-Transitivity in LLM-as-a-Judge
 - Investigating the Overlooked Hessian Structure: From CNNs to LLMs
 - IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models
 - Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
 - Is Complex Query Answering Really Complex?
 - Is Noise Conditioning Necessary for Denoising Generative Models?
 - Isolated Causal Effects of Natural Language
 - Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs
 - IT$^3$: Idempotent Test-Time Training
 - ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks
 - Iterative Vectors: In-Context Gradient Steering without Backpropagation
 - ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset
 - I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
 - It's My Data Too: Private ML for Datasets with Multi-User Training Examples
 - Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
 - Jailbreaking LLMs and Agentic Systems: Attacks, Defenses, and Evaluations
 - Janus: Dual-Server Multi-Round Secure Aggregation with Verifiability for Federated Learning
 - Joint Learning of Energy-based Models and their Partition Function
 - Joint Localization and Activation Editing for Low-Resource Fine-Tuning
 - Joint Metric Space Embedding by Unbalanced Optimal Transport with Gromov–Wasserstein Marginal Penalization
 - Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient
 - Joker: Joint Optimization Framework for Lightweight Kernel Machines
 - Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning
 - K$^2$IE: Kernel Method-based Kernel Intensity Estimators for Inhomogeneous Poisson Processes
 - KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems
 - KAN-AD: Time Series Anomaly Detection with Kolmogorov–Arnold Networks
 - Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional Coverage
 - KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search
 - KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies
 - Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models
 - KernelBench: Can LLMs Write Efficient GPU Kernels?
 - Kernel Quantile Embeddings and Associated Probability Metrics
 - KGMark: A Diffusion Watermark for Knowledge Graphs
 - KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors
 - KIND: Knowledge Integration and Diversion for Training Decomposable Models
 - Kinetic Langevin Diffusion for Crystalline Materials Generation
 - Knowledge-Guided Wasserstein Distributionally Robust Optimization
 - Knowledge Retention in Continual Model-Based Reinforcement Learning
 - Knowledge Swapping via Learning and Unlearning
 - Kona: An Efficient Privacy-Preservation Framework for KNN Classification by Communication Optimization
 - KoNODE: Koopman-Driven Neural Ordinary Differential Equations with Evolving Parameters for Time Series Analysis
 - KoopSTD: Reliable Similarity Analysis between Dynamical Systems via Approximating Koopman Spectrum with Timescale Decoupling
 - KV Shifting Attention Enhances Language Modeling
 - KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
 - L3A: Label-Augmented Analytic Adaptation for Multi-Label Class Incremental Learning
 - Label Distribution Propagation-based Label Completion for Crowdsourcing
 - LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
 - LADA: Scalable Label-Specific CLIP Adapter for Continual Learning
 - Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping
 - LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models
 - LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation
 - LangDAug: Langevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation
 - LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization
 - Language Models as Implicit Tree Search
 - Language Models May Verbatim Complete Text They Were Not Explicitly Trained On
 - Language Models over Canonical Byte-Pair Encodings
 - Laplace Transform Based Low-Complexity Learning of Continuous Markov Semigroups
 - LapSum - One Method to Differentiate Them All: Ranking, Sorting and Top-k Selection
 - LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs – No Silver Bullet for LC or RAG Routing
 - Large Continual Instruction Assistant
 - Large Displacement Motion Transfer with Unsupervised Anytime Interpolation
 - Large Language-Geometry Model: When LLM meets Equivariance
 - Large Language Model-driven Large Neighborhood Search for Large-Scale MILP Problems
 - Large Language Models are Demonstration Pre-Selectors for Themselves
 - Large Language Models to Diffusion Finetuning
 - Larger or Smaller Reward Margins to Select Preferences for LLM Alignment?
 - LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
 - La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation
 - LASER: Attention with Exponential Transformation
 - LAST SToP for Modeling Asynchronous Time Series
 - Latent Action Learning Requires Supervision in the Presence of Distractors
 - Latent Diffusion Planning for Imitation Learning
 - Latent Imputation before Prediction: A New Computational Paradigm for De Novo Peptide Sequencing
 - Latent Mamba Operator for Partial Differential Equations
 - Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
 - Latent Score-Based Reweighting for Robust Classification on Imbalanced Tabular Data
 - Latent Thought Models with Variational Bayes Inference-Time Computation
 - Latent Variable Causal Discovery under Selection Bias
 - Latent Variable Estimation in Bayesian Black-Litterman Models
 - LAuReL: Learned Augmented Residual Layer
 - Layer by Layer: Uncovering Hidden Representations in Language Models
 - Layer-wise Alignment: Examining Safety Alignment Across Image Encoder Layers in Vision Language Models
 - Layer-wise Quantization for Quantized Optimistic Dual Averaging
 - LBI-FL: Low-Bit Integerized Federated Learning with Temporally Dynamic Bit-Width Allocation
 - L-Diffusion: Laplace Diffusion for Efficient Pathology Image Segmentation
 - LDMol: A Text-to-Molecule Diffusion Model with Structurally Informative Latent Space Surpasses AR Models
 - Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees
 - LEAPS: A discrete neural sampler via locally equivariant networks
 - Learnable Spatial-Temporal Positional Encoding for Link Prediction
 - Learn Beneficial Noise as Graph Augmentation
 - Learn from Downstream and Be Yourself in Multimodal Large Language Models Fine-Tuning
 - Learngene Tells You How to Customize: Task-Aware Parameter Initialization at Flexible Scales
 - Learning Adaptive Lighting via Channel-Aware Guidance
 - Learning Adversarial MDPs with Stochastic Hard Constraints
 - Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning
 - Learning Attribute-Aware Hash Codes for Fine-Grained Image Retrieval via Query Optimization
 - Learning-Augmented Algorithms for MTS with Bandit Access to Multiple Predictors
 - Learning-Augmented Hierarchical Clustering
 - Learning Bayesian Nash Equilibrium in Auction Games via Approximate Best Response
 - Learning Cascade Ranking as One Network
 - Learning Changes in Graphon Attachment Network Models
 - Learning Classifiers That Induce Markets
 - Learning Compact Semantic Information for Incomplete Multi-View Missing Multi-Label Classification
 - Learning Condensed Graph via Differentiable Atom Mapping for Reaction Yield Prediction
 - Learning Configurations for Data-Driven Multi-Objective Optimization
 - Learning Curves of Stochastic Gradient Descent in Kernel Regression
 - Learning curves theory for hierarchically compositional data with power-law distributed features
 - Learning Distances from Data with Normalizing Flows and Score Matching
 - Learning Distribution-wise Control in Representation Space for Language Models
 - Learning Dynamics in Continual Pre-Training for Large Language Models
 - Learning dynamics in linear recurrent neural networks
 - Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures
 - Learning Efficient Robotic Garment Manipulation with Standardization
 - Learning Event Completeness for Weakly Supervised Video Anomaly Detection
 - Learning Extrapolative Sequence Transformations from Markov Chains
 - Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
 - Learning from others' mistakes: Finetuning machine translation models with span-level error annotations
 - Learning from Sample Stability for Deep Clustering
 - Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network
 - Learning from True-False Labels via Multi-modal Prompt Retrieving
 - Learning Fused State Representations for Control from Multi-View Observations
 - Learning Gaussian DAG Models without Condition Number Bounds
 - Learning Imbalanced Data with Beneficial Label Noise
 - Learning Imperfect Information Extensive-form Games with Last-iterate Convergence under Bandit Feedback
 - Learning In-context $n$-grams with Transformers: Sub-$n$-grams Are Near-Stationary Points
 - Learning Initial Basis Selection for Linear Programming via Duality-Inspired Tripartite Graph Representation and Comprehensive Supervision
 - Learning Input Encodings for Kernel-Optimal Implicit Neural Representations
 - Learning Invariant Causal Mechanism from Vision-Language Models
 - Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models
 - Learning Latent Graph Structures and their Uncertainty
 - Learning Likelihood-Free Reference Priors
 - Learning Mean Field Control on Sparse Graphs
 - Learning Minimum-Size BDDs: Towards Efficient Exact Algorithms
 - Learning Mixtures of Experts with EM: A Mirror Descent Perspective
 - Learning Monotonic Probabilities with a Generative Cost Model
 - Learning Multi-Level Features with Matryoshka Sparse Autoencoders
 - Learning multivariate Gaussians with imperfect advice
 - Learning Optimal Multimodal Information Bottleneck Representations
 - Learning-Order Autoregressive Models with Application to Molecular Graph Generation
 - Learning Parametric Distributions from Samples and Preferences
 - Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks
 - Learning Progress Driven Multi-Agent Curriculum
 - Learning Representations of Instruments for Partial Identification of Treatment Effects
 - Learning Robust Neural Processes with Risk-Averse Stochastic Optimization
 - Learning Safe Control via On-the-Fly Bandit Exploration
 - Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions
 - Learning Safety Constraints for Large Language Models
 - Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
 - Learning Single Index Models with Diffusion Priors
 - Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction
 - Learning Soft Sparse Shapes for Efficient Time-Series Classification
 - Learning State-Based Node Representations from a Class Hierarchy for Fine-Grained Open-Set Detection
 - Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization
 - Learning Survival Distributions with the Asymmetric Laplace Distribution
 - Learning the Electronic Hamiltonian of Large Atomic Structures
 - Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
 - Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains
 - Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes
 - Learning to Generate Projections for Reducing Dimensionality of Heterogeneous Linear Programming Problems
 - Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals
 - Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
 - Learning to (Learn at Test Time): RNNs with Expressive Hidden States
 - Learning to Match Unpaired Data with Minimum Entropy Coupling
 - Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
 - Learning to Quantize for Training Vector-Quantized Networks
 - Learning to Reuse Policies in State Evolvable Environments
 - Learning to Route LLMs with Confidence Tokens
 - Learning to Steer Learners in Games
 - Learning to Stop: Deep Learning for Mean Field Optimal Stopping
 - Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL
 - Learning Utilities from Demonstrations in Markov Decision Processes
 - Learning Vision and Language Concepts for Controllable Image Generation
 - Learning with Exact Invariances in Polynomial Time
 - Learning with Expected Signatures: Theory and Applications
 - Learning With Multi-Group Guarantees For Clusterable Subpopulations
 - Learning without Isolation: Pathway Protection for Continual Learning
 - Learning with Selectively Labeled Data from Multiple Decision-makers
 - Learn Singularly Perturbed Solutions via Homotopy Dynamics
 - Learn to Vaccinate: Combining Structure Learning and Effective Vaccination for Epidemic and Outbreak Control
 - Learnware Specification via Dual Alignment
 - Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams
 - LEMoN: Label Error Detection using Multimodal Neighbors
 - LensLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
 - Less is More: Federated Graph Learning with Alleviating Topology Heterogeneity from A Causal Perspective
 - Let LLM Tell What to Prune and How Much to Prune
 - LETS Forecast: Learning Embedology for Time Series Forecasting
 - Leveraging Diffusion Model as Pseudo-Anomalous Graph Generator for Graph-Level Anomaly Detection
 - Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models
 - Leveraging Offline Data in Linear Latent Contextual Bandits
 - Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
 - Leveraging Per-Instance Privacy for Machine Unlearning
 - Leveraging Predictive Equivalence in Decision Trees
 - Leveraging Randomness in Model and Data Partitioning for Privacy Amplification
 - Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
 - Leveraging Sparsity for Sample-Efficient Preference Learning: A Theoretical Perspective
 - LEVIS: Large Exact Verifiable Input Spaces for Neural Networks
 - Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries
 - LGDM: Latent Guidance in Diffusion Models for Perceptual Evaluations
 - LieRE: Lie Rotational Positional Encodings
 - LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
 - Liger: Linearizing Large Language Models to Gated Recurrent Structures
 - LightGTS: A Lightweight General Time Series Forecasting Model
 - LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
 - Lightspeed Geometric Dataset Distance via Sliced Optimal Transport
 - Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty
 - Lightweight-Mark: Rethinking Deep Learning-Based Watermarking
 - Lightweight Online Adaption for Time Series Foundation Model Forecasts
 - Lightweight Protocols for Distributed Private Quantile Estimation
 - LIMEFLDL: A Local Interpretable Model-Agnostic Explanations Approach for Label Distribution Learning
 - Limitations of measure-first protocols in quantum machine learning
 - Linear $Q$-Learning Does Not Diverge in $L^2$: Convergence Rates to a Bounded Set
 - Linear Bandits with Partially Observable Features
 - Linear Contextual Bandits With Interference
 - Linear convergence of Sinkhorn's algorithm for generalized static Schrödinger bridge
 - Linearization Turns Neural Operators into Function-Valued Gaussian Processes
 - Linear Mode Connectivity between Multiple Models modulo Permutation Symmetries
 - Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting
 - LineFlow: A Framework to Learn Active Control of Production Lines
 - LipsNet++: Unifying Filter and Controller into a Policy Network
 - LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces
 - LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models
 - LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification
 - LLM Alignment as Retriever Optimization: An Information Retrieval Perspective
 - LLM-Assisted Semantically Diverse Teammate Generation for Efficient Multi-agent Coordination
 - LLM-Augmented Chemical Synthesis and Design Decision Programs
 - LLM Data Selection and Utilization via Dynamic Bi-level Optimization
 - LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification
 - LLMScan: Causal Scan for LLM Misbehavior Detection
 - LLMs Can Reason Faster Only If We Let Them
 - LLMs can see and hear without any training
 - LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws
 - LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
 - LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations
 - LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
 - LOB-Bench: Benchmarking Generative AI for Finance - an Application to Limit Order Book Data
 - Local Identifying Causal Relations in the Presence of Latent Variables
 - Locality Preserving Markovian Transition for Instance Retrieval
 - Local Manifold Approximation and Projection for Manifold-Aware Diffusion Planning
 - Local Pan-privacy for Federated Analytics
 - LOCATE 3D: Real-World Object Localization via Self-Supervised Learning in 3D
 - Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing
 - Logarithmic Regret for Online KL-Regularized Reinforcement Learning
 - Logits are All We Need to Adapt Closed Models
 - LOGO --- Long cOntext aliGnment via efficient preference Optimization
 - Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning
 - Long-Form Speech Generation with Spoken Language Models
 - LongRoPE2: Near-Lossless LLM Context Window Scaling
 - Long-Short Alignment for Effective Long-Context Modeling in LLMs
 - Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model
 - LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
 - Looking Beyond the Top-1: Transformers Determine Top Tokens in Order
 - Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
 - LoRA-Gen: Specializing Large Language Model via Online LoRA Generation
 - LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently
 - LoRA Training Provably Converges to a Low-Rank Global Minimum Or It Fails Loudly (But it Probably Won't Fail)
 - Loss Functions and Operators Generated by f-Divergences
 - LotteryCodec: Searching the Implicit Representation in a Random Network for Low-Complexity Image Compression
 - Low-Dimension-to-High-Dimension Generalization and Its Implications for Length Generalization
 - Low-distortion and GPU-compatible Tree Embeddings in Hyperbolic Space
 - Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
 - LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
 - Low-Rank Adapting Models for Sparse Autoencoders
 - Low-Rank Tensor Transitions (LoRT) for Transferable Tensor Regression
 - Low-Rank Thinning
 - LRA-QViT: Integrating Low-Rank Approximation and Quantization for Robust and Efficient Vision Transformers
 - LSCD: Lomb–Scargle Conditioned Diffusion for Time series Imputation
 - LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
 - M2PDE: Compositional Generative Multiphysics and Multi-component PDE Simulation
 - M³HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality
 - M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Embedding Predictive Architecture
 - Machine Learning for Wireless Communication and Networks (ML4Wireless)
 - Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics
 - Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes
 - Machine Unlearning for Generative AI
 - MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces
 - Mahalanobis++: Improving OOD Detection via Feature Normalization
 - Maintaining Proportional Committees with Dynamic Candidate Sets
 - Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
 - Making Hard Problems Easier with Custom Data Distributions and Loss Regularization: A Case Study in Modular Arithmetic
 - MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
 - MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
 - MAPLE: Many-Shot Adaptive Pseudo-Labeling for In-Context Learning
 - MARGE: Improving Math Reasoning with Guided Exploration
 - MARS: Unleashing the Power of Variance Reduction for Training Large Models
 - MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems
 - Masked Autoencoders Are Effective Tokenizers for Diffusion Models
 - Masked Generative Nested Transformers with Decode Time Scaling
 - Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
 - MaskTwins: Dual-form Complementary Masking for Domain-Adaptive Image Segmentation
 - Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding
 - MASS: Mathematical Data Selection via Skill Graphs for Pretraining Large Language Models
 - Mastering Board Games by External and Internal Planning with Language Models
 - Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
 - Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer
 - MathConstruct: Challenging LLM Reasoning with Constructive Proofs
 - MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
 - Matrix Completion with Incomplete Side Information via Orthogonal Complement Projection
 - Matryoshka Quantization
 - MATS: An Audio Language Model under Text-only Supervision
 - Maximizing Intermediate Checkpoint Value in LLM Pretraining with Bayesian Optimization
 - Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures
 - Maximum Entropy Reinforcement Learning with Diffusion Policy
 - Maximum Total Correlation Reinforcement Learning
 - Maximum Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators
 - MCU: An Evaluation Framework for Open-Ended Game Agents
 - MDDM: Practical Message-Driven Generative Image Steganography Based on Diffusion Models
 - Measuring Diversity: Axioms and Challenges
 - Measuring Diversity in Synthetic Datasets
 - Measuring In-Context Computation Complexity via Hidden State Prediction
 - Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective
 - Measuring Variable Importance in Heterogeneous Treatment Effects with Confidence
 - Mechanisms of Projective Composition of Diffusion Models
 - Mechanistic PDE Networks for Discovery of Governing Equations
 - Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
 - MedRAX: Medical Reasoning Agent for Chest X-ray
 - MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
 - MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents
 - MemFreezing: A Novel Adversarial Attack on Temporal Graph Neural Networks under Limited Future Knowledge
 - Memorization Sinks: Isolating Memorization during LLM Training
 - Memory Layers at Scale
 - MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning
 - MERGE$^3$: Efficient Evolutionary Merging on Consumer-grade GPUs
 - Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation
 - MERIT: Maximum-normalized Element-wise Ratio for Language Model Large-batch Training
 - MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines
 - Meta-Black-Box-Optimization through Offline Q-function Learning
 - Metadata Conditioning Accelerates Language Model Pre-training
 - Meta Optimality for Demographic Parity Constrained Regression via Post-Processing
 - MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters
 - Meta-Reinforcement Learning with Adaptation from Human Feedback via Preference-Order-Preserving Task Embedding
 - Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation
 - Methods and Opportunities at Small Scale (MOSS)
 - MetricEmbedding: Accelerate Metric Nearness by Tropical Inner Product
 - M+: Extending MemoryLLM with Scalable Long-Term Memory
 - MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning
 - MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models
 - MIB: A Mechanistic Interpretability Benchmark
 - MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
 - MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data
 - MindCustomer: Multi-Context Image Generation Blended with Brain Signal
 - MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-text Decoding
 - Mind the Gap: A Practical Attack on GGUF Quantization
 - Mind the Gap: a Spectral Analysis of Rank Collapse and Signal Propagation in Attention Layers
 - Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
 - Minerva: A Programmable Memory Test Benchmark for Language Models
 - Minimalist Concept Erasure in Generative Models
 - Minimax Optimal Regret Bound for Reinforcement Learning with Trajectory Feedback
 - Minimum Width for Universal Approximation using Squashable Activation Functions
 - MIPT: Multilevel Informed Prompt Tuning for Robust Molecular Property Prediction
 - MiraGe: Editable 2D Images using Gaussian Splatting
 - MIRROR: Make Your Object-Level Multi-View Generation More Consistent with Training-Free Rectification
 - Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias?
 - MissScore: High-Order Score Estimation in the Presence of Missing Data
 - Mitigating Heterogeneous Token Overfitting in LLM Knowledge Editing
 - Mitigating Local Cohesion and Global Sparseness in Graph Contrastive Learning with Fuzzy Boundaries
 - Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance
 - Mitigating Over-Exploration in Latent Space Optimization Using LES
 - Mitigating Over-Squashing in Graph Neural Networks by Spectrum-Preserving Sparsification
 - Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
 - MixBridge: Heterogeneous Image-to-Image Backdoor Attack through Mixture of Schrödinger Bridges
 - Mixed-curvature decision trees and random forests
 - MixMin: Finding Data Mixtures via Convex Minimization
 - Mixture of Experts Made Intrinsically Interpretable
 - Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning
 - Mixture of Hidden-Dimensions: Not All Hidden-States’ Dimensions are Needed in Transformer
 - Mixture of Lookup Experts
 - ML$^2$-GCL: Manifold Learning Inspired Lightweight Graph Contrastive Learning
 - MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
 - MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
 - MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention
 - MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
 - Modalities Contribute Unequally: Enhancing Medical Multi-modal Learning through Adaptive Modality Token Re-balancing
 - MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding
 - Model-Based Exploration in Monitored Markov Decision Processes
 - Model Immunization from a Condition Number Perspective
 - Modeling All-Atom Glycan Structures via Hierarchical Message Passing and Multi-Scale Pre-training
 - Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent
 - Models of Heavy-Tailed Mechanistic Universality
 - Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws
 - Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
 - Model Uncertainty Quantification by Conformal Prediction in Continual Learning
 - Modern Methods in Associative Memory
 - Modified K-means Algorithm with Local Optimality Guarantees
 - Modular Duality in Deep Learning
 - Modularized Self-Reflected Video Reasoner for Multimodal LLM with Application to Video Question Answering
 - Modulated Diffusion: Accelerating Generative Modeling with Modulated Quantization
 - MODULI: Unlocking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning
 - MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
 - MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition
 - MOGIC: Metadata-infused Oracle Guidance for Improved Extreme Classification
 - MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition
 - MoH: Multi-Head Attention as Mixture-of-Head Attention
 - Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts
 - MoMa: Modulating Mamba for Adapting Image Foundation Models to Video Recognition
 - Momentum-Driven Adaptivity: Towards Tuning-Free Asynchronous Federated Learning
 - MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
 - Monte Carlo Tree Diffusion for System 2 Planning
 - Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design
 - Monte-Carlo Tree Search with Uncertainty Propagation via Optimal Transport
 - MoRAgent: Parameter Efficient Agent Tuning with Mixture-of-Roles
 - More Than Meets the Eye: Enhancing Multi-Object Tracking Even with Prolonged Occlusions
 - Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models
 - MP-Nav: Enhancing Data Poisoning Attacks against Multimodal Learning
 - MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment
 - MTL-UE: Learning to Learn Nothing for Multi-Task Learning
 - MTSTRec: Multimodal Time-Aligned Shared Token Recommender
 - MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
 - MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost
 - Multiaccuracy and Multicalibration via Proxy Groups
 - Multi-agent Architecture Search via Agentic Supernet
 - Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures
 - Multi-Armed Bandits with Interference: Bridging Causal Inference and Adversarial Bandits
 - Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
 - Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion
 - Multi-Domain Graph Foundation Models: Robust Knowledge Transfer via Topology Alignment
 - Multilayer Matrix Factorization via Dimension-Reducing Diffusion Variational Inference
 - Multi-Marginal Stochastic Flow Matching for High-Dimensional Snapshot Data at Irregular Time Points
 - Multimodal Medical Code Tokenizer
 - Multi-Modal Object Re-identification via Sparse Mixture-of-Experts
 - Multinoulli Extension: A Lossless Yet Effective Probabilistic Framework for Subset Selection over Partition Constraints
 - Multi-Objective Causal Bayesian Optimization
 - Multiobjective distribution matching
 - Multi-objective Linear Reinforcement Learning with Lexicographic Rewards
 - MultiPDENet: PDE-embedded Learning with Multi-time-stepping for Accelerated Flow Simulation
 - Multiple-policy Evaluation via Density Estimation
 - Multi-Session Budget Optimization for Forward Auction-based Federated Learning
 - Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning
 - Multi-Timescale Dynamics Model Bayesian Optimization for Plasma Stabilization in Tokamaks
 - Multi-Turn Code Generation Through Single-Step Rewards
 - Multivariate Conformal Selection
 - Multi-View Graph Clustering via Node-Guided Contrastive Encoding
 - MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners
 - Mutual Learning for SAM Adaptation: A Dual Collaborative Network Framework for Source-Free Domain Transfer
 - MVA: Linear Attention with High-order Query-Keys Integration and Multi-level Vocabulary Decomposition
 - MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
 - N2GON: Neural Networks for Graph-of-Net with Position Awareness
 - Natural Perturbations for Black-box Training of Neural Networks by Zeroth-Order Optimization
 - Navigating Conflicting Views: Harnessing Trust for Learning
 - Navigating Semantic Drift in Task-Agnostic Class-Incremental Learning
 - Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning
 - NBDI: A Simple and Effective Termination Condition for Skill Extraction from Task-Agnostic Demonstrations
 - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
 - Nearly Optimal Sample Complexity for Learning with Label Proportions
 - NEAR: Neural Electromagnetic Array Response
 - Near Optimal Best Arm Identification for Clustered Bandits
 - Near-Optimal Consistency-Robustness Trade-Offs for Learning-Augmented Online Knapsack Problems
 - Near-Optimal Decision Trees in a SPLIT Second
 - Near Optimal Non-asymptotic Sample Complexity of 1-Identification
 - Near-optimal Regret Using Policy Optimization in Online MDPs with Aggregate Bandit Feedback
 - Near-Optimal Sample Complexity for MDPs via Anchoring
 - Near-optimal Sketchy Natural Gradients for Physics-Informed Neural Networks
 - NegMerge: Sign-Consensual Weight Merging for Machine Unlearning
 - Neighbour-Driven Gaussian Process Variational Autoencoders for Scalable Structured Latent Modelling
 - Nemotron-CORTEXA: Enhancing LLM Agents for Software Engineering Tasks via Improved Localization and Solution Diversity
 - Nested Expectations with Kernel Quadrature
 - Nesterov Method for Asynchronous Pipeline Parallel Optimization
 - NestQuant: nested lattice quantization for matrix products and LLMs
 - NETS: A Non-equilibrium Transport Sampler
 - Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning
 - NeuralCohort: Cohort-aware Neural Representation Learning for Healthcare Analytics
 - Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime
 - Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?
 - Neural Encoding and Decoding at Scale
 - Neural Event-Triggered Control with Optimal Scheduling
 - Neural Genetic Search in Discrete Spaces
 - Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning
 - Neural Guided Diffusion Bridges
 - Neural Interpretable PDEs: Harmonizing Fourier Insights with Attention for Scalable and Interpretable Physics Discovery
 - Neural Representational Consistency Emerges from Probabilistic Neural-Behavioral Representation Alignment
 - Neural Solver Selection for Combinatorial Optimization
 - NeuronTune: Towards Self-Guided Spurious Bias Mitigation
 - Neurosymbolic World Models for Sequential Decision Making
 - NeuroTree: Hierarchical Functional Brain Pathway Decoding for Mental Health Disorders
 - Neutral residues: revisiting adapters for model extension
 - New Bounds for Sparse Variational Gaussian Processes
 - NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits
 - NExtLong: Toward Effective Long-Context Training without Long Documents
 - NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric
 - NMA-tune: Generating Highly Designable and Dynamics Aware Protein Backbones
 - No Free Lunch from Random Feature Ensembles: Scaling Laws and Near-Optimality Conditions
 - Noise Conditional Variational Score Distillation
 - Noise-Guided Predicate Representation Extraction and Diffusion-Enhanced Discretization for Scene Graph Generation
 - Noisy SIGNSGD Is More Differentially Private Than You (Might) Think
 - NoLiMa: Long-Context Evaluation Beyond Literal Matching
 - No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets
 - Non-Asymptotic and Non-Lipschitzian Bounds on Optimal Values in Stochastic Optimization Under Heavy Tails
 - Non-asymptotic Error Bounds in $\mathcal{W}_2$-Distance with Sqrt(d) Dimension Dependence and First Order Convergence for Langevin Monte Carlo beyond Log-Concavity
 - Non-Asymptotic Length Generalization
 - Nonconvex Theory of $M$-estimators with Decomposable Regularizers
 - Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness
 - Nonlinear transformers can perform inference-time feature learning
 - Nonparametric Identification of Latent Concepts
 - Nonparametric Modern Hopfield Models
 - Nonparametric Teaching for Graph Property Learners
 - Non-stationary Diffusion For Probabilistic Time Series Forecasting
 - Non-stationary Online Learning for Curved Losses: Improved Dynamic Regret via Mixability
 - Non-Stationary Predictions May Be More Informative: Exploring Pseudo-Labels with a Two-Phase Pattern of Training Dynamics
 - No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization
 - Normalizing Flows are Capable Generative Models
 - No Soundness in the Real World: On the Challenges of the Verification of Deployed Neural Networks
 - Not all solutions are created equal: An analytical dissociation of functional and representational similarity in deep linear neural networks
 - Not All Tokens Matter All The Time: Dynamic Token Aggregation Towards Efficient Detection Transformers
 - Not All Wrong is Bad: Using Adversarial Examples for Unlearning
 - No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
 - Novelty Detection in Reinforcement Learning with World Models
 - NTK-DFL: Enhancing Decentralized Federated Learning in Heterogeneous Settings via Neural Tangent Kernel
 - NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction
 - Objective drives the consistency of representational similarity across datasets
 - Observation Interference in Partially Observable Assistance Games
 - Occult: Optimizing Collaborative Communications across Experts for Accelerated Parallel MoE Training and Inference
 - Offline Learning for Combinatorial Multi-armed Bandits
 - Offline Model-based Optimization for Real-World Molecular Discovery
 - Offline Opponent Modeling with Truncated Q-driven Instant Policy Refinement
 - Offline-to-Online Reinforcement Learning with Classifier-Free Diffusion Generation
 - Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation
 - Off-Policy Evaluation under Nonignorable Missing Data
 - Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents
 - Olica: Efficient Structured Pruning of Large Language Models without Retraining
 - O-MAPL: Offline Multi-agent Preference Learning
 - OmiAD: One-Step Adaptive Masked Diffusion Model for Multi-class Anomaly Detection via Adversarial Distillation
 - Omni-Angle Assault: An Invisible and Powerful Physical Adversarial Attack on Face Recognition
 - OmniArch: Building Foundation Model for Scientific Computing
 - OmniAudio: Generating Spatial Audio from 360-Degree Video
 - OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance
 - On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
 - On Differential Privacy for Adaptively Solving Search Problems via Sketching
 - One Arrow, Two Hawks: Sharpness-aware Minimization for Federated Learning via Global Model Trajectory
 - One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation
 - One-dimensional Path Convolution
 - One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
 - On Efficient Estimation of Distributional Treatment Effects under Covariate-Adaptive Randomization
 - OneForecast: A Universal Framework for Global and Regional Weather Forecasting
 - One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework
 - One Leaf Reveals the Season: Occlusion-Based Contrastive Learning with Semantic-Aware Views for Efficient Visual Representation
 - One-Pass Feature Evolvable Learning with Theoretical Guarantees
 - One-Shot Heterogeneous Federated Learning with Local Model-Guided Diffusion Models
 - One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation
 - One-Step Generalization Ratio Guided Optimization for Domain Generalization
 - One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy
 - One Wave To Explain Them All: A Unifying Perspective On Feature Attribution
 - On Exact Bit-level Reversible Transformers Without Changing Architecture
 - On Explaining Equivariant Graph Networks via Improved Relevance Propagation
 - On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
 - On Fine-Grained Distinct Element Estimation
 - On Learning Parallel Pancakes with Mostly Uniform Weights
 - On Linear Convergence in Smooth Convex-Concave Bilinearly-Coupled Saddle-Point Optimization: Lower Bounds and Optimal Algorithms
 - Online Clustering of Dueling Bandits
 - Online Conformal Prediction via Online Optimization
 - Online Curvature-Aware Replay: Leveraging $\mathbf{2^{nd}}$ Order Information for Online Continual Learning
 - Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting
 - Online Differentially Private Conformal Prediction for Uncertainty Quantification
 - Online Episodic Convex Reinforcement Learning
 - Online Laplacian-Based Representation Learning in Reinforcement Learning
 - Online Learning in Risk Sensitive constrained MDP
 - Online Learning in the Random-Order Model
 - Online Learning with Unknown Constraints
 - Online Linear Classification with Massart Noise
 - Online Pre-Training for Offline-to-Online Reinforcement Learning
 - Online Robust Reinforcement Learning Through Monte-Carlo Planning
 - Online Sparsification of Bipartite-Like Clusters in Graphs
 - On Measuring Long-Range Interactions in Graph Neural Networks
 - On Mitigating Affinity Bias through Bandits with Evolving Biased Feedback
 - On Path to Multimodal Generalist: General-Level and General-Bench
 - On Teacher Hacking in Language Model Distillation
 - On Temperature Scaling and Conformal Prediction of Deep Classifiers
 - On the Adversarial Robustness of Multi-Kernel Clustering
 - On the Alignment between Fairness and Accuracy: from the Perspective of Adversarial Robustness
 - On the Benefits of Active Data Collection in Operator Learning
 - On the Clean Generalization and Robust Overfitting in Adversarial Training from Two Theoretical Views: Representation Complexity and Training Dynamics
 - On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning
 - On the Convergence of Continuous Single-timescale Actor-critic
 - On the Diversity of Adversarial Ensemble Learning
 - On the Duality between Gradient Transformations and Adapters
 - On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning
 - On the Emergence of Position Bias in Transformers
 - On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention for Long-Context LLM Serving
 - On the Generalization Ability of Next-Token-Prediction Pretraining
 - On the Guidance of Flow Matching
 - On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training
 - On the Impact of Performative Risk Minimization for Binary Random Variables
 - On the Importance of Embedding Norms in Self-Supervised Learning
 - On the Importance of Gaussianizing Representations
 - On the Interplay between Graph Structure and Learning Algorithms in Graph Neural Networks
 - On the Learnability of Distribution Classes with Adaptive Adversaries
 - On the Local Complexity of Linear Regions in Deep ReLU Networks
 - On the Out-of-Distribution Generalization of Self-Supervised Learning
 - On the Power of Context-Enhanced Learning in LLMs
 - On the Power of Learning-Augmented Search Trees
 - On the Private Estimation of Smooth Transport Maps
 - On the Provable Separation of Scales in Maximal Update Parameterization
 - On the Query Complexity of Verifier-Assisted Language Generation
 - On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents
 - On the Robustness of Reward Models for Language Model Alignment
 - On the Role of Label Noise in the Feature Learning Process
 - On the Similarities of Embeddings in Contrastive Learning
 - On the Statistical Mechanisms of Distributional Compositional Generalization
 - On the Tension between Byzantine Robustness and No-Attack Accuracy in Distributed Learning
 - On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
 - On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains
 - On Understanding Attention-Based In-Context Learning for Categorical Data
 - On Volume Minimization in Conformal Regression
 - On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation
 - OOD-Chameleon: Is Algorithm Selection for OOD Generalization Learnable?
 - Open-Det: An Efficient Learning Framework for Open-Ended Detection
 - Open Materials Generation with Stochastic Interpolants
 - OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning
 - Open Your Eyes: Vision Enhances Message Passing Neural Networks in Link Prediction
 - Optimal Algorithm for Max-Min Fair Bandit
 - Optimal and Practical Batched Linear Bandit Algorithm
 - Optimal Auction Design in the Joint Advertising
 - Optimal Decision Tree Pruning Revisited: Algorithms and Complexity
 - Optimal Fair Learning Robust to Adversarial Distribution Shift
 - Optimal Information Retention for Time-Series Explanations
 - Optimal Sensor Scheduling and Selection for Continuous-Discrete Kalman Filtering with Auxiliary Dynamics
 - Optimal Survey Design for Private Mean Estimation
 - Optimal Task Order for Continual Learning of Multiple Tasks
 - Optimal Transfer Learning for Missing Not-at-Random Matrix Completion
 - Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization
 - Optimal transport-based conformal prediction
 - Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect
 - Optimization for Neural Operators can Benefit from Width
 - Optimization over Sparse Support-Preserving Sets: Two-Step Projection with Global Optimality Guarantees
 - Optimization Proxies using Limited Labeled Data and Training Time – A Semi-Supervised Bayesian Neural Network Approach
 - Optimizing Adaptive Attacks against Watermarks for Language Models
 - Optimizing Language Models for Inference Time Objectives using Reinforcement Learning
 - Optimizing Large Language Model Training Using FP4 Quantization
 - Optimizing Noise Distributions for Differential Privacy
 - Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach
 - Optimizing Social Network Interventions via Hypergradient-Based Recommender System Design
 - Optimizing Temperature for Language Models with Multi-Sample Inference
 - Optimizing Test-Time Compute via Meta Reinforcement Finetuning
 - OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling
 - Oracle-MoE: Locality-preserving Routing in the Oracle Space for Memory-constrained Large Language Model Inference
 - OR-Bench: An Over-Refusal Benchmark for Large Language Models
 - OrcaLoca: An LLM Agent Framework for Software Issue Localization
 - Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
 - Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
 - Origin Identification for Text-Guided Image-to-Image Diffusion Models
 - Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection
 - OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference
 - Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
 - Oscillation-Reduced MXFP4 Training for Vision Transformers
 - OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
 - Otter: Generating Tests from Issues to Validate SWE Patches
 - Outlier-Aware Post-Training Quantization for Discrete Graph Diffusion Models
 - Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models
 - Outsourced Diffusion Sampling: Efficient Posterior Inference in Latent Spaces of Generative Models
 - Overcoming Multi-step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner
 - Overcoming Non-monotonicity in Transducer-based Streaming Generation
 - Overcoming Spurious Solutions in Semi-Dual Neural Optimal Transport: A Smoothing Approach for Learning the Optimal Transport Plan
 - Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization
 - Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
 - Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination’s Impact on Machine Translation
 - Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
 - Overtrained Language Models Are Harder to Fine-Tune
 - OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition
 - OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
 - OW-VAP: Visual Attribute Parsing for Open World Object Detection
 - PAC-Bayes Analysis for Recalibration in Classification
 - PAC Learning with Improvements
 - Pairwise Maximum Likelihood For Multi-Class Logistic Regression Model With Multiple Rare Classes
 - P(all-atom) Is Unlocking New Path For Protein Design
 - PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling
 - PaperBench: Evaluating AI’s Ability to Replicate AI Research
 - ParallelComp: Parallel Long-Context Compressor for Length Extrapolation
 - Parallel Simulation for Log-concave Sampling and Score-based Diffusion Models
 - Parameter-Efficient Fine-Tuning of State Space Models
 - Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
 - Parametric Scaling Law of Tuning Bias in Conformal Prediction
 - Pareto-frontier Entropy Search with Variational Lower Bound Maximization
 - Pareto Merging: Multi-Objective Optimization for Preference-Aware Model Merging
 - Pareto-Optimal Fronts for Benchmarking Symbolic Regression Algorithms
 - Pareto-Optimality, Smoothness, and Stochasticity in Learning-Augmented One-Max-Search
 - PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model
 - PARQ: Piecewise-Affine Regularized Quantization
 - Parrot: Multilingual Visual Instruction Tuning
 - Partially Observable Reinforcement Learning with Memory Traces
 - Partition First, Embed Later: Laplacian-Based Feature Partitioning for Refined Embedding and Visualization of High-Dimensional Data
 - PASS: Private Attributes Protection with Stochastic Data Substitution
 - PatchPilot: A Cost-Efficient Software Engineering Agent with Early Attempts on Formal Verification
 - Patch-wise Structural Loss for Time Series Forecasting
 - PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative APIs
 - PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs
 - PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
 - PDUDT: Provable Decentralized Unlearning under Dynamic Topologies
 - PEAKS: Selecting Key Training Examples Incrementally via Prediction Error Anchored by Kernel Similarity
 - PEINR: A Physics-enhanced Implicit Neural Representation for High-Fidelity Flow Field Reconstruction
 - Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data
 - PENCIL: Long Thoughts with Short Memory
 - PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion
 - Perception in Reflection
 - Perceptual-GS: Scene-adaptive Perceptual Densification for Gaussian Splatting
 - Perceptually Constrained Precipitation Nowcasting Model
 - Peri-LN: Revisiting Normalization Layer in the Transformer Architecture
 - Peripheral Memory for LLMs: Integration of Sequential Memory Banks with Adaptive Querying
 - Permutation-based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data
 - Permutation Equivariant Neural Networks for Symmetric Tensors
 - Permutation-Free High-Order Interaction Tests
 - Persistent Topological Features in Large Language Models
 - PertEval-scFM: Benchmarking Single-Cell Foundation Models for Perturbation Effect Prediction
 - Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning
 - PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting for Novel View Synthesis
 - Pfeife: Automatic Pipeline Parallelism for PyTorch
 - PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
 - Phase and Amplitude-aware Prompting for Enhancing Adversarial Robustness
 - Phase transitions for the existence of unregularized M-estimators in single index models
 - Physics Aware Neural Networks for Unsupervised Binding Energy Prediction
 - Physics-Informed DeepONets for drift-diffusion on metric graphs: simulation and parameter identification
 - Physics-Informed Generative Modeling of Wireless Channels
 - Physics-informed Temporal Alignment for Auto-regressive PDE Foundation Models
 - Physics-Informed Weakly Supervised Learning For Interatomic Potentials
 - PhySpec: Physically Consistent Spectral Reconstruction via Orthogonal Subspace Decomposition and Self-Supervised Meta-Auxiliary Learning
 - PiD: Generalized AI-Generated Images Detection with Pixelwise Decomposition Residuals
 - PieClam: A Universal Graph Autoencoder Based on Overlapping Inclusive and Exclusive Communities
 - PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning
 - PILAF: Optimal Human Preference Sampling for Reward Modeling
 - Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule
 - PINNsAgent: Automated PDE Surrogation with Large Language Models
 - PIPA: Preference Alignment as Prior-Informed Statistical Estimation
 - PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization
 - PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop
 - Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models
 - Pixel2Feature Attack (P2FA): Rethinking the Perturbed Space to Enhance Adversarial Transferability
 - Pixel-level Certified Explanations via Randomized Smoothing
 - Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
 - Plausible Token Amplification for Improving Accuracy of Differentially Private In-Context Learning Based on Implicit Bayesian Inference
 - Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion
 - PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning
 - Point Cloud Dataset Distillation
 - Point-Level Topological Representation Learning on Point Clouds
 - Pointwise Information Measures as Confidence Estimators in Deep Neural Networks: A Comparative Study
 - PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data
 - PoisonedEye: Knowledge Poisoning Attack on Retrieval-Augmented Generation based Large Vision-Language Models
 - PokéChamp: an Expert-level Minimax Language Agent
 - Policy Design for Two-sided Platforms with Participation Dynamics
 - Policy Filtration for RLHF to Mitigate Noise in Reward Models
 - Policy Gradient with Tree Expansion
 - Policy Guided Tree Search for Enhanced LLM Reasoning
 - Policy-labeled Preference Learning: Is Preference Enough for RLHF?
 - Policy Optimization for CMDPs with Bandit Feedback: Learning Stochastic and Adversarial Constraints
 - Policy-Regret Minimization in Markov Games with Function Approximation
 - Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning
 - Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications
 - Polybasic Speculative Decoding Through a Theoretical Perspective
 - PolyConf: Unlocking Polymer Conformation Generation through Hierarchical Generative Models
 - Polynomial-Delay MAG Listing with Novel Locally Complete Orientation Rules
 - Polynomial-Time Approximability of Constrained Reinforcement Learning
 - Polynomial Time Learning Augmented Algorithms for NP-hard Permutation Problems
 - POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval
 - POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization
 - Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
 - Position: AI Agents Need Authenticated Delegation
 - Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation
 - Position: AI Evaluation Should Learn from How We Test Humans
 - Position: AI Safety Must Embrace an Antifragile Perspective
 - Position: AI Safety should prioritize the Future of Work
 - Position: AI Scaling: From Up to Down and Out
 - Position: AI's growing due process problem
 - Position: AI Should Not Be An Imitation Game: Centaur Evaluations
 - Positional Attention: Expressivity and Learnability of Algorithmic Computation
 - Positional Encoding meets Persistent Homology on Graphs
 - Position: Algebra Unveils Deep Learning - An Invitation to Neuroalgebraic Geometry
 - Position: All Current Generative Fidelity and Diversity Metrics are Flawed
 - Position: An Empirically Grounded Identifiability Theory Will Accelerate Self Supervised Learning Research
 - Position: A Theory of Deep Learning Must Include Compositional Sparsity
 - Position: Beyond Assistance – Reimagining LLMs as Ethical and Adaptive Co-Creators in Mental Health Care
 - Position: Build Agent Advocates, Not Platform Agents
 - Position: Causal Machine Learning Requires Rigorous Synthetic Experiments for Broader Adoption
 - Position: Certified Robustness Does Not (Yet) Imply Model Security
 - Position: Challenges and Future Directions of Data-Centric AI Alignment
 - Position: Constants are Critical in Regret Bounds for Reinforcement Learning
 - Position: Contextual Integrity is Inadequately Applied to Language Models
 - Position: Current Model Licensing Practices are Dragging Us into a Quagmire of Legal Noncompliance
 - Position: Deep Learning is Not So Mysterious or Different
 - Position: Democratic AI is Possible. The Democracy Levels Framework Shows How It Might Work.
 - Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints
 - Position: Editing Large Language Models Poses Serious Safety Risks
 - Position: Enough of Scaling LLMs! Lets Focus on Downscaling
 - Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge
 - Position: Explainable AI Cannot Advance Without Better User Studies
 - Position: Formal Mathematical Reasoning—A New Frontier in AI
 - Position: Future Research and Challenges Remain Towards AI for Software Engineering
 - Position: General Intelligence Requires Reward-based Pretraining
 - Position: Generative AI Regulation Can Learn from Social Media Regulation
 - Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks
 - Position: Graph Matching Systems Deserve Better Benchmarks
 - Position: Human Baselines in Model Evaluations Need Rigor and Transparency (With Recommendations & Reporting Checklist)
 - Position: Humanity Faces Existential Risk from Gradual Disempowerment
 - Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI
 - Position: Iterative Online-Offline Joint Optimization is Needed to Manage Complex LLM Copyright Risks
 - Position: It Is Time We Test Neural Computation In Vitro
 - Position: Language model developers should report train-test overlap
 - Position: Lifetime tuning is incompatible with continual reinforcement learning
 - Position: LLMs Need a Bayesian Meta-Reasoning Framework for More Robust and Generalizable Reasoning
 - Position: LLM Social Simulations Are a Promising Research Method
 - Position: Machine Learning Models Have a Supply Chain Problem
 - Position: Medical Large Language Model Benchmarks Should Prioritize Construct Validity
 - Position: Not All Explanations for Deep Learning Phenomena Are Equally Valuable
 - Position: Political Neutrality in AI Is Impossible — But Here Is How to Approximate It
 - Position: Principles of Animal Cognition to Improve LLM Evaluations
 - Position: Probabilistic Modelling is Sufficient for Causal Inference
 - Position: Rethinking Explainable Machine Learning as Applied Statistics
 - Position: Rethinking LLM Bias Probing Using Lessons from the Social Sciences
 - Position: Retrieval-augmented systems can be dangerous medical communicators
 - Position: Scaling LLM Agents Requires Asymptotic Analysis with LLM Primitives
 - Position: Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
 - Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
 - Position: Spectral GNNs Rely Less on Graph Fourier Basis than Conceived
 - Position: Stop treating 'AGI' as the north-star goal of AI research
 - Position: Strong Consumer Protection is an Inalienable Defense for AI Safety in the United States
 - Position: Supervised Classifiers Answer the Wrong Questions for OOD Detection
 - Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
 - Position: The Artificial Intelligence and Machine Learning Community Should Adopt a More Transparent and Regulated Peer Review Process
 - Position: The Categorization of Race in ML is a Flawed Premise
 - Position: The Future of Bayesian Prediction Is Prior-Fitted
 - Position: The Most Expensive Part of an LLM *should* be its Training Data
 - Position: Theory of Mind Benchmarks are Broken for Large Language Models
 - Position: The Right to AI
 - Position: Truly Self-Improving Agents Require Intrinsic Metacognitive Learning
 - Position: Trustworthy AI Agents Require the Integration of Large Language Models and Formal Methods
 - Position: Uncertainty Quantification Needs Reassessment for Large Language Model Agents
 - Position: We Can’t Understand AI Using our Existing Vocabulary
 - Position: We Need An Algorithmic Understanding of Generative AI
 - Position: We Need Responsible, Application-Driven (RAD) AI Research
 - Position: When Incentives Backfire, Data Stops Being Human
 - Position: You Can't Manufacture a NeRF
 - Positive-unlabeled AUC Maximization under Covariate Shift
 - Posterior Inference with Diffusion Models for High-dimensional Black-box Optimization
 - Potemkin Understanding in Large Language Models
 - Power Mean Estimation in Stochastic Continuous Monte-Carlo Tree Search
 - PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design
 - Preconditioned Riemannian Gradient Descent Algorithm for Low-Multilinear-Rank Tensor Completion
 - Predicting High-precision Depth on Low-Precision Devices Using 2D Hilbert Curves
 - Predicting mutational effects on protein binding from folding energy
 - Predicting the Susceptibility of Examples to Catastrophic Forgetting
 - Prediction-Aware Learning in Multi-Agent Systems
 - Prediction models that learn to avoid missing values
 - Prediction-Powered Adaptive Shrinkage Estimation
 - Prediction-Powered E-Values
 - Prediction via Shapley Value Regression
 - Predictive Data Selection: The Data That Predicts Is the Data That Teaches
 - Predictive Performance of Deep Quantum Data Re-uploading Models
 - Preference Adaptive and Sequential Text-to-Image Generation
 - Preference-CFR: Beyond Nash Equilibrium for Better Game Strategies
 - Preference Controllable Reinforcement Learning with Advanced Multi-Objective Optimization
 - Preference Learning for AI Alignment: a Causal Perspective
 - Preference learning made easy: Everything should be understood through win rate
 - Preference Optimization for Combinatorial Optimization Problems
 - Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs
 - Preserving AUC Fairness in Learning with Noisy Protected Groups
 - Pre-training Auto-regressive Robotic Models with 4D Representations
 - Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation
 - Pre-Training Graph Contrastive Masked Autoencoders are Strong Distillers for EEG
 - Prices, Bids, Values: One ML-Powered Combinatorial Auction to Rule Them All
 - Primal-Dual Neural Algorithmic Reasoning
 - PRIME: Deep Imbalanced Regression with Proxies
 - Primitive Vision: Improving Diagram Understanding in MLLMs
 - Primphormer: Efficient Graph Transformers with Primal Representations
 - Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents
 - Principled Algorithms for Optimizing Generalized Metrics in Binary Classification
 - Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
 - Prior Knowledge Guided Neural Architecture Generation
 - Privacy Amplification by Structured Subsampling for Deep Differentially Private Time Series Forecasting
 - Privacy Amplification Through Synthetic Data: Insights from Linear Regression
 - Privacy Attacks on Image AutoRegressive Models
 - Privacy-Preserving Federated Convex Optimization: Balancing Partial-Participation and Efficiency via Noise Cancellation
 - Privacy-Shielded Image Compression: Defending Against Exploitation from Vision-Language Pretrained Models
 - Private Federated Learning using Preference-Optimized Synthetic Data
 - Private Lossless Multiple Release
 - Private Model Personalization Revisited
 - Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty
 - Probabilistic Factorial Experimental Design for Combinatorial Interventions
 - Probabilistic Group Mask Guided Discrete Optimization for Incremental Learning
 - Probabilistic Interactive 3D Segmentation with Hierarchical Neural Processes
 - Probably Approximately Global Robustness Certification
 - Probing Visual Language Priors in VLMs
 - Procurement Auctions via Approximately Optimal Submodular Optimization
 - ProDiff: Prototype-Guided Diffusion for Minimal Information Trajectory Imputation
 - Product of Experts with LLMs: Boosting Performance on ARC Is a Matter of Perspective
 - Programmatic Representations for Agent Learning
 - Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
 - Progressively Label Enhancement for Large Language Model Alignment
 - Progressive Tempering Sampler with Diffusion
 - Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF
 - Projection Pursuit Density Ratio Estimation
 - Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models
 - Prompt-based Depth Pruning of Large Language Models
 - Prompt-to-Leaderboard: Prompt-Adaptive LLM Evaluations
 - ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis
 - Propagate and Inject: Revisiting Propagation-Based Feature Imputation for Graphs with Partially Observed Features
 - Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble
 - Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents
 - ProSec: Fortifying Code LLMs with Proactive Security Alignment
 - Protein Structure Tokenization: Benchmarking and New Recipe
 - PROTOCOL: Partial Optimal Transport-enhanced Contrastive Learning for Imbalanced Multi-view Clustering
 - Proto Successor Measure: Representing the Behavior Space of an RL Agent
 - Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction
 - Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent
 - Provable Benefit of Random Permutations over Uniform Sampling in Stochastic Coordinate Descent
 - Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models
 - Provable Efficiency of Guidance in Diffusion Models for General Data Distribution
 - Provable In-Context Vector Arithmetic via Retrieving Task Concepts
 - Provable Length Generalization in Sequence Prediction via Spectral Filtering
 - Provable Maximum Entropy Manifold Exploration via Diffusion Models
 - Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity
 - Provable Zero-Shot Generalization in Offline Reinforcement Learning
 - Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing
 - Provably Efficient Algorithm for Best Scoring Rule Identification in Online Principal-Agent Information Acquisition
 - Provably Efficient Exploration in Inverse Constrained Reinforcement Learning
 - Provably Efficient RL for Linear MDPs under Instantaneous Safety Constraints in Non-Convex Feature Spaces
 - Provably Improving Generalization of Few-shot models with Synthetic Data
 - Provably Near-Optimal Federated Ensemble Distillation with Negligible Overhead
 - ProxSparse: Regularized Learning of Semi-Structured Sparsity Masks for Pretrained LLMs
 - Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting
 - Prune 'n Predict: Optimizing LLM Decision-making with Conformal Prediction
 - Pruning for GNNs: Lower Complexity with Comparable Expressiveness
 - PTTA: Purifying Malicious Samples for Test-Time Model Adaptation
 - Putnam-AXIOM: A Functional & Static Benchmark for Measuring Higher Level Mathematical Reasoning in LLMs
 - Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
 - PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models
 - QEM-Bench: Benchmarking Learning-based Quantum Error Mitigation and QEMFormer as a Multi-ranged Context Learning Baseline
 - QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
 - QMamba: On First Exploration of Vision Mamba for Image Quality Assessment
 - QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
 - QPRL: Learning Optimal Policies with Quasi-Potential Functions for Asymmetric Traversal
 - Q-Supervised Contrastive Representation: A State Decoupling Framework for Safe Offline Reinforcement Learning
 - QT-DoG: Quantization-Aware Training for Domain Generalization
 - Quadratic Upper Bound for Boosting Robustness
 - Quadruple Attention in Many-body Systems for Accurate Molecular Property Predictions
 - Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
 - QuanONet: Quantum Neural Operator with Application to Differential Equation
 - Quantifying Memory Utilization with Effective State-Size
 - Quantifying Prediction Consistency Under Fine-tuning Multiplicity in Tabular LLMs
 - Quantifying Treatment Effects: Estimating Risk Ratios via Observational Studies
 - QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
 - Quantum Algorithms for Finite-horizon Markov Decision Processes
 - Quantum Optimization via Gradient-Based Hamiltonian Descent
 - Quantum Speedup for Hypergraph Sparsification
 - Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
 - QuEst: Enhancing Estimates of Quantile-Based Distributional Measures Using Model Predictions
 - QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
 - QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval
 - QUTE: Quantifying Uncertainty in TinyML models with Early-exit-assisted ensembles for model-monitoring
 - Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers
 - R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
 - R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning
 - Radio: Rate–Distortion Optimization for Large Language Model Compression
 - RAGGED: Towards Informed Design of Scalable and Stable RAG Systems
 - Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
 - Random Feature Representation Boosting
 - Randomized Dimensionality Reduction for Euclidean Maximization and Diversity Measures
 - Random Policy Evaluation Uncovers Policies of Generative Flow Networks
 - Random Registers for Cross-Domain Few-Shot Learning
 - Ranked Entropy Minimization for Continual Test-Time Adaptation
 - Ranked from Within: Ranking Large Multimodal Models Without Labels
 - Ranking with Multiple Oracles: From Weak to Strong Stochastic Transitivity
 - Rank-One Modified Value Iteration
 - RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding
 - Rapid Overfitting of Multi-Pass SGD in Stochastic Convex Optimization
 - Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models
 - RATE: Causal Explainability of Reward Models with Imperfect Counterfactuals
 - RBench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
 - Reaction Graph: Towards Reaction-Level Modeling for Chemical Reactions with 3D Structures
 - RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning
 - Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment
 - Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems
 - Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
 - RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts
 - Recommendations with Sparse Comparison Data: Provably Fast Convergence for Nonconvex Matrix Factorization
 - Reconstructing Cell Lineage Trees from Phenotypic Features with Metric Learning
 - Rectifying Conformity Scores for Better Conditional Coverage
 - Reducing Confounding Bias without Data Splitting for Causal Inference via Optimal Transport
 - Reducing Tool Hallucination via Reliability Alignment
 - Reducing Variance of Stochastic Optimization for Approximating Nash Equilibria in Normal-Form Games
 - Redundancy Undermines the Trustworthiness of Self-Interpretable GNNs
 - ReferSplat: Referring Segmentation in 3D Gaussian Splatting
 - R*: Efficient Reward Design via Reward Structure Evolution and Parameter Alignment Optimization with Large Language Models
 - Refined generalization analysis of the Deep Ritz Method and Physics-Informed Neural Networks
 - Refining Adaptive Zeroth-Order Optimization at Ease
 - Reflection-Bench: Evaluating Epistemic Agency in Large Language Models
 - Reflection-Window Decoding: Text Generation with Selective Refinement
 - Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens
 - ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
 - ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering
 - REG: Rectified Gradient Guidance for Conditional Diffusion Models
 - Regress, Don't Guess: A Regression-like Loss on Number Tokens for Language Models
 - Regression for the Mean: Auto-Evaluation and Inference with Few Labels through Post-hoc Regression
 - Regret-Free Reinforcement Learning for Temporal Logic Specifications
 - Regularized Langevin Dynamics for Combinatorial Optimization
 - Reidentify: Context-Aware Identity Generation for Contextual Multi-Agent Reinforcement Learning
 - RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation
 - ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
 - REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
 - Reinforced Learning Explicit Circuit Representations for Quantum State Characterization from Local Measurements
 - Reinforced Lifelong Editing for Language Models
 - Reinforce LLM Reasoning through Multi-Agent Reflection
 - Reinforcement Learning Control of a Physical Robot Device for Assisted Human Walking without a Simulator
 - Reinforcement Learning for Quantum Control under Physical Constraints
 - Reinforcement Learning with Adaptive Reward Modeling for Expensive-to-Evaluate Systems
 - Reinforcement Learning with Random Time Horizons
 - Reinforcement Learning with Segment Feedback
 - Rejecting Hallucinated State Targets during Planning
 - Relating Misfit to Gain in Weak-to-Strong Generalization Beyond the Squared Loss
 - Relational Conformal Prediction for Correlated Time Series
 - Relational Invariant Learning for Robust Solvation Free Energy Prediction
 - Relative Error Fair Clustering in the Weak-Strong Oracle Model
 - RelGNN: Composite Message Passing for Relational Deep Learning
 - Reliable Algorithm Selection for Machine Learning-Guided Design
 - Reliable and Efficient Amortized Model-based Evaluation
 - Rényi Neural Processes
 - RePaViT: Scalable Vision Transformer Acceleration via Structural Reparameterization on Feedforward Network Layers
 - RepLoRA: Reparameterizing Low-rank Adaptation via the Perspective of Mixture of Experts
 - RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing
 - Representation Preserving Multiclass Agnostic to Realizable Reduction
 - Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
 - Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions
 - Representation Surgery in Model Merging with Probabilistic Modeling
 - Representative Language Generation
 - Representative Ranking for Deliberation in the Public Sphere
 - ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
 - Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger
 - ResearchTown: Simulator of Human Research Community
 - Residual Matrix Transformers: Scaling the Size of the Residual Stream
 - Residual TPP: A Unified Lightweight Approach for Event Stream Data Analysis
 - ResKoopNet: Learning Koopman Representations for Complex Dynamics with Spectral Residuals
 - Resolving Lexical Bias in Model Editing
 - ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals
 - RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior
 - Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
 - Rethink GraphODE Generalization within Coupled Dynamical System
 - Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding
 - Rethinking Aleatoric and Epistemic Uncertainty
 - Rethinking Benign Overfitting in Two-Layer Neural Networks
 - Rethinking Causal Ranking: A Balanced Perspective on Uplift Model Evaluation
 - Rethinking Chain-of-Thought from the Perspective of Self-Training
 - Rethinking Confidence Scores and Thresholds in Pseudolabeling-based SSL
 - Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning
 - Rethinking Latent Redundancy in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation
 - Rethinking Point Cloud Data Augmentation: Topologically Consistent Deformation
 - Rethinking Score Distilling Sampling for 3D Editing and Generation
 - Rethinking the Bias of Foundation Model under Long-tailed Distribution
 - Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective
 - Rethinking the Temperature for Federated Heterogeneous Distillation
 - Rethinking Time Encoding via Learnable Transformation Functions
 - Rethink the Role of Deep Learning towards Large-scale Quantum Systems
 - Retraining-free Merging of Sparse MoE via Hierarchical Clustering
 - Retraining with Predicted Hard Labels Provably Increases Model Accuracy
 - Retrieval-Augmented Language Model for Knowledge-aware Protein Encoding
 - Retrieval-Augmented Perception: High-resolution Image Perception Meets Visual RAG
 - Retrieval Augmented Time Series Forecasting
 - Retrieval Augmented Zero-Shot Enzyme Generation for Specified Substrate
 - Return Capping: Sample Efficient CVaR Policy Gradient Optimisation
 - Return of the Latent Space COWBOYS: Re-thinking the use of VAEs for Bayesian Optimisation of Structured Spaces
 - Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks
 - ReverB-SNN: Reversing Bit of the Weight and Activation for Spiking Neural Networks
 - ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification
 - Revisiting Chain-of-Thought in Code Generation: Do Language Models Need to Learn Reasoning before Coding?
 - Revisiting Continuity of Image Tokens for Cross-domain Few-shot Learning
 - Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness
 - Revisiting Cooperative Off-Policy Multi-Agent Reinforcement Learning
 - Revisiting Differentially Private Algorithms for Decentralized Online Learning
 - Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation
 - Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model
 - Revisiting Neural Networks for Few-Shot Learning: A Zero-Cost NAS Perspective
 - Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in sEMG Analysis
 - Revisiting Non-Acyclic GFlowNets in Discrete Environments
 - Revisiting the Predictability of Performative, Social Events
 - Revisiting Unbiased Implicit Variational Inference
 - Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization
 - Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
 - Reward-free World Models for Online Imitation Learning
 - Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design
 - Reward-Guided Prompt Evolving in Reinforcement Learning for LLMs
 - Reward-Guided Speculative Decoding for Efficient LLM Reasoning
 - Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
 - Reward Translation via Reward Machine in Semi-Alignable MDPs
 - Rhomboid Tiling for Geometric Graph Deep Learning
 - Riemannian Diffusion Adaptation for Distributed Optimization on Manifolds
 - Riemann Tensor Neural Networks: Learning Conservative Systems with Physics-Constrained Networks
 - RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers
 - Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
 - Right Time to Learn: Promoting Generalization via Bio-inspired Spacing Effect in Knowledge Distillation
 - Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity
 - R.I.P.: Better Models by Survival of the Fittest Prompts
 - RISE: Radius of Influence based Subgraph Extraction for 3D Molecular Graph Explanation
 - Risk and cross validation in ridge regression with correlated samples
 - Risk-Sensitive Theory of Mind: Coordinating with Agents of Unknown Bias using Cumulative Prospect Theory
 - RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
 - RLTHF: Targeted Human Feedback for LLM Alignment
 - Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism
 - Robust and Conjugate Spatio-Temporal Gaussian Processes
 - Robust Automatic Modulation Classification with Fuzzy Regularization
 - Robust Autonomy Emerges from Self-Play
 - Robust Conformal Outlier Detection under Contaminated Reference Data
 - Robust Consensus Anchor Learning for Efficient Multi-view Subspace Clustering
 - RobustLight: Improving Robustness via Diffusion Reinforcement Learning for Traffic Signal Control
 - Robust ML Auditing using Prior Knowledge
 - Robust Multi-Agent Reinforcement Learning with Stochastic Adversary
 - Robust Multi-bit Text Watermark with LLM-based Paraphrasers
 - Robust Multimodal Large Language Models Against Modality Conflict
 - Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs
 - Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization
 - Robust Reward Alignment via Hypothesis Space Batch Cutting
 - Robust Secure Swap: Responsible Face Swap With Persons of Interest Redaction and Provenance Traceability
 - Robust Sparsification via Sensitivity
 - Robust Spatio-Temporal Centralized Interaction for OOD Learning
 - RobustZero: Enhancing MuZero Reinforcement Learning Robustness to State Perturbations
 - RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
 - RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer
 - Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
 - ROME is Forged in Adversity: Robust Distilled Datasets via Information Bottleneck
 - ROPO: Robust Preference Optimization for Large Language Models
 - ROS: A GNN-based Relax-Optimize-and-Sample Framework for Max-$k$-Cut Problems
 - RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models
 - rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
 - RuleAdapter: Dynamic Rules for training Safety Reward Models in RLHF
 - RULEBREAKERS: Challenging LLMs at the Crossroads between Formal Logic and Human-like Reasoning
 - RUN: Reversible Unfolding Network for Concealed Object Segmentation
 - Runtime Analysis of Evolutionary NAS for Multiclass Classification
 - RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization
 - RZ-NAS: Enhancing LLM-guided Neural Architecture Search via Reflective Zero-Cost Strategy
 - S2-Track: A Simple yet Strong Approach for End-to-End 3D Multi-Object Tracking
 - S4S: Solving for a Fast Diffusion Model Solver
 - Sable: a Performant, Efficient and Scalable Sequence Model for MARL
 - SADA: Stability-guided Adaptive Diffusion Acceleration
 - SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
 - SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
 - SAE-V: Interpreting Multimodal Models for Enhanced Alignment
 - SafeArena: Evaluating the Safety of Autonomous Web Agents
 - SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
 - Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets
 - Safe-EF: Error Feedback for Non-smooth Constrained Optimization
 - SAFE: Finding Sparse and Flat Minima to Improve Pruning
 - Safely Learning Optimal Auctions: A Testable Learning Framework for Mechanism Design
 - SafeMap: Robust HD Map Construction from Incomplete Observations
 - SAFER: A Calibrated Risk-Aware Multimodal Recommendation Model for Dynamic Treatment Regimes
 - Safety Alignment Can Be Not Superficial With Explicit Safety Signals
 - SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior
 - Safety Certificate against Latent Variables with Partially Unidentifiable Dynamics
 - Safety-Polarized and Prioritized Reinforcement Learning
 - Safety Reasoning with Guidelines
 - SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
 - SAH-Drive: A Scenario-Aware Hybrid Planner for Closed-Loop Vehicle Trajectory Generation
 - SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
 - Sample Complexity of Branch-length Estimation by Maximum Likelihood
 - Sample Complexity of Correlation Detection in the Gaussian Wigner Model
 - Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction
 - Sample Efficient Demonstration Selection for In-Context Learning
 - Sample-Optimal Agnostic Boosting with Unlabeled Data
 - Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification
 - Sample-specific Noise Injection for Diffusion-based Adversarial Purification
 - Sampling Binary Data by Denoising through Score Functions
 - Sampling from Binary Quadratic Distributions via Stochastic Localization
 - SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
 - SAND: One-Shot Feature Selection with Additive Noise Distortion
 - SAN: Hypothesizing Long-Term Synaptic Development and Neural Engram Mechanism in Scalable Model's Parameter-Efficient Fine-Tuning
 - Sanity Checking Causal Representation Learning on a Simple Real-World System
 - Sassha: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
 - Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
 - SBGD: Improving Graph Diffusion Generative Model via Stochastic Block Diffusion
 - Scaffold with Stochastic Gradients: New Analysis with Linear Speed-Up
 - Scalable Approximation Algorithms for $p$-Wasserstein Distance and Its Variants
 - Scalable Attribute-Missing Graph Clustering via Neighborhood Differentiation
 - Scalable Equilibrium Sampling with Sequential Boltzmann Generators
 - Scalable First-order Method for Certifying Optimal k-Sparse GLMs
 - Scalable Gaussian Processes with Latent Kronecker Structure
 - Scalable Generation of Spatial Transcriptomics from Histology Images via Whole-Slide Flow Matching
 - Scalable Meta-Learning via Mixed-Mode Differentiation
 - Scalable Model Merging with Progressive Layer-wise Distillation
 - Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment
 - Scalable Private Partition Selection via Adaptive Weighting
 - Scalable Sobolev IPM for Probability Measures on a Graph
 - Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
 - Scaling Inference-Efficient Language Models
 - Scaling Large Motion Models with Million-Level Human Motions
 - Scaling Laws for Differentially Private Language Models
 - Scaling Laws for Floating-Point Quantization Training
 - Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection
 - Scaling Laws for Pre-training Agents and World Models
 - Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream
 - Scaling Laws for Upcycling Mixture-of-Experts Language Models
 - Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
 - Scaling Probabilistic Circuits via Monarch Matrices
 - Scaling Sparse Feature Circuits For Studying In-Context Learning
 - Scaling Test-Time Compute Without Verification or RL is Suboptimal
 - Scaling Trends in Language Model Robustness
 - Scaling Up Intervention Models
 - Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning
 - Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
 - SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval
 - SCENT: Robust Spatiotemporal Learning for Continuous Scientific Data via Scalable Conditioned Neural Fields
 - Schwarz–Schur Involution: Lightspeed Differentiable Sparse Linear Solvers
 - sciLaMA: A Single-Cell Representation Learning Framework to Leverage Prior Knowledge from Large Language Models
 - SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification
 - Score as Action: Fine Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
 - Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport
 - Score-based Pullback Riemannian Geometry: Extracting the Data Manifold Geometry using Anisotropic Flows
 - Score Matching with Missing Data
 - Score-of-Mixture Training: One-Step Generative Model Training Made Simple via Score Estimation of Mixture Distributions
 - scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data
 - SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations
 - SDMG: Smoothing Your Diffusion Models for Powerful Graph Representation Learning
 - SDP-CROWN: Efficient Bound Propagation for Neural Network Verification with Tightness of Semidefinite Programming
 - SE(3)-Equivariant Diffusion Policy in Spherical Fourier Space
 - SEAD: Unsupervised Ensemble of Streaming Anomaly Detectors
 - Secant Line Search for Frank-Wolfe Algorithms
 - SecEmb: Sparsity-Aware Secure Federated Learning of On-Device Recommender System with Large Embedding
 - SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding
 - Securing Equal Share: A Principled Approach for Learning Multiplayer Symmetric Games
 - SeedLoRA: A Fusion Approach to Efficient LLM Fine-Tuning
 - SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
 - Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation
 - Selective Preference Aggregation
 - Selective Prompt Anchoring for Code Generation
 - Selective Response Strategies for GenAI
 - Self-Bootstrapping for Versatile Test-Time Adaptation
 - SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
 - Self-Consistency Preference Optimization
 - Self-Consuming Generative Models with Adversarially Curated Data
 - Self-cross Feature based Spiking Neural Networks for Efficient Few-shot Learning
 - Self-Discriminative Modeling for Anomalous Graph Detection
 - Self-Disentanglement and Re-Composition for Cross-Domain Few-Shot Segmentation
 - Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI
 - Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
 - Self-Organizing Visual Prototypes for Non-Parametric Representation Learning
 - Self-Play $Q$-Learners Can Provably Collude in the Iterated Prisoner's Dilemma
 - Self-supervised Adversarial Purification for Graph Neural Networks
 - Self-Supervised Learning of Intertwined Content and Positional Features for Object Detection
 - Self-supervised Masked Graph Autoencoder via Structure-aware Curriculum
 - Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction
 - Semantics-aware Test-time Adaptation for 3D Human Pose Estimation
 - Semantic Shift Estimation via Dual-Projection and Classifier Reconstruction for Exemplar-Free Class-Incremental Learning
 - Semi-Supervised Blind Quality Assessment with Confidence-quantifiable Pseudo-label Learning for Authentic Images
 - SEMU: Singular Value Decomposition for Efficient Machine Unlearning
 - SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models
 - Separating Knowledge and Perception with Procedural Data
 - SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
 - SERENA: A Unified Stochastic Recursive Variance Reduced Gradient Framework for Riemannian Non-Convex Optimization
 - Settling the Maximin Share Fairness for Scheduling among Groups of Machines
 - Set Valued Predictions For Robust Domain Generalization
 - SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
 - SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures
 - ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
 - SHARP-Distill: A 68× Faster Recommender System with Hypergraph Neural Networks and Language Models
 - Sharp Generalization for Nonparametric Regression by Over-Parameterized Neural Networks: A Distribution-Free Analysis in Spherical Covariate
 - Sharp Optimality of Simple, Plug-in Estimation of the Fisher Information of a Smoothed Density
 - SHE: Streaming-media Hashing Retrieval
 - ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning
 - Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency
 - SHIELD: Multi-task Multi-distribution Vehicle Routing Solver with Sparsity and Hierarchy
 - Shifting Time: Time-series Forecasting with Khatri-Rao Neural Operators
 - Shortcut-connected Expert Parallelism for Accelerating Mixture of Experts
 - Should Decision-Makers Reveal Classifiers in Online Strategic Classification?
 - Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN
 - Signed Laplacians for Constrained Graph Clustering
 - Simple and Critical Iterative Denoising: A Recasting of Discrete Diffusion in Graph Generation
 - SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
 - Simple Path Structural Encoding for Graph Transformers
 - Simple Policy Optimization
 - Simple Randomized Rounding for Max-Min Eigenvalue Augmentation
 - Simplicity Bias and Optimization Threshold in Two-Layer ReLU Networks
 - Simplifying DINO via Coding Rate Regularization
 - Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models
 - Since Faithfulness Fails: The Performance Limits of Neural Causal Discovery
 - SING: Spatial Context in Large Language Model for Next-Gen Wearables
 - SITCOM: Step-wise Triple-Consistent Diffusion Sampling For Inverse Problems
 - SketchDNN: Joint Continuous-Discrete Diffusion for CAD Sketch Generation
 - Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation
 - SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization
 - SkipGPT: Each Token is One of a Kind
 - Skip the Equations: Learning Behavior of Personalized Dynamical Systems Directly From Data
 - SKOLR: Structured Koopman Operator Linear RNN for Time-Series Forecasting
 - Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
 - SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs
 - Sleeping Reinforcement Learning
 - Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning
 - SlimLLM: Accurate Structured Pruning for Large Language Models
 - SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
 - Slimming the Fat-Tail: Morphing-Flow for Adaptive Time Series Modeling
 - SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression
 - SMART-PC: Skeletal Model Adaptation for Robust Test-Time Training in Point Clouds
 - Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences
 - Smooth Interpolation for Improved Discrete Graph Generative Models
 - SNS-Bench: Defining, Building, and Assessing Capabilities of Large Language Models in Social Networking Services
 - Socialized Coevolution: Advancing a Better World through Cross-Task Collaboration
 - Softmax is not Enough (for Sharp Size Generalisation)
 - Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration
 - SOLD: Slot Object-Centric Latent Dynamics Models for Relational Manipulation Learning from Pixels
 - Solving Linear-Gaussian Bayesian Inverse Problems with Decoupled Diffusion Sequential Monte Carlo
 - Solving Probabilistic Verification Problems of Neural Networks using Branch and Bound
 - Solving Satisfiability Modulo Counting Exactly with Probabilistic Circuits
 - Solving Zero-Sum Convex Markov Games
 - SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
 - Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model
 - Sort Before You Prune: Improved Worst-Case Guarantees of the DiskANN Family of Graphs
 - Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems
 - Sounding that Object: Interactive Object-Aware Image to Audio Generation
 - Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging
 - SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model
 - SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference
 - Sparse Autoencoders, Again?
 - Sparse Autoencoders for Hypothesis Generation
 - Sparse Causal Discovery with Generative Intervention for Unsupervised Graph Domain Adaptation
 - SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
 - Sparse-pivot: Dynamic correlation clustering for node insertions
 - Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks
 - Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry
 - Sparse Video-Gen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
 - SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
 - Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
 - Spatial Reasoning with Denoising Models
 - SPD: Sync-Point Drop for Efficient Tensor Parallelism of Large Language Models
 - Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
 - SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs
 - Spectral-Aware Reservoir Computing for Fast and Accurate Time Series Classification
 - Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
 - Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation
 - Speeding up Policy Simulation in Supply Chain RL
 - SPEX: Scaling Feature Interaction Explanations for LLMs
 - Spherical-Nested Diffusion Model for Panoramic Image Outpainting
 - Spherical Rotation Dimension Reduction with Geometric Loss Functions
 - SPHINX: Structural Prediction using Hypergraph Inference Network
 - SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity
 - SpikF: Spiking Fourier Network for Efficient Long-term Prediction
 - Splitting & Integrating: Out-of-Distribution Detection via Adversarial Gradient Attribution
 - Splitting with Importance-aware Updating for Heterogeneous Federated Learning with Large Language Models
 - SPMC: Self-Purifying Federated Backdoor Defense via Margin Contribution
 - SPRI: Aligning Large Language Models with Context-Situated Principles
 - Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization
 - Square$\chi$PO: Differentially Private and Robust $\chi^2$-Preference Optimization in Offline Direct Alignment
 - SSHR: More Secure Generative Steganography with High-Quality Revealed Secret Images
 - Stability and Generalization Analysis of Decentralized SGD: Sharper Bounds Beyond Lipschitzness and Smoothness
 - Stability and Generalization Capability of Subgraph Reasoning Models for Inductive Knowledge Graph Completion
 - Stabilizing Sample Similarity in Representation via Mitigating Random Consistency
 - Stable Fair Graph Representation Learning with Lipschitz Constraint
 - Stable Offline Value Function Learning with Bisimulation-based Representations
 - Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization
 - Staged and Physics-Grounded Learning Framework with Hyperintensity Prior for Pre-Contrast MRI Synthesis
 - STAIR: Improving Safety Alignment with Introspective Reasoning
 - STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings
 - Star Attention: Efficient LLM Inference over Long Sequences
 - STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization
 - Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances
 - Statistical Collusion by Collectives on Learning Platforms
 - Statistical Hypothesis Testing for Auditing Robustness in Language Models
 - Statistical Query Hardness of Multiclass Linear Classification with Random Classification Noise
 - Statistical Test for Feature Selection Pipelines by Selective Inference
 - Stay Hungry, Keep Learning: Sustainable Plasticity for Deep Reinforcement Learning
 - Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection
 - STD-FD: Spatio-Temporal Distribution Fitting Deviation for AIGC Forgery Identification
 - Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning
 - Stealix: Model Stealing via Prompt Evolution
 - StealthInk: A Multi-bit and Stealthy Watermark for Large Language Models
 - Steerable Transformers for Volumetric Data
 - Steering Protein Language Models
 - Steer LLM Latents for Hallucination Detection
 - Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design
 - Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence
 - Stochastic Deep Restoration Priors for Imaging Inverse Problems
 - Stochastic Encodings for Active Feature Acquisition
 - Stochastic Forward–Backward Deconvolution: Training Diffusion Models with Finite Noisy Datasets
 - Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training
 - Stochastic Online Conformal Prediction with Semi-Bandit Feedback
 - Stochastic Poisson Surface Reconstruction with One Solve using Geometric Gaussian Processes
 - Stochastic Smoothed Primal-Dual Algorithms for Nonconvex Optimization with Linear Inequality Constraints
 - SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics
 - STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
 - Strategic A/B testing via Maximum Probability-driven Two-armed Bandit
 - Strategic Planning: A Top-Down Approach to Option Generation
 - Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
 - Stray Intrusive Outliers-Based Feature Selection on Intra-Class Asymmetric Instance Distribution or Multiple High-Density Clusters
 - Stream-level Flow Matching with Gaussian Processes
 - Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
 - Strengthen Out-of-Distribution Detection Capability with Progressive Self-Knowledge Distillation
 - Strong and Weak Identifiability of Optimization-based Causal Discovery in Non-linear Additive Noise Models
 - Stronger Neyman Regret Guarantees for Adaptive Experimental Design
 - Structured Preconditioners in Adaptive Optimization: A Unified Analysis
 - Structure-Guided Large Language Models for Text-to-SQL Generation
 - Structure-informed Risk Minimization for Robust Ensemble Learning
 - Structure Is All You Need: Structural Representation Learning on Hyper-Relational Knowledge Graphs
 - Subgoal-Guided Policy Heuristic Search with Learned Subgoals
 - Subgroups Matter for Robust Bias Mitigation
 - Subobject-level Image Tokenization
 - Sub-Sequential Physics-Informed Learning with State Space Model
 - Subspace Optimization for Large Language Models with Convergence Guarantees
 - SUICA: Learning Super-high Dimensional Sparse Implicit Neural Representations for Spatial Transcriptomics
 - Suitability Filter: A Statistical Framework for Classifier Evaluation in Real-World Deployment Settings
 - Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups
 - Sundial: A Family of Highly Capable Time Series Foundation Models
 - Supercharging Graph Transformers with Advective Diffusion
 - Super Deep Contrastive Information Bottleneck for Multi-modal Clustering
 - Supervised Contrastive Learning from Weakly-Labeled Audio Segments for Musical Version Matching
 - Surrogate Prompt Learning: Towards Efficient and Diverse Prompt Learning for Vision-Language Models
 - Survival Analysis via Density Estimation
 - SWAN: SGD with Normalization and Whitening Enables Stateless LLM Training
 - SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
 - Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales
 - Symmetry-Aware GFlowNets
 - Symmetry-Driven Discovery of Dynamical Variables in Molecular Simulations
 - Symmetry-Robust 3D Orientation Estimation
 - SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering
 - SynEVO: A neuro-inspired spatiotemporal evolutional framework for cross-domain adaptation
 - Synonymous Variational Inference for Perceptual Image Compression
 - Synthesizing Images on Perceptual Boundaries of ANNs for Uncovering and Manipulating Human Perceptual Variability
 - Synthesizing Privacy-Preserving Text Data via Finetuning *without* Finetuning Billion-Scale LLMs
 - Synthesizing Software Engineering Data in a Test-Driven Manner
 - Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion
 - Synthetic Text Generation for Training Large Language Models via Gradient Matching
 - System-Aware Unlearning Algorithms: Use Lesser, Forget Faster
 - T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
 - TabFlex: Scaling Tabular Learning to Millions with Linear Attention
 - TabFSBench: Tabular Benchmark for Feature Shifts in Open Environments
 - TabICL: A Tabular Foundation Model for In-Context Learning on Large Data
 - TabNAT: A Continuous-Discrete Joint Generative Framework for Tabular Data
 - TabPFN Unleashed: A Scalable and Effective Solution to Tabular Classification Problems
 - TabSDS: a Lightweight, Fully Non-Parametric, and Model Free Approach for Generating Synthetic Tabular Data
 - Tackling Dimensional Collapse toward Comprehensive Universal Domain Adaptation
 - Tackling View-Dependent Semantics in 3D Language Gaussian Splatting
 - Taming Diffusion for Dataset Distillation with High Representativeness
 - Taming Knowledge Conflicts in Language Models
 - Taming Rectified Flow for Inversion and Editing
 - TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization
 - Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion
 - Targeted control of fast prototyping through domain-specific interface
 - Targeted Low-rank Refinement: Enhancing Sparse Language Models with Precision
 - Targeted Unlearning with Single Layer Unlearning Gradient
 - TAROT: Targeted Data Selection via Optimal Transport
 - Task-Agnostic Pre-training and Task-Guided Fine-tuning for Versatile Diffusion Planner
 - Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks
 - Task-Gated Multi-Expert Collaboration Network for Degraded Multi-Modal Image Fusion
 - Task Generalization with Autoregressive Compositional Structure: Can Learning from $D$ Tasks Generalize to $D^T$ Tasks?
 - TCP-Diffusion: A Multi-modal Diffusion Model for Global Tropical Cyclone Precipitation Forecasting with Change Awareness
 - Teaching Language Models to Critique via Reinforcement Learning
 - Teaching Physical Awareness to LLMs through Sounds
 - Teaching Transformers Causal Reasoning through Axiomatic Training
 - TeDS: Joint Learning of Diachronic and Synchronic Perspectives in Quaternion Space for Temporal Knowledge Graph Completion
 - Telling Peer Direct Effects from Indirect Effects in Observational Network Data
 - TeLoGraF: Temporal Logic Planning via Graph-encoded Flow Matching
 - Temperature-Annealed Boltzmann Generators
 - Temporal Difference Flows
 - Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning
 - Temporal Misalignment in ANN-SNN Conversion and its Mitigation via Probabilistic Spiking Neurons
 - Temporal Query Network for Efficient Multivariate Time Series Forecasting
 - Tensor Decomposition Based Memory-Efficient Incremental Learning
 - Tensorized Multi-View Multi-Label Classification via Laplace Tensor Rank
 - Tensor Product Neural Networks for Functional ANOVA Model
 - Tensor-Var: Efficient Four-Dimensional Variational Data Assimilation
 - TerraBytes: Towards global datasets and models for Earth Observation
 - Testing Conditional Mean Independence Using Generative Neural Networks
 - Testing the Limits of Fine-Tuning for Improving Visual Cognition in Vision Language Models
 - Test-Time Adaptation for Online Vision-Language Navigation with Feedback-based Reinforcement Learning
 - Test-time Adaptation on Graphs via Adaptive Subgraph-based Selection and Regularized Prototypes
 - Test-Time Adaptation with Binary Feedback
 - Test-time Adapted Reinforcement Learning with Action Entropy Regularization
 - Test-Time Canonicalization by Foundation Models for Robust Perception
 - Test-time Correlation Alignment
 - Test-Time Graph Neural Dataset Search With Generative Projection
 - Test-Time Learning for Large Language Models
 - Test-Time Multimodal Backdoor Detection by Contrastive Prompting
 - Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
 - Test-Time Selective Adaptation for Uni-Modal Distribution Shift in Multi-Modal Data
 - Test-Time Training Provably Improves Transformers as In-context Learners
 - TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation
 - Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models
 - Text-to-LoRA: Instant Transformer Adaption
 - Textual Unlearning Gives a False Sense of Unlearning
 - Textural or Textual: How Vision-Language Models Read Text in Images
 - TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
 - The 1st Workshop on Vector Databases
 - The 2nd Workshop on Reliable and Responsible Foundation Models
 - The Batch Complexity of Bandit Pure Exploration
 - The Berkeley Function Calling Leaderboard (BFCL): From Tool Use to Agentic Evaluation of Large Language Models
 - The Best of Both Worlds: Bridging Quality and Diversity in Data Selection with Bipartite Graph
 - The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
 - The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions
 - The Canary’s Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text
 - The Case for Learned Provenance-based System Behavior Baseline
 - The Complexity of Learning Sparse Superposed Features with Feedback
 - The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning
 - The dark side of the forces: assessing non-conservative force models for atomistic machine learning
 - The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models
 - The Diffusion Duality
 - The Disparate Benefits of Deep Ensembles
 - The Double-Ellipsoid Geometry of CLIP
 - The Elicitation Game: Evaluating Capability Elicitation Techniques
 - The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
 - The Empirical Mean is Minimax Optimal for Local Glivenko-Cantelli
 - The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
 - The Four Color Theorem for Cell Instance Segmentation
 - The Generalized Skew Spectrum of Graphs
 - The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
 - The Global Convergence Time of Stochastic Gradient Descent in Non-Convex Landscapes: Sharp Estimates via Large Deviations
 - The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback
 - The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Analysis of Orthogonal Safety Directions
 - The Hidden Joules: Evaluating the Energy Consumption of Vision Backbones for Progress Towards More Efficient Model Inference
 - The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models Via Visual Information Steering
 - The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)
 - The Impact of Memorization on Trustworthy Foundation Models
 - The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks
 - The impact of uncertainty on regularized learning in games
 - The Importance of Being Lazy: Scaling Limits of Continual Learning
 - The Jailbreak Tax: How Useful are Your Jailbreak Outputs?
 - The Limits of Predicting Agents from Behaviour
 - The Limits of Tractable Marginalization
 - The Lock-in Hypothesis: Stagnation by Algorithm
 - The Logical Implication Steering Method for Conditional Interventions on Transformer Generation
 - The Missing Alignment Link of In-context Learning on Sequences
 - The Noisy Laplacian: a Threshold Phenomenon for Non-Linear Dimension Reduction
 - The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes
 - Theoretical guarantees on the best-of-n alignment policy
 - Theoretical Limitations of Ensembles in the Age of Overparameterization
 - Theoretically Unmasking Inference Attacks Against LDP-Protected Clients in Federated Vision Models
 - Theoretical Performance Guarantees for Partial Domain Adaptation via Partial Optimal Transport
 - The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning
 - The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret
 - The Polynomial Stein Discrepancy for Assessing Moment Convergence
 - The Power of Random Features and the Limits of Distribution-Free Gradient Descent
 - The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products
 - The Price of Linear Time: Error Analysis of Structured Kernel Interpolation
 - The Relationship Between No-Regret Learning and Online Conformal Prediction
 - The Ripple Effect: On Unforeseen Complications of Backdoor Attacks
 - Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos
 - The Role of Randomness in Stability
 - The Role of Sparsity for Length Generalization in LLMs
 - The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability
 - The Second Workshop on Long-Context Foundation Models
 - The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
 - The Sparse-Plus-Low-Rank Quasi-Newton Method for Entropic-Regularized Optimal Transport
 - The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
 - The Surprising Effectiveness of Test-Time Training for Few-Shot Learning
 - The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-fidelity Data
 - The Underlying Logic of Language Models
 - The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training
 - The Underlying Universal Statistical Structure of Natural Datasets
 - The Value of Prediction in Identifying the Worst-Off
 - Thickness-aware E(3)-Equivariant 3D Mesh Neural Networks
 - Thinking LLMs: General Instruction Following with Thought Generation
 - Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
 - Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making
 - Three-Dimensional Trajectory Prediction with 3DMoTraj Dataset
 - Tight and Fast Bounds for Multi-Label Learning
 - Tightening Causal Bounds via Covariate-Aware Optimal Transport
 - Tilted Sharpness-Aware Minimization
 - Time-Aware World Model for Adaptive Prediction and Control
 - TimeBase: The Power of Minimalism in Efficient Long-term Time Series Forecasting
 - TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting
 - TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation
 - TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting
 - TimePoint: Accelerated Time Series Alignment via Self-Supervised Keypoint and Descriptor Learning
 - TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state
 - Time Series Representations with Hard-Coded Invariances
 - TimeStacker: A Novel Framework with Multilevel Observation for Capturing Nonstationary Patterns in Time Series Forecasting
 - TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision
 - Time to Spike? Understanding the Representational Power of Spiking Neural Networks in Discrete Time
 - Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting
 - TIMING: Temporality-Aware Integrated Gradients for Time Series Explanation
 - TINED: GNNs-to-MLPs by Teacher Injection and Dirichlet Energy Distillation
 - TinyMIG: Transferring Generalization from Vision Foundation Models to Single-Domain Medical Imaging
 - Tiny Titans: The next wave of On-Device Learning for Foundation Models (TTODLer-FM)
 - TLLC: Transfer Learning-based Label Completion for Crowdsourcing
 - TMetaNet: Topological Meta-Learning Framework for Dynamic Link Prediction
 - To Each Metric Its Decoding: Post-Hoc Optimal Decision Rules of Probabilistic Hierarchical Classifiers
 - Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
 - Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning
 - Token Coordinated Prompt Attention is Needed for Visual Prompting
 - Tokenization Workshop (TokShop)
 - Tokenized Bandit for LLM Decoding and Alignment
 - Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models
 - TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
 - ToMA: Token Merge with Attention for Diffusion Models
 - Tool Unlearning for Tool-Augmented LLMs
 - TopInG: Topologically Interpretable Graph Learning via Persistent Rationale Filtration
 - TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference
 - Topological Signatures of Adversaries in Multimodal Alignments
 - Topology-Aware Dynamic Reweighting for Distribution Shifts on Graph
 - Topology-aware Neural Flux Prediction Guided by Physics
 - TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks
 - To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models
 - Toward a Unified Theory of Gradient Descent under Generalized Smoothness
 - Toward Data-centric Directed Graph Learning: An Entropy-driven Approach
 - Toward Efficient Kernel-Based Solvers for Nonlinear PDEs
 - Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage
 - Towards a Formal Theory of Representational Compositionality
 - Towards a General Time Series Forecasting Model with Unified Representation and Adaptive Transfer
 - Towards a Mechanistic Explanation of Diffusion Model Generalization
 - Towards an Explainable Comparison and Alignment of Feature Embeddings
 - Towards Attributions of Input Variables in a Coalition
 - Towards a Unified Framework of Clustering-based Anomaly Detection
 - Towards Better-than-2 Approximation for Constrained Correlation Clustering
 - Towards Black-Box Membership Inference Attack for Diffusion Models
 - Towards characterizing the value of edge embeddings in Graph Neural Networks
 - Towards Cost-Effective Reward Guided Text Generation
 - Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
 - Towards Escaping from Class Dependency Modeling for Multi-Dimensional Classification
 - Towards flexible perception with visual memory
 - Towards Global-level Mechanistic Interpretability: A Perspective of Modular Circuits of Large Language Models
 - Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees
 - Towards Learning to Complete Anything in Lidar
 - Towards Lifelong Model Editing via Simulating Ideal Editor
 - Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond
 - Towards Memorization Estimation: Fast, Formal and Free
 - Towards Practical Defect-Focused Automated Code Review
 - Towards Rationale-Answer Alignment of LVLMs via Self-Rationale Calibration
 - Towards Robust Influence Functions with Flat Validation Minima
 - Towards Robustness and Explainability of Automatic Algorithm Selection
 - Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
 - Towards the Causal Complete Cause of Multi-Modal Representation Learning
 - Towards the Efficient Inference by Incorporating Automated Computational Phenotypes under Covariate Shift
 - Towards Theoretical Understanding of Sequential Decision Making with Preference Feedback
 - Towards Trustworthy Federated Learning with Untrusted Participants
 - Towards Understanding Catastrophic Forgetting in Two-layer Convolutional Neural Networks
 - Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
 - Towards Understanding Gradient Dynamics of the Sliced-Wasserstein Distance via Critical Point Analysis
 - Towards Understanding Parametric Generalized Category Discovery on Graphs
 - Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings
 - Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
 - TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
 - TraceGrad: a Framework Learning Expressive SO(3)-equivariant Non-linear Representations for Electronic-Structure Hamiltonian Prediction
 - Tracking Most Significant Shifts in Infinite-Armed Bandits
 - Tracking The Best Expert Privately
 - Tractable Transformers for Flexible Conditional Generation
 - Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
 - Training a Generally Curious Agent
 - Training Deep Learning Models with Norm-Constrained LMOs
 - Training Diffusion-based Generative Models with Limited Data
 - Training Dynamics of In-Context Learning in Linear Attention
 - Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra
 - Training High Performance Spiking Neural Network by Temporal Model Calibration
 - Training Neural Networks at Any Scale
 - Training Software Engineering Agents and Verifiers with SWE-Gym
 - Trajectory Inference with Smooth Schrödinger Bridges
 - Trajectory World Models for Heterogeneous Environments
 - Transfer Learning for Nonparametric Contextual Dynamic Pricing
 - Transfer Q-Learning with Composite MDP Structures
 - Transformative or Conservative? Conservation laws for ResNets and Transformers
 - Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation
 - Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries
 - TransPL: VQ-Code Transition Matrices for Pseudo-Labeling of Time Series Unsupervised Domain Adaptation
 - TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree
 - Tree-Sliced Wasserstein Distance: A Geometric Perspective
 - Tree-Sliced Wasserstein Distance with Nonlinear Projection
 - Triple-Optimistic Learning for Stochastic Contextual Bandits with General Constraints
 - Trusted Multi-View Classification with Expert Knowledge Constraints
 - Trust-Region Twisted Policy Improvement
 - TRUST-VLM: Thorough Red-Teaming for Uncovering Safety Threats in Vision-Language Models
 - Trustworthy Machine Learning through Data-Specific Indistinguishability
 - TruthFlow: Truthful LLM Generation via Representation Flow Correction
 - TSP: A Two-Sided Smoothed Primal-Dual Method for Nonconvex Bilevel Optimization
 - TS-SNN: Temporal Shift Module for Spiking Neural Networks
 - TtBA: Two-third Bridge Approach for Decision-Based Adversarial Attack
 - TTFSFormer: A TTFS-based Lossless Conversion of Spiking Transformer
 - TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs
 - TUMTraf VideoQA: Dataset and Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes
 - Tuning LLM Judge Design Decisions for 1/1000 of the Cost
 - Tuning Sequential Monte Carlo Samplers via Greedy Incremental Divergence Minimization
 - Tutorial on Mechanistic Interpretability for Language Models
 - Two Tickets are Better than One: Fair and Accurate Hiring Under Strategic LLM Manipulations
 - TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories
 - UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning
 - UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
 - UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
 - Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion
 - Ultra-Resolution Adaptation with Ease
 - UltraTWD: Optimizing Ultrametric Trees for Tree-Wasserstein Distance
 - Unbiased Evaluation of Large Language Models from a Causal Perspective
 - Unbiased Recommender Learning from Implicit Feedback via Weakly Supervised Learning
 - UncertainSAM: Fast and Efficient Uncertainty Quantification of the Segment Anything Model
 - Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos
 - Uncertainty Estimation for Heterophilic Graphs Through the Lens of Information Theory
 - Uncertainty Quantification for LLM-Based Survey Simulations
 - Unconstrained Robust Online Convex Optimization
 - Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning
 - Understanding and Improving Length Generalization in Recurrent Models
 - Understanding and Mitigating Memorization in Diffusion Models for Tabular Data
 - Understanding and Mitigating Memorization in Generative Models via Sharpness of Probability Landscapes
 - Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models
 - Understanding Chain-of-Thought in LLMs through Information Theory
 - Understanding Complexity in VideoQA via Visual Program Generation
 - Understanding Fixed Predictions via Confined Regions
 - Understanding Generalization in Quantum Machine Learning with Margins
 - Understanding High-Dimensional Bayesian Optimization
 - Understanding Input Selectivity in Mamba: Impact on Approximation Power, Memorization, and Associative Recall Capacity
 - Understanding Mode Connectivity via Parameter Space Symmetry
 - Understanding Model Ensemble in Transferable Adversarial Attack
 - Understanding Model Reprogramming for CLIP via Decoupling Visual Prompts
 - Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach
 - Understanding Nonlinear Implicit Bias via Region Counts in Input Space
 - Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
 - Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More
 - Understanding Synthetic Context Extension via Retrieval Heads
 - Understanding the difficulties of posterior predictive estimation
 - Understanding the Emergence of Multimodal Representation Alignment
 - Understanding the Forgetting of (Replay-based) Continual Learning via Feature Learning: Angle Matters
 - Understanding the Kronecker Matrix-Vector Complexity of Linear Algebra
 - Understanding the Limits of Deep Tabular Methods with Temporal Shift
 - Understanding the Logic of Direct Preference Alignment through Logic
 - Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
 - Understanding the Statistical Accuracy-Communication Trade-off in Personalized Federated Learning with Minimax Guarantees
 - Understanding the Unfairness in Network Quantization
 - UnHiPPO: Uncertainty-aware Initialization for State Space Models
 - UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control
 - Unifews: You Need Fewer Operations for Efficient Graph Neural Networks
 - Unified Analysis of Continuous Weak Features Learning with Applications to Learning from Missing Data
 - Unified Breakdown Analysis for Byzantine Robust Gossip
 - Unified K-Means Clustering with Label-Guided Manifold Learning
 - Unified Screening for Multiple Diseases
 - Uniform Mean Estimation for Heavy-Tailed Distributions via Median-of-Means
 - Unifying 2D and 3D Vision-Language Understanding
 - Unifying Knowledge from Diverse Datasets to Enhance Spatial-Temporal Modeling: A Granularity-Adaptive Geographical Embedding Approach
 - Unifying Specialized Visual Encoders for Video Language Models
 - UniMate: A Unified Model for Mechanical Metamaterial Generation, Property Prediction, and Condition Confirmation
 - UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation
 - UniMoMo: Unified Generative Modeling of 3D Molecules for De Novo Binder Design
 - UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules
 - Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers
 - Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems
 - Universal Approximation of Mean-Field Models via Transformers
 - Universal Approximation Theorem of Deep Q-Networks
 - Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing
 - Universal Length Generalization with Turing Programs
 - Universal Neural Optimal Transport
 - Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
 - Unlocking Post-hoc Dataset Inference with Synthetic Data
 - Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection
 - Unlocking the Power of Rehearsal in Continual Learning: A Theoretical Perspective
 - Unlocking the Power of SAM 2 for Few-Shot Segmentation
 - unMORE: Unsupervised Multi-Object Segmentation via Center-Boundary Reasoning
 - Unnatural Languages Are Not Bugs but Features for LLMs
 - Unpaired Point Cloud Completion via Unbalanced Optimal Transport
 - Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments
 - Unsupervised Learning for Class Distribution Mismatch
 - Unveiling AI's Blind Spots: An Oracle for In-Domain, Out-of-Domain, and Adversarial Errors
 - Unveiling Markov heads in Pretrained Language Models for Offline Reinforcement Learning
 - Upcycling Text-to-Image Diffusion Models for Multi-Task Capabilities
 - Update Your Transformer to the Latest Release: Re-Basin of Task Vectors
 - UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent
 - Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting
 - Validating Mechanistic Interpretations: An Axiomatic Approach
 - Value-Based Deep RL Scales Predictably
 - Variance as a Catalyst: Efficient and Transferable Semantic Erasure Adversarial Attack for Customized Diffusion Models
 - Variance-Reduced Forward-Reflected-Backward Splitting Methods for Nonmonotone Generalized Equations
 - Variational Control for Guidance in Diffusion Models
 - Variational Counterfactual Intervention Planning to Achieve Target Outcomes
 - Variational Learning of Fractional Posteriors
 - Variational Phylogenetic Inference with Products over Bipartitions
 - Variational Rectified Flow Matching
 - VCT: Training Consistency Models with Variational Noise Coupling
 - Vector Grimoire: Codebook-based Shape Generation under Raster Image Supervision
 - VerbalTS: Generating Time Series from Texts
 - Verification Learning: Make Unsupervised Neuro-Symbolic System Feasible
 - VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
 - Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach
 - VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
 - Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
 - VideoRoPE: What Makes for Good Video Rotary Position Embedding?
 - video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
 - VinePPO: Refining Credit Assignment in RL Training of LLMs
 - Vintix: Action Model via In-Context Reinforcement Learning
 - VIP: Vision Instructed Pre-training for Robotic Manipulation
 - Vision Graph Prompting via Semantic Low-Rank Decomposition
 - Vision-Language Models Create Cross-Modal Task Representations
 - Vision-Language Model Selection and Reuse for Downstream Adaptation
 - VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters
 - Visual Abstraction: A Plug-and-Play Approach for Text-Visual Retrieval
 - Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning
 - Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models
 - Visual Autoregressive Modeling for Image Super-Resolution
 - Visual Generation Without Guidance
 - Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models
 - ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy
 - Volume-Aware Distance for Robust Similarity Learning
 - Volume Optimality in Conformal Prediction with Structured Prediction Sets
 - Voronoi-grid-based Pareto Front Learning and Its Application to Collaborative Federated Learning
 - VTGaussian-SLAM: RGBD SLAM for Large Scale Scenes with Splatting View-Tied 3D Gaussians
 - Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning
 - Wait-Less Offline Tuning and Re-solving for Online Decision Making
 - Wasserstein Flow Matching: Generative Modeling Over Families of Distributions
 - Wasserstein Policy Optimization
 - WATCH: Adaptive Monitoring for AI Deployments via Weighted-Conformal Martingales
 - Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models
 - WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting
 - Weakly Supervised Anomaly Detection via Dual-Tailed Kernel
 - Weakly-Supervised Contrastive Learning for Imprecise Class Labels
 - Weak-to-Strong Generalization Even in Random Feature Networks, Provably
 - Weak-to-Strong Jailbreaking on Large Language Models
 - WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models
 - Weight matrices compression based on PDB model in deep neural networks
 - Weisfeiler and Leman Go Gambling: Why Expressive Lottery Tickets Win
 - WGFormer: An SE(3)-Transformer Driven by Wasserstein Gradient Flows for Molecular Ground-State Conformation Prediction
 - What can large language models do for sustainable food?
 - What Do Learning Dynamics Reveal About Generalization in LLM Mathematical Reasoning?
 - What Has a Foundation Model Found? Inductive Bias Reveals World Models
 - What If We Recaption Billions of Web Images with LLaMA-3?
 - What Limits Bidirectional Model's Generative Capabilities? A Uni-Bi-Directional Mixture-of-Expert Method For Bidirectional Fine-tuning
 - What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities
 - What Makes a Good Feedforward Computational Graph?
 - What makes an Ensemble (Un) Interpretable?
 - What Makes In-context Learning Effective for Mathematical Reasoning
 - What to optimize for – from robot arms to frontier AI – Anca Dragan
 - When and How Does CLIP Enable Domain and Compositional Generalization?
 - When Bad Data Leads to Good Models
 - When can in-context learning generalize out of task distribution?
 - When Can Proxies Improve the Sample Complexity of Preference Learning?
 - When Data-Free Knowledge Distillation Meets Non-Transferable Teacher: Escaping Out-of-Distribution Trap is All You Need
 - When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets
 - When Do LLMs Help With Node Classification? A Comprehensive Analysis
 - When do neural networks learn world models?
 - When Dynamic Data Selection Meets Data Augmentation: Achieving Enhanced Training Acceleration
 - When Every Millisecond Counts: Real-Time Anomaly Detection via the Multimodal Asynchronous Hybrid Network
 - When Maximum Entropy Misleads Policy Optimization
 - When Model Knowledge meets Diffusion Model: Diffusion-assisted Data-free Image Synthesis with Alignment of Domain and Class
 - When to Forget? Complexity Trade-offs in Machine Unlearning
 - When to retrain a machine learning model
 - When, Where and Why to Average Weights?
 - When Will It Fail?: Anomaly to Prompt for Forecasting Future Anomalies in Time Series
 - Where is the Truth? The Risk of Getting Confounded in a Continual World
 - Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
 - Which Attention Heads Matter for In-Context Learning?
 - Whitened CLIP as a Likelihood Surrogate of Images and Captions
 - Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors
 - "Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift
 - Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
 - Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
 - "Why Is There a Tumor?": Tell Me the Reason, Show Me the Evidence
 - Widening the Network Mitigates the Impact of Data Heterogeneity on FedAvg
 - WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs
 - WildChat-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
 - WILTing Trees: Interpreting the Distance Between MPNN Embeddings
 - Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
 - Winner-takes-all for Multivariate Probabilistic Time Series Forecasting
 - WMAdapter: Adding WaterMark Control to Latent Diffusion Models
 - WMarkGPT: Watermarked Image Understanding via Multimodal Large Language Models
 - Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning
 - WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving
 - Workshop on Computer Use Agents
 - Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences
 - Workshop on Technical AI Governance
 - World Model Implanting for Test-time Adaptation of Embodied Agents
 - WorldSimBench: Towards Video Generation Models as World Simulators
 - Wrapped Gaussian on the manifold of Symmetric Positive Definite Matrices
 - WyckoffDiff – A Generative Diffusion Model for Crystal Symmetry
 - Wyckoff Transformer: Generation of Symmetric Crystals
 - XAttention: Block Sparse Attention with Antidiagonal Scoring
 - XAttnMark: Learning Robust Audio Watermarking with Cross-Attention
 - X-Hacking: The Threat of Misguided AutoML
 - xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
 - X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
 - You Always Recognize Me (YARM): Robust Texture Synthesis Against Multi-View Corruption
 - You Get What You Give: Reciprocally Fair Federated Learning
 - Zebra: In-Context Generative Pretraining for Solving Parametric PDEs
 - ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
 - ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
 - Zero-Inflated Bandits
 - Zero-Shot Adaptation of Parameter-Efficient Fine-Tuning in Diffusion Models
 - Zero-Shot Cyclic Peptide Design via Composable Geometric Constraints
 - Zero-Shot Generalization of GNNs over Distinct Attribute Domains
 - Zero Shot Generalization of Vision-Based RL Without Data Augmentation
 - Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer
 - Zero-Shot Offline Imitation Learning via Optimal Transport
 - ZipAR: Parallel Autoregressive Image Generation through Spatial Locality
 - µnit Scaling: Simple and Scalable FP8 LLM Training