Downloads 2024
Number of events: 2681

 - $\bf{\Phi}_\textrm{Flow}$: Differentiable Simulations for PyTorch, TensorFlow and Jax
 - $f$-Divergence Based Classification: Beyond the Use of Cross-Entropy
 - $H$-Consistency Guarantees for Regression
 - $\mathtt{VITS}$ : Variational Inference Thompson Sampling for contextual bandits
 - ${\rm E}(3)$-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning
 - $S^2$IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting
 - $\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
 - 1st ICML Workshop on In-Context Learning (ICL @ ICML 2024)
 - 2nd Workshop on Advancing Neural Network Training : Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)
 - 2nd Workshop on Generative AI and Law (GenLaw ’24)
 - 3D Geometric Shape Assembly via Efficient Point Cloud Matching
 - 3D-VLA: A 3D Vision-Language-Action Generative World Model
 - A2Q+: Improving Accumulator-Aware Weight Quantization
 - A3S: A General Active Clustering Method with Pairwise Constraints
 - A Bayesian Approach to Online Planning
 - A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models
 - Absolute Policy Optimization: Enhancing Lower Probability Bound of Performance with High Confidence
 - Accelerated Algorithms for Constrained Nonconvex-Nonconcave Min-Max Optimization and Comonotone Inclusion
 - Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces
 - Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
 - Accelerated Speculative Sampling Based on Tree Monte Carlo
 - Accelerating Convergence in Bayesian Few-Shot Classification
 - Accelerating Convergence of Score-Based Diffusion Models, Provably
 - Accelerating Federated Learning with Quick Distributed Mean Estimation
 - Accelerating Heterogeneous Federated Learning with Closed-form Classifiers
 - Accelerating Iterative Retrieval-augmented Language Model Serving with Speculation
 - Accelerating Legacy Numerical Solvers by Non-intrusive Gradient-based Meta-solving
 - Accelerating Look-ahead in Bayesian Optimization: Multilevel Monte Carlo is All you Need
 - Accelerating Parallel Sampling of Diffusion Models
 - Accelerating PDE Data Generation via Differential Operator Action in Solution Space
 - Accelerating Transformer Pre-training with 2:4 Sparsity
 - Accessible and Efficient Foundation Models for Biological Discovery
 - Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
 - ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization
 - Achieving Lossless Gradient Sparsification via Mapping to Alternative Space in Federated Learning
 - Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
 - A Circuit Domain Generalization Framework for Efficient Logic Synthesis in Chip Design
 - A Closer Look at the Limitations of Instruction Tuning
 - ACM-MILP: Adaptive Constraint Modification via Grouping and Selection for Hardness-Preserving MILP Instance Generation
 - A Computational Framework for Solving Wasserstein Lagrangian Flows
 - A connection between Tempering and Entropic Mirror Descent
 - A Contextual Combinatorial Bandit Approach to Negotiation
 - ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
 - Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts
 - Acquisition Conditioned Oracle for Nongreedy Active Feature Acquisition
 - Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
 - Activation-Descent Regularization for Input Optimization of ReLU Networks
 - Active Adaptive Experimental Design for Treatment Effect Estimation with Covariate Choice
 - Active Label Correction for Semantic Segmentation with Foundation Models
 - Active Preference Learning for Large Language Models
 - Active Ranking and Matchmaking, with Perfect Matchings
 - Active Statistical Inference
 - AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors
 - Adapt and Diffuse: Sample-adaptive Reconstruction via Latent Diffusion Models
 - Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
 - Adapting Static Fairness to Sequential Decision-Making: Bias Mitigation Strategies towards Equal Long-term Benefit Rate
 - Adaptive Accompaniment with ReaLchords
 - Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
 - Adaptive Conformal Inference by Betting
 - Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity
 - Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations
 - Adaptive Group Personalization for Federated Mutual Transfer Learning
 - Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing
 - Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation
 - Adaptive Learning of Density Ratios in RKHS
 - Adaptively Learning to Select-Rank in Online Platforms
 - Adaptively Perturbed Mirror Descent for Learning in Games
 - Adaptive Observation Cost Control for Variational Quantum Eigensolvers
 - Adaptive Online Experimental Design for Causal Discovery
 - Adaptive Proximal Gradient Methods Are Universal Without Approximation
 - Adaptive Robust Learning using Latent Bernoulli Variables
 - Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction
 - Adaptive Stabilization Based on Machine Learning for Column Generation
 - Adaptive Text Watermark for Large Language Models
 - A decoder-only foundation model for time-series forecasting
 - A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
 - A Differentiable Partially Observable Generalized Linear Model with Forward-Backward Message Passing
 - A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization
 - A Distributional Analogue to the Successor Representation
 - A Doubly Recursive Stochastic Compositional Gradient Descent Method for Federated Multi-Level Compositional Optimization
 - AdsorbDiff: Adsorbate Placement via Conditional Denoising Diffusion
 - A Dual-module Framework for Counterfactual Estimation over Time
 - Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment
 - Advancing Dynamic Sparse Training by Exploring Optimization Opportunities
 - Adversarial Attacks on Combinatorial Multi-Armed Bandits
 - Adversarially Robust Deep Multi-View Clustering: A Novel Attack and Defense Framework
 - Adversarially Robust Hypothesis Transfer Learning
 - Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
 - A Dynamic Algorithm for Weighted Submodular Cover Problem
 - A Dynamical Model of Neural Scaling Laws
 - AegisFL: Efficient and Flexible Privacy-Preserving Byzantine-Robust Cross-silo Federated Learning
 - A fast algorithm to simulate nonlinear resistive networks
 - A Federated Stochastic Multi-level Compositional Minimax Algorithm for Deep AUC Maximization
 - A Field Guide for Pacing Budget and ROS Constraints
 - A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models
 - A Fixed-Point Approach for Causal Generative Modeling
 - A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks
 - A General Framework for Learning from Weak Supervision
 - A General Framework for Sequential Decision-Making under Adaptivity Constraints
 - A General Online Algorithm for Optimizing Complex Performance Metrics
 - A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
 - A Generative Approach for Treatment Effect Estimation under Collider Bias: From an Out-of-Distribution Perspective
 - Agentic Markets Workshop
 - Agent Instructs Large Language Models to be General Zero-Shot Reasoners
 - Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
 - Agent-Specific Effects: A Causal Effect Propagation Analysis in Multi-Agent MDPs
 - A Geometric Decomposition of Finite Games: Convergence vs. Recurrence under Exponential Weights
 - A Geometric Explanation of the Likelihood OOD Detection Paradox
 - A Global Geometric Analysis of Maximal Coding Rate Reduction
 - Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms
 - Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms
 - Agnostic Sample Compression Schemes for Regression
 - A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer
 - A Hierarchical Adaptive Multi-Task Reinforcement Learning Framework for Multiplier Circuit Design
 - A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
 - AI Alignment with Changing and Influenceable Reward Functions
 - AI Control: Improving Safety Despite Intentional Subversion
 - AI for Math Workshop
 - AI for Science: Scaling in AI for Scientific Discovery
 - Ai-sampler: Adversarial Learning of Markov kernels with involutive maps
 - A Language Model’s Guide Through Latent Space
 - ALERT-Transformer: Bridging Asynchronous and Synchronous Machine Learning for Real-Time Event-based Spatio-Temporal Data
 - Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models
 - Algorithmic Stability Unleashed: Generalization Bounds with Unbounded Losses
 - Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
 - Aligned Objective for Soft-Pseudo-Label Generation in Supervised Learning
 - Aligning Reinforcement Learning Experimentalists and Theorists
 - Aligning Transformers with Weisfeiler-Leman
 - Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
 - A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM)
 - All-in-one simulation-based inference
 - Allocation Requires Prediction Only if Inequality Is Low
 - AlphaFold Meets Flow Matching for Generating Protein Ensembles
 - AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training
 - Ambiguity-Aware Abductive Learning
 - A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
 - Ameliorate Spurious Correlations in Dataset Condensation
 - Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models
 - A Minimaximalist Approach to Reinforcement Learning from Human Feedback
 - Amortized Equation Discovery in Hybrid Dynamical Systems
 - Amortized Variational Deep Kernel Learning
 - Amortizing Pragmatic Program Synthesis with Rankings
 - AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training
 - A Multimodal Automated Interpretability Agent
 - Analysis for Abductive Learning and Neural-Symbolic Reasoning Shortcuts
 - Analyzing $D^\alpha$ seeding for $k$-means
 - An amortized approach to non-linear mixed-effects modeling based on neural posterior estimation
 - An Analysis of Linear Time Series Forecasting Models
 - AND: Audio Network Dissection for Interpreting Deep Acoustic Models
 - A Near-Linear Time Approximation Algorithm for Beyond-Worst-Case Graph Clustering
 - A Nearly Optimal Single Loop Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness
 - An Effective Dynamic Gradient Calibration Method for Continual Learning
 - An Efficient Maximal Ancestral Graph Listing Algorithm
 - An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems
 - An Embodied Generalist Agent in 3D World
 - An Empirical Examination of Balancing Strategy for Counterfactual Estimation on Time Series
 - An Empirical Study Into What Matters for Calibrating Vision-Language Models
 - An Empirical Study of Realized GNN Expressiveness
 - A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data
 - A Neural-Preconditioned Poisson Solver for Mixed Dirichlet and Neumann Boundary Conditions
 - A New Branch-and-Bound Pruning Framework for $\ell_0$-Regularized Problems
 - A New Computationally Efficient Algorithm to solve Feature Selection for Functional Data Classification in High-dimensional Spaces
 - A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization
 - A New Robust Partial p-Wasserstein-Based Metric for Comparing Distributions
 - A New Theoretical Perspective on Data Heterogeneity in Federated Optimization
 - An Explicit Frame Construction for Normalizing 3D Point Clouds
 - An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning
 - An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks
 - An Independence-promoting Loss for Music Generation with Language Models
 - An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network
 - An Information-Theoretic Analysis of In-Context Learning
 - An Information Theoretic Approach to Interaction-Grounded Learning
 - An Interpretable Evaluation of Entropy-based Novelty of Generative Models
 - An Intrinsic Vector Heat Network
 - An Iterative Min-Min Optimization Method for Sparse Bayesian Learning
 - An LLM Compiler for Parallel Function Calling
 - An Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization
 - Antibody Design Using a Score-based Diffusion Model Guided by Evolutionary, Physical and Geometric Constraints
 - An Unsupervised Approach for Periodic Source Detection in Time Series
 - Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
 - AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
 - A Persuasive Approach to Combating Misinformation
 - Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu's formula
 - Approximate Nearest Neighbor Search with Window Filters
 - A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs
 - A Probabilistic Approach to Learning the Degree of Equivariance in Steerable CNNs
 - A Provable Decision Rule for Out-of-Distribution Detection
 - A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
 - APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
 - AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
 - A Rate-Distortion View of Uncertainty Quantification
 - ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
 - A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models
 - Arrows of Time for Large Language Models
 - ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations
 - A sampling theory perspective on activations for implicit neural representations
 - A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models
 - A Single-Loop Robust Policy Gradient Method for Robust Markov Decision Processes
 - A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?
 - A Space Group Symmetry Informed Network for O(3) Equivariant Crystal Tensor Prediction
 - A Sparsity Principle for Partially Observable Causal Representation Learning
 - Assessing Large Language Models on Climate Information
 - Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
 - A Statistical Framework for Data-dependent Retrieval-Augmented Models
 - A Statistical Theory of Regularization-Based Continual Learning
 - AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
 - A Study of First-Order Methods with a Deterministic Relative-Error Gradient Oracle
 - A Subquadratic Time Algorithm for Robust Sparse Mean Estimation
 - Asymmetry in Low-Rank Adapters of Foundation Models
 - Asymptotically Optimal and Computationally Efficient Average Treatment Effect Estimation in A/B testing
 - Asymptotics of feature learning in two-layer networks after one gradient-step
 - Asymptotics of Learning with Deep Structured (Random) Features
 - A Tale of Tails: Model Collapse as a Change of Scaling Laws
 - A Tensor Decomposition Perspective on Second-order RNNs
 - A Theoretical Analysis of Backdoor Poisoning Attacks in Convolutional Neural Networks
 - A Theory of Fault-Tolerant Learning
 - A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks
 - A Touch, Vision, and Language Dataset for Multimodal Alignment
 - ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories
 - Attack-free Evaluating and Enhancing Adversarial Robustness on Categorical Data
 - Attention Meets Post-hoc Interpretability: A Mathematical Perspective
 - AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
 - AttNS: Attention-Inspired Numerical Solving For Limited Data Scenarios
 - Attribute Based Interpretable Evaluation Metrics for Generative Models
 - Attribution-based Explanations that Provide Recourse Cannot be Robust
 - Auctionformer: A Unified Deep Learning Algorithm for Solving Equilibrium Strategies in Auction Games
 - Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
 - Auditing Private Prediction
 - Augmenting Decision with Hypothesis in Reinforcement Learning
 - A Unified Adaptive Testing System Enabled by Hierarchical Structure Search
 - A Unified Framework for Learning with Nonlinear Model Classes from Arbitrary Linear Samples
 - A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback
 - A Unified Recipe for Deriving (Time-Uniform) PAC-Bayes Bounds
 - A Unified View of FANOVA: A Comprehensive Bayesian Framework for Component Selection and Estimation
 - A Universal Class of Sharpness-Aware Minimization Algorithms
 - A Universal Transfer Theorem for Convex Optimization Algorithms Using Inexact First-order Oracles
 - Autaptic Synaptic Circuit Enhances Spatio-temporal Predictive Learning of Spiking Neural Networks
 - Autoencoding Conditional Neural Processes for Representation Learning
 - Auto-Encoding Morph-Tokens for Multimodal LLM
 - Autoformalizing Euclidean Geometry
 - Auto-Linear Phenomenon in Subsurface Imaging
 - Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation
 - Automated Loss function Search for Class-imbalanced Node Classification
 - Automated Reinforcement Learning: Exploring Meta-Learning, AutoML, and LLMs
 - Automated Statistical Model Discovery with Language Models
 - Automating the Selection of Proxy Variables of Unmeasured Confounders
 - Autonomous Sparse Mean-CVaR Portfolio Optimization
 - AutoOS: Make Your OS More Powerful by Exploiting Large Language Models
 - Auto-Regressive Next-Token Predictors are Universal Learners
 - Averaging $n$-step Returns Reduces Variance in Reinforcement Learning
 - BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks
 - BAGEL: Bootstrapping Agents by Guiding Exploration with Language
 - Bagged Deep Image Prior for Recovering Images in the Presence of Speckle Noise
 - Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance
 - Balanced Resonate-and-Fire Neurons
 - Balancing Feature Similarity and Label Variability for Optimal Size-Aware One-shot Subset Selection
 - Balancing Similarity and Complementarity for Federated Learning
 - Barrier Algorithms for Constrained Non-Convex Optimization
 - Batch and match: black-box variational inference with a score-based divergence
 - Batch Singular Value Polarization and Weighted Semantic Augmentation for Universal Domain Adaptation
 - BAT: Learning to Reason about Spatial Sounds with Large Language Models
 - Bayesian Adaptation of Network Depth and Width for Continual Learning
 - Bayesian Design Principles for Offline-to-Online Reinforcement Learning
 - Bayesian Exploration Networks
 - Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification
 - Bayesian Optimization of Function Networks with Partial Evaluations
 - Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models
 - Bayesian Program Learning by Decompiling Amortized Knowledge
 - Bayesian Regret Minimization in Offline Bandits
 - Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning
 - BayOTIDE: Bayesian Online Multivariate Time Series Imputation with Functional Decomposition
 - BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models
 - BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation
 - Behavior Generation with Latent Actions
 - BeigeMaps: Behavioral Eigenmaps for Reinforcement Learning from Images
 - Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT
 - Benchmarking Deletion Metrics with the Principled Explanations
 - Benign Overfitting in Adversarial Training of Neural Networks
 - Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data
 - Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models
 - Best Arm Identification for Stochastic Rising Bandits
 - Best of Both Worlds Guarantees for Smoothed Online Quadratic Optimization
 - Better & Faster Large Language Models via Multi-token Prediction
 - Better Locally Private Sparse Estimation Given Multiple Samples Per User
 - Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks
 - BetterV: Controlled Verilog Generation with Discriminative Guidance
 - Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
 - Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling
 - Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
 - Beyond Individual Input for Deep Anomaly Detection on Tabular Data
 - Beyond Point Prediction: Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process
 - Beyond Regular Grids: Fourier-Based Neural Operators on Arbitrary Domains
 - Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
 - Beyond the Calibration Point: Mechanism Comparison in Differential Privacy
 - Beyond the Federation: Topology-aware Federated Learning for Generalization to Unseen Clients
 - Beyond the Norms: Detecting Prediction Errors in Regression Models
 - Beyond the ROC Curve: Classification Trees Using Cost-Optimal Curves, with Application to Imbalanced Datasets
 - Be Your Own Neighborhood: Detecting Adversarial Examples by the Neighborhood Relations Built on Self-Supervised Learning
 - Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks
 - Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation
 - BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization
 - Bifurcated Attention for Single-Context Large-Batch Sampling
 - Biharmonic Distance of Graphs and its Higher-Order Variants: Theoretical Properties with Applications to Centrality and Clustering
 - BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
 - Binary Decomposition: A Problem Transformation Perspective for Open-Set Semi-Supervised Learning
 - Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains
 - Bipartite Matching in Massive Graphs: A Tight Analysis of EDCS
 - BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model
 - Bivariate Causal Discovery using Bayesian Model Selection
 - Block Acceleration Without Momentum: On Optimal Stepsizes of Block Gradient Descent for Least-Squares
 - BLO-SAM: Bi-level Optimization Based Finetuning of the Segment Anything Model for Overfitting-Preventing Semantic Segmentation
 - Boosting Offline Optimizers with Surrogate Sensitivity
 - Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays
 - Bootstrap AutoEncoders With Contrastive Paradigm for Self-supervised Gaze Estimation
 - Bootstrapping Fisher Market Equilibrium and First-Price Pacing Equilibrium
 - Borda Regret Minimization for Generalized Linear Dueling Bandits
 - BOtied: Multi-objective Bayesian optimization with tied multivariate ranks
 - Bottleneck-Minimal Indexing for Generative Document Retrieval
 - Boundary Exploration for Bayesian Optimization With Unknown Physical Constraints
 - Bounded and Uniform Energy-based Out-of-distribution Detection for Graphs
 - Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data
 - Box Facets and Cut Facets of Lifted Multicut Polytopes
 - Boximator: Generating Rich and Controllable Motions for Video Synthesis
 - BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback
 - Breadth-First Exploration on Adaptive Grid for Reinforcement Learning
 - Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents
 - Breaking through the learning plateaus of in-context learning in Transformer
 - Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
 - Bridging Data Gaps in Diffusion Models with Adversarial Noise-Based Transfer Learning
 - Bridging discrete and continuous state spaces: Exploring the Ehrenfest process in time-continuous diffusion models
 - Bridging Environments and Language with Rendering Functions and Vision-Language Models
 - Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses
 - Bridging Model Heterogeneity in Federated Learning via Uncertainty-based Asymmetrical Reciprocity Learning
 - Bringing Motion Taxonomies to Continuous Domains via GPLVM on Hyperbolic manifolds
 - Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel
 - Building Socially-Equitable Public Models
 - BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges
 - ByMI: Byzantine Machine Identification with False Discovery Rate Control
 - By Tying Embeddings You Are Assuming the Distributional Hypothesis
 - Byzantine Resilient and Fast Federated Few-Shot Learning
 - Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates
 - Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
 - Calibration Bottleneck: Over-compressed Representations are Less Calibratable
 - CaM: Cache Merging for Memory-efficient LLMs Inference
 - Can a Few Decide for Many? The Metric Distortion of Sortition
 - Can AI Assistants Know What They Don't Know?
 - Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
 - Can Gaussian Sketching Converge Faster on a Preconditioned Landscape?
 - Can Implicit Bias Imply Adversarial Robustness?
 - Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
 - Can Machines Learn the True Probabilities?
 - Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks
 - Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
 - CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources
 - CarbonNovo: Joint Design of Protein Structure and Sequence Using a Unified Energy-based Model
 - Careful with that Scalpel: Improving Gradient Surgery with an EMA
 - CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process
 - CARTE: Pretraining and Transfer for Tabular Learning
 - Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
 - CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling
 - Case-Based or Rule-Based: How Do Transformers Do the Math?
 - Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
 - Category-Aware Active Domain Adaptation
 - CATS: Enhancing Multivariate Time Series Forecasting by Constructing Auxiliary Time Series as Exogenous Variables
 - CauDiTS: Causal Disentangled Domain Adaptation of Multivariate Time Series
 - Causal Action Influence Aware Counterfactual Data Augmentation
 - Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals
 - Causal Customer Churn Analysis with Low-rank Tensor Block Hazard Model
 - Causal Discovery via Conditional Independence Testing with Proxy Variables
 - Causal Discovery with Fewer Conditional Independence Tests
 - Causal Effect Identification in LiNGAM Models with Latent Confounders
 - Causal Inference from Competing Treatments
 - Causal Inference out of Control: Estimating Performativity without Treatment Randomization
 - Causal-IQA: Towards the Generalization of Image Quality Assessment Based on Causal Inference
 - Causality Based Front-door Defense Against Backdoor Attack on Language Models
 - Causally Motivated Personalized Federated Invariant Learning with Shortcut-Averse Information-Theoretic Regularization
 - Causal Representation Learning from Multiple Distributions: A General Setting
 - Causal Representation Learning Made Identifiable by Grouping of Observational Variables
 - CCM: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models
 - Cell2Sentence: Teaching Large Language Models the Language of Biology
 - Centralized Selection with Preferences in the Presence of Biases
 - Certifiably Byzantine-Robust Federated Conformal Prediction
 - CF-OPT: Counterfactual Explanations for Structured Prediction
 - CHAI: Clustered Head Attention for Efficient LLM Inference
 - Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
 - Chain-of-Thought Predictive Control
 - Challenges and Considerations in the Evaluation of Bayesian Causal Discovery
 - Challenges in Language Model Evaluations
 - Challenges in Training PINNs: A Loss Landscape Perspective
 - Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale
 - Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation
 - Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum
 - Characterizing ResNet's Universal Approximation Capability
 - Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension
 - Chasing Convex Functions with Long-term Constraints
 - Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
 - CHEMREASONER: Heuristic Search over a Large Language Model’s Knowledge Space using Quantum-Chemical Feedback
 - CKGConv: General Graph Convolution with Continuous Kernels
 - Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference
 - Classification Under Strategic Self-Selection
 - Class-Imbalanced Graph Learning without Class Rebalancing
 - CLIF: Complementary Leaky Integrate-and-Fire Neuron for Spiking Neural Networks
 - Clifford-Steerable Convolutional Neural Networks
 - CLIPZyme: Reaction-Conditioned Virtual Screening of Enzymes
 - CLLMs: Consistency Large Language Models
 - Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization
 - Cluster-Aware Similarity Diffusion for Instance Retrieval
 - Clustered Federated Learning via Gradient-based Partitioning
 - Coactive Learning for Large Language Models using Implicit User Feedback
 - COALA: A Practical and Vision-Centric Federated Learning Platform
 - Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
 - Coarse-To-Fine Tensor Trains for Compact Visual Representations
 - Code as Reward: Empowering Reinforcement Learning with VLMs
 - Codebook Features: Sparse and Discrete Interpretability for Neural Networks
 - CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
 - CogBench: a large language model walks into a psychology lab
 - CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding
 - COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
 - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis
 - Collaborative Learning with Different Labeling Functions
 - Collage: Light-Weight Low-Precision Strategy for LLM Training
 - Collapse-Aware Triplet Decoupling for Adversarially Robust Image Retrieval
 - Collective Certified Robustness against Graph Injection Attacks
 - CoLoRA: Continuous low-rank adaptation for reduced implicit neural modeling of parameterized partial differential equations
 - Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better
 - Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
 - Combining Experimental and Historical Data for Policy Evaluation
 - Community-Invariant Graph Contrastive Learning
 - Compact Optimality Verification for Optimization Proxies
 - Comparing Graph Transformers via Positional Encodings
 - CompeteAI: Understanding the Competition Dynamics of Large Language Model-based Agents
 - Completing Visual Objects via Bridging Generation and Segmentation
 - Complexity Matters: Feature Learning in the Presence of Spurious Correlations
 - Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
 - Compositional Curvature Bounds for Deep Neural Networks
 - Compositional Few-Shot Class-Incremental Learning
 - Compositional Image Decomposition with Diffusion Models
 - Compositional Text-to-Image Generation with Dense Blob Representations
 - Compress Clean Signal from Noisy Raw Image: A Self-Supervised Approach
 - Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation
 - Compressing Large Language Models by Joint Sparsification and Quantization
 - Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth
 - Compute Better Spent: Replacing Dense Layers with Structured Matrices
 - Concentration Inequalities for General Functions of Heavy-Tailed Random Variables
 - Conditional Common Entropy for Instrumental Variable Testing and Partial Identification
 - Conditional Language Learning with Context
 - Conditionally-Conjugate Gaussian Process Factor Analysis for Spike Count Data via Data Augmentation
 - Conditional Normalizing Flows for Active Learning of Coarse-Grained Molecular Representations
 - Confidence-aware Contrastive Learning for Selective Classification
 - Confidence Aware Inverse Constrained Reinforcement Learning
 - Configurable Mirror Descent: Towards a Unification of Decision Making
 - Conformalized Adaptive Forecasting of Heterogeneous Trajectories
 - Conformalized Survival Distributions: A Generic Post-Process to Increase Calibration
 - Conformal Prediction for Deep Classifier via Label Ranking
 - Conformal prediction for multi-dimensional time series by ellipsoidal sets
 - Conformal Prediction Sets Improve Human Decision Making
 - Conformal Predictions under Markovian Data
 - Conformal Prediction with Learned Features
 - Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them)
 - Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases
 - Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models
 - Connecting the Dots: Is Mode-Connectedness the Key to Feasible Sample-Based Inference in Bayesian Neural Networks?
 - Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations
 - Consistent Adversarially Robust Linear Classification: Non-Parametric Setting
 - Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data
 - Consistent Long-Term Forecasting of Ergodic Dynamical Systems
 - Consistent Submodular Maximization
 - Constrained Ensemble Exploration for Unsupervised Skill Discovery
 - Constrained Exploration via Reflected Replica Exchange Stochastic Gradient Langevin Dynamics
 - Constrained Reinforcement Learning Under Model Mismatch
 - Contamination-Resilient Anomaly Detection via Adversarial Learning on Partially-Observed Normal and Anomalous Data
 - Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design
 - ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
 - Contextual Feature Selection with Conditional Stochastic Gates
 - Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning
 - Continuous Treatment Effects with Surrogate Outcomes
 - ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
 - Contrasting Multiple Representations with the Multi-Marginal Matching Gap
 - Contrastive Learning for Clinical Outcome Prediction with Partial Data Sources
 - Contrastive Predict-and-Search for Mixed Integer Linear Programs
 - Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
 - Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning
 - Controllable Prompt Tuning For Balancing Group Distributional Robustness
 - Controlled Decoding from Language Models
 - Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning
 - Convergence and Complexity Guarantee for Inexact First-order Riemannian Optimization Algorithms
 - Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point
 - Convergence Guarantees for the DeepWalk Embedding on Block Models
 - Convergence of Online Learning Algorithm for a Mixture of Multiple Linear Regressions
 - Convergence of Some Convex Message Passing Algorithms to a Fixed Point
 - Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption
 - Convex Analysis at Infinity: An Introduction to Astral Space
 - Convex and Bilevel Optimization for Neural-Symbolic Inference and Learning
 - Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time
 - ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy
 - convSeq: Fast and Scalable Method for Detecting Patterns in Spike Data
 - Cooperative Graph Neural Networks
 - COPAL: Continual Pruning in Large Language Generative Models
 - Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation
 - Copula-Nested Spectral Kernel Network
 - Copyright Traps for Large Language Models
 - Coresets for Multiple $\ell_p$ Regression
 - Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder
 - Correlation-Induced Label Prior for Semi-Supervised Multi-Label Learning
 - CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks
 - Counterfactual Image Editing
 - Counterfactual Metarules for Local and Global Recourse
 - Counterfactual Reasoning for Multi-Label Image Classification via Patching-Based Training
 - Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
 - Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning
 - C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
 - Creative Text-to-Audio Generation via Synthesizer Programming
 - Criterion Collapse and Loss Distribution Control
 - Critical feature learning in deep neural networks
 - Critical windows: non-asymptotic theory for feature emergence in diffusion models
 - CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection
 - Cross-domain Open-world Discovery
 - Cross-Domain Policy Adaptation by Capturing Representation Mismatch
 - CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
 - Cross-view Masked Diffusion Transformers for Person Image Synthesis
 - CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
 - Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes
 - CurBench: Curriculum Learning Benchmark
 - CuTS: Customizable Tabular Synthetic Data Generation
 - CW Complex Hypothesis for Image Data
 - DAG-Based Column Generation for Adversarial Team Games
 - Data Attribution at Scale
 - Data-centric Machine Learning Research (DMLR): Datasets for Foundation Models
 - Data-efficient Large Vision Models through Sequential Autoregression
 - Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond
 - Data-Efficient Molecular Generation with Hierarchical Textual Inversion
 - Data Engineering for Scaling Language Models to 128K Context
 - Data-free Distillation of Diffusion Models with Bootstrapping
 - Data-free Neural Representation Compression with Riemannian Neural Dynamics
 - DataFreeShield: Defending Adversarial Attacks without Training Data
 - Data Poisoning Attacks against Conformal Prediction
 - Dealing With Unbounded Gradients in Stochastic Saddle-point Optimization
 - Debating with More Persuasive LLMs Leads to More Truthful Answers
 - Debiased Distribution Compression
 - Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics
 - Decentralized Convex Finite-Sum Optimization with Better Dependence on Condition Numbers
 - Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective
 - DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
 - Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
 - Decoding-time Realignment of Language Models
 - Decomposable Submodular Maximization in Federated Setting
 - Decomposed Linear Dynamical Systems (dLDS) for learning the latent components of neural dynamics
 - Decomposing and Editing Predictions by Modeling Model Computation
 - Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling
 - Deconstructing the Goldilocks Zone of Neural Network Initialization
 - DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection
 - DE-COP: Detecting Copyrighted Content in Language Models Training Data
 - Decouple then Classify: A Dynamic Multi-view Labeling Strategy with Shared and Specific Information
 - Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
 - Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods
 - Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from a Single Demonstration
 - Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures
 - Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss
 - Deep Functional Factor Models: Forecasting High-Dimensional Functional Time Series via Bayesian Nonparametric Factorization
 - Deep Fusion: Efficient Network Training via Pre-trained Initializations
 - Deep Networks Always Grok and Here is Why
 - Deep Neural Room Acoustics Primitive
 - DeepPolar: Inventing Nonlinear Large-Kernel Polar Codes via Deep Learning
 - Deep Regression Representation Learning with Topology
 - Deep Stochastic Mechanics
 - Defense against Backdoor Attack on Pre-trained Language Models via Head Pruning and Attention Normalization
 - Defense against Model Extraction Attack by Bayesian Active Watermarking
 - Defining Neural Network Architecture through Polytope Structures of Datasets
 - Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration
 - DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving
 - Delaunay Graph: Addressing Over-Squashing and Over-Smoothing Using Delaunay Triangulation
 - Deletion-Anticipative Data Selection with a Limited Budget
 - Delving into Differentially Private Transformer
 - Delving into the Convergence of Generalized Smooth Minimax Optimization
 - Demystifying SGD with Doubly Stochastic Gradients
 - Denoising Autoregressive Representation Learning
 - Dense Reward for Free in Reinforcement Learning from Human Feedback
 - Density Ratio Estimation with Doubly Strong Robustness
 - Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts
 - Designing Decision Support Systems using Counterfactual Prediction Sets
 - Detecting and Identifying Selection Structure in Sequential Data
 - Detecting Any instruction-to-answer interaction relationship: Universal Instruction-to-Answer Navigator for Med-VQA
 - Detecting Influence Structures in Multi-Agent Reinforcement Learning
 - DetKDS: Knowledge Distillation Search for Object Detectors
 - DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton
 - DFD: Distilling the Feature Disparity Differently for Detectors
 - DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform Generation
 - D-Flow: Differentiating through Flows for Controlled Generation
 - Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
 - DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation
 - DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation
 - DiffDA: a Diffusion model for weather-scale Data Assimilation
 - Differentiability and Optimization of Multiparameter Persistent Homology
 - Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators
 - Differentiable Annealed Importance Sampling Minimizes The Jensen-Shannon Divergence Between Initial and Target Distribution
 - Differentiable Combinatorial Scheduling at Scale
 - Differentiable Distributionally Robust Optimization Layers
 - Differentiable Mapper for Topological Optimization of Data Representation
 - Differentiable Model Scaling using Differentiable Topk
 - Differentiable Weightless Neural Networks
 - Differentially Private Bias-Term Fine-tuning of Foundation Models
 - Differentially Private Decentralized Learning with Random Walks
 - Differentially Private Domain Adaptation with Theoretical Guarantees
 - Differentially private exact recovery for stochastic block models
 - Differentially Private Post-Processing for Fair Regression
 - Differentially Private Representation Learning via Image Captioning
 - Differentially Private Sum-Product Networks
 - Differentially Private Synthetic Data via Foundation Model APIs 2: Text
 - Differentially Private Worst-group Risk Minimization
 - DiffFPR: Diffusion Prior for Oversampled Fourier Phase Retrieval
 - diff History for Neural Language Agents
 - DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
 - Diffuse, Sample, Project: Plug-And-Play Controllable Graph Generation
 - Diffusion-based Missing-view Generation With the Application on Incomplete Multi-view Clustering
 - Diffusion Language Models Are Versatile Protein Learners
 - Diffusion Model-Augmented Behavioral Cloning
 - Diffusion Models Demand Contrastive Guidance for Adversarial Purification to Advance
 - Diffusion Models Encode the Intrinsic Dimension of Data Manifolds
 - Diffusion Posterior Sampling is Computationally Intractable
 - Diffusion Rejection Sampling
 - Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations
 - Diffusive Gibbs Sampling
 - DiJiang: Efficient Large Language Models through Compact Kernelization
 - DiNADO: Norm-Disentangled Neurally-Decomposed Oracles for Controlling Language Models
 - DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency
 - Directly Denoising Diffusion Models
 - Dirichlet Flow Matching with Applications to DNA Sequence Design
 - DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents
 - Discounted Adaptive Online Learning: Towards Better Regularization
 - Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
 - Discovering Environments with XRM
 - Discovering Features with Synergistic Interactions in Multiple Views
 - Discovering Mixtures of Structural Causal Models from Time Series Data
 - Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning
 - Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution
 - Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
 - Discrete Latent Perspective Learning for Segmentation and Detection
 - DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation
 - Disentangled 3D Scene Generation with Layout Learning
 - Disentangled Continual Graph Neural Architecture Search with Invariant Modular Supernet
 - Disentangled Graph Self-supervised Learning for Out-of-Distribution Generalization
 - Disentanglement Learning via Topology
 - Disguised Copyright Infringement of Latent Diffusion Models
 - Disparate Impact on Group Accuracy of Linearization for Private Inference
 - Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
 - Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control
 - DistiLLM: Towards Streamlined Distillation for Large Language Models
 - Distinguishing the Knowable from the Unknowable with Language Models
 - Distributed Bilevel Optimization with Communication Compression
 - Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery
 - Distributional Bellman Operators over Mean Embeddings
 - Distribution Alignment Optimization through Neural Collapse for Long-tailed Classification
 - Distributionally Robust Data Valuation
 - Distribution-Free Predictive Uncertainty Quantification: Strengths and Limits of Conformal Prediction
 - DITTO: Diffusion Inference-Time T-Optimization for Music Generation
 - Ditto: Quantization-aware Secure Inference of Transformers upon MPC
 - Diversified Batch Selection for Training Acceleration
 - Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset
 - DMTG: One-Shot Differentiable Multi-Task Grouping
 - DNA-SE: Towards Deep Neural-Nets Assisted Semiparametric Estimation
 - DNCs Require More Planning Steps
 - Do Efficient Transformers Really Save Computation?
 - Does Label Smoothing Help Deep Partial Label Learning?
 - DOGE: Domain Reweighting with Generalization Estimation
 - Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
 - Do Large Code Models Understand Programming Concepts? Counterfactual Analysis for Code Predicates
 - Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function
 - Domain Generalisation via Imprecise Learning
 - Domain-wise Data Acquisition to Improve Performance under Distribution Shift
 - Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
 - Don't be so Negative! Score-based Generative Modeling with Oracle-assisted Guidance
 - Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
 - Don't trust your eyes: on the (un)reliability of feature visualizations
 - DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
 - DoRA: Weight-Decomposed Low-Rank Adaptation
 - Do Topological Characteristics Help in Knowledge Distillation?
 - Do Transformer World Models Give Better Policy Gradients?
 - Double Momentum Method for Lower-Level Constrained Bilevel Optimization
 - Double-Step Alternating Extragradient with Increasing Timescale Separation for Finding Local Minimax Points: Provable Improvements
 - Double Stochasticity Gazes Faster: Snap-Shot Decentralized Stochastic Gradient Tracking Methods
 - Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient
 - Doubly Robust Causal Effect Estimation under Networked Interference via Targeted Learning
 - DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems
 - DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training
 - DPZero: Private Fine-Tuning of Language Models without Backpropagation
 - DRCT: Diffusion Reconstruction Contrastive Training towards Universal Detection of Diffusion Generated Images
 - DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design
 - Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming
 - Drug Discovery with Dynamic Goal-aware Fragments
 - DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning
 - DSD-DA: Distillation-based Source Debiasing for Domain Adaptive Object Detection
 - DsDm: Model-Aware Dataset Selection with Datamodels
 - Dual Operating Modes of In-Context Learning
 - DUPLEX: Dual GAT for Complex Embedding of Directed Graphs
 - Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization
 - Dynamic Byzantine-Robust Learning: Adapting to Switching Byzantine Workers
 - Dynamic Correlation Clustering in Sublinear Update Time
 - Dynamic Evaluation of Large Language Models by Meta Probing Agents
 - Dynamic Facility Location in High Dimensional Euclidean Spaces
 - Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
 - Dynamic Metric Embedding into $\ell_p$ Space
 - Dynamic Spectral Clustering with Provable Approximation Guarantee
 - Dynamic Survival Analysis with Controlled Latent States
 - DynSyn: Dynamical Synergistic Representation for Efficient Learning and Control in Overactuated Embodied Systems
 - DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems
 - E$^2$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
 - EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
 - Early Time Classification with Accumulated Accuracy Gap Control
 - Easing Concept Bleeding in Diffusion via Entity Localization and Anchoring
 - eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data
 - ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance
 - EDISON: Enhanced Dictionary-Induced Tensorized Incomplete Multi-View Clustering with Gaussian Error Rank Minimization
 - Editing Partially Observable Networks via Graph Diffusion Models
 - EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
 - Effect-Invariant Mechanisms for Policy Generalization
 - Effective Federated Graph Matching
 - Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing
 - Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning
 - Efficient Algorithms for Empirical Group Distributionally Robust Optimization and Beyond
 - Efficient Algorithms for Sum-Of-Minimum Optimization
 - Efficient and Effective Time-Series Forecasting with Spiking Neural Networks
 - Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior
 - Efficient Contextual Bandits with Uninformed Feedback Graphs
 - Efficient Contrastive Learning for Fast and Accurate Inference on Graphs
 - Efficient Denoising Diffusion via Probabilistic Masking
 - Efficient Error Certification for Physics-Informed Neural Networks
 - Efficient Exploration for LLMs
 - Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling
 - Efficient Low-Rank Matrix Estimation, Experimental Design, and Arm-Set-Dependent Low-Rank Bandits
 - Efficient Mixture Learning in Black-Box Variational Inference
 - Efficient Non-stationary Online Learning by Wavelets with Applications to Online Distribution Shift Adaptation
 - Efficient Online Set-valued Classification with Bandit Feedback
 - Efficient PAC Learnability of Dynamical Systems Over Multilayer Networks
 - Efficient Pareto Manifold Learning with Low-Rank Structure
 - Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design
 - Efficient Precision and Recall Metrics for Assessing Generative Models using Hubness-aware Sampling
 - Efficient Stochastic Approximation of Minimax Excess Risk Optimization
 - Efficient Value Iteration for s-rectangular Robust Markov Decision Processes
 - Efficient World Models with Context-Aware Tokenization
 - EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
 - EiG-Search: Generating Edge-Induced Subgraphs for GNN Explanation in Linear Time
 - ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
 - ELTA: An Enhancer against Long-Tail for Aesthetics-oriented Models
 - Eluder-based Regret for Stochastic Contextual MDPs
 - Embarrassingly Parallel GFlowNets
 - Embodied CoT Distillation From LLM To Off-the-shelf Agents
 - EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
 - Emergence of In-Context Reinforcement Learning from Noise Distillation
 - Emergent Equivariance in Deep Ensembles
 - Emergent Representations of Program Semantics in Language Models Trained on Programs
 - Empowering Graph Invariance Learning with Deep Spurious Infomax
 - Enabling Few-Shot Learning with PID Control: A Layer Adaptive Optimizer
 - Enabling Uncertainty Estimation in Iterative Neural Networks
 - Encodings for Prediction-based Neural Architecture Search
 - End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations
 - Energy-based Backdoor Defense without Task-Specific Samples and Model Retraining
 - Energy-Efficient Gaussian Processes Using Low-Precision Arithmetic
 - Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning
 - Enforcing Constraints in RNA Secondary Structure Predictions: A Post-Processing Framework Based on the Assignment Problem
 - Enhancing Adversarial Robustness in SNNs with Sparse Gradients
 - Enhancing Class-Imbalanced Learning with Pre-Trained Guidance through Class-Conditional Knowledge Distillation
 - Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation
 - Enhancing Implicit Shape Generators Using Topological Regularizations
 - Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning
 - Enhancing Storage and Computational Efficiency in Federated Multimodal Learning for Large-Scale Models
 - Enhancing Sufficient Dimension Reduction via Hellinger Correlation
 - Enhancing Trajectory Prediction through Self-Supervised Waypoint Distortion Prediction
 - Enhancing Value Function Estimation through First-Order State-Action Dynamics in Offline Reinforcement Learning
 - Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module
 - Ensemble Pruning for Out-of-distribution Generalization
 - Entropy-Reinforced Planning with Large Language Models for Drug Discovery
 - Environment Design for Inverse Reinforcement Learning
 - Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection
 - EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
 - Equilibrium of Data Markets with Externality
 - EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction
 - Equivariance via Minimal Frame Averaging for More Symmetries and Efficiency
 - Equivariant Deep Weight Space Alignment
 - Equivariant Diffusion for Crystal Structure Prediction
 - Equivariant Frames and the Impossibility of Continuous Canonicalization
 - Equivariant Graph Neural Operator for Modeling 3D Dynamics
 - Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning
 - ERQ: Error Reduction for Post-Training Quantization of Vision Transformers
 - Error Feedback Can Accurately Compress Preconditioners
 - ES-FoMo II: 2nd Workshop on Efficient Systems for Foundation Models
 - ESM All-Atom: Multi-Scale Protein Language Model for Unified Molecular Modeling
 - ESNet: Evolution and Succession Network for High-Resolution Salient Object Detection
 - Estimating Barycenters of Distributions with Neural Optimal Transport
 - Estimating Canopy Height at Scale
 - Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction
 - Estimating the Permanent by Nesting Importance Sampling
 - Estimating Unknown Population Sizes Using the Hypergeometric Distribution
 - ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections
 - Et Tu Certifications: Robustness Certificates Yield Better Adversarial Examples
 - Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
 - Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
 - Evaluating Instrument Validity using the Principle of Independent Mechanisms
 - Evaluating Model Bias Requires Characterizing its Mistakes
 - Evaluating Quantized Large Language Models
 - Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks
 - Evaluation of Test-Time Adaptation Under Computational Time Constraints
 - Evaluation of Trajectory Distribution Predictions with Energy Score
 - EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
 - EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting
 - EvIL: Evolution Strategies for Generalisable Imitation Learning
 - EvoluNet: Advancing Dynamic Non-IID Transfer Learning on Graphs
 - Evolution-Inspired Loss Functions for Protein Representation Learning
 - Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model
 - Evolving Subnetwork Training for Large Language Models
 - EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search
 - EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
 - Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
 - Exact Soft Analytical Side-Channel Attacks using Tractable Circuits
 - ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking
 - Executable Code Actions Elicit Better LLM Agents
 - Expand-and-Cluster: Parameter Recovery of Neural Networks
 - Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning
 - Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs
 - Explaining Graph Neural Networks via Structure-aware Interaction Index
 - Explaining Probabilistic Models with Distributional Values
 - Explain Temporal Black-Box Models via Functional Decomposition
 - Exploiting Code Symmetries for Learning Program Semantics
 - Exploiting Human-AI Dependence for Learning to Defer
 - Exploiting Negative Samples: A Catalyst for Cohort Discovery in Healthcare Analytics
 - Exploration and Anti-Exploration with Distributional Random Network Distillation
 - Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring
 - Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
 - Explorations of Self-Repair in Language Models
 - Exploring Correlations of Self-Supervised Tasks for Graphs
 - Exploring Intrinsic Dimension for Vision-Language Model Pruning
 - Exploring the Benefit of Activation Sparsity in Pre-training
 - Exploring the Complexity of Deep Neural Networks through Functional Equivalence
 - Exploring the Enigma of Neural Dynamics Through A Scattering-Transform Mixer Landscape for Riemannian Manifold
 - Exploring the LLM Journey from Cognition to Expression with Linear Representations
 - Exploring the Low-Pass Filtering Behavior in Image Super-Resolution
 - Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters
 - Exponential Spectral Pursuit: An Effective Initialization Method for Sparse Phase Retrieval
 - Expressivity and Generalization: Fragment-Biases for Molecular GNNs
 - Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions
 - Extending Test-Time Augmentation with Metamorphic Relations for Combinatorial Problems
 - Extracting Training Data From Document-Based VQA Models
 - Extreme Compression of Large Language Models via Additive Quantization
 - Factored-Reward Bandits with Intermediate Observations
 - FADAS: Towards Federated Adaptive Asynchronous Optimization
 - FAFE: Immune Complex Modeling with Geodesic Distance Loss on Noisy Group Frames
 - Failures Are Fated, But Can Be Faded: Characterizing and Mitigating Unwanted Behaviors in Large-Scale Vision and Language Models
 - Fair Classification with Partial Feedback: An Exploration-Based Data Collection Approach
 - Fair Data Representation for Machine Learning at the Pareto Frontier
 - Fair Federated Learning via the Proportional Veto Core
 - Fair Off-Policy Learning from Observational Data
 - FairProof: Confidential and Certifiable Fairness for Neural Networks
 - Fair Resource Allocation in Multi-Task Learning
 - Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks
 - Faithfulness Measurable Masked Language Models
 - Fast Adversarial Attacks on Language Models In One GPU Minute
 - Fast Algorithms for Hypergraph PageRank with Applications to Semi-Supervised Learning
 - Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits
 - Fast Co-Training under Weak Dependence via Stream-Based Active Learning
 - Fast Decision Boundary based Out-of-Distribution Detector
 - Faster Adaptive Decentralized Learning Algorithms
 - Faster Maximum Inner Product Search in High Dimensions
 - Faster Sampling via Stochastic Gradient Proximal Sampler
 - Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs
 - Fast Peer Adaptation with Context-aware Exploration
 - Fast Sampling-Based Sketches for Tensors
 - Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching
 - Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation
 - Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization
 - Fast Timing-Conditioned Latent Audio Diffusion
 - Fast White-Box Adversarial Streaming Without a Random Oracle
 - Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training
 - Feasibility Consistent Representation Learning for Safe Reinforcement Learning
 - Feasible Reachable Policy Iteration
 - Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation
 - Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize
 - Feature Distribution on Graph Topology Mediates the Effect of Graph Convolution: Homophily Perspective
 - Feature Importance Disparities for Data Bias Investigations
 - Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models
 - FedBAT: Communication-Efficient Federated Learning via Learnable Binarization
 - FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models
 - FedCal: Achieving Local and Global Calibration in Federated Learning via Aggregated Parameterized Scaler
 - Federated Combinatorial Multi-Agent Multi-Armed Bandits
 - Federated Continual Learning via Prompt-based Dual Knowledge Transfer
 - Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
 - Federated Neuro-Symbolic Learning
 - Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
 - Federated Optimization with Doubly Regularized Drift Correction
 - Federated Representation Learning in the Under-Parameterized Regime
 - Federated Self-Explaining GNNs with Anti-shortcut Augmentations
 - FedLMT: Tackling System Heterogeneity of Federated Learning via Low-Rank Model Training with Theoretical Guarantees
 - FedMBridge: Bridgeable Multimodal Federated Learning
 - FedRC: Tackling Diverse Distribution Shifts Challenge in Federated Learning by Robust Clustering
 - FedREDefense: Defending against Model Poisoning Attacks for Federated Learning using Model Update Reconstruction Error
 - FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data
 - Feedback Efficient Online Fine-Tuning of Diffusion Models
 - Feedback Loops With Language Models Drive In-Context Reward Hacking
 - Feel-Good Thompson Sampling for Contextual Dueling Bandits
 - FESSNC: Fast Exponentially Stable and Safe Neural Controller
 - Fewer Truncations Improve Language Modeling
 - Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings
 - Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind
 - Few-Shot Unsupervised Implicit Neural Shape Representation Learning with Spatial Adversaries
 - FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning
 - Finding NEM-U: Explaining unsupervised representation learning through neural network generated explanation masks
 - Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning
 - Fine-grained Classes and How to Find Them
 - Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention
 - Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem
 - Finite Smoothing Algorithm for High-Dimensional Support Vector Machines and Quantile Regression
 - Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning
 - Finite Time Logarithmic Regret Bounds for Self-Tuning Regulation
 - Finite Volume Features, Global Geometry Representations, and Residual Training for Deep Learning-based CFD Simulation
 - First-Order Manifold Data Augmentation for Regression Learning
 - FiT: Flexible Vision Transformer for Diffusion Model
 - FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic Prediction
 - Flexible Residual Binarization for Image Super-Resolution
 - Flextron: Many-in-One Flexible Large Language Model
 - Floating Anchor Diffusion Model for Multi-motif Scaffolding
 - Flora: Low-Rank Adapters Are Secretly Gradient Compressors
 - FlowMM: Generating Materials with Riemannian Flow Matching
 - Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations
 - Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics
 - Foundation Policies with Hilbert Representations
 - Foundations of Data-efficient Machine Learning
 - Foundations of Reinforcement Learning and Control: Connections and Perspectives
 - Foundations of Testing for Finite-Sample Causal Discovery
 - Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning
 - FRAG: Frequency Adapting Group for Diffusion Video Editing
 - FrameQuant: Flexible Low-Bit Quantization for Transformers
 - FRAPPÉ: A Group Fairness Framework for Post-Processing Everything
 - FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
 - From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions
 - From Classification Accuracy to Proper Scoring Rules: Elicitability of Probabilistic Top List Predictions
 - From Coarse to Fine: Enable Comprehensive Graph Self-supervised Learning with Multi-granular Semantic Ensemble
 - From Fourier to Neural ODEs: Flow Matching for Modeling Complex Systems
 - From Generalization Analysis to Optimization Designs for State Space Models
 - From Geometry to Causality- Ricci Curvature and the Reliability of Causal Inference on Networks
 - From Inverse Optimization to Feasibility to ERM
 - From Neurons to Neutrons: A Case Study in Interpretability
 - From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
 - From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
 - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems
 - From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning
 - Full-Atom Peptide Design based on Multi-modal Flow Matching
 - Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees
 - Fundamental Benefit of Alternating Updates in Minimax Optimization
 - Fundamental Limitations of Alignment in Large Language Models
 - Fundamental Limits of Distributed Covariance Matrix Estimation Under Communication Constraints
 - FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
 - GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
 - GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
 - Gambling-Based Confidence Sequences for Bounded Random Vectors
 - Gated Linear Attention Transformers with Hardware-Efficient Training
 - GATE: How to Keep Out Intrusive Neighbors
 - Gaussian Plane-Wave Neural Operator for Electron Density Estimation
 - GaussianPro: 3D Gaussian Splatting with Progressive Propagation
 - Gaussian Processes on Cellular Complexes
 - GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
 - GenCO: Generating Diverse Designs with Combinatorial Constraints
 - Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning
 - Generalization Analysis for Multi-Label Learning
 - Generalization Analysis of Deep Non-linear Matrix Completion
 - Generalization Analysis of Stochastic Weight Averaging with General Sampling
 - Generalization Bound and New Algorithm for Clean-Label Backdoor Attack
 - Generalization Bounds for Causal Regression: Insights, Guarantees and Sensitivity Analysis
 - Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation
 - Generalization Error of Graph Neural Networks in the Mean-field Regime
 - Generalization in Kernel Regression Under Realistic Assumptions
 - Generalization to New Sequential Decision Making Tasks with In-Context Learning
 - Generalized Neural Collapse for a Large Number of Classes
 - Generalized Preference Optimization: A Unified Approach to Offline Alignment
 - Generalized Smooth Variational Inequalities: Methods with Adaptive Stepsizes
 - Generalized Sobolev Transport for Probability Measures on a Graph
 - Generalizing Knowledge Graph Embedding with Universal Orthogonal Parameterization
 - Generalizing Orthogonalization for Models with Non-Linearities
 - Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought
 - Generating In-Distribution Proxy Graphs for Explaining Graph Neural Networks
 - Generative Active Learning for Long-tailed Instance Segmentation
 - Generative Conditional Distributions by Neural (Entropic) Optimal Transport
 - Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates
 - Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design
 - Generative Marginalization Models
 - Generative Modeling on Manifolds Through Mixture of Riemannian Diffusion Processes
 - Genie: Generative Interactive Environments
 - GeoAB: Towards Realistic Antibody Design and Reliable Affinity Maturation
 - Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction
 - Geometry-Aware Instrumental Variable Regression
 - Geometry-Calibrated DRO: Combating Over-Pessimism with Free Energy Implications
 - Geometry-grounded Representation Learning and Generative Modeling
 - GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
 - GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
 - Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
 - Getting the most out of your tokenizer for pre-training and domain adaptation
 - GFlowNet Training by Policy Gradients
 - Gibbs Sampling of Continuous Potentials on a Quantum Computer
 - GiLOT: Interpreting Generative Language Models via Optimal Transport
 - GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks
 - GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding
 - Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods
 - GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
 - GNNs Also Deserve Editing, and They Need It More Than Once
 - Going beyond Compositions, DDPMs Can Produce Zero-Shot Interpolations
 - Gondzo - Charting a Path for African Low-Resource Languages: A Multifaceted Approach to Research and Development
 - GPT-4V(ision) is a Generalist Web Agent, if Grounded
 - GPTSwarm: Language Agents as Optimizable Graphs
 - Gradient-based Visual Explanation for Transformer-based CLIP
 - Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization
 - Gradual Divergence for Seamless Adaptation: A Novel Domain Incremental Learning Method
 - Graph2Tac: Online Representation Learning of Formal Math Concepts
 - Graph Adversarial Diffusion Convolution
 - Graph As Point Set
 - Graph Attention Retrospective
 - Graph Automorphism Group Equivariant Neural Networks
 - Graph-based Forecasting with Missing Data through Spatiotemporal Downsampling
 - Graph-based Time Series Clustering for End-to-End Hierarchical Forecasting
 - Graph Distillation with Eigenbasis Matching
 - Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
 - Graph External Attention Enhanced Transformer
 - Graph Generation with Diffusion Mixture
 - Graph Geometry-Preserving Autoencoders
 - Graph Learning: Principles, Challenges, and Open Directions
 - Graph Mixup on Approximate Gromov–Wasserstein Geodesics
 - Graph Neural Network Explanations are Fragile
 - Graph Neural Networks Use Graphs When They Shouldn't
 - Graph Neural Networks with a Distribution of Parametrized Graphs
 - Graph Neural PDE Solvers with Conservation and Similarity-Equivariance
 - Graph Neural Stochastic Diffusion for Estimating Uncertainty in Node Classification
 - Graphon Mean Field Games with a Representative Player: Analysis and Learning Algorithm
 - Graph Out-of-Distribution Detection Goes Neighborhood Shaping
 - Graph Positional and Structural Encoder
 - Graph Structure Extrapolation for Out-of-Distribution Generalization
 - Graph-Triggered Rising Bandits
 - GRATH: Gradual Self-Truthifying for Large Language Models
 - Grokking Group Multiplication with Cosets
 - GroupCover: A Secure, Efficient and Scalable Inference Framework for On-device Model Protection based on TEEs
 - Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples
 - Guidance with Spherical Gaussian Constraint for Conditional Diffusion
 - Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
 - HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
 - HAMLET: Graph Transformer Neural Operator for Partial Differential Equations
 - Handling Heterogeneous Curvatures in Bandit LQR Control
 - Hard Tasks First: Multi-Task Reinforcement Learning Through Task Scheduling
 - HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
 - HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning
 - Harmonic Self-Conditioned Flow Matching for joint Multi-Ligand Docking and Binding Site Design
 - Harmonizing Generalization and Personalization in Federated Prompt Learning
 - HarmonyDream: Task Harmonization Inside World Models
 - Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
 - Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition
 - Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning
 - Harnessing the Power of Neural Operators with Automatically Encoded Conservation Laws
 - HelmFluid: Learning Helmholtz Dynamics for Interpretable Fluid Prediction
 - Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions
 - HexGen: Generative Inference of Large Language Model over Heterogeneous Environment
 - HGAP: Boosting Permutation Invariant and Permutation Equivariant in Multi-Agent Reinforcement Learning via Graph Attention Network
 - HGCN2SP: Hierarchical Graph Convolutional Network for Two-Stage Stochastic Programming
 - Hidden Traveling Waves bind Working Memory Variables in Recurrent Neural Networks
 - Hierarchical Integral Probability Metrics: A distance on random probability measures with low sample complexity
 - Hierarchical Neural Operator Transformer with Learnable Frequency-aware Loss Prior for Arbitrary-scale Super-resolution
 - Hierarchical Novelty Detection via Fine-Grained Evidence Allocation
 - Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
 - Hieros: Hierarchical Imagination on Structured State Space Sequence World Models
 - High-Dimensional Bayesian Optimization via Semi-Supervised Learning with Optimized Unlabeled Data Sampling
 - High-Dimensional Geometric Streaming for Nearly Low Rank Data
 - High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization
 - High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning
 - High-dimensional Linear Bandits with Knapsacks
 - High-Order Contrastive Learning with Fine-grained Comparative Levels for Sparse Ordinal Tensor Completion
 - High-Performance Temporal Reversible Spiking Neural Networks with $\mathcal{O}(L)$ Training Memory and $\mathcal{O}(1)$ Inference Cost
 - High-Probability Bound for Non-Smooth Non-Convex Stochastic Optimization with Heavy Tails
 - High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise
 - Highway Value Iteration Networks
 - Homomorphism Counts for Graph Neural Networks: All About That Basis
 - How Deep Do We Need: Accelerating Training and Inference of Neural ODEs via Control Perspective
 - How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model
 - How Does Goal Relabeling Improve Sample Efficiency?
 - How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?
 - How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
 - How do Transformers Perform In-Context Autoregressive Learning?
 - How Far Can Fairness Constraints Help Recover From Biased Data?
 - How Flawed Is ECE? An Analysis via Logit Smoothing
 - How Free is Parameter-Free Stochastic Optimization?
 - How Graph Neural Networks Learn: Lessons from Training Dynamics
 - How Interpretable Are Interpretable Graph Neural Networks?
 - How Language Model Hallucinations Can Snowball
 - How Learning by Reconstruction Produces Uninformative Features For Perception
 - How Private are DP-SGD Implementations?
 - How Smooth Is Attention?
 - How Spurious Features are Memorized: Precise Analysis for Random and NTK Features
 - How to Escape Sharp Minima with Random Perturbations
 - How to Explore with Belief: State Entropy Maximization in POMDPs
 - How to Leverage Diverse Demonstrations in Offline Imitation Learning
 - How to Make the Gradients Small Privately: Improved Rates for Differentially Private Non-Convex Optimization
 - How to Trace Latent Generative Model Generated Images without Artificial Watermark?
 - How Transformers Learn Causal Structure with Gradient Descent
 - How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers
 - How Universal Polynomial Bases Enhance Spectral Graph Neural Networks: Heterophily, Over-smoothing, and Over-squashing
 - How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis
 - Human Alignment of Large Language Models through Online Preference Optimisation
 - Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks
 - Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact
 - HumanTOMATO: Text-aligned Whole-body Motion Generation
 - Human vs. Generative AI in Content Creation Competition: Symbiosis or Conflict?
 - Hybrid$^2$ Neural ODE Causal Modeling and an Application to Glycemic Response
 - Hybrid Inverse Reinforcement Learning
 - Hybrid Neural Representations for Spherical Data
 - Hybrid Reinforcement Learning from Offline Observation Alone
 - Hyperbolic Active Learning for Semantic Segmentation under Domain Shift
 - Hyperbolic Geometric Latent Diffusion Model for Graph Generation
 - Hyperbolic Optimizer as a Dynamical System
 - HyperFields: Towards Zero-Shot Generation of NeRFs from Text
 - Hypergraph-enhanced Dual Semi-supervised Graph Classification
 - IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency
 - ICML 2024 Workshop on Foundation Models in the Wild
 - ICML Workshop on Large Language Models and Cognition
 - Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank
 - Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach
 - IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation
 - ILILT: Implicit Learning of Inverse Lithography Technologies
 - IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation
 - Image Clustering with External Guidance
 - Image Fusion via Vision-Language Model
 - Image Hijacks: Adversarial Images can Control Generative Models at Runtime
 - Image Restoration Through Generalized Ornstein-Uhlenbeck Bridge
 - Imitation Learning from Purified Demonstrations
 - Imitation Learning in Discounted Linear MDPs without exploration assumptions
 - Impact of Decentralized Learning on Player Utilities in Stackelberg Games
 - Implicit Bias of AdamW: $\ell_\infty$-Norm Constrained Optimization
 - Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
 - Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD
 - Implicit meta-learning may lead language models to trust more reliable sources
 - Implicit Regularization in Feedback Alignment Learning Mechanisms for Neural Networks
 - Implicit Representations for Constrained Image Segmentation
 - Implicit Representations via Operator Learning
 - Improved Bounds for Pure Private Agnostic Learning: Item-Level and User-Level Privacy
 - Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy
 - Improved Differentially Private and Lazy Online Convex Optimization: Lower Regret without Smoothness Requirements
 - Improved Dimensionality Dependence for Zeroth-Order Optimisation over Cross-Polytopes
 - Improved Generalization of Weight Space Networks via Augmentations
 - Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials
 - Improved Operator Learning by Orthogonal Attention
 - Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm
 - Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training
 - Improving Adversarial Energy-Based Model via Diffusion Process
 - Improving Antibody Humanness Prediction using Patent Data
 - Improving Computational Complexity in Statistical Models with Local Curvature Information
 - Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
 - Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance
 - Improving Equivariant Graph Neural Networks on Large Geometric Graphs via Virtual Nodes Learning
 - Improving Factuality and Reasoning in Language Models through Multiagent Debate
 - Improving fine-grained understanding in image-text pre-training
 - Improving Generalization in Offline Reinforcement Learning via Adversarial Data Splitting
 - Improving Gradient-Guided Nested Sampling for Posterior Inference
 - Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference
 - Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
 - Improving Interpretation Faithfulness for Vision Transformers
 - Improving Neural Additive Models with Bayesian Principles
 - Improving Neural Logic Machines via Failure Reflection
 - Improving Open-Ended Text Generation via Adaptive Decoding
 - Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
 - Improving Robustness to Multiple Spurious Correlations by Multi-Objective Optimization
 - Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games
 - Improving SAM Requires Rethinking its Optimization Formulation
 - Improving Sharpness-Aware Minimization by Lookahead
 - Improving Token-Based World Models with Parallel Observation Prediction
 - Improving Transformers with Dynamically Composable Multi-Head Attention
 - IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers
 - Incentivized Learning in Principal-Agent Bandit Games
 - In-context Convergence of Transformers
 - In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
 - In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization
 - In-Context Language Learning: Architectures and Algorithms
 - In-Context Learning Agents Are Asymmetric Belief Updaters
 - In-context Learning on Function Classes Unveiled for Transformers
 - In-Context Principle Learning from Mistakes
 - In-Context Reinforcement Learning for Variable Action Spaces
 - In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation
 - In-Context Unlearning: Language Models as Few-Shot Unlearners
 - In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
 - Incorporating Information into Shapley Values: Reweighting via a Maximum Entropy Approach
 - Incorporating probabilistic domain knowledge into deep multiple instance learning
 - Incremental Topological Ordering and Cycle Detection with Predictions
 - Indirectly Parameterized Concrete Autoencoders
 - Individual Contributions as Intrinsic Exploration Scaffolds for Multi-agent Reinforcement Learning
 - Individual Fairness in Graph Decomposition
 - Individualized Privacy Accounting via Subsampling with Applications in Combinatorial Optimization
 - Inexact Newton-type Methods for Optimisation with Nonnegativity Constraints
 - InferCept: Efficient Intercept Support for Augmented Large Language Model Inference
 - Inferring Change Points in High-Dimensional Linear Regression via Approximate Message Passing
 - Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting
 - Inferring the Long-Term Causal Effects of Long-Term Treatments from Short-Term Experiments
 - InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
 - Infinite-Horizon Distributionally Robust Regret-Optimal Control
 - InfoNet: Neural Estimation of Mutual Information without Test-Time Optimization
 - Information Complexity of Stochastic Convex Optimization: Applications to Generalization, Memorization, and Tracing
 - Information-Directed Pessimism for Offline Reinforcement Learning
 - Information Flow in Self-Supervised Learning
 - Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
 - Initial Guessing Bias: How Untrained Networks Favor Some Classes
 - Instruction Tuning for Secure Code Generation
 - InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
 - InstructSpeech: Following Speech Editing Instructions via Large Language Models
 - InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models
 - Integrated Hardware Architecture and Device Placement Search
 - Integrating Global Context Contrast and Local Sensitivity for Blind Image Quality Assessment
 - Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics
 - Interacting Diffusion Processes for Event Sequence Forecasting
 - Interaction-based Retrieval-augmented Diffusion Models for Protein-specific 3D Molecule Generation
 - InterLUDE: Interactions between Labeled and Unlabeled Data to Enhance Semi-Supervised Learning
 - Interplay of ROC and Precision-Recall AUCs: Theoretical Limits and Practical Implications in Binary Classification
 - Interpretability Illusions in the Generalization of Simplified Models
 - Interpretable Deep Clustering for Tabular Data
 - InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation
 - Interpreting and Improving Diffusion Models from an Optimization Perspective
 - Interpreting and Improving Large Language Models in Arithmetic Calculation
 - Interpreting Equivariant Representations
 - Intersecting-Boundary-Sensitive Fingerprinting for Tampering Detection of DNN Models
 - Intersectional Unfairness Discovery
 - In value-based deep reinforcement learning, a pruned network is a good network
 - Invariant Risk Minimization Is A Total Variation Model
 - Inverse-Variance Weighting for Estimation of Heterogeneous Treatment Effects
 - Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
 - INViT: A Generalizable Routing Problem Solver with Invariant Nested View Transformer
 - I/O Complexity of Attention, or How Optimal is FlashAttention?
 - IOI: Invisible One-Iteration Adversarial Attack on No-Reference Image- and Video-Quality Metrics
 - Irregular Multivariate Time Series Forecasting: A Transformable Patching Graph Neural Networks Approach
 - Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
 - Is Epistemic Uncertainty Faithfully Represented by Evidential Deep Learning Methods?
 - Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective
 - Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective
 - Is Kernel Prediction More Powerful than Gating in Convolutional Neural Networks?
 - Isometric Representation Learning for Disentangled Latent Space of Diffusion Models
 - Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
 - Iterated Denoising Energy Matching for Sampling from Boltzmann Densities
 - Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
 - Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
 - Iterative Regularized Policy Optimization with Imperfect Demonstrations
 - Iterative Search Attribution for Deep Neural Networks
 - IW-GAE: Importance weighted group accuracy estimation for improved calibration and model selection in unsupervised domain adaptation
 - Jacobian Regularizer-based Neural Granger Causality
 - Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
 - Joint Composite Latent Space Bayesian Optimization
 - Junk DNA Hypothesis: Pruning Small Pre-Trained Weights $\textit{Irreversibly}$ and $\textit{Monotonically}$ Impairs ``Difficult'' Downstream Tasks in LLMs
 - Just Cluster It: An Approach for Exploration in High-Dimensions using Clustering and Pre-Trained Representations
 - Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows
 - Kepler codebook
 - Kernel-Based Evaluation of Conditional Biological Sequence Models
 - Kernel Debiased Plug-in Estimation: Simultaneous, Automated Debiasing without Influence Functions for Many Target Parameters
 - Kernel Semi-Implicit Variational Inference
 - KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions
 - KernelWarehouse: Rethinking the Design of Dynamic Convolution
 - Keypoint-based Progressive Chain-of-Thought Distillation for LLMs
 - KISA: A Unified Keyframe Identifier and Skill Annotator for Long-Horizon Robotics Demonstrations
 - KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
 - KnowFormer: Revisiting Transformers for Knowledge Graph Reasoning
 - Knowledge-aware Reinforced Language Models for Protein Directed Evolution
 - Knowledge Distillation with Auxiliary Variable
 - Knowledge Graphs Can be Learned with Just Intersection Features
 - Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
 - KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
 - LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning
 - LaMAGIC: Language-Model-based Topology Generation for Analog Integrated Circuits
 - LangCell: Language-Cell Pre-training for Cell Identity Understanding
 - Langevin Policy for Safe Reinforcement Learning
 - Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
 - Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models
 - Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition
 - Language Generation with Strictly Proper Scoring Rules
 - Language-guided Skill Learning with Temporal Variational Inference
 - Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
 - Language Models as Science Tutors
 - Language Models as Semantic Indexers
 - Language Models Represent Beliefs of Self and Others
 - Language Models with Conformal Factuality Guarantees
 - Large Language Models are Geographically Biased
 - Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning
 - Large Scale Dataset Distillation with Domain Shift
 - Larimar: Large Language Models with Episodic Memory Control
 - LASER: Linear Compression in Wireless Distributed Optimization
 - Latent Logic Tree Extraction for Event Sequence Explanation from LLMs
 - Latent Noise Segmentation: How Neural Noise Leads to the Emergence of Segmentation and Grouping
 - Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming
 - Latent Space Symmetry Discovery
 - Latent variable model for high-dimensional point process with structured missingness
 - Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency
 - LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging
 - Layerwise Change of Knowledge in Neural Networks
 - Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning
 - LCA-on-the-Line: Benchmarking Out of Distribution Generalization with Class Taxonomies
 - LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions
 - Learning 1-Bit Tiny Object Detector with Discriminative Feature Refinement
 - Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking
 - Learning a Diffusion Model Policy from Rewards via Q-Score Matching
 - Learning and Forgetting Unsafe Examples in Large Language Models
 - Learning Associative Memories with Gradient Descent
 - Learning Causal Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition
 - Learning Causal Dynamics Models in Object-Oriented Environments
 - Learning Causal Relations from Subsampled Time Series with Two Time-Slices
 - Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments
 - Learning Constraints from Offline Demonstrations via Superior Distribution Correction Estimation
 - Learning Coverage Paths in Unknown Environments with Deep Reinforcement Learning
 - Learning Decision Policies with Instrumental Variables through Double Machine Learning
 - Learning Decision Trees and Forests with Algorithmic Recourse
 - Learning Divergence Fields for Shift-Robust Graph Representations
 - Learning-Efficient Yet Generalizable Collaborative Filtering for Item Recommendation
 - Learning Exceptional Subgroups by End-to-End Maximizing KL-Divergence
 - Learning from Integral Losses in Physics Informed Neural Networks
 - Learning from Memory: Non-Parametric Memory Augmented Self-Supervised Learning of Visual Features
 - Learning from Streaming Data when Users Choose
 - Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
 - Learning Graph Representation via Graph Entropy Maximization
 - Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding
 - Learning High-Order Relationships of Brain Regions
 - Learning in Deep Factor Graphs with Gaussian Belief Propagation
 - Learning in Feature Spaces via Coupled Covariances: Asymmetric Kernel SVD and Nyström method
 - Learning Iterative Reasoning through Energy Diffusion
 - Learning Label Shift Correction for Test-Agnostic Long-Tailed Recognition
 - Learning Latent Dynamic Robust Representations for World Models
 - Learning Latent Space Hierarchical EBM Diffusion Models
 - Learning Latent Structures in Network Games via Data-Dependent Gated-Prior Graph Variational Autoencoders
 - Learning Linear Block Error Correction Codes
 - Learning Low-dimensional Latent Dynamics from High-dimensional Observations: Non-asymptotics and Lower Bounds
 - Learning Mixtures of Gaussian Processes through Random Projection
 - Learning Modality Knowledge Alignment for Cross-Modality Transfer
 - Learning Multiple Secrets in Mastermind
 - Learning Optimal Deterministic Policies with Stochastic Policy Gradients
 - Learning Optimal Projection for Forecast Reconciliation of Hierarchical Time Series
 - Learning Pseudo-Contractive Denoisers for Inverse Problems
 - Learning-Rate-Free Stochastic Optimization over Riemannian Manifolds
 - Learning Reward for Robot Skills Using Large Language Models via Self-Alignment
 - Learning Scale-Aware Spatio-temporal Implicit Representation for Event-based Motion Deblurring
 - Learning Shadow Variable Representation for Treatment Effect Estimation under Collider Bias
 - Learning Solution-Aware Transformers for Efficiently Solving Quadratic Assignment Problem
 - Learning Surrogates for Offline Black-Box Optimization via Gradient Matching
 - Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making
 - Learning the Target Network in Function Space
 - Learning the Uncertainty Sets of Linear Control Systems via Set Membership: A Non-asymptotic Analysis
 - Learning to Compile Programs to Neural Networks
 - Learning to Continually Learn with the Bayesian Principle
 - Learning to Explore for Stochastic Gradient MCMC
 - Learning to Explore in POMDPs with Informational Rewards
 - Learning to Infer Generative Template Programs for Visual Concepts
 - Learning to Intervene on Concept Bottlenecks
 - Learning to Model the World With Language
 - Learning to Play Atari in a World of Tokens
 - Learning to Predict Mutational Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning
 - Learning to Reach Goals via Diffusion
 - Learning to Remove Cuts in Integer Linear Programming
 - Learning to Route Among Specialized Experts for Zero-Shot Generalization
 - Learning to Scale Logits for Temperature-Conditional GFlowNets
 - Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces
 - Learning Universal Predictors
 - Learning Useful Representations of Recurrent Neural Network Weight Matrices
 - Learning with 3D rotations, a hitchhiker's guide to SO(3)
 - Learning with Adaptive Resource Allocation
 - Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical
 - Learning with Partial-Label and Unlabeled Data: A Uniform Treatment for Supervision Redundancy and Insufficiency
 - Less is More: on the Over-Globalizing Problem in Graph Transformers
 - Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often!
 - LESS: Selecting Influential Data for Targeted Instruction Tuning
 - Let Go of Your Labels with Unsupervised Transfer
 - Leverage Class-Specific Accuracy to Guide Data Generation for Improving Image Classification
 - Leveraging Attractor Dynamics in Spatial Navigation for Better Language Parsing
 - Leveraging (Biased) Information: Multi-armed Bandits with Offline Data
 - Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference
 - Leveraging VLM-Based Pipelines to Annotate 3D Objects
 - LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views
 - Libra: Building Decoupled Vision System on Large Language Models
 - LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models
 - Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras
 - Light and Optimal Schrödinger Bridge Matching
 - Lightweight Image Super-Resolution via Flexible Meta Pruning
 - Limited Preference Aided Imitation Learning from Imperfect Demonstrations
 - Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
 - Linear Explanations for Individual Neurons
 - Linguistic Calibration of Long-Form Generations
 - Liouville Flow Importance Sampler
 - Listenable Maps for Audio Classifiers
 - Listening to the noise: Blind Denoising with Gibbs Diffusion
 - Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
 - LLaGA: Large Language and Graph Assistant
 - LLark: A Multimodal Instruction-Following Language Model for Music
 - LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
 - LLM-Empowered State Representation for Reinforcement Learning
 - LLM Maybe LongLM: SelfExtend LLM Context Window Without Tuning
 - Local Causal Structure Learning in the Presence of Latent Variables
 - Local Feature Selection without Label or Feature Leakage for Interpretable Machine Learning Predictions
 - Locality-Sensitive Hashing-Based Efficient Point Transformer with Applications in High-Energy Physics
 - Localizing Task Information for Improved Model Merging and Compression
 - Locally Differentially Private Decentralized Stochastic Bilevel Optimization with Guaranteed Convergence Accuracy
 - Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization
 - Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies
 - Local vs. Global Interpretability: A Computational Complexity Perspective
 - LoCoCo: Dropping In Convolutions for Long Context Compression
 - Logistic Variational Bayes Revisited
 - Log Neural Controlled Differential Equations: The Lie Brackets Make A Difference
 - Long-Context Foundation Models
 - Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
 - Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer
 - Long Range Propagation on Continuous-Time Dynamic Graphs
 - LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
 - Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts
 - Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining
 - Lookbehind-SAM: k steps back, 1 step forward
 - LoRA+: Efficient Low Rank Adaptation of Large Models
 - LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models
 - LoRA Training in the NTK Regime has No Spurious Local Minima
 - Loss Shaping Constraints for Long-Term Time Series Forecasting
 - Low-Cost High-Power Membership Inference Attacks
 - Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery
 - Low-Rank Similarity Mining for Multimodal Dataset Distillation
 - LPGD: A General Framework for Backpropagation through Embedded Optimization Layers
 - LQER: Low-Rank Quantization Error Reconstruction for LLMs
 - LSEnet: Lorentz Structural Entropy Neural Network for Deep Graph Clustering
 - Lucilla Sioli
 - Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation
 - Machine Learning for Earth System Modeling: Accelerating Pathways to Impact
 - Machine Learning Opportunities for the Next Generation of Particle Physics
 - Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning
 - MADA: Meta-Adaptive Optimizers Through Hyper-Gradient Descent
 - Maestro: Uncovering Low-Rank Structures via Trainable Decomposition
 - MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models
 - MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
 - Magicoder: Empowering Code Generation with OSS-Instruct
 - MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
 - MAGNOLIA: Matching Algorithms via GNNs for Online Value-to-go Approximation
 - Major-Minor Mean Field Multi-Agent Reinforcement Learning
 - Make-A-Shape: a Ten-Million-scale 3D Shape Model
 - Making Old Things New: A Unified Algorithm for Differentially Private Clustering
 - MALIBO: Meta-learning for Likelihood-free Bayesian Optimization
 - Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution
 - Mapping the Multiverse of Latent Representations
 - Masked Face Recognition with Generative-to-Discriminative Representations
 - MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective
 - Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning
 - Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
 - Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games
 - Mathematical Framework for Online Social Media Auditing
 - MathScale: Scaling Instruction Tuning for Mathematical Reasoning
 - Matrix Information Theory for Self-Supervised Learning
 - Matroid Semi-Bandits in Sublinear Time
 - MaxMin-RLHF: Alignment with Diverse Human Preferences
 - MC-GTA: Metric-Constrained Model-Based Clustering using Goodness-of-fit Tests with Autocorrelations
 - MD tree: a model-diagnostic tree grown on loss landscape
 - Mean Estimation in the Add-Remove Model of Differential Privacy
 - Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective
 - Mean-field Chaos Diffusion Models
 - Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning
 - Mean-field Underdamped Langevin Dynamics and its Spacetime Discretization
 - Measures of diversity and space-filling designs for categorical data
 - Measuring Stochastic Data Complexity with Boltzmann Influence Functions
 - Mechanistic Design and Scaling of Hybrid Architectures
 - Mechanistic Neural Networks for Scientific Machine Learning
 - Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
 - Membership Inference Attacks on Diffusion Models via Quantile Regression
 - Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture
 - Memorization Through the Lens of Curvature of Loss Function Around Samples
 - Memory Consolidation Enables Long-Context Video Understanding
 - Memory Efficient Neural Processes via Constant Memory Attention Block
 - MEMORYLLM: Towards Self-Updatable Large Language Models
 - Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
 - Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
 - Meta Evidential Transformer for Few-Shot Open-Set Recognition
 - Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments
 - Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning
 - MF-CLR: Multi-Frequency Contrastive Learning Representation for Time Series
 - MFTN: A Multi-scale Feature Transfer Network Based on IMatchFormer for Hyperspectral Image Super-Resolution
 - MGit: A Model Versioning and Management System
 - MH-pFLID: Model Heterogeneous personalized Federated Learning via Injection and Distillation for Medical Data Analysis
 - MILP-FBGen: LP/MILP Instance Generation with Feasibility/Boundedness
 - Mimicking Better by Matching the Approximate Action Distribution
 - MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
 - Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary
 - Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value
 - Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions
 - Minimizing $f$-Divergences by Interpolating Velocity Fields
 - Minimum Norm Interpolation Meets The Local Theory of Banach Spaces
 - Minimum-Norm Interpolation Under Covariate Shift
 - Mitigating Catastrophic Forgetting in Online Continual Learning by Modeling Previous Task Interrelations via Pareto Optimization
 - Mitigating Label Noise on Graphs via Topological Sample Selection
 - Mitigating Oversmoothing Through Reverse Process of GNNs for Heterophilic Graphs
 - Mitigating Privacy Risk in Membership Inference by Convex-Concave Loss
 - Mixtures of Experts Unlock Parameter Scaling for Deep RL
 - MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
 - ML for Life and Material Science: From Theory to Industry Applications
 - MLI Formula: A Nearly Scale-Invariant Solution with Noise Perturbation
 - MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
 - MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
 - MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
 - MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
 - MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
 - Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers
 - MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
 - Model Alignment as Prospect Theoretic Optimization
 - Model Assessment and Selection under Temporal Distribution Shift
 - Model-Based Minimum Bayes Risk Decoding for Text Generation
 - Model-based Reinforcement Learning for Confounded POMDPs
 - Model-based Reinforcement Learning for Parameterized Action Spaces
 - Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL
 - Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data
 - Modeling Caption Diversity in Contrastive Vision-Language Pretraining
 - Modeling Language Tokens as Functionals of Semantic Fields
 - Modelling Microbial Communities with Graph Neural Networks
 - Models of Human Feedback for AI Alignment
 - Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models
 - Modular Learning of Deep Causal Generative Models for High-dimensional Causal Inference
 - MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence
 - Mol-AE: Auto-Encoder Based Molecular Representation Learning With 3D Cloze Test Objective
 - MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space
 - Mollification Effects of Policy Gradient Methods
 - MOMENT: A Family of Open Time-series Foundation Models
 - Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
 - Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments
 - Momentum Particle Maximum Likelihood
 - MoMo: Momentum Models for Adaptive Learning Rates
 - Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
 - Monotone, Bi-Lipschitz, and Polyak-Łojasiewicz Networks
 - Monotone Individual Fairness
 - Moreau Envelope for Nonconvex Bi-Level Optimization: A Single-Loop and Hessian-Free Solution Strategy
 - More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
 - More Flexible PAC-Bayesian Meta-Learning by Learning Learning Algorithms
 - MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation
 - MS$^3$D: A RG Flow-Based Regularization for GAN Training with Limited Data
 - MS-TIP: Imputation Aware Pedestrian Trajectory Prediction
 - Multi-Agent Reinforcement Learning Meets Leaf Sequencing in Radiotherapy
 - Multi-Agent Reinforcement Learning with Hierarchical Coordination for Emergency Responder Stationing
 - Multicalibration for Confidence Scoring in LLMs
 - Multi-class Probabilistic Bounds for Majority Vote Classifiers with Partially Labeled Data
 - Multi-Factor Adaptive Vision Selection for Egocentric Video Question Answering
 - Multi-Fidelity Residual Neural Processes for Scalable Surrogate Modeling
 - Multi-group Learning for Hierarchical Groups
 - Multigroup Robustness
 - Multi-layer Rehearsal Feature Augmentation for Class-Incremental Learning
 - MultiMax: Sparse and Multi-Modal Attention Learning
 - Multi-modal Foundation Model meets Embodied AI (MFM-EAI)
 - Multimodal Prototyping for cancer survival prediction
 - Multi-Patch Prediction: Adapting Language Models for Time Series Representation Learning
 - Multiplicative Weights Update, Area Convexity and Random Coordinate Descent for Densest Subgraph Problems
 - Multiply-Robust Causal Change Attribution
 - Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains
 - Multi-Region Markovian Gaussian Process: An Efficient Method to Discover Directional Communications Across Multiple Brain Regions
 - Multi-Sender Persuasion: A Computational Perspective
 - Multi-Source Conformal Inference Under Distribution Shift
 - Multi-Track Message Passing: Tackling Oversmoothing and Oversquashing in Graph Learning via Preventing Heterophily Mixing
 - Multi-View Clustering by Inter-cluster Connectivity Guided Reward
 - Multi-View Stochastic Block Models
 - MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
 - MusicRL: Aligning Music Generation to Human Preferences
 - MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
 - MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts
 - Naive Bayes Classifiers over Missing Data: Decision and Poisoning
 - Nash Incentive-compatible Online Mechanism Learning via Weakly Differentially Private Online Learning
 - Nash Learning from Human Feedback
 - NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
 - Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching
 - Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
 - NDOT: Neuronal Dynamics-based Online Training for Spiking Neural Networks
 - Nearest Neighbour Score Estimators for Diffusion Generative Models
 - Near-Linear Time Approximation Algorithms for k-means with Outliers
 - Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
 - Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints
 - Neighboring Perturbations of Knowledge Editing on Large Language Models
 - Nesting Particle Filters for Experimental Design in Dynamical Systems
 - Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction
 - Network Tight Community Detection
 - Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Features Model
 - Neural Collapse in Multi-label Learning with Pick-all-label Loss
 - Neural Collapse meets Differential Privacy: Curious behaviors of NoisyGD with Near-Perfect Representation Learning
 - Neural Diffusion Models
 - Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity
 - NeuralIndicator: Implicit Surface Reconstruction from Neural Indicator Priors
 - Neural Jump-Diffusion Temporal Point Processes
 - Neural-Kernel Conditional Mean Embeddings
 - Neural NeRF Compression
 - Neural Networks Learn Statistics of Increasing Complexity
 - Neural Operator Learning
 - Neural operators meet conjugate gradients: The FCG-NO method for efficient PDE solving
 - Neural Operators with Localized Integral and Differential Kernels
 - Neural SPH: Improved Neural Modeling of Lagrangian Fluid Dynamics
 - Neural Tangent Kernels for Axis-Aligned Tree Ensembles
 - Neural Tangent Kernels Motivate Cross-Covariance Graphs in Neural Networks
 - Neurodegenerative Brain Network Classification via Adaptive Diffusion with Temporal Regularization
 - Neuroexplicit Diffusion Models for Inpainting of Optical Flow Fields
 - Neuro-Symbolic Temporal Point Processes
 - Neuro-Visualizer: A Novel Auto-Encoder-Based Loss Landscape Visualization Method With an Application in Knowledge-Guided Machine Learning
 - New Bounds on the Cohesion of Complete-link and Other Linkage Methods for Agglomerative Clustering
 - NeWRF: A Deep Learning Framework for Wireless Radiation Field Reconstruction and Channel Prediction
 - New Sample Complexity Bounds for Sample Average Approximation in Heavy-Tailed Stochastic Programming
 - NExT-Chat: An LMM for Chat, Detection and Segmentation
 - Next Generation of AI Safety
 - Next Generation of Sequence Modeling Architectures
 - NExT-GPT: Any-to-Any Multimodal LLM
 - NExT: Teaching Large Language Models to Reason about Code Execution
 - No Dimensional Sampling Coresets for Classification
 - No Double Descent in Principal Component Regression: A High-Dimensional Analysis
 - No Free Prune: Information-Theoretic Barriers to Pruning at Initialization
 - Noise-Adaptive Confidence Sets for Linear Bandits and Application to Bayesian Optimization
 - Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning
 - Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation
 - Non-clairvoyant Scheduling with Partial Predictions
 - Non-confusing Generation of Customized Concepts in Diffusion Models
 - Non-convex Stochastic Composite Optimization with Polyak Momentum
 - Nonlinear Filtering with Brenier Optimal Transport Maps
 - Non-parametric Online Change Point Detection on Riemannian Manifolds
 - Nonparametric Teaching of Implicit Neural Representations
 - Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates
 - Non-stationary Online Convex Optimization with Arbitrary Delays
 - Non-Vacuous Generalization Bounds for Large Language Models
 - No-Regret Reinforcement Learning in Smooth MDPs
 - Not all distributional shifts are equal: Fine-grained robust conformal inference
 - Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators
 - Novel Spectral Algorithms for the Partial Credit Model
 - No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths
 - O$n$ Learning Deep O($n$)-Equivariant Hyperspheres
 - OAK: Enriching Document Representations using Auxiliary Knowledge for Extreme Classification
 - Observable Propagation: Uncovering Feature Vectors in Transformers
 - ODIM: Outlier Detection via Likelihood of Under-Fitted Generative Models
 - ODIN: Disentangled Reward Mitigates Hacking in RLHF
 - Offline Actor-Critic Reinforcement Learning Scales to Large Models
 - Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
 - Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching
 - Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms
 - Offline Multi-Objective Optimization
 - Offline Training of Language Model Agents with Functions as Learnable Weights
 - Offline Transition Modeling via Contrastive Energy Learning
 - Off-policy Evaluation Beyond Overlap: Sharp Partial Identification Under Smoothness
 - OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning
 - OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
 - On a Combinatorial Problem Arising in Machine Teaching
 - On a Neural Implementation of Brenier's Polar Factorization
 - On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis
 - On Convergence of Incremental Gradient for Non-convex Smooth Functions
 - On dimensionality of feature vectors in MPNNs
 - On Discrete Prompt Optimization for Diffusion Models
 - One for All: A Universal Generator for Concept Unlearnability via Multi-Modal Alignment
 - One Meta-tuned Transformer is What You Need for Few-shot Learning
 - One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
 - One-Shot Strategic Classification Under Unknown Costs
 - One Size Fits All for Semantic Shifts: Adaptive Prompt Tuning for Continual Learning
 - On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
 - On Hypothesis Transfer Learning of Functional Linear Models
 - On Interpolating Experts and Multi-Armed Bandits
 - On Least Square Estimation in Softmax Gating Mixture of Experts
 - Online Adaptive Anomaly Thresholding with Confidence Sequences
 - Online Algorithms with Uncertainty-Quantified Predictions
 - Online bipartite matching with imperfect advice
 - Online Cascade Learning for Efficient Inference over Streams
 - Online conformal prediction with decaying step sizes
 - Online Isolation Forest
 - Online Learning and Information Exponents: The Importance of Batch size & Time/Complexity Tradeoffs
 - Online Learning in Betting Markets: Profit versus Prediction
 - Online Learning in CMDPs: Handling Stochastic and Adversarial Constraints
 - Online Learning under Budget and ROI Constraints via Weak Adaptivity
 - Online Learning with Bounded Recall
 - Online Linear Regression in Dynamic Environments via Discounting
 - Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning
 - Online Matrix Completion: A Collaborative Approach with Hott Items
 - Online Non-stochastic Control with Partial Feedback
 - Online Resource Allocation with Non-Stationary Customers
 - Online Speculative Decoding
 - Online Variational Sequential Monte Carlo
 - On Mechanistic Knowledge Localization in Text-to-Image Generative Models
 - On Multi-Armed Bandit with Impatient Arms
 - On Online Experimentation without Device Identifiers
 - On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization
 - On Positivity Condition for Causal Inference
 - On Prompt-Driven Safeguarding for Large Language Models
 - On Statistical Learning Theory for Distributional Inputs
 - On Stronger Computational Separations Between Multimodal and Unimodal Machine Learning
 - On the Asymptotic Distribution of the Minimum Empirical Risk
 - On the Calibration of Human Pose Estimation
 - On the Complexity of Finite-Sum Smooth Optimization under the Polyak–Łojasiewicz Condition
 - On The Complexity of First-Order Methods in Stochastic Bilevel Optimization
 - On the Consistency of Kernel Methods with Dependent Observations
 - On the Convergence of Projected Bures-Wasserstein Gradient Descent under Euclidean Strong Convexity
 - On the Diminishing Returns of Width for Continual Learning
 - On the Duality Between Sharpness-Aware Minimization and Adversarial Training
 - On the Effectiveness of Supervision in Asymmetric Non-Contrastive Learning
 - On the Embedding Collapse when Scaling up Recommendation Models
 - On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm
 - On the Error-Propagation of Inexact Hotelling's Deflation for Principal Component Analysis
 - On the Expressive Power of Spectral Invariant Graph Neural Networks
 - On The Fairness Impacts of Hardware Selection in Machine Learning
 - On the Feasibility of Single-Pass Full-Capacity Learning in Linear Threshold Neurons with Binary Input Vectors
 - On the Generalization of Equivariant Graph Neural Networks
 - On the Generalization of Stochastic Gradient Descent with Momentum
 - On the Hardness of Probabilistic Neurosymbolic Learning
 - On the Identifiability of Switching Dynamical Systems
 - On the Implicit Bias of Adam
 - On the Independence Assumption in Neurosymbolic Learning
 - On the Last-Iterate Convergence of Shuffling Gradient Methods
 - On the Maximal Local Disparity of Fairness-Aware Classifiers
 - On the Minimal Degree Bias in Generalization on the Unseen for non-Boolean Functions
 - On the Nonlinearity of Layer Normalization
 - On the Origins of Linear Representations in Large Language Models
 - On the Recoverability of Causal Relations from Temporally Aggregated I.I.D. Data
 - On the Role of Edge Dependency in Graph Generative Models
 - On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
 - On the sample complexity of conditional independence testing with Von Mises estimator with application to causal discovery
 - On the Second-Order Convergence of Biased Policy Gradient Algorithms
 - On The Statistical Complexity of Offline Decision-Making
 - On the Tractability of SHAP Explanations under Markovian Distributions
 - On the Trajectory Regularity of ODE-based Diffusion Sampling
 - On the Unexpected Effectiveness of Reinforcement Learning for Sequential Recommendation
 - On the Universality of Volume-Preserving and Coupling-Based Normalizing Flows
 - On the Weight Dynamics of Deep Normalized Networks
 - On Universally Optimal Algorithms for A/B Testing
 - On Which Nodes Does GCN Fail? Enhancing GCN From the Node Perspective
 - OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift
 - Open Ad Hoc Teamwork with Cooperative Game Theory
 - Open-Domain Text Evaluation via Contrastive Distribution Methods
 - OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
 - Open-Vocabulary Calibration for Fine-tuned CLIP
 - Operator SVD with Neural Networks via Nested Low-Rank Approximation
 - Optimal Acceleration for Minimax and Fixed-Point Problems is Not Unique
 - Optimal Batched Linear Bandits
 - Optimal bounds for $\ell_p$ sensitivity sampling via $\ell_2$ augmentation
 - Optimal Coresets for Low-Dimensional Geometric Median
 - Optimal Differentially Private Model Training with Public Data
 - Optimal Exact Recovery in Semi-Supervised Learning: A Study of Spectral Methods and Graph Convolutional Networks
 - Optimal Eye Surgeon: Finding image priors through sparse generators at initialization
 - Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization
 - Optimal Kernel Choice for Score Function-based Causal Discovery
 - Optimal Kernel Quantile Learning with Random Features
 - Optimally Improving Cooperative Learning in a Social Setting
 - Optimal Recurrent Network Topologies for Dynamical Systems Reconstruction
 - Optimal Ridge Regularization for Out-of-Distribution Prediction
 - Optimal Transport for Structure Learning Under Missing Data
 - Optimistic Multi-Agent Policy Gradient
 - Optimization without Retraction on the Random Generalized Stiefel Manifold
 - Optimizing Watermarks for Large Language Models
 - OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models
 - Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty
 - OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos
 - OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization
 - OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport
 - OTMatch: Improving Semi-Supervised Learning with Optimal Transport
 - Outlier-aware Slicing for Post-Training Quantization in Vision Transformer
 - Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
 - Outlier-robust Kalman Filtering through Generalised Bayes
 - Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
 - Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble
 - Out-of-Domain Generalization in Dynamical Systems Reconstruction
 - Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift
 - Overcoming Data and Model heterogeneities in Decentralized Federated Learning via Synthetic Anchors
 - Overcoming Saturation in Density Ratio Estimation by Iterated Regularization
 - Overcoming the Optimizer's Curse: Obtaining Realistic Prescriptions from Neural Networks
 - Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning
 - OxyGenerator: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning
 - PAC-Bayesian Error Bound, via Rényi Divergence, for a Class of Linear Time-Invariant State-Space Models
 - PAC-Bayesian Generalization Bounds for Knowledge Graph Representation Learning
 - PAGER: Accurate Failure Characterization in Deep Regression Models
 - PairNet: Training with Observed Pairs to Estimate Individual Treatment Effect
 - Pairwise Alignment Improves Graph Domain Adaptation
 - PANDA: Expanded Width-Aware Message Passing Beyond Rewiring
 - PAPM: A Physics-aware Proxy Model for Process Systems
 - Parallel Affine Transformation Tuning of Markov Chain Monte Carlo
 - Parallelized Spatiotemporal Slot Binding for Videos
 - Parameter-Dependent Competitive Analysis for Online Capacitated Coverage Maximization through Boostings and Attenuations
 - Parameter-Efficient Fine-Tuning with Controls
 - Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
 - Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation
 - Parameter Estimation in DAGs from Incomplete Data via Optimal Transport
 - Parameterized Physics-informed Neural Networks for Parameterized PDEs
 - PARCv2: Physics-aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics Modeling
 - PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
 - Parsimonious Learning-Augmented Approximations for Dense Instances of $\mathcal{NP}$-hard Problems
 - Partially Stochastic Infinitely Deep Bayesian Neural Networks
 - Partial Multi-View Multi-Label Classification via Semantic Invariance Learning and Prototype Modeling
 - Partial Optimality in the Linear Ordering Problem
 - Particle Denoising Diffusion Sampler
 - PASOA- PArticle baSed Bayesian Optimal Adaptive design
 - Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
 - Path-Guided Particle-based Sampling
 - Pausing Policy Learning in Non-stationary Reinforcement Learning
 - PcLast: Discovering Plannable Continuous Latent States
 - PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming
 - PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation
 - Pedestrian Attribute Recognition as Label-balanced Multi-label Learning
 - Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams
 - PerceptAnon: Exploring the Human Perception of Image Anonymization Beyond Pseudonymization for GDPR
 - Perfect Alignment May be Poisonous to Graph Contrastive Learning
 - Performance Bounds for Active Binary Testing with Information Maximization
 - Performative Prediction with Bandit Feedback: Learning through Reparameterization
 - Perturb-and-Project: Differentially Private Similarities and Marginals
 - Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
 - PGODE: Towards High-quality System Dynamics Modeling
 - PhAST: Physics-Aware, Scalable, and Task-Specific GNNs for Accelerated Catalyst Design
 - Physics and Lie symmetry informed Gaussian processes
 - Physics-Informed Neural Network Policy Iteration: Algorithms, Convergence, and Verification
 - Physics of Language Models
 - Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
 - PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
 - PIDformer: Transformer Meets Control Theory
 - PID: Prompt-Independent Data Protection Against Latent Diffusion Models
 - Pi-DUAL: Using privileged information to distinguish clean from noisy labels
 - Piecewise Constant and Linear Regression Trees: An Optimal Dynamic Programming Approach
 - PinNet: Pinpoint Instructive Information for Retrieval Augmented Code-to-Text Generation
 - PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
 - PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
 - PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer
 - Planning, Fast and Slow: Online Reinforcement Learning with Action-Free Offline Data via Multiscale Planners
 - Plug-and-Play image restoration with Stochastic deNOising REgularization
 - Plug-in Performative Optimization
 - Pluvial Flood Emulation with Hydraulics-informed Message Passing
 - PointMC: Multi-instance Point Cloud Registration based on Maximal Cliques
 - Policy-conditioned Environment Models are More Generalizable
 - Policy Evaluation for Variance in Average Reward Reinforcement Learning
 - Policy Learning for Balancing Short-Term and Long-Term Rewards
 - Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks
 - Polynomial-based Self-Attention for Table Representation Learning
 - PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
 - Position: $C^*$-Algebraic Machine Learning $-$ Moving in a New Direction
 - Position: A Call for Embodied AI
 - Position: A Call to Action for a Human-Centered AutoML Paradigm
 - Position: AI/ML Influencers Have a Place in the Academic Process
 - Position: AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research
 - Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning
 - Position: Amazing Things Come From Having Many Good Models
 - Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
 - Position: Application-Driven Innovation in Machine Learning
 - Position: A Roadmap to Pluralistic Alignment
 - Position: A Safe Harbor for AI Evaluation and Red Teaming
 - Position: Automatic Environment Shaping is the Next Frontier in RL
 - Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI
 - Position: Benchmarking is Limited in Reinforcement Learning Research
 - Position: Beyond Personhood: Agency, Accountability, and the Limits of Anthropomorphic Ethical Analysis
 - Position: Building Guardrails for Large Language Models Requires Systematic Design
 - Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
 - Position: Compositional Generative Modeling: A Single Model is Not All You Need
 - Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining
 - Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
 - Position: Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?
 - Position: Data-driven Discovery with Large Generative Models
 - Position: Do Not Explain Vision Models Without Context
 - Position: Do pretrained Transformers Learn In-Context by Gradient Descent?
 - Position: Embracing Negative Results in Machine Learning
 - Position: Enforced Amnesia as a Way to Mitigate the Potential Risk of Silent Suffering in the Conscious AI
 - Position: Evolving AI Collectives Enhance Human Diversity and Enable Self-Regulation
 - Position: Explain to Question not to Justify
 - Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training
 - Position: Foundation Agents as the Paradigm Shift for Decision Making
 - Position: Fundamental Limitations of LLM Censorship Necessitate New Approaches
 - Position: Future Directions in the Theory of Graph Machine Learning
 - Position: Graph Foundation Models Are Already Here
 - Position: Insights from Survey Methodology can Improve Training Data
 - Position: Intent-aligned AI Systems Must Optimize for Agency Preservation
 - Position: Is machine learning good or bad for the natural sciences?
 - Position: Key Claims in LLM Research Have a Long Tail of Footnotes
 - Position: Levels of AGI for Operationalizing Progress on the Path to AGI
 - Position: Leverage Foundational Models for Black-Box Optimization
 - Position: LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks
 - Position: Machine Learning-powered Assessments of the EU Digital Services Act Aid Quantify Policy Impacts on Online Harms
 - Position: Measure Dataset Diversity, Don't Just Claim It
 - Position: Mission Critical – Satellite Data is a Distinct Modality in Machine Learning
 - Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI
 - Position: On the Possibilities of AI-Generated Text Detection
 - Position: On the Societal Impact of Open Foundation Models
 - Position: Open-Endedness is Essential for Artificial Superhuman Intelligence
 - Position: Opportunities Exist for Machine Learning in Magnetic Fusion Energy
 - Position: Optimization in SciML Should Employ the Function Space Geometry
 - Position: Quo Vadis, Unsupervised Time Series Anomaly Detection?
 - Position: Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination
 - Position: Relational Deep Learning - Graph Representation Learning on Relational Databases
 - Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems
 - Position: Scaling Simulation is Neither Necessary Nor Sufficient for In-the-Wild Robot Manipulation
 - Position: Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized
 - Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
 - Position: Social Environment Design Should be Further Developed for AI-based Policy-Making
 - Position: Standardization of Behavioral Use Clauses is Necessary for the Adoption of Responsible Licensing of AI
 - Position: Stop Making Unscientific AGI Performance Claims
 - Position: Technical Research and Talent is Needed for Effective AI Governance
 - Position: Tensor Networks are a Valuable Asset for Green AI
 - Position: The Causal Revolution Needs Scientific Pragmatism
 - Position: The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
 - Position: The Platonic Representation Hypothesis
 - Position: The Reasonable Person Standard for AI
 - Position: Topological Deep Learning is the New Frontier for Relational Learning
 - Position: Towards Implicit Prompt For Text-To-Image Models
 - Position: Towards Unified Alignment Between Agents, Humans, and Environment
 - Position: TrustLLM: Trustworthiness in Large Language Models
 - Position: Understanding LLMs Requires More Than Statistical Generalization
 - Position: Video as the New Language for Real-World Decision Making
 - Position: What Can Large Language Models Tell Us about Time Series Analysis
 - Position: What makes an image realistic?
 - Position: Why Tabular Foundation Models Should Be a Research Priority
 - Position: Why We Must Rethink Empirical Research in Machine Learning
 - Position: Will we run out of data? Limits of LLM scaling based on human-generated data
 - Positive and Unlabeled Learning with Controlled Probability Boundary Fence
 - Positive Concave Deep Equilibrium Models
 - Posterior Sampling-Based Bayesian Optimization with Tighter Bayesian Regret Bounds
 - Post-hoc Part-Prototype Networks
 - Potential Based Diffusion Motion Planning
 - PPFLOW: Target-Aware Peptide Design with Torsional Flow Matching
 - Practical Hamiltonian Monte Carlo on Riemannian Manifolds via Relativity Theory
 - Practical Performance Guarantees for Pipelined DNN Inference
 - Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input
 - Precise Accuracy / Robustness Tradeoffs in Regression: Case of General Norms
 - Predicting and Interpreting Energy Barriers of Metallic Glasses with Graph Neural Networks
 - Predicting Dose-Response Curves with Deep Neural Networks
 - Predicting Lagrangian Multipliers for Mixed Integer Linear Programs
 - Prediction Accuracy of Learning in Games : Follow-the-Regularized-Leader meets Heisenberg
 - Prediction-powered Generalization of Causal Inferences
 - Predictive Coding beyond Correlations
 - Predictive Dynamic Fusion
 - Predictive Linear Online Tracking for Unknown Targets
 - Predictive Performance Comparison of Decision Policies Under Confounding
 - Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
 - Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models
 - Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss
 - Premise Order Matters in Reasoning with Large Language Models
 - PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs
 - Pre-Training Protein Bi-level Representation Through Span Mask Strategy On 3D Protein Chains
 - Preventing Model Collapse in Gaussian Process Latent Variable Models
 - Pricing with Contextual Elasticity and Heteroscedastic Valuation
 - Principled Gradient-Based MCMC for Conditional Sampling of Text
 - Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
 - Principled Preferential Bayesian Optimization
 - PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses
 - Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis
 - Prior Specification for Bayesian Matrix Factorization via Prior Predictive Matching
 - PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control
 - Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
 - Privacy Attacks in Decentralized Learning
 - Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
 - Privacy Preserving Adaptive Experiment Design
 - Privacy-Preserving Data Release Leveraging Optimal Transport and Particle Gradient Descent
 - Privacy-Preserving Embedding via Look-up Table Evaluation with Fully Homomorphic Encryption
 - Privacy-Preserving Instructions for Aligning Large Language Models
 - Privacy Profiles for Private Selection
 - Private and Federated Stochastic Convex Optimization: Efficient Strategies for Centralized Systems
 - Private Gradient Descent for Linear Regression: Tighter Error Bounds and Instance-Specific Uncertainty Estimation
 - Private Heterogeneous Federated Learning Without a Trusted Server Revisited: Error-Optimal and Communication-Efficient Algorithms for Convex Losses
 - Privately Learning Smooth Distributions on the Hypercube by Projections
 - Private Truly-Everlasting Robust-Prediction
 - Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages
 - Proactive Detection of Voice Cloning with Localized Watermarking
 - Proactive DP: A Multiple Target Optimization Framework for DP-SGD
 - Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
 - Probabilistic Constrained Reinforcement Learning with Formal Interpretability
 - Probabilistic Forecasting with Stochastic Interpolants and Föllmer Processes
 - Probabilistic Generating Circuits - Demystified
 - Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo
 - Probabilistic Modeling of Interpersonal Coordination Processes
 - Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search
 - Probabilistic Subgoal Representations for Hierarchical Reinforcement Learning
 - Probabilistic Time Series Modeling with Decomposable Denoising Diffusion Model
 - Probability Distribution of Hypervolume Improvement in Bi-objective Bayesian Optimization
 - Prodigy: An Expeditiously Adaptive Parameter-Free Learner
 - Profile Reconstruction from Private Sketches
 - Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions
 - Projecting Molecules into Synthesizable Chemical Spaces
 - Projection-Free Online Convex Optimization with Time-Varying Constraints
 - Projection-Free Variance Reduction Methods for Stochastic Constrained Multi-Level Compositional Optimization
 - Prometheus: Out-of-distribution Fluid Dynamics Modeling with Disentangled Graph ODE
 - Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines
 - Promoting External and Internal Equities Under Ex-Ante/Ex-Post Metrics in Online Resource Allocation
 - Prompt-based Visual Alignment for Zero-shot Policy Transfer
 - Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution
 - Prompt-guided Precise Audio Editing with Diffusion Models
 - Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
 - Prompting a Pretrained Transformer Can Be a Universal Approximator
 - Prompting is a Double-Edged Sword: Improving Worst-Group Robustness of Foundation Models
 - Prompt Sketching for Large Language Models
 - Prompt-tuning Latent Diffusion Models for Inverse Problems
 - Prospective Side Information for Latent MDPs
 - Prospector Heads: Generalized Feature Attribution for Large Models & Data
 - Protein Conformation Generation via Force-Guided SE(3) Diffusion Models
 - Proteus: Exploring Protein Structure Generation for Enhanced Designability and Efficiency
 - ProtoGate: Prototype-based Neural Networks with Global-to-local Feature Selection for Tabular Biomedical Data
 - Prototypical Transformer As Unified Motion Learners
 - Provable Benefits of Local Steps in Heterogeneous Federated Learning for Neural Networks: A Feature Learning Perspective
 - Provable Contrastive Continual Learning
 - Provable Interactive Learning with Hindsight Instruction Feedback
 - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks
 - Provable Privacy with Non-Private Pre-Processing
 - Provable Representation with Efficient Planning for Partially Observable Reinforcement Learning
 - Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
 - Provably Better Explanations with Optimized Aggregation of Feature Attributions
 - Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
 - Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization
 - Provably Efficient Partially Observable Risk-sensitive Reinforcement Learning with Hindsight Observation
 - Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback
 - Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples
 - Provably Robust DPO: Aligning Language Models with Noisy Feedback
 - Provably Scalable Black-Box Variational Inference with Structured Variational Families
 - Pruned Pivot: Correlation Clustering Algorithm for Dynamic, Parallel, and Local Computation Models
 - PruNeRF: Segment-Centric Dataset Pruning via 3D Spatial Consistency
 - Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models
 - Pseudo-Calibration: Improving Predictive Uncertainty Estimation in Unsupervised Domain Adaptation
 - Purifying Quantization-conditioned Backdoors via Layer-wise Activation Correction with Distribution Approximation
 - Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders
 - Pursuing Overall Welfare in Federated Learning through Sequential Decision Making
 - Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
 - QBMK: Quantum-based Matching Kernels for Un-attributed Graphs
 - QORA: Zero-Shot Transfer via Interpretable Object-Relational Model Learning
 - Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
 - Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent
 - Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics
 - Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization
 - Quality-Diversity with Limited Resources
 - Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design
 - Quantum Algorithm for Online Exp-concave Optimization
 - Quantum Algorithms and Lower Bounds for Finite-Sum Optimization
 - Quantum Implicit Neural Representations
 - Quantum Positional Encodings for Graph Neural Networks
 - Quantum Theory and Application of Contextual Optimal Transport
 - Quasi-Monte Carlo Features for Kernel Approximation
 - QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference
 - QuIP$\#$: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
 - QuRating: Selecting High-Quality Data for Training Language Models
 - Q-value Regularized Transformer for Offline Reinforcement Learning
 - R2E: Turning any Github Repository into a Programming Agent Environment
 - Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency
 - Random features models: a way to study the success of naive imputation
 - Randomized Confidence Bounds for Stochastic Partial Monitoring
 - Random Latent Exploration for Deep Reinforcement Learning
 - Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning
 - Random matrix theory improved Fréchet mean of symmetric positive definite matrices
 - Random Scaling and Momentum for Non-smooth Non-convex Optimization
 - Ranking-based Client Imitation Selection for Efficient Federated Learning
 - RankSEG: A Consistent Ranking-based Framework for Segmentation
 - Rapid Learning without Catastrophic Forgetting in the Morris Water Maze
 - Rate-Optimal Policy Optimization for Linear Markov Decision Processes
 - RAUCA: A Novel Physical Adversarial Attack on Vehicle Detectors via Robust and Accurate Camouflage Generation
 - Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization
 - Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents
 - Receptive Fields As Experts in Convolutional Neural Architectures
 - ReconBoost: Boosting Can Achieve Modality Reconcilement
 - Recovering Labels from Local Updates in Federated Learning
 - Recovering the Pre-Fine-Tuning Weights of Generative Models
 - Recurrent Distance Filtering for Graph Representation Learning
 - Recurrent Early Exits for Federated Learning with Heterogeneous Clients
 - ReDiffuser: Reliable Decision-Making Using a Diffuser with Confidence Estimation
 - Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge
 - Reducing Balancing Error for Causal Inference via Optimal Transport
 - Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
 - Reducing Item Discrepancy via Differentially Private Robust Embedding Alignment for Privacy-Preserving Cross Domain Recommendation
 - Reducing sequential change detection to sequential estimation
 - Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion
 - Reference Neural Operators: Learning the Smooth Dependence of Solutions of PDEs on Geometric Deformations
 - Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints
 - Refining Minimax Regret for Unsupervised Environment Design
 - Reflected Flow Matching
 - Reflective Policy Optimization
 - ReGAL: Refactoring Programs to Discover Generalizable Abstractions
 - Regression Learning with Limited Observations of Multivariate Outcomes and Features
 - Regression with Multi-Expert Deferral
 - Regularized Q-learning through Robust Averaging
 - Regularizing with Pseudo-Negatives for Continual Self-Supervised Learning
 - Reinforcement Learning and Regret Bounds for Admission Control
 - Reinforcement Learning from Reachability Specifications: PAC Guarantees with Expected Conditional Distance
 - Reinforcement Learning within Tree Search for Fast Macro Placement
 - Reinformer: Max-Return Sequence Modeling for Offline RL
 - Rejuvenating image-GPT as Strong Visual Representation Learners
 - Relational DNN Verification With Cross Executional Bound Refinement
 - Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective
 - Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise
 - Relaxing the Accurate Imputation Assumption in Doubly Robust Learning for Debiased Collaborative Filtering
 - ReLU Network with Width $d+\mathcal{O}(1)$ Can Achieve Optimal Approximation Rate
 - ReLUs Are Sufficient for Learning Implicit Neural Representations
 - ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
 - ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
 - REMEDI: Corrective Transformations for Improved Neural Entropy Estimation
 - Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making
 - Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation
 - Rényi Pufferfish Privacy: General Additive Noise Mechanisms and Privacy Amplification by Iteration via Shift Reduction Lemmas
 - Reparameterized Importance Sampling for Robust Variational Bayesian Neural Networks
 - Repeat After Me: Transformers are Better than State Space Models at Copying
 - Replicable Learning of Large-Margin Halfspaces
 - Repoformer: Selective Retrieval for Repository-Level Code Completion
 - Representation Surgery for Multi-Task Model Merging
 - Representation Surgery: Theory and Practice of Affine Steering
 - Representing Molecules as Random Walks Over Interpretable Grammars
 - Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling
 - Reservoir Computing for Short High-Dimensional Time Series: an Application to SARS-CoV-2 Hospitalization Forecast
 - Reshape and Adapt for Output Quantization (RAOQ): Quantization-aware Training for In-memory Computing Systems
 - Residual-Conditioned Optimal Transport: Towards Structure-Preserving Unpaired and Paired Image Restoration
 - Residual Quantization with Implicit Neural Codebooks
 - Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree
 - REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates
 - Restoring balance: principled under/oversampling of data for optimal classification
 - Rethinking Adversarial Robustness in the Context of the Right to be Forgotten
 - Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits
 - Rethinking Decision Transformer via Hierarchical Reinforcement Learning
 - Rethinking DP-SGD in Discrete Domain: Exploring Logistic Distribution in the Realm of signSGD
 - Rethinking Generative Large Language Model Evaluation for Semantic Comprehension
 - Rethinking Guidance Information to Utilize Unlabeled Samples: A Label Encoding Perspective
 - Rethinking Independent Cross-Entropy Loss For Graph-Structured Data
 - Rethinking Momentum Knowledge Distillation in Online Continual Learning
 - Rethinking Optimization and Architecture for Tiny Language Models
 - Rethinking Specificity in SBDD: Leveraging Delta Score and Energy-Guided Diffusion
 - Rethinking the Flat Minima Searching in Federated Learning
 - Rethinking Transformers in Solving POMDPs
 - Retrieval Across Any Domains via Large-scale Pre-trained Model
 - Retrieval-Augmented Score Distillation for Text-to-3D Generation
 - Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness
 - Revealing Vision-Language Integration in the Brain with Multimodal Networks
 - Revisiting Character-level Adversarial Attacks for Language Models
 - Revisiting Context Aggregation for Image Matting
 - Revisiting Inexact Fixed-Point Iterations for Min-Max Problems: Stochasticity and Structured Nonconvexity
 - Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
 - Revisiting the Power of Prompt for Visual Tuning
 - Revisiting the Role of Language Priors in Vision-Language Models
 - Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark
 - Revisit the Essence of Distilling Knowledge through Calibration
 - Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling
 - Reward-Free Kernel-Based Reinforcement Learning
 - Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
 - Reward Shaping for Reinforcement Learning with An Assistant Reward Agent
 - Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
 - Reweighted Solutions for Weighted Low Rank Approximation
 - RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation
 - Rich-Observation Reinforcement Learning with Continuous Latent Dynamics
 - Riemannian Accelerated Zeroth-order Algorithm: Improved Robustness and Lower Query Complexity
 - Riemannian coordinate descent algorithms on matrix manifolds
 - Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models
 - RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
 - RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
 - Risk Aware Benchmarking of Large Language Models
 - Risk Estimation in a Markov Cost Process: Lower and Upper Bounds
 - Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient
 - Risk-Sensitive Reward-Free Reinforcement Learning with CVaR
 - RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
 - RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning
 - RLVF: Learning from Verbal Feedback without Overgeneralization
 - RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
 - RMIB: Representation Matching Information Bottleneck for Matching Text Representations
 - RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching
 - RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
 - RoboDreamer: Learning Compositional World Models for Robot Imagination
 - RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
 - RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models
 - Robust and Conjugate Gaussian Process Regression
 - Robust Classification via a Single Diffusion Model
 - Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
 - Robust Data-driven Prescriptiveness Optimization
 - Robust Graph Matching when Nodes are Corrupt
 - Robust Inverse Constrained Reinforcement Learning under Model Misspecification
 - Robust Inverse Graphics via Probabilistic Inference
 - Robust Learning-Augmented Dictionaries
 - Robustly Learning Single-Index Models via Alignment Sharpness
 - Robust Multi-Task Learning with Excess Risks
 - Robustness of Deep Learning for Accelerated MRI: Benefits of Diverse Training Data
 - Robustness of Nonlinear Representation Learning
 - Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space
 - Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination
 - Robust Stable Spiking Neural Networks
 - Robust Universal Adversarial Perturbations
 - Robust Yet Efficient Conformal Prediction Sets
 - RODEO: Robust Outlier Detection via Exposing Adaptive Out-of-Distribution Samples
 - Rolling Diffusion Models
 - Roping in Uncertainty: Robustness and Regularization in Markov Games
 - RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
 - Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
 - Run-Time Task Composition with Safety Semantics
 - RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning
 - S$\Omega$I: Score-based O-INFORMATION Estimation
 - S3GCL: Spectral, Swift, Spatial Graph Contrastive Learning
 - S3O: A Dual-Phase Approach for Reconstructing Dynamic Shape and Skeleton of Articulated Objects from Single Monocular Video
 - Safe and Robust Subgame Exploitation in Imperfect Information Games
 - Safe Exploration in Dose Finding Clinical Trials with Heterogeneous Participants
 - Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
 - Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
 - Saliency strikes back: How filtering out high frequencies improves white-box explanations
 - SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation
 - SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation
 - SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
 - Sample as you Infer: Predictive Coding with Langevin Dynamics
 - Sample Average Approximation for Conditional Stochastic Optimization with Dependent Data
 - Sample Complexity Bounds for Estimating Probability Divergences under Invariances
 - Sample-Efficient Multiagent Reinforcement Learning with Reset Replay
 - Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty
 - Sample-specific Masks for Visual Reprogramming-based Prompting
 - Sampling-based Multi-dimensional Recalibration
 - Sampling in Unit Time with Kernel Fisher-Rao Flow
 - Sampling is as easy as keeping the consistency: convergence guarantee for Consistency Models
 - SAPG: Split and Aggregate Policy Gradients
 - Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features
 - SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP
 - Scalable AI Safety via Doubly-Efficient Debate
 - Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency
 - Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
 - Scalable Multiple Kernel Clustering: Learning Clustering Structure from Expectation
 - Scalable Online Exploration via Coverability
 - Scalable Pre-training of Large Autoregressive Image Models
 - Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks
 - Scalable Safe Policy Improvement for Factored Multi-Agent MDPs
 - Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport
 - Scale-Free Image Keypoints Using Differentiable Persistent Homology
 - Scaling Beyond the GPU Memory Limit for Large Mixture-of-Experts Model Training
 - Scaling Down Deep Learning with MNIST-1D
 - Scaling Exponents Across Parameterizations and Optimizers
 - Scaling Laws for Fine-Grained Mixture of Experts
 - Scaling Laws for the Value of Individual Data Points in Machine Learning
 - Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
 - Scaling Speech Technology to 1,000+ Languages
 - Scaling Tractable Probabilistic Circuits: A Systems Perspective
 - SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code
 - Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency
 - SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
 - Score-Based Causal Discovery of Latent Variable Causal Models
 - Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation
 - SCoRe: Submodular Combinatorial Representation Learning
 - Scribble-Supervised Semantic Segmentation with Prototype-based Feature Augmentation
 - Second-Order Uncertainty Quantification: A Distance-Based Approach
 - See More Details: Efficient Image Super-Resolution by Experts Mining
 - Seesaw: Compensating for Nonlinear Reduction with Linear Computations for Private Inference
 - Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
 - Selecting Large Language Model to Fine-tune via Rectified Scaling Law
 - Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup
 - Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
 - Self-attention Networks Localize When QK-eigenspectrum Concentrates
 - Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes
 - Self-cognitive Denoising in the Presence of Multiple Noisy Label Sources
 - Self-Composing Policies for Scalable Continual Reinforcement Learning
 - Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction
 - Self-Correcting Self-Consuming Loops for Generative Model Training
 - Self-Driven Entropy Aggregation for Byzantine-Robust Heterogeneous Federated Learning
 - SelfIE: Self-Interpretation of Large Language Model Embeddings
 - Self-Infilling Code Generation
 - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
 - Self-Rewarding Language Models
 - Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation
 - Self-Supervised Interpretable End-to-End Learning via Latent Functional Modularity
 - SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
 - SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching
 - Semantically-correlated memories in a dense associative model
 - Semantic-Aware Human Object Interaction Image Generation
 - SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets
 - Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning
 - Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach
 - Sequential Disentanglement by Extracting Static Information From A Single Sequence Element
 - Sequential Kernel Goodness-of-fit Testing
 - Sequential Neural Score Estimation: Likelihood-Free Inference with Conditional Score Based Diffusion Models
 - SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic
 - SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning
 - Sharpness-Aware Data Generation for Zero-shot Quantization
 - Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss
 - Shifted Interpolation for Differential Privacy
 - SHINE: Shielding Backdoors in Deep Reinforcement Learning
 - Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
 - Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs
 - SiBBlInGS: Similarity-driven Building-Block Inference using Graphs across States
 - Sign Gradient Descent-based Neuronal Dynamics: ANN-to-SNN Conversion Beyond ReLU Network
 - Sign is Not a Remedy: Multiset-to-Multiset Message Passing for Learning on Heterophilic Graphs
 - Sign Rank Limitations for Inner Product Graph Decoders
 - SignSGD with Federated Defense: Harnessing Adversarial Attacks through Gradient Sign Decoding
 - SILVER: Single-loop variance reduction and application to federated learning
 - Simple Ingredients for Offline Reinforcement Learning
 - Simple linear attention language models balance the recall-throughput tradeoff
 - Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data
 - Simplicity Bias via Global Convergence of Sharpness Minimization
 - SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning
 - Simulation-Based Inference with Quantile Regression
 - Simulation of Graph Algorithms with Looped Transformers
 - Simultaneous identification of models and parameters of scientific simulators
 - Single-Model Attribution of Generative Models Through Final-Layer Inversion
 - Single-Trajectory Distributionally Robust Reinforcement Learning
 - SIN: Selective and Interpretable Normalization for Long-Term Time Series Forecasting
 - SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning
 - Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection
 - Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills
 - SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
 - SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
 - SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals
 - Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
 - Sliced-Wasserstein Estimation with Spherical Harmonics as Control Variates
 - Sliced Wasserstein with Random-Path Projecting Directions
 - Slicing Mutual Information Generalization Bounds for Neural Networks
 - Sliding Down the Stairs: How Correlated Latent Variables Accelerate Learning with Neural Networks
 - SLOG: An Inductive Spectral Graph Neural Network Beyond Polynomial Filter
 - Slot Abstractors: Toward Scalable Abstract Visual Reasoning
 - Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks
 - Small-loss Adaptive Regret for Online Convex Optimization
 - SMaRt: Improving GANs with Score Matching Regularity
 - Smoothing Proximal Gradient Methods for Nonsmooth Sparsity Constrained Optimization: Optimality Conditions and Global Convergence
 - Smooth Min-Max Monotonic Networks
 - Smoothness Adaptive Hypothesis Transfer Learning
 - Smooth Tchebycheff Scalarization for Multi-Objective Optimization
 - Sobolev Space Regularised Pre Density Models
 - Socialized Learning: Making Each Other Better Through Multi-Agent Collaboration
 - Soft Prompt Recovers Compressed LLMs, Transferably
 - Solving Hierarchical Information-Sharing Dec-POMDPs: An Extensive-Form Game Approach
 - Solving Poisson Equations using Neural Walk-on-Spheres
 - SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity
 - SPADE: Sparsity-Guided Debugging for Deep Neural Networks
 - SparQ Attention: Bandwidth-Efficient LLM Inference
 - Sparse and Structured Hopfield Networks
 - Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once
 - Sparse Dimensionality Reduction Revisited
 - Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
 - Sparse Inducing Points in Deep Gaussian Processes: Enhancing Modeling with Denoising Diffusion Variational Inference
 - Sparse is Enough in Fine-tuning Pre-trained Large Language Models
 - Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications
 - Sparser, Better, Deeper, Stronger: Improving Static Sparse Training with Exact Orthogonal Initialization
 - Sparsest Models Elude Pruning: An Exposé of Pruning’s Current Capabilities
 - Sparse-to-dense Multimodal Image Registration via Multi-Task Learning
 - SparseTSF: Modeling Long-term Time Series Forecasting with *1k* Parameters
 - Spectral Phase Transition and Optimal PCA in Block-Structured Spiked Models
 - Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions
 - Speech Self-Supervised Learning Using Diffusion Model Synthetic Data
 - SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
 - Spider: A Unified Framework for Context-dependent Concept Segmentation
 - Spike Distance Function as a Learning Objective for Spike Prediction
 - SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
 - SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN
 - Split-and-Denoise: Protect large language model inference with local differential privacy
 - Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting
 - Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
 - SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
 - SqueezeLLM: Dense-and-Sparse Quantization
 - SSL4Q: Semi-Supervised Learning of Quantum Data with Application to Quantum State Classification
 - Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations
 - Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms
 - Stability and Multigroup Fairness in Ranking with Uncertain Predictions
 - Stability Evaluation through Distributional Perturbation Analysis
 - Stability-Informed Initialization of Neural Ordinary Differential Equations
 - Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process
 - Stable Differentiable Causal Discovery
 - StableMask: Refining Causal Masking in Decoder-only Transformer
 - StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
 - Stacking Deep Set Networks and Pooling by Quantiles
 - StackSight: Unveiling WebAssembly through Large Language Models and Neurosymbolic Chain-of-Thought Decompilation
 - Standardized Interpretable Fairness Measures for Continuous Risk Scores
 - State-Constrained Zero-Sum Differential Games with One-Sided Information
 - State-Free Inference of State-Space Models: The *Transfer Function* Approach
 - Stationarity without mean reversion in improper Gaussian processes
 - Stationary Latent Weight Inference for Unreliable Observations from Online Test-Time Adaptation
 - Statistical Inference Under Constrained Selection Bias
 - Statistically Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution
 - Statistical Properties of Robust Satisficing
 - Statistical Test for Attention Maps in Vision Transformers
 - Stay on Topic with Classifier-Free Guidance
 - Stealing part of a production language model
 - Stealthy Imitation: Reward-guided Environment-free Policy Stealing
 - STEER: Assessing the Economic Rationality of Large Language Models
 - STELLA: Continual Audio-Video Pre-training with SpatioTemporal Localized Alignment
 - Stereographic Spherical Sliced Wasserstein Distances
 - Stereo Risk: A Continuous Modeling Approach to Stereo Matching
 - Stochastic Bandits with ReLU Neural Networks
 - Stochastic Conditional Diffusion Models for Robust Semantic Image Synthesis
 - Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features
 - Stochastic Interpolants with Data-Dependent Couplings
 - Stochastic Localization via Iterative Posterior Sampling
 - Stochastic Optimization with Arbitrary Recurrent Data Sampling
 - Stochastic positional embeddings improve masked image modeling
 - Stochastic Q-learning for Large Discrete Action Spaces
 - Stochastic Quantum Sampling for Non-Logconcave Distributions and Estimating Partition Functions
 - Stochastic Weakly Convex Optimization beyond Lipschitz Continuity
 - Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
 - Straight-Through Meets Sparse Recovery: the Support Exploration Algorithm
 - Strategic ML: How to Learn With Data That ‘Behaves’
 - StrokeNUWA—Tokenizing Strokes for Vector Graphic Synthesis
 - Structure-Aware E(3)-Invariant Molecular Conformer Aggregation Networks
 - Structure-based drug design by denoising voxel grids
 - Structured Chemistry Reasoning with Large Language Models
 - Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC
 - Structured Probabilistic Inference and Generative Modeling
 - Structure Your Data: Towards Semantic Graph Counterfactuals
 - StrWAEs to Invariant Representations
 - Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens
 - StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization
 - Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments
 - Subgoal-based Demonstration Learning for Formal Theorem Proving
 - Subgraphormer: Unifying Subgraph GNNs and Graph Transformers via Graph Products
 - Subhomogeneous Deep Equilibrium Models
 - Submodular framework for structured-sparse optimal transport
 - Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation
 - Sub-token ViT Embedding via Stochastic Resonance Transformers
 - Successor Features for Efficient Multi-Subject Controlled Text Generation
 - SuDA: Support-based Domain Adaptation for Sim2Real Hinge Joint Tracking with Flexible Sensors
 - Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction
 - Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
 - Supervised Matrix Factorization: Local Landscape Analysis and Applications
 - Surface-VQMAE: Vector-quantized Masked Auto-encoders on Molecular Surfaces
 - SurfPro: Functional Protein Design Based on Continuous Surface
 - Surprisingly Strong Performance Prediction with Neural Graph Features
 - Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee
 - Swallowing the Bitter Pill: Simplified Scalable Conformer Generation
 - Switchable Decision: Dynamic Neural Generation Networks
 - Switched Flow Matching: Eliminating Singularities via Switching ODEs
 - Switching the Loss Reduces the Cost in Batch Reinforcement Learning
 - SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
 - Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion
 - Symmetric Matrix Completion with ReLU Sampling
 - Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization
 - Symmetry Induces Structure and Constraint of Learning
 - Synergistic Integration of Coordinate Network and Tensorial Feature for Improving Neural Radiance Fields from Sparse Inputs
 - TabLog: Test-Time Adaptation for Tabular Data Using Logic Rules
 - Tabular Insights, Visual Impacts: Transferring Expertise from Tables to Images
 - Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation
 - Tackling Prevalent Conditions in Unsupervised Combinatorial Optimization: Cardinality, Minimum, Covering, and More
 - Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
 - Tandem Transformers for Inference Efficient LLMs
 - Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
 - Task-aware Orthogonal Sparse Network for Exploring Shared Knowledge in Continual Learning
 - Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models
 - Taylor Videos for Action Recognition
 - T-Cal: An Optimal Test for the Calibration of Predictive Models
 - Tell, Don't Show: Language Guidance Eases Transfer Across Domains in Images and Videos
 - Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning
 - Temporal Spiking Neural Networks with Synaptic Delay for Graph Reasoning
 - TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision
 - TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors
 - Testing the Feasibility of Linear Programs with Bandit Feedback
 - Test-Time Degradation Adaptation for Open-Set Image Restoration
 - Test-Time Model Adaptation with Only Forward Passes
 - Test-Time Regret Minimization in Meta Reinforcement Learning
 - Text, camera, action! Frontiers in controllable video generation
 - The Balanced-Pairwise-Affinities Feature Transform
 - The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents
 - The Computational Complexity of Finding Second-Order Stationary Points
 - The Effect of Weight Precision on the Neuron Count in Deep ReLU Networks
 - The effects of digital technology on youth development in low-and-middle-income countries
 - The Emergence of Reproducibility and Consistency in Diffusion Models
 - The Entropy Enigma: Success and Failure of Entropy Minimization
 - The Expressive Power of Path-Based Graph Neural Networks
 - The Fundamental Limits of Least-Privilege Learning
 - The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective
 - The Good, The Bad, and Why: Unveiling Emotions in Generative AI
 - The Illusion of State in State-Space Models
 - The Linear Representation Hypothesis and the Geometry of Large Language Models
 - The Max-Min Formulation of Multi-Objective Reinforcement Learning: From Theory to a Model-Free Algorithm
 - The Merit of River Network Topology for Neural Flood Forecasting
 - The Non-linear $F$-Design and Applications to Interactive Learning
 - Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability
 - Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians
 - Theoretical insights for diffusion guidance: A case study for Gaussian mixture models
 - Theory of Consistency Diffusion Models: Distribution Estimation Meets Fast Sampling
 - The Perception-Robustness Tradeoff in Deterministic Image Restoration
 - The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks
 - The Pitfalls of Next-Token Prediction
 - The Privacy Power of Correlated Noise in Decentralized Learning
 - The Relative Value of Prediction in Algorithmic Decision Making
 - Thermometer: Towards Universal Calibration for Large Language Models
 - The Role of Learning Algorithms in Collective Action
 - The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline
 - The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling
 - The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
 - Think Before You Act: Decision Transformers with Working Memory
 - TIC-TAC: A Framework For Improved Covariance Estimation In Deep Heteroscedastic Regression
 - Tight Partial Identification of Causal Effects with Marginal Distribution of Unmeasured Confounders
 - Tilt and Average : Geometric Adjustment of the Last Layer for Recalibration
 - Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks
 - Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers
 - TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning
 - Timer: Generative Pre-trained Transformers Are Large Time Series Models
 - Time Series Diffusion in the Frequency Domain
 - Time-Series Forecasting for Out-of-Distribution Generalization Using Invariant Learning
 - TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling
 - Time Weaver: A Conditional Time Series Generation Model
 - TimeX++: Learning Time-Series Explanations with Information Bottleneck
 - tinyBenchmarks: evaluating LLMs with fewer examples
 - TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge
 - tnGPS: Discovering Unknown Tensor Network Structure Search Algorithms via Large Language Models (LLMs)
 - To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO
 - To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models
 - Token-level Direct Preference Optimization
 - Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models
 - Topological Neural Networks go Persistent, Equivariant, and Continuous
 - Total Variation Distance Meets Probabilistic Inference
 - Total Variation Floodgate for Variable Importance Inference in Classification
 - To the Max: Reinventing Reward in Reinforcement Learning
 - Toward Adaptive Reasoning in Large Language Models with Thought Rollback
 - Toward Availability Attacks in 3D Point Clouds
 - Towards a Better Theoretical Understanding of Independent Subnetwork Training
 - Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model
 - Towards a Self-contained Data-driven Global Weather Forecasting Framework
 - Towards AutoAI: Optimizing a Machine Learning System with Black-box and Differentiable Components
 - Towards Causal Foundation Model: on Duality between Optimal Balancing and Attention
 - Towards Certified Unlearning for Deep Neural Networks
 - Towards Compositionality in Concept Learning
 - Towards efficient deep spiking neural networks construction with spiking activity based pruning
 - Towards Efficient Exact Optimization of Language Model Alignment
 - Towards Efficient Generative Large Language Model Serving: A Tutorial from Algorithms to Systems
 - Towards Efficient Spiking Transformer: a Token Sparsification Framework for Training and Inference Acceleration
 - Towards Efficient Training and Evaluation of Robust Models against $l_0$ Bounded Adversarial Perturbations
 - Towards General Algorithm Discovery for Combinatorial Optimization: Learning Symbolic Branching Policy from Bipartite Graph
 - Towards Generalization beyond Pointwise Learning: A Unified Information-theoretic Perspective
 - Towards General Neural Surrogate Solvers with Specialized Neural Accelerators
 - Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
 - Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation
 - Towards Modular LLMs by Building and Reusing a Library of LoRAs
 - Towards Neural Architecture Search through Hierarchical Generative Modeling
 - Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error
 - Towards Realistic Model Selection for Semi-supervised Learning
 - Towards Resource-friendly, Extensible and Stable Incomplete Multi-view Clustering
 - Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption
 - Towards Scalable and Versatile Weight Space Learning
 - Towards Theoretical Understanding of Learning Large-scale Dependent Data via Random Features
 - Towards Theoretical Understandings of Self-Consuming Generative Models
 - Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms
 - Towards Understanding Inductive Bias in Transformers: A View From Infinity
 - Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
 - Towards Unified Multi-granularity Text Detection with Interactive Attention
 - Trainable Transformer in Transformer
 - Trained Random Forests Completely Reveal your Dataset
 - Training-Free Long-Context Scaling of Large Language Models
 - Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization
 - Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
 - Transferable Facial Privacy Protection against Blind Face Restoration via Domain-Consistent Adversarial Obfuscation
 - Transferring Knowledge From Large Foundation Models to Small Downstream Models
 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
 - Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models
 - Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
 - Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
 - Transformers, parallel computation, and logarithmic depth
 - Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
 - Transforming and Combining Rewards for Aligning Large Language Models
 - Transitional Uncertainty with Layered Intermediate Predictions
 - Translating Subgraphs to Nodes Makes Simple GNNs Strong and Efficient for Subgraph Representation Learning
 - Translation Equivariant Transformer Neural Processes
 - Transolver: A Fast Transformer Solver for PDEs on General Geometries
 - Transport of Algebraic Structure to Latent Embeddings
 - TravelPlanner: A Benchmark for Real-World Planning with Language Agents
 - Triadic-OCD: Asynchronous Online Change Detection with Provable Robustness, Optimality, and Convergence
 - Triple Changes Estimator for Targeted Policies
 - Triplet Interaction Improves Graph Transformers: Accurate Molecular Graph Learning with Triplet Graph Transformers
 - Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning
 - TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
 - Truly No-Regret Learning in Constrained MDPs
 - Trustless Audits without Revealing Data or Models
 - Trust Regions for Explanations via Black-Box Probabilistic Certification
 - Trust the Model Where It Trusts Itself - Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption
 - Trustworthy Actionable Perturbations
 - Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning
 - Trustworthy Multi-modal Foundation Models and AI Agents (TiFA)
 - TSLANet: Rethinking Transformers for Time Series Representation Learning
 - Tuning-free Estimation and Inference of Cumulative Distribution Function under Local Differential Privacy
 - Tuning-Free Stochastic Optimization
 - Turnstile $\ell_p$ leverage score sampling with applications
 - TVE: Learning Meta-attribution for Transferable Vision Explainer
 - Two Fists, One Heart: Multi-Objective Optimization Based Strategy Fusion for Long-tailed Learning
 - Two Heads are Actually Better than One: Towards Better Adversarial Robustness via Transduction and Rejection
 - Two Heads Are Better Than One: Boosting Graph Sparse Training via Semantic and Topological Awareness
 - Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints
 - Two-Stage Shadow Inclusion Estimation: An IV Approach for Causal Inference under Latent Confounding and Collider Bias
 - Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
 - Two Tales of Single-Phase Contrastive Hebbian Learning
 - Two-timescale Derivative Free Optimization for Performative Prediction with Markovian Data
 - UGrid: An Efficient-And-Rigorous Neural Multigrid Solver for Linear PDEs
 - ULAREF: A Unified Label Refinement Framework for Learning with Inaccurate Supervision
 - ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback
 - Unapologetically Open Science -- the complexity and challenges of making openness win!
 - Unbiased Multi-Label Learning from Crowdsourced Annotations
 - Uncertainty-Aware Reward-Free Exploration with General Function Approximation
 - Uncertainty Estimation by Density Aware Evidential Deep Learning
 - Uncertainty for Active Learning on Graphs
 - Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise
 - Understanding and Diagnosing Deep Reinforcement Learning
 - Understanding Diffusion Models by Feynman's Path Integral
 - Understanding Finetuning for Factual Knowledge Extraction
 - Understanding Forgetting in Continual Learning with Linear Regression
 - Understanding Heterophily for Graph Neural Networks
 - Understanding Inter-Concept Relationships in Concept-Based Models
 - Understanding MLP-Mixer as a wide and sparse MLP
 - Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation
 - Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
 - Understanding Server-Assisted Federated Learning in the Presence of Incomplete Client Participation
 - Understanding Stochastic Natural Gradient Variational Inference
 - Understanding the Effects of Iterative Prompting on Truthfulness
 - Understanding the Impact of Introducing Constraints at Inference Time on Generalization Error
 - Understanding the Learning Dynamics of Alignment with Human Feedback
 - Understanding the Role of Large Language Models in Planning
 - Understanding the Training Speedup from Sampling with Approximate Losses
 - Understanding Unimodal Bias in Multimodal Deep Linear Networks
 - UniAudio: Towards Universal Audio Generation with Large Language Models
 - UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning
 - Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding
 - Unified Training of Universal Time Series Forecasting Transformers
 - Uniformly Stable Algorithms for Adversarial Training and Beyond
 - Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models
 - Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations
 - Unifying Image Processing as Visual Prompting Question Answering
 - Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes
 - Universal Gradient Methods for Stochastic Convex Optimization
 - Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
 - Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts
 - Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training
 - Unlock the Cognitive Generalization of Deep Reinforcement Learning via Granular Ball Representation
 - Unmasking Vulnerabilities: Cardinality Sketches under Adaptive Inputs
 - Unraveling the Impact of Heterophilic Structures on Graph Positive-Unlabeled Learning
 - Unsupervised Concept Discovery Mitigates Spurious Correlations
 - Unsupervised Domain Adaptation for Anatomical Structure Detection in Ultrasound Images
 - Unsupervised Episode Generation for Graph Meta-learning
 - Unsupervised Evaluation of Code LLMs with Round-Trip Correctness
 - Unsupervised Parameter-free Simplicial Representation Learning with Scattering Transforms
 - Unsupervised Representation Learning of Brain Activity via Bridging Voxel Activity and Functional Connectivity
 - Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
 - Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
 - Unveiling Privacy, Memorization, and Input Curvature Links
 - Unveiling the Cycloid Trajectory of EM Iterations in Mixed Linear Regression
 - Unveiling the Dynamics of Information Interplay in Supervised Learning
 - Unveiling the Potential of AI for Nanomaterial Morphology Prediction
 - UP2ME: Univariate Pre-training to Multivariate Fine-tuning as a General-purpose Framework for Multivariate Time Series Analysis
 - UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers
 - UPOCR: Towards Unified Pixel-Level OCR Interface
 - Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers
 - Using AI Uncertainty Quantification to Improve Human Decision-Making
 - Using Left and Right Brains Together: Towards Vision and Language Planning
 - Using Uncertainty Quantification to Characterize and Improve Out-of-Domain Learning for PDEs
 - USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval
 - Vague Prototype-Oriented Diffusion Model for Multi-Class Anomaly Detection
 - Value-Evolutionary-Based Reinforcement Learning
 - Vanilla Bayesian Optimization Performs Great in High Dimensions
 - Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
 - Variational Inference with Coverage Guarantees in Simulation-Based Inference
 - Variational Learning is Effective for Large Deep Networks
 - Variational Linearized Laplace Approximation for Bayesian Deep Learning
 - Variational Partial Group Convolutions for Input-Aware Partial Equivariance of Rotations and Color-Shifts
 - Variational Schrödinger Diffusion Models
 - Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
 - Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations
 - Vector Quantization Pretraining for EEG Time Series with Random Projection and Phase Alignment
 - Verification of Machine Unlearning is Fragile
 - Verifying message-passing neural networks via topology-based bounds tightening
 - Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
 - Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
 - VideoPoet: A Large Language Model for Zero-Shot Video Generation
 - VideoPrism: A Foundational Visual Encoder for Video Understanding
 - video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
 - Viewing Transformers Through the Lens of Long Convolutions Layers
 - VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception
 - ViP: A Differentially Private Foundation Model for Computer Vision
 - VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
 - Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
 - Vision Transformers as Probabilistic Expansion from Learngene
 - Visual Representation Learning with Stochastic Frame Prediction
 - Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
 - Visual Transformer with Differentiable Channel Selection: An Information Bottleneck Inspired Approach
 - VNN: Verification-Friendly Neural Networks with Hard Robustness Guarantees
 - Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions
 - VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
 - VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling
 - WARM: On the Benefits of Weight Averaged Reward Models
 - Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformer
 - Watermarks in the Sand: Impossibility of Strong Watermarking for Language Models
 - Watermark Stealing in Large Language Models
 - WAVES: Benchmarking the Robustness of Image Watermarks
 - Weakly Convex Regularisers for Inverse Problems: Convergence of Critical Points and Primal-Dual Optimisation
 - Weakly-Supervised Residual Evidential Learning for Multi-Instance Uncertainty Estimation
 - Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
 - WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
 - Weighted distance nearest neighbor condensing
 - Weisfeiler-Leman at the margin: When more expressivity matters
 - Weisfeiler Leman for Euclidean Equivariant Machine Learning
 - What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
 - What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
 - What is Dataset Distillation Learning?
 - What is the Long-Run Distribution of Stochastic Gradient Descent? A Large Deviations Analysis
 - What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
 - "What robots have taught me about machine learning"
 - What’s the score? Automated Denoising Score Matching for Nonlinear Diffusions
 - What Will My Model Forget? Forecasting Forgotten Examples in Language Model Refinement
 - What Would Gauss Say About Representations? Probing Pretrained Image Models using Synthetic Gaussian Benchmarks
 - When and How Does In-Distribution Label Help Out-of-Distribution Detection?
 - When Do Skills Help Reinforcement Learning? A Theoretical Analysis of Temporal Abstractions
 - When is Transfer Learning Possible?
 - When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
 - When Representations Align: Universality in Representation Learning Dynamics
 - When Will Gradient Regularization Be Harmful?
 - Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning
 - Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models
 - Why Do Animals Need Shaping? A Theory of Task Composition and Curriculum Learning
 - Why do Variational Autoencoders Really Promote Disentanglement?
 - Why Do You Grok? A Theoretical Analysis on Grokking Modular Addition
 - Why Larger Language Models Do In-context Learning Differently?
 - Winner-takes-all learners are geometry-aware conditional density estimators
 - WISER: Weak Supervision and Supervised Representation Learning to Improve Drug Response Prediction in Cancer
 - WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
 - Workshop on Mechanistic Interpretability
 - Workshop on Theoretical Foundations of Foundation Models (TF2M)
 - Wukong: Towards a Scaling Law for Large-Scale Recommendation
 - X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation
 - xT: Nested Tokenization for Larger Context in Large Images
 - Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement
 - Zero-Shot Reinforcement Learning via Function Encoders
 - Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
 - Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach
 - Zeroth-Order Methods for Constrained Nonconvex Nonsmooth Stochastic Optimization
 