Downloads 2023
Number of events: 1908

 - $H$-Consistency Bounds for Pairwise Misranking Loss Surrogates
 - $\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
 - 2D-Shapley: A Framework for Fragmented Data Valuation
 - 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML)
 - 2nd ICML Workshop on Machine Learning for Astrophysics
 - 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning
 - 2nd Workshop on Formal Verification of Machine Learning
 - 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH)
 - abess: A Fast Best-Subset Selection Library in Python and R
 - AbODE: Ab initio antibody design using conjoined ODEs
 - Abstracting Imperfect Information Away from Two-Player Zero-Sum Games
 - Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization
 - A/B Testing in Network Data with Covariate-Adaptive Randomization
 - ACAT: Adversarial Counterfactual Attention for Classification and Detection in Medical Imaging
 - A Category-theoretical Meta-analysis of Definitions of Disentanglement
 - Accelerated Cyclic Coordinate Dual Averaging with Extrapolation for Composite Convex Optimization
 - Accelerated Infeasibility Detection of Constrained Optimization and Fixed-Point Iterations
 - Accelerated Primal-Dual Methods for Convex-Strongly-Concave Saddle Point Problems
 - Accelerated Stochastic Optimization Methods under Quasar-convexity
 - Accounting For Informative Sampling When Learning to Forecast Treatment Outcomes Over Time
 - Accuracy on the Curve: On the Nonlinear Correlation of ML Performance Between Data Subpopulations
 - Achieving Hierarchy-Free Approximation for Bilevel Programs with Equilibrium Constraints
 - Achieving High Accuracy with PINNs via Energy Natural Gradient Descent
 - Achieving Linear Speedup in Non-IID Federated Bilevel Learning
 - A Closer Look at Few-shot Classification Again
 - A Closer Look at Self-Supervised Lightweight Vision Transformers
 - A Closer Look at the Intervention Procedure of Concept Bottleneck Models
 - A Complete Expressiveness Hierarchy for Subgraph GNNs via Subgraph Weisfeiler-Lehman Tests
 - A Conditional Normalizing Flow for Accelerated Multi-Coil MR Imaging
 - A Connection between One-Step RL and Critic Regularization in Reinforcement Learning
 - A Coupled Flow Approach to Imitation Learning
 - A Critical Revisit of Adversarial Robustness in 3D Point Cloud Recognition with Diffusion-Driven Purification
 - A Critical View of Vision-Based Long-Term Dynamics Prediction Under Environment Misalignment
 - Action Matching: Learning Stochastic Dynamics from Samples
 - Active causal structure learning with advice
 - Active Learning based Structural Inference
 - Active Policy Improvement from Multiple Black-box Oracles
 - Active Ranking of Experts Based on their Performances in Many Tasks
 - Actor-Critic Alignment for Offline-to-Online Reinforcement Learning
 - AdaBoost is not an Optimal Weak to Strong Learner
 - AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation
 - AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
 - Adapting to game trees in zero-sum imperfect information games
 - Adaptive Annealed Importance Sampling with Constant Rate Progress
 - Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics
 - Adaptive Compositional Continual Meta-Learning
 - Adaptive Computation with Elastic Input Sequence
 - Adaptive Coordination in Social Embodied Rearrangement
 - Adaptive Estimation of Graphical Models under Total Positivity
 - Adaptive Identification of Populations with Treatment Benefit in Clinical Trials: Machine Learning Challenges and Solutions
 - Adaptive IMLE for Few-shot Pretraining-free Generative Modelling
 - Adaptively Weighted Data Augmentation Consistency Regularization for Robust Optimization under Concept Shift
 - Adaptive Smoothing Gradient Learning for Spiking Neural Networks
 - Adaptive Whitening in Neural Populations with Gain-modulating Interneurons
 - Additive Causal Bandits with Unknown Graph
 - Addressing Budget Allocation and Revenue Allocation in Data Market Environments Using an Adaptive Sampling Algorithm
 - A Deep Conjugate Direction Method for Iteratively Solving Linear Systems
 - A Distribution Optimization Framework for Confidence Bounds of Risk Measures
 - Adversarial Cheap Talk
 - Adversarial Classification: Necessary Conditions and Geometric Flows
 - Adversarial Collaborative Learning on Non-IID Features
 - Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples
 - Adversarial Learning of Distributional Reinforcement Learning
 - Adversarially Robust PAC Learnability of Real-Valued Functions
 - Adversarial Parameter Attack on Deep Neural Networks
 - Adversarial Policies Beat Superhuman Go AIs
 - Adversarial robustness of amortized Bayesian inference
 - A Fast Optimistic Method for Monotone Variational Inequalities
 - A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel
 - A Flexible Diffusion Model
 - A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback
 - A Fully First-Order Method for Stochastic Bilevel Optimization
 - A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems
 - A Generalization of ViT/MLP-Mixer to Graphs
 - A General Representation Learning Framework with Generalization Performance Guarantees
 - A General Theory for Federated Optimization with Asynchronous and Heterogeneous Clients Updates
 - A Gromov--Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening
 - A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining
 - A Hybrid Quantum-Classical Approach based on the Hadamard Transform for the Convolutional Layer
 - A Kernel-Based View of Language Model Fine-Tuning
 - A Kernelized Stein Discrepancy for Biological Sequences
 - A Kernel Stein Test of Goodness of Fit for Sequential Models
 - A Large-Scale Study of Probabilistic Calibration in Neural Network Regression
 - A Law of Robustness beyond Isoperimetry
 - Algorithmic Collective Action in Machine Learning
 - Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions
 - Algorithms for bounding contribution for histogram estimation under user-level privacy
 - Aligning Language Models with Preferences through $f$-divergence Minimization
 - A Likelihood Approach to Nonparametric Estimation of a Singular Distribution Using Deep Generative Models
 - All in a Row: Compressed Convolution Networks for Graphs
 - Alternately Optimized Graph Neural Networks
 - Alternating Local Enumeration (TnALE): Solving Tensor Network Structure Search with Fewer Evaluations
 - A Mathematical Model for Curriculum Learning for Parities
 - A Model-Based Method for Minimizing CVaR and Beyond
 - A Model-free Closeness-of-influence Test for Features in Supervised Learning
 - A Modern Look at the Relationship between Sharpness and Generalization
 - An Adaptive Entropy-Regularization Framework for Multi-Agent Reinforcement Learning
 - Analysis of Error Feedback in Federated Non-Convex Optimization with Biased Compression: Fast Convergence and Partial Participation
 - Analyzing Convergence in Quantum Neural Networks: Deviations from Neural Tangent Kernels
 - Analyzing Diffusion as Serial Reproduction
 - Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano
 - Anchor Sampling for Federated Learning with Partial Client Participation
 - A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee
 - A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints
 - An Effective Meaningful Way to Evaluate Survival Models
 - A Neural PDE Solver with Temporal Stencil Modeling
 - A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree
 - A New PHO-rmula for Improved Performance of Semi-Structured Networks
 - An Information-Theoretic Analysis of Nonstationary Bandit Learning
 - An Instrumental Variable Approach to Confounded Off-Policy Evaluation
 - An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning
 - An SDE for Modeling SAM: Theory and Insights
 - Answering Complex Logical Queries on Knowledge Graphs via Query Computation Tree Optimization
 - Anti-Exploration by Random Network Distillation
 - A Picture of the Space of Typical Learnable Tasks
 - Applied Online Algorithms with Heterogeneous Predictors
 - Approximate Causal Effect Identification under Weak Confounding
 - Approximately Optimal Core Shapes for Tensor Decompositions
 - Approximate Stein Classes for Truncated Density Estimation
 - Approximation Algorithms for Fair Range Clustering
 - Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
 - Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN
 - Are Diffusion Models Vulnerable to Membership Inference Attacks?
 - Are Equivariant Equilibrium Approximators Beneficial?
 - Are Gaussian Data All You Need? The Extents and Limits of Universality in High-Dimensional Generalized Linear Estimation
 - A Reinforcement Learning Framework for Dynamic Mediation Analysis
 - Are labels informative in semi-supervised learning? Estimating and leveraging the missing-data mechanism.
 - Are Large Kernels Better Teachers than Transformers for ConvNets?
 - Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations
 - Are Random Decompositions all we need in High Dimensional Bayesian Optimisation?
 - Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
 - A Robust Optimisation Perspective on Counterexample-Guided Repair of Neural Networks
 - A Robust Test for the Stationarity Assumption in Sequential Decision Making
 - Artificial Intelligence & Human Computer Interaction
 - A Scalable Frank-Wolfe-Based Algorithm for the Max-Cut SDP
 - A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models
 - A Statistical Perspective on Retrieval-Based Models
 - A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs
 - A Study on Transformer Configuration and Training Objective
 - Atari-5: Distilling the Arcade Learning Environment down to Five Games
 - A Theoretical Analysis of the Learning Dynamics under Class Imbalance
 - A theory of continuous generative flow networks
 - A theory of representation learning gives a deep generalisation of kernel methods
 - A Three-regime Model of Network Pruning
 - A Toy Model of Universality: Reverse Engineering how Networks Learn Group Operations
 - Attention-Based Recurrence for Multi-Agent Reinforcement Learning under Stochastic Partial Observability
 - Attribute-Efficient PAC Learning of Low-Degree Polynomial Threshold Functions with Nasty Noise
 - Attributing Image Generative Models using Latent Fingerprints
 - A Two-Stage Active Learning Algorithm for k-Nearest Neighbors
 - AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
 - A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
 - A Unified Optimization Framework of ANN-SNN Conversion: Towards Optimal Mapping from Activation Values to Firing Rates
 - A Unifying Framework to the Analysis of Interaction Methods using Synergy Functions
 - A Universal Unbiased Method for Classification from Aggregate Observations
 - AutoCoreset: An Automatic Practical Coreset Construction Framework
 - Auto-Differentiation of Relational Computations for Very Large Scale Machine Learning
 - Automated Search for Conjectures on Mathematical Constants using Analysis of Integer Sequences
 - Automatically Auditing Large Language Models via Discrete Optimization
 - Automatically marginalized MCMC in probabilistic programming
 - Automatic Data Augmentation via Invariance-Constrained Learning
 - Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning
 - Autoregressive Diffusion Model for Graph Generation
 - Auxiliary Learning as an Asymmetric Bargaining Game
 - Auxiliary Modality Learning with Generalized Curriculum Distillation
 - Averaged Method of Multipliers for Bi-Level Optimization without Lower-Level Strong Convexity
 - A Watermark for Large Language Models
 - Bag of Tricks for Training Data Extraction from Language Models
 - Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits
 - Bandit Online Linear Optimization with Hints and Queries
 - Bandits with Knapsacks: Advice on Time-Varying Demands
 - Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning
 - Bayesian Design Principles for Frequentist Sequential Learning
 - Bayesian Estimation of Differential Privacy
 - Bayesian Neural Networks Avoid Encoding Complex and Perturbation-Sensitive Concepts
 - Bayesian online change point detection with Hilbert space approximate Student-t process
 - Bayesian Progressive Deep Topic Model with Knowledge Informed Textual Data Coarsening Process
 - Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models
 - Bayes-optimal Learning of Deep Random Networks of Extensive-width
 - Beam Tree Recursive Cells
 - BEATs: Audio Pre-Training with Acoustic Tokenizers
 - Behavior Contrastive Learning for Unsupervised Skill Discovery
 - Benign Overfitting in Deep Neural Networks under Lazy Training
 - Benign Overfitting in Two-layer ReLU Convolutional Neural Networks
 - Best Arm Identification in Multi-Agent Multi-Armed Bandits
 - Best of Both Worlds Policy Optimization
 - Better Diffusion Models Further Improve Adversarial Training
 - Better Training of GFlowNets with Local Credit and Incomplete Trajectories
 - Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic
 - Beyond Homophily: Reconstructing Structure for Graph-agnostic Clustering
 - Beyond In-Domain Scenarios: Robust Density-Aware Calibration
 - Beyond Lipschitz Smoothness: A Tighter Analysis for Nonconvex Optimization
 - Beyond Reward: Offline Preference-guided Policy Optimization
 - Beyond the Edge of Stability via Two-step Gradient Updates
 - Beyond the Universal Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels
 - Beyond Uniform Lipschitz Condition in Differentially Private Optimization
 - Biases in Evaluation of Molecular Optimization Methods and Bias Reduction Strategies
 - BiBench: Benchmarking and Analyzing Network Binarization
 - Bidirectional Adaptation for Robust Semi-Supervised Learning with Inconsistent Data Distributions
 - Bidirectional Learning for Offline Model-based Biological Sequence Design
 - Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
 - Bi-directional Masks for Efficient N:M Sparse Training
 - Bigger, Better, Faster: Human-level Atari with human-level efficiency
 - Bilevel Optimization with Coupled Decision-Dependent Distributions
 - BiRT: Bio-inspired Replay in Vision Transformers for Continual Learning
 - Bit Allocation using Optimization
 - Blackout Diffusion: Generative Diffusion Models in Discrete-State Spaces
 - B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding
 - BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
 - Block Subsampled Randomized Hadamard Transform for Nyström Approximation on Distributed Architectures
 - Blockwise Stochastic Variance-Reduced Methods with Parallel Speedup for Multi-Block Bilevel Optimization
 - Blossom: an Anytime Algorithm for Computing Optimal Decision Trees
 - BNN-DP: Robustness Certification of Bayesian Neural Networks via Dynamic Programming
 - Boosting Graph Contrastive Learning via Graph Contrastive Saliency
 - Boosting Offline Reinforcement Learning with Action Preference Query
 - Bootstrap in High Dimension with Low Computation
 - Bootstrapped Representations in Reinforcement Learning
 - BPipe: Memory-Balanced Pipeline Parallelism for Training Large Language Models
 - Brainformers: Trading Simplicity for Efficiency
 - Brauer's Group Equivariant Neural Networks
 - Building Neural Networks on Matrix Manifolds: A Gyrovector Space Approach
 - Buying Information for Stochastic Optimization
 - Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting
 - CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
 - Calibrating Multimodal Learning
 - Can Forward Gradient Match Backpropagation?
 - Can Large Language Models Reason about Program Invariants?
 - Can Neural Network Memorization Be Localized?
 - Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
 - CataBEEM: Integrating Latent Interaction Categories in Node-wise Community Detection Models for Network Data
 - Causal Bounds in Quasi-Markovian Graphs
 - Causal Discovery with Latent Confounders Based on Higher-Order Cumulants
 - Causal Isotonic Calibration for Heterogeneous Treatment Effects
 - Causal Modeling of Policy Interventions From Treatment–Outcome Sequences
 - Causal Proxy Models for Concept-based Model Explanations
 - Causal Strategic Classification: A Tale of Two Shifts
 - Causal Structure Learning for Latent Intervened Non-stationary Data
 - Cell-Free Latent Go-Explore
 - Certified Robust Neural Networks: Generalization and Corruption Resistance
 - Certifying Ensembles: A General Certification Theory with S-Lipschitzness
 - Challenges in Deployable Generative AI
 - Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning
 - Change is Hard: A Closer Look at Subpopulation Shift
 - Chemically Transferable Generative Backmapping of Coarse-Grained Proteins
 - CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
 - ChiPFormer: Transferable Chip Placement via Offline Decision Transformer
 - CircuitNet: A Generic Neural Network to Realize Universal Circuit Motif Modeling
 - CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms
 - ClimaX: A foundation model for weather and climate
 - CLIPood: Generalizing CLIP to Out-of-Distributions
 - Cluster Explanation via Polyhedral Descriptions
 - ClusterFuG: Clustering Fully connected Graphs by Multicut
 - Cluster-Specific Predictions with Multi-Task Gaussian Processes
 - CLUSTSEG: Clustering for Universal Segmentation
 - CLUTR: Curriculum Learning via Unsupervised Task Representation Learning
 - Coarse-to-Fine: a Hierarchical Diffusion Model for Molecule Generation in 3D
 - CO-BED: Information-Theoretic Contextual Optimization via Bayesian Experimental Design
 - Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning Using Independent Component Analysis
 - CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks
 - CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification
 - CodeIPPrompt: Intellectual Property Infringement Assessment of Code Language Models
 - Coder Reviewer Reranking for Code Generation
 - CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis
 - Coin Sampling: Gradient-Based Bayesian Inference without Learning Rates
 - COLA: Orchestrating Error Coding and Learning for Robust Neural Network Inference Against Hardware Defects
 - Cold Analysis of Rao-Blackwellized Straight-Through Gumbel-Softmax Gradient Estimator
 - Collaborative Causal Inference with Fair Incentives
 - Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits
 - Combinatorial Neural Bandits
 - COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
 - Communication-Constrained Bandits under Additive Gaussian Noise
 - Communication-Efficient Federated Hypergradient Computation via Aggregated Iterative Differentiation
 - Comparison of meta-learners for estimating multi-valued treatment heterogeneous effects
 - Competing for Shareable Arms in Multi-Player Multi-Armed Bandits
 - Competitive Gradient Optimization
 - Complementary Attention for Multi-Agent Reinforcement Learning
 - Complexity of Block Coordinate Descent with Proximal Regularization and Applications to Wasserstein CP-dictionary Learning
 - Composer: Creative and Controllable Image Synthesis with Composable Conditions
 - Compositional Exemplars for In-context Learning
 - Compositional Score Modeling for Simulation-Based Inference
 - Compressed Decentralized Proximal Stochastic Gradient Method for Nonconvex Composite Problems with Heterogeneous Data
 - Compressing Tabular Data via Latent Variable Estimation
 - Computational Asymmetries in Robust Classification
 - Computational Doob h-transforms for Online Filtering of Discretely Observed Diffusions
 - Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings
 - Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
 - Concept-based Explanations for Out-of-Distribution Detectors
 - ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction
 - Concurrent Shuffle Differential Privacy Under Continual Observation
 - Conditional Graph Information Bottleneck for Molecular Relational Learning
 - Conditionally Strongly Log-Concave Generative Models
 - Conditional Tree Matching for Inference-Time Adaptation of Tree Prediction Models
 - Conditions and Assumptions for Constraint-based Causal Structure Learning
 - Cones: Concept Neurons in Diffusion Models for Customized Generation
 - Confidence and Dispersity Speak: Characterizing Prediction Matrix for Unsupervised Accuracy Estimation
 - Conformal Inference is (almost) Free for Neural Networks Trained with Early Stopping
 - Conformalization of Sparse Generalized Linear Models
 - Conformal Prediction for Federated Uncertainty Quantification Under Label Shift
 - Conformal Prediction Sets for Graph Neural Networks
 - Conformal Prediction with Missing Values
 - Consistency Models
 - Consistency of Multiple Kernel Clustering
 - Constant Matters: Fine-grained Error Bound on Differentially Private Continual Observation
 - Constrained Causal Bayesian Optimization
 - Constrained Decision Transformer for Offline Safe Reinforcement Learning
 - Constrained Efficient Global Optimization of Expensive Black-box Functions
 - Constrained Monotonic Neural Networks
 - Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching
 - Constrained Phi-Equilibria
 - Constraint Reasoning Embedded Structured Prediction
 - Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning
 - Context Consistency Regularization for Label Sparsity in Time Series
 - Contextual Combinatorial Bandits with Probabilistically Triggered Arms
 - Contextual Conservative Interleaving Bandits
 - Contextual Reliability: When Different Features Matter in Different Contexts
 - Continual Learners are Incremental Model Generalizers
 - Continual Learning in Linear Classification on Separable Data
 - Continual Task Allocation in Meta-Policy Network via Sparse Prompting
 - Continual Vision-Language Representation Learning with Off-Diagonal Information
 - Continuation Path Learning for Homotopy Optimization
 - Continuously Parameterized Mixture Models
 - Continuous Spatiotemporal Transformer
 - ContraBAR: Contrastive Bayes-Adaptive Deep RL
 - Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
 - Contrastive Learning Meets Homophily: Two Birds with One Stone
 - Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
 - Controllability-Aware Unsupervised Skill Discovery
 - Controllable Neural Symbolic Regression
 - Controlled Differential Equations on Long Sequences via Non-standard Wavelets
 - Controlled Text Generation with Natural Language Instructions
 - Controlling Posterior Collapse by an Inverse Lipschitz Constraint on the Decoder Network
 - Controlling Type Confounding in Ad Hoc Teamwork with Instance-wise Teammate Feedback Rectification
 - Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data
 - Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity
 - Convex Geometry of ReLU-layers, Injectivity on the Ball and Local Reconstruction
 - Cooperation in the Latent Space: The Benefits of Adding Mixture Components in Variational Autoencoders
 - Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation
 - Cooperative Open-ended Learning Framework for Zero-Shot Coordination
 - Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets
 - Coordinate Descent Methods for Fractional Minimization
 - Correcting discount-factor mismatch in on-policy policy gradient methods
 - Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
 - “Could it have been different?” Counterfactuals in Minds and Machines
 - Counterfactual Analysis in Dynamic Latent State Models
 - Counterfactual Identifiability of Bijective Causal Models
 - Coupled Variational Autoencoder
 - Covariate balancing using the integral probability metric for causal inference
 - Crafting Training Degradation Distribution for the Accuracy-Generalization Trade-off in Real-World Super-Resolution
 - Cramming: Training a Language Model on a single GPU in one day.
 - CRISP: Curriculum based Sequential neural decoders for Polar code family
 - Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
 - Cross-Entropy Loss Functions: Theoretical Analysis and Applications
 - Cross-Modal Fine-Tuning: Align then Refine
 - CrossSplit: Mitigating Label Noise Memorization through Data Splitting
 - CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations
 - Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments
 - Curious Replay for Model-based Adaptation
 - Curriculum Co-disentangled Representation Learning across Multiple Environments for Social Recommendation
 - Cut your Losses with Squentropy
 - Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization
 - D2Match: Leveraging Deep Learning and Degeneracy for Subgraph Matching
 - DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization
 - Data-Copying in Generative Models: A Formal Framework
 - Data-Derived Weak Universal Consistency
 - Data-Driven Subgroup Identification for Linear Regression
 - Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least
 - Data Efficient Neural Scaling Law via Model Reusing
 - Data Feedback Loops: Model-driven Amplification of Dataset Biases
 - Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value
 - Data Poisoning Attacks Against Multimodal Encoders
 - Data Representations' Study of Latent Image Manifolds
 - Dataset Distillation with Convexified Implicit Gradients
 - Data Structures for Density Estimation
 - DDGR: Continual Learning with Deep Diffusion-based Generative Replay
 - Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
 - Decentralized Stochastic Bilevel Optimization with Improved per-Iteration Complexity
 - Decoding Layer Saliency in Language Transformers
 - DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design
 - Deep Anomaly Detection under Labeling Budget Constraints
 - Deep Clustering with Incomplete Noisy Pairwise Annotations: A Geometric Regularization Approach
 - Deep Generative Symbolic Regression with Monte-Carlo-Tree-Search
 - Deep Graph Representation Learning and Optimization for Influence Maximization
 - Deep Laplacian-based Options for Temporally-Extended Exploration
 - Deep Latent State Space Models for Time-Series Generation
 - Deep linear networks can benignly overfit when shallow ones do
 - Deep Perturbation Learning: Enhancing the Network Performance via Image Perturbations
 - Deep Regression Unlearning
 - Deep Temporal Sets with Evidential Reinforced Attentions for Unique Behavioral Pattern Discovery
 - Defects of Convolutional Decoder Networks in Frequency Representation
 - Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
 - Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback
 - Delay-agnostic Asynchronous Coordinate Update Algorithm
 - Delayed Bandits: When Do Intermediate Observations Help?
 - Delayed Feedback in Kernel Bandits
 - Delving into Noisy Label Detection with Clean Data
 - Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum
 - Demystifying Disagreement-on-the-Line in High Dimensions
 - Demystifying Uneven Vulnerability of Link Stealing Attacks against Graph Neural Networks
 - Denoising MCMC for Accelerating Diffusion-Based Generative Models
 - DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models
 - DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
 - Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score
 - Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions
 - Detecting Out-of-distribution Data through In-distribution Class Prior
 - Deterministic equivalent and error universality of deep random features learning
 - DevFormer: A Symmetric Transformer for Context-Aware Device Placement
 - Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation
 - DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning
 - Difference-in-Differences Meets Tree-based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding
 - Difference of submodular minimization via DC programming
 - Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators
 - Differentiable and Transportable Structure Learning
 - Differentiable Multi-Target Causal Bayesian Experimental Design
 - Differentiable Simulations for Enhanced Sampling of Rare Events
 - Differentiable Tree Operations Promote Compositional Generalization
 - Differentially Private Distributed Bayesian Linear Regression with MCMC
 - Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards
 - Differentially Private Hierarchical Clustering with Provable Approximation Guarantees
 - Differentially Private Optimization on Large Model at Small Cost
 - Differentially Private Sharpness-Aware Training
 - Differentially Private Stochastic Convex Optimization under a Quantile Loss Function
 - Differential Privacy has Bounded Impact on Fairness in Classification
 - Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language Models
 - Diffusion Based Representation Learning
 - Diffusion Models are Minimax Optimal Distribution Estimators
 - Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines?
 - Diffusion Models for Black-Box Optimization
 - Dimensionality Reduction for General KDE Mode Finding
 - Dimension-independent Certified Neural Network Watermarks via Mollifier Smoothing
 - Dink-Net: Neural Clustering on Large Graphs
 - Directed Chain Generative Adversarial Networks
 - Direct Parameterization of Lipschitz-Bounded Deep Networks
 - Dirichlet Diffusion Score Model for Biological Sequence Generation
 - DiscoBAX - Discovery of optimal intervention sets in genomic experiment design
 - Discover and Cure: Concept-aware Mitigation of Spurious Correlation
 - Discovering Agent-Centric Latent States in Theory and in Practice
 - Discovering Object-Centric Generalized Value Functions From Pixels
 - Discover-Then-Rank Unlabeled Support Vectors in the Dual Space for Multi-Class Active Learning
 - Discrete Continuous Optimization Framework for Simultaneous Clustering and Training in Mixture Models
 - Discrete Key-Value Bottleneck
 - Disentangled Generative Models for Robust Prediction of System Dynamics
 - Disentangled Multi-Fidelity Deep Bayesian Active Learning
 - Disentangled Multiplex Graph Representation Learning
 - Disinformation, Fake News and Computational Propaganda: Challenges and Opportunities for Machine Learning Research
 - Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
 - Distance Weighted Supervised Learning for Offline Interaction Data
 - Distilling Internet-Scale Vision-Language Models into Embodied Agents
 - Distortion and Uncertainty Aware Loss for Panoramic Depth Completion
 - Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost
 - Distributed Linear Bandits under Communication Constraints
 - Distributed Stochastic Gradient Descent: Nonconvexity, Nonsmoothness, and Convergence to Local Minima
 - Distributional Offline Policy Evaluation with Predictive Error Guarantees
 - Distribution-dependent McDiarmid-type Inequalities for Functions of Unbounded Interaction
 - Distribution Free Domain Generalization
 - Distribution Free Prediction Sets for Node Classification
 - Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference
 - Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation
 - Divide and Conquer Dynamic Programming: An Almost Linear Time Change Point Detection Methodology in High Dimensions
 - Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat
 - DIVISION: Memory Efficient Training via Dual Activation Precision
 - DMLR Workshop: Data-centric Machine Learning Research
 - DoCoFL: Downlink Compression for Cross-Device Federated Learning
 - Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling
 - Does a Neural Network Really Encode Symbolic Concepts?
 - Does Continual Learning Equally Forget All Parameters?
 - Does Sparsity Help in Learning Misspecified Linear Bandits?
 - DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
 - Do Machine Learning Models Learn Statistical Rules Inferred from Data?
 - Domain Adaptation for Time Series Under Feature and Label Shifts
 - DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm
 - Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks
 - Do Perceptually Aligned Gradients Imply Robustness?
 - Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark
 - Double-Weighting for Covariate Shift Adaptation
 - Doubly Adversarial Federated Bandits
 - Doubly Optimal No-Regret Learning in Monotone Games
 - Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection
 - DP-Fast MH: Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian Inference
 - DRCFS: Doubly Robust Causal Feature Selection
 - DRew: Dynamically Rewired Message Passing with Delay
 - Dropout Reduces Underfitting
 - Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions
 - DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
 - DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm
 - Dual Focal Loss for Calibration
 - DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning
 - Duality Principles for Modern Machine Learning
 - Dual Propagation: Accelerating Contrastive Hebbian Learning with Dyadic Neurons
 - DUET: 2D Structured and Approximately Equivariant Representations
 - dugMatting: Decomposed-Uncertainty-Guided Matting
 - Dynamical Linear Bandits
 - Dynamic Constrained Submodular Optimization with Polylogarithmic Update Time
 - Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape
 - Dynamics-inspired Neuromorphic Visual Representation Learning
 - E$(n)$ Equivariant Message Passing Simplicial Networks
 - ED-Batch: Efficient Automatic Batching of Dynamic Neural Networks via Learned Finite State Machines
 - EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression
 - Effective and Efficient Structural Inference with Reservoir Computing
 - Effectively Using Public Data in Privacy Preserving Machine Learning
 - Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories
 - Effective Neural Topic Modeling with Embedding Clustering Regularization
 - Effective Structured Prompting by Meta-Learning and Representative Verbalizer
 - Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation
 - Efficient and Degree-Guided Graph Generation via Discrete Diffusion Modeling
 - Efficient and Equivariant Graph Networks for Predicting Quantum Hamiltonian
 - Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction
 - Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration
 - Efficient displacement convex optimization with particle gradient descent
 - Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization
 - Efficient Graph Field Integrators Meet Point Clouds
 - Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming
 - Efficient Learning of Mesh-Based Physical Simulation with Bi-Stride Multi-Scale Graph Neural Network
 - Efficient List-Decodable Regression using Batches
 - Efficiently predicting high resolution mass spectra with graph neural networks
 - Efficient Online Reinforcement Learning with Offline Data
 - Efficient Parametric Approximations of Neural Network Function Space Distance
 - Efficient Personalized Federated Learning via Sparse Model-Adaptation
 - Efficient preconditioned stochastic gradient descent for estimation in latent variable models
 - Efficient Quantum Algorithms for Quantum Optimal Control
 - Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation
 - Efficient RL via Disentangled Environment and Agent Representations
 - Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
 - Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
 - Efficient Training of Language Models using Few-Shot Learning
 - Efficient Transformed Gaussian Processes for Non-Stationary Dependent Multi-class Classification
 - Eliminating Adversarial Noise via Information Discard and Robust Representation Restoration
 - ELSA: Efficient Label Shift Adaptation through the Lens of Semiparametric Models
 - Emergence of Adaptive Circadian Rhythms in Deep Reinforcement Learning
 - Emergence of Sparse Representations from Noise
 - Emergent Agentic Transformer from Chain of Hindsight Experience
 - Emergent Asymmetry of Precision and Recall for Measuring Fidelity and Diversity of Generative Models in High Dimensions
 - EM-Network: Oracle Guided Self-distillation for Sequence Learning
 - Enabling First-Order Gradient-Based Learning for Equilibrium Computation in Markets
 - End-to-end Differentiable Clustering with Associative Memories
 - End-to-End Full-Atom Antibody Design
 - End-to-End Learning for Stochastic Optimization: A Bayesian Perspective
 - End-to-End Multi-Object Detection with a Regularized Mixture Model
 - End-to-end Training of Deep Boltzmann Machines by Unbiased Contrastive Divergence with Local Mode Initialization
 - Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments
 - Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language
 - Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning
 - Entropy-driven Unsupervised Keypoint Representation Learning in Videos
 - Equivariance with Learned Canonicalization Functions
 - Equivariant Architectures for Learning in Deep Weight Spaces
 - Equivariant Polynomials for Graph Neural Networks
 - Escaping saddle points in zeroth-order optimization: the power of two-point estimators
 - ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation
 - ES-FoMo: Efficient Systems for Foundation Models
 - Estimating Causal Effects using a Multi-task Deep Ensemble
 - Estimating Heterogeneous Treatment Effects: Mutual Information Bounds and Learning Algorithms
 - Estimating Joint Treatment Effects by Combining Multiple Experiments
 - Estimating Possible Causal Effects with Latent Variables via Adjustment
 - Estimating the Contamination Factor's Distribution in Unsupervised Anomaly Detection
 - Estimation Beyond Data Reweighting: Kernel Method of Moments
 - Evaluating Self-Supervised Learning via Risk Decomposition
 - Evaluating Unsupervised Denoising Requires Unsupervised Metrics
 - Eventual Discounting Temporal Logic Counterfactual Experience Replay
 - Everyone's Preference Changes Differently: A Weighted Multi-Interest Model For Retrieval
 - Evidential Interactive Learning for Medical Image Captioning
 - Evolving Semantic Prototype Improves Generative Zero-Shot Learning
 - Ewald-based Long-Range Message Passing for Molecular Graphs
 - Exact Inference in High-order Structured Prediction
 - Existence and Estimation of Critical Batch Size for Training Generative Adversarial Networks with Two Time-Scale Update Rule
 - Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks
 - Expectation-Complete Graph Representations with Homomorphisms
 - Expected Gradients of Maxout Networks and Consequences to Parameter Initialization
 - Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making
 - Exphormer: Sparse Transformers for Graphs
 - Explainability as statistical inference
 - Explainable Data-Driven Optimization: From Context to Decision and Back Again
 - Explaining Reinforcement Learning with Shapley Values
 - Explaining the effects of non-convergent MCMC in the training of Energy-Based Models
 - Exploiting locality in high-dimensional Factorial hidden Markov models
 - Explore and Exploit the Diverse Knowledge in Model Zoo for Domain Generalization
 - Exploring Chemical Space with Score-based Out-of-distribution Generation
 - Exploring Model Dynamics for Accumulative Poisoning Discovery
 - Exploring the Benefits of Training Expert Language Models over Instruction Tuning
 - Exploring the Limits of Model-Targeted Indiscriminate Data Poisoning Attacks
 - Exponential Smoothing for Off-Policy Learning
 - Extending Conformal Prediction to Hidden Markov Models with Exact Validity via de Finetti's Theorem for Markov Chains
 - Extending Kernel PCA through Dualization: Sparsity, Robustness and Fast Algorithms
 - Extrapolated Random Tree for Regression
 - Extrapolative Controlled Sequence Generation via Iterative Refinement
 - Facial Expression Recognition with Adaptive Frame Rate based on Multiple Testing Correction
 - FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels
 - FAENet: Frame Averaging Equivariant GNN for Materials Modeling
 - Fair and Accurate Decision Making through Group-Aware Learning
 - Fair and Optimal Classification via Post-Processing
 - Fair and Robust Estimation of Heterogeneous Treatment Effects for Policy Learning
 - Fair Densities via Boosting the Sufficient Statistics of Exponential Families
 - FAIRER: Fairness as Decision Rationale Alignment
 - Fair Neighbor Embedding
 - Fairness in Matching under Uncertainty
 - Fairness in Streaming Submodular Maximization over a Matroid Constraint
 - Fair yet Asymptotically Equal Collaborative Learning
 - Faith-Shap: The Faithful Shapley Interaction Index
 - FARE: Provably Fair Representation Learning with Practical Certificates
 - Fascinating Supervisory Signals and Where to Find Them: Deep Anomaly Detection with Scale Learning
 - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization
 - Fast Algorithms for Distributed k-Clustering with Outliers
 - Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
 - Fast Combinatorial Algorithms for Min Max Correlation Clustering
 - Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
 - Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization
 - Faster Rates of Convergence to Stationary Points in Differentially Private Optimization
 - Fast Excess Risk Rates via Offset Rademacher Complexity
 - Fast Federated Machine Unlearning with Nonlinear Functional Theory
 - Fast Inference from Transformers via Speculative Decoding
 - Fast Online Node Labeling for Very Large Graphs
 - Fast Online Value-Maximizing Prediction Sets with Conformal Cost Control
 - Fast Private Kernel Density Estimation via Locality Sensitive Quantization
 - Fast Rates for Maximum Entropy Exploration
 - Fast Rates in Time-Varying Strongly Monotone Games
 - Fast Sampling of Diffusion Models via Operator Learning
 - Featured Graph Coarsening with Similarity Guarantees
 - Feature Directions Matter: Long-Tailed Learning via Rotated Balanced Representation
 - Feature Expansion for Graph Neural Networks
 - Feature learning in deep classifiers through Intermediate Neural Collapse
 - Feature Programming for Multivariate Time Series Prediction
 - FedAvg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks
 - FedBR: Improving Federated Learning on Heterogeneous Data via Local Learning Bias Reduction
 - Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction
 - FedCR: Personalized Federated Learning Based on Across-Client Common Representation with Conditional Mutual Information Regularization
 - FedDisco: Federated Learning with Discrepancy-Aware Collaboration
 - Federated Adversarial Learning: A Framework with Convergence Analysis
 - Federated Conformal Predictors for Distributed Uncertainty Quantification
 - Federated Heavy Hitter Recovery under Linear Sketching
 - Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities
 - Federated Linear Contextual Bandits with User-level Differential Privacy
 - Federated Online and Bandit Convex Optimization
 - FedHPO-Bench: A Benchmark Suite for Federated Hyperparameter Optimization
 - FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models
 - FeDXL: Provable Federated Learning for Deep X-Risk Optimization
 - Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection
 - Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
 - Few-Sample Feature Selection via Feature Manifold Learning
 - Fighting Fire with Fire: Contrastive Debiasing without Bias-free Data via Generative Bias-transformation
 - Finding Generalization Measures by Contrasting Signal and Noise
 - Finding the Missing-half: Graph Complementary Learning for Homophily-prone and Heterophily-prone Graphs
 - Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
 - Fisher Information Embedding for Node and Graph Learning
 - Flash: Concept Drift Adaptation in Federated Learning
 - FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems
 - FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
 - Flexible Model Aggregation for Quantile Regression
 - Flexible Phase Dynamics for Bio-Plausible Contrastive Learning
 - FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization
 - Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning
 - Forget Unlearning: Towards True Data-Deletion in Machine Learning
 - Formalizing Preferences Over Runtime Distributions
 - For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal
 - Forward-Backward Gaussian Variational Inference via JKO in the Bures-Wasserstein Space
 - Fourmer: An Efficient Global Modeling Paradigm for Image Restoration
 - FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation
 - Fractional Denoising for 3D Molecular Pre-training
 - FREDIS: A Fusion Framework of Refinement and Disambiguation for Unreliable Partial Label Learning
 - Free-Form Variational Inference for Gaussian Process State-Space Models
 - From Adaptive Query Release to Machine Unlearning
 - From Hypergraph Energy Functions to Hypergraph Neural Networks
 - From Noisy Fixed-Point Iterations to Private ADMM for Centralized and Federated Learning
 - From Perception to Programs: Regularize, Overparameterize, and Amortize
 - From Relational Pooling to Subgraph GNNs: A Universal Framework for More Expressive Graph Neural Networks
 - From Robustness to Privacy and Back
 - From Temporal to Contemporaneous Iterative Causal Discovery in the Presence of Latent Confounders
 - Fully-Adaptive Composition in Differential Privacy
 - Fully Bayesian Autoencoders with Latent Sparse Gaussian Processes
 - Fully Dynamic Submodular Maximization over Matroids
 - Functional Neural Networks: Shift invariant models for functional data with applications to EEG classification
 - Function-Space Regularization in Neural Networks: A Probabilistic Perspective
 - Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods
 - Fundamental Tradeoffs in Learning with Prior Information
 - FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning
 - Future-conditioned Unsupervised Pretraining for Decision Transformer
 - GAT: Guided Adversarial Training with Pareto-optimal Auxiliary Tasks
 - Gaussian processes at the Helm(holtz): A more fluid model for ocean currents
 - Gaussian Process Priors for Systems of Linear Partial Differential Equations with Constant Coefficients
 - GC-Flow: A Graph-Based Flow Network for Effective Clustering
 - GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
 - GeCoNeRF: Few-shot Neural Radiance Fields via Geometric Consistency
 - General Covariance Data Augmentation for Neural PDE Solvers
 - Generalization Analysis for Contrastive Representation Learning
 - Generalization Bounds using Data-Dependent Fractal Dimensions
 - Generalization on the Unseen, Logic Reasoning and Degree Curriculum
 - Generalized Disparate Impact for Configurable Fairness Solutions in ML
 - Generalized Implicit Follow-The-Regularized-Leader
 - Generalized Polyak Step Size for First Order Optimization with Momentum
 - Generalized Reductions: Making any Hierarchical Clustering Fair and Balanced with Low Cost
 - Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization
 - Generalized Teacher Forcing for Learning Chaotic Dynamics
 - Generalizing Neural Wave Functions
 - General Sequential Episodic Memory Model
 - Generated Graph Detection
 - Generating Language Corrections for Teaching Physical Control Tasks
 - Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds
 - Generating Private Synthetic Data with Genetic Algorithms
 - Generative Adversarial Symmetry Discovery
 - Generative AI and Law (GenLaw)
 - Generative Causal Representation Learning for Out-of-Distribution Motion Forecasting
 - Generative Decoding of Visual Stimuli
 - Generative Graph Dictionary Learning
 - Generative Pretraining for Black-Box Optimization
 - Geometric Autoencoders - What You See is What You Decode
 - Geometric Clifford Algebra Networks
 - Geometric Latent Diffusion Models for 3D Molecule Generation
 - GFlowNet-EM for Learning Compositional Latent Variable Models
 - GFlowOut: Dropout with Generative Flow Networks
 - GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration
 - Gibbsian Polar Slice Sampling
 - Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models
 - Global Context Vision Transformers
 - Global Convergence of Sub-gradient Method for Robust Matrix Recovery: Small Initialization, Noisy Measurements, and Over-parameterization
 - Global optimality for Euclidean CCCP under Riemannian convexity
 - Global optimality of Elman-type RNNs in the mean-field regime
 - Global Optimization with Parametric Function Approximation
 - Global Selection of Contrastive Batches via Optimization on Sample Permutations
 - GLOBE-CE: A Translation Based Approach for Global Counterfactual Explanations
 - GNN&GBDT-Guided Fast Optimizing Framework for Large-scale Integer Programming
 - GNOT: A General Neural Operator Transformer for Operator Learning
 - GOAT: A Global Transformer on Large-scale Graphs
 - Go Beyond Imagination: Maximizing Episodic Reachability with World Models
 - Gradient-based Wang-Landau Algorithm: A Novel Sampler for Output Distribution of Neural Networks over the Input Space
 - Gradient Descent Converges Linearly for Logistic Regression on Separable Data
 - Gradient Descent Finds the Global Optima of Two-Layer Physics-Informed Neural Networks
 - Gradient Descent in Neural Networks as Sequential Learning in Reproducing Kernel Banach Space
 - Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
 - Gradient-Free Structured Pruning with Unlabeled Data
 - GRAFENNE: Learning on Graphs with Heterogeneous and Dynamic Feature Sets
 - GraphCleaner: Detecting Mislabelled Samples in Popular Graph Learning Benchmarks
 - Graph Contrastive Backdoor Attacks
 - Graph Generative Model for Benchmarking Graph Neural Networks
 - Graphically Structured Diffusion Models
 - Graph Inductive Biases in Transformers without Message Passing
 - Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication
 - Graph Mixup with Soft Alignments
 - Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure
 - Graph Neural Networks with Learnable and Optimal Polynomial Bases
 - Graph Neural Tangent Kernel: Convergence on Large Graphs
 - Graph Positional Encoding via Random Feature Propagation
 - Graph Reinforcement Learning for Network Control via Bi-Level Optimization
 - Graph Switching Dynamical Systems
 - GREAD: Graph Neural Reaction-Diffusion Networks
 - Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement
 - Grounding Language Models to Images for Multimodal Inputs and Outputs
 - Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
 - Group Equivariant Fourier Neural Operators for Partial Differential Equations
 - GuardHFL: Privacy Guardian for Heterogeneous Federated Learning
 - Guiding Pretraining in Reinforcement Learning with Large Language Models
 - Half-Hop: A graph upsampling approach for slowing down message passing
 - Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games
 - Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing
 - Harmonic Neural Networks
 - HarsanyiNet: Computing Accurate Shapley Values in a Single Forward Propagation
 - HETAL: Efficient Privacy-preserving Transfer Learning with Homomorphic Encryption
 - Hidden Symmetries of ReLU Networks
 - Hiding Data Helps: On the Benefits of Masking for Sparse Coding
 - Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
 - Hierarchical Diffusion for Offline Decision Making
 - Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction
 - Hierarchical Imitation Learning with Vector Quantized Models
 - Hierarchical Neural Coding for Controllable CAD Model Generation
 - Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs
 - Hierarchies of Reward Machines
 - High-dimensional Clustering onto Hamiltonian Cycle
 - High-dimensional Location Estimation via Norm Concentration for Subgamma Vectors
 - High Fidelity Image Counterfactuals with Probabilistic Causal Models
 - High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance
 - High Probability Convergence of Stochastic Gradient Methods
 - HiLD: High-dimensional Learning Dynamics Workshop
 - Hindsight Learning for MDPs with Exogenous Inputs
 - H-Likelihood Approach to Deep Neural Networks with Temporal-Spatial Random Effects for High-Cardinality Categorical Features
 - Homomorphism AutoEncoder - Learning Group Structured Representations from Observed Transitions
 - HOPE: High-order Graph ODE For Modeling Interacting Dynamics
 - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes
 - Horizon-free Learning for Markov Decision Processes and Games: Stochastically Bounded Rewards and Improved Bounds
 - How Bad is Top-$K$ Recommendation under Competing Content Creators?
 - How Does Information Bottleneck Help Deep Learning?
 - How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
 - How Jellyfish Characterise Alternating Group Equivariant Neural Networks
 - How Many Perturbations Break This Model? Evaluating Robustness Beyond Adversarial Accuracy
 - How much does Initialization Affect Generalization?
 - How Powerful are Shallow Neural Networks with Bandlimited Random Weights?
 - How to address monotonicity for model risk management?
 - How to DP-fy ML: A Practical Tutorial to Machine Learning with Differential Privacy
 - How to Trust Your Diffusion Model: A Convex Optimization Approach to Conformal Risk Control
 - Human-Timescale Adaptation in an Open-Ended Task Space
 - Hybrid Energy Based Model in the Feature Space for Out-of-Distribution Detection
 - Hyena Hierarchy: Towards Larger Convolutional Language Models
 - Hyperbolic Diffusion Embedding and Distance for Hierarchical Representation Learning
 - Hyperbolic Image-text Representations
 - Hyperbolic Representation Learning: Revisiting and Advancing
 - Hyperparameters in Reinforcement Learning and How To Tune Them
 - HyperTuning: Toward Adapting Large Language Models without Back-propagation
 - Hypervolume Knowledge Gradient: A Lookahead Approach for Multi-Objective Bayesian Optimization with Partial Information
 - Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds through Algorithmic Stability
 - I$^2$SB: Image-to-Image Schrödinger Bridge
 - ICML 2023 Workshop on Computational Biology
 - Identifiability and Generalizability in Constrained Inverse Reinforcement Learning
 - Identifiability of Label Noise Transition Matrix
 - Identification of the Adversary from a Single Adversarial Example
 - Identifying Interpretable Subspaces in Image Representations
 - Identifying Useful Learnwares for Heterogeneous Label Spaces
 - ILLUME: Rationalizing Vision-Language Models through Human Interactions
 - Image generation with shortest path diffusion
 - Image Restoration with Mean-Reverting Stochastic Differential Equations
 - Image Shortcut Squeezing: Countering Perturbative Availability Poisons with Compression
 - Implicit Graph Neural Networks: A Monotone Operator Viewpoint
 - Implicit Jacobian regularization weighted with impurity of probability output
 - Implicit Neural Spatial Representations for Time-dependent PDEs
 - Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression
 - Importance Weighted Expectation-Maximization for Protein Sequence Design
 - Improved Active Multi-Task Representation Learning via Lasso
 - Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback
 - Improved Algorithms for White-Box Adversarial Streams
 - Improved Analysis of Score-based Generative Modeling: User-Friendly Bounds under Minimal Smoothness Assumptions
 - Improved Learning-Augmented Algorithms for the Multi-Option Ski Rental Problem via Best-Possible Competitive Analysis
 - Improved Online Conformal Prediction via Strongly Adaptive Online Learning
 - Improved Online Learning Algorithms for CTR Prediction in Ad Auctions
 - Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation
 - Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
 - Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs
 - Improving Adversarial Robustness by Putting More Regularizations on Less Robust Samples
 - Improving Adversarial Robustness of Deep Equilibrium Models with Explicit Regulations Along the Neural Dynamics
 - Improving Adversarial Robustness Through the Contrastive-Guided Diffusion Process
 - Improving Bi-level Optimization Based Methods with Inspiration from Humans' Classroom Study Techniques
 - Improving Expert Predictions with Conformal Prediction
 - Improving Fair Training under Correlation Shifts
 - Improving Graph Generation by Restricting Graph Bandwidth
 - Improving Graph Neural Networks with Learnable Propagation Operators
 - Improving Hyperparameter Learning under Approximate Inference in Gaussian Process Models
 - Improving l1-Certified Robustness via Randomized Smoothing by Leveraging Box Constraints
 - Improving Medical Predictions by Irregular Multimodal Electronic Health Records Modeling
 - Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models
 - Improving the Model Consistency of Decentralized Federated Learning
 - Improving Visual Prompt Tuning for Self-supervised Vision Transformers
 - IncDSI: Incrementally Updatable Document Retrieval
 - Incentivizing Exploration with Linear Contexts and Combinatorial Actions
 - Individually Fair Learning with One-Sided Feedback
 - Inferring Relational Potentials in Interacting Systems
 - Infinite Action Contextual Bandits with Reusable Data Exhaust
 - Inflow, Outflow, and Reciprocity in Machine Learning
 - InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models
 - InfoOT: Information Maximizing Optimal Transport
 - Information-Theoretic State Space Model for Multi-View Reinforcement Learning
 - Infusing Lattice Symmetry Priors in Attention Mechanisms for Sample-Efficient Abstract Geometric Reasoning
 - InGram: Inductive Knowledge Graph Embedding via Relation Graphs
 - In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation
 - Input Perturbation Reduces Exposure Bias in Diffusion Models
 - Input uncertainty propagation through trained neural networks
 - In Search for a Generalizable Method for Source Free Domain Adaptation
 - In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
 - Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models
 - Instrumental Variable Estimation of Average Partial Causal Effects
 - Integrating Prior Knowledge in Contrastive Learning with Kernel
 - Interactive Learning with Implicit Human Feedback
 - Interactive Object Placement with Reinforcement Learning
 - Internally Rewarded Reinforcement Learning
 - Internet Explorer: Targeted Representation Learning on the Open Web
 - Interpolation for Robust Learning: Data Augmentation on Wasserstein Geodesics
 - Interpretable Neural-Symbolic Concept Reasoning
 - Interval Bound Interpolation for Few-shot Learning with Few Tasks
 - Interventional Causal Representation Learning
 - Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs
 - Invariance in Policy Optimisation and Partial Identifiability in Reward Learning
 - Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames
 - Inverse Reinforcement Learning without Reinforcement Learning
 - Investigating the Role of Model-Based Learning in Exploration and Transfer
 - IRNeXt: Rethinking Convolutional Network Design for Image Restoration
 - Is Consensus Acceleration Possible in Decentralized Optimization over Slowly Time-Varying Networks?
 - Is Learning Summary Statistics Necessary for Likelihood-free Inference?
 - Is Overfitting Necessary for Implicit Video Representation?
 - Iterative Approximate Cross-Validation
 - JAWS-X: Addressing Efficiency Bottlenecks of Conformal Prediction Under Standard and Feedback Covariate Shift
 - Jump-Start Reinforcement Learning
 - KDEformer: Accelerating Transformers via Kernel Density Estimation
 - Kernel Logistic Regression Approximation of an Understandable ReLU Neural Network
 - Kernel QuantTree
 - Kernel Sufficient Dimension Reduction and Variable Selection for Compositional Data via Amalgamation
 - Knowledge and Logical Reasoning in the Era of Data-driven Learning
 - Knowledge Hypergraph Embedding Meets Relational Algebra
 - K-SHAP: Policy Clustering Algorithm for Anonymous Multi-Agent State-Action Pairs
 - Label differential privacy and private training data release
 - Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity
 - Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning
 - Language Instructed Reinforcement Learning for Human-AI Coordination
 - Large Language Models Can Be Easily Distracted by Irrelevant Context
 - Large Language Models Struggle to Learn Long-Tail Knowledge
 - Last Switch Dependent Bandits with Monotone Payoff Functions
 - Latent Traversals in Generative Models as Potential Flows
 - Layered State Discovery for Incremental Autonomous Exploration
 - Lazy Agents: A New Perspective on Solving Sparse Reward Problem in Multi-agent Reinforcement Learning
 - LazyGNN: Large-Scale Graph Neural Networks via Lazy Propagation
 - LeadFL: Client Self-Defense against Model Poisoning in Federated Learning
 - Learnability and Algorithm for Continual Learning
 - Learning Affinity with Hyperbolic Representation for Spatial Propagation
 - Learning Antidote Data to Individual Unfairness
 - Learning-augmented private algorithms for multiple quantile release
 - Learning Belief Representations for Partially Observable Deep RL
 - Learning Compiler Pass Orders using Coreset and Normalized Value Prediction
 - Learning Control by Iterative Inversion
 - Learning Controllable Degradation for Real-World Super-Resolution via Constrained Flows
 - Learning Control-Oriented Dynamical Structure from Data
 - Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic
 - Learning Deep Time-index Models for Time Series Forecasting
 - Learning Dense Correspondences between Photos and Sketches
 - Learning Distributions over Quantum Measurement Outcomes
 - Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation
 - Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks
 - Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees
 - Learning Functional Distributions with Private Labels
 - Learning GFlowNets From Partial Episodes For Improved Convergence And Stability
 - Learning Globally Smooth Functions on Manifolds
 - Learning Hidden Markov Models When the Locations of Missing Observations are Unknown
 - Learning in POMDPs is Sample-Efficient with Hindsight Observability
 - Learning Instance-Specific Augmentations by Capturing Local Invariances
 - Learning Intuitive Policies Using Action Features
 - Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation
 - Learning Mixtures of Gaussians with Censored Data
 - Learning Mixtures of Markov Chains and MDPs
 - Learning Neural Constitutive Laws from Motion Observations for Generalizable PDE Dynamics
 - Learning Neural PDE Solvers with Parameter-Guided Channel Attention
 - Learning Noisy OR Bayesian Networks with Max-Product Belief Propagation
 - Learning Optimal Group-structured Individualized Treatment Rules with Many Treatments
 - Learning Perturbations to Explain Time Series Predictions
 - Learning Physical Models that Can Respect Conservation Laws
 - Learning Preconditioners for Conjugate Gradient PDE Solvers
 - Learning Prescriptive ReLU Networks
 - Learning-Rate-Free Learning by D-Adaptation
 - Learning Rate Schedules in the Presence of Distribution Shift
 - Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation
 - Learning Representations without Compositional Assumptions
 - Learning Signed Distance Functions from Noisy 3D Point Clouds via Noise to Noise Mapping
 - Learning Subpocket Prototypes for Generalizable Structure-based Drug Design
 - Learning Temporally Abstract World Models without Online Experimentation
 - Learning the Dynamics of Sparsely Observed Interacting Systems
 - Learning the Right Layers: a Data-Driven Layer-Aggregation Strategy for Semi-Supervised Learning on Multilayer Graphs
 - Learning to acquire novel cognitive tasks with evolution, plasticity and meta-meta-learning
 - Learning to Bid in Repeated First-Price Auctions with Budgets
 - Learning to Boost Training by Periodic Nowcasting Near Future Weights
 - Learning to Decouple Complex Systems
 - Learning to Design Analog Circuits to Meet Threshold Specifications
 - Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model
 - Learning to Initiate and Reason in Event-Driven Cascading Processes
 - Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling
 - Learning to Learn from APIs: Black-Box Data-Free Meta-Learning
 - Learning to Maximize Mutual Information for Dynamic Feature Selection
 - Learning to Optimize Differentiable Games
 - Learning to Suggest Breaks: Sustainable Optimization of Long-Term User Engagement
 - Learning Unforeseen Robustness from Out-of-distribution Data Using Equivariant Domain Translator
 - Learning Unnormalized Statistical Models via Compositional Optimization
 - Learning useful representations for shifting tasks and distributions
 - Learn to Accumulate Evidence from All Training Samples: Theory and Practice
 - LegendreTron: Uprising Proper Multiclass Loss Learning
 - Less is More: Task-aware Layer-wise Distillation for Language Model Compression
 - LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework
 - LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning
 - Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
 - Leveraging Demonstrations to Improve Online Learning: Quality Matters
 - Leveraging Label Non-Uniformity for Node Classification in Graph Neural Networks
 - Leveraging Offline Data in Online Reinforcement Learning
 - Leveraging Proxy of Training Data for Test-Time Adaptation
 - LEVER: Learning to Verify Language-to-Code Generation with Execution
 - Lifelong Language Pretraining with Distribution-Specialized Experts
 - Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data
 - Linear Causal Disentanglement via Interventions
 - Linear CNNs Discover the Statistical Structure of the Dataset Using Only the Most Dominant Frequencies
 - Linearly Constrained Bilevel Optimization: A Smoothed Implicit Gradient Approach
 - Linear optimal partial transport embedding
 - Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains
 - Linkless Link Prediction via Relational Distillation
 - LinSATNet: The Positive Linear Satisfiability Neural Networks
 - LipsNet: A Smooth and Robust Neural Network with Adaptive Lipschitz Constant for High Accuracy Optimal Control
 - Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
 - LIV: Language-Image Representations and Rewards for Robotic Control
 - Localized Learning: Decentralized Model Updates via Non-Global Objectives
 - Locally Regularized Neural Differential Equations: Some Black Boxes were meant to remain closed!
 - Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
 - Local Vertex Colouring Graph Neural Networks
 - LongCoder: A Long-Range Pre-trained Language Model for Code Completion
 - Long Horizon Temperature Scaling
 - Long-Tailed Recognition by Mutual Information Maximization between Latent Features and Ground-Truth Labels
 - Long-Term Rhythmic Video Soundtracker
 - Lookahead When It Matters: Adaptive Non-causal Transformers for Streaming Neural Transducers
 - LookupFFN: Making Transformers Compute-lite for CPU inference
 - Looped Transformers as Programmable Computers
 - LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
 - Loss Balancing for Fair Supervised Learning
 - Loss-Guided Diffusion Models for Plug-and-Play Controllable Generation
 - Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability
 - Low Complexity Homeomorphic Projection to Ensure Neural-Network Solution Feasibility for Optimization over (Non-)Convex Set
 - Lower Bounds for Learning in Revealing POMDPs
 - Lowering the Pre-training Tax for Gradient-based Subset Training: A Lightweight Distributed Pre-Training Toolkit
 - Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
 - Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single
 - LSDS++: Dual Sampling for Accelerated k-means++
 - MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of Behavior
 - Machine Learning Force Fields with Data Cost Aware Training
 - Machine Learning for Multimodal Healthcare Data
 - Machine Learning with Social Purpose
 - MAGANet: Achieving Combinatorial Generalization by Modeling a Group Action
 - Magneto: A Foundation Transformer
 - MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations
 - Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
 - MALTS: Matching After Learning to Stretch
 - MANSA: Learning Fast and Slow in Multi-Agent Systems
 - Marginalization is not Marginal: No Bad VAE Local Minima when Learning Optimal Sparse Representations
 - Margin-based Neural Network Watermarking
 - Margin-based sampling in high dimensions: When being active is less efficient than staying passive
 - Markovian Gaussian Process Variational Autoencoders
 - Masked Bayesian Neural Networks: Theoretical Guarantee and its Posterior Inference
 - Masked Trajectory Models for Prediction, Representation, and Control
 - Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning
 - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
 - Matrix Estimation for Individual Fairness
 - Maximal Initial Learning Rates in Deep ReLU Networks
 - Maximum Optimality Margin: A Unified Approach for Contextual Linear Programming and Inverse Linear Programming
 - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks
 - Measuring the Impact of Programming Language Distribution
 - Mechanistic Mode Connectivity
 - Memory-Based Dual Gaussian Processes for Sequential Learning
 - Memory-Based Meta-Learning on Non-Stationary Distributions
 - Men Also Do Laundry: Multi-Attribute Bias Amplification
 - MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
 - Metagenomic Binning using Connectivity-constrained Variational Autoencoders
 - Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks
 - Meta-learning Parameterized Skills
 - Meta-Learning the Inductive Bias of Simple Neural Circuits
 - MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks
 - Meta Optimal Transport
 - Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optimization
 - MetricGAN-OKD: Multi-Metric Optimization of MetricGAN via Online Knowledge Distillation for Speech Enhancement
 - MEWL: Few-shot multimodal word learning with referential uncertainty
 - MG-GNN: Multigrid Graph Neural Networks for Learning Multilevel Domain Decomposition Methods
 - Mimetic Initialization of Self-Attention Layers
 - Minimalistic Predictions to Schedule Jobs with Online Precedence Constraints
 - Minimal Width for Universal Property of Deep RNN
 - Minimax estimation of discontinuous optimal transport maps: The semi-discrete case
 - Minimizing Trajectory Curvature of ODE-based Generative Models
 - Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation
 - Mirror Sinkhorn: Fast Online Optimization on Transport Polytopes
 - Mitigating Memorization of Noisy Labels by Clipping the Model Prediction
 - Mitigating Propagation Failures in Physics-informed Neural Networks using Retain-Resample-Release (R3) Sampling
 - Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning
 - Mitigating the Effects of Non-Identifiability on Inference for Bayesian Neural Networks with Latent Variables
 - MixFlows: principled variational inference via mixed flows
 - Mixing Predictions for Online Metric Algorithms
 - Mixture Proportion Estimation Beyond Irreducibility
 - Moccasin: Efficient Tensor Rematerialization for Neural Networks
 - Modality-Agnostic Variational Compression of Implicit Neural Representations
 - Model-agnostic Measure of Generalization Difficulty
 - Model-Aware Contrastive Learning: Towards Escaping the Dilemmas
 - Model-based Offline Reinforcement Learning with Count-based Conservatism
 - Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators
 - Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning
 - ModelDiff: A Framework for Comparing Learning Algorithms
 - Model-Free Robust Average-Reward Reinforcement Learning
 - Modeling Dynamic Environments with Scene Graph Memory
 - Modeling Temporal Data as Continuous Functions with Stochastic Process Diffusion
 - MODeL: Memory Optimizations for Deep Learning
 - Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
 - Model Transferability with Responsive Decision Subjects
 - Moderately Distributional Exploration for Domain Generalization
 - MolDiff: Addressing the Atom-Bond Inconsistency Problem in 3D Molecule Diffusion Generation
 - Momentum Ensures Convergence of SIGNSGD under Weaker Assumptions
 - Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps
 - MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows
 - MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses
 - Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes
 - Monotonic Location Attention for Length Generalization
 - Motion Question Answering via Modular Motion Programs
 - mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
 - Mu$^2$SLAM: Multitask, Multilingual Speech and Language Models
 - MultiAdam: Parameter-wise Scale-invariant Optimizer for Multiscale Training of Physics-informed Neural Networks
 - Multi-Agent Best Arm Identification with Private Communications
 - Multi-Agent Learning from Learners
 - Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism
 - Multi-agent Online Scheduling: MMS Allocations for Indivisible Items
 - Multicalibration as Boosting for Regression
 - Multi-channel Autobidding with Budget and ROI Constraints
 - Multi-class Graph Clustering via Approximated Effective $p$-Resistance
 - MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
 - Multi-Environment Pretraining Enables Transfer to Action Limited Datasets
 - Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning
 - Multi-Fidelity Covariance Estimation in the Log-Euclidean Geometry
 - Multi-Layer Neural Networks as Trainable Ladders of Hilbert Spaces
 - Multi-Modal Classifiers for Open-Vocabulary Object Detection
 - Multi-Objective GFlowNets
 - Multi-Objective Population Based Training
 - Multiple Thinking Achieving Meta-Ability Decoupling for Object Navigation
 - Multiplier Bootstrap-based Exploration
 - Multiply Robust Off-policy Evaluation and Learning under Truncation by Death
 - MultiRobustBench: Benchmarking Robustness Against Multiple Attacks
 - Multisample Flow Matching: Straightening Flows with Minibatch Couplings
 - Multi-Symmetry Ensembles: Improving Diversity and Generalization via Opposing Symmetries
 - Multi-Task Differential Privacy Under Distribution Skew
 - Multi-task Hierarchical Adversarial Inverse Reinforcement Learning
 - Multi-Task Off-Policy Learning from Bandit Feedback
 - Multi-task Representation Learning for Pure Exploration in Linear Bandits
 - Multi-Task Structural Learning using Local Task Similarity induced Neuron Creation and Removal
 - Multi-User Reinforcement Learning with Low Rank Rewards
 - Multi-View Masked World Models for Visual Robotic Manipulation
 - Muse: Text-To-Image Generation via Masked Generative Transformers
 - MyoDex: A Generalizable Prior for Dexterous Manipulation
 - N$\text{A}^\text{2}$Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning
 - Naive imputation implicitly regularizes high-dimensional linear models
 - Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA
 - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path
 - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes
 - Nearly Optimal Algorithms with Sublinear Computational Complexity for Online Kernel Regression
 - Nearly Optimal Competitive Ratio for Online Allocation Problems with Two-sided Resource Constraints and Finite Requests
 - Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs
 - Nearly-tight Bounds for Deep Kernel Learning
 - Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR
 - Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games
 - Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime
 - Near-optimal Conservative Exploration in Reinforcement Learning under Episode-wise Constraints
 - Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals
 - Near-Optimal Quantum Coreset Construction Algorithms for Clustering
 - NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion
 - NeRFool: Uncovering the Vulnerability of Generalizable Neural Radiance Fields against Adversarial Perturbations
 - Nested Elimination: A Simple Algorithm for Best-Item Identification From Choice-Based Feedback
 - Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization
 - Network Effects in Performative Prediction Games
 - Neural Algorithmic Reasoning with Causal Regularisation
 - Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data
 - Neural Compression: From Information Theory to Applications
 - Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series
 - Neural Conversational AI Workshop - What’s left to TEACH (Trustworthy, Enhanced, Adaptable, Capable and Human-centric) chatbots?
 - Neural Diffusion Processes
 - Neural FIM for learning Fisher information metrics from point cloud data
 - Neural Inverse Operators for Solving PDE Inverse Problems
 - Neural Latent Aligner: Cross-trial Alignment for Learning Representations of Complex, Naturalistic Neural Data
 - Neural Markov Jump Processes
 - Neural Network Accelerated Implicit Filtering: Integrating Neural Network Surrogates With Provably Convergent Derivative Free Optimization Methods
 - Neural Network Approximations of PDEs Beyond Linearity: A Representational Perspective
 - Neural networks trained with SGD learn distributions of increasing complexity
 - Neural Prediction Errors enable Analogical Visual Reasoning in Human Standard Intelligence Tests
 - Neural signature kernels as infinite-width-depth-limits of controlled ResNets
 - NeuralSlice: Neural 3D Triangle Mesh Reconstruction via Slicing 4D Tetrahedral Meshes
 - NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition
 - Neural Status Registers
 - Neural Stochastic Differential Games for Time-series Analysis
 - Neural Wasserstein Gradient Flows for Discrepancies with Riesz Kernels
 - Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally Coupled Oscillatory Recurrent Neural Networks
 - Neuro-Symbolic Continual Learning: Knowledge, Reasoning Shortcuts and Concept Rehearsal
 - Never mind the metrics---what about the uncertainty? Visualising binary confusion matrix metric distributions to put performance in perspective
 - New Frontiers in Learning, Control, and Dynamical Systems
 - New metrics and search algorithms for weighted causal DAGs
 - NNSplitter: An Active Defense Solution for DNN Model via Automated Weight Obfuscation
 - Node Embedding from Neural Hamiltonian Orbits in Graph Neural Networks
 - Non-asymptotic Properties of Individualized Treatment Rules from Sequentially Rule-Adaptive Trials
 - Non-autoregressive Conditional Diffusion Models for Time Series Prediction
 - Nonlinear Advantage: Trained Networks Might Not Be As Complex as You Think
 - Nonlinear Causal Discovery with Latent Confounders
 - Nonparametric Density Estimation under Distribution Drift
 - Nonparametric Extensions of Randomized Response for Private Confidence Sets
 - Nonparametric Generative Modeling with Conditional Sliced-Wasserstein Flows
 - Nonparametric Iterative Machine Teaching
 - Non-stationary Reinforcement Learning under General Function Approximation
 - No One Idles: Efficient Heterogeneous Federated Learning with Parallel Edge and Server Computation
 - Normalizing Flows for Interventional Density Estimation
 - Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization
 - Not all Strongly Rayleigh Distributions Have Small Probabilistic Generating Circuits
 - NP-SemiSeg: When Neural Processes meet Semi-Supervised Semantic Segmentation
 - NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning
 - Nugget: Neural Agglomerative Embeddings of Text
 - NUNO: A General Framework for Learning Parametric PDEs with Non-Uniform Data
 - OCD: Learning to Overfit with Conditional Diffusion Models
 - ODS: Test-Time Adaptation in the Presence of Open-World Data Shift
 - Offline Learning in Markov Games with General Function Approximation
 - Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
 - Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
 - Off-Policy Average Reward Actor-Critic with Deterministic Policy Search
 - Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling
 - Omnipredictors for Constrained Optimization
 - OMS-DPM: Optimizing the Model Schedule for Diffusion Probabilistic Models
 - On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation
 - On Bridging the Gap between Mean Field and Finite Width Deep Random Multilayer Perceptron with Batch Normalization
 - On Computing Optimal Tree Ensembles
 - On Coresets for Clustering in Small Dimensional Euclidean spaces
 - On Data Manifolds Entailed by Structural Causal Models
 - On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing
 - On Enhancing Expressive Power via Compositions of Single Fixed-Size ReLU Network
 - One-Shot Compression of Large Edge-Exchangeable Graphs using Bits-Back Coding
 - One-Shot Federated Conformal Prediction
 - One-shot Imitation in a Non-Stationary Environment via Multi-Modal Skill
 - One-sided Matrix Completion from Two Observations Per Row
 - One-Step Estimator for Permuted Sparse Recovery
 - One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
 - One-vs-the-Rest Loss to Focus on Important Samples in Adversarial Training
 - On Excess Mass Behavior in Gaussian Mixture Models with Orlicz-Wasserstein Distances
 - On Generalizations of Some Distance Based Classifiers for HDLSS Data
 - On Heterogeneous Treatment Effects in Heterogeneous Causal Graphs
 - On Investigating the Conservative Property of Score-Based Generative Models
 - On Kinetic Optimal Probability Paths for Generative Models
 - Online Learning in Stackelberg Games with an Omniscient Follower
 - Online Learning with Feedback Graphs: The True Shape of Regret
 - Online Local Differential Private Quantile Inference via Self-normalization
 - Online Mechanism Design for Information Acquisition
 - Online Nonstochastic Control with Adversarial and Static Constraints
 - Online Platt Scaling with Calibeating
 - Online Prototype Alignment for Few-shot Policy Transfer
 - Online Restless Bandits with Unobserved States
 - On Many-Actions Policy Gradient
 - On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology
 - On Penalty-based Bilevel Gradient Descent Method
 - On Pitfalls of Test-Time Adaptation
 - On Preemption and Learning in Stochastic Scheduling
 - On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline
 - On Provable Copyright Protection for Generative Models
 - On Regularization and Inference with Label Constraints
 - On Sampling with Approximate Transport Maps
 - On Second-Order Scoring Rules for Epistemic Uncertainty Quantification
 - On Strengthening and Defending Graph Reconstruction Attack with Markov Chain Approximation
 - On the Complexity of Bayesian Generalization
 - On the Connection Between MPNN and Graph Transformer
 - On the Convergence of Federated Averaging with Cyclic Client Participation
 - On the Convergence of Gradient Flow on Multi-layer Linear Models
 - On the Convergence of SARSA with Linear Function Approximation
 - On the convergence of the MLE as an estimator of the learning rate in the Exp3 algorithm
 - On the Convergence Rate of Gaussianization with Random Rotations
 - On the Convergence Rates of Policy Gradient Methods
 - On the Correctness of Automatic Differentiation for Neural Networks with Machine-Representable Parameters
 - On the Effectiveness of Offline RL for Dialogue Response Generation
 - On the Estimation of Gaussian Mixture Copula Models
 - On the Expressive Power of Geometric Graph Neural Networks
 - On the Forward Invariance of Neural ODEs
 - On the Functional Similarity of Robust and Non-Robust Neural Representations
 - On the Generalization of Multi-modal Contrastive Learning
 - On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization
 - On the Global Convergence of Risk-Averse Policy Gradient Methods with Expected Conditional Risk Measures
 - On the Identifiability and Estimation of Causal Location-Scale Noise Models
 - On the Impact of Algorithmic Recourse on Social Segregation
 - On the Impact of Knowledge Distillation for Model Interpretability
 - On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning
 - On the Initialization of Graph Neural Networks
 - On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits
 - On the Occupancy Measure of Non-Markovian Policies in Continuous MDPs
 - On the Optimality of Misspecified Kernel Ridge Regression
 - On the Power of Foundation Models
 - On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness
 - On the Privacy-Robustness-Utility Trilemma in Distributed Learning
 - On the Relationship Between Explanation and Prediction: A Causal View
 - On the Robustness of Randomized Ensembles to Adversarial Perturbations
 - On the Robustness of Text Vectorizers
 - On the Role of Attention in Prompt-tuning
 - On the Statistical Benefits of Temporal Difference Learning
 - On the Stepwise Nature of Self-Supervised Learning
 - On the Training Instability of Shuffling SGD with Batch Normalization
 - On the Within-Group Fairness of Screening Classifiers
 - On Uni-Modal Feature Learning in Supervised Multi-Modal Learning
 - On User-Level Private Convex Optimization
 - OpenFE: Automated Feature Generation with Expert-level Performance
 - Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization
 - Open-Vocabulary Universal Image Segmentation with MaskCLIP
 - Opponent-Limited Online Search for Imperfect Information Games
 - Optimal Arms Identification with Knapsacks
 - Optimal Convergence Rates for Agnostic Nyström Kernel Learning
 - Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
 - Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs
 - Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits
 - Optimal LP Rounding and Linear-Time Approximation Algorithms for Clustering Edge-Colored Hypergraphs
 - Optimally-weighted Estimators of the Maximum Mean Discrepancy for Likelihood-Free Inference
 - Optimal No-Regret Learning for One-Sided Lipschitz Functions
 - Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits
 - Optimal randomized multilevel Monte Carlo for repeatedly nested expectations
 - Optimal Rates and Efficient Algorithms for Online Bayesian Persuasion
 - Optimal Sets and Solution Paths of ReLU Networks
 - Optimal Shrinkage for Distributed Second-Order Optimization
 - Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion
 - Optimal Transport in Learning, Control, and Dynamical Systems
 - Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization
 - Optimistic Planning by Regularized Dynamic Programming
 - Optimization for Amortized Inverse Problems
 - Optimizing DDPM Sampling with Shortcut Fine-Tuning
 - Optimizing Hyperparameters with Conformal Quantile Regression
 - Optimizing Mode Connectivity for Class Incremental Learning
 - Optimizing NOTEARS Objectives via Topological Swaps
 - Optimizing the Collaboration Structure in Cross-Silo Federated Learning
 - Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning
 - Orthogonality-Enforced Latent Space in Autoencoders: An Approach to Learning Disentangled Representations
 - Oscillation-free Quantization for Low-bit Vision Transformers
 - Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation
 - Out-of-Distribution Generalization of Federated Learning via Implicit Invariant Relationships
 - Out-of-Domain Robustness via Targeted Augmentations
 - Overcoming Simplicity Bias in Deep Networks using a Feature Sieve
 - Over-parametrization via Lifting for Low-rank Matrix Sensing: Conversion of Spurious Solutions to Strict Saddle Points
 - PAC-Bayesian Generalization Bounds for Adversarial Generative Models
 - PAC-Bayesian Offline Contextual Bandits With Guarantees
 - PAC-Bayes Meets Interactive Learning
 - PAC Generalization via Invariant Representations
 - PAC Prediction Sets for Large Language Models of Code
 - Paging with Succinct Predictions
 - Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions
 - PaLM-E: An Embodied Multimodal Language Model
 - PAL: Program-aided Language Models
 - Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
 - Parallel Neurosymbolic Integration with Concordia
 - Parallel Online Clustering of Bandits via Hedonic Game
 - Parameter-Level Soft-Masking for Continual Learning
 - Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models
 - Pareto Regret Analyses in Multi-objective Multi-armed Bandit
 - Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing
 - Partial Optimality in Cubic Correlation Clustering
 - PASTA: Pessimistic Assortment Optimization
 - Patch-level Contrastive Learning via Positional Query for Visual Pre-training
 - Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
 - Path Neural Networks: Expressive and Accurate Graph Neural Networks
 - PCA-based Multi-Task Learning: a Random Matrix Approach
 - Performative Recommendation: Diversifying Content via Strategic Incentives
 - Performative Reinforcement Learning
 - Personalized Federated Learning under Mixture of Distributions
 - Personalized Federated Learning with Inferred Collaboration Graphs
 - Personalized Subgraph Federated Learning
 - Perturbation Analysis of Neural Collapse
 - PFGM++: Unlocking the Potential of Physics-Inspired Generative Models
 - PFNs4BO: In-Context Learning for Bayesian Optimization
 - Phase-aware Adversarial Defense for Improving Adversarial Robustness
 - Phase Transitions in the Detection of Correlated Databases
 - PINA: Leveraging Side Information in eXtreme Multi-label Classification via Predicted Instance Neighborhood Aggregation
 - Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
 - PixelAsParam: A Gradient View on Diffusion Sampling with Guidance
 - PLay: Parametrically Conditioned Layout Generation using Latent Diffusion
 - Poisoning Generative Replay in Continual Learning to Promote Forgetting
 - Poisoning Language Models During Instruction Tuning
 - Polarity Is All You Need to Learn and Transfer Faster
 - Policy Contrastive Imitation Learning
 - Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach
 - Policy Gradient in Robust MDPs with Global Convergence Guarantee
 - Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games
 - Policy Regularization with Dataset Constraint for Offline Reinforcement Learning
 - Polyhedral Complex Extraction from ReLU Networks using Edge Subdivision
 - Polynomial Preconditioning for Gradient Methods
 - Polynomial Time and Private Learning of Unbounded Gaussian Mixture Models
 - Posterior Sampling for Deep Reinforcement Learning
 - POUF: Prompt-Oriented Unsupervised Fine-tuning for Large Pre-trained Models
 - PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient
 - Practical and Matching Gradient Variance Bounds for Black-Box Variational Bayesian Inference
 - Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute
 - Predictable MDP Abstraction for Unsupervised Model-Based RL
 - Predicting Ordinary Differential Equations with Transformers
 - Predicting Rare Events by Shrinking Towards Proportional Odds
 - Predictive Flows for Faster Ford-Fulkerson
 - Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning
 - PreNAS: Preferred One-Shot Learning Towards Efficient Neural Architecture Search
 - Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems
 - Pre-training for Speech Translation: CTC Meets Optimal Transport
 - Pretraining Language Models with Human Preferences
 - Pricing Experimental Design: Causal Effect, Expected Revenue and Tail Risk
 - Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems
 - Principled Acceleration of Iterative Numerical Methods Using Machine Learning
 - Principled Offline RL in the Presence of Rich Exogenous Information
 - Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons
 - Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design
 - Private Federated Learning with Autotuned Compression
 - Private Statistical Estimation of Many Quantiles
 - Probabilistic Attention-to-Influence Neural Models for Event Sequences
 - Probabilistic Categorical Adversarial Attack and Adversarial Training
 - Probabilistic Concept Bottleneck Models
 - Probabilistic Contrastive Learning Recovers the Correct Aleatoric Uncertainty of Ambiguous Inputs
 - Probabilistic Imputation for Time-series Classification with Missing Data
 - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models
 - Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits
 - Progressive Purification for Instance-Dependent Partial Label Learning
 - Project and Forget: Solving Large-Scale Metric Constrained Problems
 - Projected Tensor Power Method for Hypergraph Community Recovery
 - Prometheus: Taming Sample and Communication Complexities in Constrained Decentralized Stochastic Bilevel Learning
 - PromptBoosting: Black-Box Text Classification with Ten Forward Passes
 - Prompting Large Language Model for Machine Translation: A Case Study
 - Propensity Matters: Measuring and Enhancing Balancing for Recommendation
 - Proper Losses for Discrete Generative Models
 - Proper Scoring Rules for Survival Analysis
 - Properties of the Mallows Model Depending on the Number of Alternatives: A Warning for an Experimentalist
 - Protecting Language Generation Models via Invisible Watermarking
 - Prototype-oriented unsupervised anomaly detection for multivariate time series
 - Prototype-Sample Relation Distillation: Towards Replay-Free Continual Learning
 - ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts
 - Provable Benefit of Mixup for Finding Optimal Decision Boundaries
 - Provable Data Subset Selection For Efficient Neural Networks Training
 - Provable Dynamic Fusion for Low-Quality Multimodal Data
 - Provable Multi-instance Deep AUC Maximization with Stochastic Pooling
 - Provable Reset-free Reinforcement Learning by No-Regret Reduction
 - Provably and Practically Efficient Neural Contextual Bandits
 - Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation
 - Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources
 - Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
 - Provably Invariant Learning without Domain Information
 - Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup
 - Provably Learning Object-Centric Representations
 - Proximal Causal Learning of Conditional Average Treatment Effects
 - Proxy objectives in reinforcement learning from human feedback
 - Pruning via Sparsity-indexed ODE: a Continuous Sparsity Viewpoint
 - PWSHAP: A Path-Wise Explanation Model for Targeted Variables
 - Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
 - QASA: Advanced Question Answering on Scientific Articles
 - QAS-Bench: Rethinking Quantum Architecture Search and A Benchmark
 - Q-Flow: Generative Modeling for Differential Equations of Open Quantum Dynamics with Normalizing Flows
 - Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL
 - Quantifying Human Priors over Social and Navigation Networks
 - Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs
 - Quantifying the Variability Collapse of Neural Networks
 - Quantile Credit Assignment
 - Quantitative Universal Approximation Bounds for Deep Belief Networks
 - Quantized Distributed Training of Large Models with Convergence Guarantees
 - Quantum 3D Graph Learning with Applications to Molecule Embedding
 - QuantumDARTS: Differentiable Quantum Architecture Search for Variational Quantum Algorithms
 - Quantum Lower Bounds for Finding Stationary Points of Nonconvex Functions
 - Quantum Policy Gradient Algorithm with Optimized Action Decoding
 - Quantum Ridgelet Transform: Winning Lottery Ticket of Neural Networks with Quantum Computation
 - Quantum Speedups for Zero-Sum Games via Improved Dynamic Gibbs Sampling
 - RACE: Improve Multi-Agent Reinforcement Learning with Representation Asymmetry and Collaborative Evolution
 - Raising the Cost of Malicious AI-Powered Image Editing
 - Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice
 - Random Grid Neural Processes for Parametric Partial Differential Equations
 - Randomized Gaussian Process Upper Confidence Bound with Tighter Bayesian Regret Bounds
 - Randomized Schur Complement Views for Graph Contrastive Learning
 - Random Matrix Analysis to Balance between Supervised and Unsupervised Learning under the Low Density Separation Assumption
 - Random Shuffle Transformer for Image Restoration
 - Random Teachers are Good Teachers
 - RankMe: Assessing the Downstream Performance of Pretrained Self-Supervised Representations by Their Rank
 - Reachability-Aware Laplacian Representation in Reinforcement Learning
 - Reasons for the Superiority of Stochastic Estimators over Deterministic Ones: Robustness, Consistency and Perceptual Quality
 - Recasting Self-Attention with Holographic Reduced Representations
 - Recent Advances in the Generalization Theory of Neural Networks *
 - Reconstructive Neuron Pruning for Backdoor Defense
 - Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing
 - Recovery Bounds on Class-Based Optimal Transport: A Sum-of-Norms Regularization Framework
 - ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval
 - Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC
 - Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs
 - Refined Regret for Adversarial MDPs with Linear Function Approximation
 - Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models
 - Reflected Diffusion Models
 - Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts
 - Regression with Label Permutation in Generalized Linear Model
 - Regression with Sensor Data Containing Incomplete Observations
 - Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents
 - Regret Minimization and Convergence to Equilibria in General-sum Markov Games
 - Regret-Minimizing Double Oracle for Extensive-Form Games
 - Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
 - Regularization-free Diffeomorphic Temporal Alignment Nets
 - Regularizing Towards Soft Equivariance Under Mixed Symmetries
 - Reinforcement Learning Can Be More Efficient with Multiple Rewards
 - Reinforcement Learning from Human Feedback: A Tutorial *
 - Reinforcement Learning from Passive Data via Latent Intentions
 - Reinforcement Learning in Low-rank MDPs with Density Features
 - Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space
 - Reinforcement Learning with History Dependent Dynamic Contexts
 - Relevant Walk Search for Explaining Graph Neural Networks
 - Reliable Measures of Spread in High Dimensional Latent Spaces
 - ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs
 - Reparameterized Policy Learning for Multimodal Trajectory Optimization
 - Repository-Level Prompt Generation for Large Language Models of Code
 - Representation-Driven Reinforcement Learning
 - Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL
 - Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
 - Representer Point Selection for Explaining Regularized High-dimensional Models
 - Reprogramming Pretrained Language Models for Antibody Sequence Infilling
 - Responsible AI for Generative AI in Practice: Lessons Learned and Open Challenges
 - Restoration based Generative Models
 - Restoration-Degradation Beyond Linear Diffusions: A Non-Asymptotic Analysis For DDIM-type Samplers
 - Resurrecting Recurrent Neural Networks for Long Sequences
 - Rethink DARTS Search Space and Renovate a New Benchmark
 - Rethinking Backdoor Attacks
 - Rethinking Explaining Graph Neural Networks via Non-parametric Subgraph Matching
 - Rethinking Visual Reconstruction: Experience-Based Content Completion Guided by Visual Cues
 - Rethinking Warm-Starts with Predictions: Learning Predictions Close to Sets of Optimal Solutions for Faster $\text{L}$-/$\text{L}^\natural$-Convex Function Minimization
 - Rethinking Weak Supervision in Helping Contrastive Learning
 - Retrieval-Augmented Multimodal Language Modeling
 - Retrosynthetic Planning with Dual Value Networks
 - Returning The Favour: When Regression Benefits From Probabilistic Causal Knowledge
 - Revisiting Bellman Errors for Offline Model Selection
 - Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
 - Revisiting Discriminative vs. Generative Classifiers: Theory and Implications
 - Revisiting Domain Randomization via Relaxed State-Adversarial Policy Optimization
 - Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees
 - Revisiting Over-smoothing and Over-squashing Using Ollivier-Ricci Curvature
 - Revisiting Pseudo-Label for Single-Positive Multi-Label Learning
 - Revisiting Sampling for Combinatorial Optimization
 - Revisiting Simple Regret: Fast Rates for Returning a Good Arm
 - Revisiting Structured Variational Autoencoders
 - Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation
 - Revisiting Weighted Aggregation in Federated Learning with Neural Networks
 - Reward-Mixing MDPs with Few Latent Contexts are Learnable
 - RGE: A Repulsive Graph Rectification for Node Classification via Influence
 - Rigid Body Flows for Sampling Molecular Crystal Structures
 - RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents
 - RLEG: Vision-Language Representation Learning with Diffusion-based Embedding Generation
 - RLSbench: Domain Adaptation Under Relaxed Label Shift
 - Robust and private stochastic linear bandits
 - Robust and Scalable Bayesian Online Changepoint Detection
 - Robust Budget Pacing with a Single Sample
 - Robust Camera Pose Refinement for Multi-Resolution Hash Encoding
 - Robust Collaborative Learning with Linear Gradient Overhead
 - Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues
 - Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees
 - Robust Explanation for Free or At the Cost of Faithfulness
 - Robustly Learning a Single Neuron via Sharpness
 - Robustness in Multimodal Learning under Train-Test Modality Mismatch
 - Robust Non-Linear Feedback Coding via Power-Constrained Deep Learning
 - Robust One-Class Classification with Signed Distance Function using 1-Lipschitz Neural Networks
 - Robust Perception through Equivariance
 - Robust Satisficing MDPs
 - Robust Situational Reinforcement Learning in Face of Context Disturbances
 - Robust Speech Recognition via Large-Scale Weak Supervision
 - Robust Subtask Learning for Compositional Generalization
 - Robust Weak Supervision with Variational Auto-Encoders
 - Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?
 - Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch
 - Rotation and Translation Invariant Representation Learning with Implicit Neural Representations
 - RSC: Accelerate Graph Neural Networks Training via Randomized Sparse Computations
 - Run-off Election: Improved Provable Defense against Data Poisoning Attacks
 - R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents
 - SAAL: Sharpness-Aware Active Learning
 - Safe Offline Reinforcement Learning with Real-Time Budget Constraints
 - Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
 - SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
 - Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation
 - Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes
 - Sample Complexity of Probability Divergences under Group Symmetry
 - Sampling and Optimization in Discrete Space
 - Sampling-Based Accuracy Testing of Posterior Estimators for General Inference
 - Sampling-based Nyström Approximation and Kernel Quadrature
 - Sampling random graph homomorphisms and applications to network data analysis
 - Scalable Adaptive Computation for Iterative Generation
 - Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation
 - Scalable Safe Policy Improvement via Monte Carlo Tree Search
 - Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation
 - Scaling Laws for Generative Mixed-Modal Language Models
 - Scaling Laws for Multilingual Neural Machine Translation
 - Scaling Laws for Reward Model Overoptimization
 - Scaling of Class-wise Training Losses for Post-hoc Calibration
 - Scaling Spherical CNNs
 - Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
 - Scaling Vision Transformers to 22 Billion Parameters
 - Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data
 - SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation
 - SE(3) diffusion model with application to protein backbone generation
 - Searching Large Neighborhoods for Integer Linear Programs with Contrastive Learning
 - Second-Order Optimization with Lazy Hessians
 - Second-order regression models exhibit progressive sharpening to the edge of stability
 - Secure Federated Correlation Test and Entropy Estimation
 - SeedGNN: Graph Neural Network for Supervised Seeded Graph Matching
 - SEGA: Structural Entropy Guided Anchor View for Graph Contrastive Learning
 - SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation
 - Selective Machine Learning of the Average Treatment Effect with an Invalid Instrumental Variable
 - Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction
 - Self-Interpretable Time Series Prediction with Counterfactual Explanations
 - Self-Repellent Random Walks on General Graphs - Achieving Minimal Sampling Variance via Nonlinear Markov Chains
 - Self-Supervised Learning in Vision: from Research Advances to Best Practices
 - Self-supervised learning of Split Invariant Equivariant representations
 - Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations
 - SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models
 - Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows
 - Semi Bandit dynamics in Congestion Games: Convergence to Nash Equilibrium and No-Regret Guarantees.
 - Semi-Dual Unbalanced Quadratic Optimal Transport: fast statistical rates and convergent algorithm.
 - Semi-Offline Reinforcement Learning for Optimized Text Generation
 - Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes
 - Semi-Parametric Contextual Pricing Algorithm using Cox Proportional Hazards Model
 - Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
 - SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification
 - Sequence Modeling with Multiresolution Convolutional Memory
 - Sequential Changepoint Detection via Backward Confidence Sequences
 - Sequential Counterfactual Risk Minimization
 - Sequential Kernelized Independence Testing
 - Sequential Monte Carlo Learning for Time Series Structure Discovery
 - Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series
 - Sequential Predictive Conformal Inference for Time Series
 - Sequential Strategic Screening
 - Sequential Underspecified Instrument Selection for Cause-Effect Estimation
 - Set-membership Belief State-based Reinforcement Learning for POMDPs
 - Settling the Reward Hypothesis
 - SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
 - SGD with Large Step Sizes Learns Sparse Features
 - Shape-Guided Dual-Memory Learning for 3D Anomaly Detection
 - Shapley Based Residual Decomposition for Instance Analysis
 - Sharper Bounds for $\ell_p$ Sensitivity Sampling
 - Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
 - Shedding a PAC-Bayesian Light on Adaptive Sliced-Wasserstein Distances
 - Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation
 - Shortest Edit Path Crossover: A Theory-driven Solution to the Permutation Problem in Evolutionary Neural Architecture Search
 - Short-lived High-volume Bandits
 - Simple and Fast Group Robustness by Automatic Feature Reweighting
 - simple diffusion: End-to-end diffusion for high resolution images
 - Simple Disentanglement of Style and Content in Visual Representations
 - Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning
 - Simple Hardware-Efficient Long Convolutions for Sequence Modeling
 - Simplex Random Features
 - Simplified Temporal Consistency Reinforcement Learning
 - Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning
 - SinDDM: A Single Image Denoising Diffusion Model
 - SinFusion: Training Diffusion Models on a Single Image or Video
 - Single Point-Based Distributed Zeroth-Order Optimization with a Non-Convex Stochastic Objective Function
 - Ske2Grid: Skeleton-to-Grid Representation Learning for Action Recognition
 - Sketched Ridgeless Linear Regression: The Role of Downsampling
 - Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting
 - Sketching for First Order Method: Efficient Algorithm for Low-Bandwidth Channel and Vulnerability
 - Sketching Meets Differential Privacy: Fast Algorithm for Dynamic Kronecker Projection Maintenance
 - SLAMB: Accelerated Large Batch Training with Sparse Communication
 - Sliced-Wasserstein on Symmetric Positive Definite Matrices for M/EEG Signals
 - SlotGAT: Slot-based Message Passing for Heterogeneous Graphs
 - Slot-VAE: Object-Centric Scene Generation with Slot Attention
 - Smart Initial Basis Selection for Linear Programs
 - Smooth Non-stationary Bandits
 - SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
 - SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process
 - SNeRL: Semantic-aware Neural Radiance Fields for Reinforcement Learning
 - Social learning spontaneously emerges by searching optimal heuristics with deep reinforcement learning
 - Solving High-Dimensional PDEs with Latent Spectral Models
 - Solving Linear Programs with Fast Online Learning Algorithms
 - SOM-CPC: Unsupervised Contrastive Learning with Self-Organizing Maps for Structured Representations of High-Rate Time Series
 - SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot
 - Sparse Learning of Dynamical Systems in RKHS: An Operator-Theoretic Approach
 - SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks at the Edge
 - Spatial Implicit Neural Representations for Global-Scale Species Mapping
 - Spatial-Temporal Graph Learning with Adversarial Contrastive Adaptation
 - Specializing Smaller Language Models towards Multi-Step Reasoning
 - Special Properties of Gradient Descent with Large Learning Rates
 - SpeedDETR: Speed-aware Transformers for End-to-end Object Detection
 - Speeding Up Bellman Ford via Minimum Violation Permutations
 - Speed-Oblivious Online Scheduling: Knowing (Precise) Speeds is not Necessary
 - SpENCNN: Orchestrating Encoding and Sparsity for Fast Homomorphically Encrypted Neural Network Inference
 - Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere
 - Spherical Inducing Features for Orthogonally-Decoupled Gaussian Processes
 - SpotEM: Efficient Video Search for Episodic Memory
 - spred: Solving L1 Penalty with SGD
 - Spurious Valleys and Clustering Behavior of Neural Networks
 - SRATTA: Sample Re-ATTribution Attack of Secure Aggregation in Federated Learning.
 - Stabilizing GANs' Training with Brownian Motion Controller
 - Stabilizing Transformer Training by Preventing Attention Entropy Collapse
 - Stable and Consistent Prediction of 3D Characteristic Orientation via Invariant Residual Learning
 - Stable Estimation of Heterogeneous Treatment Effects
 - State and parameter learning with PARIS particle Gibbs
 - Statistical Foundations of Prior-Data Fitted Networks
 - Statistical Indistinguishability of Learning Algorithms
 - Statistical Inference and A/B Testing for First-Price Pacing Equilibria
 - Statistical Inference on Multi-armed Bandits with Delayed Feedback
 - Statistical Learning under Heterogeneous Distribution Shift
 - STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning
 - Stein Variational Goal Generation for adaptive Exploration in Multi-Goal Reinforcement Learning
 - STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
 - Stochastic Gradient Descent-Induced Drift of Representation in a Two-Layer Neural Network
 - Stochastic Gradient Descent under Markovian Sampling Schemes
 - Stochastic Gradient Succeeds for Bandits
 - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels
 - Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
 - Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks
 - Strategic Classification with Unknown User Manipulations
 - Stratified Adversarial Robustness with Rejection
 - Streaming Active Learning with Deep Neural Networks
 - Streaming Submodular Maximization with Differential Privacy
 - StriderNet: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes
 - Structural Re-weighting Improves Graph Domain Adaptation
 - Structured Cooperative Learning with Graphical Model Priors
 - Structured Probabilistic Inference and Generative Modeling
 - Structure-informed Language Models Are Protein Designers
 - Structure Learning of Latent Factors via Clique Search on Correlation Thresholded Graphs
 - StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
 - Subequivariant Graph Reinforcement Learning in 3D Environments
 - Submodular Order Functions and Assortment Optimization
 - Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation
 - Subset-Based Instance Optimality in Private Estimation
 - Subset Selection Based On Multiple Rankings in the Presence of Bias: Effectiveness of Fairness Constraints for Multiwinner Voting Score Functions
 - Superhuman Fairness
 - Supervised Metric Learning to Rank for Retrieval via Contextual Similarity Optimization
 - Supported Trust Region Optimization for Offline Reinforcement Learning
 - SurCo: Learning Linear SURrogates for COmbinatorial Nonlinear Optimization Problems
 - Surface Snapping Optimization Layer for Single Image Object Shape Reconstruction
 - SurProGenes: Survival Risk-Ordered Representation of Cancer Patients and Genes for the Identification of Prognostic Genes
 - Surrogate Model Extension (SME): A Fast and Accurate Weight Update Attack on Federated Learning
 - Surrogate Module Learning: Reduce the Gradient Error Accumulation in Training Spiking Neural Networks
 - SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
 - Symmetry-Aware Robot Design with Structured Subgroups
 - Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning
 - Synthetic data for model selection
 - Synthetic Data, Real Errors: How (Not) to Publish and Use Synthetic Data
 - Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models
 - System Identification of Neural Systems: If We Got It Right, Would We Know?
 - TabDDPM: Modelling Tabular Data with Diffusion Models
 - TabLeak: Tabular Data Leakage in Federated Learning
 - Taking the Pulse Of Ethical ML in Health
 - Taming graph kernels with random features
 - TAN Without a Burn: Scaling Laws of DP-SGD
 - Target-Aware Generative Augmentations for Single-Shot Adaptation
 - Target-based Surrogates for Stochastic Optimization
 - Task-specific experimental design for treatment effect estimation
 - Task-Specific Skill Localization in Fine-tuned Language Models
 - Taxonomy-Structured Domain Adaptation
 - Team Belief DAG: Generalizing the Sequence Form to Team Games for Fast Computation of Correlated Team Max-Min Equilibria via Regret Minimization
 - Temporal Label Smoothing for Early Event Prediction
 - Temporally Consistent Transformers for Video Generation
 - Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems
 - Tensor Gaussian Process with Contraction for Multi-Channel Imaging Analysis
 - Test-time Adaptation with Slot-Centric Models
 - Test-Time Style Shifting: Handling Arbitrary Styles in Domain Generalization
 - Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
 - Text-To-4D Dynamic Scene Generation
 - Text-To-Concept (and Back) via Cross-Model Alignment
 - TGRL: An Algorithm for Teacher Guided Reinforcement Learning
 - The Acquisition of Physical Knowledge in Generative Neural Networks
 - The Benefits of Mixup for Feature Learning
 - The Benefits of Model-Based Generalization in Reinforcement Learning
 - The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond
 - The case for 4-bit precision: k-bit Inference Scaling Laws
 - The Catalog Problem: Clustering and Ordering Variable-Sized Sets
 - The Computational Complexity of Concise Hypersphere Classification
 - The Dormant Neuron Phenomenon in Deep Reinforcement Learning
 - The Edge of Orthogonality: A Simple View of What Makes BYOL Tick
 - The Fast Johnson-Lindenstrauss Transform Is Even Faster
 - The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
 - The Future of ML in Biology: CRISPR for Health and Climate
 - The Hessian perspective into the Nature of Convolutional Neural Networks
 - The Ideal Continual Learner: An Agent That Never Forgets
 - The Impact of Exploration on Convergence and Performance of Multi-Agent Q-Learning Dynamics
 - The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent
 - The Many Facets of Preference-Based Learning
 - The Monge Gap: A Regularizer to Learn All Transport Maps
 - The multimarginal optimal transport formulation of adversarial multiclass classification
 - The Numerical Stability of Hyperbolic Representation Learning
 - The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation
 - Theoretical Behavior of XAI Methods in the Presence of Suppressor Variables
 - Theoretical Bounds on the Network Community Profile from Low-rank Semi-definite Programming
 - Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting
 - Theory on Forgetting and Generalization of Continual Learning
 - The Persistent Laplacian for Data Science: Evaluating Higher-Order Persistent Spectral Representations of Data
 - The Power of Learned Locally Linear Models for Nonlinear Policy Optimization
 - The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing
 - The Power of Uniform Sampling for k-Median
 - The Price of Differential Privacy under Continual Observation
 - The Regret of Exploration and the Control of Bad Episodes in Reinforcement Learning
 - The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning
 - The Saddle-Point Method in Differential Privacy
 - The Second Workshop on Spurious Correlations, Invariance and Stability
 - The SSL Interplay: Augmentations, Inductive Bias, and Generalization
 - The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
 - The Statistical Scope of Multicalibration
 - The Synergy of Scientific and Machine Learning Modelling (SynS & ML) Workshop
 - The Test of Tests: A Framework for Differentially Private Hypothesis Testing
 - The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning
 - The Unreasonable Effectiveness of Few-shot Learning for Machine Translation
 - The Value of Out-of-Distribution Data
 - The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
 - The Wisdom of Hindsight Makes Language Models Better Instruction Followers
 - Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
 - Thompson Sampling with Diffusion Generative Prior
 - Thompson Sampling with Less Exploration is Fast and Optimal
 - TIDE: Time Derivative Diffusion for Deep Learning on Graphs
 - Tied-Augment: Controlling Representation Similarity Improves Data Augmentation
 - Tight and fast generalization error bound of graph embedding in metric space
 - Tight Certification of Adversarially Trained Neural Networks via Nonconvex Low-Rank Semidefinite Relaxations
 - Tight Data Access Bounds for Private Top-$k$ Selection
 - Tighter Analysis for ProxSkip
 - Tighter Bounds on the Expressivity of Transformer Encoders
 - Tighter Information-Theoretic Generalization Bounds from Supersamples
 - Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond
 - Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
 - Tilted Sparse Additive Models
 - TIPS: Topologically Important Path Sampling for Anytime Neural Networks
 - Topologically Faithful Image Segmentation via Induced Matching of Persistence Barcodes
 - Topological Point Cloud Clustering
 - Topological Singularity Detection at Multiple Scales
 - Total Variation Graph Neural Networks
 - Toward Efficient Gradient-Based Value Estimation
 - Toward Large Kernel Models
 - Towards a better understanding of representation dynamics under TD-learning
 - Towards a Persistence Diagram that is Robust to Noise and Varied Densities
 - Towards Better Graph Representation Learning with Parameterized Decomposition & Filtering
 - Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten
 - Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models
 - Towards Constituting Mathematical Structures for Learning to Optimize
 - Towards Controlled Data Augmentations for Active Learning
 - Towards credible visual model interpretation with path attribution
 - Towards Deep Attention in Graph Neural Networks: Problems and Remedies
 - Towards Explaining Distribution Shifts
 - Towards Learning Geometric Eigen-Lengths Crucial for Fitting Tasks
 - Towards Learning to Imitate from a Single Video Demonstration
 - Towards Omni-generalizable Neural Methods for Vehicle Routing Problems
 - Towards Practical Preferential Bayesian Optimization with Skew Gaussian Processes
 - Towards Quantum Machine Learning for Constrained Combinatorial Optimization: a Quantum QAP Solver
 - Towards Reliable Neural Specifications
 - Towards Robust and Safe Reinforcement Learning with Benign Off-policy Data
 - Towards Robust Graph Incremental Learning on Evolving Graphs
 - Towards Stable and Efficient Adversarial Training against $l_1$ Bounded Adversarial Attacks
 - Towards Sustainable Learning: Coresets for Data-efficient Deep Learning
 - Towards Theoretical Understanding of Inverse Reinforcement Learning
 - Towards Trustworthy Explanation: On Causal Rationalization
 - Towards Unbiased Training in Federated Open-world Semi-supervised Learning
 - Towards Understanding and Improving GFlowNet Training
 - Towards Understanding and Reducing Graph Structural Noise for GNNs
 - Towards Understanding Ensemble Distillation in Federated Learning
 - Towards Understanding Generalization of Graph Neural Networks
 - Towards Understanding Generalization of Macro-AUC in Multi-label Learning
 - TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
 - Tractable Control for Autoregressive Language Generation
 - Trading-Off Payments and Accuracy in Online Classification with Paid Stochastic Experts
 - Trainability, Expressivity and Interpretability in Gated Neural ODEs
 - Training Deep Surrogate Models with Large Scale Online Learning
 - Training-Free Neural Active Learning with Initialization-Robustness Guarantees
 - Training Normalizing Flows from Dependent Data
 - Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
 - TRAK: Attributing Model Behavior at Scale
 - Transcendental Idealism of Planner: Evaluating Perception from Planning Perspective for Autonomous Driving
 - Transformed Distribution Matching for Missing Value Imputation
 - Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization
 - Transformers as Algorithms: Generalization and Stability in In-context Learning
 - Transformers Learn In-Context by Gradient Descent
 - Transformers Meet Directed Graphs
 - Trapdoor Normalization with Irreversible Ownership Verification
 - Traversing Between Modes in Function Space for Fast Ensembling
 - Trompt: Towards a Better Deep Neural Network for Tabular Data
 - Truncating Trajectories in Monte Carlo Reinforcement Learning
 - Trustworthy Policy Learning under the Counterfactual No-Harm Criterion
 - Tuning Computer Vision Models With Task Rewards
 - Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning
 - Tutorial on Multimodal Machine Learning: Principles, Challenges, and Open Questions
 - Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy
 - Two-Scale Gradient Descent Ascent Dynamics Finds Mixed Nash Equilibria of Continuous Games: A Mean-Field Perspective
 - UMD: Unsupervised Model Detection for X2X Backdoor Attacks
 - Uncertain Evidence in Probabilistic Models and Stochastic Simulators
 - Uncertainty Estimation by Fisher Information-based Evidential Deep Learning
 - Uncertainty Estimation for Molecules: Desiderata and Methods
 - Unconstrained Online Learning with Unbounded Losses
 - Uncovering Adversarial Risks of Test-Time Adaptation
 - Under-Counted Tensor Completion with Neural Incorporation of Attributes
 - Underspecification Presents Challenges for Credibility in Modern Machine Learning
 - Understand and Modularize Generator Optimization in ELECTRA-style Pretraining
 - Understanding and Defending Patched-based Adversarial Attacks for Vision Transformer
 - Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective
 - Understanding Backdoor Attacks through the Adaptability Hypothesis
 - Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias
 - Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
 - Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases
 - Understanding Oversquashing in GNNs through the Lens of Effective Resistance
 - Understanding Plasticity in Neural Networks
 - Understanding Self-Distillation in the Presence of Label Noise
 - Understanding Self-Predictive Learning for Reinforcement Learning
 - Understanding the Complexity Gains of Single-Task RL with a Curriculum
 - Understanding the Distillation Process from Deep Generative Models to Tractable Probabilistic Circuits
 - Understanding the Impact of Adversarial Robustness on Accuracy Disparity
 - Understanding the Role of Feedback in Online Learning with Switching Costs
 - Unearthing InSights into Mars: Unsupervised Source Separation with Limited Data
 - Unifying Molecular and Textual Representations via Multi-task Language Modelling
 - Unifying Nesterov's Accelerated Gradient Methods for Convex and Strongly Convex Objective Functions
 - Unit Scaling: Out-of-the-Box Low-Precision Training
 - Universal Morphology Control via Contextual Modulation
 - Universal Physics-Informed Neural Networks: Symbolic Differential Operator Discovery with Sparse Data
 - Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability
 - Unlocking Slot Attention by Changing Optimal Transport Costs
 - Unscented Autoencoder
 - Unsupervised Out-of-Distribution Detection with Diffusion Inpainting
 - Unsupervised Skill Discovery for Learning Shared Structures across Changing Environments
 - Unveiling the Latent Space Geometry of Push-Forward Generative Models
 - Unveiling The Mask of Position-Information Pattern Through the Mist of Image Features
 - UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers
 - UPSCALE: Unconstrained Channel Pruning
 - User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems
 - User-level Private Stochastic Convex Optimization with Optimal Rates
 - Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
 - Using Perturbation to Improve Goodness-of-Fit Tests based on Kernelized Stein Discrepancy
 - VA-learning as a more efficient alternative to Q-learning
 - Variance Control for Distributional Reinforcement Learning
 - Variational Autoencoding Neural Operators
 - Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills
 - Variational Mixture of HyperGenerators for Learning Distributions over Functions
 - Variational Open-Domain Question Answering
 - Variational Sparse Inverse Cholesky Approximation for Latent Gaussian Processes via Double Kullback-Leibler Minimization
 - VectorMapNet: End-to-end Vectorized HD Map Learning
 - Vector Quantized Wasserstein Auto-Encoder
 - Vector-Valued Control Variates
 - Vertical Federated Graph Neural Network for Recommender System
 - VIMA: Robot Manipulation with Multimodal Prompts
 - Von Mises Mixture Distributions for Molecular Conformation Generation
 - Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
 - Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks
 - Weakly Supervised Disentangled Generative Causal Representation Learning
 - Weakly Supervised Regression with Interval Targets
 - Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive Attributes
 - Weighted Flow Diffusion for Local Graph Clustering with Node Attributes: an Algorithm and Statistical Guarantees
 - Weighted Sampling without Replacement for Deep Top-$k$ Classification
 - Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality
 - What Can Be Learnt With Wide Convolutional Neural Networks?
 - What can online reinforcement learning with function approximation benefit from general coverage conditions?
 - What do CNNs Learn in the First Layer and Why? A Linear Systems Perspective
 - What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?
 - What Makes Entities Similar? A Similarity Flooding Perspective for Multi-sourced Knowledge Graph Embeddings
 - When and How Does Known Class Help Discover Unknown Ones? Provable Understanding Through Spectral Analysis
 - When does Privileged information Explain Away Label Noise?
 - When do Minimax-fair Learning and Empirical Risk Minimization Coincide?
 - When is Realizability Sufficient for Off-Policy Reinforcement Learning?
 - When Personalization Harms Performance: Reconsidering the Use of Group Attributes in Prediction
 - When Sparsity Meets Contrastive Models: Less Graph Data Can Bring Better Class-Balanced Representations
 - Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression
 - Which Invariance Should We Transfer? A Causal Minimax Learning Approach
 - Which is Better for Learning with Noisy Labels: The Semi-supervised Method or Modeling Label Noise?
 - Which Tricks are Important for Learning to Rank?
 - Who Needs to Know? Minimal Knowledge for Optimal Coordination
 - Whose Opinions Do Language Models Reflect?
 - "Why did the Model Fail?": Attributing Model Performance Changes to Distribution Shifts
 - Why does Throwing Away Data Improve Worst-Group Error?
 - Why do Nearest Neighbor Language Models Work?
 - Why Is Public Pretraining Necessary for Private Model Training?
 - Why Random Pruning Is All We Need to Start Sparse
 - Why Target Networks Stabilise Temporal Difference Methods
 - Width and Depth Limits Commute in Residual Networks
 - WL meet VC
 - Workshop on Theory of Mind in Communicating Agents
 - Wrapped Cauchy Distributed Angular Softmax for Long-Tailed Visual Recognition
 - XAI Beyond Classification: Interpretable Neural Clustering
 - X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion
 - XTab: Cross-table Pretraining for Tabular Transformers
 