# Downloads 2023

Number of events: 1908

- $H$-Consistency Bounds for Pairwise Misranking Loss Surrogates
- $\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
- 2D-Shapley: A Framework for Fragmented Data Valuation
- 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML)
- 2nd ICML Workshop on Machine Learning for Astrophysics
- 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning
- 2nd Workshop on Formal Verification of Machine Learning
- 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH)
- abess: A Fast Best-Subset Selection Library in Python and R
- AbODE: Ab initio antibody design using conjoined ODEs
- Abstracting Imperfect Information Away from Two-Player Zero-Sum Games
- Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization
- A/B Testing in Network Data with Covariate-Adaptive Randomization
- ACAT: Adversarial Counterfactual Attention for Classification and Detection in Medical Imaging
- A Category-theoretical Meta-analysis of Definitions of Disentanglement
- Accelerated Cyclic Coordinate Dual Averaging with Extrapolation for Composite Convex Optimization
- Accelerated Infeasibility Detection of Constrained Optimization and Fixed-Point Iterations
- Accelerated Primal-Dual Methods for Convex-Strongly-Concave Saddle Point Problems
- Accelerated Stochastic Optimization Methods under Quasar-convexity
- Accounting For Informative Sampling When Learning to Forecast Treatment Outcomes Over Time
- Accuracy on the Curve: On the Nonlinear Correlation of ML Performance Between Data Subpopulations
- Achieving Hierarchy-Free Approximation for Bilevel Programs with Equilibrium Constraints
- Achieving High Accuracy with PINNs via Energy Natural Gradient Descent
- Achieving Linear Speedup in Non-IID Federated Bilevel Learning
- A Closer Look at Few-shot Classification Again
- A Closer Look at Self-Supervised Lightweight Vision Transformers
- A Closer Look at the Intervention Procedure of Concept Bottleneck Models
- A Complete Expressiveness Hierarchy for Subgraph GNNs via Subgraph Weisfeiler-Lehman Tests
- A Conditional Normalizing Flow for Accelerated Multi-Coil MR Imaging
- A Connection between One-Step RL and Critic Regularization in Reinforcement Learning
- A Coupled Flow Approach to Imitation Learning
- A Critical Revisit of Adversarial Robustness in 3D Point Cloud Recognition with Diffusion-Driven Purification
- A Critical View of Vision-Based Long-Term Dynamics Prediction Under Environment Misalignment
- Action Matching: Learning Stochastic Dynamics from Samples
- Active causal structure learning with advice
- Active Learning based Structural Inference
- Active Policy Improvement from Multiple Black-box Oracles
- Active Ranking of Experts Based on their Performances in Many Tasks
- Actor-Critic Alignment for Offline-to-Online Reinforcement Learning
- AdaBoost is not an Optimal Weak to Strong Learner
- AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation
- AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
- Adapting to game trees in zero-sum imperfect information games
- Adaptive Annealed Importance Sampling with Constant Rate Progress
- Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics
- Adaptive Compositional Continual Meta-Learning
- Adaptive Computation with Elastic Input Sequence
- Adaptive Coordination in Social Embodied Rearrangement
- Adaptive Estimation of Graphical Models under Total Positivity
- Adaptive Identification of Populations with Treatment Benefit in Clinical Trials: Machine Learning Challenges and Solutions
- Adaptive IMLE for Few-shot Pretraining-free Generative Modelling
- Adaptively Weighted Data Augmentation Consistency Regularization for Robust Optimization under Concept Shift
- Adaptive Smoothing Gradient Learning for Spiking Neural Networks
- Adaptive Whitening in Neural Populations with Gain-modulating Interneurons
- Additive Causal Bandits with Unknown Graph
- Addressing Budget Allocation and Revenue Allocation in Data Market Environments Using an Adaptive Sampling Algorithm
- A Deep Conjugate Direction Method for Iteratively Solving Linear Systems
- A Distribution Optimization Framework for Confidence Bounds of Risk Measures
- Adversarial Cheap Talk
- Adversarial Classification: Necessary Conditions and Geometric Flows
- Adversarial Collaborative Learning on Non-IID Features
- Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples
- Adversarial Learning of Distributional Reinforcement Learning
- Adversarially Robust PAC Learnability of Real-Valued Functions
- Adversarial Parameter Attack on Deep Neural Networks
- Adversarial Policies Beat Superhuman Go AIs
- Adversarial robustness of amortized Bayesian inference
- A Fast Optimistic Method for Monotone Variational Inequalities
- A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel
- A Flexible Diffusion Model
- A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback
- A Fully First-Order Method for Stochastic Bilevel Optimization
- A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems
- A Generalization of ViT/MLP-Mixer to Graphs
- A General Representation Learning Framework with Generalization Performance Guarantees
- A General Theory for Federated Optimization with Asynchronous and Heterogeneous Clients Updates
- A Gromov–Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening
- A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining
- A Hybrid Quantum-Classical Approach based on the Hadamard Transform for the Convolutional Layer
- A Kernel-Based View of Language Model Fine-Tuning
- A Kernelized Stein Discrepancy for Biological Sequences
- A Kernel Stein Test of Goodness of Fit for Sequential Models
- A Large-Scale Study of Probabilistic Calibration in Neural Network Regression
- A Law of Robustness beyond Isoperimetry
- Algorithmic Collective Action in Machine Learning
- Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions
- Algorithms for bounding contribution for histogram estimation under user-level privacy
- Aligning Language Models with Preferences through $f$-divergence Minimization
- A Likelihood Approach to Nonparametric Estimation of a Singular Distribution Using Deep Generative Models
- All in a Row: Compressed Convolution Networks for Graphs
- Alternately Optimized Graph Neural Networks
- Alternating Local Enumeration (TnALE): Solving Tensor Network Structure Search with Fewer Evaluations
- A Mathematical Model for Curriculum Learning for Parities
- A Model-Based Method for Minimizing CVaR and Beyond
- A Model-free Closeness-of-influence Test for Features in Supervised Learning
- A Modern Look at the Relationship between Sharpness and Generalization
- An Adaptive Entropy-Regularization Framework for Multi-Agent Reinforcement Learning
- Analysis of Error Feedback in Federated Non-Convex Optimization with Biased Compression: Fast Convergence and Partial Participation
- Analyzing Convergence in Quantum Neural Networks: Deviations from Neural Tangent Kernels
- Analyzing Diffusion as Serial Reproduction
- Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano
- Anchor Sampling for Federated Learning with Partial Client Participation
- A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee
- A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints
- An Effective Meaningful Way to Evaluate Survival Models
- A Neural PDE Solver with Temporal Stencil Modeling
- A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree
- A New PHO-rmula for Improved Performance of Semi-Structured Networks
- An Information-Theoretic Analysis of Nonstationary Bandit Learning
- An Instrumental Variable Approach to Confounded Off-Policy Evaluation
- An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning
- An SDE for Modeling SAM: Theory and Insights
- Answering Complex Logical Queries on Knowledge Graphs via Query Computation Tree Optimization
- Anti-Exploration by Random Network Distillation
- A Picture of the Space of Typical Learnable Tasks
- Applied Online Algorithms with Heterogeneous Predictors
- Approximate Causal Effect Identification under Weak Confounding
- Approximately Optimal Core Shapes for Tensor Decompositions
- Approximate Stein Classes for Truncated Density Estimation
- Approximation Algorithms for Fair Range Clustering
- Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
- Architecture-Agnostic Masked Image Modeling – From ViT back to CNN
- Are Diffusion Models Vulnerable to Membership Inference Attacks?
- Are Equivariant Equilibrium Approximators Beneficial?
- Are Gaussian Data All You Need? The Extents and Limits of Universality in High-Dimensional Generalized Linear Estimation
- A Reinforcement Learning Framework for Dynamic Mediation Analysis
- Are labels informative in semi-supervised learning? Estimating and leveraging the missing-data mechanism.
- Are Large Kernels Better Teachers than Transformers for ConvNets?
- Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations
- Are Random Decompositions all we need in High Dimensional Bayesian Optimisation?
- Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
- A Robust Optimisation Perspective on Counterexample-Guided Repair of Neural Networks
- A Robust Test for the Stationarity Assumption in Sequential Decision Making
- Artificial Intelligence & Human Computer Interaction
- A Scalable Frank-Wolfe-Based Algorithm for the Max-Cut SDP
- A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models
- A Statistical Perspective on Retrieval-Based Models
- A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs
- A Study on Transformer Configuration and Training Objective
- Atari-5: Distilling the Arcade Learning Environment down to Five Games
- A Theoretical Analysis of the Learning Dynamics under Class Imbalance
- A theory of continuous generative flow networks
- A theory of representation learning gives a deep generalisation of kernel methods
- A Three-regime Model of Network Pruning
- A Toy Model of Universality: Reverse Engineering how Networks Learn Group Operations
- Attention-Based Recurrence for Multi-Agent Reinforcement Learning under Stochastic Partial Observability
- Attribute-Efficient PAC Learning of Low-Degree Polynomial Threshold Functions with Nasty Noise
- Attributing Image Generative Models using Latent Fingerprints
- A Two-Stage Active Learning Algorithm for k-Nearest Neighbors
- AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
- A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
- A Unified Optimization Framework of ANN-SNN Conversion: Towards Optimal Mapping from Activation Values to Firing Rates
- A Unifying Framework to the Analysis of Interaction Methods using Synergy Functions
- A Universal Unbiased Method for Classification from Aggregate Observations
- AutoCoreset: An Automatic Practical Coreset Construction Framework
- Auto-Differentiation of Relational Computations for Very Large Scale Machine Learning
- Automated Search for Conjectures on Mathematical Constants using Analysis of Integer Sequences
- Automatically Auditing Large Language Models via Discrete Optimization
- Automatically marginalized MCMC in probabilistic programming
- Automatic Data Augmentation via Invariance-Constrained Learning
- Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning
- Autoregressive Diffusion Model for Graph Generation
- Auxiliary Learning as an Asymmetric Bargaining Game
- Auxiliary Modality Learning with Generalized Curriculum Distillation
- Averaged Method of Multipliers for Bi-Level Optimization without Lower-Level Strong Convexity
- A Watermark for Large Language Models
- Bag of Tricks for Training Data Extraction from Language Models
- Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits
- Bandit Online Linear Optimization with Hints and Queries
- Bandits with Knapsacks: Advice on Time-Varying Demands
- Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning
- Bayesian Design Principles for Frequentist Sequential Learning
- Bayesian Estimation of Differential Privacy
- Bayesian Neural Networks Avoid Encoding Complex and Perturbation-Sensitive Concepts
- Bayesian online change point detection with Hilbert space approximate Student-t process
- Bayesian Progressive Deep Topic Model with Knowledge Informed Textual Data Coarsening Process
- Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models
- Bayes-optimal Learning of Deep Random Networks of Extensive-width
- Beam Tree Recursive Cells
- BEATs: Audio Pre-Training with Acoustic Tokenizers
- Behavior Contrastive Learning for Unsupervised Skill Discovery
- Benign Overfitting in Deep Neural Networks under Lazy Training
- Benign Overfitting in Two-layer ReLU Convolutional Neural Networks
- Best Arm Identification in Multi-Agent Multi-Armed Bandits
- Best of Both Worlds Policy Optimization
- Better Diffusion Models Further Improve Adversarial Training
- Better Training of GFlowNets with Local Credit and Incomplete Trajectories
- Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic
- Beyond Homophily: Reconstructing Structure for Graph-agnostic Clustering
- Beyond In-Domain Scenarios: Robust Density-Aware Calibration
- Beyond Lipschitz Smoothness: A Tighter Analysis for Nonconvex Optimization
- Beyond Reward: Offline Preference-guided Policy Optimization
- Beyond the Edge of Stability via Two-step Gradient Updates
- Beyond the Universal Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels
- Beyond Uniform Lipschitz Condition in Differentially Private Optimization
- Biases in Evaluation of Molecular Optimization Methods and Bias Reduction Strategies
- BiBench: Benchmarking and Analyzing Network Binarization
- Bidirectional Adaptation for Robust Semi-Supervised Learning with Inconsistent Data Distributions
- Bidirectional Learning for Offline Model-based Biological Sequence Design
- Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
- Bi-directional Masks for Efficient N:M Sparse Training
- Bigger, Better, Faster: Human-level Atari with human-level efficiency
- Bilevel Optimization with Coupled Decision-Dependent Distributions
- BiRT: Bio-inspired Replay in Vision Transformers for Continual Learning
- Bit Allocation using Optimization
- Blackout Diffusion: Generative Diffusion Models in Discrete-State Spaces
- B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- Block Subsampled Randomized Hadamard Transform for Nyström Approximation on Distributed Architectures
- Blockwise Stochastic Variance-Reduced Methods with Parallel Speedup for Multi-Block Bilevel Optimization
- Blossom: an Anytime Algorithm for Computing Optimal Decision Trees
- BNN-DP: Robustness Certification of Bayesian Neural Networks via Dynamic Programming
- Boosting Graph Contrastive Learning via Graph Contrastive Saliency
- Boosting Offline Reinforcement Learning with Action Preference Query
- Bootstrap in High Dimension with Low Computation
- Bootstrapped Representations in Reinforcement Learning
- BPipe: Memory-Balanced Pipeline Parallelism for Training Large Language Models
- Brainformers: Trading Simplicity for Efficiency
- Brauer's Group Equivariant Neural Networks
- Building Neural Networks on Matrix Manifolds: A Gyrovector Space Approach
- Buying Information for Stochastic Optimization
- Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting
- CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
- Calibrating Multimodal Learning
- Can Forward Gradient Match Backpropagation?
- Can Large Language Models Reason about Program Invariants?
- Can Neural Network Memorization Be Localized?
- Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
- CataBEEM: Integrating Latent Interaction Categories in Node-wise Community Detection Models for Network Data
- Causal Bounds in Quasi-Markovian Graphs
- Causal Discovery with Latent Confounders Based on Higher-Order Cumulants
- Causal Isotonic Calibration for Heterogeneous Treatment Effects
- Causal Modeling of Policy Interventions From Treatment–Outcome Sequences
- Causal Proxy Models for Concept-based Model Explanations
- Causal Strategic Classification: A Tale of Two Shifts
- Causal Structure Learning for Latent Intervened Non-stationary Data
- Cell-Free Latent Go-Explore
- Certified Robust Neural Networks: Generalization and Corruption Resistance
- Certifying Ensembles: A General Certification Theory with S-Lipschitzness
- Challenges in Deployable Generative AI
- Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning
- Change is Hard: A Closer Look at Subpopulation Shift
- Chemically Transferable Generative Backmapping of Coarse-Grained Proteins
- CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
- ChiPFormer: Transferable Chip Placement via Offline Decision Transformer
- CircuitNet: A Generic Neural Network to Realize Universal Circuit Motif Modeling
- CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms
- ClimaX: A foundation model for weather and climate
- CLIPood: Generalizing CLIP to Out-of-Distributions
- Cluster Explanation via Polyhedral Descriptions
- ClusterFuG: Clustering Fully connected Graphs by Multicut
- Cluster-Specific Predictions with Multi-Task Gaussian Processes
- CLUSTSEG: Clustering for Universal Segmentation
- CLUTR: Curriculum Learning via Unsupervised Task Representation Learning
- Coarse-to-Fine: a Hierarchical Diffusion Model for Molecule Generation in 3D
- CO-BED: Information-Theoretic Contextual Optimization via Bayesian Experimental Design
- Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning Using Independent Component Analysis
- CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks
- CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification
- CodeIPPrompt: Intellectual Property Infringement Assessment of Code Language Models
- Coder Reviewer Reranking for Code Generation
- CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis
- Coin Sampling: Gradient-Based Bayesian Inference without Learning Rates
- COLA: Orchestrating Error Coding and Learning for Robust Neural Network Inference Against Hardware Defects
- Cold Analysis of Rao-Blackwellized Straight-Through Gumbel-Softmax Gradient Estimator
- Collaborative Causal Inference with Fair Incentives
- Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits
- Combinatorial Neural Bandits
- COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
- Communication-Constrained Bandits under Additive Gaussian Noise
- Communication-Efficient Federated Hypergradient Computation via Aggregated Iterative Differentiation
- Comparison of meta-learners for estimating multi-valued treatment heterogeneous effects
- Competing for Shareable Arms in Multi-Player Multi-Armed Bandits
- Competitive Gradient Optimization
- Complementary Attention for Multi-Agent Reinforcement Learning
- Complexity of Block Coordinate Descent with Proximal Regularization and Applications to Wasserstein CP-dictionary Learning
- Composer: Creative and Controllable Image Synthesis with Composable Conditions
- Compositional Exemplars for In-context Learning
- Compositional Score Modeling for Simulation-Based Inference
- Compressed Decentralized Proximal Stochastic Gradient Method for Nonconvex Composite Problems with Heterogeneous Data
- Compressing Tabular Data via Latent Variable Estimation
- Computational Asymmetries in Robust Classification
- Computational Doob h-transforms for Online Filtering of Discretely Observed Diffusions
- Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings
- Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
- Concept-based Explanations for Out-of-Distribution Detectors
- ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction
- Concurrent Shuffle Differential Privacy Under Continual Observation
- Conditional Graph Information Bottleneck for Molecular Relational Learning
- Conditionally Strongly Log-Concave Generative Models
- Conditional Tree Matching for Inference-Time Adaptation of Tree Prediction Models
- Conditions and Assumptions for Constraint-based Causal Structure Learning
- Cones: Concept Neurons in Diffusion Models for Customized Generation
- Confidence and Dispersity Speak: Characterizing Prediction Matrix for Unsupervised Accuracy Estimation
- Conformal Inference is (almost) Free for Neural Networks Trained with Early Stopping
- Conformalization of Sparse Generalized Linear Models
- Conformal Prediction for Federated Uncertainty Quantification Under Label Shift
- Conformal Prediction Sets for Graph Neural Networks
- Conformal Prediction with Missing Values
- Consistency Models
- Consistency of Multiple Kernel Clustering
- Constant Matters: Fine-grained Error Bound on Differentially Private Continual Observation
- Constrained Causal Bayesian Optimization
- Constrained Decision Transformer for Offline Safe Reinforcement Learning
- Constrained Efficient Global Optimization of Expensive Black-box Functions
- Constrained Monotonic Neural Networks
- Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching
- Constrained Phi-Equilibria
- Constraint Reasoning Embedded Structured Prediction
- Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning
- Context Consistency Regularization for Label Sparsity in Time Series
- Contextual Combinatorial Bandits with Probabilistically Triggered Arms
- Contextual Conservative Interleaving Bandits
- Contextual Reliability: When Different Features Matter in Different Contexts
- Continual Learners are Incremental Model Generalizers
- Continual Learning in Linear Classification on Separable Data
- Continual Task Allocation in Meta-Policy Network via Sparse Prompting
- Continual Vision-Language Representation Learning with Off-Diagonal Information
- Continuation Path Learning for Homotopy Optimization
- Continuously Parameterized Mixture Models
- Continuous Spatiotemporal Transformer
- ContraBAR: Contrastive Bayes-Adaptive Deep RL
- Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
- Contrastive Learning Meets Homophily: Two Birds with One Stone
- Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
- Controllability-Aware Unsupervised Skill Discovery
- Controllable Neural Symbolic Regression
- Controlled Differential Equations on Long Sequences via Non-standard Wavelets
- Controlled Text Generation with Natural Language Instructions
- Controlling Posterior Collapse by an Inverse Lipschitz Constraint on the Decoder Network
- Controlling Type Confounding in Ad Hoc Teamwork with Instance-wise Teammate Feedback Rectification
- Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data
- Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity
- Convex Geometry of ReLU-layers, Injectivity on the Ball and Local Reconstruction
- Cooperation in the Latent Space: The Benefits of Adding Mixture Components in Variational Autoencoders
- Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation
- Cooperative Open-ended Learning Framework for Zero-Shot Coordination
- Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets
- Coordinate Descent Methods for Fractional Minimization
- Correcting discount-factor mismatch in on-policy policy gradient methods
- Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
- “Could it have been different?” Counterfactuals in Minds and Machines
- Counterfactual Analysis in Dynamic Latent State Models
- Counterfactual Identifiability of Bijective Causal Models
- Coupled Variational Autoencoder
- Covariate balancing using the integral probability metric for causal inference
- Crafting Training Degradation Distribution for the Accuracy-Generalization Trade-off in Real-World Super-Resolution
- Cramming: Training a Language Model on a single GPU in one day.
- CRISP: Curriculum based Sequential neural decoders for Polar code family
- Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
- Cross-Entropy Loss Functions: Theoretical Analysis and Applications
- Cross-Modal Fine-Tuning: Align then Refine
- CrossSplit: Mitigating Label Noise Memorization through Data Splitting
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations
- Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments
- Curious Replay for Model-based Adaptation
- Curriculum Co-disentangled Representation Learning across Multiple Environments for Social Recommendation
- Cut your Losses with Squentropy
- Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization
- D2Match: Leveraging Deep Learning and Degeneracy for Subgraph Matching
- DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization
- Data-Copying in Generative Models: A Formal Framework
- Data-Derived Weak Universal Consistency
- Data-Driven Subgroup Identification for Linear Regression
- Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least
- Data Efficient Neural Scaling Law via Model Reusing
- Data Feedback Loops: Model-driven Amplification of Dataset Biases
- Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value
- Data Poisoning Attacks Against Multimodal Encoders
- Data Representations' Study of Latent Image Manifolds
- Dataset Distillation with Convexified Implicit Gradients
- Data Structures for Density Estimation
- DDGR: Continual Learning with Deep Diffusion-based Generative Replay
- Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
- Decentralized Stochastic Bilevel Optimization with Improved per-Iteration Complexity
- Decoding Layer Saliency in Language Transformers
- DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design
- Deep Anomaly Detection under Labeling Budget Constraints
- Deep Clustering with Incomplete Noisy Pairwise Annotations: A Geometric Regularization Approach
- Deep Generative Symbolic Regression with Monte-Carlo-Tree-Search
- Deep Graph Representation Learning and Optimization for Influence Maximization
- Deep Laplacian-based Options for Temporally-Extended Exploration
- Deep Latent State Space Models for Time-Series Generation
- Deep linear networks can benignly overfit when shallow ones do
- Deep Perturbation Learning: Enhancing the Network Performance via Image Perturbations
- Deep Regression Unlearning
- Deep Temporal Sets with Evidential Reinforced Attentions for Unique Behavioral Pattern Discovery
- Defects of Convolutional Decoder Networks in Frequency Representation
- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
- Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback
- Delay-agnostic Asynchronous Coordinate Update Algorithm
- Delayed Bandits: When Do Intermediate Observations Help?
- Delayed Feedback in Kernel Bandits
- Delving into Noisy Label Detection with Clean Data
- Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum
- Demystifying Disagreement-on-the-Line in High Dimensions
- Demystifying Uneven Vulnerability of Link Stealing Attacks against Graph Neural Networks
- Denoising MCMC for Accelerating Diffusion-Based Generative Models
- DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
- Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score
- Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions
- Detecting Out-of-distribution Data through In-distribution Class Prior
- Deterministic equivalent and error universality of deep random features learning
- DevFormer: A Symmetric Transformer for Context-Aware Device Placement
- Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation
- DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning
- Difference-in-Differences Meets Tree-based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding
- Difference of submodular minimization via DC programming
- Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators
- Differentiable and Transportable Structure Learning
- Differentiable Multi-Target Causal Bayesian Experimental Design
- Differentiable Simulations for Enhanced Sampling of Rare Events
- Differentiable Tree Operations Promote Compositional Generalization
- Differentially Private Distributed Bayesian Linear Regression with MCMC
- Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards
- Differentially Private Hierarchical Clustering with Provable Approximation Guarantees
- Differentially Private Optimization on Large Model at Small Cost
- Differentially Private Sharpness-Aware Training
- Differentially Private Stochastic Convex Optimization under a Quantile Loss Function
- Differential Privacy has Bounded Impact on Fairness in Classification
- Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language Models
- Diffusion Based Representation Learning
- Diffusion Models are Minimax Optimal Distribution Estimators
- Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines?
- Diffusion Models for Black-Box Optimization
- Dimensionality Reduction for General KDE Mode Finding
- Dimension-independent Certified Neural Network Watermarks via Mollifier Smoothing
- Dink-Net: Neural Clustering on Large Graphs
- Directed Chain Generative Adversarial Networks
- Direct Parameterization of Lipschitz-Bounded Deep Networks
- Dirichlet Diffusion Score Model for Biological Sequence Generation
- DiscoBAX - Discovery of optimal intervention sets in genomic experiment design
- Discover and Cure: Concept-aware Mitigation of Spurious Correlation
- Discovering Agent-Centric Latent States in Theory and in Practice
- Discovering Object-Centric Generalized Value Functions From Pixels
- Discover-Then-Rank Unlabeled Support Vectors in the Dual Space for Multi-Class Active Learning
- Discrete Continuous Optimization Framework for Simultaneous Clustering and Training in Mixture Models
- Discrete Key-Value Bottleneck
- Disentangled Generative Models for Robust Prediction of System Dynamics
- Disentangled Multi-Fidelity Deep Bayesian Active Learning
- Disentangled Multiplex Graph Representation Learning
- Disinformation, Fake News and Computational Propaganda: Challenges and Opportunities for Machine Learning Research
- Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
- Distance Weighted Supervised Learning for Offline Interaction Data
- Distilling Internet-Scale Vision-Language Models into Embodied Agents
- Distortion and Uncertainty Aware Loss for Panoramic Depth Completion
- Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost
- Distributed Linear Bandits under Communication Constraints
- Distributed Stochastic Gradient Descent: Nonconvexity, Nonsmoothness, and Convergence to Local Minima
- Distributional Offline Policy Evaluation with Predictive Error Guarantees
- Distribution-dependent McDiarmid-type Inequalities for Functions of Unbounded Interaction
- Distribution Free Domain Generalization
- Distribution Free Prediction Sets for Node Classification
- Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference
- Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation
- Divide and Conquer Dynamic Programming: An Almost Linear Time Change Point Detection Methodology in High Dimensions
- Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat
- DIVISION: Memory Efficient Training via Dual Activation Precision
- DMLR Workshop: Data-centric Machine Learning Research
- DoCoFL: Downlink Compression for Cross-Device Federated Learning
- Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling
- Does a Neural Network Really Encode Symbolic Concepts?
- Does Continual Learning Equally Forget All Parameters?
- Does Sparsity Help in Learning Misspecified Linear Bandits?
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
- Do Machine Learning Models Learn Statistical Rules Inferred from Data?
- Domain Adaptation for Time Series Under Feature and Label Shifts
- DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm
- Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks
- Do Perceptually Aligned Gradients Imply Robustness?
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark
- Double-Weighting for Covariate Shift Adaptation
- Doubly Adversarial Federated Bandits
- Doubly Optimal No-Regret Learning in Monotone Games
- Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection
- DP-Fast MH: Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian Inference
- DRCFS: Doubly Robust Causal Feature Selection
- DRew: Dynamically Rewired Message Passing with Delay
- Dropout Reduces Underfitting
- Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions
- DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
- DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm
- Dual Focal Loss for Calibration
- DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning
- Duality Principles for Modern Machine Learning
- Dual Propagation: Accelerating Contrastive Hebbian Learning with Dyadic Neurons
- DUET: 2D Structured and Approximately Equivariant Representations
- dugMatting: Decomposed-Uncertainty-Guided Matting
- Dynamical Linear Bandits
- Dynamic Constrained Submodular Optimization with Polylogarithmic Update Time
- Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape
- Dynamics-inspired Neuromorphic Visual Representation Learning
- E$(n)$ Equivariant Message Passing Simplicial Networks
- ED-Batch: Efficient Automatic Batching of Dynamic Neural Networks via Learned Finite State Machines
- EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression
- Effective and Efficient Structural Inference with Reservoir Computing
- Effectively Using Public Data in Privacy Preserving Machine Learning
- Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories
- Effective Neural Topic Modeling with Embedding Clustering Regularization
- Effective Structured Prompting by Meta-Learning and Representative Verbalizer
- Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation
- Efficient and Degree-Guided Graph Generation via Discrete Diffusion Modeling
- Efficient and Equivariant Graph Networks for Predicting Quantum Hamiltonian
- Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction
- Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration
- Efficient displacement convex optimization with particle gradient descent
- Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization
- Efficient Graph Field Integrators Meet Point Clouds
- Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming
- Efficient Learning of Mesh-Based Physical Simulation with Bi-Stride Multi-Scale Graph Neural Network
- Efficient List-Decodable Regression using Batches
- Efficiently predicting high resolution mass spectra with graph neural networks
- Efficient Online Reinforcement Learning with Offline Data
- Efficient Parametric Approximations of Neural Network Function Space Distance
- Efficient Personalized Federated Learning via Sparse Model-Adaptation
- Efficient preconditioned stochastic gradient descent for estimation in latent variable models
- Efficient Quantum Algorithms for Quantum Optimal Control
- Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation
- Efficient RL via Disentangled Environment and Agent Representations
- Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
- Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
- Efficient Training of Language Models using Few-Shot Learning
- Efficient Transformed Gaussian Processes for Non-Stationary Dependent Multi-class Classification
- Eliminating Adversarial Noise via Information Discard and Robust Representation Restoration
- ELSA: Efficient Label Shift Adaptation through the Lens of Semiparametric Models
- Emergence of Adaptive Circadian Rhythms in Deep Reinforcement Learning
- Emergence of Sparse Representations from Noise
- Emergent Agentic Transformer from Chain of Hindsight Experience
- Emergent Asymmetry of Precision and Recall for Measuring Fidelity and Diversity of Generative Models in High Dimensions
- EM-Network: Oracle Guided Self-distillation for Sequence Learning
- Enabling First-Order Gradient-Based Learning for Equilibrium Computation in Markets
- End-to-end Differentiable Clustering with Associative Memories
- End-to-End Full-Atom Antibody Design
- End-to-End Learning for Stochastic Optimization: A Bayesian Perspective
- End-to-End Multi-Object Detection with a Regularized Mixture Model
- End-to-end Training of Deep Boltzmann Machines by Unbiased Contrastive Divergence with Local Mode Initialization
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments
- Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language
- Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning
- Entropy-driven Unsupervised Keypoint Representation Learning in Videos
- Equivariance with Learned Canonicalization Functions
- Equivariant Architectures for Learning in Deep Weight Spaces
- Equivariant Polynomials for Graph Neural Networks
- Escaping saddle points in zeroth-order optimization: the power of two-point estimators
- ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation
- ES-FoMo: Efficient Systems for Foundation Models
- Estimating Causal Effects using a Multi-task Deep Ensemble
- Estimating Heterogeneous Treatment Effects: Mutual Information Bounds and Learning Algorithms
- Estimating Joint Treatment Effects by Combining Multiple Experiments
- Estimating Possible Causal Effects with Latent Variables via Adjustment
- Estimating the Contamination Factor's Distribution in Unsupervised Anomaly Detection
- Estimation Beyond Data Reweighting: Kernel Method of Moments
- Evaluating Self-Supervised Learning via Risk Decomposition
- Evaluating Unsupervised Denoising Requires Unsupervised Metrics
- Eventual Discounting Temporal Logic Counterfactual Experience Replay
- Everyone's Preference Changes Differently: A Weighted Multi-Interest Model For Retrieval
- Evidential Interactive Learning for Medical Image Captioning
- Evolving Semantic Prototype Improves Generative Zero-Shot Learning
- Ewald-based Long-Range Message Passing for Molecular Graphs
- Exact Inference in High-order Structured Prediction
- Existence and Estimation of Critical Batch Size for Training Generative Adversarial Networks with Two Time-Scale Update Rule
- Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks
- Expectation-Complete Graph Representations with Homomorphisms
- Expected Gradients of Maxout Networks and Consequences to Parameter Initialization
- Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making
- Exphormer: Sparse Transformers for Graphs
- Explainability as statistical inference
- Explainable Data-Driven Optimization: From Context to Decision and Back Again
- Explaining Reinforcement Learning with Shapley Values
- Explaining the effects of non-convergent MCMC in the training of Energy-Based Models
- Exploiting locality in high-dimensional Factorial hidden Markov models
- Explore and Exploit the Diverse Knowledge in Model Zoo for Domain Generalization
- Exploring Chemical Space with Score-based Out-of-distribution Generation
- Exploring Model Dynamics for Accumulative Poisoning Discovery
- Exploring the Benefits of Training Expert Language Models over Instruction Tuning
- Exploring the Limits of Model-Targeted Indiscriminate Data Poisoning Attacks
- Exponential Smoothing for Off-Policy Learning
- Extending Conformal Prediction to Hidden Markov Models with Exact Validity via de Finetti's Theorem for Markov Chains
- Extending Kernel PCA through Dualization: Sparsity, Robustness and Fast Algorithms
- Extrapolated Random Tree for Regression
- Extrapolative Controlled Sequence Generation via Iterative Refinement
- Facial Expression Recognition with Adaptive Frame Rate based on Multiple Testing Correction
- FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels
- FAENet: Frame Averaging Equivariant GNN for Materials Modeling
- Fair and Accurate Decision Making through Group-Aware Learning
- Fair and Optimal Classification via Post-Processing
- Fair and Robust Estimation of Heterogeneous Treatment Effects for Policy Learning
- Fair Densities via Boosting the Sufficient Statistics of Exponential Families
- FAIRER: Fairness as Decision Rationale Alignment
- Fair Neighbor Embedding
- Fairness in Matching under Uncertainty
- Fairness in Streaming Submodular Maximization over a Matroid Constraint
- Fair yet Asymptotically Equal Collaborative Learning
- Faith-Shap: The Faithful Shapley Interaction Index
- FARE: Provably Fair Representation Learning with Practical Certificates
- Fascinating Supervisory Signals and Where to Find Them: Deep Anomaly Detection with Scale Learning
- Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization
- Fast Algorithms for Distributed k-Clustering with Outliers
- Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
- Fast Combinatorial Algorithms for Min Max Correlation Clustering
- Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
- Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization
- Faster Rates of Convergence to Stationary Points in Differentially Private Optimization
- Fast Excess Risk Rates via Offset Rademacher Complexity
- Fast Federated Machine Unlearning with Nonlinear Functional Theory
- Fast Inference from Transformers via Speculative Decoding
- Fast Online Node Labeling for Very Large Graphs
- Fast Online Value-Maximizing Prediction Sets with Conformal Cost Control
- Fast Private Kernel Density Estimation via Locality Sensitive Quantization
- Fast Rates for Maximum Entropy Exploration
- Fast Rates in Time-Varying Strongly Monotone Games
- Fast Sampling of Diffusion Models via Operator Learning
- Featured Graph Coarsening with Similarity Guarantees
- Feature Directions Matter: Long-Tailed Learning via Rotated Balanced Representation
- Feature Expansion for Graph Neural Networks
- Feature learning in deep classifiers through Intermediate Neural Collapse
- Feature Programming for Multivariate Time Series Prediction
- FedAvg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks
- FedBR: Improving Federated Learning on Heterogeneous Data via Local Learning Bias Reduction
- Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction
- FedCR: Personalized Federated Learning Based on Across-Client Common Representation with Conditional Mutual Information Regularization
- FedDisco: Federated Learning with Discrepancy-Aware Collaboration
- Federated Adversarial Learning: A Framework with Convergence Analysis
- Federated Conformal Predictors for Distributed Uncertainty Quantification
- Federated Heavy Hitter Recovery under Linear Sketching
- Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities
- Federated Linear Contextual Bandits with User-level Differential Privacy
- Federated Online and Bandit Convex Optimization
- FedHPO-Bench: A Benchmark Suite for Federated Hyperparameter Optimization
- FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models
- FeDXL: Provable Federated Learning for Deep X-Risk Optimization
- Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection
- Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
- Few-Sample Feature Selection via Feature Manifold Learning
- Fighting Fire with Fire: Contrastive Debiasing without Bias-free Data via Generative Bias-transformation
- Finding Generalization Measures by Contrasting Signal and Noise
- Finding the Missing-half: Graph Complementary Learning for Homophily-prone and Heterophily-prone Graphs
- Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
- Fisher Information Embedding for Node and Graph Learning
- Flash: Concept Drift Adaptation in Federated Learning
- FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems
- FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
- Flexible Model Aggregation for Quantile Regression
- Flexible Phase Dynamics for Bio-Plausible Contrastive Learning
- FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization
- Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning
- Forget Unlearning: Towards True Data-Deletion in Machine Learning
- Formalizing Preferences Over Runtime Distributions
- For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal
- Forward-Backward Gaussian Variational Inference via JKO in the Bures-Wasserstein Space
- Fourmer: An Efficient Global Modeling Paradigm for Image Restoration
- FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation
- Fractional Denoising for 3D Molecular Pre-training
- FREDIS: A Fusion Framework of Refinement and Disambiguation for Unreliable Partial Label Learning
- Free-Form Variational Inference for Gaussian Process State-Space Models
- From Adaptive Query Release to Machine Unlearning
- From Hypergraph Energy Functions to Hypergraph Neural Networks
- From Noisy Fixed-Point Iterations to Private ADMM for Centralized and Federated Learning
- From Perception to Programs: Regularize, Overparameterize, and Amortize
- From Relational Pooling to Subgraph GNNs: A Universal Framework for More Expressive Graph Neural Networks
- From Robustness to Privacy and Back
- From Temporal to Contemporaneous Iterative Causal Discovery in the Presence of Latent Confounders
- Fully-Adaptive Composition in Differential Privacy
- Fully Bayesian Autoencoders with Latent Sparse Gaussian Processes
- Fully Dynamic Submodular Maximization over Matroids
- Functional Neural Networks: Shift invariant models for functional data with applications to EEG classification
- Function-Space Regularization in Neural Networks: A Probabilistic Perspective
- Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods
- Fundamental Tradeoffs in Learning with Prior Information
- FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning
- Future-conditioned Unsupervised Pretraining for Decision Transformer
- GAT: Guided Adversarial Training with Pareto-optimal Auxiliary Tasks
- Gaussian processes at the Helm(holtz): A more fluid model for ocean currents
- Gaussian Process Priors for Systems of Linear Partial Differential Equations with Constant Coefficients
- GC-Flow: A Graph-Based Flow Network for Effective Clustering
- GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
- GeCoNeRF: Few-shot Neural Radiance Fields via Geometric Consistency
- General Covariance Data Augmentation for Neural PDE Solvers
- Generalization Analysis for Contrastive Representation Learning
- Generalization Bounds using Data-Dependent Fractal Dimensions
- Generalization on the Unseen, Logic Reasoning and Degree Curriculum
- Generalized Disparate Impact for Configurable Fairness Solutions in ML
- Generalized Implicit Follow-The-Regularized-Leader
- Generalized Polyak Step Size for First Order Optimization with Momentum
- Generalized Reductions: Making any Hierarchical Clustering Fair and Balanced with Low Cost
- Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization
- Generalized Teacher Forcing for Learning Chaotic Dynamics
- Generalizing Neural Wave Functions
- General Sequential Episodic Memory Model
- Generated Graph Detection
- Generating Language Corrections for Teaching Physical Control Tasks
- Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds
- Generating Private Synthetic Data with Genetic Algorithms
- Generative Adversarial Symmetry Discovery
- Generative AI and Law (GenLaw)
- Generative Causal Representation Learning for Out-of-Distribution Motion Forecasting
- Generative Decoding of Visual Stimuli
- Generative Graph Dictionary Learning
- Generative Pretraining for Black-Box Optimization
- Geometric Autoencoders - What You See is What You Decode
- Geometric Clifford Algebra Networks
- Geometric Latent Diffusion Models for 3D Molecule Generation
- GFlowNet-EM for Learning Compositional Latent Variable Models
- GFlowOut: Dropout with Generative Flow Networks
- GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration
- Gibbsian Polar Slice Sampling
- Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models
- Global Context Vision Transformers
- Global Convergence of Sub-gradient Method for Robust Matrix Recovery: Small Initialization, Noisy Measurements, and Over-parameterization
- Global optimality for Euclidean CCCP under Riemannian convexity
- Global optimality of Elman-type RNNs in the mean-field regime
- Global Optimization with Parametric Function Approximation
- Global Selection of Contrastive Batches via Optimization on Sample Permutations
- GLOBE-CE: A Translation Based Approach for Global Counterfactual Explanations
- GNN&GBDT-Guided Fast Optimizing Framework for Large-scale Integer Programming
- GNOT: A General Neural Operator Transformer for Operator Learning
- GOAT: A Global Transformer on Large-scale Graphs
- Go Beyond Imagination: Maximizing Episodic Reachability with World Models
- Gradient-based Wang-Landau Algorithm: A Novel Sampler for Output Distribution of Neural Networks over the Input Space
- Gradient Descent Converges Linearly for Logistic Regression on Separable Data
- Gradient Descent Finds the Global Optima of Two-Layer Physics-Informed Neural Networks
- Gradient Descent in Neural Networks as Sequential Learning in Reproducing Kernel Banach Space
- Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
- Gradient-Free Structured Pruning with Unlabeled Data
- GRAFENNE: Learning on Graphs with Heterogeneous and Dynamic Feature Sets
- GraphCleaner: Detecting Mislabelled Samples in Popular Graph Learning Benchmarks
- Graph Contrastive Backdoor Attacks
- Graph Generative Model for Benchmarking Graph Neural Networks
- Graphically Structured Diffusion Models
- Graph Inductive Biases in Transformers without Message Passing
- Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication
- Graph Mixup with Soft Alignments
- Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure
- Graph Neural Networks with Learnable and Optimal Polynomial Bases
- Graph Neural Tangent Kernel: Convergence on Large Graphs
- Graph Positional Encoding via Random Feature Propagation
- Graph Reinforcement Learning for Network Control via Bi-Level Optimization
- Graph Switching Dynamical Systems
- GREAD: Graph Neural Reaction-Diffusion Networks
- Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement
- Grounding Language Models to Images for Multimodal Inputs and Outputs
- Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
- Group Equivariant Fourier Neural Operators for Partial Differential Equations
- GuardHFL: Privacy Guardian for Heterogeneous Federated Learning
- Guiding Pretraining in Reinforcement Learning with Large Language Models
- Half-Hop: A graph upsampling approach for slowing down message passing
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games
- Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing
- Harmonic Neural Networks
- HarsanyiNet: Computing Accurate Shapley Values in a Single Forward Propagation
- HETAL: Efficient Privacy-preserving Transfer Learning with Homomorphic Encryption
- Hidden Symmetries of ReLU Networks
- Hiding Data Helps: On the Benefits of Masking for Sparse Coding
- Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
- Hierarchical Diffusion for Offline Decision Making
- Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction
- Hierarchical Imitation Learning with Vector Quantized Models
- Hierarchical Neural Coding for Controllable CAD Model Generation
- Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs
- Hierarchies of Reward Machines
- High-dimensional Clustering onto Hamiltonian Cycle
- High-dimensional Location Estimation via Norm Concentration for Subgamma Vectors
- High Fidelity Image Counterfactuals with Probabilistic Causal Models
- High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance
- High Probability Convergence of Stochastic Gradient Methods
- HiLD: High-dimensional Learning Dynamics Workshop
- Hindsight Learning for MDPs with Exogenous Inputs
- H-Likelihood Approach to Deep Neural Networks with Temporal-Spatial Random Effects for High-Cardinality Categorical Features
- Homomorphism AutoEncoder - Learning Group Structured Representations from Observed Transitions
- HOPE: High-order Graph ODE For Modeling Interacting Dynamics
- Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes
- Horizon-free Learning for Markov Decision Processes and Games: Stochastically Bounded Rewards and Improved Bounds
- How Bad is Top-$K$ Recommendation under Competing Content Creators?
- How Does Information Bottleneck Help Deep Learning?
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
- How Jellyfish Characterise Alternating Group Equivariant Neural Networks
- How Many Perturbations Break This Model? Evaluating Robustness Beyond Adversarial Accuracy
- How much does Initialization Affect Generalization?
- How Powerful are Shallow Neural Networks with Bandlimited Random Weights?
- How to address monotonicity for model risk management?
- How to DP-fy ML: A Practical Tutorial to Machine Learning with Differential Privacy
- How to Trust Your Diffusion Model: A Convex Optimization Approach to Conformal Risk Control
- Human-Timescale Adaptation in an Open-Ended Task Space
- Hybrid Energy Based Model in the Feature Space for Out-of-Distribution Detection
- Hyena Hierarchy: Towards Larger Convolutional Language Models
- Hyperbolic Diffusion Embedding and Distance for Hierarchical Representation Learning
- Hyperbolic Image-text Representations
- Hyperbolic Representation Learning: Revisiting and Advancing
- Hyperparameters in Reinforcement Learning and How To Tune Them
- HyperTuning: Toward Adapting Large Language Models without Back-propagation
- Hypervolume Knowledge Gradient: A Lookahead Approach for Multi-Objective Bayesian Optimization with Partial Information
- Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds through Algorithmic Stability
- I$^2$SB: Image-to-Image Schrödinger Bridge
- ICML 2023 Workshop on Computational Biology
- Identifiability and Generalizability in Constrained Inverse Reinforcement Learning
- Identifiability of Label Noise Transition Matrix
- Identification of the Adversary from a Single Adversarial Example
- Identifying Interpretable Subspaces in Image Representations
- Identifying Useful Learnwares for Heterogeneous Label Spaces
- ILLUME: Rationalizing Vision-Language Models through Human Interactions
- Image generation with shortest path diffusion
- Image Restoration with Mean-Reverting Stochastic Differential Equations
- Image Shortcut Squeezing: Countering Perturbative Availability Poisons with Compression
- Implicit Graph Neural Networks: A Monotone Operator Viewpoint
- Implicit Jacobian regularization weighted with impurity of probability output
- Implicit Neural Spatial Representations for Time-dependent PDEs
- Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression
- Importance Weighted Expectation-Maximization for Protein Sequence Design
- Improved Active Multi-Task Representation Learning via Lasso
- Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback
- Improved Algorithms for White-Box Adversarial Streams
- Improved Analysis of Score-based Generative Modeling: User-Friendly Bounds under Minimal Smoothness Assumptions
- Improved Learning-Augmented Algorithms for the Multi-Option Ski Rental Problem via Best-Possible Competitive Analysis
- Improved Online Conformal Prediction via Strongly Adaptive Online Learning
- Improved Online Learning Algorithms for CTR Prediction in Ad Auctions
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation
- Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
- Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs
- Improving Adversarial Robustness by Putting More Regularizations on Less Robust Samples
- Improving Adversarial Robustness of Deep Equilibrium Models with Explicit Regulations Along the Neural Dynamics
- Improving Adversarial Robustness Through the Contrastive-Guided Diffusion Process
- Improving Bi-level Optimization Based Methods with Inspiration from Humans' Classroom Study Techniques
- Improving Expert Predictions with Conformal Prediction
- Improving Fair Training under Correlation Shifts
- Improving Graph Generation by Restricting Graph Bandwidth
- Improving Graph Neural Networks with Learnable Propagation Operators
- Improving Hyperparameter Learning under Approximate Inference in Gaussian Process Models
- Improving $\ell_1$-Certified Robustness via Randomized Smoothing by Leveraging Box Constraints
- Improving Medical Predictions by Irregular Multimodal Electronic Health Records Modeling
- Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models
- Improving the Model Consistency of Decentralized Federated Learning
- Improving Visual Prompt Tuning for Self-supervised Vision Transformers
- IncDSI: Incrementally Updatable Document Retrieval
- Incentivizing Exploration with Linear Contexts and Combinatorial Actions
- Individually Fair Learning with One-Sided Feedback
- Inferring Relational Potentials in Interacting Systems
- Infinite Action Contextual Bandits with Reusable Data Exhaust
- Inflow, Outflow, and Reciprocity in Machine Learning
- InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models
- InfoOT: Information Maximizing Optimal Transport
- Information-Theoretic State Space Model for Multi-View Reinforcement Learning
- Infusing Lattice Symmetry Priors in Attention Mechanisms for Sample-Efficient Abstract Geometric Reasoning
- InGram: Inductive Knowledge Graph Embedding via Relation Graphs
- In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation
- Input Perturbation Reduces Exposure Bias in Diffusion Models
- Input uncertainty propagation through trained neural networks
- In Search for a Generalizable Method for Source Free Domain Adaptation
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
- Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models
- Instrumental Variable Estimation of Average Partial Causal Effects
- Integrating Prior Knowledge in Contrastive Learning with Kernel
- Interactive Learning with Implicit Human Feedback
- Interactive Object Placement with Reinforcement Learning
- Internally Rewarded Reinforcement Learning
- Internet Explorer: Targeted Representation Learning on the Open Web
- Interpolation for Robust Learning: Data Augmentation on Wasserstein Geodesics
- Interpretable Neural-Symbolic Concept Reasoning
- Interval Bound Interpolation for Few-shot Learning with Few Tasks
- Interventional Causal Representation Learning
- Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs
- Invariance in Policy Optimisation and Partial Identifiability in Reward Learning
- Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames
- Inverse Reinforcement Learning without Reinforcement Learning
- Investigating the Role of Model-Based Learning in Exploration and Transfer
- IRNeXt: Rethinking Convolutional Network Design for Image Restoration
- Is Consensus Acceleration Possible in Decentralized Optimization over Slowly Time-Varying Networks?
- Is Learning Summary Statistics Necessary for Likelihood-free Inference?
- Is Overfitting Necessary for Implicit Video Representation?
- Iterative Approximate Cross-Validation
- JAWS-X: Addressing Efficiency Bottlenecks of Conformal Prediction Under Standard and Feedback Covariate Shift
- Jump-Start Reinforcement Learning
- KDEformer: Accelerating Transformers via Kernel Density Estimation
- Kernel Logistic Regression Approximation of an Understandable ReLU Neural Network
- Kernel QuantTree
- Kernel Sufficient Dimension Reduction and Variable Selection for Compositional Data via Amalgamation
- Knowledge and Logical Reasoning in the Era of Data-driven Learning
- Knowledge Hypergraph Embedding Meets Relational Algebra
- K-SHAP: Policy Clustering Algorithm for Anonymous Multi-Agent State-Action Pairs
- Label differential privacy and private training data release
- Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity
- Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning
- Language Instructed Reinforcement Learning for Human-AI Coordination
- Large Language Models Can Be Easily Distracted by Irrelevant Context
- Large Language Models Struggle to Learn Long-Tail Knowledge
- Last Switch Dependent Bandits with Monotone Payoff Functions
- Latent Traversals in Generative Models as Potential Flows
- Layered State Discovery for Incremental Autonomous Exploration
- Lazy Agents: A New Perspective on Solving Sparse Reward Problem in Multi-agent Reinforcement Learning
- LazyGNN: Large-Scale Graph Neural Networks via Lazy Propagation
- LeadFL: Client Self-Defense against Model Poisoning in Federated Learning
- Learnability and Algorithm for Continual Learning
- Learning Affinity with Hyperbolic Representation for Spatial Propagation
- Learning Antidote Data to Individual Unfairness
- Learning-augmented private algorithms for multiple quantile release
- Learning Belief Representations for Partially Observable Deep RL
- Learning Compiler Pass Orders using Coreset and Normalized Value Prediction
- Learning Control by Iterative Inversion
- Learning Controllable Degradation for Real-World Super-Resolution via Constrained Flows
- Learning Control-Oriented Dynamical Structure from Data
- Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic
- Learning Deep Time-index Models for Time Series Forecasting
- Learning Dense Correspondences between Photos and Sketches
- Learning Distributions over Quantum Measurement Outcomes
- Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation
- Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks
- Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees
- Learning Functional Distributions with Private Labels
- Learning GFlowNets From Partial Episodes For Improved Convergence And Stability
- Learning Globally Smooth Functions on Manifolds
- Learning Hidden Markov Models When the Locations of Missing Observations are Unknown
- Learning in POMDPs is Sample-Efficient with Hindsight Observability
- Learning Instance-Specific Augmentations by Capturing Local Invariances
- Learning Intuitive Policies Using Action Features
- Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation
- Learning Mixtures of Gaussians with Censored Data
- Learning Mixtures of Markov Chains and MDPs
- Learning Neural Constitutive Laws from Motion Observations for Generalizable PDE Dynamics
- Learning Neural PDE Solvers with Parameter-Guided Channel Attention
- Learning Noisy OR Bayesian Networks with Max-Product Belief Propagation
- Learning Optimal Group-structured Individualized Treatment Rules with Many Treatments
- Learning Perturbations to Explain Time Series Predictions
- Learning Physical Models that Can Respect Conservation Laws
- Learning Preconditioners for Conjugate Gradient PDE Solvers
- Learning Prescriptive ReLU Networks
- Learning-Rate-Free Learning by D-Adaptation
- Learning Rate Schedules in the Presence of Distribution Shift
- Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation
- Learning Representations without Compositional Assumptions
- Learning Signed Distance Functions from Noisy 3D Point Clouds via Noise to Noise Mapping
- Learning Subpocket Prototypes for Generalizable Structure-based Drug Design
- Learning Temporally Abstract World Models without Online Experimentation
- Learning the Dynamics of Sparsely Observed Interacting Systems
- Learning the Right Layers: a Data-Driven Layer-Aggregation Strategy for Semi-Supervised Learning on Multilayer Graphs
- Learning to acquire novel cognitive tasks with evolution, plasticity and meta-meta-learning
- Learning to Bid in Repeated First-Price Auctions with Budgets
- Learning to Boost Training by Periodic Nowcasting Near Future Weights
- Learning to Decouple Complex Systems
- Learning to Design Analog Circuits to Meet Threshold Specifications
- Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model
- Learning to Initiate and Reason in Event-Driven Cascading Processes
- Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling
- Learning to Learn from APIs: Black-Box Data-Free Meta-Learning
- Learning to Maximize Mutual Information for Dynamic Feature Selection
- Learning to Optimize Differentiable Games
- Learning to Suggest Breaks: Sustainable Optimization of Long-Term User Engagement
- Learning Unforeseen Robustness from Out-of-distribution Data Using Equivariant Domain Translator
- Learning Unnormalized Statistical Models via Compositional Optimization
- Learning useful representations for shifting tasks and distributions
- Learn to Accumulate Evidence from All Training Samples: Theory and Practice
- LegendreTron: Uprising Proper Multiclass Loss Learning
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression
- LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework
- LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning
- Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
- Leveraging Demonstrations to Improve Online Learning: Quality Matters
- Leveraging Label Non-Uniformity for Node Classification in Graph Neural Networks
- Leveraging Offline Data in Online Reinforcement Learning
- Leveraging Proxy of Training Data for Test-Time Adaptation
- LEVER: Learning to Verify Language-to-Code Generation with Execution
- Lifelong Language Pretraining with Distribution-Specialized Experts
- Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data
- Linear Causal Disentanglement via Interventions
- Linear CNNs Discover the Statistical Structure of the Dataset Using Only the Most Dominant Frequencies
- Linearly Constrained Bilevel Optimization: A Smoothed Implicit Gradient Approach
- Linear optimal partial transport embedding
- Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains
- Linkless Link Prediction via Relational Distillation
- LinSATNet: The Positive Linear Satisfiability Neural Networks
- LipsNet: A Smooth and Robust Neural Network with Adaptive Lipschitz Constant for High Accuracy Optimal Control
- Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
- LIV: Language-Image Representations and Rewards for Robotic Control
- Localized Learning: Decentralized Model Updates via Non-Global Objectives
- Locally Regularized Neural Differential Equations: Some Black Boxes were meant to remain closed!
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
- Local Vertex Colouring Graph Neural Networks
- LongCoder: A Long-Range Pre-trained Language Model for Code Completion
- Long Horizon Temperature Scaling
- Long-Tailed Recognition by Mutual Information Maximization between Latent Features and Ground-Truth Labels
- Long-Term Rhythmic Video Soundtracker
- Lookahead When It Matters: Adaptive Non-causal Transformers for Streaming Neural Transducers
- LookupFFN: Making Transformers Compute-lite for CPU inference
- Looped Transformers as Programmable Computers
- LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
- Loss Balancing for Fair Supervised Learning
- Loss-Guided Diffusion Models for Plug-and-Play Controllable Generation
- Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability
- Low Complexity Homeomorphic Projection to Ensure Neural-Network Solution Feasibility for Optimization over (Non-)Convex Set
- Lower Bounds for Learning in Revealing POMDPs
- Lowering the Pre-training Tax for Gradient-based Subset Training: A Lightweight Distributed Pre-Training Toolkit
- Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
- Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single
- LSDS++: Dual Sampling for Accelerated k-means++
- MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of Behavior
- Machine Learning Force Fields with Data Cost Aware Training
- Machine Learning for Multimodal Healthcare Data
- Machine Learning with Social Purpose
- MAGANet: Achieving Combinatorial Generalization by Modeling a Group Action
- Magneto: A Foundation Transformer
- MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
- MALTS: Matching After Learning to Stretch
- MANSA: Learning Fast and Slow in Multi-Agent Systems
- Marginalization is not Marginal: No Bad VAE Local Minima when Learning Optimal Sparse Representations
- Margin-based Neural Network Watermarking
- Margin-based sampling in high dimensions: When being active is less efficient than staying passive
- Markovian Gaussian Process Variational Autoencoders
- Masked Bayesian Neural Networks: Theoretical Guarantee and its Posterior Inference
- Masked Trajectory Models for Prediction, Representation, and Control
- Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
- Matrix Estimation for Individual Fairness
- Maximal Initial Learning Rates in Deep ReLU Networks
- Maximum Optimality Margin: A Unified Approach for Contextual Linear Programming and Inverse Linear Programming
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks
- Measuring the Impact of Programming Language Distribution
- Mechanistic Mode Connectivity
- Memory-Based Dual Gaussian Processes for Sequential Learning
- Memory-Based Meta-Learning on Non-Stationary Distributions
- Men Also Do Laundry: Multi-Attribute Bias Amplification
- MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
- Metagenomic Binning using Connectivity-constrained Variational Autoencoders
- Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks
- Meta-learning Parameterized Skills
- Meta-Learning the Inductive Bias of Simple Neural Circuits
- MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks
- Meta Optimal Transport
- Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optimization
- MetricGAN-OKD: Multi-Metric Optimization of MetricGAN via Online Knowledge Distillation for Speech Enhancement
- MEWL: Few-shot multimodal word learning with referential uncertainty
- MG-GNN: Multigrid Graph Neural Networks for Learning Multilevel Domain Decomposition Methods
- Mimetic Initialization of Self-Attention Layers
- Minimalistic Predictions to Schedule Jobs with Online Precedence Constraints
- Minimal Width for Universal Property of Deep RNN
- Minimax estimation of discontinuous optimal transport maps: The semi-discrete case
- Minimizing Trajectory Curvature of ODE-based Generative Models
- Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation
- Mirror Sinkhorn: Fast Online Optimization on Transport Polytopes
- Mitigating Memorization of Noisy Labels by Clipping the Model Prediction
- Mitigating Propagation Failures in Physics-informed Neural Networks using Retain-Resample-Release (R3) Sampling
- Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning
- Mitigating the Effects of Non-Identifiability on Inference for Bayesian Neural Networks with Latent Variables
- MixFlows: principled variational inference via mixed flows
- Mixing Predictions for Online Metric Algorithms
- Mixture Proportion Estimation Beyond Irreducibility
- Moccasin: Efficient Tensor Rematerialization for Neural Networks
- Modality-Agnostic Variational Compression of Implicit Neural Representations
- Model-agnostic Measure of Generalization Difficulty
- Model-Aware Contrastive Learning: Towards Escaping the Dilemmas
- Model-based Offline Reinforcement Learning with Count-based Conservatism
- Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators
- Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning
- ModelDiff: A Framework for Comparing Learning Algorithms
- Model-Free Robust Average-Reward Reinforcement Learning
- Modeling Dynamic Environments with Scene Graph Memory
- Modeling Temporal Data as Continuous Functions with Stochastic Process Diffusion
- MODeL: Memory Optimizations for Deep Learning
- Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
- Model Transferability with Responsive Decision Subjects
- Moderately Distributional Exploration for Domain Generalization
- MolDiff: Addressing the Atom-Bond Inconsistency Problem in 3D Molecule Diffusion Generation
- Momentum Ensures Convergence of SIGNSGD under Weaker Assumptions
- Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps
- MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows
- MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses
- Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes
- Monotonic Location Attention for Length Generalization
- Motion Question Answering via Modular Motion Programs
- mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
- Mu$^2$SLAM: Multitask, Multilingual Speech and Language Models
- MultiAdam: Parameter-wise Scale-invariant Optimizer for Multiscale Training of Physics-informed Neural Networks
- Multi-Agent Best Arm Identification with Private Communications
- Multi-Agent Learning from Learners
- Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism
- Multi-agent Online Scheduling: MMS Allocations for Indivisible Items
- Multicalibration as Boosting for Regression
- Multi-channel Autobidding with Budget and ROI Constraints
- Multi-class Graph Clustering via Approximated Effective $p$-Resistance
- MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
- Multi-Environment Pretraining Enables Transfer to Action Limited Datasets
- Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning
- Multi-Fidelity Covariance Estimation in the Log-Euclidean Geometry
- Multi-Layer Neural Networks as Trainable Ladders of Hilbert Spaces
- Multi-Modal Classifiers for Open-Vocabulary Object Detection
- Multi-Objective GFlowNets
- Multi-Objective Population Based Training
- Multiple Thinking Achieving Meta-Ability Decoupling for Object Navigation
- Multiplier Bootstrap-based Exploration
- Multiply Robust Off-policy Evaluation and Learning under Truncation by Death
- MultiRobustBench: Benchmarking Robustness Against Multiple Attacks
- Multisample Flow Matching: Straightening Flows with Minibatch Couplings
- Multi-Symmetry Ensembles: Improving Diversity and Generalization via Opposing Symmetries
- Multi-Task Differential Privacy Under Distribution Skew
- Multi-task Hierarchical Adversarial Inverse Reinforcement Learning
- Multi-Task Off-Policy Learning from Bandit Feedback
- Multi-task Representation Learning for Pure Exploration in Linear Bandits
- Multi-Task Structural Learning using Local Task Similarity induced Neuron Creation and Removal
- Multi-User Reinforcement Learning with Low Rank Rewards
- Multi-View Masked World Models for Visual Robotic Manipulation
- Muse: Text-To-Image Generation via Masked Generative Transformers
- MyoDex: A Generalizable Prior for Dexterous Manipulation
- N$\text{A}^\text{2}$Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning
- Naive imputation implicitly regularizes high-dimensional linear models
- Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes
- Nearly Optimal Algorithms with Sublinear Computational Complexity for Online Kernel Regression
- Nearly Optimal Competitive Ratio for Online Allocation Problems with Two-sided Resource Constraints and Finite Requests
- Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs
- Nearly-tight Bounds for Deep Kernel Learning
- Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR
- Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games
- Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime
- Near-optimal Conservative Exploration in Reinforcement Learning under Episode-wise Constraints
- Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals
- Near-Optimal Quantum Coreset Construction Algorithms for Clustering
- NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion
- NeRFool: Uncovering the Vulnerability of Generalizable Neural Radiance Fields against Adversarial Perturbations
- Nested Elimination: A Simple Algorithm for Best-Item Identification From Choice-Based Feedback
- Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization
- Network Effects in Performative Prediction Games
- Neural Algorithmic Reasoning with Causal Regularisation
- Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data
- Neural Compression: From Information Theory to Applications
- Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series
- Neural Conversational AI Workshop - What’s left to TEACH (Trustworthy, Enhanced, Adaptable, Capable and Human-centric) chatbots?
- Neural Diffusion Processes
- Neural FIM for learning Fisher information metrics from point cloud data
- Neural Inverse Operators for Solving PDE Inverse Problems
- Neural Latent Aligner: Cross-trial Alignment for Learning Representations of Complex, Naturalistic Neural Data
- Neural Markov Jump Processes
- Neural Network Accelerated Implicit Filtering: Integrating Neural Network Surrogates With Provably Convergent Derivative Free Optimization Methods
- Neural Network Approximations of PDEs Beyond Linearity: A Representational Perspective
- Neural networks trained with SGD learn distributions of increasing complexity
- Neural Prediction Errors enable Analogical Visual Reasoning in Human Standard Intelligence Tests
- Neural signature kernels as infinite-width-depth-limits of controlled ResNets
- NeuralSlice: Neural 3D Triangle Mesh Reconstruction via Slicing 4D Tetrahedral Meshes
- NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition
- Neural Status Registers
- Neural Stochastic Differential Games for Time-series Analysis
- Neural Wasserstein Gradient Flows for Discrepancies with Riesz Kernels
- Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally Coupled Oscillatory Recurrent Neural Networks
- Neuro-Symbolic Continual Learning: Knowledge, Reasoning Shortcuts and Concept Rehearsal
- Never mind the metrics---what about the uncertainty? Visualising binary confusion matrix metric distributions to put performance in perspective
- New Frontiers in Learning, Control, and Dynamical Systems
- New metrics and search algorithms for weighted causal DAGs
- NNSplitter: An Active Defense Solution for DNN Model via Automated Weight Obfuscation
- Node Embedding from Neural Hamiltonian Orbits in Graph Neural Networks
- Non-asymptotic Properties of Individualized Treatment Rules from Sequentially Rule-Adaptive Trials
- Non-autoregressive Conditional Diffusion Models for Time Series Prediction
- Nonlinear Advantage: Trained Networks Might Not Be As Complex as You Think
- Nonlinear Causal Discovery with Latent Confounders
- Nonparametric Density Estimation under Distribution Drift
- Nonparametric Extensions of Randomized Response for Private Confidence Sets
- Nonparametric Generative Modeling with Conditional Sliced-Wasserstein Flows
- Nonparametric Iterative Machine Teaching
- Non-stationary Reinforcement Learning under General Function Approximation
- No One Idles: Efficient Heterogeneous Federated Learning with Parallel Edge and Server Computation
- Normalizing Flows for Interventional Density Estimation
- Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization
- Not all Strongly Rayleigh Distributions Have Small Probabilistic Generating Circuits
- NP-SemiSeg: When Neural Processes meet Semi-Supervised Semantic Segmentation
- NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning
- Nugget: Neural Agglomerative Embeddings of Text
- NUNO: A General Framework for Learning Parametric PDEs with Non-Uniform Data
- OCD: Learning to Overfit with Conditional Diffusion Models
- ODS: Test-Time Adaptation in the Presence of Open-World Data Shift
- Offline Learning in Markov Games with General Function Approximation
- Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
- Off-Policy Average Reward Actor-Critic with Deterministic Policy Search
- Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling
- Omnipredictors for Constrained Optimization
- OMS-DPM: Optimizing the Model Schedule for Diffusion Probabilistic Models
- On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation
- On Bridging the Gap between Mean Field and Finite Width Deep Random Multilayer Perceptron with Batch Normalization
- On Computing Optimal Tree Ensembles
- On Coresets for Clustering in Small Dimensional Euclidean spaces
- On Data Manifolds Entailed by Structural Causal Models
- On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing
- On Enhancing Expressive Power via Compositions of Single Fixed-Size ReLU Network
- One-Shot Compression of Large Edge-Exchangeable Graphs using Bits-Back Coding
- One-Shot Federated Conformal Prediction
- One-shot Imitation in a Non-Stationary Environment via Multi-Modal Skill
- One-sided Matrix Completion from Two Observations Per Row
- One-Step Estimator for Permuted Sparse Recovery
- One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
- One-vs-the-Rest Loss to Focus on Important Samples in Adversarial Training
- On Excess Mass Behavior in Gaussian Mixture Models with Orlicz-Wasserstein Distances
- On Generalizations of Some Distance Based Classifiers for HDLSS Data
- On Heterogeneous Treatment Effects in Heterogeneous Causal Graphs
- On Investigating the Conservative Property of Score-Based Generative Models
- On Kinetic Optimal Probability Paths for Generative Models
- Online Learning in Stackelberg Games with an Omniscient Follower
- Online Learning with Feedback Graphs: The True Shape of Regret
- Online Local Differential Private Quantile Inference via Self-normalization
- Online Mechanism Design for Information Acquisition
- Online Nonstochastic Control with Adversarial and Static Constraints
- Online Platt Scaling with Calibeating
- Online Prototype Alignment for Few-shot Policy Transfer
- Online Restless Bandits with Unobserved States
- On Many-Actions Policy Gradient
- On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology
- On Penalty-based Bilevel Gradient Descent Method
- On Pitfalls of Test-Time Adaptation
- On Preemption and Learning in Stochastic Scheduling
- On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline
- On Provable Copyright Protection for Generative Models
- On Regularization and Inference with Label Constraints
- On Sampling with Approximate Transport Maps
- On Second-Order Scoring Rules for Epistemic Uncertainty Quantification
- On Strengthening and Defending Graph Reconstruction Attack with Markov Chain Approximation
- On the Complexity of Bayesian Generalization
- On the Connection Between MPNN and Graph Transformer
- On the Convergence of Federated Averaging with Cyclic Client Participation
- On the Convergence of Gradient Flow on Multi-layer Linear Models
- On the Convergence of SARSA with Linear Function Approximation
- On the convergence of the MLE as an estimator of the learning rate in the Exp3 algorithm
- On the Convergence Rate of Gaussianization with Random Rotations
- On the Convergence Rates of Policy Gradient Methods
- On the Correctness of Automatic Differentiation for Neural Networks with Machine-Representable Parameters
- On the Effectiveness of Offline RL for Dialogue Response Generation
- On the Estimation of Gaussian Mixture Copula Models
- On the Expressive Power of Geometric Graph Neural Networks
- On the Forward Invariance of Neural ODEs
- On the Functional Similarity of Robust and Non-Robust Neural Representations
- On the Generalization of Multi-modal Contrastive Learning
- On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization
- On the Global Convergence of Risk-Averse Policy Gradient Methods with Expected Conditional Risk Measures
- On the Identifiability and Estimation of Causal Location-Scale Noise Models
- On the Impact of Algorithmic Recourse on Social Segregation
- On the Impact of Knowledge Distillation for Model Interpretability
- On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning
- On the Initialization of Graph Neural Networks
- On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits
- On the Occupancy Measure of Non-Markovian Policies in Continuous MDPs
- On the Optimality of Misspecified Kernel Ridge Regression
- On the Power of Foundation Models
- On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness
- On the Privacy-Robustness-Utility Trilemma in Distributed Learning
- On the Relationship Between Explanation and Prediction: A Causal View
- On the Robustness of Randomized Ensembles to Adversarial Perturbations
- On the Robustness of Text Vectorizers
- On the Role of Attention in Prompt-tuning
- On the Statistical Benefits of Temporal Difference Learning
- On the Stepwise Nature of Self-Supervised Learning
- On the Training Instability of Shuffling SGD with Batch Normalization
- On the Within-Group Fairness of Screening Classifiers
- On Uni-Modal Feature Learning in Supervised Multi-Modal Learning
- On User-Level Private Convex Optimization
- OpenFE: Automated Feature Generation with Expert-level Performance
- Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization
- Open-Vocabulary Universal Image Segmentation with MaskCLIP
- Opponent-Limited Online Search for Imperfect Information Games
- Optimal Arms Identification with Knapsacks
- Optimal Convergence Rates for Agnostic Nyström Kernel Learning
- Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
- Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs
- Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits
- Optimal LP Rounding and Linear-Time Approximation Algorithms for Clustering Edge-Colored Hypergraphs
- Optimally-weighted Estimators of the Maximum Mean Discrepancy for Likelihood-Free Inference
- Optimal No-Regret Learning for One-Sided Lipschitz Functions
- Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits
- Optimal randomized multilevel Monte Carlo for repeatedly nested expectations
- Optimal Rates and Efficient Algorithms for Online Bayesian Persuasion
- Optimal Sets and Solution Paths of ReLU Networks
- Optimal Shrinkage for Distributed Second-Order Optimization
- Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion
- Optimal Transport in Learning, Control, and Dynamical Systems
- Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization
- Optimistic Planning by Regularized Dynamic Programming
- Optimization for Amortized Inverse Problems
- Optimizing DDPM Sampling with Shortcut Fine-Tuning
- Optimizing Hyperparameters with Conformal Quantile Regression
- Optimizing Mode Connectivity for Class Incremental Learning
- Optimizing NOTEARS Objectives via Topological Swaps
- Optimizing the Collaboration Structure in Cross-Silo Federated Learning
- Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning
- Orthogonality-Enforced Latent Space in Autoencoders: An Approach to Learning Disentangled Representations
- Oscillation-free Quantization for Low-bit Vision Transformers
- Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation
- Out-of-Distribution Generalization of Federated Learning via Implicit Invariant Relationships
- Out-of-Domain Robustness via Targeted Augmentations
- Overcoming Simplicity Bias in Deep Networks using a Feature Sieve
- Over-parametrization via Lifting for Low-rank Matrix Sensing: Conversion of Spurious Solutions to Strict Saddle Points
- PAC-Bayesian Generalization Bounds for Adversarial Generative Models
- PAC-Bayesian Offline Contextual Bandits With Guarantees
- PAC-Bayes Meets Interactive Learning
- PAC Generalization via Invariant Representations
- PAC Prediction Sets for Large Language Models of Code
- Paging with Succinct Predictions
- Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions
- PaLM-E: An Embodied Multimodal Language Model
- PAL: Program-aided Language Models
- Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
- Parallel Neurosymbolic Integration with Concordia
- Parallel Online Clustering of Bandits via Hedonic Game
- Parameter-Level Soft-Masking for Continual Learning
- Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models
- Pareto Regret Analyses in Multi-objective Multi-armed Bandit
- Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing
- Partial Optimality in Cubic Correlation Clustering
- PASTA: Pessimistic Assortment Optimization
- Patch-level Contrastive Learning via Positional Query for Visual Pre-training
- Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
- Path Neural Networks: Expressive and Accurate Graph Neural Networks
- PCA-based Multi-Task Learning: a Random Matrix Approach
- Performative Recommendation: Diversifying Content via Strategic Incentives
- Performative Reinforcement Learning
- Personalized Federated Learning under Mixture of Distributions
- Personalized Federated Learning with Inferred Collaboration Graphs
- Personalized Subgraph Federated Learning
- Perturbation Analysis of Neural Collapse
- PFGM++: Unlocking the Potential of Physics-Inspired Generative Models
- PFNs4BO: In-Context Learning for Bayesian Optimization
- Phase-aware Adversarial Defense for Improving Adversarial Robustness
- Phase Transitions in the Detection of Correlated Databases
- PINA: Leveraging Side Information in eXtreme Multi-label Classification via Predicted Instance Neighborhood Aggregation
- Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
- PixelAsParam: A Gradient View on Diffusion Sampling with Guidance
- PLay: Parametrically Conditioned Layout Generation using Latent Diffusion
- Poisoning Generative Replay in Continual Learning to Promote Forgetting
- Poisoning Language Models During Instruction Tuning
- Polarity Is All You Need to Learn and Transfer Faster
- Policy Contrastive Imitation Learning
- Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach
- Policy Gradient in Robust MDPs with Global Convergence Guarantee
- Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games
- Policy Regularization with Dataset Constraint for Offline Reinforcement Learning
- Polyhedral Complex Extraction from ReLU Networks using Edge Subdivision
- Polynomial Preconditioning for Gradient Methods
- Polynomial Time and Private Learning of Unbounded Gaussian Mixture Models
- Posterior Sampling for Deep Reinforcement Learning
- POUF: Prompt-Oriented Unsupervised Fine-tuning for Large Pre-trained Models
- PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient
- Practical and Matching Gradient Variance Bounds for Black-Box Variational Bayesian Inference
- Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute
- Predictable MDP Abstraction for Unsupervised Model-Based RL
- Predicting Ordinary Differential Equations with Transformers
- Predicting Rare Events by Shrinking Towards Proportional Odds
- Predictive Flows for Faster Ford-Fulkerson
- Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning
- PreNAS: Preferred One-Shot Learning Towards Efficient Neural Architecture Search
- Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems
- Pre-training for Speech Translation: CTC Meets Optimal Transport
- Pretraining Language Models with Human Preferences
- Pricing Experimental Design: Causal Effect, Expected Revenue and Tail Risk
- Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems
- Principled Acceleration of Iterative Numerical Methods Using Machine Learning
- Principled Offline RL in the Presence of Rich Exogenous Information
- Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons
- Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design
- Private Federated Learning with Autotuned Compression
- Private Statistical Estimation of Many Quantiles
- Probabilistic Attention-to-Influence Neural Models for Event Sequences
- Probabilistic Categorical Adversarial Attack and Adversarial Training
- Probabilistic Concept Bottleneck Models
- Probabilistic Contrastive Learning Recovers the Correct Aleatoric Uncertainty of Ambiguous Inputs
- Probabilistic Imputation for Time-series Classification with Missing Data
- Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models
- Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits
- Progressive Purification for Instance-Dependent Partial Label Learning
- Project and Forget: Solving Large-Scale Metric Constrained Problems
- Projected Tensor Power Method for Hypergraph Community Recovery
- Prometheus: Taming Sample and Communication Complexities in Constrained Decentralized Stochastic Bilevel Learning
- PromptBoosting: Black-Box Text Classification with Ten Forward Passes
- Prompting Large Language Model for Machine Translation: A Case Study
- Propensity Matters: Measuring and Enhancing Balancing for Recommendation
- Proper Losses for Discrete Generative Models
- Proper Scoring Rules for Survival Analysis
- Properties of the Mallows Model Depending on the Number of Alternatives: A Warning for an Experimentalist
- Protecting Language Generation Models via Invisible Watermarking
- Prototype-oriented unsupervised anomaly detection for multivariate time series
- Prototype-Sample Relation Distillation: Towards Replay-Free Continual Learning
- ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts
- Provable Benefit of Mixup for Finding Optimal Decision Boundaries
- Provable Data Subset Selection For Efficient Neural Networks Training
- Provable Dynamic Fusion for Low-Quality Multimodal Data
- Provable Multi-instance Deep AUC Maximization with Stochastic Pooling
- Provable Reset-free Reinforcement Learning by No-Regret Reduction
- Provably and Practically Efficient Neural Contextual Bandits
- Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation
- Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
- Provably Invariant Learning without Domain Information
- Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup
- Provably Learning Object-Centric Representations
- Proximal Causal Learning of Conditional Average Treatment Effects
- Proxy objectives in reinforcement learning from human feedback
- Pruning via Sparsity-indexed ODE: a Continuous Sparsity Viewpoint
- PWSHAP: A Path-Wise Explanation Model for Targeted Variables
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- QASA: Advanced Question Answering on Scientific Articles
- QAS-Bench: Rethinking Quantum Architecture Search and A Benchmark
- Q-Flow: Generative Modeling for Differential Equations of Open Quantum Dynamics with Normalizing Flows
- Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL
- Quantifying Human Priors over Social and Navigation Networks
- Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs
- Quantifying the Variability Collapse of Neural Networks
- Quantile Credit Assignment
- Quantitative Universal Approximation Bounds for Deep Belief Networks
- Quantized Distributed Training of Large Models with Convergence Guarantees
- Quantum 3D Graph Learning with Applications to Molecule Embedding
- QuantumDARTS: Differentiable Quantum Architecture Search for Variational Quantum Algorithms
- Quantum Lower Bounds for Finding Stationary Points of Nonconvex Functions
- Quantum Policy Gradient Algorithm with Optimized Action Decoding
- Quantum Ridgelet Transform: Winning Lottery Ticket of Neural Networks with Quantum Computation
- Quantum Speedups for Zero-Sum Games via Improved Dynamic Gibbs Sampling
- RACE: Improve Multi-Agent Reinforcement Learning with Representation Asymmetry and Collaborative Evolution
- Raising the Cost of Malicious AI-Powered Image Editing
- Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice
- Random Grid Neural Processes for Parametric Partial Differential Equations
- Randomized Gaussian Process Upper Confidence Bound with Tighter Bayesian Regret Bounds
- Randomized Schur Complement Views for Graph Contrastive Learning
- Random Matrix Analysis to Balance between Supervised and Unsupervised Learning under the Low Density Separation Assumption
- Random Shuffle Transformer for Image Restoration
- Random Teachers are Good Teachers
- RankMe: Assessing the Downstream Performance of Pretrained Self-Supervised Representations by Their Rank
- Reachability-Aware Laplacian Representation in Reinforcement Learning
- Reasons for the Superiority of Stochastic Estimators over Deterministic Ones: Robustness, Consistency and Perceptual Quality
- Recasting Self-Attention with Holographic Reduced Representations
- Recent Advances in the Generalization Theory of Neural Networks *
- Reconstructive Neuron Pruning for Backdoor Defense
- Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing
- Recovery Bounds on Class-Based Optimal Transport: A Sum-of-Norms Regularization Framework
- ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval
- Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC
- Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs
- Refined Regret for Adversarial MDPs with Linear Function Approximation
- Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models
- Reflected Diffusion Models
- Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts
- Regression with Label Permutation in Generalized Linear Model
- Regression with Sensor Data Containing Incomplete Observations
- Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents
- Regret Minimization and Convergence to Equilibria in General-sum Markov Games
- Regret-Minimizing Double Oracle for Extensive-Form Games
- Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
- Regularization-free Diffeomorphic Temporal Alignment Nets
- Regularizing Towards Soft Equivariance Under Mixed Symmetries
- Reinforcement Learning Can Be More Efficient with Multiple Rewards
- Reinforcement Learning from Human Feedback: A Tutorial *
- Reinforcement Learning from Passive Data via Latent Intentions
- Reinforcement Learning in Low-rank MDPs with Density Features
- Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space
- Reinforcement Learning with History Dependent Dynamic Contexts
- Relevant Walk Search for Explaining Graph Neural Networks
- Reliable Measures of Spread in High Dimensional Latent Spaces
- ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs
- Reparameterized Policy Learning for Multimodal Trajectory Optimization
- Repository-Level Prompt Generation for Large Language Models of Code
- Representation-Driven Reinforcement Learning
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL
- Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
- Representer Point Selection for Explaining Regularized High-dimensional Models
- Reprogramming Pretrained Language Models for Antibody Sequence Infilling
- Responsible AI for Generative AI in Practice: Lessons Learned and Open Challenges
- Restoration based Generative Models
- Restoration-Degradation Beyond Linear Diffusions: A Non-Asymptotic Analysis For DDIM-type Samplers
- Resurrecting Recurrent Neural Networks for Long Sequences
- Rethink DARTS Search Space and Renovate a New Benchmark
- Rethinking Backdoor Attacks
- Rethinking Explaining Graph Neural Networks via Non-parametric Subgraph Matching
- Rethinking Visual Reconstruction: Experience-Based Content Completion Guided by Visual Cues
- Rethinking Warm-Starts with Predictions: Learning Predictions Close to Sets of Optimal Solutions for Faster $\text{L}$-/$\text{L}^\natural$-Convex Function Minimization
- Rethinking Weak Supervision in Helping Contrastive Learning
- Retrieval-Augmented Multimodal Language Modeling
- Retrosynthetic Planning with Dual Value Networks
- Returning The Favour: When Regression Benefits From Probabilistic Causal Knowledge
- Revisiting Bellman Errors for Offline Model Selection
- Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
- Revisiting Discriminative vs. Generative Classifiers: Theory and Implications
- Revisiting Domain Randomization via Relaxed State-Adversarial Policy Optimization
- Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees
- Revisiting Over-smoothing and Over-squashing Using Ollivier-Ricci Curvature
- Revisiting Pseudo-Label for Single-Positive Multi-Label Learning
- Revisiting Sampling for Combinatorial Optimization
- Revisiting Simple Regret: Fast Rates for Returning a Good Arm
- Revisiting Structured Variational Autoencoders
- Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation
- Revisiting Weighted Aggregation in Federated Learning with Neural Networks
- Reward-Mixing MDPs with Few Latent Contexts are Learnable
- RGE: A Repulsive Graph Rectification for Node Classification via Influence
- Rigid Body Flows for Sampling Molecular Crystal Structures
- RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents
- RLEG: Vision-Language Representation Learning with Diffusion-based Embedding Generation
- RLSbench: Domain Adaptation Under Relaxed Label Shift
- Robust and private stochastic linear bandits
- Robust and Scalable Bayesian Online Changepoint Detection
- Robust Budget Pacing with a Single Sample
- Robust Camera Pose Refinement for Multi-Resolution Hash Encoding
- Robust Collaborative Learning with Linear Gradient Overhead
- Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues
- Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees
- Robust Explanation for Free or At the Cost of Faithfulness
- Robustly Learning a Single Neuron via Sharpness
- Robustness in Multimodal Learning under Train-Test Modality Mismatch
- Robust Non-Linear Feedback Coding via Power-Constrained Deep Learning
- Robust One-Class Classification with Signed Distance Function using 1-Lipschitz Neural Networks
- Robust Perception through Equivariance
- Robust Satisficing MDPs
- Robust Situational Reinforcement Learning in Face of Context Disturbances
- Robust Speech Recognition via Large-Scale Weak Supervision
- Robust Subtask Learning for Compositional Generalization
- Robust Weak Supervision with Variational Auto-Encoders
- Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?
- Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch
- Rotation and Translation Invariant Representation Learning with Implicit Neural Representations
- RSC: Accelerate Graph Neural Networks Training via Randomized Sparse Computations
- Run-off Election: Improved Provable Defense against Data Poisoning Attacks
- R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents
- SAAL: Sharpness-Aware Active Learning
- Safe Offline Reinforcement Learning with Real-Time Budget Constraints
- Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
- SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
- Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation
- Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes
- Sample Complexity of Probability Divergences under Group Symmetry
- Sampling and Optimization in Discrete Space
- Sampling-Based Accuracy Testing of Posterior Estimators for General Inference
- Sampling-based Nyström Approximation and Kernel Quadrature
- Sampling random graph homomorphisms and applications to network data analysis
- Scalable Adaptive Computation for Iterative Generation
- Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation
- Scalable Safe Policy Improvement via Monte Carlo Tree Search
- Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation
- Scaling Laws for Generative Mixed-Modal Language Models
- Scaling Laws for Multilingual Neural Machine Translation
- Scaling Laws for Reward Model Overoptimization
- Scaling of Class-wise Training Losses for Post-hoc Calibration
- Scaling Spherical CNNs
- Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
- Scaling Vision Transformers to 22 Billion Parameters
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data
- SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation
- SE(3) diffusion model with application to protein backbone generation
- Searching Large Neighborhoods for Integer Linear Programs with Contrastive Learning
- Second-Order Optimization with Lazy Hessians
- Second-order regression models exhibit progressive sharpening to the edge of stability
- Secure Federated Correlation Test and Entropy Estimation
- SeedGNN: Graph Neural Network for Supervised Seeded Graph Matching
- SEGA: Structural Entropy Guided Anchor View for Graph Contrastive Learning
- SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation
- Selective Machine Learning of the Average Treatment Effect with an Invalid Instrumental Variable
- Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction
- Self-Interpretable Time Series Prediction with Counterfactual Explanations
- Self-Repellent Random Walks on General Graphs - Achieving Minimal Sampling Variance via Nonlinear Markov Chains
- Self-Supervised Learning in Vision: from Research Advances to Best Practices
- Self-supervised learning of Split Invariant Equivariant representations
- Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations
- SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models
- Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows
- Semi Bandit dynamics in Congestion Games: Convergence to Nash Equilibrium and No-Regret Guarantees.
- Semi-Dual Unbalanced Quadratic Optimal Transport: fast statistical rates and convergent algorithm.
- Semi-Offline Reinforcement Learning for Optimized Text Generation
- Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes
- Semi-Parametric Contextual Pricing Algorithm using Cox Proportional Hazards Model
- Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
- SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification
- Sequence Modeling with Multiresolution Convolutional Memory
- Sequential Changepoint Detection via Backward Confidence Sequences
- Sequential Counterfactual Risk Minimization
- Sequential Kernelized Independence Testing
- Sequential Monte Carlo Learning for Time Series Structure Discovery
- Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series
- Sequential Predictive Conformal Inference for Time Series
- Sequential Strategic Screening
- Sequential Underspecified Instrument Selection for Cause-Effect Estimation
- Set-membership Belief State-based Reinforcement Learning for POMDPs
- Settling the Reward Hypothesis
- SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
- SGD with Large Step Sizes Learns Sparse Features
- Shape-Guided Dual-Memory Learning for 3D Anomaly Detection
- Shapley Based Residual Decomposition for Instance Analysis
- Sharper Bounds for $\ell_p$ Sensitivity Sampling
- Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
- Shedding a PAC-Bayesian Light on Adaptive Sliced-Wasserstein Distances
- Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation
- Shortest Edit Path Crossover: A Theory-driven Solution to the Permutation Problem in Evolutionary Neural Architecture Search
- Short-lived High-volume Bandits
- Simple and Fast Group Robustness by Automatic Feature Reweighting
- simple diffusion: End-to-end diffusion for high resolution images
- Simple Disentanglement of Style and Content in Visual Representations
- Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning
- Simple Hardware-Efficient Long Convolutions for Sequence Modeling
- Simplex Random Features
- Simplified Temporal Consistency Reinforcement Learning
- Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning
- SinDDM: A Single Image Denoising Diffusion Model
- SinFusion: Training Diffusion Models on a Single Image or Video
- Single Point-Based Distributed Zeroth-Order Optimization with a Non-Convex Stochastic Objective Function
- Ske2Grid: Skeleton-to-Grid Representation Learning for Action Recognition
- Sketched Ridgeless Linear Regression: The Role of Downsampling
- Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting
- Sketching for First Order Method: Efficient Algorithm for Low-Bandwidth Channel and Vulnerability
- Sketching Meets Differential Privacy: Fast Algorithm for Dynamic Kronecker Projection Maintenance
- SLAMB: Accelerated Large Batch Training with Sparse Communication
- Sliced-Wasserstein on Symmetric Positive Definite Matrices for M/EEG Signals
- SlotGAT: Slot-based Message Passing for Heterogeneous Graphs
- Slot-VAE: Object-Centric Scene Generation with Slot Attention
- Smart Initial Basis Selection for Linear Programs
- Smooth Non-stationary Bandits
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process
- SNeRL: Semantic-aware Neural Radiance Fields for Reinforcement Learning
- Social learning spontaneously emerges by searching optimal heuristics with deep reinforcement learning
- Solving High-Dimensional PDEs with Latent Spectral Models
- Solving Linear Programs with Fast Online Learning Algorithms
- SOM-CPC: Unsupervised Contrastive Learning with Self-Organizing Maps for Structured Representations of High-Rate Time Series
- SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot
- Sparse Learning of Dynamical Systems in RKHS: An Operator-Theoretic Approach
- SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks at the Edge
- Spatial Implicit Neural Representations for Global-Scale Species Mapping
- Spatial-Temporal Graph Learning with Adversarial Contrastive Adaptation
- Specializing Smaller Language Models towards Multi-Step Reasoning
- Special Properties of Gradient Descent with Large Learning Rates
- SpeedDETR: Speed-aware Transformers for End-to-end Object Detection
- Speeding Up Bellman Ford via Minimum Violation Permutations
- Speed-Oblivious Online Scheduling: Knowing (Precise) Speeds is not Necessary
- SpENCNN: Orchestrating Encoding and Sparsity for Fast Homomorphically Encrypted Neural Network Inference
- Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere
- Spherical Inducing Features for Orthogonally-Decoupled Gaussian Processes
- SpotEM: Efficient Video Search for Episodic Memory
- spred: Solving L1 Penalty with SGD
- Spurious Valleys and Clustering Behavior of Neural Networks
- SRATTA: Sample Re-ATTribution Attack of Secure Aggregation in Federated Learning.
- Stabilizing GANs' Training with Brownian Motion Controller
- Stabilizing Transformer Training by Preventing Attention Entropy Collapse
- Stable and Consistent Prediction of 3D Characteristic Orientation via Invariant Residual Learning
- Stable Estimation of Heterogeneous Treatment Effects
- State and parameter learning with PARIS particle Gibbs
- Statistical Foundations of Prior-Data Fitted Networks
- Statistical Indistinguishability of Learning Algorithms
- Statistical Inference and A/B Testing for First-Price Pacing Equilibria
- Statistical Inference on Multi-armed Bandits with Delayed Feedback
- Statistical Learning under Heterogeneous Distribution Shift
- STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning
- Stein Variational Goal Generation for adaptive Exploration in Multi-Goal Reinforcement Learning
- STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
- Stochastic Gradient Descent-Induced Drift of Representation in a Two-Layer Neural Network
- Stochastic Gradient Descent under Markovian Sampling Schemes
- Stochastic Gradient Succeeds for Bandits
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels
- Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
- Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks
- Strategic Classification with Unknown User Manipulations
- Stratified Adversarial Robustness with Rejection
- Streaming Active Learning with Deep Neural Networks
- Streaming Submodular Maximization with Differential Privacy
- StriderNet: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes
- Structural Re-weighting Improves Graph Domain Adaptation
- Structured Cooperative Learning with Graphical Model Priors
- Structured Probabilistic Inference and Generative Modeling
- Structure-informed Language Models Are Protein Designers
- Structure Learning of Latent Factors via Clique Search on Correlation Thresholded Graphs
- StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
- Subequivariant Graph Reinforcement Learning in 3D Environments
- Submodular Order Functions and Assortment Optimization
- Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation
- Subset-Based Instance Optimality in Private Estimation
- Subset Selection Based On Multiple Rankings in the Presence of Bias: Effectiveness of Fairness Constraints for Multiwinner Voting Score Functions
- Superhuman Fairness
- Supervised Metric Learning to Rank for Retrieval via Contextual Similarity Optimization
- Supported Trust Region Optimization for Offline Reinforcement Learning
- SurCo: Learning Linear SURrogates for COmbinatorial Nonlinear Optimization Problems
- Surface Snapping Optimization Layer for Single Image Object Shape Reconstruction
- SurProGenes: Survival Risk-Ordered Representation of Cancer Patients and Genes for the Identification of Prognostic Genes
- Surrogate Model Extension (SME): A Fast and Accurate Weight Update Attack on Federated Learning
- Surrogate Module Learning: Reduce the Gradient Error Accumulation in Training Spiking Neural Networks
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
- Symmetry-Aware Robot Design with Structured Subgroups
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning
- Synthetic data for model selection
- Synthetic Data, Real Errors: How (Not) to Publish and Use Synthetic Data
- Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models
- System Identification of Neural Systems: If We Got It Right, Would We Know?
- TabDDPM: Modelling Tabular Data with Diffusion Models
- TabLeak: Tabular Data Leakage in Federated Learning
- Taking the Pulse Of Ethical ML in Health
- Taming graph kernels with random features
- TAN Without a Burn: Scaling Laws of DP-SGD
- Target-Aware Generative Augmentations for Single-Shot Adaptation
- Target-based Surrogates for Stochastic Optimization
- Task-specific experimental design for treatment effect estimation
- Task-Specific Skill Localization in Fine-tuned Language Models
- Taxonomy-Structured Domain Adaptation
- Team Belief DAG: Generalizing the Sequence Form to Team Games for Fast Computation of Correlated Team Max-Min Equilibria via Regret Minimization
- Temporal Label Smoothing for Early Event Prediction
- Temporally Consistent Transformers for Video Generation
- Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems
- Tensor Gaussian Process with Contraction for Multi-Channel Imaging Analysis
- Test-time Adaptation with Slot-Centric Models
- Test-Time Style Shifting: Handling Arbitrary Styles in Domain Generalization
- Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
- Text-To-4D Dynamic Scene Generation
- Text-To-Concept (and Back) via Cross-Model Alignment
- TGRL: An Algorithm for Teacher Guided Reinforcement Learning
- The Acquisition of Physical Knowledge in Generative Neural Networks
- The Benefits of Mixup for Feature Learning
- The Benefits of Model-Based Generalization in Reinforcement Learning
- The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond
- The case for 4-bit precision: k-bit Inference Scaling Laws
- The Catalog Problem: Clustering and Ordering Variable-Sized Sets
- The Computational Complexity of Concise Hypersphere Classification
- The Dormant Neuron Phenomenon in Deep Reinforcement Learning
- The Edge of Orthogonality: A Simple View of What Makes BYOL Tick
- The Fast Johnson-Lindenstrauss Transform Is Even Faster
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- The Future of ML in Biology: CRISPR for Health and Climate
- The Hessian perspective into the Nature of Convolutional Neural Networks
- The Ideal Continual Learner: An Agent That Never Forgets
- The Impact of Exploration on Convergence and Performance of Multi-Agent Q-Learning Dynamics
- The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent
- The Many Facets of Preference-Based Learning
- The Monge Gap: A Regularizer to Learn All Transport Maps
- The multimarginal optimal transport formulation of adversarial multiclass classification
- The Numerical Stability of Hyperbolic Representation Learning
- The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation
- Theoretical Behavior of XAI Methods in the Presence of Suppressor Variables
- Theoretical Bounds on the Network Community Profile from Low-rank Semi-definite Programming
- Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting
- Theory on Forgetting and Generalization of Continual Learning
- The Persistent Laplacian for Data Science: Evaluating Higher-Order Persistent Spectral Representations of Data
- The Power of Learned Locally Linear Models for Nonlinear Policy Optimization
- The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing
- The Power of Uniform Sampling for k-Median
- The Price of Differential Privacy under Continual Observation
- The Regret of Exploration and the Control of Bad Episodes in Reinforcement Learning
- The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning
- The Saddle-Point Method in Differential Privacy
- The Second Workshop on Spurious Correlations, Invariance and Stability
- The SSL Interplay: Augmentations, Inductive Bias, and Generalization
- The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
- The Statistical Scope of Multicalibration
- The Synergy of Scientific and Machine Learning Modelling (SynS & ML) Workshop
- The Test of Tests: A Framework for Differentially Private Hypothesis Testing
- The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning
- The Unreasonable Effectiveness of Few-shot Learning for Machine Translation
- The Value of Out-of-Distribution Data
- The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
- The Wisdom of Hindsight Makes Language Models Better Instruction Followers
- Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
- Thompson Sampling with Diffusion Generative Prior
- Thompson Sampling with Less Exploration is Fast and Optimal
- TIDE: Time Derivative Diffusion for Deep Learning on Graphs
- Tied-Augment: Controlling Representation Similarity Improves Data Augmentation
- Tight and fast generalization error bound of graph embedding in metric space
- Tight Certification of Adversarially Trained Neural Networks via Nonconvex Low-Rank Semidefinite Relaxations
- Tight Data Access Bounds for Private Top-$k$ Selection
- Tighter Analysis for ProxSkip
- Tighter Bounds on the Expressivity of Transformer Encoders
- Tighter Information-Theoretic Generalization Bounds from Supersamples
- Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond
- Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
- Tilted Sparse Additive Models
- TIPS: Topologically Important Path Sampling for Anytime Neural Networks
- Topologically Faithful Image Segmentation via Induced Matching of Persistence Barcodes
- Topological Point Cloud Clustering
- Topological Singularity Detection at Multiple Scales
- Total Variation Graph Neural Networks
- Toward Efficient Gradient-Based Value Estimation
- Toward Large Kernel Models
- Towards a better understanding of representation dynamics under TD-learning
- Towards a Persistence Diagram that is Robust to Noise and Varied Densities
- Towards Better Graph Representation Learning with Parameterized Decomposition & Filtering
- Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten
- Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models
- Towards Constituting Mathematical Structures for Learning to Optimize
- Towards Controlled Data Augmentations for Active Learning
- Towards credible visual model interpretation with path attribution
- Towards Deep Attention in Graph Neural Networks: Problems and Remedies
- Towards Explaining Distribution Shifts
- Towards Learning Geometric Eigen-Lengths Crucial for Fitting Tasks
- Towards Learning to Imitate from a Single Video Demonstration
- Towards Omni-generalizable Neural Methods for Vehicle Routing Problems
- Towards Practical Preferential Bayesian Optimization with Skew Gaussian Processes
- Towards Quantum Machine Learning for Constrained Combinatorial Optimization: a Quantum QAP Solver
- Towards Reliable Neural Specifications
- Towards Robust and Safe Reinforcement Learning with Benign Off-policy Data
- Towards Robust Graph Incremental Learning on Evolving Graphs
- Towards Stable and Efficient Adversarial Training against $l_1$ Bounded Adversarial Attacks
- Towards Sustainable Learning: Coresets for Data-efficient Deep Learning
- Towards Theoretical Understanding of Inverse Reinforcement Learning
- Towards Trustworthy Explanation: On Causal Rationalization
- Towards Unbiased Training in Federated Open-world Semi-supervised Learning
- Towards Understanding and Improving GFlowNet Training
- Towards Understanding and Reducing Graph Structural Noise for GNNs
- Towards Understanding Ensemble Distillation in Federated Learning
- Towards Understanding Generalization of Graph Neural Networks
- Towards Understanding Generalization of Macro-AUC in Multi-label Learning
- TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
- Tractable Control for Autoregressive Language Generation
- Trading-Off Payments and Accuracy in Online Classification with Paid Stochastic Experts
- Trainability, Expressivity and Interpretability in Gated Neural ODEs
- Training Deep Surrogate Models with Large Scale Online Learning
- Training-Free Neural Active Learning with Initialization-Robustness Guarantees
- Training Normalizing Flows from Dependent Data
- Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
- TRAK: Attributing Model Behavior at Scale
- Transcendental Idealism of Planner: Evaluating Perception from Planning Perspective for Autonomous Driving
- Transformed Distribution Matching for Missing Value Imputation
- Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization
- Transformers as Algorithms: Generalization and Stability in In-context Learning
- Transformers Learn In-Context by Gradient Descent
- Transformers Meet Directed Graphs
- Trapdoor Normalization with Irreversible Ownership Verification
- Traversing Between Modes in Function Space for Fast Ensembling
- Trompt: Towards a Better Deep Neural Network for Tabular Data
- Truncating Trajectories in Monte Carlo Reinforcement Learning
- Trustworthy Policy Learning under the Counterfactual No-Harm Criterion
- Tuning Computer Vision Models With Task Rewards
- Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning
- Tutorial on Multimodal Machine Learning: Principles, Challenges, and Open Questions
- Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy
- Two-Scale Gradient Descent Ascent Dynamics Finds Mixed Nash Equilibria of Continuous Games: A Mean-Field Perspective
- UMD: Unsupervised Model Detection for X2X Backdoor Attacks
- Uncertain Evidence in Probabilistic Models and Stochastic Simulators
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning
- Uncertainty Estimation for Molecules: Desiderata and Methods
- Unconstrained Online Learning with Unbounded Losses
- Uncovering Adversarial Risks of Test-Time Adaptation
- Under-Counted Tensor Completion with Neural Incorporation of Attributes
- Underspecification Presents Challenges for Credibility in Modern Machine Learning
- Understand and Modularize Generator Optimization in ELECTRA-style Pretraining
- Understanding and Defending Patched-based Adversarial Attacks for Vision Transformer
- Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective
- Understanding Backdoor Attacks through the Adaptability Hypothesis
- Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias
- Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
- Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases
- Understanding Oversquashing in GNNs through the Lens of Effective Resistance
- Understanding Plasticity in Neural Networks
- Understanding Self-Distillation in the Presence of Label Noise
- Understanding Self-Predictive Learning for Reinforcement Learning
- Understanding the Complexity Gains of Single-Task RL with a Curriculum
- Understanding the Distillation Process from Deep Generative Models to Tractable Probabilistic Circuits
- Understanding the Impact of Adversarial Robustness on Accuracy Disparity
- Understanding the Role of Feedback in Online Learning with Switching Costs
- Unearthing InSights into Mars: Unsupervised Source Separation with Limited Data
- Unifying Molecular and Textual Representations via Multi-task Language Modelling
- Unifying Nesterov's Accelerated Gradient Methods for Convex and Strongly Convex Objective Functions
- Unit Scaling: Out-of-the-Box Low-Precision Training
- Universal Morphology Control via Contextual Modulation
- Universal Physics-Informed Neural Networks: Symbolic Differential Operator Discovery with Sparse Data
- Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability
- Unlocking Slot Attention by Changing Optimal Transport Costs
- Unscented Autoencoder
- Unsupervised Out-of-Distribution Detection with Diffusion Inpainting
- Unsupervised Skill Discovery for Learning Shared Structures across Changing Environments
- Unveiling the Latent Space Geometry of Push-Forward Generative Models
- Unveiling The Mask of Position-Information Pattern Through the Mist of Image Features
- UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers
- UPSCALE: Unconstrained Channel Pruning
- User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems
- User-level Private Stochastic Convex Optimization with Optimal Rates
- Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
- Using Perturbation to Improve Goodness-of-Fit Tests based on Kernelized Stein Discrepancy
- VA-learning as a more efficient alternative to Q-learning
- Variance Control for Distributional Reinforcement Learning
- Variational Autoencoding Neural Operators
- Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills
- Variational Mixture of HyperGenerators for Learning Distributions over Functions
- Variational Open-Domain Question Answering
- Variational Sparse Inverse Cholesky Approximation for Latent Gaussian Processes via Double Kullback-Leibler Minimization
- VectorMapNet: End-to-end Vectorized HD Map Learning
- Vector Quantized Wasserstein Auto-Encoder
- Vector-Valued Control Variates
- Vertical Federated Graph Neural Network for Recommender System
- VIMA: Robot Manipulation with Multimodal Prompts
- Von Mises Mixture Distributions for Molecular Conformation Generation
- Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
- Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks
- Weakly Supervised Disentangled Generative Causal Representation Learning
- Weakly Supervised Regression with Interval Targets
- Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive Attributes
- Weighted Flow Diffusion for Local Graph Clustering with Node Attributes: an Algorithm and Statistical Guarantees
- Weighted Sampling without Replacement for Deep Top-$k$ Classification
- Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality
- What Can Be Learnt With Wide Convolutional Neural Networks?
- What can online reinforcement learning with function approximation benefit from general coverage conditions?
- What do CNNs Learn in the First Layer and Why? A Linear Systems Perspective
- What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?
- What Makes Entities Similar? A Similarity Flooding Perspective for Multi-sourced Knowledge Graph Embeddings
- When and How Does Known Class Help Discover Unknown Ones? Provable Understanding Through Spectral Analysis
- When does Privileged information Explain Away Label Noise?
- When do Minimax-fair Learning and Empirical Risk Minimization Coincide?
- When is Realizability Sufficient for Off-Policy Reinforcement Learning?
- When Personalization Harms Performance: Reconsidering the Use of Group Attributes in Prediction
- When Sparsity Meets Contrastive Models: Less Graph Data Can Bring Better Class-Balanced Representations
- Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression
- Which Invariance Should We Transfer? A Causal Minimax Learning Approach
- Which is Better for Learning with Noisy Labels: The Semi-supervised Method or Modeling Label Noise?
- Which Tricks are Important for Learning to Rank?
- Who Needs to Know? Minimal Knowledge for Optimal Coordination
- Whose Opinions Do Language Models Reflect?
- "Why did the Model Fail?": Attributing Model Performance Changes to Distribution Shifts
- Why does Throwing Away Data Improve Worst-Group Error?
- Why do Nearest Neighbor Language Models Work?
- Why Is Public Pretraining Necessary for Private Model Training?
- Why Random Pruning Is All We Need to Start Sparse
- Why Target Networks Stabilise Temporal Difference Methods
- Width and Depth Limits Commute in Residual Networks
- WL meet VC
- Workshop on Theory of Mind in Communicating Agents
- Wrapped Cauchy Distributed Angular Softmax for Long-Tailed Visual Recognition
- XAI Beyond Classification: Interpretable Neural Clustering
- X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion
- XTab: Cross-table Pretraining for Tabular Transformers