ICML 2022 Schedule

SUN 17 JUL

7 a.m.

Registration Check-in Desk

(ends 4:00 PM)

9 a.m.

Expo Talk Panel:

Challenges Of Applying Graph Neural Networks

(ends 1:45 PM)

Expo Talk Panel:

Enabling Hand Gesture Customization on Wrist-Worn Devices

(ends 12:00 PM)

Expo Workshop:

Real World RL with Vowpal Wabbit and Azure Personalizer

(ends 11:50 AM)

Expo Demonstration:

AEPsych: active learning for human perception and preferences

(ends 2:00 PM)

Expo Demonstration:

TorchRL: the PyTorch RL Domain library

(ends 2:00 PM)

9:30 a.m.

Expo Talk Panel:

Knowledge graph-based recommendation framework identifies drivers of resistance in EGFR mutant non-small cell lung cancer

(ends 10:15 AM)

10:20 a.m.

Expo Demonstration:

Creating a scalable, reproducible, and reliable environment for model development with AstraZeneca and W&B.

(ends 11:20 AM)

11 a.m.

Cancelled:

Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training

(ends 11:45 AM)

Expo Demonstration:

Enabling Hand Gesture Customization on Wrist-Worn Devices

(ends 12:00 PM)

11:30 a.m.

Expo Talk Panel:

Machine learning for drug discovery: Challenges and opportunities

(ends 1:00 PM)

Coffee Break:

Coffee Break

(ends 12:00 PM)

12:15 p.m.

Expo Demonstration:

Robust and Fast Detection of Toxic Speech Content via Machine Learning

(ends 1:00 PM)

1:15 p.m.

Expo Talk Panel:

Towards Robust Waveform-Based Acoustic Models

(ends 2:00 PM)

2:30 p.m.

Opening Reception - Catered:

Opening Reception

(ends 4:00 PM)

MON 18 JUL

4 a.m.

Registration Check-in Desk

(ends 3:00 PM)

5 a.m.

Affinity Workshop:

LatinX in AI (LXAI) LXAI Research Workshop

(ends 5:00 PM)

5:30 a.m.

Affinity Workshop:

Women in Machine Learning (WiML) Un-Workshop

(ends 3:40 PM)

6 a.m.

Affinity Workshop:

New In Machine Learning (NewInML)

(ends 1:30 PM)

6:30 a.m.

Tutorial:

Quantitative Reasoning About Data Privacy in Machine Learning

(ends 8:30 AM)

Tutorial:

Validity, Reliability, and Significance: A Tutorial on Statistical Methods for Reproducible Machine Learning

(ends 8:45 AM)

Tutorial:

Causality and Deep Learning: Synergies, Challenges and the Future

(ends 8:30 AM)

7 a.m.

Coffee Break:

Coffee Break

(ends 7:30 AM)

9 a.m.

Lunch Break:

Lunch Break - on your own

(ends 10:30 AM)

10 a.m.

Tutorial:

Learning for Interactive Agents

(ends 12:00 PM)

Tutorial:

Bridging Learning and Decision Making

(ends 12:00 PM)

Tutorial:

Climate Change and Machine Learning: Opportunities, Challenges, and Considerations

(ends 12:00 PM)

noon

Coffee Break:

Coffee Break

(ends 12:30 PM)

12:30 p.m.

Tutorial:

Causal Fairness Analysis

(ends 2:30 PM)

Tutorial:

Sampling as First-Order Optimization over a space of probability measures

(ends 2:30 PM)

Tutorial:

Welcome to the "Big Model" Era: Techniques and Systems to Train and Serve Bigger Models

(ends 2:50 PM)

TUE 19 JUL

3:30 a.m.

Break:

Breakfast on your own

(ends 3:45 AM)

4 a.m.

Registration Check-in Desk

(ends 4:00 PM)

5:45 a.m.

Remarks:

Welcome by Chairs

(ends 6:00 AM)

6 a.m.

Invited Talk:

Towards a Mathematical Theory of Machine Learning

Weinan E

(ends 7:00 AM)

7 a.m.

Coffee Break:

Coffee Break

(ends 7:30 AM)

7:30 a.m.

Social Aspects [7:30-9:00]

Spotlights 7:30-8:05

[7:30] Differentially Private Approximate Quantiles

[7:35] Fairness Interventions as (Dis)Incentives for Strategic Manipulation

[7:40] Robust Models Are More Interpretable Because Attributions Look Normal

[7:45] Sequential Covariate Shift Detection Using Classifier Two-Sample Tests

[7:50] A Joint Exponential Mechanism For Differentially Private Top-$k$

[7:55] Transfer Learning In Differential Privacy's Hybrid-Model

[8:00] Robust Kernel Density Estimation with Median-of-Means principle

Orals 8:05-8:25

[8:05] Bounding Training Data Reconstruction in Private (Deep) Learning

Spotlights 8:25-9:00

[8:25] Plug & Play Attacks: Towards Robust and Flexible Model Inversion Attacks

[8:30] FriendlyCore: Practical Differentially Private Aggregation

[8:35] ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder

[8:40] Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification

[8:45] Public Data-Assisted Mirror Descent for Private Model Training

[8:50] Low-Complexity Deep Convolutional Neural Networks on Fully Homomorphic Encryption Using Multiplexed Parallel Convolutions

[8:55] Robin Hood and Matthew Effects: Differential Privacy Has Disparate Impact on Synthetic Data

(ends 9:00 AM)

Probabilistic Methods/Applications [7:30-9:00]

Orals 7:30-7:50

[7:30] Tackling covariate shift with node-based Bayesian neural networks

Spotlights 7:50-8:10

[7:50] Why the Rich Get Richer? On the Balancedness of Random Partition Models

[7:55] A Completely Tuning-Free and Robust Approach to Sparse Precision Matrix Estimation

[8:00] Markov Chain Monte Carlo for Continuous-Time Switching Dynamical Systems

[8:05] Calibrated Learning to Defer with One-vs-All Classifiers

Orals 8:10-8:30

[8:10] Tractable Uncertainty for Structure Learning

Spotlights 8:30-8:55

[8:30] DNA: Domain Generalization with Diversified Neural Averaging

[8:35] Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces

[8:40] DynaMixer: A Vision MLP Architecture with Dynamic Mixing

[8:45] Channel Importance Matters in Few-Shot Image Classification

[8:50] Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization

(ends 9:00 AM)

Reinforcement Learning [7:30-9:00]

Spotlights 7:30-8:00

[7:30] Dynamic Regret of Online Markov Decision Processes

[7:35] On the Impossibility of Learning to Cooperate with Adaptive Partner Strategies in Repeated Games

[7:40] Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning

[7:45] Provable Reinforcement Learning with a Short-Term Memory

[7:50] Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer

[7:55] Mirror Learning: A Unifying Framework of Policy Optimisation

Orals 8:00-8:20

[8:00] Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP

Spotlights 8:20-8:50

[8:20] Learning Infinite-horizon Average-reward Markov Decision Process with Constraints

[8:25] A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning

[8:30] Langevin Monte Carlo for Contextual Bandits

[8:35] Prompting Decision Transformer for Few-Shot Policy Generalization

[8:40] Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning

[8:45] Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation

(ends 9:00 AM)

Deep Learning: Robustness [7:30-9:00]

Orals 7:30-7:50

[7:30] Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

Spotlights 7:50-8:10

[7:50] ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks

[7:55] Provably Adversarially Robust Nearest Prototype Classifiers

[8:00] Certifying Out-of-Domain Generalization for Blackbox Functions

[8:05] Intriguing Properties of Input-Dependent Randomized Smoothing

Orals 8:10-8:30

[8:10] To Smooth or Not? When Label Smoothing Meets Noisy Labels

Spotlights 8:30-8:55

[8:30] Evaluating the Adversarial Robustness of Adaptive Test-time Defenses

[8:35] On the Generalization Analysis of Adversarial Learning

[8:40] Demystifying the Adversarial Robustness of Random Transformation Defenses

[8:45] Double Sampling Randomized Smoothing

[8:50] TPC: Transformation-Specific Smoothing for Point Cloud Models

(ends 9:00 AM)

APP: Language, Speech and Dialog [7:30-9:00]

Spotlights 7:30-8:00

[7:30] Certified Robustness Against Natural Language Attacks by Causal Intervention

[7:35] A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing

[7:40] On the Learning of Non-Autoregressive Transformers

[7:45] Latent Diffusion Energy-Based Model for Interpretable Text Modelling

[7:50] UNIREX: A Unified Learning Framework for Language Model Rationale Extraction

[7:55] Black-Box Tuning for Language-Model-as-a-Service

Orals 8:00-8:20

[8:00] Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information

Spotlights 8:20-9:00

[8:20] Co-training Improves Prompt-based Learning for Large Language Models

[8:25] Directed Acyclic Transformer for Non-Autoregressive Machine Translation

[8:30] StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models

[8:35] Unsupervised Detection of Contextualized Embedding Bias with Application to Ideology

[8:40] Generative Cooperative Networks for Natural Language Generation

[8:45] What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization?

[8:50] Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

[8:55] ROCK: Causal Inference Principles for Reasoning about Commonsense Causality

(ends 9:00 AM)

Optimization: Convex [7:30-9:00]

Orals 7:30-7:50

[7:30] Exact Optimal Accelerated Complexity for Fixed-Point Iterations

Spotlights 7:50-8:15

[7:50] Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions

[7:55] NysADMM: faster composite convex optimization via low-rank approximation

[8:00] FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type Method for Federated Learning

[8:05] Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

[8:10] Pairwise Conditional Gradients without Swap Steps and Sparser Kernel Herding

Orals 8:15-8:35

[8:15] Continuous-Time Analysis of Accelerated Gradient Methods via Conservation Laws in Dilated Coordinate Systems

Spotlights 8:35-9:00

[8:35] Only tails matter: Average-Case Universality and Robustness in the Convex Regime

[8:40] Batch Greenkhorn Algorithm for Entropic-Regularized Multimarginal Optimal Transport: Linear Rate of Convergence and Iteration Complexity

[8:45] Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets

[8:50] Neural Fisher Discriminant Analysis: Optimal Neural Network Embeddings in Polynomial Time

[8:55] Active Sampling for Min-Max Fairness

(ends 9:00 AM)

Theory: Online Learning/Bandits [7:30-9:00]

Orals 7:30-7:50

[7:30] Online Learning for Min Sum Set Cover and Pandora’s Box

Spotlights 7:50-8:15

[7:50] Smoothed Adversarial Linear Contextual Bandits with Knapsacks

[7:55] Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback

[8:00] Thompson Sampling for (Combinatorial) Pure Exploration

[8:05] Revisiting Online Submodular Minimization: Gap-Dependent Regret Bounds, Best of Both Worlds and Adversarial Robustness

[8:10] Rotting Infinitely Many-Armed Bandits

Orals 8:15-8:35

[8:15] Batched Dueling Bandits

Spotlights 8:35-9:00

[8:35] Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent

[8:40] Consistent Polyhedral Surrogates for Top-k Classification and Variants

[8:45] Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models

[8:50] Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits

[8:55] Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback

(ends 9:00 AM)

Transfer/Multitask/Meta Learning [7:30-9:00]

Spotlights 7:30-8:05

[7:30] Multi-Task Learning as a Bargaining Game

[7:35] Frustratingly Easy Transferability Estimation

[7:40] Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling

[7:45] A Difference Standardization Method for Mutual Transfer Learning

[7:50] Improving Task-free Continual Learning by Distributionally Robust Memory Evolution

[7:55] A Multi-objective / Multi-task Learning Framework Induced by Pareto Stationarity

[8:00] Sparse Invariant Risk Minimization

Orals 8:05-8:25

[8:05] Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning

Spotlights 8:25-9:00

[8:25] A Closer Look at Smoothness in Domain Adversarial Training

[8:30] Balancing Discriminability and Transferability for Source-Free Domain Adaptation

[8:35] Model Agnostic Sample Reweighting for Out-of-Distribution Learning

[8:40] Zero-shot AutoML with Pretrained Models

[8:45] Efficient Variance Reduction for Meta-learning

[8:50] Generalizing to Evolving Domains with Latent Structure-Aware Sequential Autoencoder

[8:55] Partial disentanglement for domain adaptation

(ends 9:00 AM)

Deep Learning [7:30-9:00]

Spotlights 7:30-8:05

[7:30] Structural Entropy Guided Graph Hierarchical Pooling

[7:35] Self-Supervised Representation Learning via Latent Graph Prediction

[7:40] DSTAGNN: Dynamic Spatial-Temporal Aware Graph Neural Network for Traffic Flow Forecasting

[7:45] Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets

[7:50] Omni-Granular Ego-Semantic Propagation for Self-Supervised Graph Representation Learning

[7:55] Analyzing and Mitigating Interference in Neural Architecture Search

[8:00] Reverse Engineering $\ell_p$ attacks: A block-sparse optimization approach with recovery guarantees

Orals 8:05-8:25

[8:05] Unified Scaling Laws for Routed Language Models

Spotlights 8:25-9:00

[8:25] DRAGONN: Distributed Randomized Approximate Gradients of Neural Networks

[8:30] A deep convolutional neural network that is invariant to time rescaling

[8:35] LyaNet: A Lyapunov Framework for Training Neural ODEs

[8:40] Transfer and Marginalize: Explaining Away Label Noise with Privileged Information

[8:45] On Collective Robustness of Bagging Against Data Poisoning

[8:50] Hindering Adversarial Attacks with Implicit Neural Representations

[8:55] From Noisy Prediction to True Label: Noisy Prediction Calibration via Generative Model

(ends 9:00 AM)

Deep Learning: Generative Models/Autoencoders [7:30-9:00]

Spotlights 7:30-8:05

[7:30] Exploring and Exploiting Hubness Priors for High-Quality GAN Latent Sampling

[7:35] ButterflyFlow: Building Invertible Layers with Butterfly Matrices

[7:40] Controlling Conditional Language Models without Catastrophic Forgetting

[7:45] GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

[7:50] Structure-preserving GANs

[7:55] DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

[8:00] Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models

Orals 8:05-8:25

[8:05] Equivariant Diffusion for Molecule Generation in 3D

Spotlights 8:25-9:00

[8:25] Forward Operator Estimation in Generative Models with Kernel Transfer Operators

[8:30] Conditional GANs with Auxiliary Discriminative Classifier

[8:35] Improved StyleGAN-v2 based Inversion for Out-of-Distribution Images

[8:40] Matching Normalizing Flows and Probability Paths on Manifolds

[8:45] Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization

[8:50] Learning to Incorporate Texture Saliency Adaptive Attention to Image Cartoonization

[8:55] Region-Based Semantic Factorization in GANs

(ends 9:00 AM)

9 a.m.

Lunch:

Lunch Break - on your own

(ends 10:30 AM)

10:30 a.m.

DL: Algorithms [10:30-12:00]

Spotlights 10:30-11:00

[10:30] Online Continual Learning through Mutual Information Maximization

[10:35] Learning Iterative Reasoning through Energy Minimization

[10:40] DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks

[10:45] PoF: Post-Training of Feature Extractor for Improving Generalization

[10:50] Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation

[10:55] Set Based Stochastic Subsampling

Orals 11:00-11:20

[11:00] Monarch: Expressive Structured Matrices for Efficient and Accurate Training

Spotlights 11:20-11:55

[11:20] Generalizing to New Physical Systems via Context-Informed Dynamics Model

[11:25] Self-conditioning Pre-Trained Language Models

[11:30] TAM: Topology-Aware Margin Loss for Class-Imbalanced Node Classification

[11:35] Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization

[11:40] Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

[11:45] Knowledge Base Question Answering by Case-based Reasoning over Subgraphs

[11:50] When AUC meets DRO: Optimizing Partial AUC for Deep Learning with Non-Convex Convergence Guarantee

(ends 12:00 PM)

SA: Accountability, Transparency and Interpretability [10:30-12:00]

Spotlights 10:30-11:05

[10:30] Meaningfully debugging model mistakes using conceptual counterfactual explanations

[10:35] Measuring the Effect of Training Data on Deep Learning Predictions via Randomized Experiments

[10:40] Robust Counterfactual Explanations for Tree-Based Ensembles

[10:45] A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions

[10:50] Estimating and Penalizing Induced Preference Shifts in Recommender Systems

[10:55] Framework for Evaluating Faithfulness of Local Explanations

[11:00] A Consistent and Efficient Evaluation Strategy for Attribution Methods

Orals 11:05-11:25

[11:05] Training Characteristic Functions with Reinforcement Learning: XAI-methods play Connect Four

Spotlights 11:25-12:00

[11:25] Label-Descriptive Patterns and Their Application to Characterizing Classification Errors

[11:30] XAI for Transformers: Better Explanations through Conservative Propagation

[11:35] Quantification and Analysis of Layer-wise and Pixel-wise Information Discarding

[11:40] Interpretable Off-Policy Learning via Hyperbox Search

[11:45] Neuron Dependency Graphs: A Causal Abstraction of Neural Networks

[11:50] On the Adversarial Robustness of Causal Algorithmic Recourse

[11:55] Knowledge-Grounded Self-Rationalization via Extractive and Natural Language Explanations

(ends 12:00 PM)

APP: Computer Vision [10:30-12:00]

Spotlights 10:30-11:00

[10:30] Robust Group Synchronization via Quadratic Programming

[10:35] UAST: Uncertainty-Aware Siamese Tracking

[10:40] You Only Cut Once: Boosting Data Augmentation with a Single Cut

[10:45] Generative Modeling for Multi-task Visual Learning

[10:50] HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning

[10:55] Parametric Visual Program Induction with Function Modularization

Orals 11:00-11:20

[11:00] Path-Gradient Estimators for Continuous Normalizing Flows

Spotlights 11:20-11:55

[11:20] Variational Feature Pyramid Networks

[11:25] Deep Neural Network Fusion via Graph Matching with Applications to Model Ensemble and Federated Learning

[11:30] VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix

[11:35] Neural Implicit Dictionary Learning via Mixture-of-Expert Training

[11:40] Time Is MattEr: Temporal Self-supervision for Video Transformers

[11:45] Benchmarking and Analyzing Point Cloud Classification under Corruptions

[11:50] Understanding The Robustness in Vision Transformers

(ends 12:00 PM)

Theory [10:30-12:00]

Orals 10:30-10:50

[10:30] Learning Mixtures of Linear Dynamical Systems

Spotlights 10:50-11:15

[10:50] Massively Parallel $k$-Means Clustering for Perturbation Resilient Instances

[10:55] Residual-Based Sampling for Online Outlier-Robust PCA

[11:00] Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times

[11:05] Streaming Algorithms for Support-Aware Histograms

[11:10] Power-Law Escape Rate of SGD

Orals 11:15-11:35

[11:15] Generalized Results for the Existence and Consistency of the MLE in the Bradley-Terry-Luce Model

Spotlights 11:35-12:00

[11:35] Faster Algorithms for Learning Convex Functions

[11:40] Feature selection using e-values

[11:45] ActiveHedge: Hedge meets Active Learning

[11:50] One-Pass Algorithms for MAP Inference of Nonsymmetric Determinantal Point Processes

[11:55] Deciphering Lasso-based Classification Through a Large Dimensional Analysis of the Iterative Soft-Thresholding Algorithm

(ends 12:00 PM)

MISC: Unsupervised and Semi-supervised Learning [10:30-12:00]

Spotlights 10:30-11:00

[10:30] An iterative clustering algorithm for the Contextual Stochastic Block Model with optimality guarantees

[10:35] Smoothed Adaptive Weighting for Imbalanced Semi-Supervised Learning: Improve Reliability Against Unknown Distribution Data

[10:40] Class-Imbalanced Semi-Supervised Learning with Adaptive Thresholding

[10:50] Meta-Learning Hypothesis Spaces for Sequential Decision-making

[10:55] A Tighter Analysis of Spectral Clustering, and Beyond

Orals 11:00-11:20

[11:00] Online Active Regression

Spotlights 11:20-11:55

[11:20] On Finite-Sample Identifiability of Contrastive Learning-Based Nonlinear Independent Component Analysis

[11:25] Revisiting Contrastive Learning through the Lens of Neighborhood Component Analysis: an Integrated Framework

[11:30] Open-Sampling: Exploring Out-of-Distribution data for Re-balancing Long-tailed datasets

[11:35] Confidence Score for Source-Free Unsupervised Domain Adaptation

[11:40] Gradient Based Clustering

[11:45] Global Optimization of K-Center Clustering

[11:50] Latent Outlier Exposure for Anomaly Detection with Contaminated Data

(ends 12:00 PM)

PM: Gaussian Processes [10:30-12:00]

Spotlights 10:30-11:00

[10:30] Additive Gaussian Processes Revisited

[10:35] Probabilistic ODE Solutions in Millions of Dimensions

[10:40] Adaptive Gaussian Process Change Point Detection

[10:45] Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes

[10:50] Fenrir: Physics-Enhanced Regression for Initial Value Problems

[10:55] Variational nearest neighbor Gaussian process

Orals 11:00-11:20

[11:00] Preconditioning for Scalable Gaussian Process Hyperparameter Optimization

Spotlights 11:20-11:50

[11:20] Spectral Representation of Robustness Measures for Optimization Under Input Uncertainty

[11:25] Bayesian Optimization under Stochastic Delayed Feedback

[11:30] Bayesian Optimization for Distributionally Robust Chance-constrained Problem

[11:35] Efficient Distributionally Robust Bayesian Optimization with Worst-case Sensitivity

[11:40] Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

[11:45] Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation

(ends 12:00 PM)

Reinforcement Learning: Deep/Batch/Offline [10:30-12:00]

Orals 10:30-10:50

[10:30] Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

Spotlights 10:50-11:15

[10:50] AnyMorph: Learning Transferable Polices By Inferring Agent Morphology

[10:55] DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations

[11:00] Stabilizing Off-Policy Deep Reinforcement Learning from Pixels

[11:05] Influence-Augmented Local Simulators: a Scalable Solution for Fast Deep RL in Large Networked Systems

[11:10] CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer

Orals 11:15-11:35

[11:15] Offline RL Policies Should Be Trained to be Adaptive

Spotlights 11:35-12:00

[11:35] Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control

[11:40] PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration

[11:45] Supervised Off-Policy Ranking

[11:50] The Primacy Bias in Deep Reinforcement Learning

[11:55] Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

(ends 12:00 PM)

Optimization [10:30-12:00]

Orals 10:30-10:50

[10:30] Topology-Aware Network Pruning using Multi-stage Graph Embedding and Reinforcement Learning

Spotlights 10:50-11:10

[10:50] Stochastic Reweighted Gradient Descent

[10:55] Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood

[11:00] Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging

[11:05] FedNL: Making Newton-Type Methods Applicable to Federated Learning

Orals 11:10-11:30

[11:10] Solving Stackelberg Prediction Game with Least Squares Loss via Spherically Constrained Least Squares Reformulation

Spotlights 11:30-11:55

[11:30] Dimension-free Complexity Bounds for High-order Nonconvex Finite-sum Optimization

[11:35] Value Function based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems

[11:40] Probabilistic Bilevel Coreset Selection

[11:45] Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs

[11:50] On Implicit Bias in Overparameterized Bilevel Optimization

(ends 12:00 PM)

DL: Graph Neural Networks [10:30-12:00]

Spotlights 10:30-11:05

[10:30] pathGCN: Learning General Graph Spatial Operators from Paths

[10:35] Graph-Coupled Oscillator Networks

[10:40] HousE: Knowledge Graph Embedding with Householder Parameterization

[10:45] Interpretable and Generalizable Graph Learning via Stochastic Attention Mechanism

[10:50] ProGCL: Rethinking Hard Negative Mining in Graph Contrastive Learning

[10:55] G$^2$CN: Graph Gaussian Convolution Networks with Concentrated Graph Filters

[11:00] SpeqNets: Sparsity-aware permutation-equivariant graph networks

Orals 11:05-11:25

[11:05] data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

Spotlights 11:25-11:55

[11:25] Position Prediction as an Effective Pretraining Strategy

[11:30] Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering

[11:35] Deep and Flexible Graph Neural Architecture Search

[11:40] GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks

[11:45] Large-Scale Graph Neural Architecture Search

[11:50] Optimization-Induced Graph Implicit Nonlinear Diffusion

(ends 12:00 PM)

Theory: Bandits/RL/Everything Else [10:30-12:00]

Orals 10:30-10:50

[10:30] Robustness Implies Generalization via Data-Dependent Generalization Bounds

Spotlights 10:50-11:15

[10:50] Learning to Hash Robustly, Guaranteed

[10:55] Policy Gradient Method For Robust Reinforcement Learning

[11:00] A query-optimal algorithm for finding counterfactuals

[11:05] Linear Bandit Algorithms with Sublinear Time Complexity

[11:10] Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra

Orals 11:15-11:35

[11:15] Individual Preference Stability for Clustering

Spotlights 11:35-12:00

[11:35] Correlated Quantization for Distributed Mean Estimation and Optimization

[11:40] Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms

[11:45] Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms

[11:50] The Algebraic Path Problem for Graph Metrics

[11:55] Steerable 3D Spherical Neurons

(ends 12:00 PM)

noon

Coffee Break:

Coffee Break

(ends 12:30 PM)

12:30 p.m.

Award:

Test of Time Award

(ends 1:00 PM)

1 p.m.

Break:

Short Break

(ends 1:15 PM)

1:15 p.m.

Deep Learning [1:15-2:45]

Spotlights 1:15-1:50

[1:15] Prototype Based Classification from Hierarchy to Fairness

[1:20] Neural-Symbolic Models for Logical Queries on Knowledge Graphs

[1:25] Deep Probability Estimation

[1:30] Uncertainty Modeling in Generative Compressed Sensing

[1:35] Going Deeper into Permutation-Sensitive Graph Neural Networks

[1:40] Learning from Counterfactual Links for Link Prediction

[1:45] Training Discrete Deep Generative Models via Gapped Straight-Through Estimator

Orals 1:50-2:10

[1:50] Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations

Spotlights 2:10-2:45

[2:10] Principal Component Flows

[2:15] Bit Prioritization in Variational Autoencoders via Progressive Coding

[2:20] Generative Flow Networks for Discrete Probabilistic Modeling

[2:25] Diffusion bridges vector quantized variational autoencoders

[2:30] Mitigating Modality Collapse in Multimodal VAEs via Impartial Optimization

[2:35] Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation

[2:40] Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack

(ends 2:45 PM)

MISC: Causality [1:15-2:45]

Spotlights 1:15-1:50

[1:15] Coordinated Double Machine Learning

[1:20] Exploiting Independent Instruments: Identification and Distribution Generalization

[1:25] Partial Counterfactual Identification from Observational and Experimental Data

[1:30] On Measuring Causal Contributions via do-interventions

[1:35] The Role of Deconfounding in Meta-learning

[1:40] CITRIS: Causal Identifiability from Temporal Intervened Sequences

[1:45] Online Balanced Experimental Design

Orals 1:50-2:10

[1:50] Minimum Cost Intervention Design for Causal Effect Identification

Spotlights 2:10-2:45

[2:10] Causal structure-based root cause analysis of outliers

[2:15] Instrumental Variable Regression with Confounder Balancing

[2:20] Causal Transformer for Estimating Counterfactual Outcomes

[2:25] Causal Inference Through the Structural Causal Marginal Problem

[2:30] Functional Generalized Empirical Likelihood Estimation for Conditional Moment Restrictions

[2:35] Matching Learned Causal Effects of Neural Networks with Domain Priors

[2:40] Inferring Cause and Effect in the Presence of Heteroscedastic Noise

(ends 2:45 PM)

SA: Trustworthy Machine Learning [1:15-2:45]

Orals 1:15-1:35

[1:15] POEM: Out-of-Distribution Detection with Posterior Sampling

Spotlights 1:35-1:55

[1:35] Selective Network Linearization for Efficient Private Inference

[1:40] Efficient Computation of Higher-Order Subgraph Attribution via Message Passing

[1:45] A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization

[1:50] Modular Conformal Calibration

Orals 1:55-2:15

[1:55] Rethinking Image-Scaling Attacks: The Interplay Between Vulnerabilities in Machine Learning Systems

Spotlights 2:15-2:40

[2:15] Context-Aware Drift Detection

[2:20] Accelerating Shapley Explanation via Contributive Cooperator Selection

[2:25] An Equivalence Between Data Poisoning and Byzantine Gradient Attacks

[2:30] DAVINZ: Data Valuation using Deep Neural Networks at Initialization

[2:35] Sample Efficient Learning of Predictors that Complement Humans

(ends 2:45 PM)

T: Learning/Deep Learning Theory [1:15-2:45]

Orals 1:15-1:35

[1:15] H-Consistency Bounds for Surrogate Loss Minimizers

Spotlights 1:35-2:00

[1:35] Learning General Halfspaces with Adversarial Label Noise via Online Gradient Descent

[1:40] The Teaching Dimension of Regularized Kernel Learners

[1:45] Sparse Mixed Linear Regression with Guarantees: Taming an Intractable Problem with Invex Relaxation

[1:50] TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm

[1:55] Multiclass learning with margin: exponential rates with no bias-variance trade-off

Orals 2:00-2:20

[2:00] Refined Convergence Rates for Maximum Likelihood Estimation under Finite Mixture Models

Spotlights 2:20-2:45

[2:20] High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails

[2:25] An Initial Alignment between Neural Network and Target is Needed for Gradient Descent to Learn

[2:30] Inductive Biases and Variable Creation in Self-Attention Mechanisms

[2:35] Topology-aware Generalization of Decentralized SGD

[2:40] Understanding Gradient Descent on the Edge of Stability in Deep Learning

(ends 2:45 PM)

APP: Neuroscience, Cognitive Science [1:15-2:45]

Spotlights 1:15-1:45

[1:15] Bayesian Nonparametric Learning for Point Processes with Spatial Homogeneity: A Spatial Analysis of NBA Shot Locations

[1:20] On the Effects of Artificial Data Modification

[1:25] Deep Squared Euclidean Approximation to the Levenshtein Distance for DNA Storage

[1:30] How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models

[1:35] Error-driven Input Modulation: Solving the Credit Assignment Problem without a Backward Pass

[1:40] How to Train Your Wide Neural Network Without Backprop: An Input-Weight Alignment Perspective

Orals 1:45-2:05

[1:45] Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness

Spotlights 2:05-2:45

[2:05] Describing Differences between Text Distributions with Natural Language

[2:10] Distinguishing rule- and exemplar-based generalization in learning systems

[2:15] Burst-Dependent Plasticity and Dendritic Amplification Support Target-Based Learning and Hierarchical Imitation Learning

[2:20] A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications

[2:25] Minimizing Control for Credit Assignment with Strong Feedback

[2:30] Self-Supervised Models of Audio Effectively Explain Human Cortical Responses to Speech

[2:35] Towards Scaling Difference Target Propagation by Learning Backprop Targets

[2:40] Content Addressable Memory Without Catastrophic Forgetting by Heteroassociation with a Fixed Scaffold

(ends 2:45 PM)

PM: Monte Carlo and Sampling Methods [1:15-2:45]

Orals 1:15-1:35

[1:15] Scalable MCMC Sampling for Nonsymmetric Determinantal Point Processes

Spotlights 1:35-2:00

[1:35] Robust SDE-Based Variational Formulations for Solving Linear PDEs via Deep Learning

[1:40] Hessian-Free High-Resolution Nesterov Acceleration For Sampling

[1:45] LSB: Local Self-Balancing MCMC in Discrete Spaces

[1:50] A Langevin-like Sampler for Discrete Distributions

[1:55] Scalable Spike-and-Slab

Orals 2:00-2:20

[2:00] Nonparametric Involutive Markov Chain Monte Carlo

Spotlights 2:20-2:45

[2:20] Continual Repeated Annealed Flow Transport Monte Carlo

[2:25] Algorithms for the Communication of Samples

[2:30] Low-Precision Stochastic Gradient Langevin Dynamics

[2:35] Fast Relative Entropy Coding with A* coding

[2:40] Accurate Quantization of Measures via Interacting Particle-based Optimization

(ends 2:45 PM)

OPT: Non-Convex [1:15-2:45]

Spotlights 1:15-1:45

[1:15] Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective

[1:20] Convergence and Recovery Guarantees of the K-Subspaces Method for Subspace Clustering

[1:25] Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity

[1:30] Understanding the unstable convergence of gradient descent

[1:35] Federated Minimax Optimization: Improved Convergence Analyses and Algorithms

[1:40] Inductive Matrix Completion: No Bad Local Minima and a Fast Algorithm

Orals 1:45-2:05

[1:45] FedNest: Federated Bilevel, Minimax, and Compositional Optimization

Spotlights 2:05-2:35

[2:05] AdaGrad Avoids Saddle Points

[2:10] Fast and Provable Nonconvex Tensor RPCA

[2:15] On Convergence of Gradient Descent Ascent: A Tight Local Analysis

[2:20] Convergence Rates of Non-Convex Stochastic Gradient Descent Under a Generic Lojasiewicz Condition and Local Smoothness

[2:25] A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization

[2:30] Anticorrelated Noise Injection for Improved Generalization

(ends 2:45 PM)

RL: Multi-agent [1:15-2:45]

Spotlights 1:15-1:45

[1:15] Model-Free Opponent Shaping

[1:20] Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning

[1:25] Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation

[1:30] Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning

[1:35] Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

[1:40] Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

Orals 1:45-2:05

[1:45] Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

Spotlights 2:05-2:45

[2:05] Self-Organized Polynomial-Time Coordination Graphs

[2:10] Individual Reward Assisted Multi-Agent Reinforcement Learning

[2:15] Generalized Beliefs for Cooperative AI

[2:20] Greedy when Sure and Conservative when Uncertain about the Opponents

[2:25] Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning

[2:30] Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy

[2:35] Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

[2:40] Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis

(ends 2:45 PM)

DL: Sequential Models [1:15-2:45]

Spotlights 1:15-1:45

[1:15] Modeling Irregular Time Series with Continuous Recurrent Units

[1:20] TACTiS: Transformer-Attentional Copulas for Time Series

[1:25] CerDEQ: Certifiable Deep Equilibrium Model

[1:30] Approximately Equivariant Networks for Imperfectly Symmetric Dynamics

[1:35] IDYNO: Learning Nonparametric DAGs from Interventional Dynamic Data

[1:40] GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing

Orals 1:45-2:05

[1:45] Neural Laplace: Learning diverse classes of differential equations in the Laplace domain

Spotlights 2:05-2:45

[2:05] Improving Language Models by Retrieving from Trillions of Tokens

[2:10] Closed-Form Diffeomorphic Transformations for Time Series Alignment

[2:15] Removing Batch Normalization Boosts Adversarial Training

[2:20] Forget-free Continual Learning with Winning Subnetworks

[2:25] FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting

[2:30] Adversarial Robustness against Multiple and Single $l_p$-Threat Models via Quick Fine-Tuning of Robust Classifiers

[2:35] On the Practicality of Deterministic Epistemic Uncertainty

[2:40] Combining Diverse Feature Priors

(ends 2:45 PM)

Theory [1:15-2:45]

Orals 1:15-1:35

[1:15] Cooperative Online Learning in Stochastic and Adversarial MDPs

Spotlights 1:35-2:00

[1:35] Simple and near-optimal algorithms for hidden stratification and multi-group learning

[1:40] Being Properly Improper

[1:45] Neural Network Pruning Denoises the Features and Makes Local Connectivity Emerge in Visual Tasks

[1:50] On the Finite-Time Complexity and Practical Computation of Approximate Stationarity Concepts of Lipschitz Functions

[1:55] Nearly Optimal Policy Optimization with Stable at Any Time Guarantee

Orals 2:00-2:20

[2:00] Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces

Spotlights 2:20-2:45

[2:20] Minimax M-estimation under Adversarial Contamination

[2:25] Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

[2:30] Efficiently Learning the Topology and Behavior of a Networked Dynamical System Via Active Queries

[2:35] Boosting Graph Structure Learning with Dummy Nodes

[2:40] Lazy Estimation of Variable Importance for Large Neural Networks

(ends 2:45 PM)

3:30 p.m.

Poster Session 1 [3:30-5:30]

Posters 3:30-5:30

DNA: Domain Generalization with Diversified Neural Averaging

Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces

DynaMixer: A Vision MLP Architecture with Dynamic Mixing

Channel Importance Matters in Few-Shot Image Classification

Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization

Pure Noise to the Rescue of Insufficient Data: Improving Imbalanced Classification by Training on Random Noise Images

Certified Robustness Against Natural Language Attacks by Causal Intervention

A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing

On the Learning of Non-Autoregressive Transformers

Latent Diffusion Energy-Based Model for Interpretable Text Modelling

UNIREX: A Unified Learning Framework for Language Model Rationale Extraction

Black-Box Tuning for Language-Model-as-a-Service

Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information

Co-training Improves Prompt-based Learning for Large Language Models

Directed Acyclic Transformer for Non-Autoregressive Machine Translation

StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models

Unsupervised Detection of Contextualized Embedding Bias with Application to Ideology

Generative Cooperative Networks for Natural Language Generation

What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization?

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

Robust Group Synchronization via Quadratic Programming

UAST: Uncertainty-Aware Siamese Tracking

You Only Cut Once: Boosting Data Augmentation with a Single Cut

Generative Modeling for Multi-task Visual Learning

HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning

Parametric Visual Program Induction with Function Modularization

Deep Neural Network Fusion via Graph Matching with Applications to Model Ensemble and Federated Learning

VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix

Neural Implicit Dictionary Learning via Mixture-of-Expert Training

Time Is MattEr: Temporal Self-supervision for Video Transformers

Benchmarking and Analyzing Point Cloud Classification under Corruptions

Understanding The Robustness in Vision Transformers

Bayesian Nonparametric Learning for Point Processes with Spatial Homogeneity: A Spatial Analysis of NBA Shot Locations

On the Effects of Artificial Data Modification

Deep Squared Euclidean Approximation to the Levenshtein Distance for DNA Storage

How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models

Error-driven Input Modulation: Solving the Credit Assignment Problem without a Backward Pass

How to Train Your Wide Neural Network Without Backprop: An Input-Weight Alignment Perspective

Describing Differences between Text Distributions with Natural Language

Distinguishing rule- and exemplar-based generalization in learning systems

Burst-Dependent Plasticity and Dendritic Amplification Support Target-Based Learning and Hierarchical Imitation Learning

A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications

Minimizing Control for Credit Assignment with Strong Feedback

Self-Supervised Models of Audio Effectively Explain Human Cortical Responses to Speech

Towards Scaling Difference Target Propagation by Learning Backprop Targets

Content Addressable Memory Without Catastrophic Forgetting by Heteroassociation with a Fixed Scaffold

Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks

Provably Adversarially Robust Nearest Prototype Classifiers

Certifying Out-of-Domain Generalization for Blackbox Functions

Intriguing Properties of Input-Dependent Randomized Smoothing

To Smooth or Not? When Label Smoothing Meets Noisy Labels

Evaluating the Adversarial Robustness of Adaptive Test-time Defenses

On the Generalization Analysis of Adversarial Learning

Demystifying the Adversarial Robustness of Random Transformation Defenses

Double Sampling Randomized Smoothing

TPC: Transformation-Specific Smoothing for Point Cloud Models

Structural Entropy Guided Graph Hierarchical Pooling

Self-Supervised Representation Learning via Latent Graph Prediction

DSTAGNN: Dynamic Spatial-Temporal Aware Graph Neural Network for Traffic Flow Forecasting

Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets

Omni-Granular Ego-Semantic Propagation for Self-Supervised Graph Representation Learning

Analyzing and Mitigating Interference in Neural Architecture Search

Reverse Engineering $\ell_p$ attacks: A block-sparse optimization approach with recovery guarantees

Unified Scaling Laws for Routed Language Models

DRAGONN: Distributed Randomized Approximate Gradients of Neural Networks

A deep convolutional neural network that is invariant to time rescaling

LyaNet: A Lyapunov Framework for Training Neural ODEs

Transfer and Marginalize: Explaining Away Label Noise with Privileged Information

On Collective Robustness of Bagging Against Data Poisoning

Hindering Adversarial Attacks with Implicit Neural Representations

From Noisy Prediction to True Label: Noisy Prediction Calibration via Generative Model

Exploring and Exploiting Hubness Priors for High-Quality GAN Latent Sampling

ButterflyFlow: Building Invertible Layers with Butterfly Matrices

Controlling Conditional Language Models without Catastrophic Forgetting

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Structure-preserving GANs

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models

Equivariant Diffusion for Molecule Generation in 3D

Forward Operator Estimation in Generative Models with Kernel Transfer Operators

Conditional GANs with Auxiliary Discriminative Classifier

Improved StyleGAN-v2 based Inversion for Out-of-Distribution Images

Matching Normalizing Flows and Probability Paths on Manifolds

Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization

Learning to Incorporate Texture Saliency Adaptive Attention to Image Cartoonization

Region-Based Semantic Factorization in GANs

Online Continual Learning through Mutual Information Maximization

Learning Iterative Reasoning through Energy Minimization

DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks

PoF: Post-Training of Feature Extractor for Improving Generalization

Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation

Set Based Stochastic Subsampling

Monarch: Expressive Structured Matrices for Efficient and Accurate Training

Generalizing to New Physical Systems via Context-Informed Dynamics Model

Self-conditioning Pre-Trained Language Models

TAM: Topology-Aware Margin Loss for Class-Imbalanced Node Classification

Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization

Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

Knowledge Base Question Answering by Case-based Reasoning over Subgraphs

When AUC meets DRO: Optimizing Partial AUC for Deep Learning with Non-Convex Convergence Guarantee

pathGCN: Learning General Graph Spatial Operators from Paths

Graph-Coupled Oscillator Networks

HousE: Knowledge Graph Embedding with Householder Parameterization

Interpretable and Generalizable Graph Learning via Stochastic Attention Mechanism

ProGCL: Rethinking Hard Negative Mining in Graph Contrastive Learning

G$^2$CN: Graph Gaussian Convolution Networks with Concentrated Graph Filters

SpeqNets: Sparsity-aware permutation-equivariant graph networks

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

Position Prediction as an Effective Pretraining Strategy

Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering

Deep and Flexible Graph Neural Architecture Search

GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks

Large-Scale Graph Neural Architecture Search

Optimization-Induced Graph Implicit Nonlinear Diffusion

Prototype Based Classification from Hierarchy to Fairness

Neural-Symbolic Models for Logical Queries on Knowledge Graphs

Deep Probability Estimation

Uncertainty Modeling in Generative Compressed Sensing

Going Deeper into Permutation-Sensitive Graph Neural Networks

Learning from Counterfactual Links for Link Prediction

Training Discrete Deep Generative Models via Gapped Straight-Through Estimator

Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations

Principal Component Flows

Bit Prioritization in Variational Autoencoders via Progressive Coding

Generative Flow Networks for Discrete Probabilistic Modeling

Diffusion bridges vector quantized variational autoencoders

Mitigating Modality Collapse in Multimodal VAEs via Impartial Optimization

Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation

Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack

Modeling Irregular Time Series with Continuous Recurrent Units

TACTiS: Transformer-Attentional Copulas for Time Series

CerDEQ: Certifiable Deep Equilibrium Model

Approximately Equivariant Networks for Imperfectly Symmetric Dynamics

IDYNO: Learning Nonparametric DAGs from Interventional Dynamic Data

GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing

Neural Laplace: Learning diverse classes of differential equations in the Laplace domain

Improving Language Models by Retrieving from Trillions of Tokens

Closed-Form Diffeomorphic Transformations for Time Series Alignment

Removing Batch Normalization Boosts Adversarial Training

Forget-free Continual Learning with Winning Subnetworks

FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting

Adversarial Robustness against Multiple and Single $l_p$-Threat Models via Quick Fine-Tuning of Robust Classifiers

On the Practicality of Deterministic Epistemic Uncertainty

Combining Diverse Feature Priors

Multi-Task Learning as a Bargaining Game

Frustratingly Easy Transferability Estimation

Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling

A Difference Standardization Method for Mutual Transfer Learning

Improving Task-free Continual Learning by Distributionally Robust Memory Evolution

A Multi-objective / Multi-task Learning Framework Induced by Pareto Stationarity

Sparse Invariant Risk Minimization

Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning

A Closer Look at Smoothness in Domain Adversarial Training

Balancing Discriminability and Transferability for Source-Free Domain Adaptation

Model Agnostic Sample Reweighting for Out-of-Distribution Learning

Zero-shot AutoML with Pretrained Models

Efficient Variance Reduction for Meta-learning

Generalizing to Evolving Domains with Latent Structure-Aware Sequential Autoencoder

Partial disentanglement for domain adaptation

An iterative clustering algorithm for the Contextual Stochastic Block Model with optimality guarantees

Smoothed Adaptive Weighting for Imbalanced Semi-Supervised Learning: Improve Reliability Against Unknown Distribution Data

Class-Imbalanced Semi-Supervised Learning with Adaptive Thresholding

Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

Meta-Learning Hypothesis Spaces for Sequential Decision-making

A Tighter Analysis of Spectral Clustering, and Beyond

Online Active Regression

On Finite-Sample Identifiability of Contrastive Learning-Based Nonlinear Independent Component Analysis

Revisiting Contrastive Learning through the Lens of Neighborhood Component Analysis: an Integrated Framework

Open-Sampling: Exploring Out-of-Distribution data for Re-balancing Long-tailed datasets

Confidence Score for Source-Free Unsupervised Domain Adaptation

Gradient Based Clustering

Global Optimization of K-Center Clustering

Latent Outlier Exposure for Anomaly Detection with Contaminated Data

Coordinated Double Machine Learning

Exploiting Independent Instruments: Identification and Distribution Generalization

Partial Counterfactual Identification from Observational and Experimental Data

On Measuring Causal Contributions via do-interventions

The Role of Deconfounding in Meta-learning

CITRIS: Causal Identifiability from Temporal Intervened Sequences

Online Balanced Experimental Design

Minimum Cost Intervention Design for Causal Effect Identification

Causal structure-based root cause analysis of outliers

Instrumental Variable Regression with Confounder Balancing

Causal Transformer for Estimating Counterfactual Outcomes

Causal Inference Through the Structural Causal Marginal Problem

Functional Generalized Empirical Likelihood Estimation for Conditional Moment Restrictions

Matching Learned Causal Effects of Neural Networks with Domain Priors

Inferring Cause and Effect in the Presence of Heteroscedastic Noise

Exact Optimal Accelerated Complexity for Fixed-Point Iterations

Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions

NysADMM: faster composite convex optimization via low-rank approximation

FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type Method for Federated Learning

Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

Pairwise Conditional Gradients without Swap Steps and Sparser Kernel Herding

Continuous-Time Analysis of Accelerated Gradient Methods via Conservation Laws in Dilated Coordinate Systems

Only tails matter: Average-Case Universality and Robustness in the Convex Regime

Batch Greenkhorn Algorithm for Entropic-Regularized Multimarginal Optimal Transport: Linear Rate of Convergence and Iteration Complexity

Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets

Neural Fisher Discriminant Analysis: Optimal Neural Network Embeddings in Polynomial Time

Active Sampling for Min-Max Fairness

Topology-Aware Network Pruning using Multi-stage Graph Embedding and Reinforcement Learning

Stochastic Reweighted Gradient Descent

Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood

Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging

FedNL: Making Newton-Type Methods Applicable to Federated Learning

Solving Stackelberg Prediction Game with Least Squares Loss via Spherically Constrained Least Squares Reformulation

Dimension-free Complexity Bounds for High-order Nonconvex Finite-sum Optimization

Value Function based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems

Probabilistic Bilevel Coreset Selection

Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs

On Implicit Bias in Overparameterized Bilevel Optimization

Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective

Convergence and Recovery Guarantees of the K-Subspaces Method for Subspace Clustering

Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity

Understanding the unstable convergence of gradient descent

Federated Minimax Optimization: Improved Convergence Analyses and Algorithms

Inductive Matrix Completion: No Bad Local Minima and a Fast Algorithm

FedNest: Federated Bilevel, Minimax, and Compositional Optimization

AdaGrad Avoids Saddle Points

Fast and Provable Nonconvex Tensor RPCA

On Convergence of Gradient Descent Ascent: A Tight Local Analysis

Convergence Rates of Non-Convex Stochastic Gradient Descent Under a Generic Lojasiewicz Condition and Local Smoothness

A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization

Anticorrelated Noise Injection for Improved Generalization

Tackling covariate shift with node-based Bayesian neural networks

Why the Rich Get Richer? On the Balancedness of Random Partition Models

A Completely Tuning-Free and Robust Approach to Sparse Precision Matrix Estimation

Markov Chain Monte Carlo for Continuous-Time Switching Dynamical Systems

Calibrated Learning to Defer with One-vs-All Classifiers

Tractable Uncertainty for Structure Learning

Path-Gradient Estimators for Continuous Normalizing Flows

Variational Feature Pyramid Networks

Additive Gaussian Processes Revisited

Probabilistic ODE Solutions in Millions of Dimensions

Adaptive Gaussian Process Change Point Detection

Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes

Fenrir: Physics-Enhanced Regression for Initial Value Problems

Variational nearest neighbor Gaussian process

Preconditioning for Scalable Gaussian Process Hyperparameter Optimization

Spectral Representation of Robustness Measures for Optimization Under Input Uncertainty

Bayesian Optimization under Stochastic Delayed Feedback

Bayesian Optimization for Distributionally Robust Chance-constrained Problem

Efficient Distributionally Robust Bayesian Optimization with Worst-case Sensitivity

Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation

Scalable MCMC Sampling for Nonsymmetric Determinantal Point Processes

Robust SDE-Based Variational Formulations for Solving Linear PDEs via Deep Learning

Hessian-Free High-Resolution Nesterov Acceleration For Sampling

LSB: Local Self-Balancing MCMC in Discrete Spaces

A Langevin-like Sampler for Discrete Distributions

Scalable Spike-and-Slab

Nonparametric Involutive Markov Chain Monte Carlo

Continual Repeated Annealed Flow Transport Monte Carlo

Algorithms for the Communication of Samples

Low-Precision Stochastic Gradient Langevin Dynamics

Fast Relative Entropy Coding with A* coding

Accurate Quantization of Measures via Interacting Particle-based Optimization

Dynamic Regret of Online Markov Decision Processes

On the Impossibility of Learning to Cooperate with Adaptive Partner Strategies in Repeated Games

Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning

Provable Reinforcement Learning with a Short-Term Memory

Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer

Mirror Learning: A Unifying Framework of Policy Optimisation

Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP

Learning Infinite-horizon Average-reward Markov Decision Process with Constraints

A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning

Langevin Monte Carlo for Contextual Bandits

Prompting Decision Transformer for Few-Shot Policy Generalization

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning

Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation

Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

AnyMorph: Learning Transferable Polices By Inferring Agent Morphology

DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations

Stabilizing Off-Policy Deep Reinforcement Learning from Pixels

Influence-Augmented Local Simulators: a Scalable Solution for Fast Deep RL in Large Networked Systems

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer

Offline RL Policies Should Be Trained to be Adaptive

Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control

PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration

Supervised Off-Policy Ranking

The Primacy Bias in Deep Reinforcement Learning

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

Model-Free Opponent Shaping

Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning

Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation

Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning

Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

Self-Organized Polynomial-Time Coordination Graphs

Individual Reward Assisted Multi-Agent Reinforcement Learning

Generalized Beliefs for Cooperative AI

Greedy when Sure and Conservative when Uncertain about the Opponents

Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning

Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy

Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis

Differentially Private Approximate Quantiles

Fairness Interventions as (Dis)Incentives for Strategic Manipulation

Robust Models Are More Interpretable Because Attributions Look Normal

Sequential Covariate Shift Detection Using Classifier Two-Sample Tests

A Joint Exponential Mechanism For Differentially Private Top-$k$

Transfer Learning In Differential Privacy's Hybrid-Model

Robust Kernel Density Estimation with Median-of-Means principle

Bounding Training Data Reconstruction in Private (Deep) Learning

Plug & Play Attacks: Towards Robust and Flexible Model Inversion Attacks

FriendlyCore: Practical Differentially Private Aggregation

ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder

Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification

Public Data-Assisted Mirror Descent for Private Model Training

Low-Complexity Deep Convolutional Neural Networks on Fully Homomorphic Encryption Using Multiplexed Parallel Convolutions

Robin Hood and Matthew Effects: Differential Privacy Has Disparate Impact on Synthetic Data

Meaningfully debugging model mistakes using conceptual counterfactual explanations

Measuring the Effect of Training Data on Deep Learning Predictions via Randomized Experiments

Robust Counterfactual Explanations for Tree-Based Ensembles

A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions

Estimating and Penalizing Induced Preference Shifts in Recommender Systems

Framework for Evaluating Faithfulness of Local Explanations

A Consistent and Efficient Evaluation Strategy for Attribution Methods

Training Characteristic Functions with Reinforcement Learning: XAI-methods play Connect Four

Label-Descriptive Patterns and Their Application to Characterizing Classification Errors

XAI for Transformers: Better Explanations through Conservative Propagation

Quantification and Analysis of Layer-wise and Pixel-wise Information Discarding

Interpretable Off-Policy Learning via Hyperbox Search

Neuron Dependency Graphs: A Causal Abstraction of Neural Networks

On the Adversarial Robustness of Causal Algorithmic Recourse

Knowledge-Grounded Self-Rationalization via Extractive and Natural Language Explanations

POEM: Out-of-Distribution Detection with Posterior Sampling

Selective Network Linearization for Efficient Private Inference

Efficient Computation of Higher-Order Subgraph Attribution via Message Passing

A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization

Modular Conformal Calibration

Rethinking Image-Scaling Attacks: The Interplay Between Vulnerabilities in Machine Learning Systems

Context-Aware Drift Detection

Accelerating Shapley Explanation via Contributive Cooperator Selection

An Equivalence Between Data Poisoning and Byzantine Gradient Attacks

DAVINZ: Data Valuation using Deep Neural Networks at Initialization

Sample Efficient Learning of Predictors that Complement Humans

Online Learning for Min Sum Set Cover and Pandora’s Box

Smoothed Adversarial Linear Contextual Bandits with Knapsacks

Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback

Thompson Sampling for (Combinatorial) Pure Exploration

Revisiting Online Submodular Minimization: Gap-Dependent Regret Bounds, Best of Both Worlds and Adversarial Robustness

Rotting Infinitely Many-Armed Bandits

Batched Dueling Bandits

Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent

Consistent Polyhedral Surrogates for Top-k Classification and Variants

Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models

Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits

Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback

Learning Mixtures of Linear Dynamical Systems

Massively Parallel $k$-Means Clustering for Perturbation Resilient Instances

Residual-Based Sampling for Online Outlier-Robust PCA

Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times

Streaming Algorithms for Support-Aware Histograms

Power-Law Escape Rate of SGD

Generalized Results for the Existence and Consistency of the MLE in the Bradley-Terry-Luce Model

Faster Algorithms for Learning Convex Functions

Feature selection using e-values

ActiveHedge: Hedge meets Active Learning

One-Pass Algorithms for MAP Inference of Nonsymmetric Determinantal Point Processes

Deciphering Lasso-based Classification Through a Large Dimensional Analysis of the Iterative Soft-Thresholding Algorithm

Robustness Implies Generalization via Data-Dependent Generalization Bounds

Learning to Hash Robustly, Guaranteed

Policy Gradient Method For Robust Reinforcement Learning

A query-optimal algorithm for finding counterfactuals

Linear Bandit Algorithms with Sublinear Time Complexity

Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra

Individual Preference Stability for Clustering

Correlated Quantization for Distributed Mean Estimation and Optimization

Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms

Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms

The Algebraic Path Problem for Graph Metrics

Steerable 3D Spherical Neurons

H-Consistency Bounds for Surrogate Loss Minimizers

Learning General Halfspaces with Adversarial Label Noise via Online Gradient Descent

The Teaching Dimension of Regularized Kernel Learners

Sparse Mixed Linear Regression with Guarantees: Taming an Intractable Problem with Invex Relaxation

TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm

Multiclass learning with margin: exponential rates with no bias-variance trade-off

Refined Convergence Rates for Maximum Likelihood Estimation under Finite Mixture Models

High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails

An Initial Alignment between Neural Network and Target is Needed for Gradient Descent to Learn

Inductive Biases and Variable Creation in Self-Attention Mechanisms

Topology-aware Generalization of Decentralized SGD

Understanding Gradient Descent on the Edge of Stability in Deep Learning

Cooperative Online Learning in Stochastic and Adversarial MDPs

Simple and near-optimal algorithms for hidden stratification and multi-group learning

Being Properly Improper

Neural Network Pruning Denoises the Features and Makes Local Connectivity Emerge in Visual Tasks

On the Finite-Time Complexity and Practical Computation of Approximate Stationarity Concepts of Lipschitz Functions

Nearly Optimal Policy Optimization with Stable at Any Time Guarantee

Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces

Minimax M-estimation under Adversarial Contamination

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

Efficiently Learning the Topology and Behavior of a Networked Dynamical System Via Active Queries

Boosting Graph Structure Learning with Dummy Nodes

Lazy Estimation of Variable Importance for Large Neural Networks

(ends 5:30 PM)

4 p.m.

WED 20 JUL

3:30 a.m.

Break:

Breakfast on your own

(ends 3:45 AM)

4 a.m.

Registration Check-in Desk

(ends 4:00 PM)

6 a.m.

Invited Talk:

Solving the Right Problems: Making ML Models Relevant to Healthcare and the Life Sciences

Regina Barzilay

(ends 7:00 AM)

7 a.m.

Coffee Break:

Coffee Break

(ends 7:30 AM)

7:30 a.m.

Deep Learning [7:30-9:00]

Spotlights 7:30-8:05

[7:30] Towards understanding how momentum improves generalization in deep learning

[7:35] What Can Linear Interpolation of Neural Network Loss Landscapes Tell Us?

[7:40] Deep equilibrium networks are sensitive to initialization statistics

[7:45] Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework

[7:50] Stability Based Generalization Bounds for Exponential Family Langevin Dynamics

[7:55] Local Augmentation for Graph Neural Networks

[8:00] On Non-local Convergence Analysis of Deep Linear Networks

Orals 8:05-8:25

[8:05] Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

Spotlights 8:25-9:00

[8:25] Diversified Adversarial Attacks based on Conjugate Gradient Method

[8:30] On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features

[8:35] On the Equivalence Between Temporal and Static Equivariant Graph Representations

[8:40] Robust Training under Label Noise by Over-parameterization

[8:45] Implicit Bias of the Step Size in Linear Diagonal Neural Networks

[8:50] Extended Unconstrained Features Model for Exploring Deep Neural Collapse

[8:55] Score-Guided Intermediate Level Optimization: Fast Langevin Mixing for Inverse Problems

(ends 9:00 AM)

MISC: General Machine Learning Techniques [7:30-9:00]

Spotlights 7:30-8:05

[7:30] Weisfeiler-Lehman Meets Gromov-Wasserstein

[7:35] GenLabel: Mixup Relabeling using Generative Models

[7:40] When and How Mixup Improves Calibration

[7:45] On Transportation of Mini-batches: A Hierarchical Approach

[7:50] VariGrow: Variational Architecture Growing for Task-Agnostic Continual Learning based on Bayesian Novelty

[7:55] Beyond Images: Label Noise Transition Matrix Estimation for Tasks with Lower-Quality Features

[8:00] A Model-Agnostic Randomized Learning Framework based on Random Hypothesis Subspace Sampling

Orals 8:05-8:25

[8:05] Stable Conformal Prediction Sets

Spotlights 8:25-9:00

[8:25] Rethinking Fano’s Inequality in Ensemble Learning

[8:30] FITNESS: (Fine Tune on New and Similar Samples) to detect anomalies in streams with drift and outliers

[8:35] Improving Mini-batch Optimal Transport via Partial Transportation

[8:40] Near-optimal rate of consistency for linear models with missing values

[8:45] Permutation Search of Tensor Network Structures via Local Sampling

[8:50] Revisiting Label Smoothing and Knowledge Distillation Compatibility: What was Missing?

[8:55] DNNR: Differential Nearest Neighbors Regression

(ends 9:00 AM)

T: Learning Theory/Domain Adaptation [7:30-9:00]

Spotlights 7:30-8:00

[7:30] Learning Domain Adaptive Object Detection with Probabilistic Teacher

[7:35] Adaptive Data Analysis with Correlated Observations

[7:40] Efficient PAC Learning from the Crowd with Pairwise Comparisons

[7:45] On the Statistical Benefits of Curriculum Learning

[7:50] Feature and Parameter Selection in Stochastic Linear Bandits

[7:55] Disentangled Federated Learning for Tackling Attributes Skew via Invariant Aggregation and Diversity Transferring

Orals 8:00-8:20

[8:00] A new similarity measure for covariate shift with applications to nonparametric regression

Spotlights 8:20-9:00

[8:20] Contextual Bandits with Large Action Spaces: Made Practical

[8:25] Identifiability Conditions for Domain Adaptation

[8:30] Streaming Algorithms for High-Dimensional Robust Statistics

[8:35] Popular decision tree algorithms are provably noise tolerant

[8:40] Understanding and Improving Knowledge Graph Embedding for Entity Alignment

[8:45] Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

[8:50] Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees

[8:55] Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond

(ends 9:00 AM)

Applications [7:30-9:00]

Spotlights 7:30-8:05

[7:30] Skin Deep Unlearning: Artefact and Instrument Debiasing in the Context of Melanoma Classification

[7:35] One-Pass Diversified Sampling with Application to Terabyte-Scale Genomic Sequence Streams

[7:40] Unsupervised Flow-Aligned Sequence-to-Sequence Learning for Video Restoration

[7:45] ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases

[7:50] Variational Mixtures of ODEs for Inferring Cellular Gene Expression Dynamics

[7:55] Bayesian Imitation Learning for End-to-End Mobile Manipulation

[8:00] De novo mass spectrometry peptide sequencing with a transformer model

Orals 8:05-8:25

[8:05] Learning inverse folding from millions of predicted structures

Spotlights 8:25-9:00

[8:25] Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance

[8:30] MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection

[8:35] Proximal Exploration for Model-guided Protein Sequence Design

[8:40] Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval

[8:45] How to Fill the Optimum Set? Population Gradient Descent with Harmless Diversity

[8:50] Examining Scaling and Transfer of Language Model Architectures for Machine Translation

[8:55] State Transition of Dendritic Spines Improves Learning of Sparse Spiking Neural Networks

(ends 9:00 AM)

PM: Variational Inference/Bayesian Models and Methods [7:30-9:00]

Orals 7:30-7:50

[7:30] How Tempering Fixes Data Augmentation in Bayesian Neural Networks

Spotlights 7:50-8:15

[7:50] Surrogate Likelihoods for Variational Annealed Importance Sampling

[7:55] Nonparametric Sparse Tensor Factorization with Hierarchical Gamma Processes

[8:00] Fat–Tailed Variational Inference with Anisotropic Tail Adaptive Flows

[8:05] Variational Sparse Coding with Learned Thresholding

[8:10] Structured Stochastic Gradient MCMC

Orals 8:15-8:35

[8:15] BAMDT: Bayesian Additive Semi-Multivariate Decision Trees for Nonparametric Regression

Spotlights 8:35-8:50

[8:35] Variational Inference with Locally Enhanced Bounds for Hierarchical Models

[8:40] Centroid Approximation for Bootstrap: Improving Particle Quality at Inference

[8:45] Deep Reference Priors: What is the best way to pretrain a model?

(ends 9:00 AM)

Reinforcement Learning: Deep RL [7:30-9:00]

Spotlights 7:30-8:05

[7:30] Modeling Strong and Human-Like Gameplay with KL-Regularized Search

[7:35] Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters

[7:40] Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning

[7:45] Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models and Amortized Policy Search

[7:50] Generalized Data Distribution Iteration

[7:55] Optimizing Tensor Network Contraction Using Reinforcement Learning

[8:00] History Compression via Language Models in Reinforcement Learning

Orals 8:05-8:25

[8:05] REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer

Spotlights 8:25-9:00

[8:25] LeNSE: Learning To Navigate Subgraph Embeddings for Large-Scale Combinatorial Optimisation

[8:30] Efficient Learning for AlphaZero via Path Consistency

[8:35] A data-driven approach for learning to control computers

[8:40] Zero-Shot Reward Specification via Grounded Natural Language

[8:45] How to Stay Curious while avoiding Noisy TVs using Aleatoric Uncertainty Estimation

[8:50] Model-Value Inconsistency as a Signal for Epistemic Uncertainty

[8:55] Improving Policy Optimization with Generalist-Specialist Learning

(ends 9:00 AM)

DL: Theory [7:30-9:00]

Spotlights 7:30-8:05

[7:30] On Numerical Integration in Neural Ordinary Differential Equations

[7:35] Reverse Engineering the Neural Tangent Kernel

[7:40] Principled Knowledge Extrapolation with GANs

[7:45] Informed Learning by Wide Neural Networks: Convergence, Generalization and Sampling Complexity

[7:50] Data Augmentation as Feature Manipulation

[7:55] Convolutional and Residual Networks Provably Contain Lottery Tickets

[8:00] Feature Learning and Signal Propagation in Deep Neural Networks

Orals 8:05-8:25

[8:05] Robust Training of Neural Networks Using Scale Invariant Architectures

Spotlights 8:25-9:00

[8:25] Understanding Contrastive Learning Requires Incorporating Inductive Biases

[8:30] Implicit Regularization with Polynomial Growth in Deep Tensor Factorization

[8:35] Deep Network Approximation in Terms of Intrinsic Parameters

[8:40] Coin Flipping Neural Networks

[8:45] Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

[8:50] More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize

[8:55] SE(3) Equivariant Graph Neural Networks with Complete Local Frames

(ends 9:00 AM)

Social Aspects/MISC [7:30-9:00]

Spotlights 7:30-8:05

[7:30] Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings

[7:35] Label-Free Explainability for Unsupervised Models

[7:40] Towards Theoretical Analysis of Transformation Complexity of ReLU DNNs

[7:45] A Study of Face Obfuscation in ImageNet

[7:50] Fair Representation Learning through Implicit Path Alignment

[7:55] Mitigating Neural Network Overconfidence with Logit Normalization

[8:00] Learning fair representation with a parametric integral probability metric

Orals 8:05-8:25

[8:05] Privacy for Free: How does Dataset Condensation Help Privacy?

Spotlights 8:25-9:00

[8:25] Fair Generalized Linear Models with a Convex Penalty

[8:30] HyperPrompt: Prompt-based Task-Conditioning of Transformers

[8:35] Validating Causal Inference Methods

[8:40] The Multivariate Community Hawkes Model for Dependent Relational Events in Continuous-time Networks

[8:45] Scalable Deep Gaussian Markov Random Fields for General Graphs

[8:50] Anytime Information Cascade Popularity Prediction via Self-Exciting Processes

[8:55] Deep Variational Graph Convolutional Recurrent Network for Multivariate Time Series Anomaly Detection

(ends 9:00 AM)

OPT: First Order [7:30-9:00]

Orals 7:30-7:50

[7:30] Adapting to Mixing Time in Stochastic Optimization with Markovian Data

Spotlights 7:50-8:15

[7:50] Fast Composite Optimization and Statistical Recovery in Federated Learning

[7:55] Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity

[8:00] Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning

[8:05] Optimal Algorithms for Stochastic Multi-Level Compositional Optimization

[8:10] Finite-Sum Coupled Compositional Stochastic Optimization: Theory and Applications

Orals 8:15-8:35

[8:15] Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

Spotlights 8:35-9:00

[8:35] Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert

[8:40] ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

[8:45] Communication-Efficient Adaptive Federated Learning

[8:50] RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

[8:55] Kill a Bird with Two Stones: Closing the Convergence Gaps in Non-Strongly Convex Optimization by Directly Accelerated SVRG with Double Compensation and Snapshots

(ends 9:00 AM)

T: Game Theory/RL/Planning [7:30-9:00]

Orals 7:30-7:50

[7:30] A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes

Spotlights 7:50-8:10

[7:50] The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces

[7:55] Extracting Latent State Representations with Linear Dynamics from Rich Observations

[8:00] For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

[8:05] Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures

Orals 8:10-8:30

[8:10] Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits

Spotlights 8:30-8:55

[8:30] Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses

[8:35] Learning to Infer Structures of Network Games

[8:40] Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

[8:45] Near-Optimal Learning of Extensive-Form Games with Imperfect Information

[8:50] Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

(ends 9:00 AM)

8 a.m.

9 a.m.

Break:

Lunch Break On Your Own

(ends 10:30 AM)

10:15 a.m.

Deep Learning/APP:Computer Vision [10:15-11:45]

Spotlights 10:15-10:50

[10:15] From data to functa: Your data point is a function and you can treat it like one

[10:20] DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training

[10:25] Differentiable Top-k Classification Learning

[10:30] Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks

[10:35] Characterizing and Overcoming the Greedy Nature of Learning in Multi-modal Deep Neural Networks

[10:40] Training Your Sparse Neural Network Better with Any Mask

[10:45] Federated Learning with Positive and Unlabeled Data

Orals 10:50-11:10

[10:50] Generating 3D Molecules for Target Protein Binding

Spotlights 11:10-11:45

[11:10] Sparse Double Descent: Where Network Pruning Aggravates Overfitting

[11:15] Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs

[11:20] Revisiting Consistency Regularization for Deep Partial Label Learning

[11:25] Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification

[11:30] A Unified Weight Initialization Paradigm for Tensorial Convolutional Neural Networks

[11:35] PLATINUM: Semi-Supervised Model Agnostic Meta-Learning using Submodular Mutual Information

[11:40] Multicoated Supermasks Enhance Hidden Networks

(ends 11:45 AM)

Theory [10:15-11:45]

Spotlights 10:15-10:50

[10:15] Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits

[10:20] On the Finite-Time Performance of the Knowledge Gradient Algorithm

[10:25] Expression might be enough: representing pressure and demand for reinforcement learning based traffic signal control

[10:30] Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers

[10:35] No-Regret Learning in Time-Varying Zero-Sum Games

[10:40] Achieving Minimax Rates in Pool-Based Batch Active Learning

[10:45] Active Multi-Task Representation Learning

Orals 10:50-11:10

[10:50] Active fairness auditing

Spotlights 11:10-11:45

[11:10] Metric-Fair Active Learning

[11:15] Metric-Fair Classifier Derandomization

[11:20] Interactively Learning Preference Constraints in Linear Bandits

[11:25] Convergence of Uncertainty Sampling for Active Learning

[11:30] Thompson Sampling for Robust Transfer in Multi-Task Bandits

[11:35] Constants Matter: The Performance Gains of Active Learning

[11:40] Cross-Space Active Learning on Graph Convolutional Networks

(ends 11:45 AM)

APP: Chemistry and Drug Discovery [10:15-11:45]

Spotlights 10:15-10:45

[10:15] MemSR: Training Memory-efficient Lightweight Model for Image Super-Resolution

[10:20] PINs: Progressive Implicit Networks for Multi-Scale Neural Representations

[10:25] Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

[10:30] Generative Coarse-Graining of Molecular Conformations

[10:35] LIMO: Latent Inceptionism for Targeted Molecule Generation

[10:40] Learning to Separate Voices by Spatial Regions

Orals 10:45-11:05

[10:45] 3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design

Spotlights 11:05-11:40

[11:05] 3D Infomax improves GNNs for Molecular Property Prediction

[11:10] Biological Sequence Design with GFlowNets

[11:15] Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets

[11:20] Retroformer: Pushing the Limits of End-to-end Retrosynthesis Transformer

[11:25] Constrained Optimization with Dynamic Bound-scaling for Effective NLP Backdoor Defense

[11:30] Path-Aware and Structure-Preserving Generation of Synthetically Accessible Molecules

[11:35] EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

(ends 11:45 AM)

MISC: Representation Learning/Causality [10:15-11:45]

Spotlights 10:15-10:45

[10:15] Decomposing Temporal High-Order Interactions via Latent ODEs

[10:20] Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets

[10:25] DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck

[10:30] End-to-End Balancing for Causal Continuous Treatment-Effect Estimation

[10:35] Role-based Multiplex Network Embedding

[10:40] Measure Estimation in the Barycentric Coding Model

Orals 10:45-11:05

[10:45] RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests

Spotlights 11:05-11:35

[11:05] Counterfactual Transportability: A Formal Approach

[11:10] Identification of Linear Non-Gaussian Latent Hierarchical Structure

[11:15] COAT: Measuring Object Compositionality in Emergent Representations

[11:20] Generalization and Robustness Implications in Object-Centric Learning

[11:25] NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning

[11:30] Action-Sufficient State Representation Learning for Control with Structural Constraints

(ends 11:45 AM)

PM: Bayesian Models and Methods [10:15-11:45]

Orals 10:15-10:35

[10:15] Bayesian Continuous-Time Tucker Decomposition

Spotlights 10:35-11:00

[10:35] Approximate Bayesian Computation with Domain Expert in the Loop

[10:40] Discrete Probabilistic Inverse Optimal Transport

[10:45] Easy Variational Inference for Categorical Models via an Independent Binary Approximation

[10:50] Streaming Inference for Infinite Feature Models

[10:55] Optimizing Sequential Experimental Design with Deep Reinforcement Learning

Orals 11:00-11:20

[11:00] Function-space Inference with Sparse Implicit Processes

Spotlights 11:20-11:45

[11:20] Variational Inference for Infinitely Deep Neural Networks

[11:25] Personalized Federated Learning via Variational Bayesian Inference

[11:30] Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

[11:35] Bayesian Deep Embedding Topic Meta-Learner

[11:40] Efficient Approximate Inference for Stationary Kernel on Frequency Domain

(ends 11:45 AM)

Reinforcement Learning [10:15-11:45]

Spotlights 10:15-10:45

[10:15] Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning

[10:20] Analysis of Stochastic Processes through Replay Buffers

[10:25] Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning

[10:30] Communicating via Markov Decision Processes

[10:35] PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation

[10:40] DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning

Orals 10:45-11:05

[10:45] Planning with Diffusion for Flexible Behavior Synthesis

Spotlights 11:05-11:40

[11:05] A Temporal-Difference Approach to Policy Gradient Estimation

[11:10] MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer

[11:15] Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency

[11:20] Actor-Critic based Improper Reinforcement Learning

[11:25] On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs

[11:30] The Geometry of Robust Value Functions

[11:35] Denoised MDPs: Learning World Models Better Than the World Itself

(ends 11:45 AM)

SA: Trustworthy Machine Learning [10:15-11:45]

Orals 10:15-10:35

[10:15] Tight and Robust Private Mean Estimation with Few Users

Spotlights 10:35-11:00

[10:35] QSFL: A Two-Level Uplink Communication Optimization Framework for Federated Learning

[10:40] Robustness and Accuracy Could Be Reconcilable by (Proper) Definition

[10:45] Sanity Simulations for Saliency Methods

[10:50] Out-of-Distribution Detection with Deep Nearest Neighbors

[10:55] Differentially Private Maximal Information Coefficients

Orals 11:00-11:20

[11:00] Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data

Spotlights 11:20-11:45

[11:20] On the Difficulty of Defending Self-Supervised Learning against Model Extraction

[11:25] Adversarial Attack and Defense for Non-Parametric Two-Sample Tests

[11:30] Certified Adversarial Robustness Under the Bounded Support Set

[11:35] Predicting Out-of-Distribution Error with the Projection Norm

[11:40] Adversarially Robust Models may not Transfer Better: Sufficient Conditions for Domain Transferability from the View of Regularization

(ends 11:45 AM)

DL: Robustness [10:15-11:45]

Spotlights 10:15-10:50

[10:15] Generating Distributional Adversarial Examples to Evade Statistical Detectors

[10:20] Improving Out-of-Distribution Robustness via Selective Augmentation

[10:25] Modeling Adversarial Noise for Adversarial Training

[10:30] Improving Adversarial Robustness via Mutual Information Estimation

[10:35] FOCUS: Familiar Objects in Common and Uncommon Settings

[10:40] Query-Efficient and Scalable Black-Box Adversarial Attacks on Discrete Sequential Data via Bayesian Optimization

[10:45] Test-Time Training Can Close the Natural Distribution Shift Performance Gap in Deep Learning Based Compressed Sensing

Orals 10:50-11:10

[10:50] A Dynamical System Perspective for Lipschitz Neural Networks

Spotlights 11:10-11:45

[11:10] Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)

[11:15] Neurotoxin: Durable Backdoors in Federated Learning

[11:20] Bayesian Learning with Information Gain Provably Bounds Risk for a Robust Adversarial Defense

[11:25] Maximum Likelihood Training for Score-based Diffusion ODEs by High Order Denoising Score Matching

[11:30] Fast Lossless Neural Compression with Integer-Only Discrete Flows

[11:35] SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

[11:40] SCHA-VAE: Hierarchical Context Aggregation for Few-Shot Generation

(ends 11:45 AM)

T: Online Learning and Bandits/Learning Theory [10:15-11:45]

Orals 10:15-10:35

[10:15] Generative Trees: Adversarial and Copycat

Spotlights 10:35-11:00

[10:35] A Resilient Distributed Boosting Algorithm

[10:40] Online Learning and Pricing with Reusable Resources: Linear Bandits with Sub-Exponential Rewards

[10:45] On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation

[10:50] Congested Bandits: Optimal Routing via Short-term Resets

[10:55] Stochastic Rising Bandits

Orals 11:00-11:20

[11:00] Agnostic Learnability of Halfspaces via Logistic Loss

Spotlights 11:20-11:45

[11:20] Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension

[11:25] PDE-Based Optimal Strategy for Unconstrained Online Learning

[11:30] Provable Acceleration of Heavy Ball beyond Quadratics for a Class of Polyak-Lojasiewicz Functions when the Non-Convexity is Averaged-Out

[11:35] On Learning Mixture of Linear Regressions in the Non-Realizable Setting

[11:40] Random Forest Density Estimation

(ends 11:45 AM)

11:45 a.m.

Break:

Coffee Break

(ends 12:15 PM)

12:15 p.m.

Invited Talk:

Synthetic Control Methods and Difference-In-Differences

Guido Imbens

(ends 1:15 PM)

1 p.m.

1:15 p.m.

Break:

Short Break

(ends 1:30 PM)

1:30 p.m.

Deep Learning [1:30-3:00]

Spotlights 1:30-2:05

[1:30] $p$-Laplacian Based Graph Neural Networks

[1:35] Equivariant Quantum Graph Circuits

[1:40] A Theoretical Comparison of Graph Neural Network Extensions

[1:45] Variational On-the-Fly Personalization

[1:50] Deep symbolic regression for recurrence prediction

[1:55] Geometric Multimodal Contrastive Representation Learning

[2:00] Universality of Winning Tickets: A Renormalization Group Perspective

Orals 2:05-2:25

[2:05] Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition

Spotlights 2:25-3:00

[2:25] Loss Function Learning for Domain Generalization by Implicit Gradient

[2:30] GraphFM: Improving Large-Scale GNN Training via Feature Momentum

[2:35] Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling

[2:40] A Differential Entropy Estimator for Training Neural Networks

[2:45] Scaling Out-of-Distribution Detection for Real-World Settings

[2:50] Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations

[2:55] SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators

(ends 3:00 PM)

Theory [1:30-3:00]

Spotlights 1:30-2:05

[1:30] The dynamics of representation learning in shallow, non-linear autoencoders

[1:35] Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

[1:40] Estimation in Rotationally Invariant Generalized Linear Models via Approximate Message Passing

[1:45] Failure and success of the spectral bias prediction for Laplace Kernel Ridge Regression: the case of low-dimensional data

[1:50] Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

[1:55] Universal Joint Approximation of Manifolds and Densities by Simple Injective Flows

[2:00] Bounding the Width of Neural Networks via Coupled Initialization - A Worst Case Analysis

Orals 2:05-2:25

[2:05] Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression

Spotlights 2:25-3:00

[2:25] The Neural Race Reduction: Dynamics of Abstraction in Gated Networks

[2:30] Efficient Learning of CNNs using Patch Based Features

[2:35] Neural Tangent Kernel Analysis of Deep Narrow Neural Networks

[2:40] Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

[2:45] Fully-Connected Network on Noncompact Symmetric Space and Ridgelet Transform based on Helgason-Fourier Analysis

[2:50] Non-Vacuous Generalisation Bounds for Shallow Neural Networks

[2:55] Maslow's Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation

(ends 3:00 PM)

Applications [1:30-3:00]

Spotlights 1:30-2:05

[1:30] SoQal: Selective Oracle Questioning for Consistency Based Active Learning of Cardiac Signals

[1:35] Matching Structure for Dual Learning

[1:40] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

[1:45] YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone

[1:50] Inducing Causal Structure for Interpretable Neural Networks

[1:55] SDQ: Stochastic Differentiable Quantization with Mixed Precision

[2:00] IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages

Orals 2:05-2:25

[2:05] Re-evaluating Word Mover's Distance

Spotlights 2:25-3:00

[2:25] Translatotron 2: High-quality direct speech-to-speech translation with voice preservation

[2:30] Robust alignment of cross-session recordings of neural population activity by behaviour via unsupervised domain adaptation

[2:35] Symmetric Machine Theory of Mind

[2:40] PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance

[2:45] LCANets: Lateral Competition Improves Robustness Against Corruption and Attack

[2:50] Reconstructing Nonlinear Dynamical Systems from Multi-Modal Time Series

[2:55] Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps

(ends 3:00 PM)

Reinforcement Learning [1:30-3:05]

Spotlights 1:30-2:05

[1:30] Greedy based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning

[1:35] Bayesian Nonparametrics for Offline Skill Discovery

[1:40] Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime

[1:45] Curriculum Reinforcement Learning via Constrained Optimal Transport

[1:50] Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs

[1:55] Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning

[2:00] Constrained Offline Policy Optimization

Orals 2:05-2:25

[2:05] Causal Dynamics Learning for Task-Independent State Abstraction

Spotlights 2:25-3:05

[2:25] Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity

[2:30] Reinforcement Learning with Action-Free Pre-Training from Videos

[2:35] Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods

[2:40] Delayed Reinforcement Learning by Imitation

[2:45] Reachability Constrained Reinforcement Learning

[2:50] Adaptive Model Design for Markov Decision Process

[2:55] Goal Misgeneralization in Deep Reinforcement Learning

[3:00] Translating Robot Skills: Learning Unsupervised Skill Correspondences Across Robots

(ends 3:05 PM)

DL: Algorithms [1:30-3:00]

Spotlights 1:30-2:05

[1:30] The Infinite Contextual Graph Markov Model

[1:35] RankSim: Ranking Similarity Regularization for Deep Imbalanced Regression

[1:40] Detached Error Feedback for Distributed SGD with Random Sparsification

[1:45] Training OOD Detectors in their Natural Habitats

[1:50] Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks

[1:55] Neural Tangent Kernel Empowered Federated Learning

[2:00] Probabilistically Robust Learning: Balancing Average- and Worst-case Performance

Orals 2:05-2:25

[2:05] Adversarially trained neural representations are already as robust as biological neural representations

Spotlights 2:25-3:00

[2:25] Feature Space Particle Inference for Neural Network Ensembles

[2:30] A Study on the Ramanujan Graph Property of Winning Lottery Tickets

[2:35] PAC-Net: A Model Pruning Approach to Inductive Transfer Learning

[2:40] EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning

[2:45] Fisher SAM: Information Geometry and Sharpness Aware Minimisation

[2:50] Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry

[2:55] Towards Understanding Sharpness-Aware Minimization

(ends 3:00 PM)

SA: Privacy-preserving Statistics and Machine Learning [1:30-3:00]

Spotlights 1:30-2:05

[1:30] Improved Regret for Differentially Private Exploration in Linear MDP

[1:35] Differentially Private Community Detection for Stochastic Block Models

[1:40] Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy

[1:45] Hermite Polynomial Features for Private Data Generation

[1:50] How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection

[1:55] Deduplicating Training Data Mitigates Privacy Risks in Language Models

[2:00] Private frequency estimation via projective geometry

Orals 2:05-2:25

[2:05] The Poisson Binomial Mechanism for Unbiased Federated Learning with Secure Aggregation

Spotlights 2:25-3:00

[2:25] Faster Privacy Accounting via Evolving Discretization

[2:30] The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning

[2:35] Private Adaptive Optimization with Side information

[2:40] Secure Quantized Training for Deep Learning

[2:45] Private optimization in the interpolation regime: faster rates and hardness results

[2:50] Differentially Private Coordinate Descent for Composite Empirical Risk Minimization

[2:55] Private Streaming SCO in $\ell_p$ geometry with Applications in High Dimensional Online Decision Making

(ends 3:00 PM)

Deep Learning/Optimization [1:30-3:00]

Spotlights 1:30-2:05

[1:30] Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization

[1:35] Implicit Bias of Linear Equivariant Networks

[1:40] The State of Sparse Training in Deep Reinforcement Learning

[1:45] Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets

[1:50] Datamodels: Understanding Predictions with Data and Data with Predictions

[1:55] Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization

[2:00] Deep Causal Metric Learning

Orals 2:05-2:25

[2:05] Not All Poisons are Created Equal: Robust Training against Data Poisoning

Spotlights 2:25-3:00

[2:25] Learning Symmetric Embeddings for Equivariant World Models

[2:30] Accelerated Federated Learning with Decoupled Adaptive Optimization

[2:35] Byzantine Machine Learning Made Easy By Resilient Averaging of Momentums

[2:40] TSPipe: Learn from Teacher Faster with Pipelines

[2:45] Personalized Federated Learning through Local Memorization

[2:50] Three-stage Evolution and Fast Equilibrium for SGD with Non-degerate Critical Points

[2:55] Optimization-Derived Learning with Essential Convergence Analysis of Training and Hyper-training

(ends 3:00 PM)

Miscellaneous Aspects of Machine Learning/Reinforcement Learning [1:30-3:00]

Spotlights 1:30-2:05

[1:30] Gradient Descent on Neurons and its Link to Approximate Second-order Optimization

[1:35] A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources

[1:40] Efficient Online ML API Selection for Multi-Label Classification Tasks

[1:45] Entropic Causal Inference: Graph Identifiability

[1:50] Architecture Agnostic Federated Learning for Neural Networks

[1:55] Conformal Prediction Sets with Limited False Positives

[2:00] Scalable Computation of Causal Bounds

Orals 2:05-2:25

[2:05] LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood

Spotlights 2:25-3:00

[2:25] Learning Pseudometric-based Action Representations for Offline Reinforcement Learning

[2:30] A Statistical Manifold Framework for Point Cloud Data

[2:35] HyperImpute: Generalized Iterative Imputation with Automatic Model Selection

[2:40] A Natural Actor-Critic Framework for Zero-Sum Markov Games

[2:45] Distributionally Robust $Q$-Learning

[2:50] Sparsity in Partially Controllable Linear Systems

[2:55] Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

(ends 3:00 PM)

Deep Learning/MISC [1:30-3:00]

Spotlights 1:30-2:00

[1:30] A New Perspective on the Effects of Spectrum in Graph Neural Networks

[1:35] Molecular Representation Learning via Heterogeneous Motif Graph Neural Networks

[1:40] Partial Label Learning via Label Influence Function

[1:45] Minimax Classification under Concept Drift with Multidimensional Adaptation and Performance Guarantees

[1:50] Understanding Robust Overfitting of Adversarial Training and Beyond

[1:55] A Random Matrix Analysis of Data Stream Clustering: Coping With Limited Memory Resources

Orals 2:00-2:20

[2:00] Hierarchical Shrinkage: Improving the accuracy and interpretability of tree-based models.

Spotlights 2:20-2:55

[2:20] Supervised Learning with General Risk Functionals

[2:25] Locally Sparse Neural Networks for Tabular Biomedical Data

[2:30] Dual Perspective of Label-Specific Feature Learning for Multi-Label Classification

[2:35] Detecting Corrupted Labels Without Training a Model to Predict

[2:40] Prototype-Anchored Learning for Learning with Imperfect Annotations

[2:45] Learning to Predict Graphs with Fused Gromov-Wasserstein Barycenters

[2:50] Deep Safe Incomplete Multi-view Clustering: Theorem and Algorithm

(ends 3:00 PM)

3:30 p.m.

Poster Session 2 [3:30-5:30]

Posters 3:30-5:30

Skin Deep Unlearning: Artefact and Instrument Debiasing in the Context of Melanoma Classification

One-Pass Diversified Sampling with Application to Terabyte-Scale Genomic Sequence Streams

Unsupervised Flow-Aligned Sequence-to-Sequence Learning for Video Restoration

ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases

Variational Mixtures of ODEs for Inferring Cellular Gene Expression Dynamics

Bayesian Imitation Learning for End-to-End Mobile Manipulation

De novo mass spectrometry peptide sequencing with a transformer model

Learning inverse folding from millions of predicted structures

Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance

MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection

Proximal Exploration for Model-guided Protein Sequence Design

Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval

How to Fill the Optimum Set? Population Gradient Descent with Harmless Diversity

Examining Scaling and Transfer of Language Model Architectures for Machine Translation

State Transition of Dendritic Spines Improves Learning of Sparse Spiking Neural Networks

MemSR: Training Memory-efficient Lightweight Model for Image Super-Resolution

PINs: Progressive Implicit Networks for Multi-Scale Neural Representations

Translating Robot Skills: Learning Unsupervised Skill Correspondences Across Robots

ROCK: Causal Inference Principles for Reasoning about Commonsense Causality

Generative Coarse-Graining of Molecular Conformations

LIMO: Latent Inceptionism for Targeted Molecule Generation

Learning to Separate Voices by Spatial Regions

3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design

3D Infomax improves GNNs for Molecular Property Prediction

Biological Sequence Design with GFlowNets

Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets

Retroformer: Pushing the Limits of End-to-end Retrosynthesis Transformer

Constrained Optimization with Dynamic Bound-scaling for Effective NLP Backdoor Defense

Path-Aware and Structure-Preserving Generation of Synthetically Accessible Molecules

EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

SoQal: Selective Oracle Questioning for Consistency Based Active Learning of Cardiac Signals

Matching Structure for Dual Learning

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone

Inducing Causal Structure for Interpretable Neural Networks

SDQ: Stochastic Differentiable Quantization with Mixed Precision

IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages

Re-evaluating Word Mover's Distance

Translatotron 2: High-quality direct speech-to-speech translation with voice preservation

Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness

Robust alignment of cross-session recordings of neural population activity by behaviour via unsupervised domain adaptation

Symmetric Machine Theory of Mind

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance

LCANets: Lateral Competition Improves Robustness Against Corruption and Attack

Reconstructing Nonlinear Dynamical Systems from Multi-Modal Time Series

Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps

Towards understanding how momentum improves generalization in deep learning

What Can Linear Interpolation of Neural Network Loss Landscapes Tell Us?

Deep equilibrium networks are sensitive to initialization statistics

Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework

Stability Based Generalization Bounds for Exponential Family Langevin Dynamics

Local Augmentation for Graph Neural Networks

On Non-local Convergence Analysis of Deep Linear Networks

Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

Diversified Adversarial Attacks based on Conjugate Gradient Method

On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features

On the Equivalence Between Temporal and Static Equivariant Graph Representations

Robust Training under Label Noise by Over-parameterization

Implicit Bias of the Step Size in Linear Diagonal Neural Networks

Extended Unconstrained Features Model for Exploring Deep Neural Collapse

Score-Guided Intermediate Level Optimization: Fast Langevin Mixing for Inverse Problems

On Numerical Integration in Neural Ordinary Differential Equations

Reverse Engineering the Neural Tangent Kernel

Principled Knowledge Extrapolation with GANs

Informed Learning by Wide Neural Networks: Convergence, Generalization and Sampling Complexity

Data Augmentation as Feature Manipulation

Convolutional and Residual Networks Provably Contain Lottery Tickets

Feature Learning and Signal Propagation in Deep Neural Networks

Robust Training of Neural Networks Using Scale Invariant Architectures

Understanding Contrastive Learning Requires Incorporating Inductive Biases

Implicit Regularization with Polynomial Growth in Deep Tensor Factorization

Deep Network Approximation in Terms of Intrinsic Parameters

Coin Flipping Neural Networks

Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize

SE(3) Equivariant Graph Neural Networks with Complete Local Frames

From data to functa: Your data point is a function and you can treat it like one

DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training

Differentiable Top-k Classification Learning

Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks

Characterizing and Overcoming the Greedy Nature of Learning in Multi-modal Deep Neural Networks

Training Your Sparse Neural Network Better with Any Mask

Federated Learning with Positive and Unlabeled Data

Generating 3D Molecules for Target Protein Binding

Sparse Double Descent: Where Network Pruning Aggravates Overfitting

Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs

Revisiting Consistency Regularization for Deep Partial Label Learning

Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification

A Unified Weight Initialization Paradigm for Tensorial Convolutional Neural Networks

PLATINUM: Semi-Supervised Model Agnostic Meta-Learning using Submodular Mutual Information

Multicoated Supermasks Enhance Hidden Networks

Generating Distributional Adversarial Examples to Evade Statistical Detectors

Improving Out-of-Distribution Robustness via Selective Augmentation

Modeling Adversarial Noise for Adversarial Training

Improving Adversarial Robustness via Mutual Information Estimation

FOCUS: Familiar Objects in Common and Uncommon Settings

Query-Efficient and Scalable Black-Box Adversarial Attacks on Discrete Sequential Data via Bayesian Optimization

Test-Time Training Can Close the Natural Distribution Shift Performance Gap in Deep Learning Based Compressed Sensing

A Dynamical System Perspective for Lipschitz Neural Networks

Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)

Neurotoxin: Durable Backdoors in Federated Learning

Bayesian Learning with Information Gain Provably Bounds Risk for a Robust Adversarial Defense

Maximum Likelihood Training for Score-based Diffusion ODEs by High Order Denoising Score Matching

Fast Lossless Neural Compression with Integer-Only Discrete Flows

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

SCHA-VAE: Hierarchical Context Aggregation for Few-Shot Generation

DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning

Unsupervised Time-Series Representation Learning with Iterative Bilinear Temporal-Spectral Fusion

RetrievalGuard: Provably Robust 1-Nearest Neighbor Image Retrieval

Modeling Structure with Undirected Neural Networks

Certified Neural Network Watermarks with Randomized Smoothing

Improved Certified Defenses against Data Poisoning with (Deterministic) Finite Aggregation

Adversarial Vulnerability of Randomized Ensembles

Robustness Verification for Contrastive Learning

The CLRS Algorithmic Reasoning Benchmark

Finding Global Homophily in Graph Neural Networks When Meeting Heterophily

Understanding Robust Generalization in Learning Regular Languages

Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification

AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail Problems

A Modern Self-Referential Weight Matrix That Learns to Modify Itself

Short-Term Plasticity Neurons Learning to Learn and Forget

$p$-Laplacian Based Graph Neural Networks

Equivariant Quantum Graph Circuits

A Theoretical Comparison of Graph Neural Network Extensions

Variational On-the-Fly Personalization

Deep symbolic regression for recurrence prediction

Geometric Multimodal Contrastive Representation Learning

Universality of Winning Tickets: A Renormalization Group Perspective

Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition

Loss Function Learning for Domain Generalization by Implicit Gradient

GraphFM: Improving Large-Scale GNN Training via Feature Momentum

Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling

A Differential Entropy Estimator for Training Neural Networks

Scaling Out-of-Distribution Detection for Real-World Settings

Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations

SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators

The Infinite Contextual Graph Markov Model

RankSim: Ranking Similarity Regularization for Deep Imbalanced Regression

Detached Error Feedback for Distributed SGD with Random Sparsification

Training OOD Detectors in their Natural Habitats

Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks

Neural Tangent Kernel Empowered Federated Learning

Probabilistically Robust Learning: Balancing Average- and Worst-case Performance

Adversarially trained neural representations are already as robust as biological neural representations

Feature Space Particle Inference for Neural Network Ensembles

A Study on the Ramanujan Graph Property of Winning Lottery Tickets

PAC-Net: A Model Pruning Approach to Inductive Transfer Learning

EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning

Fisher SAM: Information Geometry and Sharpness Aware Minimisation

Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry

Towards Understanding Sharpness-Aware Minimization

Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization

Implicit Bias of Linear Equivariant Networks

The State of Sparse Training in Deep Reinforcement Learning

Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets

Datamodels: Understanding Predictions with Data and Data with Predictions

Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization

Deep Causal Metric Learning

Not All Poisons are Created Equal: Robust Training against Data Poisoning

Learning Symmetric Embeddings for Equivariant World Models

NISPA: Neuro-Inspired Stability-Plasticity Adaptation for Continual Learning in Sparse Networks

Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm

Auxiliary Learning with Joint Task and Data Scheduling

Large-scale Stochastic Optimization of NDCG Surrogates for Deep Learning with Provable Convergence

Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers

A New Perspective on the Effects of Spectrum in Graph Neural Networks

Molecular Representation Learning via Heterogeneous Motif Graph Neural Networks

Weisfeiler-Lehman Meets Gromov-Wasserstein

GenLabel: Mixup Relabeling using Generative Models

When and How Mixup Improves Calibration

On Transportation of Mini-batches: A Hierarchical Approach

VariGrow: Variational Architecture Growing for Task-Agnostic Continual Learning based on Bayesian Novelty

Beyond Images: Label Noise Transition Matrix Estimation for Tasks with Lower-Quality Features

A Model-Agnostic Randomized Learning Framework based on Random Hypothesis Subspace Sampling

Stable Conformal Prediction Sets

Rethinking Fano’s Inequality in Ensemble Learning

FITNESS: (Fine Tune on New and Similar Samples) to detect anomalies in streams with drift and outliers

Improving Mini-batch Optimal Transport via Partial Transportation

Near-optimal rate of consistency for linear models with missing values

Permutation Search of Tensor Network Structures via Local Sampling

Revisiting Label Smoothing and Knowledge Distillation Compatibility: What was Missing?

DNNR: Differential Nearest Neighbors Regression

HyperPrompt: Prompt-based Task-Conditioning of Transformers

Validating Causal Inference Methods

The Multivariate Community Hawkes Model for Dependent Relational Events in Continuous-time Networks

Scalable Deep Gaussian Markov Random Fields for General Graphs

Anytime Information Cascade Popularity Prediction via Self-Exciting Processes

Deep Variational Graph Convolutional Recurrent Network for Multivariate Time Series Anomaly Detection

Decomposing Temporal High-Order Interactions via Latent ODEs

Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets

DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck

End-to-End Balancing for Causal Continuous Treatment-Effect Estimation

Role-based Multiplex Network Embedding

Measure Estimation in the Barycentric Coding Model

RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests

Counterfactual Transportability: A Formal Approach

Identification of Linear Non-Gaussian Latent Hierarchical Structure

COAT: Measuring Object Compositionality in Emergent Representations

Generalization and Robustness Implications in Object-Centric Learning

NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning

Action-Sufficient State Representation Learning for Control with Structural Constraints

Gradient Descent on Neurons and its Link to Approximate Second-order Optimization

A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources

Efficient Online ML API Selection for Multi-Label Classification Tasks

Entropic Causal Inference: Graph Identifiability

Architecture Agnostic Federated Learning for Neural Networks

Conformal Prediction Sets with Limited False Positives

Scalable Computation of Causal Bounds

LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood

Learning Pseudometric-based Action Representations for Offline Reinforcement Learning

A Statistical Manifold Framework for Point Cloud Data

HyperImpute: Generalized Iterative Imputation with Automatic Model Selection

Partial Label Learning via Label Influence Function

Minimax Classification under Concept Drift with Multidimensional Adaptation and Performance Guarantees

Understanding Robust Overfitting of Adversarial Training and Beyond

A Random Matrix Analysis of Data Stream Clustering: Coping With Limited Memory Resources

Hierarchical Shrinkage: Improving the accuracy and interpretability of tree-based models.

Supervised Learning with General Risk Functionals

Adapting to Mixing Time in Stochastic Optimization with Markovian Data

Fast Composite Optimization and Statistical Recovery in Federated Learning

Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity

Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning

Optimal Algorithms for Stochastic Multi-Level Compositional Optimization

Finite-Sum Coupled Compositional Stochastic Optimization: Theory and Applications

Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

Communication-Efficient Adaptive Federated Learning

RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

Kill a Bird with Two Stones: Closing the Convergence Gaps in Non-Strongly Convex Optimization by Directly Accelerated SVRG with Double Compensation and Snapshots

Accelerated Federated Learning with Decoupled Adaptive Optimization

Byzantine Machine Learning Made Easy By Resilient Averaging of Momentums

TSPipe: Learn from Teacher Faster with Pipelines

Personalized Federated Learning through Local Memorization

Three-stage Evolution and Fast Equilibrium for SGD with Non-degerate Critical Points

Optimization-Derived Learning with Essential Convergence Analysis of Training and Hyper-training

Generalizing Gaussian Smoothing for Random Search

A General Recipe for Likelihood-free Bayesian Optimization

Constrained Discrete Black-Box Optimization using Mixed-Integer Programming

Risk-Averse No-Regret Learning in Online Convex Games

Improve Single-Point Zeroth-Order Optimization Using High-Pass and Low-Pass Filters

Robust Multi-Objective Bayesian Optimization Under Input Noise

Gradient-Free Method for Heavily Constrained Nonconvex Optimization

Sequential- and Parallel- Constrained Max-value Entropy Search via Information Lower Bound

The power of first-order smooth optimization for black-box non-smooth problems

How Tempering Fixes Data Augmentation in Bayesian Neural Networks

Surrogate Likelihoods for Variational Annealed Importance Sampling

Nonparametric Sparse Tensor Factorization with Hierarchical Gamma Processes

Fat–Tailed Variational Inference with Anisotropic Tail Adaptive Flows

Variational Sparse Coding with Learned Thresholding

Structured Stochastic Gradient MCMC

BAMDT: Bayesian Additive Semi-Multivariate Decision Trees for Nonparametric Regression

Variational Inference with Locally Enhanced Bounds for Hierarchical Models

Centroid Approximation for Bootstrap: Improving Particle Quality at Inference

Deep Reference Priors: What is the best way to pretrain a model?

Bayesian Continuous-Time Tucker Decomposition

Approximate Bayesian Computation with Domain Expert in the Loop

Discrete Probabilistic Inverse Optimal Transport

Easy Variational Inference for Categorical Models via an Independent Binary Approximation

Streaming Inference for Infinite Feature Models

Optimizing Sequential Experimental Design with Deep Reinforcement Learning

Function-space Inference with Sparse Implicit Processes

Variational Inference for Infinitely Deep Neural Networks

Personalized Federated Learning via Variational Bayesian Inference

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

Bayesian Deep Embedding Topic Meta-Learner

Efficient Approximate Inference for Stationary Kernel on Frequency Domain

Modeling Strong and Human-Like Gameplay with KL-Regularized Search

Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters

Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning

Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models and Amortized Policy Search

Generalized Data Distribution Iteration

Optimizing Tensor Network Contraction Using Reinforcement Learning

History Compression via Language Models in Reinforcement Learning

REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer

LeNSE: Learning To Navigate Subgraph Embeddings for Large-Scale Combinatorial Optimisation

Efficient Learning for AlphaZero via Path Consistency

A data-driven approach for learning to control computers

Zero-Shot Reward Specification via Grounded Natural Language

How to Stay Curious while avoiding Noisy TVs using Aleatoric Uncertainty Estimation

Model-Value Inconsistency as a Signal for Epistemic Uncertainty

Improving Policy Optimization with Generalist-Specialist Learning

Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning

Analysis of Stochastic Processes through Replay Buffers

Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning

Communicating via Markov Decision Processes

PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation

DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning

Planning with Diffusion for Flexible Behavior Synthesis

A Temporal-Difference Approach to Policy Gradient Estimation

MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer

Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency

Actor-Critic based Improper Reinforcement Learning

On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs

The Geometry of Robust Value Functions

Denoised MDPs: Learning World Models Better Than the World Itself

Greedy based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning

Bayesian Nonparametrics for Offline Skill Discovery

Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime

Curriculum Reinforcement Learning via Constrained Optimal Transport

Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs

Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning

Constrained Offline Policy Optimization

Causal Dynamics Learning for Task-Independent State Abstraction

Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity

Reinforcement Learning with Action-Free Pre-Training from Videos

Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods

Delayed Reinforcement Learning by Imitation

Reachability Constrained Reinforcement Learning

Adaptive Model Design for Markov Decision Process

Goal Misgeneralization in Deep Reinforcement Learning

A Natural Actor-Critic Framework for Zero-Sum Markov Games

Distributionally Robust $Q$-Learning

Sparsity in Partially Controllable Linear Systems

Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings

Label-Free Explainability for Unsupervised Models

Towards Theoretical Analysis of Transformation Complexity of ReLU DNNs

A Study of Face Obfuscation in ImageNet

Fair Representation Learning through Implicit Path Alignment

Mitigating Neural Network Overconfidence with Logit Normalization

Learning fair representation with a parametric integral probability metric

Privacy for Free: How does Dataset Condensation Help Privacy?

Fair Generalized Linear Models with a Convex Penalty

Tight and Robust Private Mean Estimation with Few Users

QSFL: A Two-Level Uplink Communication Optimization Framework for Federated Learning

Robustness and Accuracy Could Be Reconcilable by (Proper) Definition

Sanity Simulations for Saliency Methods

Out-of-Distribution Detection with Deep Nearest Neighbors

Differentially Private Maximal Information Coefficients

Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data

On the Difficulty of Defending Self-Supervised Learning against Model Extraction

Adversarial Attack and Defense for Non-Parametric Two-Sample Tests

Certified Adversarial Robustness Under the Bounded Support Set

Predicting Out-of-Distribution Error with the Projection Norm

Adversarially Robust Models may not Transfer Better: Sufficient Conditions for Domain Transferability from the View of Regularization

Improved Regret for Differentially Private Exploration in Linear MDP

Differentially Private Community Detection for Stochastic Block Models

Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy

Hermite Polynomial Features for Private Data Generation

How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection

Deduplicating Training Data Mitigates Privacy Risks in Language Models

Private frequency estimation via projective geometry

The Poisson Binomial Mechanism for Unbiased Federated Learning with Secure Aggregation

Faster Privacy Accounting via Evolving Discretization

The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning

Private Adaptive Optimization with Side information

Secure Quantized Training for Deep Learning

Private optimization in the interpolation regime: faster rates and hardness results

Differentially Private Coordinate Descent for Composite Empirical Risk Minimization

Private Streaming SCO in $\ell_p$ geometry with Applications in High Dimensional Online Decision Making

Learning Domain Adaptive Object Detection with Probabilistic Teacher

Adaptive Data Analysis with Correlated Observations

Efficient PAC Learning from the Crowd with Pairwise Comparisons

On the Statistical Benefits of Curriculum Learning

Feature and Parameter Selection in Stochastic Linear Bandits

Disentangled Federated Learning for Tackling Attributes Skew via Invariant Aggregation and Diversity Transferring

A new similarity measure for covariate shift with applications to nonparametric regression

Contextual Bandits with Large Action Spaces: Made Practical

Identifiability Conditions for Domain Adaptation

Streaming Algorithms for High-Dimensional Robust Statistics

Popular decision tree algorithms are provably noise tolerant

Understanding and Improving Knowledge Graph Embedding for Entity Alignment

Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees

Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond

A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes

The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces

Extracting Latent State Representations with Linear Dynamics from Rich Observations

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures

Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits

Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses

Learning to Infer Structures of Network Games

Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

Near-Optimal Learning of Extensive-Form Games with Imperfect Information

Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits

On the Finite-Time Performance of the Knowledge Gradient Algorithm

Expression might be enough: representing pressure and demand for reinforcement learning based traffic signal control

Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers

No-Regret Learning in Time-Varying Zero-Sum Games

Achieving Minimax Rates in Pool-Based Batch Active Learning

Active Multi-Task Representation Learning

Active fairness auditing

Metric-Fair Active Learning

Metric-Fair Classifier Derandomization

Interactively Learning Preference Constraints in Linear Bandits

Convergence of Uncertainty Sampling for Active Learning

Thompson Sampling for Robust Transfer in Multi-Task Bandits

Constants Matter: The Performance Gains of Active Learning

Cross-Space Active Learning on Graph Convolutional Networks

Generative Trees: Adversarial and Copycat

A Resilient Distributed Boosting Algorithm

Online Learning and Pricing with Reusable Resources: Linear Bandits with Sub-Exponential Rewards

On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation

Congested Bandits: Optimal Routing via Short-term Resets

Stochastic Rising Bandits

Agnostic Learnability of Halfspaces via Logistic Loss

Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension

PDE-Based Optimal Strategy for Unconstrained Online Learning

Provable Acceleration of Heavy Ball beyond Quadratics for a Class of Polyak-Lojasiewicz Functions when the Non-Convexity is Averaged-Out

On Learning Mixture of Linear Regressions in the Non-Realizable Setting

Random Forest Density Estimation

The dynamics of representation learning in shallow, non-linear autoencoders

Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

Estimation in Rotationally Invariant Generalized Linear Models via Approximate Message Passing

Failure and success of the spectral bias prediction for Laplace Kernel Ridge Regression: the case of low-dimensional data

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

Universal Joint Approximation of Manifolds and Densities by Simple Injective Flows

Bounding the Width of Neural Networks via Coupled Initialization - A Worst Case Analysis

Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression

The Neural Race Reduction: Dynamics of Abstraction in Gated Networks

Efficient Learning of CNNs using Patch Based Features

Neural Tangent Kernel Analysis of Deep Narrow Neural Networks

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

Fully-Connected Network on Noncompact Symmetric Space and Ridgelet Transform based on Helgason-Fourier Analysis

Non-Vacuous Generalisation Bounds for Shallow Neural Networks

Maslow's Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation

(ends 5:30 PM)

4 p.m.

THU 21 JUL

3:30 a.m.

Break:

Breakfast on your own

(ends 3:45 AM)

6 a.m.

Invited Talk:

Design for Inference in Drug Discovery and Development

Aviv Regev

(ends 7:00 AM)

7 a.m.

Break:

Coffee Break

(ends 7:30 AM)

7:30 a.m.

Deep Learning [7:30-9:00]

Spotlights 7:30-8:00

[7:30] Does the Data Induce Capacity Control in Deep Learning?

[7:35] Fighting Fire with Fire: Avoiding DNN Shortcuts through Priming

[7:40] Memory-Based Model Editing at Scale

[7:45] Winning the Lottery Ahead of Time: Efficient Early Network Pruning

[7:50] Active Learning on a Budget: Opposite Strategies Suit High and Low Budgets

[7:55] AutoSNN: Towards Energy-Efficient Spiking Neural Networks

Orals 8:00-8:20

[8:00] Overcoming Oscillations in Quantization-Aware Training

Spotlights 8:20-8:55

[8:20] Dataset Condensation via Efficient Synthetic-Data Parameterization

[8:25] Searching for BurgerFormer with Micro-Meso-Macro Space Design

[8:30] Multi-scale Feature Learning Dynamics: Insights for Double Descent

[8:35] Dataset Condensation with Contrastive Signals

[8:40] Equivariant Priors for compressed sensing with unknown orientation

[8:45] Injecting Logical Constraints into Neural Networks via Straight-Through Estimators

[8:50] Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt

(ends 9:00 AM)

T: Bandits/Online Learning/Reinforcement Learning [7:30-9:00]

Orals 7:30-7:50

[7:30] First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach

Spotlights 7:50-8:10

[7:50] Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more

[7:55] Shuffle Private Linear Contextual Bandits

[8:00] Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity

[8:05] Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes

Orals 8:10-8:30

[8:10] Label Ranking through Nonparametric Regression

Spotlights 8:30-9:00

[8:30] Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost

[8:35] A Simple Unified Framework for High Dimensional Bandit Problems

[8:40] A Reduction from Linear Contextual Bandits Lower Bounds to Estimations Lower Bounds

[8:45] Branching Reinforcement Learning

[8:50] Fast rates for noisy interpolation require rethinking the effect of inductive bias

[8:55] Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path

(ends 9:00 AM)

APP: Physics/Computer Vision [7:30-9:00]

Spotlights 7:30-8:05

[7:30] Structure Preserving Neural Networks: A Case Study in the Entropy Closure of the Boltzmann Equation

[7:35] Composing Partial Differential Equations with Physics-Aware Neural Networks

[7:40] Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval

[7:45] Towards Coherent and Consistent Use of Entities in Narrative Generation

[7:50] Pure Noise to the Rescue of Insufficient Data: Improving Imbalanced Classification by Training on Random Noise Images

[7:55] Optimally Controllable Perceptual Lossy Compression

[8:00] Learning to Solve PDE-constrained Inverse Problems with Graph Networks

Orals 8:05-8:25

[8:05] ModLaNets: Learning Generalisable Dynamics via Modularity and Physical Inductive Bias

Spotlights 8:25-9:00

[8:25] Learning to Estimate and Refine Fluid Motion with Physical Dynamics

[8:30] Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems

[8:35] An Intriguing Property of Geophysics Inversion

[8:40] Particle Transformer for Jet Tagging

[8:45] BabelTower: Learning to Auto-parallelized Program Translation

[8:50] ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

[8:55] On Distribution Shift in Learning-based Bug Detectors

(ends 9:00 AM)

Reinforcement Learning [7:30-9:00]

Orals 7:30-7:50

[7:30] The Importance of Non-Markovianity in Maximum State Entropy Exploration

Spotlights 7:50-8:15

[7:50] Continuous Control with Action Quantization from Demonstrations

[7:55] Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization

[8:00] Inverse Contextual Bandits: Learning How Behavior Evolves over Time

[8:05] Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning

[8:10] Towards Uniformly Superhuman Autonomy via Subdominance Minimization

Orals 8:15-8:35

[8:15] Causal Imitation Learning under Temporally Correlated Noise

Spotlights 8:35-9:00

[8:35] Interactive Inverse Reinforcement Learning for Cooperative Games

[8:40] A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines

[8:45] Robust Imitation Learning against Variations in Environment Dynamics

[8:50] Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations

[8:55] Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation

(ends 9:00 AM)

DL: Generative Models and Autoencoders [7:30-9:00]

Spotlights 7:30-8:05

[7:30] A Neural Tangent Kernel Perspective of GANs

[7:35] Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models

[7:40] Neural Inverse Transform Sampler

[7:45] Antibody-Antigen Docking and Design via Hierarchical Structure Refinement

[7:50] Diffusion Models for Adversarial Purification

[7:55] Gaussian Mixture Variational Autoencoder with Contrastive Learning for Multi-Label Classification

[8:00] VarScene: A Deep Generative Model for Realistic Scene Graph Synthesis

Orals 8:05-8:25

[8:05] It’s Raw! Audio Generation with State-Space Models

Spotlights 8:25-9:00

[8:25] Unsupervised Image Representation Learning with Deep Latent Particles

[8:30] Learning Efficient and Robust Ordinary Differential Equations via Invertible Neural Networks

[8:35] Neuro-Symbolic Hierarchical Rule Induction

[8:40] General-purpose, long-context autoregressive modeling with Perceiver AR

[8:45] Marginal Tail-Adaptive Normalizing Flows

[8:50] SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks

[8:55] NeuroFluid: Fluid Dynamics Grounding with Particle-Driven Neural Radiance Fields

(ends 9:00 AM)

Theory/Social Aspects [7:30-9:00]

Orals 7:30-7:50

[7:30] Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling

Spotlights 7:50-8:15

[7:50] Entropic Gromov-Wasserstein between Gaussian Distributions

[7:55] No-Regret Learning in Partially-Informed Auctions

[8:00] On Last-Iterate Convergence Beyond Zero-Sum Games

[8:05] Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games

[8:10] Fictitious Play and Best-Response Dynamics in Identical Interest and Zero-Sum Stochastic Games

Orals 8:15-8:35

[8:15] On the Convergence of Inexact Predictor-Corrector Methods for Linear Programming

Spotlights 8:35-9:00

[8:35] Nested Bandits

[8:40] Information Discrepancy in Strategic Learning

[8:45] A Psychological Theory of Explainability

[8:50] Task-aware Privacy Preservation for Multi-dimensional Data

[8:55] Strategic Representation

(ends 9:00 AM)

Miscellaneous Aspects of Machine Learning [7:30-9:00]

Spotlights 7:30-7:55

[7:30] Estimating Instance-dependent Bayes-label Transition Matrix using a Deep Neural Network

[7:35] Invariant Ancestry Search

[7:40] Unaligned Supervision for Automatic Music Transcription in The Wild

[7:45] Fourier Learning with Cyclical Data

[7:50] Linear Adversarial Concept Erasure

Orals 7:55-8:15

[7:55] Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models

Spotlights 8:15-8:50

[8:15] Provable Domain Generalization via Invariant-Feature Subspace Recovery

[8:20] Subspace Learning for Effective Meta-Learning

[8:25] Continual Learning via Sequential Function-Space Variational Inference

[8:30] Efficient Test-Time Model Adaptation without Forgetting

[8:35] Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications

[8:40] Input Dependent Sparse Gaussian Processes

[8:45] AutoIP: A United Framework to Integrate Physics into Gaussian Processes

(ends 9:00 AM)

Deep Learning/Optimization [7:30-9:00]

Spotlights 7:30-8:00

[7:30] Equivariance versus Augmentation for Spherical Images

[7:35] Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training

[7:40] Neural Network Poisson Models for Behavioural and Neural Spike Train Data

[7:45] A Branch and Bound Framework for Stronger Adversarial Attacks of ReLU Networks

[7:50] GACT: Activation Compressed Training for Generic Network Architectures

[7:55] Fast Finite Width Neural Tangent Kernel

Orals 8:00-8:20

[8:00] G-Mixup: Graph Data Augmentation for Graph Classification

Spotlights 8:20-9:00

[8:20] Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models

[8:25] Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts

[8:30] Continual Learning with Guarantees via Weight Interval Constraints

[8:35] Faster Fundamental Graph Algorithms via Learned Predictions

[8:40] Practical Almost-Linear-Time Approximation Algorithms for Hybrid and Overlapping Graph Clustering

[8:45] Fair and Fast k-Center Clustering for Data Summarization

[8:50] Online and Consistent Correlation Clustering

[8:55] Generalized Leverage Scores: Geometric Interpretation and Applications

(ends 9:00 AM)

MISC/Deep Learning [7:30-9:00]

Spotlights 7:30-8:00

[7:30] Blurs Behave Like Ensembles: Spatial Smoothings to Improve Accuracy, Uncertainty, and Robustness

[7:35] Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate a Combination of the Same Core Quantities

[7:40] Comprehensive Analysis of Negative Sampling in Knowledge Graph Representation Learning

[7:45] Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness

[7:50] A Hierarchical Transitive-Aligned Graph Kernel for Un-attributed Graphs

[7:55] Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time

Orals 8:00-8:20

[8:00] Random Gegenbauer Features for Scalable Kernel Methods

Spotlights 8:20-8:55

[8:20] Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile

[8:25] Functional Output Regression with Infimal Convolution: Exploring the Huber and $\epsilon$-insensitive Losses

[8:30] Measuring dissimilarity with diffeomorphism invariance

[8:35] Importance Weighted Kernel Bayes' Rule

[8:40] An Asymptotic Test for Conditional Independence using Analytic Kernel Embeddings

[8:45] Nyström Kernel Mean Embeddings

[8:50] Distribution Regression with Sliced Wasserstein Kernels

(ends 9:00 AM)

Optimization/Reinforcement Learning [7:30-9:00]

Spotlights 7:30-8:05

[7:30] Adapting k-means Algorithms for Outliers

[7:35] Accelerated, Optimal and Parallel: Some results on model-based stochastic optimization

[7:40] Online Algorithms with Multiple Predictions

[7:45] Parsimonious Learning-Augmented Caching

[7:50] RUMs from Head-to-Head Contests

[7:55] Quant-BnB: A Scalable Branch-and-Bound Method for Optimal Decision Trees with Continuous Features

[8:00] Robustness in Multi-Objective Submodular Optimization: a Quantile Approach

Orals 8:05-8:25

[8:05] The Unsurprising Effectiveness of Pre-Trained Vision Models for Control

Spotlights 8:25-9:00

[8:25] COLA: Consistent Learning with Opponent-Learning Awareness

[8:30] A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games

[8:35] A Framework for Learning to Request Rich and Contextually Useful Information from Humans

[8:40] Learning Stochastic Shortest Path with Linear Function Approximation

[8:45] Difference Advantage Estimation for Multi-Agent Policy Gradients

[8:50] Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

[8:55] Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets

(ends 9:00 AM)

9 a.m.

Break:

Lunch Break On Your Own

(ends 10:30 AM)

10:30 a.m.

Deep Learning: SSL/GNN [10:30-12:00]

Spotlights 10:30-11:05

[10:30] Adversarial Masking for Self-Supervised Learning

[10:35] Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance

[10:40] OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

[10:45] Multirate Training of Neural Networks

[10:50] Variational Wasserstein gradient flow

[10:55] Building Robust Ensembles via Margin Boosting

[11:00] Investigating Generalization by Controlling Normalized Margin

Orals 11:05-11:25

[11:05] Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation

Spotlights 11:25-12:00

[11:25] VLUE: A Multi-Task Multi-Dimension Benchmark for Evaluating Vision-Language Pre-training

[11:30] Let Invariant Rationale Discovery Inspire Graph Contrastive Learning

[11:35] Graph Neural Architecture Search Under Distribution Shifts

[11:40] How Powerful are Spectral Graph Neural Networks

[11:45] Constraint-based graph network simulator

[11:50] PACE: A Parallelizable Computation Encoder for Directed Acyclic Graphs

[11:55] Structure-Aware Transformer for Graph Representation Learning

(ends 12:00 PM)

Theory: Game Theory and Optimization [10:30-12:00]

Orals 10:30-10:50

[10:30] UnderGrad: A Universal Black-Box Optimization Method with Almost Dimension-Free Convergence Rate Guarantees

Spotlights 10:50-11:15

[10:50] Safe Learning in Tree-Form Sequential Decision Making: Handling Hard and Soft Constraints

[10:55] A Marriage between Adversarial Team Games and 2-player Games: Enabling Abstractions, No-regret Learning, and Subgame Solving

[11:00] Exact Learning of Preference Structure: Single-peaked Preferences and Beyond

[11:05] Selling Data To a Machine Learner: Pricing via Costly Signaling

[11:10] Hardness and Algorithms for Robust and Sparse Optimization

Orals 11:15-11:35

[11:15] A Convergent and Dimension-Independent Min-Max Optimization Algorithm

Spotlights 11:35-12:00

[11:35] Stochastic Continuous Submodular Maximization: Boosting via Non-oblivious Function

[11:40] Accelerated Gradient Methods for Geodesically Convex Optimization: Tractable Algorithms and Convergence Analysis

[11:45] The Complexity of k-Means Clustering when Little is Known

[11:50] Iterative Hard Thresholding with Adaptive Regularization: Sparser Solutions Without Sacrificing Runtime

[11:55] 3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation

(ends 12:00 PM)

Deep Learning: Attention Mechanisms [10:30-12:00]

Spotlights 10:30-11:05

[10:30] Ripple Attention for Visual Perception with Sub-quadratic Complexity

[10:35] Self-supervised Models are Good Teaching Assistants for Vision Transformers

[10:40] Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations

[10:45] In defense of dual-encoders for neural ranking

[10:50] From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

[10:55] Linear Complexity Randomized Self-attention Mechanism

[11:00] Efficient Representation Learning via Adaptive Context Pooling

Orals 11:05-11:25

[11:05] Toward Compositional Generalization in Object-Oriented World Modeling

Spotlights 11:25-12:00

[11:25] Fast Population-Based Reinforcement Learning on a Single Machine

[11:30] NeuralEF: Deconstructing Kernels by Deep Neural Networks

[11:35] Visual Attention Emerges from Recurrent Sparse Reconstruction

[11:40] Transformer Quality in Linear Time

[11:45] What Dense Graph Do You Need for Self-Attention?

[11:50] Dual Decomposition of Convex Optimization Layers for Consistent Attention in Medical Images

[11:55] Multi Resolution Analysis (MRA) for Approximate Self-Attention

(ends 12:00 PM)

Applications [10:30-12:00]

Spotlights 10:30-11:05

[10:30] A Context-Integrated Transformer-Based Neural Network for Auction Design

[10:35] Domain Adaptation for Time Series Forecasting via Attention Sharing

[10:40] Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations

[10:45] Disentangling Disease-related Representation from Obscure for Disease Prediction

[10:50] Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization

[10:55] Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement Learning

[11:00] Learning of Cluster-based Feature Importance for Electronic Health Record Time-series

Orals 11:05-11:25

[11:05] Do Differentiable Simulators Give Better Policy Gradients?

Spotlights 11:25-12:00

[11:25] Adaptive Conformal Predictions for Time Series

[11:30] Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

[11:35] Rethinking Graph Neural Networks for Anomaly Detection

[11:40] Fast Aquatic Swimmer Optimization with Differentiable Projective Dynamics and Neural Network Hydrodynamic Models

[11:45] Proving Theorems using Incremental Learning and Hindsight Experience Replay

[11:50] Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning

[11:55] Neural Inverse Kinematic

(ends 12:00 PM)

Reinforcement Learning [10:30-12:00]

Orals 10:30-10:50

[10:30] Learning Bellman Complete Representations for Offline Policy Evaluation

Spotlights 10:50-11:15

[10:50] Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

[10:55] A Simple Reward-free Approach to Constrained Reinforcement Learning

[11:00] Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching

[11:05] Temporal Difference Learning for Model Predictive Control

[11:10] Model Selection in Batch Policy Optimization

Orals 11:15-11:35

[11:15] Adversarially Trained Actor Critic for Offline Reinforcement Learning

Spotlights 11:35-12:00

[11:35] Optimal Estimation of Policy Gradient via Double Fitted Iteration

[11:40] Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes

[11:45] Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

[11:50] Lagrangian Method for Q-Function Learning (with Applications to Machine Translation)

[11:55] On the Role of Discount Factor in Offline Reinforcement Learning

(ends 12:00 PM)

MISC/Social Aspects [10:30-12:00]

Spotlights 10:30-11:05

[10:30] Learning Stable Classifiers by Transferring Unstable Features

[10:35] Data-Efficient Double-Win Lottery Tickets from Robust Pre-training

[10:40] Attentional Meta-learners for Few-shot Polythetic Classification

[10:45] C*-algebra Net: A New Approach Generalizing Neural Network Parameters to C*-algebra

[10:50] Nonlinear Feature Diffusion on Hypergraphs

[10:55] Kernel Methods for Radial Transformed Compositional Data with Many Zeros

[11:00] Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning

Orals 11:05-11:25

[11:05] Causal Conceptions of Fairness and their Consequences

Spotlights 11:25-11:55

[11:25] Fairness with Adaptive Weights

[11:30] Understanding Instance-Level Impact of Fairness Constraints

[11:35] Achieving Fairness at No Utility Cost via Data Reweighing with Influence

[11:40] Mitigating Gender Bias in Face Recognition using the von Mises-Fisher Mixture Model

[11:45] Selective Regression under Fairness Criteria

[11:50] Input-agnostic Certified Group Fairness via Gaussian Parameter Smoothing

(ends 12:00 PM)

Deep Learning/MISC [10:30-12:00]

Spotlights 10:30-11:00

[10:30] Dynamic Topic Models for Temporal Document Networks

[10:35] A Functional Information Perspective on Model Interpretation

[10:40] Be Like Water: Adaptive Floating Point for Machine Learning

[10:45] Lie Point Symmetry Data Augmentation for Neural PDE Solvers

[10:50] Fast Provably Robust Decision Trees and Boosting

[10:55] Order Constraints in Optimal Transport

Orals 11:00-11:20

[11:00] Sublinear-Time Clustering Oracle for Signed Graphs

Spotlights 11:20-11:55

[11:20] PAC-Bayesian Bounds on Rate-Efficient Classifiers

[11:25] More Efficient Sampling for Tensor Decomposition With Worst-Case Guarantees

[11:30] Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning

[11:35] On the Convergence of Local Stochastic Compositional Gradient Descent with Momentum

[11:40] SPDY: Accurate Pruning with Speedup Guarantees

[11:45] Flashlight: Enabling Innovation in Tools for Machine Learning

[11:50] On the Robustness of CountSketch to Adaptive Inputs

(ends 12:00 PM)

Optimization/Reinforcement Learning [10:30-12:00]

Orals 10:30-10:50

[10:30] Streaming Algorithm for Monotone k-Submodular Maximization with Cardinality Constraints

Spotlights 10:50-11:10

[10:50] Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction

[10:55] Adaptive Second Order Coresets for Data-efficient Machine Learning

[11:00] Nesterov Accelerated Shuffling Gradient Method for Convex Optimization

[11:05] Efficient Low Rank Convex Bounds for Pairwise Discrete Graphical Models

Orals 11:10-11:30

[11:10] Deletion Robust Submodular Maximization over Matroids

Spotlights 11:30-11:55

[11:30] The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks

[11:35] Instance Dependent Regret Analysis of Kernelized Bandits

[11:40] EAT-C: Environment-Adversarial sub-Task Curriculum for Efficient Reinforcement Learning

[11:45] Tell me why! Explanations support learning relational and causal structure

[11:50] Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics

(ends 12:00 PM)

Probabilistic Methods/MISC [10:30-12:00]

Orals 10:30-10:50

[10:30] Stochastic Deep Networks with Linear Competing Units for Model-Agnostic Meta-Learning

Spotlights 10:50-11:10

[10:50] Nonparametric Factor Trajectory Learning for Dynamic Tensor Decomposition

[10:55] Nonparametric Embeddings of Sparse High-Order Interaction Events

[11:00] Adapting the Linearised Laplace Model Evidence for Modern Deep Learning

[11:05] NOMU: Neural Optimization-based Model Uncertainty

Orals 11:10-11:30

[11:10] Bayesian Model Selection, the Marginal Likelihood, and Generalization

Spotlights 11:30-12:00

[11:30] Fast-Rate PAC-Bayesian Generalization Bounds for Meta-Learning

[11:35] Wide Neural Networks Forget Less Catastrophically

[11:40] A Unified View on PAC-Bayes Bounds for Meta-Learning

[11:45] MAML and ANIL Provably Learn Representations

[11:50] C-MinHash: Improving Minwise Hashing with Circulant Permutation

[11:55] Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex Regularization

(ends 12:00 PM)

11 a.m.

Registration Check-in Desk

(ends 10:00 PM)

noon

Break:

Coffee Break

(ends 12:30 PM)

12:30 p.m.

Deep Learning [12:30-2:00]

Spotlights 12:30-1:05

[12:30] Bregman Neural Networks

[12:35] Quantifying and Learning Linear Symmetry-Based Disentanglement

[12:40] Exploiting Redundancy: Separable Group Convolutional Networks on Lie Groups

[12:45] PDO-s3DCNNs: Partial Differential Operator Based Steerable 3D CNNs

[12:50] Utilizing Expert Features for Contrastive Learning of Time-Series Representations

[12:55] (Non-)Convergence Results for Predictive Coding Networks

[1:00] Representation Topology Divergence: A Method for Comparing Neural Network Representations.

Orals 1:05-1:25

[1:05] Measuring Representational Robustness of Neural Networks Through Shared Invariances

Spotlights 1:25-2:00

[1:25] The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention

[1:30] Flowformer: Linearizing Transformers with Conservation Flows

[1:35] Spatial-Channel Token Distillation for Vision MLPs

[1:40] Neurocoder: General-Purpose Computation Using Stored Neural Programs

[1:45] Improving Transformers with Probabilistic Attention Keys

[1:50] Rethinking Attention-Model Explainability through Faithfulness Violation Test

[1:55] AGNAS: Attention-Guided Micro- and Macro-Architecture Search

(ends 2:00 PM)

T: Online Learning and Bandits [12:30-2:00]

Spotlights 12:30-1:05

[12:30] Nearly Optimal Catoni’s M-estimator for Infinite Variance

[12:35] Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk

[12:40] Local Linear Convergence of Douglas-Rachford for Linear Programming: a Probabilistic Analysis

[12:45] Contextual Information-Directed Sampling

[12:50] Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits

[12:55] Universal and data-adaptive algorithms for model selection in linear contextual bandits

[1:00] Regret Minimization with Performative Feedback

Orals 1:05-1:25

[1:05] A Simple yet Universal Strategy for Online Convex Optimization

Spotlights 1:25-2:00

[1:25] Deep Hierarchy in Bandits

[1:30] Distributionally-Aware Kernelized Bandit Problems for Risk Aversion

[1:35] Asymptotically-Optimal Gaussian Bandits with Side Observations

[1:40] Learning from a Learning User for Optimal Recommendations

[1:45] Thresholded Lasso Bandit

[1:50] Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences

[1:55] Decentralized Online Convex Optimization in Networked Systems

(ends 2:00 PM)

Applications/Optimization [12:30-2:00]

Spotlights 12:30-1:00

[12:30] GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

[12:35] Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations

[12:40] Object Permanence Emerges in a Random Walk along Memory

[12:45] Flow-Guided Sparse Transformer for Video Deblurring

[12:50] N-Penetrate: Active Learning of Neural Collision Handler for Complex 3D Mesh Deformations

[12:55] Staged Training for Transformer Language Models

Orals 1:00-1:20

[1:00] Near-Exact Recovery for Tomographic Inverse Problems via Deep Learning

Spotlights 1:20-2:00

[1:20] Self-supervised learning with random-projection quantizer for speech recognition

[1:25] Learning Multiscale Transformer Models for Sequence Generation

[1:30] NP-Match: When Neural Processes meet Semi-Supervised Learning

[1:35] Proximal and Federated Random Reshuffling

[1:40] Federated Learning with Partial Model Personalization

[1:45] A Stochastic Multi-Rate Control Framework For Modeling Distributed Optimization Algorithms

[1:50] Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology

[1:55] Iterative Double Sketching for Faster Least-Squares Optimization

(ends 2:00 PM)

Applications/MISC [12:30-2:00]

Spotlights 12:30-1:05

[12:30] Revisiting End-to-End Speech-to-Text Translation From Scratch

[12:35] Data Scaling Laws in NMT: The Effect of Noise and Architecture

[12:40] Dialog Inpainting: Turning Documents into Dialogs

[12:45] Safe Exploration for Efficient Policy Evaluation and Comparison

[12:50] Adversarial Attacks on Gaussian Process Bandits

[12:55] GALAXY: Graph-based Active Learning at the Extreme

[1:00] When Are Linear Stochastic Bandits Attackable?

Orals 1:05-1:25

[1:05] UniRank: Unimodal Bandit Algorithms for Online Ranking

Spotlights 1:25-2:00

[1:25] Correlation Clustering via Strong Triadic Closure Labeling: Fast Approximation Algorithms and Practical Lower Bounds

[1:30] Interactive Correlation Clustering with Existential Cluster Constraints

[1:35] Simultaneous Graph Signal Clustering and Graph Learning

[1:40] Bregman Power k-Means for Clustering Exponential Family Data

[1:45] SpaceMAP: Visualizing High-Dimensional Data by Space Expansion

[1:50] Unsupervised Ground Metric Learning Using Wasserstein Singular Vectors

[1:55] Understanding Doubly Stochastic Clustering

(ends 2:00 PM)

Optimization/Reinforcement Learning [12:30-2:00]

Spotlights 12:30-1:05

[12:30] Learning to Cut by Looking Ahead: Cutting Plane Selection via Imitation Learning

[12:35] A Regret Minimization Approach to Multi-Agent Control

[12:40] Multi-slots Online Matching with High Entropy

[12:45] Decision-Focused Learning: Through the Lens of Learning to Rank

[12:50] On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces

[12:55] Asking for Knowledge (AFK): Training RL Agents to Query External Knowledge Using Language

[1:00] Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning

Orals 1:05-1:25

[1:05] An Analytical Update Rule for General Policy Optimization

Spotlights 1:25-2:00

[1:25] Making Linear MDPs Practical via Contrastive Representation Learning

[1:30] Flow-based Recurrent Belief State Learning for POMDPs

[1:35] A Parametric Class of Approximate Gradient Updates for Policy Optimization

[1:40] Retrieval-Augmented Reinforcement Learning

[1:45] Robust Policy Learning over Multiple Uncertainty Sets

[1:50] Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL

[1:55] Learning Dynamics and Generalization in Deep Reinforcement Learning

(ends 2:00 PM)

Reinforcement Learning/Optimization [12:30-2:00]

Orals 12:30-12:50

[12:30] From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

Spotlights 12:50-1:15

[12:50] Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

[12:55] EqR: Equivariant Representations for Data-Efficient Reinforcement Learning

[1:00] Imitation Learning by Estimating Expertise of Demonstrators

[1:05] Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments

[1:10] Off-Policy Evaluation for Large Action Spaces via Embeddings

Orals 1:15-1:35

[1:15] Online Decision Transformer

Spotlights 1:35-2:00

[1:35] Learning-based Optimisation of Particle Accelerators Under Partial Observability Without Real-World Training

[1:40] How to Leverage Unlabeled Data in Offline Reinforcement Learning

[1:45] Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning

[1:50] Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent

[1:55] Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data

(ends 2:00 PM)

Social Aspects/Optimization [12:30-2:00]

Orals 12:30-12:50

[12:30] Generalized Strategic Classification and the Case of Aligned Incentives

Spotlights 12:50-1:10

[12:50] Improving Screening Processes via Calibrated Subset Selection

[12:55] On the Convergence of the Shapley Value in Parametric Bayesian Learning Games

[1:00] Data-SUITE: Data-centric identification of in-distribution incongruous examples

[1:05] Counterfactual Prediction for Outcome-Oriented Treatments

Orals 1:10-1:30

[1:10] Optimal Algorithms for Mean Estimation under Local Differential Privacy

Spotlights 1:30-1:55

[1:30] Least Squares Estimation using Sketched Data with Heteroskedastic Errors

[1:35] Debiaser Beware: Pitfalls of Centering Regularized Transport Maps

[1:40] Bregman Proximal Langevin Monte Carlo via Bregman–Moreau Envelopes

[1:45] Active Nearest Neighbor Regression Through Delaunay Refinement

[1:50] A Convergence Theory for SVGD in the Population Limit under Talagrand's Inequality T1

(ends 2:00 PM)

Optimization/Theory [12:30-2:00]

Spotlights 12:30-1:00

[12:30] ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

[12:35] Federated Learning with Label Distribution Skew via Logits Calibration

[12:40] Adaptive Random Walk Gradient Descent for Decentralized Optimization

[12:45] POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

[12:50] Secure Distributed Training at Scale

[12:55] ASAP.SGD: Instance-based Adaptiveness to Staleness in Asynchronous SGD

Orals 1:00-1:20

[1:00] Anarchic Federated Learning

Spotlights 1:20-1:55

[1:20] Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning

[1:25] Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach

[1:30] Sketching Algorithms and Lower Bounds for Ridge Regression

[1:35] On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning

[1:40] Utility Theory for Sequential Decision Making

[1:45] Online Learning with Knapsacks: the Best of Both Worlds

[1:50] Optimal Clustering with Noisy Queries via Multi-Armed Bandit

(ends 2:00 PM)

Optimization/Probabilistic Methods [12:30-2:00]

Spotlights 12:30-1:00

[12:30] Global Optimization Networks

[12:35] Generalized Federated Learning via Sharpness Aware Minimization

[12:40] Delay-Adaptive Step-sizes for Asynchronous Learning

[12:45] FedScale: Benchmarking Model and System Performance of Federated Learning at Scale

[12:50] Learning Augmented Binary Search Trees

[12:55] Communication-efficient Distributed Learning for Large Batch Optimization

Orals 1:00-1:20

[1:00] Born-Infeld (BI) for AI: Energy-Conserving Descent (ECD) for Optimization

Spotlights 1:20-1:55

[1:20] A Simple Guard for Learned Optimizers

[1:25] An Exact Symbolic Reduction of Linear Smart Predict+Optimize to Mixed Integer Linear Programming

[1:30] Multi-Level Branched Regularization for Federated Learning

[1:35] Revisiting the Effects of Stochasticity for Hamiltonian Samplers

[1:40] Scaling Structured Inference with Randomization

[1:45] Discrete Tree Flows via Tree-Structured Permutations

[1:50] Calibrated and Sharp Uncertainties in Deep Learning via Density Estimation

(ends 2:00 PM)

2 p.m.

Reception:

ICML Reception

(ends 3:00 PM)

3 p.m.

Poster Session 3 [3:00-5:00]

Posters 3:00-5:00

Structure Preserving Neural Networks: A Case Study in the Entropy Closure of the Boltzmann Equation

Composing Partial Differential Equations with Physics-Aware Neural Networks

Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval

Towards Coherent and Consistent Use of Entities in Narrative Generation

Optimally Controllable Perceptual Lossy Compression

Learning to Solve PDE-constrained Inverse Problems with Graph Networks

ModLaNets: Learning Generalisable Dynamics via Modularity and Physical Inductive Bias

Learning to Estimate and Refine Fluid Motion with Physical Dynamics

Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems

An Intriguing Property of Geophysics Inversion

Particle Transformer for Jet Tagging

BabelTower: Learning to Auto-parallelized Program Translation

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

On Distribution Shift in Learning-based Bug Detectors

A Context-Integrated Transformer-Based Neural Network for Auction Design

Domain Adaptation for Time Series Forecasting via Attention Sharing

Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations

Disentangling Disease-related Representation from Obscure for Disease Prediction

Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization

Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement Learning

Learning of Cluster-based Feature Importance for Electronic Health Record Time-series

Do Differentiable Simulators Give Better Policy Gradients?

Adaptive Conformal Predictions for Time Series

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Rethinking Graph Neural Networks for Anomaly Detection

Fast Aquatic Swimmer Optimization with Differentiable Projective Dynamics and Neural Network Hydrodynamic Models

Proving Theorems using Incremental Learning and Hindsight Experience Replay

Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning

Neural Inverse Kinematic

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations

Object Permanence Emerges in a Random Walk along Memory

Flow-Guided Sparse Transformer for Video Deblurring

N-Penetrate: Active Learning of Neural Collision Handler for Complex 3D Mesh Deformations

Staged Training for Transformer Language Models

Near-Exact Recovery for Tomographic Inverse Problems via Deep Learning

Self-supervised learning with random-projection quantizer for speech recognition

Learning Multiscale Transformer Models for Sequence Generation

NP-Match: When Neural Processes meet Semi-Supervised Learning

Revisiting End-to-End Speech-to-Text Translation From Scratch

Data Scaling Laws in NMT: The Effect of Noise and Architecture

Dialog Inpainting: Turning Documents into Dialogs

Does the Data Induce Capacity Control in Deep Learning?

Fighting Fire with Fire: Avoiding DNN Shortcuts through Priming

Memory-Based Model Editing at Scale

Winning the Lottery Ahead of Time: Efficient Early Network Pruning

Active Learning on a Budget: Opposite Strategies Suit High and Low Budgets

AutoSNN: Towards Energy-Efficient Spiking Neural Networks

Overcoming Oscillations in Quantization-Aware Training

Dataset Condensation via Efficient Synthetic-Data Parameterization

Searching for BurgerFormer with Micro-Meso-Macro Space Design

Multi-scale Feature Learning Dynamics: Insights for Double Descent

Dataset Condensation with Contrastive Signals

Equivariant Priors for compressed sensing with unknown orientation

Injecting Logical Constraints into Neural Networks via Straight-Through Estimators

Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt

A Neural Tangent Kernel Perspective of GANs

Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models

Neural Inverse Transform Sampler

Antibody-Antigen Docking and Design via Hierarchical Structure Refinement

Diffusion Models for Adversarial Purification

Gaussian Mixture Variational Autoencoder with Contrastive Learning for Multi-Label Classification

VarScene: A Deep Generative Model for Realistic Scene Graph Synthesis

It’s Raw! Audio Generation with State-Space Models

Unsupervised Image Representation Learning with Deep Latent Particles

Learning Efficient and Robust Ordinary Differential Equations via Invertible Neural Networks

Neuro-Symbolic Hierarchical Rule Induction

General-purpose, long-context autoregressive modeling with Perceiver AR

Marginal Tail-Adaptive Normalizing Flows

SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks

NeuroFluid: Fluid Dynamics Grounding with Particle-Driven Neural Radiance Fields

Equivariance versus Augmentation for Spherical Images

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training

Neural Network Poisson Models for Behavioural and Neural Spike Train Data

A Branch and Bound Framework for Stronger Adversarial Attacks of ReLU Networks

GACT: Activation Compressed Training for Generic Network Architectures

Fast Finite Width Neural Tangent Kernel

G-Mixup: Graph Data Augmentation for Graph Classification

Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models

Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts

Continual Learning with Guarantees via Weight Interval Constraints

Blurs Behave Like Ensembles: Spatial Smoothings to Improve Accuracy, Uncertainty, and Robustness

Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate a Combination of the Same Core Quantities

Comprehensive Analysis of Negative Sampling in Knowledge Graph Representation Learning

Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness

Adversarial Masking for Self-Supervised Learning

Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance

OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Multirate Training of Neural Networks

Variational Wasserstein gradient flow

Building Robust Ensembles via Margin Boosting

Investigating Generalization by Controlling Normalized Margin

Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation

VLUE: A Multi-Task Multi-Dimension Benchmark for Evaluating Vision-Language Pre-training

Let Invariant Rationale Discovery Inspire Graph Contrastive Learning

Graph Neural Architecture Search Under Distribution Shifts

How Powerful are Spectral Graph Neural Networks

Constraint-based graph network simulator

PACE: A Parallelizable Computation Encoder for Directed Acyclic Graphs

Structure-Aware Transformer for Graph Representation Learning

Ripple Attention for Visual Perception with Sub-quadratic Complexity

Self-supervised Models are Good Teaching Assistants for Vision Transformers

Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations

In defense of dual-encoders for neural ranking

From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

Linear Complexity Randomized Self-attention Mechanism

Efficient Representation Learning via Adaptive Context Pooling

Toward Compositional Generalization in Object-Oriented World Modeling

Fast Population-Based Reinforcement Learning on a Single Machine

NeuralEF: Deconstructing Kernels by Deep Neural Networks

Visual Attention Emerges from Recurrent Sparse Reconstruction

Transformer Quality in Linear Time

What Dense Graph Do You Need for Self-Attention?

Dual Decomposition of Convex Optimization Layers for Consistent Attention in Medical Images

Multi Resolution Analysis (MRA) for Approximate Self-Attention

Dynamic Topic Models for Temporal Document Networks

A Functional Information Perspective on Model Interpretation

Be Like Water: Adaptive Floating Point for Machine Learning

Lie Point Symmetry Data Augmentation for Neural PDE Solvers

Bregman Neural Networks

Quantifying and Learning Linear Symmetry-Based Disentanglement

Exploiting Redundancy: Separable Group Convolutional Networks on Lie Groups

PDO-s3DCNNs: Partial Differential Operator Based Steerable 3D CNNs

Utilizing Expert Features for Contrastive Learning of Time-Series Representations

(Non-)Convergence Results for Predictive Coding Networks

Representation Topology Divergence: A Method for Comparing Neural Network Representations.

Measuring Representational Robustness of Neural Networks Through Shared Invariances

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention

Flowformer: Linearizing Transformers with Conservation Flows

Spatial-Channel Token Distillation for Vision MLPs

Neurocoder: General-Purpose Computation Using Stored Neural Programs

Improving Transformers with Probabilistic Attention Keys

Rethinking Attention-Model Explainability through Faithfulness Violation Test

AGNAS: Attention-Guided Micro- and Macro-Architecture Search

Convergence of Invariant Graph Networks

Rich Feature Construction for the Optimization-Generalization Dilemma

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework

Resilient and Communication Efficient Learning for Heterogeneous Federated Systems

Augment with Care: Contrastive Learning for Combinatorial Problems

Cycle Representation Learning for Inductive Relation Prediction

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Do More Negative Samples Necessarily Hurt In Contrastive Learning?

MetAug: Contrastive Learning via Meta Feature Augmentation

Investigating Why Contrastive Learning Benefits Robustness against Label Noise

Contrastive Learning with Boosted Memorization

Identity-Disentangled Adversarial Augmentation for Self-supervised Learning

Interventional Contrastive Learning with Meta Semantic Regularizer

On the Surrogate Gap between Contrastive and Supervised Losses

Exploring the Gap between Collapsed & Whitened Features in Self-Supervised Learning

Locally Sparse Neural Networks for Tabular Biomedical Data

Dual Perspective of Label-Specific Feature Learning for Multi-Label Classification

Detecting Corrupted Labels Without Training a Model to Predict

Prototype-Anchored Learning for Learning with Imperfect Annotations

Learning to Predict Graphs with Fused Gromov-Wasserstein Barycenters

Deep Safe Incomplete Multi-view Clustering: Theorem and Algorithm

Estimating Instance-dependent Bayes-label Transition Matrix using a Deep Neural Network

Invariant Ancestry Search

Unaligned Supervision for Automatic Music Transcription in The Wild

Fourier Learning with Cyclical Data

Linear Adversarial Concept Erasure

Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models

Provable Domain Generalization via Invariant-Feature Subspace Recovery

Subspace Learning for Effective Meta-Learning

Continual Learning via Sequential Function-Space Variational Inference

Efficient Test-Time Model Adaptation without Forgetting

A Hierarchical Transitive-Aligned Graph Kernel for Un-attributed Graphs

Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time

Random Gegenbauer Features for Scalable Kernel Methods

Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile

Functional Output Regression with Infimal Convolution: Exploring the Huber and $\epsilon$-insensitive Losses

Measuring dissimilarity with diffeomorphism invariance

Importance Weighted Kernel Bayes' Rule

An Asymptotic Test for Conditional Independence using Analytic Kernel Embeddings

Nyström Kernel Mean Embeddings

Distribution Regression with Sliced Wasserstein Kernels

Learning Stable Classifiers by Transferring Unstable Features

Data-Efficient Double-Win Lottery Tickets from Robust Pre-training

Attentional Meta-learners for Few-shot Polythetic Classification

C*-algebra Net: A New Approach Generalizing Neural Network Parameters to C*-algebra

Nonlinear Feature Diffusion on Hypergraphs

Kernel Methods for Radial Transformed Compositional Data with Many Zeros

Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning

Fast Provably Robust Decision Trees and Boosting

Order Constraints in Optimal Transport

Sublinear-Time Clustering Oracle for Signed Graphs

PAC-Bayesian Bounds on Rate-Efficient Classifiers

More Efficient Sampling for Tensor Decomposition With Worst-Case Guarantees

Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning

On the Convergence of Local Stochastic Compositional Gradient Descent with Momentum

SPDY: Accurate Pruning with Speedup Guarantees

Flashlight: Enabling Innovation in Tools for Machine Learning

On the Robustness of CountSketch to Adaptive Inputs

Fast-Rate PAC-Bayesian Generalization Bounds for Meta-Learning

Wide Neural Networks Forget Less Catastrophically

A Unified View on PAC-Bayes Bounds for Meta-Learning

MAML and ANIL Provably Learn Representations

C-MinHash: Improving Minwise Hashing with Circulant Permutation

Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex Regularization

Safe Exploration for Efficient Policy Evaluation and Comparison

Adversarial Attacks on Gaussian Process Bandits

GALAXY: Graph-based Active Learning at the Extreme

When Are Linear Stochastic Bandits Attackable?

UniRank: Unimodal Bandit Algorithms for Online Ranking

Correlation Clustering via Strong Triadic Closure Labeling: Fast Approximation Algorithms and Practical Lower Bounds

Interactive Correlation Clustering with Existential Cluster Constraints

Simultaneous Graph Signal Clustering and Graph Learning

Bregman Power k-Means for Clustering Exponential Family Data

SpaceMAP: Visualizing High-Dimensional Data by Space Expansion

Unsupervised Ground Metric Learning Using Wasserstein Singular Vectors

Understanding Doubly Stochastic Clustering

Faster Fundamental Graph Algorithms via Learned Predictions

Practical Almost-Linear-Time Approximation Algorithms for Hybrid and Overlapping Graph Clustering

Fair and Fast k-Center Clustering for Data Summarization

Online and Consistent Correlation Clustering

Generalized Leverage Scores: Geometric Interpretation and Applications

Adapting k-means Algorithms for Outliers

Accelerated, Optimal and Parallel: Some results on model-based stochastic optimization

Online Algorithms with Multiple Predictions

Parsimonious Learning-Augmented Caching

RUMs from Head-to-Head Contests

Quant-BnB: A Scalable Branch-and-Bound Method for Optimal Decision Trees with Continuous Features

Robustness in Multi-Objective Submodular Optimization: a Quantile Approach

Streaming Algorithm for Monotone k-Submodular Maximization with Cardinality Constraints

Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction

Adaptive Second Order Coresets for Data-efficient Machine Learning

Nesterov Accelerated Shuffling Gradient Method for Convex Optimization

Efficient Low Rank Convex Bounds for Pairwise Discrete Graphical Models

Deletion Robust Submodular Maximization over Matroids

The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks

Instance Dependent Regret Analysis of Kernelized Bandits

Proximal and Federated Random Reshuffling

Federated Learning with Partial Model Personalization

A Stochastic Multi-Rate Control Framework For Modeling Distributed Optimization Algorithms

Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology

Iterative Double Sketching for Faster Least-Squares Optimization

Learning to Cut by Looking Ahead: Cutting Plane Selection via Imitation Learning

A Regret Minimization Approach to Multi-Agent Control

Multi-slots Online Matching with High Entropy

Decision-Focused Learning: Through the Lens of Learning to Rank

Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent

Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data

Least Squares Estimation using Sketched Data with Heteroskedastic Errors

Debiaser Beware: Pitfalls of Centering Regularized Transport Maps

Bregman Proximal Langevin Monte Carlo via Bregman–Moreau Envelopes

Active Nearest Neighbor Regression Through Delaunay Refinement

A Convergence Theory for SVGD in the Population Limit under Talagrand's Inequality T1

ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

Federated Learning with Label Distribution Skew via Logits Calibration

Adaptive Random Walk Gradient Descent for Decentralized Optimization

POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

Secure Distributed Training at Scale

ASAP.SGD: Instance-based Adaptiveness to Staleness in Asynchronous SGD

Anarchic Federated Learning

Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning

Global Optimization Networks

Generalized Federated Learning via Sharpness Aware Minimization

Delay-Adaptive Step-sizes for Asynchronous Learning

FedScale: Benchmarking Model and System Performance of Federated Learning at Scale

Learning Augmented Binary Search Trees

Communication-efficient Distributed Learning for Large Batch Optimization

Born-Infeld (BI) for AI: Energy-Conserving Descent (ECD) for Optimization

A Simple Guard for Learned Optimizers

An Exact Symbolic Reduction of Linear Smart Predict+Optimize to Mixed Integer Linear Programming

Multi-Level Branched Regularization for Federated Learning

Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications

Input Dependent Sparse Gaussian Processes

AutoIP: A United Framework to Integrate Physics into Gaussian Processes

Stochastic Deep Networks with Linear Competing Units for Model-Agnostic Meta-Learning

Nonparametric Factor Trajectory Learning for Dynamic Tensor Decomposition

Nonparametric Embeddings of Sparse High-Order Interaction Events

Adapting the Linearised Laplace Model Evidence for Modern Deep Learning

NOMU: Neural Optimization-based Model Uncertainty

Bayesian Model Selection, the Marginal Likelihood, and Generalization

Revisiting the Effects of Stochasticity for Hamiltonian Samplers

Scaling Structured Inference with Randomization

Discrete Tree Flows via Tree-Structured Permutations

Calibrated and Sharp Uncertainties in Deep Learning via Density Estimation

The Importance of Non-Markovianity in Maximum State Entropy Exploration

Continuous Control with Action Quantization from Demonstrations

Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization

Inverse Contextual Bandits: Learning How Behavior Evolves over Time

Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning

Towards Uniformly Superhuman Autonomy via Subdominance Minimization

Causal Imitation Learning under Temporally Correlated Noise

Interactive Inverse Reinforcement Learning for Cooperative Games

A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines

Robust Imitation Learning against Variations in Environment Dynamics

Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations

Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation

The Unsurprising Effectiveness of Pre-Trained Vision Models for Control

COLA: Consistent Learning with Opponent-Learning Awareness

A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games

A Framework for Learning to Request Rich and Contextually Useful Information from Humans

Learning Stochastic Shortest Path with Linear Function Approximation

Difference Advantage Estimation for Multi-Agent Policy Gradients

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets

Learning Bellman Complete Representations for Offline Policy Evaluation

Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

A Simple Reward-free Approach to Constrained Reinforcement Learning

Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching

Temporal Difference Learning for Model Predictive Control

Model Selection in Batch Policy Optimization

Adversarially Trained Actor Critic for Offline Reinforcement Learning

Optimal Estimation of Policy Gradient via Double Fitted Iteration

Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

Lagrangian Method for Q-Function Learning (with Applications to Machine Translation)

On the Role of Discount Factor in Offline Reinforcement Learning

EAT-C: Environment-Adversarial sub-Task Curriculum for Efficient Reinforcement Learning

Tell me why! Explanations support learning relational and causal structure

Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics

Generalised Policy Improvement with Geometric Policy Composition

Offline Meta-Reinforcement Learning with Online Self-Supervision

Divergence-Regularized Multi-Agent Actor-Critic

Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach

Off-Policy Reinforcement Learning with Delayed Rewards

Direct Behavior Specification via Constrained Reinforcement Learning

Large Batch Experience Replay

Evolving Curricula with Regret-Based Environment Design

Robust Deep Reinforcement Learning through Bootstrapped Opportunistic Curriculum

Transformers are Meta-Reinforcement Learners

Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks

Constrained Variational Policy Optimization for Safe Reinforcement Learning

On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces

Asking for Knowledge (AFK): Training RL Agents to Query External Knowledge Using Language

Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning

An Analytical Update Rule for General Policy Optimization

Making Linear MDPs Practical via Contrastive Representation Learning

Flow-based Recurrent Belief State Learning for POMDPs

A Parametric Class of Approximate Gradient Updates for Policy Optimization

Retrieval-Augmented Reinforcement Learning

Robust Policy Learning over Multiple Uncertainty Sets

Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL

Learning Dynamics and Generalization in Deep Reinforcement Learning

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

EqR: Equivariant Representations for Data-Efficient Reinforcement Learning

Imitation Learning by Estimating Expertise of Demonstrators

Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments

Off-Policy Evaluation for Large Action Spaces via Embeddings

Online Decision Transformer

Learning-based Optimisation of Particle Accelerators Under Partial Observability Without Real-World Training

How to Leverage Unlabeled Data in Offline Reinforcement Learning

Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning

A Psychological Theory of Explainability

Task-aware Privacy Preservation for Multi-dimensional Data

Strategic Representation

Causal Conceptions of Fairness and their Consequences

Fairness with Adaptive Weights

Understanding Instance-Level Impact of Fairness Constraints

Achieving Fairness at No Utility Cost via Data Reweighing with Influence

Mitigating Gender Bias in Face Recognition using the von Mises-Fisher Mixture Model

Selective Regression under Fairness Criteria

Input-agnostic Certified Group Fairness via Gaussian Parameter Smoothing

Generalized Strategic Classification and the Case of Aligned Incentives

Improving Screening Processes via Calibrated Subset Selection

On the Convergence of the Shapley Value in Parametric Bayesian Learning Games

Data-SUITE: Data-centric identification of in-distribution incongruous examples

Counterfactual Prediction for Outcome-Oriented Treatments

Optimal Algorithms for Mean Estimation under Local Differential Privacy

First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach

Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more

Shuffle Private Linear Contextual Bandits

Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity

Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes

Label Ranking through Nonparametric Regression

Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost

A Simple Unified Framework for High Dimensional Bandit Problems

A Reduction from Linear Contextual Bandits Lower Bounds to Estimations Lower Bounds

Branching Reinforcement Learning

Fast rates for noisy interpolation require rethinking the effect of inductive bias

Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path

Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling

Entropic Gromov-Wasserstein between Gaussian Distributions

No-Regret Learning in Partially-Informed Auctions

On Last-Iterate Convergence Beyond Zero-Sum Games

Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games

Fictitious Play and Best-Response Dynamics in Identical Interest and Zero-Sum Stochastic Games

On the Convergence of Inexact Predictor-Corrector Methods for Linear Programming

Nested Bandits

Information Discrepancy in Strategic Learning

UnderGrad: A Universal Black-Box Optimization Method with Almost Dimension-Free Convergence Rate Guarantees

Safe Learning in Tree-Form Sequential Decision Making: Handling Hard and Soft Constraints

A Marriage between Adversarial Team Games and 2-player Games: Enabling Abstractions, No-regret Learning, and Subgame Solving

Exact Learning of Preference Structure: Single-peaked Preferences and Beyond

Selling Data To a Machine Learner: Pricing via Costly Signaling

Hardness and Algorithms for Robust and Sparse Optimization

A Convergent and Dimension-Independent Min-Max Optimization Algorithm

Stochastic Continuous Submodular Maximization: Boosting via Non-oblivious Function

Accelerated Gradient Methods for Geodesically Convex Optimization: Tractable Algorithms and Convergence Analysis

The Complexity of k-Means Clustering when Little is Known

Iterative Hard Thresholding with Adaptive Regularization: Sparser Solutions Without Sacrificing Runtime

3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation

Nearly Optimal Catoni’s M-estimator for Infinite Variance

Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk

Local Linear Convergence of Douglas-Rachford for Linear Programming: a Probabilistic Analysis

Contextual Information-Directed Sampling

Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits

Universal and data-adaptive algorithms for model selection in linear contextual bandits

Regret Minimization with Performative Feedback

A Simple yet Universal Strategy for Online Convex Optimization

Deep Hierarchy in Bandits

Distributionally-Aware Kernelized Bandit Problems for Risk Aversion

Asymptotically-Optimal Gaussian Bandits with Side Observations

Learning from a Learning User for Optimal Recommendations

Thresholded Lasso Bandit

Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences

Decentralized Online Convex Optimization in Networked Systems

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach

Sketching Algorithms and Lower Bounds for Ridge Regression

On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning

Utility Theory for Sequential Decision Making

Online Learning with Knapsacks: the Best of Both Worlds

Optimal Clustering with Noisy Queries via Multi-Armed Bandit

(ends 5:00 PM)

4 p.m.

FRI 22 JUL

3:30 a.m.

Break:

Breakfast on your own

(ends 3:45 AM)

4 a.m.

Registration Check-in Desk

(ends 4:00 PM)

5 a.m.

Workshop:

The 1st Workshop on Healthcare AI and COVID-19

(ends 3:00 PM)

5:30 a.m.

Workshop:

ICML 2022 Workshop on Computational Biology

(ends 2:30 PM)

5:40 a.m.

Workshop:

Adaptive Experimental Design and Active Learning in the Real World

(ends 4:30 PM)

5:45 a.m.

Workshop:

Topology, Algebra, and Geometry in Machine Learning (TAG-ML)

(ends 3:00 PM)

Workshop:

ICML workshop on Machine Learning for Cybersecurity (ICML-ML4Cyber)

(ends 3:30 PM)

Workshop:

Spurious correlations, Invariance, and Stability (SCIS)

(ends 4:30 PM)

Workshop:

Knowledge Retrieval and Language Models

(ends 2:40 PM)

Workshop:

Workshop on Formal Verification of Machine Learning

(ends 3:00 PM)

Workshop:

Beyond Bayes: Paths Towards Universal Reasoning Systems

(ends 3:00 PM)

Workshop:

Machine Learning for Astrophysics

(ends 5:00 PM)

Workshop:

DataPerf: Benchmarking Data for Data-Centric AI

(ends 3:00 PM)

5:50 a.m.

Workshop:

New Frontiers in Adversarial Machine Learning

(ends 2:10 PM)

Workshop:

1st ICML 2022 Workshop on Safe Learning for Autonomous Driving (SL4AD)

(ends 3:15 PM)

5:55 a.m.

Workshop:

Machine Learning for Audio Synthesis

(ends 3:00 PM)

6 a.m.

Workshop:

Theory and Practice of Differential Privacy

(ends 3:00 PM)

Workshop:

Dynamic Neural Networks

(ends 5:00 PM)

Workshop:

Workshop on Machine Learning in Computational Design

(ends 2:15 PM)

Workshop:

Decision Awareness in Reinforcement Learning

(ends 5:00 PM)

Workshop:

Shift happens: Crowdsourcing metrics and test datasets beyond ImageNet

(ends 4:15 PM)

7 a.m.

Coffee Break:

Coffee Break

(ends 7:30 AM)

9 a.m.

Lunch - on your own:

Lunch

(ends 10:30 AM)

noon

Coffee Break:

Coffee Break

(ends 12:30 PM)

4 p.m.

SAT 23 JUL

3:30 a.m.

Break:

Breakfast on your own

(ends 3:45 AM)

4 a.m.

Registration Check-in Desk

(ends 9:00 AM)

5:30 a.m.

Workshop:

AI for Agent-Based Modelling (AI4ABM)

(ends 2:30 PM)

5:45 a.m.

Workshop:

Hardware-aware efficient training (HAET)

(ends 2:30 PM)

Workshop:

Complex feedback in online learning

(ends 3:00 PM)

5:50 a.m.

Workshop:

The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward

(ends 2:30 PM)

5:55 a.m.

Workshop:

Updatable Machine Learning

(ends 2:30 PM)

Workshop:

Workshop on Human-Machine Collaboration and Teaming

(ends 2:00 PM)

6 a.m.

Workshop:

Disinformation Countermeasures and Machine Learning (DisCoML)

(ends 3:00 PM)

Workshop:

AI for Science

(ends 3:00 PM)

Workshop:

Continuous Time Perspectives in Machine Learning

(ends 3:00 PM)

Workshop:

Responsible Decision Making in Dynamic Environments

(ends 2:30 PM)

Workshop:

The ICML Expressive Vocalizations (ExVo) Workshop and Competition 2022

(ends 2:00 PM)

Workshop:

Principles of Distribution Shift (PODS)

(ends 2:40 PM)

6:15 a.m.

Workshop:

2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH)

(ends 2:30 PM)

Affinity Workshop:

Queer in AI @ ICML 2022 Affinity Workshop

(ends 12:00 PM)

6:20 a.m.

Workshop:

Workshop on Distribution-Free Uncertainty Quantification

(ends 2:45 PM)

7 a.m.

Coffee Break:

Coffee Break

(ends 7:30 AM)

9 a.m.

Lunch - on your own:

Lunch on your own

(ends 10:30 AM)

noon

Coffee Break:

Coffee Break

(ends 12:30 PM)