6 a.m.
7 a.m.
9 a.m.
Expo Talk Panel:
(ends 1:45 PM)
Expo Demonstration:
(ends 2:00 PM)
11:30 a.m.
Coffee Break
12:15 p.m.
1:15 p.m.
Expo Talk Panel:
(ends 2:00 PM)
2:30 p.m.
Opening Reception - Catered:
(ends 4:00 PM)

4 a.m.
5 a.m.
Affinity Workshop:
(ends 5:00 PM)
5:30 a.m.
Affinity Workshop:
(ends 3:40 PM)
6 a.m.
Affinity Workshop:
(ends 1:30 PM)
7 a.m.
Coffee Break
9 a.m.
Lunch Break - on your own
11 a.m.
Coffee Break
4 p.m.

3:30 a.m.
Breakfast on your own
4 a.m.
5:45 a.m.
(ends 6:00 AM)
6 a.m.
Invited Talk:
Weinan E
(ends 7:00 AM)
7 a.m.
Coffee Break
7:30 a.m.
Spotlights 7:30-8:05
[7:30] Differentially Private Approximate Quantiles
[7:35] Fairness Interventions as (Dis)Incentives for Strategic Manipulation
[7:40] Robust Models Are More Interpretable Because Attributions Look Normal
[7:45] Sequential Covariate Shift Detection Using Classifier Two-Sample Tests
[7:50] A Joint Exponential Mechanism For Differentially Private Top-$k$
[7:55] Transfer Learning In Differential Privacy's Hybrid-Model
[8:00] Robust Kernel Density Estimation with Median-of-Means principle
Orals 8:05-8:25
[8:05] Bounding Training Data Reconstruction in Private (Deep) Learning
Spotlights 8:25-9:00
[8:25] Plug & Play Attacks: Towards Robust and Flexible Model Inversion Attacks
[8:30] FriendlyCore: Practical Differentially Private Aggregation
[8:35] ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder
[8:40] Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification
[8:45] Public Data-Assisted Mirror Descent for Private Model Training
[8:50] Low-Complexity Deep Convolutional Neural Networks on Fully Homomorphic Encryption Using Multiplexed Parallel Convolutions
[8:55] Robin Hood and Matthew Effects: Differential Privacy Has Disparate Impact on Synthetic Data
(ends 9:00 AM)
Orals 7:30-7:50
[7:30] Tackling covariate shift with node-based Bayesian neural networks
Spotlights 7:50-8:10
[7:50] Why the Rich Get Richer? On the Balancedness of Random Partition Models
[7:55] A Completely Tuning-Free and Robust Approach to Sparse Precision Matrix Estimation
[8:00] Markov Chain Monte Carlo for Continuous-Time Switching Dynamical Systems
[8:05] Calibrated Learning to Defer with One-vs-All Classifiers
Orals 8:10-8:30
[8:10] Tractable Uncertainty for Structure Learning
Spotlights 8:30-8:55
[8:30] DNA: Domain Generalization with Diversified Neural Averaging
[8:35] Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces
[8:40] DynaMixer: A Vision MLP Architecture with Dynamic Mixing
[8:45] Channel Importance Matters in Few-Shot Image Classification
[8:50] Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
(ends 9:00 AM)
Spotlights 7:30-8:00
[7:30] Dynamic Regret of Online Markov Decision Processes
[7:35] On the Impossibility of Learning to Cooperate with Adaptive Partner Strategies in Repeated Games
[7:40] Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning
[7:45] Provable Reinforcement Learning with a Short-Term Memory
[7:50] Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer
[7:55] Mirror Learning: A Unifying Framework of Policy Optimisation
Orals 8:00-8:20
[8:00] Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP
Spotlights 8:20-8:50
[8:20] Learning Infinite-horizon Average-reward Markov Decision Process with Constraints
[8:25] A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning
[8:30] Langevin Monte Carlo for Contextual Bandits
[8:35] Prompting Decision Transformer for Few-Shot Policy Generalization
[8:40] Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
[8:45] Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
(ends 9:00 AM)
Orals 7:30-7:50
[7:30] Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
Spotlights 7:50-8:10
[7:50] ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks
[7:55] Provably Adversarially Robust Nearest Prototype Classifiers
[8:00] Certifying Out-of-Domain Generalization for Blackbox Functions
[8:05] Intriguing Properties of Input-Dependent Randomized Smoothing
Orals 8:10-8:30
[8:10] To Smooth or Not? When Label Smoothing Meets Noisy Labels
Spotlights 8:30-8:55
[8:30] Evaluating the Adversarial Robustness of Adaptive Test-time Defenses
[8:35] On the Generalization Analysis of Adversarial Learning
[8:40] Demystifying the Adversarial Robustness of Random Transformation Defenses
[8:45] Double Sampling Randomized Smoothing
[8:50] TPC: Transformation-Specific Smoothing for Point Cloud Models
(ends 9:00 AM)
Spotlights 7:30-8:00
[7:30] Certified Robustness Against Natural Language Attacks by Causal Intervention
[7:35] A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
[7:40] On the Learning of Non-Autoregressive Transformers
[7:45] Latent Diffusion Energy-Based Model for Interpretable Text Modelling
[7:50] UNIREX: A Unified Learning Framework for Language Model Rationale Extraction
[7:55] Black-Box Tuning for Language-Model-as-a-Service
Orals 8:00-8:20
[8:00] Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Spotlights 8:20-9:00
[8:20] Co-training Improves Prompt-based Learning for Large Language Models
[8:25] Directed Acyclic Transformer for Non-Autoregressive Machine Translation
[8:30] StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models
[8:35] Unsupervised Detection of Contextualized Embedding Bias with Application to Ideology
[8:40] Generative Cooperative Networks for Natural Language Generation
[8:45] What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization?
[8:50] Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
[8:55] ROCK: Causal Inference Principles for Reasoning about Commonsense Causality
(ends 9:00 AM)
Orals 7:30-7:50
[7:30] Exact Optimal Accelerated Complexity for Fixed-Point Iterations
Spotlights 7:50-8:15
[7:50] Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions
[7:55] NysADMM: faster composite convex optimization via low-rank approximation
[8:00] FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type Method for Federated Learning
[8:05] Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers
[8:10] Pairwise Conditional Gradients without Swap Steps and Sparser Kernel Herding
Orals 8:15-8:35
[8:15] Continuous-Time Analysis of Accelerated Gradient Methods via Conservation Laws in Dilated Coordinate Systems
Spotlights 8:35-9:00
[8:35] Only tails matter: Average-Case Universality and Robustness in the Convex Regime
[8:40] Batch Greenkhorn Algorithm for Entropic-Regularized Multimarginal Optimal Transport: Linear Rate of Convergence and Iteration Complexity
[8:45] Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets
[8:50] Neural Fisher Discriminant Analysis: Optimal Neural Network Embeddings in Polynomial Time
[8:55] Active Sampling for Min-Max Fairness
(ends 9:00 AM)
Orals 7:30-7:50
[7:30] Online Learning for Min Sum Set Cover and Pandora’s Box
Spotlights 7:50-8:15
[7:50] Smoothed Adversarial Linear Contextual Bandits with Knapsacks
[7:55] Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback
[8:00] Thompson Sampling for (Combinatorial) Pure Exploration
[8:05] Revisiting Online Submodular Minimization: Gap-Dependent Regret Bounds, Best of Both Worlds and Adversarial Robustness
[8:10] Rotting Infinitely Many-Armed Bandits
Orals 8:15-8:35
[8:15] Batched Dueling Bandits
Spotlights 8:35-9:00
[8:35] Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent
[8:40] Consistent Polyhedral Surrogates for Top-k Classification and Variants
[8:45] Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models
[8:50] Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
[8:55] Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback
(ends 9:00 AM)
Spotlights 7:30-8:05
[7:30] Multi-Task Learning as a Bargaining Game
[7:35] Frustratingly Easy Transferability Estimation
[7:40] Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling
[7:45] A Difference Standardization Method for Mutual Transfer Learning
[7:50] Improving Task-free Continual Learning by Distributionally Robust Memory Evolution
[7:55] A Multi-objective / Multi-task Learning Framework Induced by Pareto Stationarity
[8:00] Sparse Invariant Risk Minimization
Orals 8:05-8:25
[8:05] Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning
Spotlights 8:25-9:00
[8:25] A Closer Look at Smoothness in Domain Adversarial Training
[8:30] Balancing Discriminability and Transferability for Source-Free Domain Adaptation
[8:35] Model Agnostic Sample Reweighting for Out-of-Distribution Learning
[8:40] Zero-shot AutoML with Pretrained Models
[8:45] Efficient Variance Reduction for Meta-learning
[8:50] Generalizing to Evolving Domains with Latent Structure-Aware Sequential Autoencoder
[8:55] Partial disentanglement for domain adaptation
(ends 9:00 AM)
Spotlights 7:30-8:05
[7:30] Structural Entropy Guided Graph Hierarchical Pooling
[7:35] Self-Supervised Representation Learning via Latent Graph Prediction
[7:40] DSTAGNN: Dynamic Spatial-Temporal Aware Graph Neural Network for Traffic Flow Forecasting
[7:45] Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets
[7:50] Omni-Granular Ego-Semantic Propagation for Self-Supervised Graph Representation Learning
[7:55] Analyzing and Mitigating Interference in Neural Architecture Search
[8:00] Reverse Engineering $\ell_p$ attacks: A block-sparse optimization approach with recovery guarantees
Orals 8:05-8:25
[8:05] Unified Scaling Laws for Routed Language Models
Spotlights 8:25-9:00
[8:25] DRAGONN: Distributed Randomized Approximate Gradients of Neural Networks
[8:30] A deep convolutional neural network that is invariant to time rescaling
[8:35] LyaNet: A Lyapunov Framework for Training Neural ODEs
[8:40] Transfer and Marginalize: Explaining Away Label Noise with Privileged Information
[8:45] On Collective Robustness of Bagging Against Data Poisoning
[8:50] Hindering Adversarial Attacks with Implicit Neural Representations
[8:55] From Noisy Prediction to True Label: Noisy Prediction Calibration via Generative Model
(ends 9:00 AM)
Spotlights 7:30-8:05
[7:30] Exploring and Exploiting Hubness Priors for High-Quality GAN Latent Sampling
[7:35] ButterflyFlow: Building Invertible Layers with Butterfly Matrices
[7:40] Controlling Conditional Language Models without Catastrophic Forgetting
[7:45] GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
[7:50] Structure-preserving GANs
[7:55] DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
[8:00] Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models
Orals 8:05-8:25
[8:05] Equivariant Diffusion for Molecule Generation in 3D
Spotlights 8:25-9:00
[8:25] Forward Operator Estimation in Generative Models with Kernel Transfer Operators
[8:30] Conditional GANs with Auxiliary Discriminative Classifier
[8:35] Improved StyleGAN-v2 based Inversion for Out-of-Distribution Images
[8:40] Matching Normalizing Flows and Probability Paths on Manifolds
[8:45] Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization
[8:50] Learning to Incorporate Texture Saliency Adaptive Attention to Image Cartoonization
[8:55] Region-Based Semantic Factorization in GANs
(ends 9:00 AM)
9 a.m.
Lunch Break - on your own
10:30 a.m.
Spotlights 10:30-11:00
[10:30] Online Continual Learning through Mutual Information Maximization
[10:35] Learning Iterative Reasoning through Energy Minimization
[10:40] DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks
[10:45] PoF: Post-Training of Feature Extractor for Improving Generalization
[10:50] Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation
[10:55] Set Based Stochastic Subsampling
Orals 11:00-11:20
[11:00] Monarch: Expressive Structured Matrices for Efficient and Accurate Training
Spotlights 11:20-11:55
[11:20] Generalizing to New Physical Systems via Context-Informed Dynamics Model
[11:25] Self-conditioning Pre-Trained Language Models
[11:30] TAM: Topology-Aware Margin Loss for Class-Imbalanced Node Classification
[11:35] Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization
[11:40] Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
[11:45] Knowledge Base Question Answering by Case-based Reasoning over Subgraphs
[11:50] When AUC meets DRO: Optimizing Partial AUC for Deep Learning with Non-Convex Convergence Guarantee
(ends 12:00 PM)
Spotlights 10:30-11:05
[10:30] Meaningfully debugging model mistakes using conceptual counterfactual explanations
[10:35] Measuring the Effect of Training Data on Deep Learning Predictions via Randomized Experiments
[10:40] Robust Counterfactual Explanations for Tree-Based Ensembles
[10:45] A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions
[10:50] Estimating and Penalizing Induced Preference Shifts in Recommender Systems
[10:55] Framework for Evaluating Faithfulness of Local Explanations
[11:00] A Consistent and Efficient Evaluation Strategy for Attribution Methods
Orals 11:05-11:25
[11:05] Training Characteristic Functions with Reinforcement Learning: XAI-methods play Connect Four
Spotlights 11:25-12:00
[11:25] Label-Descriptive Patterns and Their Application to Characterizing Classification Errors
[11:30] XAI for Transformers: Better Explanations through Conservative Propagation
[11:35] Quantification and Analysis of Layer-wise and Pixel-wise Information Discarding
[11:40] Interpretable Off-Policy Learning via Hyperbox Search
[11:45] Neuron Dependency Graphs: A Causal Abstraction of Neural Networks
[11:50] On the Adversarial Robustness of Causal Algorithmic Recourse
[11:55] Knowledge-Grounded Self-Rationalization via Extractive and Natural Language Explanations
(ends 12:00 PM)
Spotlights 10:30-11:00
[10:30] Robust Group Synchronization via Quadratic Programming
[10:35] UAST: Uncertainty-Aware Siamese Tracking
[10:40] You Only Cut Once: Boosting Data Augmentation with a Single Cut
[10:45] Generative Modeling for Multi-task Visual Learning
[10:50] HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning
[10:55] Parametric Visual Program Induction with Function Modularization
Orals 11:00-11:20
[11:00] Path-Gradient Estimators for Continuous Normalizing Flows
Spotlights 11:20-11:55
[11:20] Variational Feature Pyramid Networks
[11:25] Deep Neural Network Fusion via Graph Matching with Applications to Model Ensemble and Federated Learning
[11:30] VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
[11:35] Neural Implicit Dictionary Learning via Mixture-of-Expert Training
[11:40] Time Is MattEr: Temporal Self-supervision for Video Transformers
[11:45] Benchmarking and Analyzing Point Cloud Classification under Corruptions
[11:50] Understanding The Robustness in Vision Transformers
(ends 12:00 PM)
Orals 10:30-10:50
[10:30] Learning Mixtures of Linear Dynamical Systems
Spotlights 10:50-11:15
[10:50] Massively Parallel $k$-Means Clustering for Perturbation Resilient Instances
[10:55] Residual-Based Sampling for Online Outlier-Robust PCA
[11:00] Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times
[11:05] Streaming Algorithms for Support-Aware Histograms
[11:10] Power-Law Escape Rate of SGD
Orals 11:15-11:35
[11:15] Generalized Results for the Existence and Consistency of the MLE in the Bradley-Terry-Luce Model
Spotlights 11:35-12:00
[11:35] Faster Algorithms for Learning Convex Functions
[11:40] Feature selection using e-values
[11:45] ActiveHedge: Hedge meets Active Learning
[11:50] One-Pass Algorithms for MAP Inference of Nonsymmetric Determinantal Point Processes
[11:55] Deciphering Lasso-based Classification Through a Large Dimensional Analysis of the Iterative Soft-Thresholding Algorithm
(ends 12:00 PM)
Spotlights 10:30-11:00
[10:30] An iterative clustering algorithm for the Contextual Stochastic Block Model with optimality guarantees
[10:35] Smoothed Adaptive Weighting for Imbalanced Semi-Supervised Learning: Improve Reliability Against Unknown Distribution Data
[10:40] Class-Imbalanced Semi-Supervised Learning with Adaptive Thresholding
[10:50] Meta-Learning Hypothesis Spaces for Sequential Decision-making
[10:55] A Tighter Analysis of Spectral Clustering, and Beyond
Orals 11:00-11:20
[11:00] Online Active Regression
Spotlights 11:20-11:55
[11:20] On Finite-Sample Identifiability of Contrastive Learning-Based Nonlinear Independent Component Analysis
[11:25] Revisiting Contrastive Learning through the Lens of Neighborhood Component Analysis: an Integrated Framework
[11:30] Open-Sampling: Exploring Out-of-Distribution data for Re-balancing Long-tailed datasets
[11:35] Confidence Score for Source-Free Unsupervised Domain Adaptation
[11:40] Gradient Based Clustering
[11:45] Global Optimization of K-Center Clustering
[11:50] Latent Outlier Exposure for Anomaly Detection with Contaminated Data
(ends 12:00 PM)
Spotlights 10:30-11:00
[10:30] Additive Gaussian Processes Revisited
[10:35] Probabilistic ODE Solutions in Millions of Dimensions
[10:40] Adaptive Gaussian Process Change Point Detection
[10:45] Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes
[10:50] Fenrir: Physics-Enhanced Regression for Initial Value Problems
[10:55] Variational nearest neighbor Gaussian process
Orals 11:00-11:20
[11:00] Preconditioning for Scalable Gaussian Process Hyperparameter Optimization
Spotlights 11:20-11:50
[11:20] Spectral Representation of Robustness Measures for Optimization Under Input Uncertainty
[11:25] Bayesian Optimization under Stochastic Delayed Feedback
[11:30] Bayesian Optimization for Distributionally Robust Chance-constrained Problem
[11:35] Efficient Distributionally Robust Bayesian Optimization with Worst-case Sensitivity
[11:40] Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning
[11:45] Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation
(ends 12:00 PM)
Orals 10:30-10:50
[10:30] Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
Spotlights 10:50-11:15
[10:50] AnyMorph: Learning Transferable Polices By Inferring Agent Morphology
[10:55] DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations
[11:00] Stabilizing Off-Policy Deep Reinforcement Learning from Pixels
[11:05] Influence-Augmented Local Simulators: a Scalable Solution for Fast Deep RL in Large Networked Systems
[11:10] CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer
Orals 11:15-11:35
[11:15] Offline RL Policies Should Be Trained to be Adaptive
Spotlights 11:35-12:00
[11:35] Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control
[11:40] PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration
[11:45] Supervised Off-Policy Ranking
[11:50] The Primacy Bias in Deep Reinforcement Learning
[11:55] Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
(ends 12:00 PM)
Orals 10:30-10:50
[10:30] Topology-Aware Network Pruning using Multi-stage Graph Embedding and Reinforcement Learning
Spotlights 10:50-11:10
[10:50] Stochastic Reweighted Gradient Descent
[10:55] Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood
[11:00] Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging
[11:05] FedNL: Making Newton-Type Methods Applicable to Federated Learning
Orals 11:10-11:30
[11:10] Solving Stackelberg Prediction Game with Least Squares Loss via Spherically Constrained Least Squares Reformulation
Spotlights 11:30-11:55
[11:30] Dimension-free Complexity Bounds for High-order Nonconvex Finite-sum Optimization
[11:35] Value Function based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems
[11:40] Probabilistic Bilevel Coreset Selection
[11:45] Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs
[11:50] On Implicit Bias in Overparameterized Bilevel Optimization
(ends 12:00 PM)
Spotlights 10:30-11:05
[10:30] pathGCN: Learning General Graph Spatial Operators from Paths
[10:35] Graph-Coupled Oscillator Networks
[10:40] HousE: Knowledge Graph Embedding with Householder Parameterization
[10:45] Interpretable and Generalizable Graph Learning via Stochastic Attention Mechanism
[10:50] ProGCL: Rethinking Hard Negative Mining in Graph Contrastive Learning
[10:55] G$^2$CN: Graph Gaussian Convolution Networks with Concentrated Graph Filters
[11:00] SpeqNets: Sparsity-aware permutation-equivariant graph networks
Orals 11:05-11:25
[11:05] data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Spotlights 11:25-11:55
[11:25] Position Prediction as an Effective Pretraining Strategy
[11:30] Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering
[11:35] Deep and Flexible Graph Neural Architecture Search
[11:40] GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks
[11:45] Large-Scale Graph Neural Architecture Search
[11:50] Optimization-Induced Graph Implicit Nonlinear Diffusion
(ends 12:00 PM)
Orals 10:30-10:50
[10:30] Robustness Implies Generalization via Data-Dependent Generalization Bounds
Spotlights 10:50-11:15
[10:50] Learning to Hash Robustly, Guaranteed
[10:55] Policy Gradient Method For Robust Reinforcement Learning
[11:00] A query-optimal algorithm for finding counterfactuals
[11:05] Linear Bandit Algorithms with Sublinear Time Complexity
[11:10] Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra
Orals 11:15-11:35
[11:15] Individual Preference Stability for Clustering
Spotlights 11:35-12:00
[11:35] Correlated Quantization for Distributed Mean Estimation and Optimization
[11:40] Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms
[11:45] Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms
[11:50] The Algebraic Path Problem for Graph Metrics
[11:55] Steerable 3D Spherical Neurons
(ends 12:00 PM)
Coffee Break
12:30 p.m.
(ends 1:00 PM)
1 p.m.
Short Break
1:15 p.m.
Spotlights 1:15-1:50
[1:15] Prototype Based Classification from Hierarchy to Fairness
[1:20] Neural-Symbolic Models for Logical Queries on Knowledge Graphs
[1:25] Deep Probability Estimation
[1:30] Uncertainty Modeling in Generative Compressed Sensing
[1:35] Going Deeper into Permutation-Sensitive Graph Neural Networks
[1:40] Learning from Counterfactual Links for Link Prediction
[1:45] Training Discrete Deep Generative Models via Gapped Straight-Through Estimator
Orals 1:50-2:10
[1:50] Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations
Spotlights 2:10-2:45
[2:10] Principal Component Flows
[2:15] Bit Prioritization in Variational Autoencoders via Progressive Coding
[2:20] Generative Flow Networks for Discrete Probabilistic Modeling
[2:25] Diffusion bridges vector quantized variational autoencoders
[2:30] Mitigating Modality Collapse in Multimodal VAEs via Impartial Optimization
[2:35] Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation
[2:40] Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack
(ends 2:45 PM)
Spotlights 1:15-1:50
[1:15] Coordinated Double Machine Learning
[1:20] Exploiting Independent Instruments: Identification and Distribution Generalization
[1:25] Partial Counterfactual Identification from Observational and Experimental Data
[1:30] On Measuring Causal Contributions via do-interventions
[1:35] The Role of Deconfounding in Meta-learning
[1:40] CITRIS: Causal Identifiability from Temporal Intervened Sequences
[1:45] Online Balanced Experimental Design
Orals 1:50-2:10
[1:50] Minimum Cost Intervention Design for Causal Effect Identification
Spotlights 2:10-2:45
[2:10] Causal structure-based root cause analysis of outliers
[2:15] Instrumental Variable Regression with Confounder Balancing
[2:20] Causal Transformer for Estimating Counterfactual Outcomes
[2:25] Causal Inference Through the Structural Causal Marginal Problem
[2:30] Functional Generalized Empirical Likelihood Estimation for Conditional Moment Restrictions
[2:35] Matching Learned Causal Effects of Neural Networks with Domain Priors
[2:40] Inferring Cause and Effect in the Presence of Heteroscedastic Noise
(ends 2:45 PM)
Orals 1:15-1:35
[1:15] POEM: Out-of-Distribution Detection with Posterior Sampling
Spotlights 1:35-1:55
[1:35] Selective Network Linearization for Efficient Private Inference
[1:40] Efficient Computation of Higher-Order Subgraph Attribution via Message Passing
[1:45] A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization
[1:50] Modular Conformal Calibration
Orals 1:55-2:15
[1:55] Rethinking Image-Scaling Attacks: The Interplay Between Vulnerabilities in Machine Learning Systems
Spotlights 2:15-2:40
[2:15] Context-Aware Drift Detection
[2:20] Accelerating Shapley Explanation via Contributive Cooperator Selection
[2:25] An Equivalence Between Data Poisoning and Byzantine Gradient Attacks
[2:30] DAVINZ: Data Valuation using Deep Neural Networks at Initialization
[2:35] Sample Efficient Learning of Predictors that Complement Humans
(ends 2:45 PM)
Orals 1:15-1:35
[1:15] H-Consistency Bounds for Surrogate Loss Minimizers
Spotlights 1:35-2:00
[1:35] Learning General Halfspaces with Adversarial Label Noise via Online Gradient Descent
[1:40] The Teaching Dimension of Regularized Kernel Learners
[1:45] Sparse Mixed Linear Regression with Guarantees: Taming an Intractable Problem with Invex Relaxation
[1:50] TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm
[1:55] Multiclass learning with margin: exponential rates with no bias-variance trade-off
Orals 2:00-2:20
[2:00] Refined Convergence Rates for Maximum Likelihood Estimation under Finite Mixture Models
Spotlights 2:20-2:45
[2:20] High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails
[2:25] An Initial Alignment between Neural Network and Target is Needed for Gradient Descent to Learn
[2:30] Inductive Biases and Variable Creation in Self-Attention Mechanisms
[2:35] Topology-aware Generalization of Decentralized SGD
[2:40] Understanding Gradient Descent on the Edge of Stability in Deep Learning
(ends 2:45 PM)
Spotlights 1:15-1:45
[1:15] Bayesian Nonparametric Learning for Point Processes with Spatial Homogeneity: A Spatial Analysis of NBA Shot Locations
[1:20] On the Effects of Artificial Data Modification
[1:25] Deep Squared Euclidean Approximation to the Levenshtein Distance for DNA Storage
[1:30] How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models
[1:35] Error-driven Input Modulation: Solving the Credit Assignment Problem without a Backward Pass
[1:40] How to Train Your Wide Neural Network Without Backprop: An Input-Weight Alignment Perspective
Orals 1:45-2:05
[1:45] Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness
Spotlights 2:05-2:45
[2:05] Describing Differences between Text Distributions with Natural Language
[2:10] Distinguishing rule- and exemplar-based generalization in learning systems
[2:15] Burst-Dependent Plasticity and Dendritic Amplification Support Target-Based Learning and Hierarchical Imitation Learning
[2:20] A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications
[2:25] Minimizing Control for Credit Assignment with Strong Feedback
[2:30] Self-Supervised Models of Audio Effectively Explain Human Cortical Responses to Speech
[2:35] Towards Scaling Difference Target Propagation by Learning Backprop Targets
[2:40] Content Addressable Memory Without Catastrophic Forgetting by Heteroassociation with a Fixed Scaffold
(ends 2:45 PM)
Orals 1:15-1:35
[1:15] Scalable MCMC Sampling for Nonsymmetric Determinantal Point Processes
Spotlights 1:35-2:00
[1:35] Robust SDE-Based Variational Formulations for Solving Linear PDEs via Deep Learning
[1:40] Hessian-Free High-Resolution Nesterov Acceleration For Sampling
[1:45] LSB: Local Self-Balancing MCMC in Discrete Spaces
[1:50] A Langevin-like Sampler for Discrete Distributions
[1:55] Scalable Spike-and-Slab
Orals 2:00-2:20
[2:00] Nonparametric Involutive Markov Chain Monte Carlo
Spotlights 2:20-2:45
[2:20] Continual Repeated Annealed Flow Transport Monte Carlo
[2:25] Algorithms for the Communication of Samples
[2:30] Low-Precision Stochastic Gradient Langevin Dynamics
[2:35] Fast Relative Entropy Coding with A* coding
[2:40] Accurate Quantization of Measures via Interacting Particle-based Optimization
(ends 2:45 PM)
Spotlights 1:15-1:45
[1:15] Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective
[1:20] Convergence and Recovery Guarantees of the K-Subspaces Method for Subspace Clustering
[1:25] Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity
[1:30] Understanding the unstable convergence of gradient descent
[1:35] Federated Minimax Optimization: Improved Convergence Analyses and Algorithms
[1:40] Inductive Matrix Completion: No Bad Local Minima and a Fast Algorithm
Orals 1:45-2:05
[1:45] FedNest: Federated Bilevel, Minimax, and Compositional Optimization
Spotlights 2:05-2:35
[2:05] AdaGrad Avoids Saddle Points
[2:10] Fast and Provable Nonconvex Tensor RPCA
[2:15] On Convergence of Gradient Descent Ascent: A Tight Local Analysis
[2:20] Convergence Rates of Non-Convex Stochastic Gradient Descent Under a Generic Lojasiewicz Condition and Local Smoothness
[2:25] A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization
[2:30] Anticorrelated Noise Injection for Improved Generalization
(ends 2:45 PM)
Spotlights 1:15-1:45
[1:15] Model-Free Opponent Shaping
[1:20] Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning
[1:25] Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation
[1:30] Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning
[1:35] Scalable Deep Reinforcement Learning Algorithms for Mean Field Games
[1:40] Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning
Orals 1:45-2:05
[1:45] Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence
Spotlights 2:05-2:45
[2:05] Self-Organized Polynomial-Time Coordination Graphs
[2:10] Individual Reward Assisted Multi-Agent Reinforcement Learning
[2:15] Generalized Beliefs for Cooperative AI
[2:20] Greedy when Sure and Conservative when Uncertain about the Opponents
[2:25] Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning
[2:30] Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy
[2:35] Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games
[2:40] Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis
(ends 2:45 PM)
Spotlights 1:15-1:45
[1:15] Modeling Irregular Time Series with Continuous Recurrent Units
[1:20] TACTiS: Transformer-Attentional Copulas for Time Series
[1:25] CerDEQ: Certifiable Deep Equilibrium Model
[1:30] Approximately Equivariant Networks for Imperfectly Symmetric Dynamics
[1:35] IDYNO: Learning Nonparametric DAGs from Interventional Dynamic Data
[1:40] GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing
Orals 1:45-2:05
[1:45] Neural Laplace: Learning diverse classes of differential equations in the Laplace domain
Spotlights 2:05-2:45
[2:05] Improving Language Models by Retrieving from Trillions of Tokens
[2:10] Closed-Form Diffeomorphic Transformations for Time Series Alignment
[2:15] Removing Batch Normalization Boosts Adversarial Training
[2:20] Forget-free Continual Learning with Winning Subnetworks
[2:25] FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting
[2:30] Adversarial Robustness against Multiple and Single $l_p$-Threat Models via Quick Fine-Tuning of Robust Classifiers
[2:35] On the Practicality of Deterministic Epistemic Uncertainty
[2:40] Combining Diverse Feature Priors
(ends 2:45 PM)
Orals 1:15-1:35
[1:15] Cooperative Online Learning in Stochastic and Adversarial MDPs
Spotlights 1:35-2:00
[1:35] Simple and near-optimal algorithms for hidden stratification and multi-group learning
[1:40] Being Properly Improper
[1:45] Neural Network Pruning Denoises the Features and Makes Local Connectivity Emerge in Visual Tasks
[1:50] On the Finite-Time Complexity and Practical Computation of Approximate Stationarity Concepts of Lipschitz Functions
[1:55] Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
Orals 2:00-2:20
[2:00] Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces
Spotlights 2:20-2:45
[2:20] Minimax M-estimation under Adversarial Contamination
[2:25] Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits
[2:30] Efficiently Learning the Topology and Behavior of a Networked Dynamical System Via Active Queries
[2:35] Boosting Graph Structure Learning with Dummy Nodes
[2:40] Lazy Estimation of Variable Importance for Large Neural Networks
(ends 2:45 PM)
3:30 p.m.
Posters 3:30-5:30
(ends 5:30 PM)
4 p.m.

3:30 a.m.
Breakfast on your own
4 a.m.
7 a.m.
Coffee Break
7:30 a.m.
Spotlights 7:30-8:05
[7:30] Towards understanding how momentum improves generalization in deep learning
[7:35] What Can Linear Interpolation of Neural Network Loss Landscapes Tell Us?
[7:40] Deep equilibrium networks are sensitive to initialization statistics
[7:45] Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework
[7:50] Stability Based Generalization Bounds for Exponential Family Langevin Dynamics
[7:55] Local Augmentation for Graph Neural Networks
[8:00] On Non-local Convergence Analysis of Deep Linear Networks
Orals 8:05-8:25
[8:05] Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum
Spotlights 8:25-9:00
[8:25] Diversified Adversarial Attacks based on Conjugate Gradient Method
[8:30] On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features
[8:35] On the Equivalence Between Temporal and Static Equivariant Graph Representations
[8:40] Robust Training under Label Noise by Over-parameterization
[8:45] Implicit Bias of the Step Size in Linear Diagonal Neural Networks
[8:50] Extended Unconstrained Features Model for Exploring Deep Neural Collapse
[8:55] Score-Guided Intermediate Level Optimization: Fast Langevin Mixing for Inverse Problems
(ends 9:00 AM)
Spotlights 7:30-8:05
[7:30] Weisfeiler-Lehman Meets Gromov-Wasserstein
[7:35] GenLabel: Mixup Relabeling using Generative Models
[7:40] When and How Mixup Improves Calibration
[7:45] On Transportation of Mini-batches: A Hierarchical Approach
[7:50] VariGrow: Variational Architecture Growing for Task-Agnostic Continual Learning based on Bayesian Novelty
[7:55] Beyond Images: Label Noise Transition Matrix Estimation for Tasks with Lower-Quality Features
[8:00] A Model-Agnostic Randomized Learning Framework based on Random Hypothesis Subspace Sampling
Orals 8:05-8:25
[8:05] Stable Conformal Prediction Sets
Spotlights 8:25-9:00
[8:25] Rethinking Fano’s Inequality in Ensemble Learning
[8:30] FITNESS: (Fine Tune on New and Similar Samples) to detect anomalies in streams with drift and outliers
[8:35] Improving Mini-batch Optimal Transport via Partial Transportation
[8:40] Near-optimal rate of consistency for linear models with missing values
[8:45] Permutation Search of Tensor Network Structures via Local Sampling
[8:50] Revisiting Label Smoothing and Knowledge Distillation Compatibility: What was Missing?
[8:55] DNNR: Differential Nearest Neighbors Regression
(ends 9:00 AM)
Spotlights 7:30-8:00
[7:30] Learning Domain Adaptive Object Detection with Probabilistic Teacher
[7:35] Adaptive Data Analysis with Correlated Observations
[7:40] Efficient PAC Learning from the Crowd with Pairwise Comparisons
[7:45] On the Statistical Benefits of Curriculum Learning
[7:50] Feature and Parameter Selection in Stochastic Linear Bandits
[7:55] Disentangled Federated Learning for Tackling Attributes Skew via Invariant Aggregation and Diversity Transferring
Orals 8:00-8:20
[8:00] A new similarity measure for covariate shift with applications to nonparametric regression
Spotlights 8:20-9:00
[8:20] Contextual Bandits with Large Action Spaces: Made Practical
[8:25] Identifiability Conditions for Domain Adaptation
[8:30] Streaming Algorithms for High-Dimensional Robust Statistics
[8:35] Popular decision tree algorithms are provably noise tolerant
[8:40] Understanding and Improving Knowledge Graph Embedding for Entity Alignment
[8:45] Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning
[8:50] Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
[8:55] Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond
(ends 9:00 AM)
Spotlights 7:30-8:05
[7:30] Skin Deep Unlearning: Artefact and Instrument Debiasing in the Context of Melanoma Classification
[7:35] One-Pass Diversified Sampling with Application to Terabyte-Scale Genomic Sequence Streams
[7:40] Unsupervised Flow-Aligned Sequence-to-Sequence Learning for Video Restoration
[7:45] ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases
[7:50] Variational Mixtures of ODEs for Inferring Cellular Gene Expression Dynamics
[7:55] Bayesian Imitation Learning for End-to-End Mobile Manipulation
[8:00] De novo mass spectrometry peptide sequencing with a transformer model
Orals 8:05-8:25
[8:05] Learning inverse folding from millions of predicted structures
Spotlights 8:25-9:00
[8:25] Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance
[8:30] MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection
[8:35] Proximal Exploration for Model-guided Protein Sequence Design
[8:40] Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval
[8:45] How to Fill the Optimum Set? Population Gradient Descent with Harmless Diversity
[8:50] Examining Scaling and Transfer of Language Model Architectures for Machine Translation
[8:55] State Transition of Dendritic Spines Improves Learning of Sparse Spiking Neural Networks
(ends 9:00 AM)
Orals 7:30-7:50
[7:30] How Tempering Fixes Data Augmentation in Bayesian Neural Networks
Spotlights 7:50-8:15
[7:50] Surrogate Likelihoods for Variational Annealed Importance Sampling
[7:55] Nonparametric Sparse Tensor Factorization with Hierarchical Gamma Processes
[8:00] Fat-Tailed Variational Inference with Anisotropic Tail Adaptive Flows
[8:05] Variational Sparse Coding with Learned Thresholding
[8:10] Structured Stochastic Gradient MCMC
Orals 8:15-8:35
[8:15] BAMDT: Bayesian Additive Semi-Multivariate Decision Trees for Nonparametric Regression
Spotlights 8:35-8:50
[8:35] Variational Inference with Locally Enhanced Bounds for Hierarchical Models
[8:40] Centroid Approximation for Bootstrap: Improving Particle Quality at Inference
[8:45] Deep Reference Priors: What is the best way to pretrain a model?
(ends 9:00 AM)
Spotlights 7:30-8:05
[7:30] Modeling Strong and Human-Like Gameplay with KL-Regularized Search
[7:35] Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters
[7:40] Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning
[7:45] Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models and Amortized Policy Search
[7:50] Generalized Data Distribution Iteration
[7:55] Optimizing Tensor Network Contraction Using Reinforcement Learning
[8:00] History Compression via Language Models in Reinforcement Learning
Orals 8:05-8:25
[8:05] REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer
Spotlights 8:25-9:00
[8:25] LeNSE: Learning To Navigate Subgraph Embeddings for Large-Scale Combinatorial Optimisation
[8:30] Efficient Learning for AlphaZero via Path Consistency
[8:35] A data-driven approach for learning to control computers
[8:40] Zero-Shot Reward Specification via Grounded Natural Language
[8:45] How to Stay Curious while avoiding Noisy TVs using Aleatoric Uncertainty Estimation
[8:50] Model-Value Inconsistency as a Signal for Epistemic Uncertainty
[8:55] Improving Policy Optimization with Generalist-Specialist Learning
(ends 9:00 AM)
Spotlights 7:30-8:05
[7:30] On Numerical Integration in Neural Ordinary Differential Equations
[7:35] Reverse Engineering the Neural Tangent Kernel
[7:40] Principled Knowledge Extrapolation with GANs
[7:45] Informed Learning by Wide Neural Networks: Convergence, Generalization and Sampling Complexity
[7:50] Data Augmentation as Feature Manipulation