# Downloads 2020

Number of events: 1132

- 1st Workshop on Language in Reinforcement Learning (LaReL)
- 2nd ICML Workshop on Human in the Loop Learning (HILL)
- 4th Lifelong Learning Workshop
- 5th ICML Workshop on Human Interpretability in Machine Learning (WHI)
- 7th ICML Workshop on Automated Machine Learning (AutoML 2020)
- Abstraction Mechanisms Predict Generalization in Deep Neural Networks
- Accelerated Message Passing for Entropy-Regularized MAP Inference
- Accelerated Stochastic Gradient-free and Projection-free Methods
- Accelerating Large-Scale Inference with Anisotropic Vector Quantization
- Accelerating the diffusion-based ensemble sampling by non-reversible dynamics
- Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization
- Acceleration through spectral density estimation
- Accountable Off-Policy Evaluation With Kernel Bellman Statistics
- ACFlow: Flow Models for Arbitrary Conditional Likelihoods
- A Chance-Constrained Generative Framework for Sequence Optimization
- Active Learning on Attributed Graphs via Graph Cognizant Logistic Regression and Preemptive Query Generation
- Active World Model Learning in Agent-rich Environments with Progress Curiosity
- Adaptive Adversarial Multi-task Representation Learning
- Adaptive Checkpoint Adjoint Method for Gradient Estimation in Neural ODE
- Adaptive Droplet Routing in Digital Microfluidic Biochips Using Deep Reinforcement Learning
- Adaptive Estimator Selection for Off-Policy Evaluation
- Adaptive Gradient Descent without Descent
- Adaptive Region-Based Active Learning
- Adaptive Reward-Poisoning Attacks against Reinforcement Learning
- Adaptive Sampling for Estimating Probability Distributions
- Adaptive Sketching for Fast and Convergent Canonical Polyadic Decomposition
- AdaScale SGD: A User-Friendly Algorithm for Distributed Training
- Adding seemingly uninformative labels helps in low data regimes
- A Distributional Framework For Data Valuation
- A distributional view on multi-objective policy optimization
- Adversarial Attacks on Copyright Detection Systems
- Adversarial Attacks on Probabilistic Autoregressive Forecasting Models
- Adversarial Filters of Dataset Biases
- Adversarial Learning Guarantees for Linear Hypotheses and Neural Networks
- Adversarial Mutual Information for Text Generation
- Adversarial Neural Pruning with Latent Vulnerability Suppression
- Adversarial Nonnegative Matrix Factorization
- Adversarial Risk via Optimal Transport and Optimal Couplings
- Adversarial Robustness Against the Union of Multiple Perturbation Models
- Adversarial Robustness for Code
- Adversarial Robustness via Runtime Masking and Cleansing
- A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
- A Flexible Framework for Nonparametric Graphical Modeling that Accommodates Machine Learning
- A Flexible Latent Space Model for Multilayer Networks
- A Free-Energy Principle for Representation Learning
- A Game Theoretic Framework for Model Based Reinforcement Learning
- A general recurrent state space framework for modeling neural dynamics during decision-making
- A Generative Model for Molecular Distance Geometry
- A Generic First-Order Algorithmic Framework for Bi-Level Programming Beyond Lower-Level Singleton
- Agent57: Outperforming the Atari Human Benchmark
- A Geometric Approach to Archetypal Analysis via Sparse Projections
- Aggregation of Multiple Knockoffs
- A Graph to Graphs Framework for Retrosynthesis Prediction
- Aligned Cross Entropy for Non-Autoregressive Machine Translation
- Alleviating Privacy Attacks via Causal Learning
- All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference
- Almost Tune-Free Variance Reduction
- A Markov Decision Process Model for Socio-Economic Systems Impacted by Climate Change
- A Mean Field Analysis Of Deep ResNet And Beyond: Towards Provably Optimization Via Overparameterization From Depth
- Amortised Learning by Wake-Sleep
- Amortized Finite Element Analysis for Fast PDE-Constrained Optimization
- Amortized Population Gibbs Samplers with Neural Sufficient Statistics
- An Accelerated DFO Algorithm for Finite-sum Convex Functions
- Analytic Marching: An Analytic Meshing Solution from Deep Implicit Surface Networks
- A Natural Lottery Ticket Winner: Reinforcement Learning with Ordinary Neural Circuits
- Anderson Acceleration of Proximal Gradient Methods
- A Nearly-Linear Time Algorithm for Exact Community Recovery in Stochastic Block Model
- An EM Approach to Non-autoregressive Conditional Sequence Generation
- An end-to-end approach for the verification problem: learning the right distance
- An end-to-end Differentially Private Latent Dirichlet Allocation Using a Spectral Algorithm
- A new regret analysis for Adam-type algorithms
- An Explicitly Relational Neural Network Architecture
- Angular Visual Hardness
- An Imitation Learning Approach for Cache Replacement
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations
- An Optimistic Perspective on Offline Deep Reinforcement Learning
- A Pairwise Fair and Community-preserving Approach to k-Center Clustering
- Approximating Stacked and Bidirectional Recurrent Architectures with the Delayed Recurrent Neural Network
- Approximation Capabilities of Neural ODEs and Invertible Residual Networks
- Approximation Guarantees of Local Search Algorithms via Localizability of Set Functions
- A quantile-based approach for hyperparameter transfer learning
- AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation
- A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
- A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition
- A Simple Framework for Contrastive Learning of Visual Representations
- A simpler approach to accelerated optimization: iterative averaging meets optimism
- Associative Memory in Iterated Overparameterized Sigmoid Autoencoders
- A Swiss Army Knife for Minimax Optimal Transport
- Asynchronous Coagent Networks
- A Tree-Structured Decoder for Image-to-Markup Generation
- Attacks Which Do Not Kill Training Make Adversarial Learning Stronger
- Attentive Group Equivariant Convolutional Networks
- A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
- AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks
- Automated Synthetic-to-Real Generalization
- Automatic Reparameterisation of Probabilistic Programs
- Automatic Shortcut Removal for Self-Supervised Representation Learning
- AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
- Balancing Competing Objectives with Noisy Data: Score-Based Classifiers for Welfare-Aware Machine Learning
- Bandits for BMO Functions
- Bandits with Adversarial Scaling
- Batch Reinforcement Learning with Hyperparameter Gradients
- Batch Stationary Distribution Estimation
- Bayesian Deep Learning and a Probabilistic Perspective of Model Construction
- Bayesian Differential Privacy for Machine Learning
- Bayesian Experimental Design for Implicit Models by Mutual Information Neural Estimation
- Bayesian Graph Neural Networks with Adaptive Connection Sampling
- Bayesian Learning from Sequential Data using Gaussian Processes with Signature Covariances
- Bayesian Optimisation over Multiple Continuous and Categorical Inputs
- Bayesian Sparsification of Deep C-valued Networks
- Being Bayesian about Categorical Probability
- Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks
- Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting
- Better depth-width trade-offs for neural networks through the lens of dynamical systems
- Beyond first order methods in machine learning systems
- Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?
- Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels
- Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
- Bidirectional Model-based Policy Optimization
- BINOCULARS for efficient, nonmyopic sequential experimental design
- Bio-Inspired Hashing for Unsupervised Similarity Search
- Bisection-Based Pricing for Repeated Contextual Auctions against Strategic Buyer
- Black-box Certification and Learning under Adversarial Perturbations
- Black-Box Methods for Restoring Monotonicity
- Black-Box Variational Inference as a Parametric Approximation to Langevin Dynamics
- Boosted Histogram Transform for Regression
- Boosting Deep Neural Network Efficiency with Dual-Module Inference
- Boosting for Control of Dynamical Systems
- Boosting Frank-Wolfe by Chasing Gradients
- Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
- Born-again Tree Ensembles
- Bounding the fairness and accuracy of classifiers from population statistics
- BoXHED: Boosted eXact Hazard Estimator with Dynamic covariates
- Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning
- Breaking the Curse of Space Explosion: Towards Efficient NAS with Curriculum Search
- Bridge Between Perception and Reasoning: Graph Neural Networks & Beyond
- Bridging the Gap Between f-GANs and Wasserstein GANs
- Budgeted Online Influence Maximization
- Calibration, Entropy Rates, and Memory in Language Models
- Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts?
- Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?
- Can Stochastic Zeroth-Order Frank-Wolfe Method Converge Faster for Non-Convex Problems?
- Causal Effect Estimation and Optimal Dose Suggestions in Mobile Health
- Causal Effect Identifiability under Partial-Observability
- Causal Inference using Gaussian Processes with Structured Latent Confounders
- Causal Modeling for Fairness In Dynamical Systems
- Causal Reinforcement Learning
- Causal Strategic Linear Regression
- Causal Structure Discovery from Distributions Arising from Mixtures of DAGs
- CAUSE: Learning Granger Causality from Event Sequences using Attribution Methods
- Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings
- Certified Data Removal from Machine Learning Models
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing
- Challenges in Deploying and Monitoring Machine Learning Systems
- Channel Equilibrium Networks for Learning Deep Representation
- Characterizing Distribution Equivalence and Structure Learning for Cyclic and Acyclic Directed Graphs
- Choice Set Optimization Under Discrete Choice Models of Group Decisions
- Circuit-Based Intrinsic Methods to Detect Overfitting
- Class-Weighted Classification: Trade-offs and Robust Approaches
- Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies
- Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning
- Closing the convergence gap of SGD without replacement
- CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information
- Collaborative Machine Learning with Incentive-Aware Model Rewards
- Collapsed Amortized Variational Inference for Switching Nonlinear Dynamical Systems
- Combinatorial Pure Exploration for Dueling Bandit
- Combining Differentiable PDE Solvers and Graph Neural Networks for Fluid Flow Prediction
- CoMic: Complementary Task Learning & Mimicry for Reusable Skills
- Communication-Efficient Distributed PCA by Riemannian Optimization
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks
- Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions
- Composable Sketches for Functions of Frequencies: Beyond the Worst Case
- Compressive sensing with un-trained neural networks: Gradient descent finds a smooth approximation
- Computational and Statistical Tradeoffs in Inferring Combinatorial Structures of Ising Model
- Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions
- Concept Bottleneck Models
- Concise Explanations of Neural Networks using Adversarial Training
- Conditional gradient methods for stochastically constrained convex minimization
- Confidence-Aware Learning for Deep Neural Networks
- Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks
- Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting
- ConQUR: Mitigating Delusional Bias in Deep Q-Learning
- Consistent Estimators for Learning to Defer to an Expert
- Consistent Structured Prediction with Max-Min Margin Markov Networks
- Constant Curvature Graph Convolutional Networks
- Constrained Markov Decision Processes via Backward Value Functions
- Constructive Universal High-Dimensional Distribution Generation through Deep ReLU Networks
- Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning
- Context Aware Local Differential Privacy
- Continuous Graph Neural Networks
- Continuously Indexed Domain Adaptation
- Continuous Time Bayesian Networks with Clocks
- Continuous-time Lower Bounds for Gradient-based Algorithms
- Contrastive Multi-View Representation Learning on Graphs
- Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning
- Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
- ControlVAE: Controllable Variational Autoencoder
- Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization
- Convergence Rates of Variational Inference in Sparse Deep Learning
- Converging to Team-Maxmin Equilibria in Zero-Sum Multiplayer Games
- Convex Calibrated Surrogates for the Multi-Label F-Measure
- Convex Representation Learning for Generalized Invariance in Semi-Inner-Product Space
- Convolutional dictionary learning based auto-encoders for natural exponential-family distributions
- Convolutional Kernel Networks for Graph-Structured Data
- Cooperative Multi-Agent Bandits with Heavy Tails
- Coresets for Clustering in Graphs of Bounded Treewidth
- Coresets for Data-efficient Training of Machine Learning Models
- Correlation Clustering with Asymmetric Classification Errors
- Cost-Effective Interactive Attention Learning with Neural Attention Processes
- Cost-effectively Identifying Causal Effects When Only Response Variable is Observable
- Counterfactual Cross-Validation: Stable Model Selection Procedure for Causal Inference Models
- Countering Language Drift with Seeded Iterated Learning
- CURL: Contrastive Unsupervised Representations for Reinforcement Learning
- Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness
- Curvature-corrected learning dynamics in deep neural networks
- Customizing ML Predictions for Online Algorithms
- Data Amplification: Instance-Optimal Property Estimation
- Data-Dependent Differentially Private Parameter Learning for Directed Graphical Models
- Data-Efficient Image Recognition with Contrastive Predictive Coding
- Data preprocessing to mitigate bias: A maximum entropy based approach
- Data Valuation using Reinforcement Learning
- DeBayes: a Bayesian Method for Debiasing Network Embeddings
- Debiased Sinkhorn barycenters
- Decentralised Learning with Random Features and Distributed Gradient Descent
- Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions
- Decision Trees for Decision-Making under the Predict-then-Optimize Framework
- Decoupled Greedy Learning of CNNs
- DeepCoDA: personalized interpretability for compositional health data
- Deep Coordination Graphs
- Deep Divergence Learning
- Deep Gaussian Markov Random Fields
- Deep Graph Random Process for Relational-Thinking-Based Speech Recognition
- Deep Isometric Learning for Visual Recognition
- Deep k-NN for Noisy Labels
- DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training
- Deep Molecular Programming: A Natural Implementation of Binary-Weight ReLU Neural Networks
- Deep Reasoning Networks for Unsupervised Pattern De-mixing with Constraint Reasoning
- Deep Reinforcement Learning with Smooth Policy
- Deep Streaming Label Learning
- Defense Through Diverse Directions
- DeltaGrad: Rapid retraining of machine learning models
- Description Based Text Classification with Reinforcement Learning
- Designing Optimal Dynamic Treatment Regimes: A Causal Reinforcement Learning Approach
- DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths
- Detecting Out-of-Distribution Examples with Gram Matrices
- Differentiable Likelihoods for Fast Inversion of 'Likelihood-Free' Dynamical Systems
- Differentiable Product Quantization for End-to-End Embedding Compression
- Differentially Private Set Union
- Differentiating through the Fréchet Mean
- DINO: Distributed Newton-Type Optimization Method
- Discount Factor as a Regularizer in Reinforcement Learning
- Discriminative Adversarial Search for Abstractive Summarization
- Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions
- Disentangling Trainability and Generalization in Deep Neural Networks
- Dispersed Exponential Family Mixture VAEs for Interpretable Text Generation
- Dissecting Non-Vacuous Generalization Bounds based on the Mean-Field Approximation
- Distance Metric Learning with Joint Representation Diversification
- Distinguishing Cause from Effect Using Quantiles: Bivariate Quantile Causal Discovery
- Distributed Online Optimization over a Heterogeneous Network
- Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits
- Distribution Augmentation for Generative Modeling
- Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks
- Divide, Conquer, and Combine: a New Inference Strategy for Probabilistic Programs with Stochastic Support
- Does label smoothing mitigate label noise?
- Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making
- Do GANs always have Nash equilibria?
- Doing Some Good with Machine Learning
- Domain Adaptive Imitation Learning
- Domain Aggregation Networks for Multi-Source Domain Adaptation
- Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript
- Do RNN and LSTM have Long Memory?
- Double-Loop Unadjusted Langevin Algorithm
- Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
- Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime
- Doubly robust off-policy evaluation with shrinkage
- Doubly Stochastic Variational Inference for Neural Processes with Hierarchical Latent Variables
- Do We Need Zero Training Loss After Achieving Zero Training Error?
- Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation
- DROCC: Deep Robust One-Class Classification
- DropNet: Reducing Neural Network Complexity via Iterative Pruning
- DRWR: A Differentiable Renderer without Rendering for Unsupervised 3D Structure Learning from Silhouette Images
- Duality in RKHSs with Infinite Dimensional Outputs: Application to Robust Losses
- Dual Mirror Descent for Online Allocation Problems
- Dual-Path Distillation: A Unified Framework to Improve Black-Box Attacks
- Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising
- Dynamics of Deep Neural Networks and Neural Tangent Hierarchy
- ECLIPSE: An Extreme-Scale Linear Program Solver for Web-Applications
- Economics of privacy and data labor
- Educating Text Autoencoders: Latent Representation Guidance via Denoising
- Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors
- Efficient Continuous Pareto Exploration in Multi-Task Learning
- Efficient Domain Generalization via Common-Specific Low-Rank Decomposition
- Efficient Identification in Linear Structural Causal Models with Auxiliary Cutsets
- Efficient Intervention Design for Causal Discovery with Latents
- Efficiently Learning Adversarially Robust Halfspaces with Noise
- Efficiently sampling functions from Gaussian process posteriors
- Efficiently Solving MDPs with Stochastic Mirror Descent
- Efficient Non-conjugate Gaussian Process Factor Models for Spike Count Data using Polynomial Approximations
- Efficient nonparametric statistical inference on population feature importance using Shapley values
- Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation
- Efficient Policy Learning from Surrogate-Loss Classification Reductions
- Efficient Proximal Mapping of the 1-path-norm of Shallow Networks
- Efficient Robustness Certificates for Discrete Data: Sparsity-Aware Randomized Smoothing for Graphs, Images and More
- Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits
- Eliminating the Invariance on the Loss Landscape of Linear Autoencoders
- Emergence of Separable Manifolds in Deep Language Representations
- Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models
- Encoding Musical Style with Transformer Autoencoders
- Energy-Based Processes for Exchangeable Data
- Enhanced POET: Open-ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions
- Enhancing Simple Models by Exploiting What They Already Know
- Entropy Minimization In Emergent Languages
- Epidemiology and Machine Learning
- Equivariant Flows: Exact Likelihood Generative Learning for Symmetric Densities
- Equivariant Neural Rendering
- Error-Bounded Correction of Noisy Labels
- Error Estimation for Sketched SVD via the Bootstrap
- Estimating Generalization under Distribution Shifts via Domain-Invariant Representations
- Estimating Model Uncertainty of Neural Networks in Sparse Information Form
- Estimating Q(s,s') with Deep Deterministic Dynamics Gradients
- Estimating the Error of Randomized Newton Methods: A Bootstrap Approach
- Estimating the Number and Effect Sizes of Non-null Hypotheses
- Estimation of Bounds on Potential Outcomes For Decision Making
- Evaluating Lossy Compression Rates of Deep Generative Models
- Evaluating Machine Accuracy on ImageNet
- Evaluating the Performance of Reinforcement Learning Algorithms
- Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
- Evolutionary Topology Search for Tensor Network Decomposition
- Expert Learning through Generalized Inverse Multiobjective Optimization: Models, Insights, and Algorithms
- Explainable and Discourse Topic-aware Neural Language Understanding
- Explainable k-Means and k-Medians Clustering
- Explaining Groups of Points in Low-Dimensional Representations
- Explicit Gradient Learning for Black-Box Optimization
- Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits
- Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills
- Extra-gradient with player sampling for faster convergence in n-player games
- Extrapolation for Large-batch Training in Deep Learning
- Extreme Multi-label Classification from Aggregated Labels
- FACT: A Diagnostic for Group Fairness Trade-offs
- Fair Generative Modeling via Weak Supervision
- Fair k-Centers via Maximum Matching
- Fair Learning with Private Demographic Data
- Fairwashing explanations with off-manifold detergent
- Familywise Error Rate Control by Interactive Unmasking
- Fast Adaptation to New Environments via Policy-Dynamics Value Functions
- Fast and Consistent Learning of Hidden Markov Models by Incorporating Non-Consecutive Correlations
- Fast and Private Submodular and $k$-Submodular Functions Maximization with Matroid Constraints
- Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods
- Fast computation of Nash Equilibria in Imperfect Information Games
- Fast Deterministic CUR Matrix Decomposition with Accuracy Assurance
- Fast Differentiable Sorting and Ranking
- Faster Graph Embeddings via Coarsening
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case
- Fast OSCAR and OWL Regression via Safe Screening Rules
- Feature-map-level Online Adversarial Knowledge Distillation
- Feature Noise Induces Loss Discrepancy Across Groups
- Feature Quantization Improves GAN Training
- Feature Selection using Stochastic Gates
- FedBoost: A Communication-Efficient Algorithm for Federated Learning
- Federated Learning for User Privacy and Data Confidentiality
- Federated Learning with Only Positive Labels
- FetchSGD: Communication-Efficient Federated Learning with Sketching
- Few-shot Domain Adaptation by Causal Mechanism Transfer
- Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs
- Fiduciary Bandits
- Fiedler Regularization: Learning Neural Networks with Graph Sparsity
- Finding trainable sparse networks through Neural Tangent Transfer
- Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent
- Finite-Time Convergence in Continuous-Time Optimization
- Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games
- Flexible and Efficient Long-Range Planning Through Curious Exploration
- Forecasting Sequential Data Using Consistent Koopman Autoencoders
- FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis
- Fractal Gaussian Networks: A sparse random graph model based on Gaussian Multiplicative Chaos
- Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise
- Frequency Bias in Neural Networks for Input of Non-Uniform Density
- Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions
- From Chaos to Order: Symmetry and Conservation Laws in Game Dynamics
- From ImageNet to Image Classification: Contextualizing Progress on Benchmarks
- From Importance Sampling to Doubly Robust Policy Gradient
- From Local SGD to Local Fixed-Point Methods for Federated Learning
- From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model
- From Sets to Multisets: Provable Variational Inference for Probabilistic Integer Submodular Models
- FR-Train: A Mutual Information-Based Approach to Fair and Robust Training
- Frustratingly Simple Few-Shot Object Detection
- Full Law Identification in Graphical Models of Missing Data: Completeness Results
- Fully Parallel Hyperparameter Search: Reshaped Space-Filling
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations
- Gamification of Pure Exploration for Linear Bandits
- Generalisation error in learning with random features and the hidden manifold model
- Generalization and Representational Limits of Graph Neural Networks
- Generalization Error of Generalized Linear Models in High Dimensions
- Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features
- Generalization to New Actions in Reinforcement Learning
- Generalization via Derandomization
- Generalized and Scalable Optimal Sparse Decision Trees
- Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data
- Generating Programmatic Referring Expressions via Program Synthesis
- Generative Adversarial Imitation Learning with Neural Network Parameterization: Global Optimality and Convergence Rate
- Generative Flows with Matrix Exponential
- Generative Pretraining From Pixels
- Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data
- Global Concavity and Optimization in a Class of Dynamic Discrete Choice Models
- GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation
- Goal-Aware Prediction: Learning to Model What Matters
- Goodness-of-Fit Tests for Inhomogeneous Random Graphs
- Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
- Go Wide, Then Narrow: Efficient Training of Deep Thin Networks
- GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
- Gradient-free Online Learning in Continuous Games with Delayed Rewards
- Gradient Temporal-Difference Learning with Regularized Corrections
- Graph-based Nearest Neighbor Search: From Practice to Theory
- Graph-based, Self-Supervised Program Repair from Diagnostic Feedback
- Graph Convolutional Network for Recommendation with Low-pass Collaborative Filters
- Graph Filtration Learning
- Graph Homomorphism Convolution
- Graph Optimal Transport for Cross-Domain Alignment
- GraphOpt: Learning Optimization Models of Graph Formation
- Graph Random Neural Features for Distance-Preserving Graph Representations
- Graph Representation Learning and Beyond (GRL+)
- Graph Structure of Neural Networks
- Growing Action Spaces
- Growing Adaptive Multi-hyperplane Machines
- Guided Learning of Nonconvex Models through Successive Functional Gradient Optimization
- Haar Graph Pooling
- Hallucinative Topological Memory for Zero-Shot Visual Planning
- Handling the Positive-Definite Constraint in the Bayesian Learning Rule
- Harmonic Decompositions of Convolutional Networks
- Healing Products of Gaussian Process Experts
- Healthcare Systems, Population Health, and the Role of Health-tech
- Hierarchical Generation of Molecular Graphs using Structural Motifs
- Hierarchically Decoupled Imitation For Morphological Transfer
- Hierarchical Verification for Adversarial Robustness
- High-dimensional Robust Mean Estimation via Gradient Descent
- History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms
- How Good is the Bayes Posterior in Deep Neural Networks Really?
- How recurrent networks implement contextual processing in sentiment analysis
- How to Solve Fair k-Center in Massive Data Models
- How to Train Your Neural ODE: the World of Jacobian and Kinetic Regularization
- Human and Machine Learning for Assistive Autonomy
- Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization
- Hypernetwork approach to generating point clouds
- ICML 2020 Workshop on Computational Biology
- Identifying Statistical Bias in Dataset Replication
- Identifying the Reward Function by Anchor Actions
- Implicit Class-Conditioned Domain Alignment for Unsupervised Domain Adaptation
- Implicit competitive regularization in GANs
- Implicit differentiation of Lasso-type models for hyperparameter optimization
- Implicit Euler Skip Connections: Enhancing Adversarial Robustness via Numerical Stability
- Implicit Generative Modeling for Efficient Exploration
- Implicit Geometric Regularization for Learning Shapes
- Implicit Learning Dynamics in Stackelberg Games: Equilibria Characterization, Convergence Analysis, and Empirical Study
- Implicit Regularization of Random Feature Models
- Improved Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance
- Improved Communication Cost in Distributed PageRank Computation – A Theoretical Study
- Improved Optimistic Algorithms for Logistic Bandits
- Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards
- Improving generalization by controlling label-noise information in neural network weights
- Improving Generative Imagination in Object-Centric World Models
- Improving Molecular Design by Stochastic Iterative Target Augmentation
- Improving Robustness of Deep-Learning-Based Image Reconstruction
- Improving the Gating Mechanism of Recurrent Neural Networks
- Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking
- Improving Transformer Optimization Through Better Initialization
- Imputer: Sequence Modelling via Imputation and Dynamic Programming
- Incentives in Machine Learning
- Incremental Sampling Without Replacement for Sequence Models
- Individual Calibration with Randomized Forecasting
- Individual Fairness for k-Clustering
- Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks
- Inductive-bias-driven Reinforcement Learning For Efficient Schedules in Heterogeneous Clusters
- Inductive Biases, Invariances and Generalization in Reinforcement Learning
- Inductive Relation Prediction by Subgraph Reasoning
- Inertial Block Proximal Methods for Non-Convex Non-Smooth Optimization
- Inexact Tensor Methods with Dynamic Accuracies
- Inferring DQN structure for high-dimensional continuous control
- Infinite attention: NNGP and NTK for deep attention networks
- Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
- Influenza Forecasting Framework based on Gaussian Processes
- InfoGAN-CR and ModelCentrality: Self-supervised Model Training and Selection for Disentangling GANs
- Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains
- Information-Theoretic Local Minima Characterization and Regularization
- Informative Dropout for Robust Representation Learning: A Shape-bias Perspective
- INNF+: Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models
- Input-Sparsity Low Rank Approximation in Schatten Norm
- InstaHide: Instance-hiding Schemes for Private Distributed Learning
- Inter-domain Deep Gaussian Processes
- Interference and Generalization in Temporal Difference Learning
- Interferometric Graph Transform: a Deep Unsupervised Graph Representation
- Interpolation between Residual and Non-Residual Networks
- Interpretable, Multidimensional, Multimodal Anomaly Detection with Negative Sampling for Detection of Device Failure
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
- Interpretations are Useful: Penalizing Explanations to Align Neural Networks with Prior Knowledge
- Interpreting Robust Optimization via Adversarial Influence Functions
- Intrinsic Reward Driven Imitation Learning via Generative Model
- Invariant Causal Prediction for Block MDPs
- Invariant Rationalization
- Invariant Risk Minimization Games
- Inverse Active Sensing: Modeling and Understanding Timely Decision-Making
- Invertible generative models for inverse problems: mitigating representation error and dataset bias
- Involutive MCMC: a Unifying Framework
- IPBoost – Non-Convex Boosting via Integer Programming
- Is Local SGD Better than Minibatch SGD?
- Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing
- It's Not What Machines Can Learn, It's What We Cannot Teach
- Kernel interpolation with continuous volume sampling
- Kernelized Stein Discrepancy Tests of Goodness-of-fit for Time-to-Event Data
- Kernel Methods for Cooperative Multi-Agent Contextual Bandits
- Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning
- k-means++: few more steps yield constant approximation
- Knowing The What But Not The Where in Bayesian Optimization
- Label-Noise Robust Domain Adaptation
- Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks
- Laplacian Regularized Few-Shot Learning
- Latent Bernoulli Autoencoder
- Latent Space Factorisation and Manipulation via Matrix Subspace Projection
- Latent Variable Modelling with Hyperbolic Normalizing Flows
- Law & Machine Learning
- Layered Sampling for Robust Optimization Problems
- LazyIter: A Fast Algorithm for Counting Markov Equivalent DAGs and Designing Experiments
- Learnable Group Transform For Time-Series
- Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization
- Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition
- Learning Algebraic Multigrid Using Graph Neural Networks
- Learning and Evaluating Contextual Embedding of Source Code
- Learning and Sampling of Atomic Interventions from Observations
- Learning Autoencoders with Relational Regularization
- Learning Calibratable Policies using Programmatic Style-Consistency
- Learning Compound Tasks without Task-specific Knowledge via Imitation and Self-supervised Learning
- Learning De-biased Representations with Biased Representations
- Learning Deep Kernels for Non-Parametric Two-Sample Tests
- Learning disconnected manifolds: a no GAN's land
- Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information
- Learning Efficient Multi-agent Communication: An Information Bottleneck Approach
- Learning Factorized Weight Matrix for Joint Filtering
- Learning Fair Policies in Multi-Objective (Deep) Reinforcement Learning with Average and Discounted Rewards
- Learning Flat Latent Manifolds with VAEs
- Learning for Dose Allocation in Adaptive Clinical Trials with Safety Constraints
- Learning from Irregularly-Sampled Time Series: A Missing Data Perspective
- Learning Human Objectives by Evaluating Hypothetical Behavior
- Learning Mixtures of Graphs from Epidemic Cascades
- Learning Near Optimal Policies with Low Inherent Bellman Error
- Learning Opinions in Social Networks
- Learning Optimal Tree Models under Beam Search
- Learning Portable Representations for High-Level Planning
- Learning Quadratic Games on Networks
- Learning Reasoning Strategies in End-to-End Differentiable Proving
- Learning Representations that Support Extrapolation
- Learning Robot Skills with Temporal Variational Inference
- Learning Selection Strategies in Buchberger’s Algorithm
- Learning Similarity Metrics for Numerical Simulations
- Learning Structured Latent Factors from Dependent Data: A Generative Model Framework from Information-Theoretic Perspective
- Learning Task-Agnostic Embedding of Multiple Black-Box Experts for Multi-Task Model Fusion
- Learning the piece-wise constant graph structure of a varying Ising model
- Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling
- Learning the Valuations of a $k$-demand Agent
- Learning to Branch for Multi-Task Learning
- Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules
- Learning to Encode Position for Transformer with Continuous Dynamical Model
- Learning to Learn Kernels with Variational Random Features
- Learning to Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning
- Learning to Rank Learning Curves
- Learning to Score Behaviors for Guided Policy Optimization
- Learning to Simulate and Design for Structural Engineering
- Learning to Simulate Complex Physics with Graph Networks
- Learning To Stop While Learning To Predict
- Learning What to Defer for Maximum Independent Sets
- Learning with Bounded Instance- and Label-dependent Label Noise
- Learning with Feature and Distribution Evolvable Streams
- Learning with Good Feature Representations in Bandits and in RL with a Generative Model
- Learning with Missing Values
- Learning with Multiple Complementary Labels
- LEEP: A New Measure to Evaluate Transferability of Learned Representations
- Let's Agree to Agree: Neural Networks Share Classification Order on Real Datasets
- Leveraging Frequency Analysis for Deep Fake Image Recognition
- Leveraging Procedural Generation to Benchmark Reinforcement Learning
- Lifted Disjoint Paths with Application in Multiple Object Tracking
- Likelihood-free MCMC with Amortized Approximate Ratio Estimators
- Linear bandits with Stochastic Delayed Feedback
- Linear Convergence of Randomized Primal-Dual Coordinate Method for Large-scale Linear Constrained Convex Programming
- Linear Lower Bounds and Conditioning of Differentiable Games
- Linear Mode Connectivity and the Lottery Ticket Hypothesis
- (Locally) Differentially Private Combinatorial Semi-Bandits
- Logarithmic Regret for Adversarial Online Control
- Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently
- Logistic Regression for Massive Data with Rare Events
- Lookahead-Bounded Q-learning
- Lorentz Group Equivariant Neural Network for Particle Physics
- Loss Function Search for Face Recognition
- Low Bias Low Variance Gradient Estimates for Hierarchical Boolean Stochastic Networks
- Lower Complexity Bounds for Finite-Sum Convex-Concave Minimax Optimization Problems
- LowFER: Low-rank Bilinear Pooling for Link Prediction
- Low-loss connection of weight vectors: distribution-based approaches
- Low-Rank Bottleneck in Multi-head Attention Models
- Low-Variance and Zero-Variance Baselines for Extensive-Form Games
- LP-SparseMAP: Differentiable Relaxed Optimization for Sparse Structured Prediction
- LTF: A Label Transformation Framework for Correcting Label Shift
- Machine Learning for Global Health
- Machine Learning for Healthcare: Challenges, Methods, Frontiers
- Machine Learning for Media Discovery
- Machine Learning with Signal Processing
- Manifold Identification for Ultimately Communication-Efficient Distributed Optimization
- Mapping natural-language problems to formal-language solutions using structured neural representations
- Margin-aware Adversarial Domain Adaptation with Optimal Transport
- Maximum-and-Concatenation Networks
- Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning
- Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat at Label Shift Adaptation
- Measuring Non-Expert Comprehension of Machine Learning Fairness Metrics
- Median Matrix Completion: from Embarrassment to Optimality
- Message Passing Least Squares Framework and its Application to Rotation Synchronization
- MetaFun: Meta-Learning with Iterative Functional Updates
- Meta-learning for Mixed Linear Regression
- Meta-Learning with Shared Amortized Variational Inference
- Meta-learning with Stochastic Linear Bandits
- Meta Variance Transfer: Learning to Augment from the Others
- Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
- Minimax Pareto Fairness: A Multi Objective Perspective
- Minimax Rate for Learning From Pairwise Comparisons in the BTL Model
- Minimax Weight and Q-Function Learning for Off-Policy Evaluation
- Min-Max Optimization without Gradients: Convergence and Applications to Black-Box Evasion and Poisoning Attacks
- Missing Data Imputation using Optimal Transport
- Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning
- ML Interpretability for Scientific Discovery
- MLRetrospectives: A Venue for Self-Reflection in ML Research
- Model-Based Methods in Reinforcement Learning
- Model-Based Reinforcement Learning with Value-Targeted Regression
- Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
- Model Fusion with Kullback–Leibler Divergence
- Modulating Surrogates for Bayesian Optimization
- Momentum-Based Policy Gradient Methods
- Momentum Improves Normalized SGD
- MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time
- Moniqua: Modulo Quantized Communication in Decentralized SGD
- Monte-Carlo Tree Search as Regularized Policy Optimization
- More Data Can Expand The Generalization Gap Between Adversarially Robust and Standard Models
- More Information Supervised Probabilistic Deep Face Embedding Learning
- Multi-Agent Determinantal Q-Learning
- Multi-Agent Routing Value Iteration Network
- Multiclass Neural Network Minimization via Tropical Newton Polytope Approximation
- Multidimensional Shape Constraints
- Multi-fidelity Bayesian Optimization with Max-value Entropy Search and its Parallelization
- Multigrid Neural Memory
- Multilinear Latent Conditioning for Generating Unseen Attribute Combinations
- Multinomial Logit Bandit with Low Switching Cost
- Multi-objective Bayesian Optimization using Pareto-frontier Entropy
- Multi-Objective Molecule Generation using Interpretable Substructures
- Multi-Precision Policy Enforced Training (MuPPET): A Precision-Switching Strategy for Quantised Fixed-Point Training of CNNs
- Multiresolution Tensor Learning for Efficient and Interpretable Spatial Analysis
- Multi-step Greedy Reinforcement Learning Algorithms
- Multi-Task Learning with User Preferences: Gradient Descent with Controlled Ascent in Pareto Optimization
- Mutual Transfer Learning for Massive Data
- My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits
- NADS: Neural Architecture Distribution Search for Uncertainty Awareness
- Naive Exploration is Optimal for Online LQR
- Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling
- Near-linear time Gaussian process optimization with adaptive batching and resparsification
- Nearly Linear Row Sampling Algorithm for Quantile Regression
- Near-optimal Regret Bounds for Stochastic Shortest Path
- Near-optimal sample complexity bounds for learning Latent $k$-polytopes and applications to Ad-Mixtures
- Near-Tight Margin-Based Generalization Bounds for Support Vector Machines
- Negative Dependence and Submodularity: Theory and Applications in Machine Learning
- Negative Sampling in Semi-Supervised learning
- Nested Subspace Arrangement for Representation of Relational Data
- NetGAN without GAN: From Random Walks to Low-Rank Approximations
- Neural Architecture Search in A Proxy Validation Loss Landscape
- Neural Clustering Processes
- Neural Contextual Bandits with UCB-based Exploration
- Neural Datalog Through Time: Informed Temporal Modeling via Logical Specification
- Neural Kernels Without Tangents
- Neural Network Control Policy Verification With Persistent Adversarial Perturbation
- Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-layer Networks
- Neural Topic Modeling with Continual Lifelong Learning
- Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
- New Oracle-Efficient Algorithms for Private Synthetic Data Release
- NGBoost: Natural Gradient Boosting for Probabilistic Prediction
- Non-autoregressive Machine Translation with Disentangled Context Transformer
- Non-Autoregressive Neural Text-to-Speech
- Non-convex Learning via Replica Exchange Stochastic Gradient MCMC
- Nonparametric Score Estimators
- Non-separable Non-stationary random fields
- Non-Stationary Delayed Bandits with Intermediate Observations
- No-Regret and Incentive-Compatible Online Learning
- No-Regret Exploration in Goal-Oriented Reinforcement Learning
- Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks Using PAC-Bayesian Analysis
- Normalized Loss Functions for Deep Learning with Noisy Labels
- Normalizing Flows on Tori and Spheres
- Object-Oriented Learning: Perception, Representation, and Reasoning
- Obtaining Adjustable Regularization for Free via Iterate Averaging
- Off-Policy Actor-Critic with Shared Experience Replay
- On a projective ensemble approach to two sample test for equality of distributions
- On Breaking Deep Generative Model-based Defenses and Beyond
- On conditional versus marginal bias in multi-armed bandits
- On Contrastive Learning for Likelihood-free Inference
- On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent
- On Coresets for Regularized Regression
- On Differentially Private Stochastic Convex Optimization with Heavy-tailed Data
- On Efficient Constructions of Checkpoints
- On Efficient Low Distortion Ultrametric Embedding
- One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control
- One-shot Distributed Ridge Regression in High Dimensions
- One Size Fits All: Can We Train One Denoiser for All Noise Levels?
- On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems
- On hyperparameter tuning in general clustering problems
- On Implicit Regularization in $\beta$-VAEs
- On Layer Normalization in the Transformer Architecture
- On Learning Language-Invariant Representations for Universal Machine Translation
- On Learning Sets of Symmetric Elements
- On Leveraging Pretrained GANs for Generation with Limited Data
- Online Bayesian Moment Matching based SAT Solver Heuristics
- Online Continual Learning from Imbalanced Data
- Online Control of the False Coverage Rate and False Sign Rate
- Online Convex Optimization in the Random Order Model
- Online Dense Subgraph Discovery via Blurred-Graph Feedback
- Online Learned Continual Compression with Adaptive Quantization Modules
- Online Learning for Active Cache Synchronization
- Online Learning with Dependent Stochastic Feedback Graphs
- Online Learning with Imperfect Hints
- Online metric algorithms with untrusted predictions
- Online mirror descent and dual averaging: keeping pace in the dynamic case
- Online Multi-Kernel Learning with Graph-Structured Feedback
- Online Pricing with Offline Data: Phase Transition and Inverse Square Law
- On Lp-norm Robustness of Ensemble Decision Stumps and Trees
- On Relativistic f-Divergences
- On Second-Order Group Influence Functions for Black-Box Predictions
- On Semi-parametric Inference for BART
- On the consistency of top-k surrogate losses
- On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings
- On the Expressivity of Neural Networks for Deep Reinforcement Learning
- On the Generalization Benefit of Noise in Stochastic Gradient Descent
- On the Generalization Effects of Linear Transformations in Data Augmentation
- On the Global Convergence Rates of Softmax Policy Gradient Methods
- On the Global Optimality of Model-Agnostic Meta-Learning
- On the (In)tractability of Computing Normalizing Constants for the Product of Determinantal Point Processes
- On the Iteration Complexity of Hypergradient Computation
- On the Noisy Gradient Descent that Generalizes as SGD
- On the Number of Linear Regions of Convolutional Neural Networks
- On the Power of Compressed Sensing with Generative Models
- On the Relation between Quality-Diversity Evaluation and Distribution-Fitting Goal in Text Generation
- On the Sample Complexity of Adversarial Multi-Source PAC Learning
- On the Theoretical Properties of the Network Jackknife
- On the Unreasonable Effectiveness of the Greedy Algorithm: Greedy Adapts to Sharpness
- On Thompson Sampling with Langevin Algorithms
- On Unbalanced Optimal Transport: An Analysis of Sinkhorn Algorithm
- On Validation and Planning of An Optimal Decision Rule with Application in Healthcare Studies
- On Variational Learning of Controllable Representations for Text without Supervision
- Operation-Aware Soft Channel Pruning using Differentiable Masks
- Optimal approximation for unconstrained non-submodular minimization
- Optimal Bounds between f-Divergences and Integral Probability Metrics
- Optimal Continual Learning has Perfect Memory and is NP-hard
- Optimal Differential Privacy Composition for Exponential Mechanisms
- Optimal Estimator for Unlabeled Linear Regression
- Optimally Solving Two-Agent Decentralized POMDPs Under One-Sided Information Sharing
- Optimal Non-parametric Learning in Repeated Contextual Auctions with Strategic Buyer
- Optimal Randomized First-Order Methods for Least-Squares Problems
- Optimal Robust Learning of Discrete Distributions from Batches
- Optimal Sequential Maximization: One Interview is Enough!
- Optimal transport mapping via input convex neural networks
- Optimistic Bounds for Multi-output Learning
- Optimistic Policy Optimization with Bandit Feedback
- Optimization and Analysis of the pAp@k Metric for Recommender Systems
- Optimization from Structured Samples for Coverage Functions
- Optimization Theory for ReLU Neural Networks Trained with Normalization Layers
- Optimizer Benchmarking Needs to Account for Hyperparameter Tuning
- Optimizing Black-box Metrics with Adaptive Surrogates
- Optimizing Data Usage via Differentiable Rewards
- Optimizing Dynamic Structures with Bayesian Generative Search
- Optimizing for the Future in Non-Stationary MDPs
- Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach
- Option Discovery in the Absence of Rewards with Manifold Analysis
- OPtions as REsponses: Grounding behavioural hierarchies in multi-agent reinforcement learning
- Oracle Efficient Private Non-Convex Optimization
- Ordinal Non-negative Matrix Factorization for Recommendation
- Orthogonalized SGD and Nested Architectures for Anytime Neural Networks
- “Other-Play” for Zero-Shot Coordination
- Overfitting in adversarially robust deep learning
- PackIt: A Virtual Environment for Geometric Planning
- Parallel Algorithm for Non-Monotone DR-Submodular Maximization
- Parameter-free, Dynamic, and Strongly-Adaptive Online Learning
- Parameter-free Online Optimization
- Parameterized Rate-Distortion Stochastic Encoder
- Parametric Gaussian Process Regressors
- Partial Trace Regression and Low-Rank Kraus Decomposition
- Participatory Approaches to Machine Learning
- PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions
- Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
- PENNI: Pruned Kernel Sharing for Efficient CNN Inference
- Perceptual Generative Autoencoders
- Performative Prediction
- Piecewise Linear Regression via a Difference of Convex Functions
- Planning to Explore via Self-Supervised World Models
- p-Norm Flow Diffusion for Local Graph Clustering
- Poisson Learning: Graph Based Semi-Supervised Learning At Very Low Label Rates
- Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
- PolyGen: An Autoregressive Generative Model of 3D Meshes
- Polynomial Tensor Sketch for Element-wise Function of Low-Rank Matrix
- Population-Based Black-Box Optimization for Biological Sequence Design
- PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
- PowerNorm: Rethinking Batch Normalization in Transformers
- Predicting Choice with Set-Dependent Aggregation
- Predicting deliberative outcomes
- Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control
- Predictive Coding for Locally-Linear Control
- Predictive Multiplicity in Classification
- Predictive Sampling with Forecasting Autoregressive Models
- Preference Modeling with Context-Dependent Salient Features
- Preselection Bandits
- Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification
- Principled learning method for Wasserstein distributionally robust optimization with local perturbations
- Private Counting from Anonymous Messages: Near-Optimal Accuracy with Vanishing Communication Overhead
- Privately detecting changes in unknown distributions
- Privately Learning Markov Random Fields
- Private Outsourced Bayesian Optimization
- Private Query Release Assisted by Public Data
- Private Reinforcement Learning with PAC and Regret Guarantees
- Probing Emergent Semantics in Predictive Agents via Question Answering
- Problems with Shapley-value-based explanations as feature importance measures
- Progressive Graph Learning for Open-Set Domain Adaptation
- Progressive Identification of True Labels for Partial-Label Learning
- Projection-free Distributed Online Convex Optimization with $O(\sqrt{T})$ Communication Complexity
- Projective Preferential Bayesian Optimization
- Proper Network Interpretability Helps Adversarial Robustness in Classification
- Provable guarantees for decision tree induction: the agnostic setting
- Provable Representation Learning for Imitation Learning via Bi-level Optimization
- Provable Self-Play Algorithms for Competitive Reinforcement Learning
- Provable Smoothness Guarantees for Black-Box Variational Inference
- Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation
- Provably Efficient Exploration in Policy Optimization
- Provably Efficient Model-based Policy Adaptation
- Proving the Lottery Ticket Hypothesis: Pruning is All You Need
- Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup
- Quadratically Regularized Subgradient Methods for Weakly Convex Optimization with Weakly Convex Constraints
- Quantized Decentralized Stochastic Learning over Directed Graphs
- Quantum Boosting
- Quantum Expectation-Maximization for Gaussian mixture models
- Quantum Machine Learning: Prospects and Challenges
- Q-value Path Decomposition for Deep Multiagent Reinforcement Learning
- R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games
- Radioactive data: tracing through training
- Random extrapolation for primal-dual coordinate descent
- Random Hypervolume Scalarizations for Provable Multi-Objective Black Box Optimization
- Randomization matters. How to defend against strong adversarial attacks
- Randomized Block-Diagonal Preconditioning for Parallel Learning
- Randomized Smoothing of All Shapes and Sizes
- Randomly Projected Additive Gaussian Processes for Regression
- Random Matrix Theory Proves that Deep Learning Representations of GAN-data Behave as Gaussian Mixtures
- Rank Aggregation from Pairwise Comparisons in the Presence of Adversarial Corruptions
- Rate-distortion optimization guided autoencoder for isometric embedding in Euclidean latent space
- Ready Policy One: World Building Through Active Learning
- Real-Time Optimisation for Online Learning in Auctions
- Real World Experiment Design and Active Learning
- Recent Advances in High-Dimensional Robust Statistics
- Recht-Re Noncommutative Arithmetic-Geometric Mean Conjecture is False
- Recovery of Sparse Signals from a Mixture of Linear Samples
- Recurrent Hierarchical Topic-Guided RNN for Language Generation
- Reducing Sampling Error in Batch Temporal Difference Learning
- Refined bounds for algorithm configuration: The knife-edge of dual class approximability
- Regularized Optimal Transport is Ground Cost Adversarial
- Reinforcement Learning for Integer Programming: Learning to Cut
- Reinforcement Learning for Molecular Design Guided by Quantum Mechanics
- Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
- Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
- Relaxing Bijectivity Constraints with Continuously Indexed Normalising Flows
- Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks
- Reliable Fidelity and Diversity Metrics for Generative Models
- Representation Learning via Adversarially-Contrastive Optimal Transport
- Representation Learning Without Labels
- Representations for Stable Off-Policy Reinforcement Learning
- Representing Unordered Data Using Complex-Weighted Multiset Automata
- Reserve Pricing in Repeated Second-Price Auctions with Strategic Bidders
- Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
- Restarted Bayesian Online Change-point Detector achieves Optimal Detection Delay
- Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
- Retrieval Augmented Language Model Pre-Training
- Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search
- Reverse-engineering deep ReLU networks
- Revisiting Fundamentals of Experience Replay
- Revisiting Spatial Invariance with Low-Rank Local Connectivity
- Revisiting Training Strategies and Generalization Performance in Deep Metric Learning
- Reward-Free Exploration for Reinforcement Learning
- RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr
- Rigging the Lottery: Making All Tickets Winners
- Robust and Stable Black Box Explanations
- Robust Bayesian Classification Using An Optimistic Score Ratio
- Robust Graph Representation Learning via Neural Sparsification
- Robustifying Sequential Neural Processes
- Robust Learning with the Hilbert-Schmidt Independence Criterion
- Robustness to Programmable String Transformations via Augmented Abstract Training
- Robustness to Spurious Correlations via Human Annotations
- Robust One-Bit Recovery via ReLU Generative Networks: Near-Optimal Statistical Rate and Global Landscape Analysis
- Robust Outlier Arm Identification
- Robust Pricing in Dynamic Mechanism Design
- ROMA: Multi-Agent Reinforcement Learning with Emergent Roles
- Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data
- Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences
- Safe Reinforcement Learning in Constrained Markov Decision Processes
- Safe screening rules for L0-regression from Perspective Relaxations
- Sample Amplification: Increasing Dataset Size even when Learning is Impossible
- Sample Complexity Bounds for 1-bit Compressive Sensing and Binary Stable Embeddings with Generative Priors
- Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning
- SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
- Scalable and Efficient Comparison-based Search without Features
- Scalable Deep Generative Modeling for Sparse Graphs
- Scalable Differentiable Physics for Learning and Control
- Scalable Differential Privacy with Certified Robustness in Adversarial Learning
- Scalable Exact Inference in Multi-Output Gaussian Processes
- Scalable Gaussian Process Separation for Kernels with a Non-Stationary Phase
- Scalable Identification of Partially Observed Systems with Certainty-Equivalent EM
- Scalable Nearest Neighbor Search for Optimal Transport
- Scaling up Hybrid Probabilistic Inference with Logical and Arithmetic Constraints via Message Passing
- Schatten Norms in Matrix Streams: Hello Sparsity, Goodbye Dimension
- SDE-Net: Equipping Deep Neural Networks with Uncertainty Estimates
- Searching to Exploit Memorization Effect in Learning with Noisy Labels
- Second-Order Provable Defenses against Adversarial Attacks
- Selective Dyna-style Planning Under Limited Model Capacity
- Self-Attentive Associative Memory
- Self-Attentive Hawkes Process
- Self-Concordant Analysis of Frank-Wolfe Algorithms
- Self-Modulating Nonparametric Event-Tensor Factorization
- Self-PU: Self Boosted and Calibrated Positive-Unlabeled Training
- Self-supervised Label Augmentation via Input Transformations
- Self-supervision in Audio and Speech
- Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees
- Semismooth Newton Algorithm for Efficient Projections onto $\ell_{1, \infty}$-norm Ball
- Semi-Supervised Learning with Normalizing Flows
- Semi-Supervised StyleGAN for Disentanglement Learning
- Sequence Generation with Mixed Representations
- Sequential Cooperative Bayesian Inference
- Sequential Transfer in Reinforcement Learning with a Generative Model
- Set Functions for Time Series
- Sets Clustering
- SGD Learns One-Layer Networks in WGANs
- Sharp Composition Bounds for Gaussian Differential Privacy via Edgeworth Expansion
- Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification
- SIGUA: Forgetting May Make Learning with Noisy Labels More Robust
- SimGANs: Simulator-Based Generative Adversarial Networks for ECG Synthesis to Improve Deep ECG Classification
- Simple and Deep Graph Convolutional Networks
- Simple and sharp analysis of k-means||
- Simultaneous Inference for Massive Data: Distributed Bootstrap
- Single Point Transductive Prediction
- Skew-Fit: State-Covering Self-Supervised Reinforcement Learning
- Small Data, Big Decisions: Model Selection in the Small-Data Regime
- Smaller, more accurate regression forests using tree alternating optimization
- Small-GAN: Speeding up GAN Training using Core-Sets
- SoftSort: A Continuous Relaxation for the argsort Operator
- Soft Threshold Weight Reparameterization for Learnable Sparsity
- Source Separation with Deep Generative Priors
- Sparse Convex Optimization via Adaptively Regularized Hard Thresholding
- Sparse Gaussian Processes with Spherical Harmonic Features
- Sparse Shrunk Additive Models
- Sparse Sinkhorn Attention
- Sparse Subspace Clustering with Entropy-Norm
- Sparsified Linear Programming for Zero-Sum Equilibrium Finding
- Spectral Clustering with Graph Neural Networks for Graph Pooling
- Spectral Frank-Wolfe Algorithm: Strict Complementarity and Linear Convergence
- Spectral Graph Matching and Regularized Quadratic Relaxations: Algorithm and Theory
- Spectral Subsampling MCMC for Stationary Time Series
- Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
- Spread Divergence
- Stabilizing Differentiable Architecture Search via Perturbation-based Regularization
- Stabilizing Transformers for Reinforcement Learning
- State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes
- Statistically Efficient Off-Policy Policy Gradients
- Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization
- Stochastically Dominant Distributional Reinforcement Learning
- Stochastic bandits with arm-dependent delays
- Stochastic Coordinate Minimization with Progressive Precision for Stochastic Convex Optimization
- Stochastic Differential Equations with Variational Wishart Diffusions
- Stochastic Flows and Geometric Optimization on the Orthogonal Group
- Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization
- Stochastic Gauss-Newton Algorithms for Nonconvex Compositional Optimization
- Stochastic Gradient and Langevin Processes
- Stochastic Hamiltonian Gradient Methods for Smooth Games
- Stochastic Latent Residual Video Prediction
- Stochastic Optimization for Non-convex Inf-Projection Problems
- Stochastic Optimization for Regularized Wasserstein Estimators
- StochasticRank: Global Optimization of Scale-Free Discrete Functions
- Stochastic Regret Minimization in Extensive-Form Games
- Stochastic Subspace Cubic Newton Method
- Strategic Classification is Causal Modeling in Disguise
- Strategyproof Mean Estimation from Multiple-Choice Questions
- Streaming Coresets for Symmetric Tensor Factorization
- Streaming k-Submodular Maximization under Noise subject to Size Constraint
- Streaming Submodular Maximization under a k-Set System Constraint
- Strength from Weakness: Fast Learning Using Weak Supervision
- Striving for Simplicity and Performance in Off-Policy DRL: Output Normalization and Non-Uniform Sampling
- Stronger and Faster Wasserstein Adversarial Attacks
- Structural Language Models of Code
- Structure Adaptive Algorithms for Stochastic Bandits
- Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis
- Structured Policy Iteration for Linear Quadratic Regulator
- Structured Prediction with Partial Labelling through the Infimum Loss
- Student Specialization in Deep Rectified Networks With Finite Width and Input Dimension
- Student-Teacher Curriculum Learning via Reinforcement Learning: Predicting Hospital Inpatient Admission Location
- Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning
- Sub-linear Memory Sketches for Near Neighbor Search on Streaming Data
- Submodular Optimization: From Discrete to Continuous and Back
- Subspace Fitting Meets Regression: The Effects of Supervision and Orthonormality Constraints on Double Descent of Generalization Errors
- Super-efficiency of automatic differentiation for functions defined as a minimum
- Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent
- Supervised learning: no loss no cry
- Supervised Quantile Normalization for Low Rank Matrix Factorization
- Symbolic Network: Generalized Neural Policies for Relational MDPs
- Tails of Lipschitz Triangular Flows
- TaskNorm: Rethinking Batch Normalization for Meta-Learning
- Task-Oriented Active Perception and Planning in Environments with Partially Known Semantics
- Task Understanding from Confusing Multi-task Data
- Taylor Expansion Policy Optimization
- T-Basis: a Compact Representation for Neural Networks
- Teaching with Limited Information on the Learner's Behaviour
- Temporal Logic Point Processes
- Temporal Phenotyping using Deep Predictive Clustering of Disease Progression
- Tensor denoising and completion based on ordinal observations
- Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
- T-GD: Transferable GAN-generated Images Detection Framework
- The Boomerang Sampler
- The Buckley-Osthus model and the block preferential attachment model: statistical analysis and application
- The Complexity of Finding Stationary Points with Stochastic Gradient Descent
- The continuous categorical: a novel simplex-valued exponential family
- The Cost-free Nature of Optimally Tuning Tikhonov Regularizers and Other Ordered Smoothers
- The Differentiable Cross-Entropy Method
- The Effect of Natural Distribution Shift on Question Answering Models
- The FAST Algorithm for Submodular Maximization
- The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
- The Implicit and Explicit Regularization Effects of Dropout
- The Implicit Regularization of Stochastic Gradient Flow for Least Squares
- The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation
- The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks
- The Many Shapley Values for Model Explanation
- The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
- The Non-IID Data Quagmire of Decentralized Machine Learning
- Theoretical Foundations of Reinforcement Learning
- The Performance Analysis of Generalized Margin Maximizers on Separable Data
- The Role of Regularization in Classification of High-dimensional Noisy Gaussian Mixture
- The Sample Complexity of Best-$k$ Items Selection from Pairwise Comparisons
- The Shapley Taylor Interaction Index
- The Tree Ensemble Layer: Differentiability meets Conditional Computation
- The Usual Suspects? Reassessing Blame for VAE Posterior Collapse
- Thompson Sampling Algorithms for Mean-Variance Bandits
- Thompson Sampling via Local Uncertainty
- Tightening Exploration in Upper Confidence Reinforcement Learning
- Time-aware Large Kernel Convolutions
- Time-Consistent Self-Supervision for Semi-Supervised Learning
- Time Series Deconfounder: Estimating Treatment Effects over Time in the Presence of Hidden Confounders
- Too Relaxed to Be Fair
- Topic Modeling via Full Dependence Mixtures
- Topological Autoencoders
- Topologically Densified Distributions
- Towards Accurate Post-training Network Quantization via Bit-Split and Stitching
- Towards Adaptive Residual Network Training: A Neural-ODE Perspective
- Towards a General Theory of Infinite-Width Limits of Neural Classifiers
- Towards non-parametric drift detection via Dynamic Adapting Window Independence Drift Detection (DAWIDD)
- Towards Understanding the Dynamics of the First-Order Adversaries
- Towards Understanding the Regularization of Adversarial Robustness on Neural Networks
- Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
- Training Binary Neural Networks through Learning with Noisy Supervision
- Training Binary Neural Networks using the Bayesian Learning Rule
- Training Deep Energy-Based Models with f-Divergence Minimization
- Training Linear Neural Networks: Non-Local Convergence and Complexity Results
- Training Neural Networks for and by Interpolation
- TrajectoryNet: A Dynamic Optimal Transport Network for Modeling Cellular Dynamics
- Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources
- Transformation of ReLU-based recurrent neural networks from discrete-time to continuous-time
- Transformer Hawkes Process
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Transparency Promotion with Model-Agnostic Linear Competitors
- Tuning-free Plug-and-Play Proximal Algorithm for Inverse Imaging Problems
- Two Routes to Scalable Credit Assignment without Weight Symmetry
- Two Simple Ways to Learn Individual Fairness Metrics from Data
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels
- Uncertainty and Robustness in Deep Learning Workshop (UDL)
- Uncertainty-Aware Lookahead Factor Models for Quantitative Investing
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network
- Uncertainty quantification for nonconvex tensor completion: Confidence intervals, heteroscedasticity and optimality
- Understanding and Mitigating the Tradeoff between Robustness and Accuracy
- Understanding and Stabilizing GANs' Training Dynamics Using Control Theory
- Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere
- Understanding Self-Training for Gradual Domain Adaptation
- Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
- Understanding the Impact of Model Incoherence on Convergence of Incremental SGD with Random Reshuffle
- Undirected Graphical Models as Approximate Posteriors
- Uniform Convergence of Rank-weighted Learning
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
- Unique Properties of Flat Minima in Deep Networks
- Universal Asymptotic Optimality of Polyak Momentum
- Universal Equivariant Multilayer Perceptrons
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift
- Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks
- Unsupervised Discovery of Interpretable Directions in the GAN Latent Space
- Unsupervised Speech Decomposition via Triple Information Bottleneck
- Unsupervised Transfer Learning for Spatiotemporal Predictive Networks
- Up or Down? Adaptive Rounding for Post-Training Quantization
- Upper bounds for Model-Free Row-Sparse Principal Component Analysis
- Variable Skipping for Autoregressive Range Density Estimation
- Variance Reduced Coordinate Descent with Acceleration: New Method With a Surprising Application to Finite-Sum Problems
- Variance Reduction and Quasi-Newton for Particle-Based Variational Inference
- Variance Reduction in Stochastic Particle-Optimization Sampling
- Variational Autoencoders with Riemannian Brownian Motion Priors
- Variational Bayesian Quantization
- Variational Imitation Learning with Diverse-quality Demonstrations
- Variational Inference for Sequential Data with Future Likelihood Estimates
- Variational Label Enhancement
- VFlow: More Expressive Generative Flows with Variational Data Augmentation
- VideoOneNet: Bidirectional Convolutional Recurrent OneNet with Trainable Data Steps for Video Processing
- Video Prediction via Example Guidance
- Visual Grounding of Learned Physical Models
- Voice Separation with an Unknown Number of Multiple Speakers
- WaveFlow: A Compact Flow-based Model for Raw Audio
- Weakly-Supervised Disentanglement Without Compromises
- What can I do here? A Theory of Affordances in Reinforcement Learning
- What Can Learned Intrinsic Rewards Capture?
- What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization?
- When are Non-Parametric Methods Robust?
- When deep denoising meets iterative phase retrieval
- When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment
- When Does Self-Supervision Help Graph Convolutional Networks?
- When Explanations Lie: Why Many Modified BP Attributions Fail
- Which Tasks Should Be Learned Together in Multi-task Learning?
- Why Are Learned Indexes So Effective?
- Why bigger is not always better: on finite and infinite neural networks
- WiML D&I Chairs Remarks: Sinead Williamson and Rachel Thomas
- Word-Level Speech Recognition With a Letter to Word Encoder
- Working Memory Graphs
- Workshop on AI for Autonomous Driving (AIAD)
- Workshop on Continual Learning
- Workshop on eXtreme Classification: Theory and Applications
- Workshop on Learning in Artificial Open Worlds
- XtarNet: Learning to Extract Task-Adaptive Representation for Incremental Few-Shot Learning
- XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation
- XXAI: Extending Explainable AI Beyond Deep Models and Classifiers
- Zeno++: Robust Fully Asynchronous SGD