# ICML 2019 Events with Videos

## Accepted Talks

- Two-level Explanations in Music Emotion Recognition
- NPR: Neural Personalised Ranking for Song Selection
- A Model-Driven Exploration of Accent Within the Amateur Singing Voice
- Interactive Neural Audio Synthesis
- Visualizing and Understanding Self-attention based Music Tagging
- A CycleGAN for style transfer between drum & bass subgenres

## Best Paper Talks

## Contributed Talks

- Contributed Talk 1: A Boosting Tree Based AutoML System for Lifelong Machine Learning
- Submodular Batch Selection for Training Deep Neural Networks
- Interpretability Contributed Talks
- Ngo Trong Trung
- Contributed Talk 2: Transfer NAS: Knowledge Transfer between Search Spaces with Transformer Agents
- Exact Sampling of Determinantal Point Processes with Sublinear Time Preprocessing
- Contributed Talk 3: Random Search and Reproducibility for Neural Architecture Search
- Seq2Slate: Re-ranking and Slate Optimization with RNNs
- Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
- Contributed Talk: Learning Exploration Policies for Model-Agnostic Meta-Reinforcement Learning
- Contributed Talk: Lifelong Learning via Online Leverage Score Sampling
- Contributed Talk: Improving Relevance Prediction with Transfer Learning in Large-scale Retrieval Systems
- Contributed Talk: Continual Adaptation for Efficient Machine Communication

## Contributed talks

- On Two Ways to use Determinantal Point Processes for Monte Carlo Integration
- Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem
- Detecting Extrapolation with Influence Functions
- How Can We Be So Dense? The Robustness of Highly Sparse Representations
- Subspace Inference for Bayesian Deep Learning
- Quality of Uncertainty Quantification for Bayesian Neural Network Inference
- 'In-Between' Uncertainty in Bayesian Neural Networks
- Iris R. Seaman
- Using AI for Economic Upliftment of Handicraft Industry
- Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening
- Faraz Torabi
- Seyed Kamyar Seyed Ghasemipour
- Nicholas R Waytowich
- Learning Global Variations in Outdoor PM2.5 Concentrations with Satellite Images
- Pareto Efficient Fairness for Skewed Subgroup Data
- Crisis Sub-Events on Social Media: A Case Study of Wildfires
- Abhishek Das
- Towards Detecting Dyslexia in Children's Handwriting Using Neural Networks

## Crowdsourcing Deep Learning Phenomena

## Discussions

## Discussion Panels

## Invited Talks

- Machine learning for robots to think fast
- Best Paper
- The U.S. Census Bureau Tries to be a Good Data Steward in the 21st Century
- Test of Time Award
- Online Dictionary Learning for Sparse Coding
- What 4 year olds can do and AI can’t (yet)
- Best Paper
- Keynote by Dan Roy: Progress on Nonvacuous Generalization Bounds
- Caroline Uhler
- Invited Talk 1: Adaptive Tolling for Multiagent Traffic Optimization
- Keynote by Chelsea Finn: Training for Generalization
- Invited Talk 2: The Strategic Perils of Learning from Historical Data
- Keynote by Sham Kakade: Prediction, Learning, and Memory
- Invited Talk 3: Trend-Following Trading Strategies and Financial Market Stability
- Francisco LePort
- Jeff Bilmes: Deep Submodular Synergies
- Keynote by Mikhail Belkin: A Hard Look at Generalization and its Theories
- Keynote by Jason Lee: On the Foundations of Deep Learning: SGD, Overparametrization, and Generalization
- Michal Valko: How Negative Dependence Broke the Quadratic Barrier for Learning with Graphs and Kernels
- Invited Talk 5: Intra-day Stock Price Prediction as a Measure of Market Efficiency
- Sergey Levine: Distribution Matching and Mutual Information in Reinforcement Learning
- Suchi Saria (Johns Hopkins) - Link between Causal Inference and Reinforcement Learning and Applications to Learning from Offline/Observational Data
- Dawn Woodard (Uber) - Dynamic Pricing and Matching for Ride-Hailing
- Cheng Zhang: Active Mini-Batch Sampling using Repulsive Point Processes
- Building and Structuring Training Sets for Multi-Task Learning (Alex Ratner)
- Salman Avestimehr: Lagrange Coded Computing: Optimal Design for Resilient, Secure, and Private Distributed Learning
- Meta-Learning: Challenges and Frontiers (Chelsea Finn)
- Invited Talk by Professor Alexei Efros (UC Berkeley)
- Emo Todorov
- Making Efficient use of Musical Annotations
- Rashmi Vinayak: Resilient ML inference via coded computation: A learning-based approach
- Characterizing Musical Correlates of Large-Scale Discovery Behavior
- Pieter Abbeel
- Markus Weimer: A case for coded computing on elastic compute
- ARUBA: Efficient and Adaptive Meta-Learning with Provable Guarantees (Ameet Talwalkar)
- Personalization at Amazon Music
- Alex Dimakis: Coding Theory for Distributed Learning
- Efficient Lifelong Learning Algorithms: Regret Bounds and Statistical Guarantees (Massimiliano Pontil)
- Wei Zhang: Distributed deep learning system building at IBM: Scale-up and Scale-out case studies
- Multi-Task Learning in the Wilderness (Andrej Karpathy)
- CVPR19 Media Forensics workshop: a Preview
- Olga Russakovsky
- Recent Trends in Personalization: A Netflix Perspective (Justin Basilico)
- Invited Talk by Tom Van de Weghe (Stanford & VRT)
- User-curated shaping of expressive performances
- Toward Robust AI Systems for Understanding and Reasoning Over Multimodal Data (Hannaneh Hajishirzi)

## Invited Talk 1

## Invited Talk 2 (Bernt Schiele)

## Invited Talk 3

## Invited Talk 4

## Invited Talk 5

## Invited Talk 6 (Katerina Fragkiadaki)

## Invited speakers

## Invited talks

- Invited Talk: James Philbin
- Linearized two-layers neural networks in high dimension
- Hardware Efficiency Aware Neural Architecture Search and Compression
- Victor-Emmanuel Brunel: Negative Association and Discrete Determinantal Point Processes
- Invited Talk: Sanja Fidler
- Loss landscape and behaviour of algorithms in the spiked matrix-tensor model
- Structured matrices for efficient deep learning
- Why it's hard to mitigate climate change, and how to do better
- Tackling climate change challenges with AI through collaboration
- On the Interplay between Physics and Deep Learning
- Understanding the Challenges of Algorithm and Hardware Co-design for Deep Neural Networks
- Why Deep Learning Works: Traditional and Heavy-Tailed Implicit Self-Regularization in Deep Neural Networks
- Personalized Visualization of the Impact of Climate Change
- Advances in Climate Informatics: Machine Learning for the Study of Climate Change
- DNN Training and Inference with Hyper-Scaled Precision
- Is Optimization a sufficient language to understand Deep Learning?
- Mixed Precision Training & Inference
- Geoscience data and models for the Climate Change AI community
- Understanding overparameterized neural networks
- ML vs. Climate Change, Applications in Energy at DeepMind
- Sergey Levine: Unsupervised Reinforcement Learning and Meta-Learning
- Joyce Chai
- Vivienne Sze: Exploiting redundancy for efficient processing of DNNs and beyond
- Stefano Ermon
- Peter Stone: Learning Curricula for Transfer Learning in RL
- Jacob Andreas: Linguistic Scaffolds for Policy Learning
- Karol Hausman: Skill Representation and Supervision in Multi-Task Reinforcement Learning
- Martha White: Learning Representations for Continual Learning
- Natasha Jaques
- Natalia Diaz-Rodriguez: Continual Learning and Robotics: an overview
- Pierre Sermanet
- Andrew Saxe: Intriguing phenomena in training and generalization dynamics of deep networks
- Jeff Clune: Towards Solving Catastrophic Forgetting with Neuromodulation & Learning Curricula by Generating Environments
- Nicolas Heess: TBD
- Kris Kitani
- Benjamin Rosman: Exploiting Structure For Accelerating Reinforcement Learning

## Keynotes

- Keynote by Max Welling: A Nonparametric Bayesian Approach to Deep Learning (without GPs)
- Keynote by Kilian Weinberger: On Calibration and Fairness
- Keynote by Suchi Saria: Safety Challenges with Black-Box Predictors and Novel Learning Approaches for Failure Proofing
- Keynote by Dawn Song: Adversarial Machine Learning: Challenges, Lessons, and Future Directions
- Keynote by Terrance Boult: The Deep Unknown: on Open-set and Adversarial Examples in Deep Learning
- From Listening to Watching, A Recommender Systems Perspective
- Doina Precup
- Solving societal challenges with AI through partnerships
- AI for Ecology and Conservation
- AI for Whose Social Good?
- How to Volunteer as an AI Researcher
- Creating constructive change and avoiding unintended consequences from machine learning
- Assisting Vulnerable Communities through AI and OR: from Data to Deployed Decisions

## Keynote Talks

- Keynote by Peter Frazier: Grey-box Bayesian Optimization for AutoML
- Keynote by Rachel Thomas: Lessons Learned from Helping 200,000 non-ML experts use ML
- Keynote by Jeff Dean: An Overview of Google's Work on AutoML and Future Directions
- Keynote by Charles Sutton: Towards Semi-Automated Machine Learning

## Lightning Talks

## Opening Remarks

## Orals

- Adversarial Attacks on Node Embeddings via Graph Poisoning
- SelectiveNet: A Deep Neural Network with an Integrated Reject Option
- ELF OpenGo: an analysis and open reimplementation of AlphaZero
- A Contrastive Divergence for Combining Variational Inference and MCMC
- Refined Complexity of PCA with Outliers
- PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization
- Validating Causal Inference Models via Influence Functions
- Data Shapley: Equitable Valuation of Data for Machine Learning
- First-Order Adversarial Vulnerability of Neural Networks and Input Dimension
- Manifold Mixup: Better Representations by Interpolating Hidden States
- Making Deep Q-learning methods robust to time discretization
- Calibrated Approximate Bayesian Inference
- On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms
- Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization
- Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
- Feature Grouping as a Stochastic Regularizer for High-Dimensional Structured Data
- On Certifying Non-Uniform Bounds against Adversarial Attacks
- Processing Megapixel Images with Deep Attention-Sampling Models
- Nonlinear Distributional Gradient Temporal-Difference Learning
- Moment-Based Variational Inference for Markov Jump Processes
- Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models
- Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization
- Learning to Groove with Inverse Sequence Transformations
- Metric-Optimized Example Weights
- Improving Adversarial Robustness via Promoting Ensemble Diversity
- TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning
- Composing Entropic Policies using Divergence Correction
- Understanding MCMC Dynamics as Flows on the Wasserstein Space
- Teaching a black-box learner
- Lower Bounds for Smooth Nonconvex Finite-Sum Optimization
- Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI
- Improving Model Selection by Employing the Test Data
- Adversarial camera stickers: A physical camera-based attack on deep learning systems
- Online Meta-Learning
- TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning
- LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations
- PAC Learnability of Node Functions in Networked Dynamical Systems
- Nonconvex Variance Reduced Optimization with Arbitrary Sampling
- HOList: An Environment for Machine Learning of Higher Order Logic Theorem Proving
- Topological Data Analysis of Decision Boundaries with Application to Model Selection
- Adversarial examples from computational constraints
- Training Neural Networks with Local Error Signals
- Multi-Agent Adversarial Inverse Reinforcement Learning
- Amortized Monte Carlo Integration
- Online learning with kernel losses
- Error Feedback Fixes SignSGD and other Gradient Compression Schemes
- Molecular Hypergraph Grammar with Its Application to Molecular Optimization
- Contextual Memory Trees
- POPQORN: Quantifying Robustness of Recurrent Neural Networks
- GMNN: Graph Markov Neural Networks
- Policy Consolidation for Continual Reinforcement Learning
- Stein Point Markov Chain Monte Carlo
- Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates
- A Composite Randomized Incremental Gradient Method
- Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance
- Sparse Extreme Multi-label Learning with Oracle Property
- Using Pre-Training Can Improve Model Robustness and Uncertainty
- Self-Attention Graph Pooling
- Off-Policy Deep Reinforcement Learning without Exploration
- Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations
- Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise
- Optimal Continuous DR-Submodular Maximization and Applications to Provable Mean Field Inference
- Learning to Prove Theorems via Interacting with Proof Assistants
- Shape Constraints for Set Functions
- Generalized No Free Lunch Theorem for Adversarial Robustness
- Combating Label Noise in Deep Learning using Abstention
- Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
- Particle Flow Bayes' Rule
- Uniform Convergence Rate of the Kernel Density Estimator Adaptive to Intrinsic Volume Dimension
- Multiplicative Weights Updates as a distributed constrained optimization algorithm: Convergence to second-order stationary points almost always
- Circuit-GNN: Graph Neural Networks for Distributed Circuit Design
- On The Power of Curriculum Learning in Training Deep Networks
- PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach
- LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning
- Revisiting the Softmax Bellman Operator: New Benefits and New Perspective
- Correlated Variational Auto-Encoders
- Maximum Likelihood Estimation for Learning Populations of Parameters
- Katalyst: Boosting Convex Katayusha for Non-Convex Problems with a Large Condition Number
- Learning to Optimize Multigrid PDE Solvers
- Voronoi Boundary Classification: A High-Dimensional Geometric Approach via Weighted Monte Carlo Integration
- On Learning Invariant Representations for Domain Adaptation
- Self-Attention Generative Adversarial Networks
- An Investigation of Model-Free Planning
- Towards a Unified Analysis of Random Fourier Features
- Generalized Approximate Survey Propagation for High-Dimensional Estimation
- Projection onto Minkowski Sums with Application to Constrained Learning
- Safe Policy Improvement with Baseline Bootstrapping
- A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation
- Robust Decision Trees Against Adversarial Examples
- Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
- Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching
- CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
- Learning deep kernels for exponential family densities
- Boosted Density Estimation Remastered
- Blended Conditional Gradients
- Distributional Reinforcement Learning for Efficient Exploration
- Learning Hawkes Processes Under Synchronization Noise
- Automatic Classifiers as Scientific Instruments: One Step Further Away from Ground-Truth
- Adversarial Generation of Time-Frequency Features with application in audio synthesis
- High-Fidelity Image Generation With Fewer Labels
- Task-Agnostic Dynamics Priors for Deep Reinforcement Learning
- Bayesian Deconditional Kernel Mean Embeddings
- Inference and Sampling of $K_{33}$-free Ising Models
- Acceleration of SVRG and Katyusha X by Inexact Preconditioning
- Optimistic Policy Optimization via Multiple Importance Sampling
- Generative Adversarial User Model for Reinforcement Learning Based Recommendation System
- Look Ma, No Latent Variables: Accurate Cutset Networks via Compilation
- On the Universality of Invariant Networks
- Revisiting precision recall definition for generative modeling
- Diagnosing Bottlenecks in Deep Q-learning Algorithms
- A Kernel Perspective for Regularizing Deep Neural Networks
- Random Matrix Improved Covariance Estimation for a Large Class of Metrics
- Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD
- Neural Logic Reinforcement Learning
- A Statistical Investigation of Long Memory in Language and Music
- Optimal Transport for structured data with application on graphs
- Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
- Wasserstein of Wasserstein Loss for Learning Generative Models
- Collaborative Evolutionary Reinforcement Learning
- A Persistent Weisfeiler–Lehman Procedure for Graph Classification
- Dual Entangled Polynomial Code: Three-Dimensional Coding for Distributed Matrix Multiplication
- A Conditional-Gradient-Based Augmented Lagrangian Framework
- Learning to Collaborate in Markov Decision Processes
- Deep Factors for Forecasting
- Learning Optimal Linear Regularizers
- Gauge Equivariant Convolutional Networks and the Icosahedral CNN
- Flat Metric Minimization with Applications in Generative Modeling
- EMI: Exploration with Mutual Information
- Rehashing Kernel Evaluation in High Dimensions
- Neural Joint Source-Channel Coding
- SGD: General Analysis and Improved Rates
- Predictor-Corrector Policy Optimization
- Weakly-Supervised Temporal Localization via Occurrence Count Learning
- On Symmetric Losses for Learning from Corrupted Labels
- Feature-Critic Networks for Heterogeneous Domain Generalization
- Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs
- Imitation Learning from Imperfect Demonstration
- Large-Scale Sparse Kernel Canonical Correlation Analysis
- Doubly-Competitive Distribution Estimation
- Curvature-Exploiting Acceleration of Elastic Net Computations
- Learning a Prior over Intent via Meta-Inverse Reinforcement Learning
- Switching Linear Dynamics for Variational Bayes Filtering
- Learning to Convolve: A Generalized Weight-Tying Approach
- Non-Parametric Priors For Generative Adversarial Networks
- Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty
- A Kernel Theory of Modern Data Augmentation
- Homomorphic Sensing
- Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
- DeepMDP: Learning Continuous Latent Space Models for Representation Learning
- Imputing Missing Events in Continuous-Time Event Streams
- Regularization in directable environments with application to Tetris
- On Dropout and Nuclear Norm Regularization
- Lipschitz Generative Adversarial Nets
- Dynamic Weights in Multi-Objective Deep Reinforcement Learning
- kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection
- Phaseless PCA: Low-Rank Matrix Recovery from Column-wise Phaseless Measurements
- Safe Grid Search with Optimal Complexity
- Importance Sampling Policy Evaluation with an Estimated Behavior Policy
- Understanding and Controlling Memory in Recurrent Neural Networks
- Improved Dynamic Graph Learning through Fault-Tolerant Sparsification
- Gradient Descent Finds Global Minima of Deep Neural Networks
- HexaGAN: Generative Adversarial Nets for Real World Classification
- Fingerprint Policy Optimisation for Robust Reinforcement Learning
- Scalable Learning in Reproducing Kernel Krein Spaces
- Rate Distortion For Model Compression: From Theory To Practice
- SAGA with Arbitrary Sampling
- Learning from a Learner
- Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces
- Heterogeneous Model Reuse via Optimizing Multiparty Multiclass Margin
- Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm
- Graph Matching Networks for Learning the Similarity of Graph Structured Objects
- An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
- Dirichlet Simplex Nest and Geometric Inference
- Formal Privacy for Functional Data with Gaussian Perturbations
- Natural Analysts in Adaptive Data Analysis
- Separable value functions across time-scales
- Subspace Robust Wasserstein Distances
- Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff
- Sublinear Time Nearest Neighbor Search over Generalized Weighted Space
- BayesNAS: A Bayesian Approach for Neural Architecture Search
- Differentiable Linearized ADMM
- Bayesian leave-one-out cross-validation for large data
- Graphical-model based estimation and inference for differential privacy
- CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration
- Learning Action Representations for Reinforcement Learning
- Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models
- Collaborative Channel Pruning for Deep Networks
- Compressing Gradient Optimizers via Count-Sketches
- Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
- Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search
- Rao-Blackwellized Stochastic Gradients for Discrete Distributions
- White-box vs Black-box: Bayes Optimal Strategies for Membership Inference
- Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
- Bayesian Counterfactual Risk Minimization
- Active Manifolds: A non-linear analogue to Active Subspaces
- Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization
- Scalable Fair Clustering
- Shallow-Deep Networks: Understanding and Mitigating Network Overthinking
- A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent
- Neurally-Guided Structure Inference
- Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints
- Per-Decision Option Discounting
- Optimal Minimal Margin Maximization with Boosting
- GDPP: Learning Diverse Generations using Determinantal Point Processes
- Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator
- Graph U-Nets
- The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
- Bayesian Joint Spike-and-Slab Graphical Lasso
- Sublinear Space Private Algorithms Under the Sliding Window Model
- Optimality Implies Kernel Sum Classifiers are Statistically Efficient
- Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
- Generalized Linear Rule Models
- Fault Tolerance in Iterative-Convergent Machine Learning
- SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver
- AdaGrad stepsizes: sharp convergence over nonconvex landscapes
- Rotation Invariant Householder Parameterization for Bayesian PCA
- Locally Private Bayesian Inference for Count Models
- The Implicit Fairness Criterion of Unconstrained Learning
- A Theory of Regularized Markov Decision Processes
- Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications
- GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects
- Static Automatic Batching In TensorFlow
- Area Attention
- Beyond Backprop: Online Alternating Minimization with Auxiliary Variables
- A Framework for Bayesian Optimization in Embedded Subspaces
- Low Latency Privacy Preserving Inference
- Weak Detection of Signal in the Spiked Wigner Model
- Discovering Options for Exploration by Minimizing Cover Time
- Variational Inference for sparse network reconstruction from count data
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
- The Evolved Transformer
- SWALP: Stochastic Weight Averaging in Low Precision Training
- Convolutional Poisson Gamma Belief Network
- Communication Complexity in Locally Private Distribution Estimation and Heavy Hitters
- Rademacher Complexity for Adversarially Robust Generalization
- Policy Certificates: Towards Accountable Reinforcement Learning
- Simplifying Graph Convolutional Networks
- Geometry Aware Convolutional Filters for Omnidirectional Images Representation
- Memory-Optimal Direct Convolutions for Maximizing Classification Accuracy in Embedded Applications
- Jumpout: Improved Dropout for Deep Neural Networks with ReLUs
- Efficient optimization of loops and limits with randomized telescoping sums
- Automatic Posterior Transformation for Likelihood-Free Inference
- Poisson Subsampled Rényi Differential Privacy
- Provably efficient RL with Rich Observations via Latent State Decoding
- Action Robust Reinforcement Learning and Applications in Continuous Control
- Robust Influence Maximization for Hyperparametric Models
- A Personalized Affective Memory Model for Improving Emotion Recognition
- DL2: Training and Querying Neural Networks with Logic
- Stochastic Deep Networks
- Self-similar Epochs: Value in arrangement
- Active Learning for Decision-Making from Imbalanced Observational Data
- Benefits and Pitfalls of the Exponential Mechanism with Applications to Hilbert Spaces and Functional PCA
- Information-Theoretic Considerations in Batch Reinforcement Learning
- The Value Function Polytope in Reinforcement Learning
- HyperGAN: A Generative Model for Diverse, Performant Neural Networks
- Temporal Gaussian Mixture Layer for Videos
- Theoretically Principled Trade-off between Robustness and Accuracy
- Sum-of-Squares Polynomial Flow
- Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
- Distribution calibration for regression
- On the Convergence and Robustness of Adversarial Training
- Distributed Learning with Sublinear Communication
- Complexity of Linear Regions in Deep Networks
- Exploiting Worker Correlation for Label Aggregation in Crowdsourcing
- Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards
- The Odds are Odd: A Statistical Test for Detecting Adversarial Examples
- FloWaveNet: A Generative Flow for Raw Audio
- Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
- Graph Convolutional Gaussian Processes
- Learning with Bad Training Data via Iterative Trimmed Loss Minimization
- On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization
- On Connected Sublevel Sets in Deep Learning
- Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems
- Target Tracking for Contextual Bandits: Application to Demand Side Management
- ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation
- Are Generative Classifiers More Robust to Adversarial Attacks?
- Imitating Latent Policies from Observation
- Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation
- Adversarial Examples Are a Natural Consequence of Test Error in Noise
- A Multitask Multiple Kernel Learning Algorithm for Survival Analysis with Application to Cancer Biology
- Correlated bandits or: How to minimize mean-squared error online
- Certified Adversarial Robustness via Randomized Smoothing
- SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
- GOODE: A Gaussian Off-The-Shelf Ordinary Differential Equation Solver
- Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels
- Collective Model Fusion for Multiple Black-Box Experts
- Greedy Layerwise Learning Can Scale To ImageNet
- Fast and Flexible Inference of Joint Distributions from their Marginals
- Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging
- Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
- Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning
- Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models
- Does Data Augmentation Lead to Positive Margin?
- Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization
- On the Impact of the Activation function on Deep Neural Networks Training
- Cognitive model priors for predicting human decisions
- Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
- Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization
- Structured agents for physical construction
- AReS and MaRS - Adversarial and MMD-Minimizing Regression for SDEs
- Robust Learning from Untrusted Sources
- Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning
- Estimating Information Flow in Deep Neural Networks
- Conditioning by adaptive sampling for robust design
- Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
- Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
- Learning Novel Policies For Tasks
- End-to-End Probabilistic Inference for Nonstationary Audio Analysis
- SELFIE: Refurbishing Unclean Samples for Robust Deep Learning
- Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data
- The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
- Direct Uncertainty Prediction for Medical Second Opinions
- Bilinear Bandits with Low-rank Structure
- Transferable Clean-Label Poisoning Attacks on Deep Neural Nets
- Taming MAML: Efficient unbiased meta-reinforcement learning
- Deep Gaussian Processes with Importance-Weighted Variational Inference
- Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance
- Noisy Dual Principal Component Pursuit
- Characterizing Well-Behaved vs. Pathological Deep Neural Networks
- Dynamic Measurement Scheduling for Event Forecasting using Deep RL
- NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks
- Self-Supervised Exploration via Disagreement
- Automated Model Selection with Bayesian Quadrature
- Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
- Understanding Geometry of Encoder-Decoder CNNs
- Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization
- On the Design of Estimators for Bandit Off-Policy Evaluation
- Simple Black-box Adversarial Attacks
- Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
- Data Poisoning Attacks in Multi-Party Learning
- Screening rules for Lasso with non-convex Sparse Regularizers
- Traditional and Heavy Tailed Self Regularization in Neural Network Models
- DeepNose: Using artificial neural networks to represent the space of odorants
- Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem
- Causal Identification under Markov Equivalence: Completeness Results
- Invertible Residual Networks
- The Natural Language of Actions
- Beyond the Chinese Restaurant and Pitman-Yor processes: Statistical Models with double power-law behavior
- Monge blunts Bayes: Hardness Results for Adversarial Training
- Almost surely constrained convex optimization
- Domain Agnostic Learning with Disentangled Representations
- Context-Aware Zero-Shot Learning for Object Recognition
- Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models
- NAS-Bench-101: Towards Reproducible Neural Architecture Search
- Control Regularization for Reduced Variance Reinforcement Learning
- DP-GP-LVM: A Bayesian Non-Parametric Model for Learning Multivariate Dependency Structures
- Better generalization with less data using robust gradient descent
- Generalized Majorization-Minimization
- Composing Value Functions in Reinforcement Learning
- Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models
- Approximated Oracle Filter Pruning for Destructive CNN Width Optimization
- On the Generalization Gap in Reparameterizable Reinforcement Learning
- Random Function Priors for Correlation Modeling
- Near optimal finite time identification of arbitrary linear dynamical systems
- On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization
- Fast Context Adaptation via Meta-Learning
- Classifying Treatment Responders Under Causal Effect Monotonicity
- LegoNet: Efficient Convolutional Neural Networks with Lego Filters
- Trajectory-Based Off-Policy Deep Reinforcement Learning
- Variational Russian Roulette for Deep Bayesian Nonparametrics
- Lossless or Quantized Boosting with Integer Arithmetic
- Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization
- Provable Guarantees for Gradient-Based Meta-Learning
- Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules
- Learning Models from Data with Measurement Error: Tackling Underreporting
- Sorting Out Lipschitz Function Approximation
- A Deep Reinforcement Learning Perspective on Internet Congestion Control
- Incorporating Grouping Information into Bayesian Decision Tree Ensembles
- Orthogonal Random Forest for Causal Inference
- Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization
- Towards Understanding Knowledge Distillation
- Anomaly Detection With Multiple-Hypotheses Predictions
- Adjustment Criteria for Generalizing Experimental Findings
- Graph Element Networks: adaptive, structured computation and memory
- Model-Based Active Exploration
- Variational Implicit Processes
- MONK -- Outlier-Robust Mean Embedding Estimation by Median-of-Means
- Efficient Dictionary Learning with Gradient Descent
- Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers
- Kernel Mean Matching for Content Addressability of GANs
- Conditional Independence in Testing Bayesian Networks
- Training CNNs with Selective Allocation of Channels
- Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
- Discovering Latent Covariance Structures for Multiple Time Series
- The advantages of multiple classes for reducing overfitting from test set reuse
- Plug-and-Play Methods Provably Converge with Properly Trained Denoisers
- Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation
- Neural Inverse Knitting: From Images to Manufacturing Instructions
- Sensitivity Analysis of Linear Structural Causal Models
- Equivariant Transformer Networks
- Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN
- Scalable Training of Inference Networks for Gaussian-Process Models
- On the statistical rate of nonlinear recovery in generative models with heavy-tailed data
- Riemannian adaptive stochastic gradient algorithms on matrix manifolds
- Learning-to-Learn Stochastic Gradient Descent with Biased Regularization
- Making Convolutional Networks Shift-Invariant Again
- More Efficient Off-Policy Evaluation through Regularized Targeted Learning
- Overcoming Multi-model Forgetting
- A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs
- Bayesian Optimization Meets Bayesian Optimal Stopping
- Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size!
- Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence
- BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
- Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding
- Bayesian Nonparametric Federated Learning of Neural Networks
- Remember and Forget for Experience Replay
- Learning interpretable continuous-time models of latent stochastic dynamical systems
- On Medians of (Randomized) Pairwise Means
- Alternating Minimizations Converge to Second-Order Optimal Solutions
- Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation
- IMEXnet - A Forward Stable Deep Neural Network
- Adversarially Learned Representations for Information Obfuscation and Inference
- How does Disagreement Help Generalization against Label Corruption?
- Tensor Variable Elimination for Plated Factor Graphs
- A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes
- Position-aware Graph Neural Networks
- Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances
- Provably Efficient Imitation Learning from Observation Alone
- Active Embedding Search via Noisy Paired Comparisons
- Do ImageNet Classifiers Generalize to ImageNet?
- Adaptive Neural Trees
- EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
- Predicate Exchange: Inference with Declarative Knowledge
- Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models
- Detecting Overlapping and Correlated Communities without Pure Nodes: Identifiability and Algorithm
- SGD without Replacement: Sharper Rates for General Smooth Convex Functions
- Dead-ends and Secure Exploration in Reinforcement Learning
- Fast Direct Search in an Optimally Compressed Continuous Target Space for Efficient Multi-Label Active Learning
- Exploring the Landscape of Spatial Robustness
- Connectivity-Optimized Representation Learning via Persistent Homology
- Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment
- Discriminative Regularization for Latent Variable Models with Applications to Electrocardiography
- Understanding and Accelerating Particle-Based Variational Inference
- Learning Generative Models across Incomparable Spaces
- On the Complexity of Approximating Wasserstein Barycenters
- Statistics and Samples in Distributional Reinforcement Learning
- Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments
- Sever: A Robust Meta-Algorithm for Stochastic Optimization
- Minimal Achievable Sufficient Statistic Learning
- Deep Compressed Sensing
- Hierarchical Decompositional Mixtures of Variational Autoencoders
- Efficient learning of smooth probability functions from Bernoulli tests with guarantees
- Relational Pooling for Graph Representations
- Estimate Sequences for Variance-Reduced Stochastic Composite Optimization
- Hessian Aided Policy Gradient
- Bayesian Generative Active Deep Learning
- Analyzing Federated Learning through an Adversarial Lens
- Learning to Route in Similarity Graphs
- Differentiable Dynamic Normalization for Learning Deep Representation
- Finding Mixed Nash Equilibria of Generative Adversarial Networks
- The Variational Predictive Natural Gradient
- Disentangled Graph Convolutional Networks
- A Dynamical Systems Perspective on Nesterov Acceleration
- Provably Efficient Maximum Entropy Exploration
- Active Learning for Probabilistic Structured Prediction of Cuts and Matchings
- Fairwashing: the risk of rationalization
- Invariant-Equivariant Representation Learning for Multi-Class Data
- Toward Understanding the Importance of Noise in Training Neural Networks
- CompILE: Compositional Imitation Learning and Execution
- Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap
- Open Vocabulary Learning on Source Code with a Graph-Structured Cache
- Random Shuffling Beats SGD after Finite Epochs
- Combining parametric and nonparametric models for off-policy evaluation
- Active Learning with Disagreement Graphs
- Understanding the Origins of Bias in Word Embeddings
- Infinite Mixture Prototypes for Few-shot Learning
- Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group
- Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data
- An Instability in Variational Inference for Topic Models
- Learning Discrete Structures for Graph Neural Networks
- First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems
- Sample-Optimal Parametric Q-Learning Using Linearly Additive Features
- Multi-Frequency Vector Diffusion Maps
- Bias Also Matters: Bias Attribution for Deep Neural Network Explanation
- MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing
- Breaking Inter-Layer Co-Adaptation by Classifier Anonymization
- Deep Generative Learning via Variational Gradient Flow
- Bayesian Optimization of Composite Functions
- Compositional Fairness Constraints for Graph Embeddings
- Improved Convergence for $\ell_1$ and $\ell_\infty$ Regression via Iteratively Reweighted Least Squares
- Transfer of Samples in Policy Search via Multiple Importance Sampling
- Co-manifold learning with missing data
- Interpreting Adversarially Trained Convolutional Neural Networks
- Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting
- Understanding the Impact of Entropy on Policy Optimization
- Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design
- The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions
- A Recurrent Neural Cascade-based Model for Continuous-Time Diffusion
- Optimal Mini-Batch and Step Sizes for SAGA
- Exploration Conscious Reinforcement Learning Revisited
- Counterfactual Visual Explanations
- Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
- Learning Neurosymbolic Generative Models via Program Synthesis
- Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization
- Stochastic Blockmodels meet Graph Neural Networks
- Differential Inclusions for Modeling Nonsmooth ADMM Variants: A Continuous Limit Theory
- Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
- Data Poisoning Attacks on Stochastic Bandits
- Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions
- Matrix-Free Preconditioning in Online Learning
- Geometric Losses for Distributional Learning
- Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning
- Doubly Robust Joint Learning for Recommendation on Data Missing Not at Random
- On Sparse Linear Regression in the Local Differential Privacy Model
- Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization
- Online Convex Optimization in Adversarial Markov Decision Processes
- Classification from Positive, Unlabeled and Biased Negative Data
- Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization
- Linear-Complexity Data-Parallel Earth Mover's Distance Approximations
- Differentially Private Empirical Risk Minimization with Non-convex Loss Functions
- Unifying Orthogonal Monte Carlo Methods
- Complementary-Label Learning for Arbitrary Losses and Models
- Neuron birth-death dynamics accelerates gradient descent and converges asymptotically
- Model Comparison for Semantic Grouping
- Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy
- Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits
- Online Learning with Sleeping Experts and Feedback Graphs
- Learning to Infer Program Sketches
- Width Provably Matters in Optimization for Deep Linear Neural Networks
- RaFM: Rank-Aware Factorization Machines
- Differentially Private Learning of Geometric Concepts
- Metropolis-Hastings Generative Adversarial Networks
- Hierarchically Structured Meta-learning
- Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?
- CAB: Continuous Adaptive Blending for Policy Evaluation and Learning
- Toward Controlling Discrimination in Online Ad Auctions
- Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets
- Adaptive Scale-Invariant Online Algorithms for Learning Linear Models
- Bridging Theory and Algorithm for Domain Adaptation
- Power k-Means Clustering
- MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement
- Learning Optimal Fair Policies
- Replica Conditional Sequential Monte Carlo
- Online Control with Adversarial Disturbances
- Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation
- Distributed Learning over Unreliable Networks
- Neural Separation of Observed and Unobserved Distributions
- Fairness-Aware Learning for Continuous Attributes and Treatments
- A Polynomial Time MCMC Method for Sampling from Continuous Determinantal Point Processes
- Adversarial Online Learning with noise
- Learning What and Where to Transfer
- Escaping Saddle Points with Adaptive Gradient Methods
- Almost Unsupervised Text to Speech and Automatic Speech Recognition
- Fairness risk measures
- Adaptive Antithetic Sampling for Variance Reduction
- Online Variance Reduction with Mixtures
- $\texttt{DoubleSqueeze}$: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression
- AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
- Accelerated Flow for Probability Distributions
- Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case
- Model Function Based Conditional Gradient Method with Armijo-like Line Search
- A fully differentiable beam search decoder
- Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem
- Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$-means Clustering
- Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret
- DBSCAN++: Towards fast and scalable density clustering
- Analogies Explained: Towards Understanding Word Embeddings
- Scaling Up Ordinal Embedding: A Landmark Approach
- Proportionally Fair Clustering
- On the Spectral Bias of Neural Networks
- Dimensionality Reduction for Tukey Regression
- Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems
- Concrete Autoencoders: Differentiable Feature Selection and Reconstruction
- Parameter-Efficient Transfer Learning for NLP
- Learning to select for a predefined ranking
- Stable and Fair Classification
- Recursive Sketches for Modular Deep Learning
- Efficient Full-Matrix Adaptive Regularization
- Adaptive Regret of Convex and Smooth Functions
- Gromov-Wasserstein Learning for Graph Matching and Node Embedding
- Efficient On-Device Models using Neural Projections
- Mallows ranking models: maximum likelihood estimate and regeneration
- Flexibly Fair Representation Learning by Disentanglement
- Zero-Shot Knowledge Distillation in Deep Networks
- Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms
- Online Adaptive Principal Component Analysis and Its extensions
- Spectral Clustering of Signed Graphs via Matrix Power Means
- Deep Residual Output Layers for Neural Language Generation
- Fast and Stable Maximum Likelihood Estimation for Incomplete Multinomial Models
- Fair Regression: Quantitative Definitions and Reduction-Based Algorithms
- A Convergence Theory for Deep Learning via Over-Parameterization
- Efficient Nonconvex Regularized Tensor Completion with Structure-aware Proximal Iterations
- POLITEX: Regret Bounds for Policy Iteration using Expert Prediction
- Improving Neural Language Modeling via Adversarial Training
- Fast Algorithm for Generalized Multinomial Models with Ranking Data
- Fairness without Harm: Decoupled Classifiers with Preference Guarantees
- A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
- Robust Estimation of Tree Structured Gaussian Graphical Models
- Anytime Online-to-Batch, Optimism and Acceleration
- Fair k-Center Clustering for Data Summarization
- Mixture Models for Diverse Machine Translation: Tricks of the Trade
- Graph Resistance and Learning from Pairwise Comparisons
- Differentially Private Fair Learning
- Approximation and non-parametric estimation of ResNet-type convolutional neural networks
- Spectral Approximate Inference
- Cautious Regret Minimization: Online Optimization with Long-Term Budget Constraints
- A Better k-means++ Algorithm via Local Search
- MASS: Masked Sequence to Sequence Pre-training for Language Generation
- Learning Context-dependent Label Permutations for Multi-label Classification
- Obtaining Fairness using Optimal Transport Theory
- Global Convergence of Block Coordinate Descent in Deep Learning
- Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning
- Kernel Normalized Cut: a Theoretical Revisit
- Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops
- Discovering Context Effects from Raw Choice Data
- Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions
- Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
- DAG-GNN: DAG Structure Learning with Graph Neural Networks
- Adaptive Sensor Placement for Continuous Spaces
- Guarantees for Spectral Clustering with Fairness Constraints
- MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization
- On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference
- On the Long-term Impact of Algorithmic Decision Policies: Effort Unfairness and Feature Segregation through Social Learning
- On the Limitations of Representing Functions on Sets
- Random Walks on Hypergraphs with Edge-Dependent Vertex Weights
- Scale-free adaptive planning for deterministic dynamics & discounted rewards
- Supervised Hierarchical Clustering with Exponential Linkage
- CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
- Learning Distance for Sequences by Learning a Ground Metric
- COMIC: Multi-view Clustering Without Parameter Selection
- Submodular Maximization beyond Non-negativity: Guarantees, Fast Algorithms, and Applications
- The Wasserstein Transform
- Online Algorithms for Rent-Or-Buy with Expert Advice
- Sequential Facility Location: Approximate Submodularity and Greedy Algorithm
- Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity
- Neural Collaborative Subspace Clustering
- Categorical Feature Compression via Submodular Optimization
- Unsupervised Deep Learning by Neighbourhood Discovery
- Multi-Frequency Phase Synchronization
- Autoregressive Energy Machines
- Faster Algorithms for Binary Matrix Factorization
- Greedy Orthogonal Pivoting Algorithm for Non-Negative Matrix Factorization
- Noise2Self: Blind Denoising by Self-Supervision
- Guided evolutionary strategies: augmenting random search with surrogate gradients
- Learning Dependency Structures for Weak Supervision Models
- Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces
- Geometry and Symmetry in Short-and-Sparse Deconvolution
- Semi-Cyclic Stochastic Gradient Descent
- Analyzing the dynamics of online learning in over-parameterized two-layer neural networks
- Convergence Properties of Neural Networks on Separable Data
- Towards Understanding Regularization in Batch Normalization
- How Noise during Training Affects the Hessian Spectrum
- Asymptotics of Wide Networks from Feynman Diagrams
- A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off
- Deep Learning on the 2-Dimensional Ising Model to Extract the Crossover Region
- Learning the Arrow of Time

## Oral Presentations

## Oral presentations

## Organizer's introductions

## Panel Discussions

## Presentations

## Problem proposals

## Spotlights

- Spotlight
- A Meta-Analysis of Overfitting in Machine Learning
- Uniform convergence may be unable to explain generalization in deep learning
- Towards Task and Architecture-Independent Generalization Gap Predictors
- Data-Dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
- Towards Large Scale Structure of the Loss Landscape of Neural Networks
- Zero-Shot Learning from scratch: leveraging local compositional representations
- Overparameterization without Overfitting: Jacobian-based Generalization Guarantees for Neural Networks
- How Learning Rate and Delay Affect Minima Selection in Asynchronous Training of Neural Networks: Toward Closing the Generalization Gap
- Bad Global Minima Exist and SGD Can Reach Them
- Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
- Do deep neural networks learn shallow learnable examples first?
- On the Convex Behavior of Deep Neural Networks in Relation to the Layers' Width

## Spotlight Presentations

- "Locality Driven Coded Computation," Michael Rudow, Rashmi Vinayak and Venkat Guruswami
- "CodeNet: Training Large-Scale Neural Networks in Presence of Soft-Errors," Sanghamitra Dutta, Ziqian Bai, Tze Meng Low and Pulkit Grover
- "Reliable Clustering with Redundant Data Assignment," Venkat Gandikota, Arya Mazumdar and Ankit Singh Rawat
- "OverSketched Newton: Fast Convex Optimization for Serverless Systems," Vipul Gupta, Swanand Kadhe, Thomas Courtade, Michael Mahoney and Kannan Ramchandran
- "Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms," Jianyu Wang and Gauri Joshi
- "Secure Coded Multi-Party Computation for Massive Matrices with Adversarial Nodes," Seyed Reza, Mohammad Ali Maddah-Ali and Mohammad Reza Aref

## Spotlight Talks

## Spotlight talks

- Towards a Sustainable Food Supply Chain Powered by Artificial Intelligence
- Deep Learning for Wildlife Conservation and Restoration Efforts
- Detecting anthropogenic cloud perturbations with deep learning
- Evaluating aleatoric and epistemic uncertainties of time series deep learning models for soil moisture predictions
- Targeted Meta-Learning for Critical Incident Detection in Weather Data
- Truck Traffic Monitoring with Satellite Images
- Machine Learning for AC Optimal Power Flow
- Planetary Scale Monitoring of Urban Growth in High Flood Risk Areas

## Spotlights

## Talks

- Opening Remarks
- Welcome and Introduction
- Invited talk by David Silver (DeepMind): AlphaStar: Mastering the Game of StarCraft II
- Patrick McDaniel
- Opening Remarks
- Yann LeCun
- Testing Arithmetic Circuits
- Invited talk by John Langford (Microsoft Research): How do we make Real World Reinforcement Learning revolution?
- Jessica Hamrick
- Una-May O'Reilly
- DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression
- Invited talk by Craig Boutilier (Google Research): Reinforcement Learning in Recommender Systems: Some Challenges
- Oral Paper Presentations 1
- Spotlight Session 1
- Le Song
- Tractable Islands Revisited
- Stefan Schaal
- Allen Qi
- Dream Distillation: A Data-Independent Model Compression Framework
- The State of Sparsity in Deep Neural Networks
- Sum-Product Networks and Deep Learning: A Love Marriage
- Zico Kolter
- David Silver
- Oral Paper Presentations 2
- Tensor Variable Elimination in Pyro
- Aleksander Madry
- Been Kim
- Invertible Residual Networks and a Novel Perspective on Adversarial Examples
- Learning Compact Neural Networks Using Ordinary Differential Equations as Activation Functions
- Byron Boots
- Single-Path NAS: Device-Aware Efficient ConvNet Design
- Chelsea Finn
- Abhinav Gupta
- Jacob Devlin
- Sven Kreiss: "Compositionality, Confidence and Crowd Modeling for Self-Driving Cars"
- Alison Gopnik
- Mayank Bansal: "ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst"
- Chiyuan Zhang: Are all layers created equal? -- Studies on how neural networks represent functions
- Chelsea Finn: "A Practical View on Generalization and Autonomy in the Real World"
- Sergey Levine: "Imitation, Prediction, and Model-Based Reinforcement Learning for Autonomous Driving"
- Wolfram Burgard
- Building a tractable generator network
- Dorsa Sadigh: "Influencing Interactive Mixed-Autonomy Systems"
- Glow: Generative Flow with Invertible 1x1 Convolutions
- Contributed talk
- Aude Oliva: Reverse engineering neuroscience and cognitive science principles
- Householder meets Sylvester: Normalizing flows for variational inference
- Yann LeCun
- Neural Ordinary Differential Equations for Continuous Normalizing Flows
- Alexander Amini: "Learning to Drive with Purpose"
- Contributed talk
- Fisher Yu: "Motion and Prediction for Autonomous Driving"
- Alfredo Canziani: "Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic"
- Andrew Zisserman
- The Bijector API: An Invertible Function Library for TensorFlow
- Jianxiong Xiao: "Self-driving Car: What we can achieve today?"
- Invertible Neural Networks for Understanding and Controlling Learned Representations
- Abhinav Gupta
- German Ros: "Fostering Autonomous Driving Research with CARLA"
- Contributed talk
- Venkatraman Narayanan: "The Promise and Challenge of ML in Self-Driving"
- Alexei Efros

## Tutorials

- A Primer on PAC-Bayesian Learning
- Recent Advances in Population-Based Search for Deep Neural Networks: Quality Diversity, Indirect Encodings, and Open-Ended Algorithms
- Never-Ending Learning
- Safe Machine Learning
- Neural Approaches to Conversational AI
- Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning
- Active Learning: From Theory to Practice
- Algorithm configuration: learning in the space of algorithm designs
- A Tutorial on Attention in Deep Learning
- Active Hypothesis Testing: An Information Theoretic (re)View
- Causal Inference and Stable Learning
- Tutorial on normalizing flows

## Welcomes

## Welcome Remarks

## Welcoming Remarks

## contributed talks

## panels
