# ICML 2017 Events with Videos

## Invited Talks

## Talks

- Multi-objective Bandits: Optimizing the Generalized Gini Index
- Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis
- Robust Adversarial Reinforcement Learning
- Enumerating Distinct Decision Trees
- The loss surface of deep and wide neural networks
- Robust Probabilistic Modeling with Bayesian Data Reweighting
- PixelCNN Models with Auxiliary Variables for Natural Image Modeling
- Tight Bounds for Approximate Carathéodory and Beyond
- Online Learning with Local Permutations and Delayed Feedback
- SPLICE: Fully Tractable Hierarchical Extension of ICA with Pooling
- Minimax Regret Bounds for Reinforcement Learning
- Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation
- Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks
- Post-Inference Prior Swapping
- Understanding Synthetic Gradients and Decoupled Neural Interfaces
- Parallel Multiscale Autoregressive Density Estimation
- Oracle Complexity of Second-Order Methods for Finite-Sum Problems
- Model-Independent Online Learning for Influence Maximization
- Latent Feature Lasso
- Fairness in Reinforcement Learning
- Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things
- Sharp Minima Can Generalize For Deep Nets
- Evaluating Bayesian Models with Posterior Dispersion Indices
- meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
- Global optimization of Lipschitz functions
- Online Learning to Rank in Stochastic Click Models
- Online Partial Least Square Optimization: Dropping Convexity for Better Efficiency and Scalability
- Boosted Fitted Q-Iteration
- Multi-Class Optimal Margin Distribution Machine
- Geometry of Neural Network Loss Surfaces via Random Matrix Theory
- Automatic Discovery of the Statistical Types of Variables in a Dataset
- Learning Important Features Through Propagating Activation Differences
- Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks
- Strong NP-Hardness for Sparse Optimization with Concave Penalty Functions
- The Sample Complexity of Online One-Class Collaborative Filtering
- Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
- Kernelized Support Tensor Machines
- The Shattered Gradients Problem: If resnets are the answer, then what is the question?
- Bayesian Models of Data Streams with Hierarchical Power Priors
- Evaluating the Variance of Likelihood-Ratio Gradient Estimators
- Learning Texture Manifolds with the Periodic Spatial GAN
- Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence
- Efficient Regret Minimization in Non-Convex Games
- Coresets for Vector Summarization with Applications to Network Graphs
- Constrained Policy Optimization
- Dual Supervised Learning
- Recovery Guarantees for One-hidden-layer Neural Networks
- Ordinal Graphical Models: A Tale of Two Approaches
- Equivariance Through Parameter-Sharing
- Generalization and Equilibrium in Generative Adversarial Nets (GANs)
- GSOS: Gauss-Seidel Operator Splitting Algorithm for Multi-Term Nonsmooth Convex Composite Optimization
- Identify the Nash Equilibrium in Static Games with Random Payoffs
- Partitioned Tensor Factorizations for Learning Mixed Membership Models
- Reinforcement Learning with Deep Energy-Based Policies
- Learning Infinite Layer Networks without the Kernel Trick
- Failures of Gradient-Based Deep Learning
- Scalable Bayesian Rule Lists
- Warped Convolutions: Efficient Invariance to Spatial Transformations
- McGan: Mean and Covariance Feature Matching GAN
- Breaking Locality Accelerates Block Gauss-Seidel
- Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU
- On Mixed Memberships and Symmetric Nonnegative Matrix Factorizations
- Prediction and Control with Temporal Segment Models
- Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees
- Analytical Guarantees on Numerical Precision of Deep Neural Networks
- Learning Determinantal Point Processes with Moments and Cycles
- Graph-based Isometry Invariant Representation Learning
- Conditional Image Synthesis with Auxiliary Classifier GANs
- Stochastic DCA for the Large-sum of Non-convex Functions Problem and its Application to Group Variable Selection in Classification
- On Kernelized Multi-armed Bandits
- Nonnegative Matrix Factorization for Time Series Recovery From a Few Temporal Aggregates
- An Alternative Softmax Operator for Reinforcement Learning
- Logarithmic Time One-Against-Some
- Follow the Moving Leader in Deep Learning
- Deep Bayesian Active Learning with Image Data
- Deriving Neural Architectures from Sequence and Graph Kernels
- Learning to Discover Cross-Domain Relations with Generative Adversarial Networks
- Gradient Projection Iterative Sketch for Large-Scale Constrained Least-Squares
- Second-Order Kernel Online Convex Optimization with Adaptive Sketching
- Frame-based Data Factorizations
- Fake News Mitigation via Point Process Based Intervention
- Understanding Black-box Predictions via Influence Functions
- Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank
- Bayesian Boolean Matrix Factorisation
- Wasserstein Generative Adversarial Networks
- Connected Subgraph Detection with Mirror Descent on SDPs
- Dueling Bandits with Weak Regret
- Nearly Optimal Robust Matrix Completion
- Curiosity-driven Exploration by Self-supervised Prediction
- Re-revisiting Learning on Hypergraphs: Confidence Interval and Subgradient Method
- Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
- Learning the Structure of Generative Models without Labeled Data
- Deep Transfer Learning with Joint Adaptation Networks
- Learning Hierarchical Features from Deep Generative Models
- Prox-PDA: The Proximal Primal-Dual Algorithm for Fast Distributed Nonconvex Optimization and Learning Over Networks
- On Context-Dependent Clustering of Bandits
- Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations
- Interactive Learning from Policy-Dependent Human Feedback
- Self-Paced Co-training
- Convexified Convolutional Neural Networks
- Learning to Discover Sparse Graphical Models
- Meta Networks
- Bottleneck Conditional Density Estimation
- Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms
- Provably Optimal Algorithms for Generalized Linear Contextual Bandits
- No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis
- End-to-End Differentiable Adversarial Imitation Learning
- Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data
- On the Expressive Power of Deep Neural Networks
- Local-to-Global Bayesian Network Structure Learning
- SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization
- Learning Deep Latent Gaussian Models with Markov Chain Monte Carlo
- Doubly Greedy Primal-Dual Coordinate Descent for Sparse Empirical Risk Minimization
- Safety-Aware Algorithms for Adversarial Contextual Bandit
- Coherence Pursuit: Fast, Simple, and Robust Subspace Recovery
- Learning in POMDPs with Monte Carlo Tree Search
- Iterative Machine Teaching
- Depth-Width Tradeoffs in Approximating Natural Functions With Neural Networks
- Composing Tree Graphical Models with Persistent Homology Features for Clustering Mixed-Type Data
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
- Zero-Inflated Exponential Family Embeddings
- A Richer Theory of Convex Constrained Optimization with Reduced Projections and Improved Rates
- Adaptive Multiple-Arm Identification
- Tensor Decomposition with Smoothness
- DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
- Automated Curriculum Learning for Neural Networks
- On Relaxing Determinism in Arithmetic Circuits
- AdaNet: Adaptive Structural Learning of Artificial Neural Networks
- Convex Phase Retrieval without Lifting via PhaseMax
- Efficient Online Bandit Multiclass Learning with $O(\sqrt{T})$ Regret
- Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use
- Unifying task specification in reinforcement learning
- Asymmetric Tri-training for Unsupervised Domain Adaptation
- Efficient Nonmyopic Active Search
- An Infinite Hidden Markov Model With Similarity-Biased Transitions
- Learning to Learn without Gradient Descent by Gradient Descent
- Attentive Recurrent Comparators
- A Semismooth Newton Method for Fast, Generic Convex Programming
- Active Learning for Accurate Estimation of Linear Models
- Tensor Decomposition via Simultaneous Power Iteration
- A Distributional Perspective on Reinforcement Learning
- Source-Target Similarity Modelings for Multi-Source Transfer Gaussian Process Regression
- Leveraging Union of Subspace Structure to Improve Constrained Clustering
- Batched High-dimensional Bayesian Optimization via Structural Kernel Learning
- Learned Optimizers that Scale and Generalize
- State-Frequency Memory Recurrent Neural Networks
- Approximate Newton Methods and Their Local Convergence
- Adaptive Feature Selection: Computationally Efficient Online Sparse Linear Regression under RIP
- A Unified Variance Reduction-Based Framework for Nonconvex Low-Rank Matrix Recovery
- Hierarchy Through Composition with Multitask LMDPs
- Multi-task Learning with Labeled and Unlabeled Tasks
- Active Heteroscedastic Regression
- From Patches to Images: A Nonparametric Generative Model
- Learning Gradient Descent: Better Generalization and Longer Horizons
- Delta Networks for Optimized Recurrent Network Computation
- Stochastic Adaptive Quasi-Newton Methods for Minimizing Expected Values
- Emulating the Expert: Inverse Optimization through Online Learning
- An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation
- A Laplacian Framework for Option Discovery in Reinforcement Learning
- Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics
- Active Learning for Cost-Sensitive Classification
- Fast Bayesian Intensity Estimation for the Permanental Process
- Learning Algorithms for Active Learning
- Recurrent Highway Networks
- Practical Gauss-Newton Optimisation for Deep Learning
- Variants of RMSProp and Adagrad with Logarithmic Regret Bounds
- Algorithms for $\ell_p$ Low-Rank Approximation
- Modular Multitask Reinforcement Learning with Policy Sketches
- Risk Bounds for Transferring Representations With and Without Fine-Tuning
- Diameter-Based Active Learning
- A Birth-Death Process for Feature Allocation
- Tensor Balancing on Statistical Manifold
- Test of Time Award
- Leveraging Node Attributes for Incomplete Relational Data
- How Close Are the Eigenvectors of the Sample and Actual Covariance Matrices?
- Data-Efficient Policy Evaluation Through Behavior Policy Search
- Distributed and Provably Good Seedings for k-Means in Constant Rounds
- Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging
- Exact MAP Inference by Avoiding Fractional Vertices
- Relative Fisher Information and Natural Gradient for Learning Large Modular Models
- Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections
- Lazifying Conditional Gradient Algorithms
- Bayesian inference on random simple graphs with power law degree distributions
- Faster Principal Component Regression and Stable Matrix Chebyshev Approximation
- Stochastic Variance Reduction Methods for Policy Evaluation
- Consistent k-Clustering
- Estimating the unseen from multiple populations
- Exact Inference for Integer Latent-Variable Models
- Learning Deep Architectures via Generalized Whitened Neural Networks
- On orthogonality and learning RNNs with long term dependencies
- Conditional Accelerated Lazy Stochastic Gradient Descent
- Analogical Inference for Multi-relational Embeddings
- Spectral Learning from a Single Trajectory under Finite-State Policies
- Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
- Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering
- Meritocratic Fairness for Cross-Population Selection
- Improving Viterbi is Hard: Better Runtimes Imply Faster Clique Algorithms
- Continual Learning Through Synaptic Intelligence
- Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs
- SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient
- Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs
- Capacity Releasing Diffusion for Speed and Locality
- Consistent On-Line Off-Policy Evaluation
- Hyperplane Clustering Via Dual Principal Component Pursuit
- Neural networks and rational functions
- Variational Inference for Sparse and Undirected Models
- Adaptive Neural Networks for Efficient Inference
- The Statistical Recurrent Unit
- Approximate Steepest Coordinate Descent
- Deep Generative Models for Relational Data with Side Information
- Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition
- Contextual Decision Processes with low Bellman rank are PAC-Learnable
- Multilevel Clustering via Wasserstein Means
- Tensor Belief Propagation
- Combined Group and Exclusive Sparsity for Deep Neural Networks
- Input Switched Affine Networks: An RNN Architecture Designed for Interpretability
- StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent
- On the Iteration Complexity of Support Recovery via Hard Thresholding Pursuit
- A Simple Multi-Class Boosting Framework with Theoretical Guarantees and Empirical Proficiency
- Co-clustering through Optimal Transport
- Uniform Deviation Bounds for k-Means Clustering
- Faster Greedy MAP Inference for Determinantal Point Processes
- Input Convex Neural Networks
- Online and Linear-Time Attention by Enforcing Monotonic Alignments
- Stochastic modified equations and adaptive stochastic gradient algorithms
- Statistical Inference for Incomplete Ranking Data: The Case of Rank-Dependent Coarsening
- Dual Iterative Hard Thresholding: From Non-convex Sparse Minimization to Non-smooth Concave Maximization
- Gradient Boosted Decision Trees for High Dimensional Sparse Output
- Multiple Clustering Views from Multiple Uncertain Experts
- Uniform Convergence Rates for Kernel Density Estimation
- Zonotope hit-and-run for efficient sampling from projection DPPs
- OptNet: Differentiable Optimization as a Layer in Neural Networks
- Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control
- Dissipativity Theory for Nesterov's Accelerated Method
- Just Sort It! A Simple and Effective Approach to Active Preference Learning
- On The Projection Operator to A Three-view Cardinality Constrained Set
- Globally Induced Forest: A Prepruning Compression Scheme
- Clustering by Sum of Norms: Stochastic Incremental Algorithm, Convergence and Cluster Recovery
- Density Level Set Estimation on Manifolds with DBSCAN
- Parseval Networks: Improving Robustness to Adversarial Examples
- Deep Voice: Real-time Neural Text-to-Speech
- An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis
- Maximum Selection and Ranking under Noisy Comparisons
- Sparse + Group-Sparse Dirty Models: Statistical Guarantees without Unreasonable Conditions and a Case for Non-Convexity
- Forest-type Regression with General Losses and Robust Forest
- Clustering High Dimensional Dynamic Data Streams
- Algorithmic Stability and Hypothesis Complexity
- On the Sampling Problem for Kernel Quadrature
- Regularising Non-linear Models Using Feature Side-information
- DeepBach: a Steerable Model for Bach Chorales Generation
- Forward and Reverse Gradient-Based Hyperparameter Optimization
- Active Learning for Top-$K$ Rank Aggregation from Noisy Comparisons
- Compressed Sensing using Generative Models
- Confident Multiple Choice Learning
- Consistency Analysis for Binary Classification Revisited
- Measuring Sample Quality with Kernels
- Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
- Adaptive Sampling Probabilities for Non-Smooth Optimization
- Learning to Align the Source Code to the Compiled Object Code
- Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction
- Regret Minimization in Behaviorally-Constrained Zero-Sum Games
- Fast k-Nearest Neighbour Search via Prioritized DCI
- Distributed Mean Estimation with Limited Communication
- Variational Boosting: Iteratively Refining Posterior Approximations
- A Closer Look at Memorization in Deep Networks
- Learning to Generate Long-term Future via Hierarchical Prediction
- Sub-sampled Cubic Regularization for Non-convex Optimization
- RobustFill: Neural Program Learning under Noisy I/O
- Efficient Distributed Learning with Sparsity
- Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning
- Deep Spectral Clustering Learning
- Nonparanormal Information Estimation
- Lost Relatives of the Gumbel Trick
- Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study
- Sequence to Better Sequence: Continuous Revision of Combinatorial Structures
- Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter
- Programming with a Differentiable Forth Interpreter
- Innovation Pursuit: A New Approach to the Subspace Clustering Problem
- Strongly-Typed Agents are Guaranteed to Interact Safely
- Joint Dimensionality Reduction and Metric Learning: A Geometric Take
- A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions
- Learning to Aggregate Ordinal Labels by Maximizing Separating Width
- Visualizing and Understanding Multilayer Perceptron Models: A Case Study in Speech Processing
- Tensor-Train Recurrent Neural Networks for Video Classification
- “Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions
- Differentiable Programs with Neural Libraries
- Selective Inference for Sparse High-Order Interaction Models
- Coordinated Multi-Agent Imitation Learning
- ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices
- Gradient Coding: Avoiding Stragglers in Distributed Learning
- Uncorrelation and Evenness: a New Diversity-Promoting Regularizer
- Axiomatic Attribution for Deep Networks
- Sequence Modeling via Segmentations
- Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization
- Developing Bug-Free Machine Learning Systems With Formal Mathematics
- Dictionary Learning Based on Sparse Distribution Tomography
- Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
- Learning Discrete Representations via Information Maximizing Self-Augmented Training
- Learning Latent Space Models with Angular Constraints
- On Calibration of Modern Neural Networks
- Latent LSTM Allocation: Joint clustering and non-linear dynamic modeling of sequence data
- How to Escape Saddle Points Efficiently
- Preferential Bayesian Optimization
- Being Robust (in High Dimensions) Can Be Practical
- Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution
- When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, $\ell_2$-consistency and Neuroscience Applications
- Differentially Private Ordinary Least Squares
- Fractional Langevin Monte Carlo: Exploring Levy Driven Stochastic Differential Equations for MCMC
- Device Placement Optimization with Reinforcement Learning
- Dynamic Word Embeddings
- Asynchronous Stochastic Gradient Descent with Delay Compensation
- Max-value Entropy Search for Efficient Bayesian Optimization
- Multilabel Classification with Group Testing and Codes
- Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
- Learning from Clinical Judgments: Semi-Markov-Modulated Marked Hawkes Processes for Risk Prognosis
- Priv’IT: Private and Sample Efficient Identity Testing
- Stochastic Bouncy Particle Sampler
- Deep Tensor Convolution on Multicores
- Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling
- Adaptive Consensus ADMM for Distributed Optimization
- Bayesian Optimization with Tree-structured Dependencies
- High-Dimensional Structured Quantile Regression
- Prediction under Uncertainty in Sparse Spectrum Gaussian Processes with Applications to Filtering and Control
- Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier
- Differentially Private Submodular Maximization: Data Summarization in Disguise
- Canopy --- Fast Sampling with Cover Trees
- MEC: Memory-efficient Convolution for Deep Neural Network
- Coupling Distributed and Symbolic Execution for Natural Language Queries
- Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
- Multi-fidelity Bayesian Optimisation with Continuous Approximations
- High-dimensional Non-Gaussian Single Index Models via Thresholded Score Function Estimation
- Learning Stable Stochastic Nonlinear Dynamical Systems
- iSurvive: An Interpretable, Event-time Prediction Model for mHealth
- Differentially Private Learning of Graphical Models using CGMs
- A Simulated Annealing Based Inexact Oracle for Wasserstein Loss Minimization
- Beyond Filters: Compact Feature Map for Portable Deep Model
- Image-to-Markup Generation with Coarse-to-Fine Attention
- Projection-free Distributed Online Learning in Networks
- Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space
- Robust Structured Estimation with Single-Index Models
- Local Bayesian Optimization of Motor Skills
- Learning Sleep Stages from Radio Signals: A Conditional Adversarial Architecture
- Minimizing Trust Leaks for Robust Sybil Detection
- Improving Gibbs Sampler Scan Quality with DoGS
- Efficient softmax approximation for GPUs
- Multichannel End-to-end Speech Recognition
- Uncertainty Assessment and False Discovery Rate Control in High-Dimensional Granger Causal Inference
- Toward Efficient and Accurate Covariance Matrix Estimation on Compressed Data
- Count-Based Exploration with Neural Density Models
- Bidirectional learning for time-series models with hidden units
- The Price of Differential Privacy For Online Learning
- Magnetic Hamiltonian Monte Carlo
- Dropout Inference in Bayesian Neural Networks with Alpha-divergences
- Latent Intention Dialogue Models
- Robust Guarantees of Stochastic Greedy Algorithms
- Uncovering Causality from Multivariate Hawkes Integrated Cumulants
- Robust Gaussian Graphical Model Estimation with Arbitrary Corruption
- Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
- Learning Hawkes Processes from Short Doubly-Censored Event Sequences
- Pain-Free Random Differential Privacy with Sensitivity Sampling
- Probabilistic Path Hamiltonian Monte Carlo
- Multiplicative Normalizing Flows for Variational Bayesian Neural Networks
- Discovering Discrete Latent Topics with Neural Variational Inference
- Guarantees for Greedy Maximization of Non-submodular Functions with Applications
- Cost-Optimal Learning of Causal Graphs
- Algebraic Variety Models for High-Rank Matrix Completion
- Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
- Coherent probabilistic forecasts for hierarchical time series
- Differentially Private Clustering in High-Dimensional Euclidean Spaces
- Stochastic Gradient Monomial Gamma Sampler
- Variational Dropout Sparsifies Deep Neural Networks
- Toward Controlled Generation of Text
- Robust Submodular Maximization: A Non-Uniform Partitioning Approach
- Identification and Model Testing in Linear Structural Equation Models using Auxiliary Variables
- High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm
- The Predictron: End-To-End Learning and Planning
- Soft-DTW: a Differentiable Loss Function for Time-Series
- Differentially Private Chi-squared Test by Unit Circle Mechanism
- Stochastic Gradient MCMC Methods for Hidden Markov Models
- Unimodal Probability Distributions for Deep Ordinal Classification
- Learning Continuous Semantic Representations of Symbolic Expressions
- Probabilistic Submodular Maximization in Sub-Linear Time
- Estimating individual treatment effect: generalization bounds and algorithms
- Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning
- Variational Policy for Guiding Point Processes
- Collect at Once, Use Effectively: Making Non-interactive Locally Private Learning Possible
- Deep Latent Dirichlet Allocation with Topic-Layer-Adaptive Stochastic Gradient Riemannian MCMC
- Adversarial Feature Matching for Text Generation
- On Approximation Guarantees for Greedy Low Rank Optimization
- Recursive Partitioning for Personalization using Observational Data
- Optimal Densification for Fast and Accurate Minwise Hashing
- FeUdal Networks for Hierarchical Reinforcement Learning
- Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
- An Adaptive Test of Independence with Analytic Kernel Embeddings
- Distributed Batch Gaussian Process Optimization
- Dance Dance Convolution
- Language Modeling with Gated Convolutional Networks
- Deletion-Robust Submodular Maximization: Data Summarization with "the Right to be Forgotten"
- Identifying Best Interventions through Online Importance Sampling
- Stochastic Generative Hashing
- Deciding How to Decide: Dynamic Routing in Artificial Neural Networks
- Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
- Sliced Wasserstein Kernel for Persistence Diagrams
- Scalable Multi-Class Gaussian Process Classification using Expectation Propagation
- World of Bits: An Open-Domain Platform for Web-Based Agents
- Convolutional Sequence to Sequence Learning
- Analysis and Optimization of Graph Decompositions by Lifted Multicuts
- Deep IV: A Flexible Approach for Counterfactual Prediction
- ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning
- Neural Episodic Control
- End-to-End Learning for Structured Prediction Energy Networks
- Adapting Kernel Representations Online Using Submodular Maximization
- Random Feature Expansions for Deep Gaussian Processes
- Real-Time Adaptive Image Compression
- Improved Variational Autoencoders for Text Modeling using Dilated Convolutions
- Near-Optimal Design of Experiments via Regret Minimization
- Counterfactual Data-Fusion for Online Reinforcement Learners
- Large-Scale Evolution of Image Classifiers
- Neural Optimizer Search using Reinforcement Learning
- A Unified View of Multi-Label Performance Measures
- Spherical Structured Feature Maps for Kernel Approximation
- Asynchronous Distributed Variational Gaussian Processes for Regression
- Neural Message Passing for Quantum Chemistry
- Grammar Variational Autoencoder
- Robust Budget Allocation via Continuous Submodular Functions
- Scalable Generative Models for Multi-label Learning with Missing Labels
- Nyström Method with Kernel K-means++ Samples as Landmarks
- High Dimensional Bayesian Optimization with Elastic Gaussian Process
- Accelerating Eulerian Fluid Simulation With Convolutional Networks
- Rule-Enhanced Penalized Regression by Column Generation using Rectangular Maximum Agreement

## Tutorials

- Distributed Deep Learning with MxNet Gluon
- Interpretable Machine Learning
- Machine Learning for Autonomous Vehicles
- Recent Advances in Stochastic Convex and Non-Convex Optimization
- Deep Reinforcement Learning, Decision Making, and Control
- Deep Learning for Health Care Applications: Challenges and Solutions
- Real World Interactive Learning
- Sequence-To-Sequence Modeling with Neural Networks
- Robustness Meets Algorithms (and Vice-Versa)
