ICML 2017 Papers

Asynchronous Distributed Variational Gaussian Processes for Regression

Dissipativity Theory for Nesterov's Accelerated Method

Estimating the unseen from multiple populations

Variants of RMSProp and Adagrad with Logarithmic Regret Bounds

Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs

Learning Sleep Stages from Radio Signals: A Conditional Adversarial Architecture

Learning Algorithms for Active Learning

Maximum Selection and Ranking under Noisy Comparisons

Algebraic Variety Models for High-Rank Matrix Completion

The Sample Complexity of Online One-Class Collaborative Filtering

On Approximation Guarantees for Greedy Low Rank Optimization

Counterfactual Data-Fusion for Online Reinforcement Learners

Forest-type Regression with General Losses and Robust Forest

Collect at Once, Use Effectively: Making Non-interactive Locally Private Learning Possible

Emulating the Expert: Inverse Optimization through Online Learning

Variational Inference for Sparse and Undirected Models

Latent Feature Lasso

Risk Bounds for Transferring Representations With and Without Fine-Tuning

Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning

Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

Connected Subgraph Detection with Mirror Descent on SDPs

Doubly Greedy Primal-Dual Coordinate Descent for Sparse Empirical Risk Minimization

Gradient Coding: Avoiding Stragglers in Distributed Learning

Differentially Private Chi-squared Test by Unit Circle Mechanism

Axiomatic Attribution for Deep Networks

Grammar Variational Autoencoder

OptNet: Differentiable Optimization as a Layer in Neural Networks

Stochastic Adaptive Quasi-Newton Methods for Minimizing Expected Values

Constrained Policy Optimization

Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms

Iterative Machine Teaching

Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning

Fake News Mitigation via Point Process Based Intervention

Learning Hierarchical Features from Deep Generative Models

Neural Optimizer Search using Reinforcement Learning

Generalization and Equilibrium in Generative Adversarial Nets (GANs)

On the Expressive Power of Deep Neural Networks

Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability

World of Bits: An Open-Domain Platform for Web-Based Agents

Input Convex Neural Networks

Reinforcement Learning with Deep Energy-Based Policies

“Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions

Stochastic Gradient MCMC Methods for Hidden Markov Models

Learning to Align the Source Code to the Compiled Object Code

Robust Structured Estimation with Single-Index Models

Multi-Class Optimal Margin Distribution Machine

A Divergence Bound for Hybrids of MCMC and Variational Inference and an Application to Langevin Dynamics and SGVI

Bottleneck Conditional Density Estimation

Recursive Partitioning for Personalization using Observational Data

Visualizing and Understanding Multilayer Perceptron Models: A Case Study in Speech Processing

Parseval Networks: Improving Robustness to Adversarial Examples

Capacity Releasing Diffusion for Speed and Locality

Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning

Spectral Learning from a Single Trajectory under Finite-State Policies

Follow the Moving Leader in Deep Learning

On Relaxing Determinism in Arithmetic Circuits

On Calibration of Modern Neural Networks

Toward Controlled Generation of Text

Programming with a Differentiable Forth Interpreter

Active Learning for Top-$K$ Rank Aggregation from Noisy Comparisons

Globally Induced Forest: A Prepruning Compression Scheme

Diameter-Based Active Learning

A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions

Second-Order Kernel Online Convex Optimization with Adaptive Sketching

Selective Inference for Sparse High-Order Interaction Models

Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees

Adaptive Neural Networks for Efficient Inference

Convolutional Sequence to Sequence Learning

Deriving Neural Architectures from Sequence and Graph Kernels

On The Projection Operator to A Three-view Cardinality Constrained Set

Deep Bayesian Active Learning with Image Data

Variational Policy for Guiding Point Processes

Identification and Model Testing in Linear Structural Equation Models using Auxiliary Variables

Wasserstein Generative Adversarial Networks

Active Heteroscedastic Regression

Differentiable Programs with Neural Libraries

Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation

An Alternative Softmax Operator for Reinforcement Learning

Practical Gauss-Newton Optimisation for Deep Learning

Variational Dropout Sparsifies Deep Neural Networks

Multilevel Clustering via Wasserstein Means

Discovering Discrete Latent Topics with Neural Variational Inference

Consistency Analysis for Binary Classification Revisited

Learned Optimizers that Scale and Generalize

On Kernelized Multi-armed Bandits

Soft-DTW: a Differentiable Loss Function for Time-Series

Minimax Regret Bounds for Reinforcement Learning

No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis

Tensor-Train Recurrent Neural Networks for Video Classification

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Bayesian Models of Data Streams with Hierarchical Power Priors

Nearly Optimal Robust Matrix Completion

Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space

StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent

Stochastic Gradient Monomial Gamma Sampler

Adversarial Feature Matching for Text Generation

Neural networks and rational functions

Improving Gibbs Sampler Scan Quality with DoGS

Exact Inference for Integer Latent-Variable Models

Adaptive Feature Selection: Computationally Efficient Online Sparse Linear Regression under RIP

Dual Supervised Learning

Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier

Differentially Private Clustering in High-Dimensional Euclidean Spaces

Regularising Non-linear Models Using Feature Side-information

Prox-PDA: The Proximal Primal-Dual Algorithm for Fast Distributed Nonconvex Optimization and Learning Over Networks

Input Switched Affine Networks: An RNN Architecture Designed for Interpretability

Robust Adversarial Reinforcement Learning

A Unified View of Multi-Label Performance Measures

Latent Intention Dialogue Models

From Patches to Images: A Nonparametric Generative Model

High-Dimensional Structured Quantile Regression

Cost-Optimal Learning of Causal Graphs

Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

Learning in POMDPs with Monte Carlo Tree Search

Local Bayesian Optimization of Motor Skills

Analytical Guarantees on Numerical Precision of Deep Neural Networks

Hyperplane Clustering Via Dual Principal Component Pursuit

Scalable Bayesian Rule Lists

On orthogonality and learning RNNs with long term dependencies

DeepBach: a Steerable Model for Bach Chorales Generation

Multichannel End-to-end Speech Recognition

iSurvive: An Interpretable, Event-time Prediction Model for mHealth

Batched High-dimensional Bayesian Optimization via Structural Kernel Learning

Exact MAP Inference by Avoiding Fractional Vertices

High-dimensional Non-Gaussian Single Index Models via Thresholded Score Function Estimation

Neural Episodic Control

Hierarchy Through Composition with Multitask LMDPs

Improving Viterbi is Hard: Better Runtimes Imply Faster Clique Algorithms

The loss surface of deep and wide neural networks

Robust Gaussian Graphical Model Estimation with Arbitrary Corruption

Frame-based Data Factorizations

Learning Determinantal Point Processes with Moments and Cycles

Pain-Free Random Differential Privacy with Sensitivity Sampling

Strong NP-Hardness for Sparse Optimization with Concave Penalty Functions

Distributed Mean Estimation with Limited Communication

Approximate Newton Methods and Their Local Convergence

Video Pixel Networks

Bayesian Boolean Matrix Factorisation

Understanding Synthetic Gradients and Decoupled Neural Interfaces

Global optimization of Lipschitz functions

Learning to Discover Sparse Graphical Models

Deep Generative Models for Relational Data with Side Information

McGan: Mean and Covariance Feature Matching GAN

Scalable Generative Models for Multi-label Learning with Missing Labels

The Predictron: End-To-End Learning and Planning

On the Sampling Problem for Kernel Quadrature

Clustering by Sum of Norms: Stochastic Incremental Algorithm, Convergence and Cluster Recovery

Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things

Deep Latent Dirichlet Allocation with Topic-Layer-Adaptive Stochastic Gradient Riemannian MCMC

Failures of Gradient-Based Deep Learning

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

Meta Networks

Forward and Reverse Gradient-Based Hyperparameter Optimization

A Birth-Death Process for Feature Allocation

Deletion-Robust Submodular Maximization: Data Summarization with "the Right to be Forgotten"

SPLICE: Fully Tractable Hierarchical Extension of ICA with Pooling

Confident Multiple Choice Learning

DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

Automated Curriculum Learning for Neural Networks

Multi-task Learning with Labeled and Unlabeled Tasks

Equivariance Through Parameter-Sharing

Fairness in Reinforcement Learning

Local-to-Global Bayesian Network Structure Learning

The Statistical Recurrent Unit

Learning to Learn without Gradient Descent by Gradient Descent

Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks

Multi-objective Bandits: Optimizing the Generalized Gini Index

Unimodal Probability Distributions for Deep Ordinal Classification

AdaNet: Adaptive Structural Learning of Artificial Neural Networks

Understanding Black-box Predictions via Influence Functions

Zonotope hit-and-run for efficient sampling from projection DPPs

Source-Target Similarity Modelings for Multi-Source Transfer Gaussian Process Regression

Robust Submodular Maximization: A Non-Uniform Partitioning Approach

ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices

Boosted Fitted Q-Iteration

A Simple Multi-Class Boosting Framework with Theoretical Guarantees and Empirical Proficiency

Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks

Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections

Adapting Kernel Representations Online Using Submodular Maximization

Uncovering Causality from Multivariate Hawkes Integrated Cumulants

Minimizing Trust Leaks for Robust Sybil Detection

On the Iteration Complexity of Support Recovery via Hard Thresholding Pursuit

Geometry of Neural Network Loss Surfaces via Random Matrix Theory

Decoupled Neural Interfaces using Synthetic Gradients

Warped Convolutions: Efficient Invariance to Spatial Transformations

Learning Texture Manifolds with the Periodic Spatial GAN

Dictionary Learning Based on Sparse Distribution Tomography

Dance Dance Convolution

Recurrent Highway Networks

Tensor Belief Propagation

Provably Optimal Algorithms for Generalized Linear Contextual Bandits

RobustFill: Neural Program Learning under Noisy I/O

ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning

An Infinite Hidden Markov Model With Similarity-Biased Transitions

Learning Continuous Semantic Representations of Symbolic Expressions

Prediction and Control with Temporal Segment Models

Deciding How to Decide: Dynamic Routing in Artificial Neural Networks

Sparse + Group-Sparse Dirty Models: Statistical Guarantees without Unreasonable Conditions and a Case for Non-Convexity

Ordinal Graphical Models: A Tale of Two Approaches

Stochastic Variance Reduction Methods for Policy Evaluation

How to Escape Saddle Points Efficiently

Online Learning to Rank in Stochastic Click Models

Learning to Generate Long-term Future via Hierarchical Prediction

Faster Greedy MAP Inference for Determinantal Point Processes

Parallel Multiscale Autoregressive Density Estimation

Differentially Private Submodular Maximization: Data Summarization in Disguise

Coherent probabilistic forecasts for hierarchical time series

Model-Independent Online Learning for Influence Maximization

Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations

Tensor Balancing on Statistical Manifold

Large-Scale Evolution of Image Classifiers

Max-value Entropy Search for Efficient Bayesian Optimization

Optimal Densification for Fast and Accurate Minwise Hashing

Safety-Aware Algorithms for Adversarial Contextual Bandit

Zero-Inflated Exponential Family Embeddings

Sequence to Better Sequence: Continuous Revision of Combinatorial Structures

Clustering High Dimensional Dynamic Data Streams

Fast Bayesian Intensity Estimation for the Permanental Process

Coordinated Multi-Agent Imitation Learning

Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

Analogical Inference for Multi-relational Embeddings

Asymmetric Tri-training for Unsupervised Domain Adaptation

Identifying Best Interventions through Online Importance Sampling

Logarithmic Time One-Against-Some

Leveraging Union of Subspace Structure to Improve Constrained Clustering

Learning Important Features Through Propagating Activation Differences

Sharp Minima Can Generalize For Deep Nets

Contextual Decision Processes with low Bellman rank are PAC-Learnable

Near-Optimal Design of Experiments via Regret Minimization

PixelCNN Models with Auxiliary Variables for Natural Image Modeling

Strongly-Typed Agents are Guaranteed to Interact Safely

Evaluating the Variance of Likelihood-Ratio Gradient Estimators

SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient

Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data

Graph-based Isometry Invariant Representation Learning

Rule-Enhanced Penalized Regression by Column Generation using Rectangular Maximum Agreement

On Mixed Memberships and Symmetric Nonnegative Matrix Factorizations

The Shattered Gradients Problem: If resnets are the answer, then what is the question?

Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks

Differentially Private Learning of Graphical Models using CGMs

Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter

Nyström Method with Kernel K-means++ Samples as Landmarks

Multi-fidelity Bayesian Optimisation with Continuous Approximations

Depth-Width Tradeoffs in Approximating Natural Functions With Neural Networks

Just Sort It! A Simple and Effective Approach to Active Preference Learning

Dueling Bandits with Weak Regret

Consistent k-Clustering

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Efficient Distributed Learning with Sparsity

Co-clustering through Optimal Transport

Statistical Inference for Incomplete Ranking Data: The Case of Rank-Dependent Coarsening

End-to-End Differentiable Adversarial Imitation Learning

A Simulated Annealing Based Inexact Oracle for Wasserstein Loss Minimization

A Distributional Perspective on Reinforcement Learning

Gradient Projection Iterative Sketch for Large-Scale Constrained Least-Squares

A Laplacian Framework for Option Discovery in Reinforcement Learning

The Price of Differential Privacy For Online Learning

Learning Discrete Representations via Information Maximizing Self-Augmented Training

Innovation Pursuit: A New Approach to the Subspace Clustering Problem

Learning from Clinical Judgments: Semi-Markov-Modulated Marked Hawkes Processes for Risk Prognosis

Multiplicative Normalizing Flows for Variational Bayesian Neural Networks

Preferential Bayesian Optimization

Random Feature Expansions for Deep Gaussian Processes

Joint Dimensionality Reduction and Metric Learning: A Geometric Take

MEC: Memory-efficient Convolution for Deep Neural Network

Efficient Regret Minimization in Non-Convex Games

Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs

Sub-sampled Cubic Regularization for Non-convex Optimization

Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning

Leveraging Node Attributes for Incomplete Relational Data

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction

Tensor Decomposition via Simultaneous Power Iteration

Adaptive Sampling Probabilities for Non-Smooth Optimization

Density Level Set Estimation on Manifolds with DBSCAN

Bayesian inference on random simple graphs with power law degree distributions

Coupling Distributed and Symbolic Execution for Natural Language Queries

Variational Boosting: Iteratively Refining Posterior Approximations

Asynchronous Stochastic Gradient Descent with Delay Compensation

Tensor Decomposition with Smoothness

High Dimensional Bayesian Optimization with Elastic Gaussian Process

Efficient Online Bandit Multiclass Learning with $O(\sqrt{T})$ Regret

High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm

Accelerating Eulerian Fluid Simulation With Convolutional Networks

Dropout Inference in Bayesian Neural Networks with Alpha-divergences

Uniform Convergence Rates for Kernel Density Estimation

Real-Time Adaptive Image Compression

Learning Hawkes Processes from Short Doubly-Censored Event Sequences

Partitioned Tensor Factorizations for Learning Mixed Membership Models

Spherical Structured Feature Maps for Kernel Approximation

Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis

Modular Multitask Reinforcement Learning with Policy Sketches

Learning Stable Stochastic Nonlinear Dynamical Systems

Adaptive Multiple-Arm Identification

Measuring Sample Quality with Kernels

Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study

Enumerating Distinct Decision Trees

Automatic Discovery of the Statistical Types of Variables in a Dataset

Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs

FeUdal Networks for Hierarchical Reinforcement Learning

Bidirectional learning for time-series models with hidden units

Convexified Convolutional Neural Networks

Online Learning with Local Permutations and Delayed Feedback

Neural Message Passing for Quantum Chemistry

Delta Networks for Optimized Recurrent Network Computation

Sliced Wasserstein Kernel for Persistence Diagrams

Stochastic modified equations and adaptive stochastic gradient algorithms

Coherence Pursuit: Fast, Simple, and Robust Subspace Recovery

Nonnegative Matrix Factorization for Time Series Recovery From a Few Temporal Aggregates

Guarantees for Greedy Maximization of Non-submodular Functions with Applications

Uniform Deviation Bounds for k-Means Clustering

Re-revisiting Learning on Hypergraphs: Confidence Interval and Subgradient Method

Unsupervised Learning by Predicting Noise

Self-Paced Co-training

State-Frequency Memory Recurrent Neural Networks

Canopy --- Fast Sampling with Cover Trees

Evaluating Bayesian Models with Posterior Dispersion Indices

Kernelized Support Tensor Machines

Magnetic Hamiltonian Monte Carlo

Lazifying Conditional Gradient Algorithms

A Semismooth Newton Method for Fast, Generic Convex Programming

Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Image-to-Markup Generation with Coarse-to-Fine Attention

Conditional Accelerated Lazy Stochastic Gradient Descent

Sequence Modeling via Segmentations

ChoiceRank: Identifying Preferences from Node Traffic in Networks

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use

Regret Minimization in Behaviorally-Constrained Zero-Sum Games

Composing Tree Graphical Models with Persistent Homology Features for Clustering Mixed-Type Data

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation

Breaking Locality Accelerates Block Gauss-Seidel

Deep Spectral Clustering Learning

How Close Are the Eigenvectors of the Sample and Actual Covariance Matrices?

Learning to Discover Cross-Domain Relations with Generative Adversarial Networks

Dynamic Word Embeddings

Bayesian Optimization with Tree-structured Dependencies

Learning to Aggregate Ordinal Labels by Maximizing Separating Width

Learning Deep Architectures via Generalized Whitened Neural Networks

Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU

When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, $\ell_2$-consistency and Neuroscience Applications

Uncorrelation and Evenness: a New Diversity-Promoting Regularizer

Learning Latent Space Models with Angular Constraints

Oracle Complexity of Second-Order Methods for Finite-Sum Problems

Curiosity-driven Exploration by Self-supervised Prediction

Consistent On-Line Off-Policy Evaluation

Analysis and Optimization of Graph Decompositions by Lifted Multicuts

Coresets for Vector Summarization with Applications to Network Graphs

Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition

Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging

Robust Guarantees of Stochastic Greedy Algorithms

Toward Efficient and Accurate Covariance Matrix Estimation on Compressed Data

Multiple Clustering Views from Multiple Uncertain Experts

Combined Group and Exclusive Sparsity for Deep Neural Networks

GSOS: Gauss-Seidel Operator Splitting Algorithm for Multi-Term Nonsmooth Convex Composite Optimization

Deep Transfer Learning with Joint Adaptation Networks

Distributed and Provably Good Seedings for k-Means in Constant Rounds

Uncertainty Assessment and False Discovery Rate Control in High-Dimensional Granger Causal Inference

Fast k-Nearest Neighbour Search via Prioritized DCI

Robust Probabilistic Modeling with Bayesian Data Reweighting

An Adaptive Test of Independence with Analytic Kernel Embeddings

Lost Relatives of the Gumbel Trick

Tight Bounds for Approximate Carathéodory and Beyond

Being Robust (in High Dimensions) Can Be Practical

Deep IV: A Flexible Approach for Counterfactual Prediction

Stochastic Bouncy Particle Sampler

Learning Gradient Descent: Better Generalization and Longer Horizons

Learning Deep Latent Gaussian Models with Markov Chain Monte Carlo

A Closer Look at Memorization in Deep Networks

Online and Linear-Time Attention by Enforcing Monotonic Alignments

Data-Efficient Policy Evaluation Through Behavior Policy Search

Developing Bug-Free Machine Learning Systems With Formal Mathematics

Interactive Learning from Policy-Dependent Human Feedback

Count-Based Exploration with Neural Density Models

Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization

Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank

An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation

Nonparanormal Information Estimation

Compressed Sensing using Generative Models

Conditional Image Synthesis with Auxiliary Classifier GANs

Active Learning for Cost-Sensitive Classification

Prediction under Uncertainty in Sparse Spectrum Gaussian Processes with Applications to Filtering and Control

Efficient softmax approximation for GPUs

Projection-free Distributed Online Learning in Networks

Distributed Batch Gaussian Process Optimization

Identify the Nash Equilibrium in Static Games with Random Payoffs

Robust Budget Allocation via Continuous Submodular Functions

Algorithmic Stability and Hypothesis Complexity

Convex Phase Retrieval without Lifting via PhaseMax

Deep Voice: Real-time Neural Text-to-Speech

Adaptive Consensus ADMM for Distributed Optimization

Probabilistic Path Hamiltonian Monte Carlo

Continual Learning Through Synaptic Intelligence

Online Partial Least Square Optimization: Dropping Convexity for Better Efficiency and Scalability

Multilabel Classification with Group Testing and Codes

Fractional Langevin Monte Carlo: Exploring Lévy Driven Stochastic Differential Equations for MCMC

Differentially Private Ordinary Least Squares

Stochastic DCA for the Large-sum of Non-convex Functions Problem and its Application to Group Variable Selection in Classification

Device Placement Optimization with Reinforcement Learning

Language Modeling with Gated Convolutional Networks

Gradient Boosted Decision Trees for High Dimensional Sparse Output

Probabilistic Submodular Maximization in Sub-Linear Time

Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

End-to-End Learning for Structured Prediction Energy Networks

Latent LSTM Allocation: Joint clustering and non-linear dynamic modeling of sequence data

Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control

Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs

Estimating individual treatment effect: generalization bounds and algorithms

Stochastic Generative Hashing

Recovery Guarantees for One-hidden-layer Neural Networks

Learning Infinite Layer Networks without the Kernel Trick

Meritocratic Fairness for Cross-Population Selection

Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering

Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution

Attentive Recurrent Comparators

Approximate Steepest Coordinate Descent

Algorithms for $\ell_p$ Low-Rank Approximation

Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics

A Unified Variance Reduction-Based Framework for Nonconvex Low-Rank Matrix Recovery

On Context-Dependent Clustering of Bandits

Efficient Nonmyopic Active Search

An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis

Post-Inference Prior Swapping

Dual Iterative Hard Thresholding: From Non-convex Sparse Minimization to Non-smooth Concave Maximization

A Richer Theory of Convex Constrained Optimization with Reduced Projections and Improved Rates

Scalable Multi-Class Gaussian Process Classification using Expectation Propagation

SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization

Active Learning for Accurate Estimation of Linear Models

Deep Tensor Convolution on Multicores

Learning the Structure of Generative Models without Labeled Data

Unifying task specification in reinforcement learning

Beyond Filters: Compact Feature Map for Portable Deep Model

Priv’IT: Private and Sample Efficient Identity Testing

Relative Fisher Information and Natural Gradient for Learning Large Modular Models