Word storms
Here is a visualisation of the accepted papers at ICML in the form of a word storm, which is a group of word clouds. The clouds are arranged so that if the same word appears in two clouds, it is in the same position. Hopefully this makes it easier to see the difference between clouds.
Word storms by Quim Castella Charles Sutton.
Session 1A — Optimization algorithms 1
chair Elad Hazan, room AT LT 4
- On the Equivalence between Herding and Conditional Gradient Algorithms
- Similarity Learning for Provably Accurate Sparse Linear Classification
- Stochastic Smoothing for Nonsmooth Minimizations: Accelerating SGD by Exploiting Structure
- Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
- Scaling Up Coordinate Descent Algorithms for Large ℓ_1 Regularization Problems
- Quasi-Newton Methods: A New Direction
- A Hybrid Algorithm for Convex Semidefinite Optimization
- Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization
Session 1B — Reinforcement learning 1
chair David Silver, room AT LT 5
- Policy Gradients with Variance Related Risk Criteria
- Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds
- Statistical linear estimation with penalized estimators: an application to reinforcement learning
- Approximate Modified Policy Iteration
- A Dantzig Selector Approach to Temporal Difference Learning
- Linear Off-Policy Actor-Critic
- Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty
- Bounded Planning in Passive POMDPs
Session 1C — Neural networks and deep learning 1
chair Marc'Aurelio Ranzato, room AT LT 1
- Scene parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers
- A Generative Process for Contractive Auto-Encoders
- Deep Lambertian Networks
- Deep Mixtures of Factor Analysers
- Utilizing Static Analysis and Code Generation to Accelerate Neural Networks
- Estimating the Hessian by Back-propagating Curvature
- Training Restricted Boltzmann Machines on Word Observations
- A fast and simple algorithm for training neural probabilistic language models
Session 1D — Structured output prediction
chair David McAllester, room AT LT 2
- Learning to Identify Regular Expressions that Describe Email Campaigns
- Efficient Structured Prediction with Latent Variables for General Graphical Models
- Output Space Search for Structured Prediction
- Efficient Decomposed Learning for Structured Prediction
- Modeling Latent Variable Uncertainty for Loss-based Learning
Session 2A — Kernel methods 1
chair Arthur Gretton, room AT LT 4
- On the Size of the Online Kernel Sparsification Dictionary
- Improved Nystrom Low-rank Decomposition with Priors
- Bayesian Efficient Multiple Kernel Learning
- A Binary Classification Framework for Two-Stage Multiple Kernel Learning
- Multiple Kernel Learning from Noisy Labels by Stochastic Programming
- Subgraph Matching Kernels for Attributed Graphs
- Fast Computation of Subpath Kernel for Trees
- Hypothesis testing using pairwise distances and associated kernels
Session 2B — Reinforcement learning 2
chair Geoff Gordon, room AT LT 5
- No-Regret Learning in Extensive-Form Games with Imperfect Recall
- Near-Optimal BRL using Optimistic Local Transitions
- Continuous Inverse Optimal Control with Locally Optimal Examples
- Monte Carlo Bayesian Reinforcement Learning
- Apprenticeship Learning for Model Parameters of Partially Observable Environments
Session 2C — Gaussian processes
chair Ryan Adams, room AT LT 1
- Gaussian Process Regression Networks
- Infinite Tucker Decomposition: Nonparametric Bayesian Models for Multiway Data Analysis
- State-Space Inference for Non-Linear Latent Force Models with Application to Satellite Orbit Prediction
- Gaussian Process Quantile Regression using Expectation Propagation
- Residual Components Analysis
- Manifold Relevance Determination
Session 2D — Statistical methods
chair Lawrence Carin, room AT LT 2
- Lognormal and Gamma Mixed Negative Binomial Regression
- Group Sparse Additive Models
- Variance Function Estimation in High-dimensions
- Sparse Additive Functional and Kernel CCA
- Consistent Covariance Selection From Data With Missing Values
- Conditional Sparse Coding and Grouped Multivariate Regression
- Is margin preserved after random projection?
Session 3A — Optimization algorithms 2
chair Tong Zhang, room AT LT 4
- A Discrete Optimization Approach for Supervised Ranking with an Application to Reverse-Engineering Quality Ratings
- A Proximal-Gradient Homotopy Method for the L1-Regularized Least-Squares Problem
- Complexity Analysis of the Lasso Regularization Path
- Randomized Smoothing for (Parallel) Stochastic Optimization
Session 3B — Clustering 1
chair Shai Ben-David, room AT LT 5
- Demand-Driven Clustering in Relational Domains for Predicting Adverse Drug Events
- Clustering to Maximize the Ratio of Split to Diameter
- An Iterative Locally Linear Embedding Algorithm
- Robust Multiple Manifold Structure Learning
- A Split-Merge Framework for Comparing Clusterings
- On the Difficulty of Nearest Neighbor Search
Session 3C — Privacy, Anonymity, and Security
chair Tobias Scheffer, room AT LT 1
- Bayesian Watermark Attacks
- Poisoning Attacks against Support Vector Machines
- Convergence Rates for Differentially Private Statistical Estimation
- Finding Botnets Using Minimal Graph Clusterings
Session 3D — Ranking and Preference Learning
chair Balazs Kegl, room AT LT 2
- Incorporating Domain Knowledge in Matching Problems via Harmonic Analysis
- Consistent Multilabel Ranking through Univariate Losses
- Predicting Consumer Behavior in Commerce Search
- Adaptive Regularization for Similarity Measures
- Online Structured Prediction via Coactive Learning
- TrueLabel + Confusions: A Spectrum of Probabilistic Models in Analyzing Multiple Ratings
Session 3E — Nonparametric Bayesian inference
chair Sharon Goldwater, room AT LT 3
- Factorized Asymptotic Bayesian Hidden Markov Models
- An Infinite Latent Attribute Model for Network Data
- The Nonparametric Metadata Dependent Relational Model
- Dependent Hierarchical Normalized Random Measures for Dynamic Topic Modeling
- A Hierarchical Dirichlet Process Model with Multiple Levels of Clustering for Human EEG Seizure Modeling
- Modeling Images using Transformed Indian Buffet Processes
- A Topic Model for Melodic Sequences
Session 4A — Feature selection and dimensionality reduction 1
chair Kilian Weinberger, room AT LT 4
- Discovering Support and Affiliated Features from Very High Dimensions
- Inferring Latent Structure From Mixed Real and Categorical Relational Data
- Conditional Likelihood Maximization: A Unifying Framework for Information Theoretic Feature Selection
- Dimensionality Reduction by Local Discriminative Gaussians
- Fast Prediction of New Feature Utility
Session 4B — Online learning 1
chair Satyen Kale, room AT LT 5
- An Online Boosting Algorithm with Theoretical Justifications
- An adaptive algorithm for finite stochastic partial monitoring
- Online Alternating Direction Method
- Projection-free Online Learning
- PAC Subset Selection in Stochastic Multi-armed Bandits
- On Local Regret
- Exact Soft Confidence-Weighted Learning
- Compact Hyperplane Hashing with Bilinear Functions
Session 4C — Supervised learning 1
chair Cynthia Rudin, room AT LT 1
- Improved Information Gain Estimates for Decision Tree Induction
- Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation
- The Big Data Bootstrap
- Robust Classification with Adiabatic Quantum Optimization
- Nonparametric Link Prediction in Dynamic Networks
- A Unified Robust Classification Model
- Maximum Margin Output Coding
- Structured Learning from Partial Annotations
Session 4D — Transfer and Multi-Task Learning
chair Jenn Wortman Vaughan, room AT LT 2
- Marginalized Denoising Autoencoders for Domain Adaptation
- Information-Theoretical Learning of Discriminative Clusters for Unsupervised Domain Adaptation
- Learning Task Grouping and Overlap in Multi-task Learning
- A Convex Feature Learning Formulation for Latent Task Structure Discovery
- Convex Multitask Learning with Flexible Task Clusters
- A Complete Analysis of the l_1,p Group-Lasso
- Learning with Augmented Features for Heterogeneous Domain Adaptation
- Cross-Domain Multitask Learning with Latent Probit Models
Session 4E — Graphical models
chair Matthias Seeger, room AT LT 3
- High Dimensional Semiparametric Gaussian Copula Graphical Models
- Convergence Rates of Biased Stochastic Optimization for Learning Sparse Ising Models
- On the Partition Function and Random Maximum A-Posteriori Perturbations
- Anytime Marginal MAP Inference
- Exact Maximum Margin Structure Learning of Bayesian Networks
- LPQP for MAP: Putting LP Solvers to Better Use
- How To Grade a Test Without Knowing the Answers — A Bayesian Graphical Model for Adaptive Crowdsourcing and Aptitude Testing
- Smoothness and Structure Learning by Proxy
Session 5A — Learning theory
chair Daniel Hsu, room AT LT 4
- Linear Regression with Limited Observation
- Optimizing F-measure: A Tale of Two Approaches
- Conditional mean embeddings as regressors
- PAC-Bayesian Generalization Bound on Confusion Matrix for Multi-Class Classification
- Tighter Variational Representations of f-Divergences via Restriction to Probability Measures
- Agglomerative Bregman Clustering
- The Convexity and Design of Composite Multiclass Losses
- Minimizing The Misclassification Error Rate Using a Surrogate Convex Loss
Session 5B — Online learning 2
chair Csaba Szepesvari, room AT LT 5
- Hierarchical Exploration for Accelerating Contextual Bandits
- Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret
- Decoupling Exploration and Exploitation in Multi-Armed Bandits
- Learning the Experts for Online Sequence Prediction
- Plug-in martingales for testing exchangeability on-line
- Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization
- On-Line Portfolio Selection with Moving Average Reversion
- Exponential Regret Bounds for Gaussian Process Bandits with Deterministic Observations
Session 5C — Neural networks and deep learning 2
chair Yoshua Bengio, room AT LT 1
- Large-Scale Feature Learning With Spike-and-Slab Sparse Coding
- Learning Invariant Representations with Local Transformations
- Building high-level features using large scale unsupervised learning
- On multi-view feature learning
- Learning to Label Aerial Images from Noisy Data
Session 5D — Sparsity and compressed sensing
chair Mahdi Milani Fard, room AT LT 2
- Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering
- Estimation of Simultaneously Sparse and Low Rank Matrices
- Multi-level Lasso for Sparse Multi-task Regression
- Efficient Euclidean Projections onto the Intersection of Norm Balls
- Learning Efficient Structured Sparse Models
Session 5E — Latent-Variable Models and Topic Models
chair Jordan Boyd-Graber, room AT LT 3
- Max-Margin Nonparametric Latent Feature Models for Link Prediction
- Canonical Trends: Detecting Trend Setters in Web Data
- Variational Inference in Non-negative Factorial Hidden Markov Models for Efficient Audio Source Separatio
- Sparse stochastic inference for latent Dirichlet allocation
- Dirichlet Process with Mixed Random Measures: A Nonparametric Topic Model for Labeled Data
- Rethinking Collapsed Variational Bayes Inference for LDA
- Capturing topical content with frequency and exclusivity
Session 6A — Semi-supervised learning
chair Maria Florina Balcan, room AT LT 4
- A convex relaxation for weakly supervised classifiers
- Semi-Supervised Learning of Class Balance under Class-Prior Change by Distribution Matching
- A Simple Algorithm for Semi-supervised Learning with Improved Generalization Error Bound
- Information-theoretic Semi-supervised Metric Learning via Entropy Regularization
- Cross Language Text Classification via Subspace Co-regularized Multi-view Learning
- Using CCA to improve CCA: A new spectral method for estimating vector models of words
- Semi-Supervised Collective Classification via Hybrid Label Regularization
Session 6B — Reinforcement learning 3
chair Ron Parr, room AT LT 5
- Compositional Planning Using Optimal Option Models
- Learning Parameterized Skills
- Safe Exploration in Markov Decision Processes
- Modelling transition dynamics in MDPs with RKHS embeddings
Session 6C — Applications
chair Tom Dietterich, room AT LT 1
- A Joint Model of Language and Perception for Grounded Attribute Learning
- Predicting Manhole Events in New York City
- Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
- Learning Object Arrangements in 3D Scenes using Human Context
Session 6D — Time-Series Analysis
chair Naoki Abe, room AT LT 2
- Learning the Dependence Graph of Time Series with Latent Factors
- Improved Estimation in Time Varying Models
- Bayesian Conditional Cointegration
- Sparse-GEV: Sparse Latent Space Model for Multivariate Extreme Value Time Serie Modeling
Session 6E — Graph-based learning
chair Charles Elkan, room AT LT 3
- Shortest path distance in random k-nearest neighbor graphs
- Submodular Inference of Diffusion Networks from Multiple Trees
- Influence Maximization in Continuous Time Diffusion Networks
- Latent Multi-group Membership Graph Model
- The Most Persistent Soft-Clique in a Set of Sampled Graphs
- Two Manifold Problems with Applications to Nonlinear System Identification
- Incorporating Causal Prior Knowledge as Path-Constraints in Bayesian Networks and Maximal Ancestral Graphs
Session 7A — Invited Applications
chair Samy Bengio, room AT LT 4
- Conversational Speech Transcription Using Context-Dependent Deep Neural Networks
- Data-driven Web Design
- Learning the Central Events and Participants in Unlabeled Text
- Exemplar-SVMs for Visual Object Detection, Label Transfer and Image Retrieval
- Learning Force Control Policies for Compliant Robotic Manipulation
Session 7B — Reinforcement learning 4
chair Michael Bowling, room AT LT 5
- Agnostic System Identification for Model-Based Reinforcement Learning
- Greedy Algorithms for Sparse Reinforcement Learning
- On the Sample Complexity of Reinforcement Learning with a Generative Model
- Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting
- Path Integral Policy Improvement with Covariance Matrix Adaptation
Session 7C — Clustering 2
chair Raquel Urtasun, room AT LT 1
- On causal and anticausal learning
- Revisiting k-means: New Algorithms via Bayesian Nonparametrics
- Approximate Principal Direction Trees
- Clustering using Max-norm Constrained Optimization
- Efficient Active Algorithms for Hierarchical Clustering
- Convergence of the EM Algorithm for Gaussian Mixtures with Unbalanced Mixing Coefficients
- Groupwise Constrained Reconstruction for Subspace Clustering
- Clustering by Low-Rank Doubly Stochastic Matrix Decomposition
Session 7D — Supervised learning 2
chair Leon Bottou, room AT LT 2
- Total Variation and Euler's Elastica for Supervised Learning
- Flexible Modeling of Latent Task Structures in Multitask Learning
- Fast classification using sparse decision DAGs
- An Efficient Approach to Sparse Linear Discriminant Analysis
- Sequential Nonparametric Regression
- The Landmark Selection Method for Multiple Output Prediction
- Ensemble Methods for Convex Regression with Applications to Geometric Programming Based Circuit Design
- AOSO-LogitBoost: Adaptive One-Vs-One LogitBoost for Multi-Class Problem
Session 7E — Probabilistic Models
chair Erik Sudderth, room AT LT 3
- Local Loss Optimization in Operator Models: A New Insight into Spectral Learning
- Discriminative Probabilistic Prototype Learning
- Isoelastic Agents and Wealth Updates in Machine Learning Markets
- Evaluating Bayesian and L1 Approaches for Sparse Unsupervised Learning
- Nonparametric variational inference
- Levy Measure Decompositions for the Beta and Gamma Processes
- Copula Mixture Model for Dependency-seeking Clustering
- Predicting accurate probabilities with a ranking loss
Session 8A — Kernel methods 2
chair Mario Marchand, room AT LT 4
- Copula-based Kernel Dependency Measures
- The Kernelized Stochastic Batch Perceptron
- Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning
- Distributed Tree Kernels
- Analysis of Kernel Mean Matching under Covariate Shift
Session 8B — Active and cost-sensitive learning
chair Andreas Krause, room AT LT 5
- The Greedy Miser: Learning under Test-time Budgets
- Joint Optimization and Variable Selection of High-dimensional Gaussian Processes
- Comparison-Based Learning with Rank Nets
- Bayesian Optimal Active Search and Surveying
- Hybrid Batch Bayesian Optimization
- Batch Active Learning via Coordinated Matching
- Bayesian Nonexhaustive Learning for Online Discovery and Modeling of Emerging Classes
Session 8C — Feature selection and dimensionality reduction
chair Andrea Danyluk, room AT LT 1
- Robust PCA in High-dimension: A Deterministic Approach
- Communications Inspired Linear Discriminant Analysis
- Regularizers versus Losses for Nonlinear Dimensionality Reduction: A Factored View with New Convex Relaxations
- Fast Training of Nonlinear Embedding Algorithms
- Sparse Support Vector Infinite Push
- Adaptive Canonical Correlation Analysis Based On Matrix Manifolds
- Fast approximation of matrix coherence and statistical leverage
- Feature Selection via Probabilistic Outputs
Session 8D — Recommendation and Matrix Factorization
chair Thorsten Joachims, room AT LT 2
- A Combinatorial Algebraic Approach for the Identifiability of Low-Rank Matrix Completion
- Gap Filling in the Plant Kingdom—Trait Prediction Using Hierarchical Probabilistic Matrix Factorization
- Stability of matrix factorization for collaborative filtering
- Latent Collaborative Retrieval
- A Bayesian Approach to Approximate Joint Diagonalization of Square Matrices
- Collaborative Topic Regression with Social Matrix Factorization for Recommendation Systems
- Active Learning for Matching Problems
- A Graphical Model Formulation of Collaborative Filtering Neighbourhood Methods with Fast Maximum Entropy Training
Session 8E — Graphical models
chair Ricardo Silva, room AT LT 3
- Variational Bayesian Inference with Stochastic Search
- Large Scale Variational Bayesian Inference for Structured Scale Mixture Models
- A Generalized Loop Correction Method for Approximate Inference in Graphical Models
- Distributed Parameter Estimation via Pseudo-likelihood
- High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains