Track: Supervised Learning 1

Thu 12 July 4:30 - 4:50 PDT

Inductive Two-Layer Modeling with Parametric Bregman Transfer

Vignesh Ganapathiraman · Zhan Shi · Xinhua Zhang · Yaoliang Yu

Latent prediction models, exemplified by multi-layer networks, employ hidden variables that automate abstract feature discovery. They typically pose nonconvex optimization problems, and effective semi-definite programming (SDP) relaxations have been developed to enable global solutions (Aslan et al., 2014). However, these models rely on nonparametric training of layer-wise kernel representations, and are therefore restricted to transductive learning, which slows down test prediction. In this paper, we develop a new inductive learning framework for parametric transfer functions using matching losses. The result for ReLU utilizes completely positive matrices, and the inductive learner not only delivers superior accuracy but also offers an order of magnitude speedup over SDP with constant approximation guarantees.

Thu 12 July 4:50 - 5:00 PDT

Does Distributionally Robust Supervised Learning Give Robust Classifiers?

Weihua Hu · Gang Niu · Issei Sato · Masashi Sugiyama

Distributionally Robust Supervised Learning (DRSL) is necessary for building reliable machine learning systems. When machine learning is deployed in the real world, its performance can be significantly degraded because test data may follow a different distribution from training data. DRSL with f-divergences explicitly considers the worst-case distribution shift by minimizing the adversarially reweighted training loss. In this paper, we analyze this DRSL, focusing on the classification scenario. Since the DRSL is explicitly formulated for a distribution shift scenario, we naturally expect it to give a robust classifier that can aggressively handle shifted distributions. However, surprisingly, we prove that the DRSL just ends up giving a classifier that exactly fits the given training distribution, which is too pessimistic. This pessimism comes from two sources: the particular losses used in classification and the fact that the variety of distributions to which the DRSL tries to be robust is too wide. Motivated by our analysis, we propose a simple DRSL that overcomes this pessimism and empirically demonstrate its effectiveness.
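
As a sketch of what "minimizing the adversarially reweighted training loss" can look like, the objective is often written in the following form. The notation is an assumption and does not appear in the abstract: $g_\theta$ is the classifier, $\ell$ the loss, $r_i$ the adversarial weight on training example $i$, $f$ the convex function defining the f-divergence, and $\delta$ the allowed amount of shift.

```latex
\min_{\theta} \; \sup_{r \in \mathcal{U}_f} \; \frac{1}{N} \sum_{i=1}^{N} r_i \, \ell\bigl(g_\theta(x_i), y_i\bigr),
\qquad
\mathcal{U}_f = \Bigl\{ r \in \mathbb{R}^N : \frac{1}{N} \sum_{i=1}^{N} f(r_i) \le \delta, \;\; \frac{1}{N} \sum_{i=1}^{N} r_i = 1, \;\; r_i \ge 0 \Bigr\}
```

Setting every $r_i = 1$ recovers ordinary empirical risk minimization; the inner supremum instead shifts weight toward the examples with the largest losses, within the budget $\delta$.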

Thu 12 July 5:00 - 5:10 PDT

Prediction Rule Reshaping

Matt Bonakdarpour · Sabyasachi Chatterjee · Rina Barber · John Lafferty

Two methods are proposed for high-dimensional shape-constrained regression and classification. These methods reshape pre-trained prediction rules to satisfy shape constraints like monotonicity and convexity. The first method can be applied to any pre-trained prediction rule, while the second method deals specifically with random forests. In both cases, efficient algorithms are developed for computing the estimators, and experiments are performed to demonstrate their performance on four datasets. We find that reshaping methods enforce shape constraints without compromising predictive accuracy.
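
To make the idea of reshaping concrete, here is a minimal one-dimensional sketch, not the authors' estimator: it projects a sequence of pre-trained predictions, assumed to be ordered by the feature that should be monotone, onto the set of nondecreasing sequences using the classic pool-adjacent-violators procedure. The function name `isotonic_projection` and the sample predictions are hypothetical.

```python
def isotonic_projection(values):
    """Least-squares projection of `values` onto nondecreasing sequences.

    Uses pool-adjacent-violators: neighbouring blocks whose means are out
    of order are merged and replaced by their pooled mean.
    """
    blocks = []  # each block is [sum_of_values, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out


# Hypothetical predictions of some pre-trained rule, ordered by one feature.
preds = [1.0, 3.0, 2.0, 4.0, 3.5]
reshaped = isotonic_projection(preds)
print(reshaped)
```

The projection changes only the predictions that violate the ordering, which is one intuition for why a shape constraint can be imposed without giving up much accuracy.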

Thu 12 July 5:10 - 5:20 PDT

Finding Influential Training Samples for Gradient Boosted Decision Trees

Boris Sharchilev · Yury Ustinovskiy · Pavel Serdyukov · Maarten de Rijke

We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model's predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme of obtaining further approximations to our method that balance the trade-off between performance and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency.
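
The leave-one-out formalization mentioned above can be illustrated with a brute-force sketch. It uses a one-dimensional least-squares line as a stand-in model (an assumption made for brevity, not a GBDT) and retrains once per training sample, which is exactly the cost that efficient approximations aim to avoid. All function names and data here are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def loo_influence(xs, ys, x_test):
    """Change in the test prediction caused by removing each training sample."""
    full = predict(fit_line(xs, ys), x_test)
    influences = []
    for i in range(len(xs)):
        xs_i = xs[:i] + xs[i + 1:]
        ys_i = ys[:i] + ys[i + 1:]
        influences.append(full - predict(fit_line(xs_i, ys_i), x_test))
    return influences


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 10.0]  # the last point is a deliberate outlier
influences = loo_influence(xs, ys, x_test=5.0)
most_influential = max(range(len(xs)), key=lambda i: abs(influences[i]))
print(most_influential, influences)
```

In this toy data the outlier is the sample whose removal moves the test prediction the most.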

Thu 12 July 5:20 - 5:30 PDT

Noise2Noise: Learning Image Restoration without Clean Data

Jaakko Lehtinen · Jacob Munkberg · Jon Hasselgren · Samuli Laine · Tero Karras · Miika Aittala · Timo Aila

We apply basic statistical reasoning to signal reconstruction by machine learning, that is, learning to map corrupted observations to clean signals, with a simple and powerful conclusion: it is possible to learn to restore images by only looking at corrupted examples, at performance matching and sometimes exceeding that of training with clean data, without explicit image priors or likelihood models of the corruption. In practice, we show that a single model learns photographic noise removal, denoising of synthetic Monte Carlo images, and reconstruction of undersampled MRI scans, all corrupted by different processes, based on noisy data only.
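
One piece of basic statistical reasoning consistent with this claim can be checked numerically: the value that minimizes squared error against a set of targets is their mean, so if the corruption is zero-mean, fitting to noisy targets points toward the same answer as fitting to clean ones. The sketch below assumes additive Gaussian noise and a single scalar "pixel"; both are simplifications chosen for illustration, and the helper names are hypothetical.

```python
import random


def l2_minimizer(targets):
    """The constant minimizing mean squared error to `targets` is their mean."""
    return sum(targets) / len(targets)


def noisy_targets(clean_value, n, sigma, seed=0):
    """Draw n corrupted observations of one clean value (zero-mean Gaussian noise)."""
    rng = random.Random(seed)
    return [clean_value + rng.gauss(0.0, sigma) for _ in range(n)]


clean = 0.5
targets = noisy_targets(clean, n=100_000, sigma=0.2)
estimate = l2_minimizer(targets)
print(clean, estimate)
```

No clean target is ever used, yet the estimate lands close to the clean value as the number of noisy observations grows.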
