Session: Statistical Learning Theory 1
Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information
Yichong Xu · Hariank Muthakana · Sivaraman Balakrishnan · Aarti Singh · Artur Dubrawski
In supervised learning, we leverage a labeled dataset to design methods for function estimation. In many practical situations, we are able to obtain alternative feedback, possibly at a low cost. A broad goal is to understand the usefulness of, and to design algorithms to exploit, this alternative feedback. We focus on a semi-supervised setting where we obtain additional ordinal (or comparison) information for potentially unlabeled samples. We consider ordinal feedback of varying quality, where we have either a perfect ordering of the samples, a noisy ordering of the samples, or noisy pairwise comparisons between the samples. We provide a precise quantification of the usefulness of these types of ordinal feedback in nonparametric regression, showing that in many cases it is possible to accurately estimate an underlying function with a very small labeled set, effectively escaping the curse of dimensionality. We develop an algorithm called Ranking-Regression (RR) and analyze its accuracy as a function of the sizes of the labeled and unlabeled datasets and various noise parameters. We also present lower bounds that establish fundamental limits for the task and show that RR is optimal in a variety of settings. Finally, we present experiments that demonstrate the efficacy of RR and investigate its robustness to various sources of noise and model misspecification.
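A minimal sketch of the ranking-plus-regression idea, written by us for illustration (it is not the authors' implementation, and `ranking_regression_sketch` and its arguments are our own names). Because the ordering ranks samples by their function values, the labels are monotone in rank, so an isotonic fit of the few labels against rank positions yields an estimate at every sample, labeled or not:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def ranking_regression_sketch(ranks, labeled_idx, labeled_y):
    """Estimate f at all samples from a (possibly noisy) ordering plus a
    small labeled subset: fit a monotone function of rank to the labels,
    then read it off at every sample's rank position."""
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(ranks[labeled_idx], labeled_y)
    return iso.predict(ranks)

# Toy usage: 100 samples, a perfect ordering, only 5 labels.
rng = np.random.default_rng(0)
x = rng.uniform(size=100)
f = x ** 3                             # true function values
ranks = np.argsort(np.argsort(f))      # ordering of samples by f, as ranks
labeled_idx = rng.choice(100, size=5, replace=False)
estimates = ranking_regression_sketch(ranks, labeled_idx, f[labeled_idx])
```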
Do Outliers Ruin Collaboration?
Mingda Qiao
We consider the problem of learning a binary classifier from $n$ different data sources, among which at most an $\eta$ fraction are adversarial. The overhead is defined as the ratio between the sample complexity of learning in this setting and that of learning the same hypothesis class on a single data distribution. We present an algorithm that achieves an $O(\eta n + \ln n)$ overhead, which we prove to be worst-case optimal. We also discuss the potential challenges in designing a computationally efficient learning algorithm with a small overhead.
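Written out, with $m_{\text{collab}}(\varepsilon,\delta)$ and $m_{\text{single}}(\varepsilon,\delta)$ denoting the two sample complexities (this notation is ours, not the abstract's):

```latex
\[
\text{overhead} \;=\;
  \frac{m_{\text{collab}}(\varepsilon,\delta)}{m_{\text{single}}(\varepsilon,\delta)}
  \;=\; O(\eta n + \ln n),
\]
```

so for a constant adversarial fraction $\eta$ the cost of learning across all $n$ sources grows only linearly in $n$, and with no adversaries ($\eta = 0$) only logarithmically.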
LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration
Gellért Weisz · András György · Csaba Szepesvari
We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution. The goal of the configurator is to find a configuration that runs fast on average on most instances, and to do so with the least amount of total work. It can run a chosen solver on a random instance until the solver finishes or a timeout is reached. We propose LeapsAndBounds, an algorithm that tests configurations on randomly selected problem instances for longer and longer times. We prove that the capped expected runtime of the configuration returned by LeapsAndBounds is close to the optimal expected runtime, while our algorithm’s own running time is near-optimal. Our results show that LeapsAndBounds is more efficient than the recent algorithm of Kleinberg et al. (2017), which, to our knowledge, is the only other algorithm-configuration method with non-trivial theoretical guarantees. Experimental results on configuring a public SAT solver on a new benchmark dataset also demonstrate the superiority of our method.
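One possible reading of the "longer and longer" phases, sketched in Python for concreteness. This is our simplification, not the authors' procedure: `run_with_timeout`, `sample_instance`, and the factor-of-two elimination rule are placeholders, and the real algorithm uses carefully chosen confidence bounds rather than this heuristic.

```python
import statistics

def leaps_and_bounds_sketch(configs, sample_instance, run_with_timeout,
                            samples_per_phase=30, n_phases=6):
    """Test configurations on random instances with geometrically growing
    timeouts, keeping configurations whose capped mean runtime stays
    competitive.  Configurations must be hashable.

    run_with_timeout(config, instance, timeout) -> runtime capped at timeout.
    """
    alive = list(configs)
    timeout = 1.0
    for _ in range(n_phases):
        capped_means = {}
        for c in alive:
            runs = [run_with_timeout(c, sample_instance(), timeout)
                    for _ in range(samples_per_phase)]
            capped_means[c] = statistics.fmean(runs)
        best = min(capped_means.values())
        # Heuristic elimination: keep configurations within a factor of
        # the best capped mean, then double the timeout ("leap").
        alive = [c for c in alive if capped_means[c] <= 2.0 * best]
        timeout *= 2.0
    return min(alive, key=lambda c: capped_means[c])
```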
Variational Network Inference: Strong and Stable with Concrete Support
Amir Dezfouli · Edwin Bonilla · Richard Nock
Traditional methods for the discovery of latent network structures are limited in two ways: they either assume that all the signal comes from the network (i.e., there is no source of signal outside the network), or they place constraints on the network parameters to ensure model or algorithmic stability. We address these limitations by proposing a model that incorporates a Gaussian process prior on a network-independent component. We formally prove that this yields algorithmic stability for free, provide a novel perspective on model stability, and derive robustness results and precise intervals for key inference parameters. On three applications, our approach consistently outperforms previous methods.
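As a toy illustration of the modeling idea only (this generative sketch is our own construction; the paper's actual model and variational inference scheme are richer), node signals can be generated as a network-coupled term plus a network-independent component drawn from a Gaussian process:

```python
import numpy as np

def sample_toy_signals(A, t, lengthscale=1.0, coupling=0.5, jitter=1e-6):
    """Toy generative sketch: signal = network term + GP-distributed
    network-independent component (not the paper's exact model).

    A : (n, n) network weight matrix
    t : (T,) observation times
    """
    n, T = A.shape[0], len(t)
    # Network-independent component: one GP draw per node (RBF kernel).
    d = t[:, None] - t[None, :]
    K = np.exp(-0.5 * (d / lengthscale) ** 2) + jitter * np.eye(T)
    L = np.linalg.cholesky(K)
    g = L @ np.random.default_rng(0).standard_normal((T, n))  # (T, n)
    # Network-coupled term: each node mixes its neighbours' components.
    return g + coupling * g @ A.T
```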
Network Global Testing by Counting Graphlets
Jiashun Jin · Zheng Ke · Shengming Luo
Consider a large social network with possibly severe degree heterogeneity and mixed memberships. We are interested in testing whether the network has only one community or more than one. The problem is known to be non-trivial, partly due to the presence of severe degree heterogeneity. We construct a class of test statistics using the numbers of short paths and short cycles; the key to our approach is a general framework for canceling the effects of degree heterogeneity. The tests compare favorably with existing methods. We support our methods with careful analysis and a numerical study on simulated data and a real-data example.
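The raw ingredients of such statistics are easy to compute; the sketch below is ours and omits the paper's degree-heterogeneity correction, which is the actual contribution. Counts of short paths and cycles in a simple undirected graph can be read off powers of the adjacency matrix:

```python
import numpy as np

def small_graphlet_counts(A):
    """Counts of short paths and cycles in a simple undirected graph
    (adjacency matrix A); building blocks for a global testing statistic."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    n_edges = deg.sum() / 2
    A2, A3 = A @ A, A @ A @ A
    n_triangles = np.trace(A3) / 6                  # 3-cycles
    n_paths2 = (deg * (deg - 1)).sum() / 2          # paths of length 2
    # Closed 4-walks, minus the degenerate back-and-forth walks,
    # leave 8 walks per genuine 4-cycle (4 start vertices x 2 directions).
    n_cycles4 = (np.trace(A2 @ A2) - 2 * n_edges
                 - 2 * (deg * (deg - 1)).sum()) / 8
    return n_edges, n_paths2, n_triangles, n_cycles4
```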