Session: Dimensionality Reduction 3
Out-of-sample extension of graph adjacency spectral embedding
Keith Levin · Fred Roosta · Michael Mahoney · Carey Priebe
Many popular dimensionality reduction procedures have out-of-sample extensions, which allow a practitioner to apply a learned embedding to observations not seen in the initial training sample. In this work, we consider the problem of obtaining an out-of-sample extension for the adjacency spectral embedding, a procedure for embedding the vertices of a graph into Euclidean space. We present two different approaches to this problem, one based on a least-squares objective and the other based on a maximum-likelihood formulation. We show that if the graph of interest is drawn according to a certain latent position model called a random dot product graph, then both of these out-of-sample extensions estimate the true latent position of the out-of-sample vertex with the same error rate. Further, we prove a central limit theorem for the least-squares-based extension, showing that the estimate is asymptotically normal about the truth in the large-graph limit.
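To make the least-squares extension concrete, here is a minimal numerical sketch (not the authors' implementation): the in-sample vertices are embedded via the top-d scaled eigenvectors of the adjacency matrix, and an out-of-sample vertex with adjacency vector a is placed at the least-squares solution of X̂w ≈ a. The function names, the toy random dot product graph, and the choice of solver are illustrative assumptions.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed the in-sample vertices via the top-d scaled eigenvectors of A."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]            # d largest-magnitude eigenvalues
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))    # n x d latent-position estimates

def out_of_sample_ls(X_hat, a):
    """Least-squares out-of-sample extension: solve X_hat w ~= a for the new vertex."""
    w, *_ = np.linalg.lstsq(X_hat, a, rcond=None)
    return w

# toy usage: a random dot product graph with 2-dimensional latent positions
rng = np.random.default_rng(0)
X = rng.uniform(0.2, 0.8, size=(300, 2))                # in-sample latent positions
P = np.clip(X @ X.T, 0.0, 1.0)
A = rng.binomial(1, P)
A = np.triu(A, 1)
A = A + A.T                                             # symmetric, hollow adjacency matrix
X_hat = adjacency_spectral_embedding(A, d=2)
x_new = rng.uniform(0.2, 0.8, size=2)                   # latent position of a new vertex
a_new = rng.binomial(1, np.clip(X @ x_new, 0.0, 1.0))   # its edges to the in-sample vertices
w_hat = out_of_sample_ls(X_hat, a_new)                  # estimates x_new up to an orthogonal rotation
```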
Bayesian Model Selection for Change Point Detection and Clustering
Othmane Mazhar · Cristian R. Rojas · Carlo Fischione · Mohammad Reza Hesamzadeh
We address a generalization of change point detection with the purpose of detecting the change locations and the levels of clusters of a piecewise-constant signal. Our approach is to model it as a nonparametric penalized least-squares model selection on a family of models indexed over the collection of partitions of the design points, and to propose a computationally efficient algorithm to approximately solve it. Statistically, minimizing such a penalized criterion yields an approximation to the maximum a posteriori probability (MAP) estimator. The criterion is then analyzed and an oracle inequality is derived using a Gaussian concentration inequality. The oracle inequality is used to derive, on one hand, conditions for consistency and, on the other hand, an adaptive upper bound on the expected square risk of the estimator, which statistically motivates our approximation. Finally, we apply our algorithm to simulated data to experimentally validate the statistical guarantees and illustrate its behavior.
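As a simplified illustration of the penalized least-squares idea, the sketch below restricts the model family to contiguous segments (rather than the paper's partitions with clustered levels) and minimizes the sum of per-segment squared errors plus a constant penalty `beta` per segment by dynamic programming; the penalty form and the toy signal are assumptions.

```python
import numpy as np

def penalized_segmentation(y, beta):
    """Minimize  (sum of per-segment squared errors) + beta * (number of segments)
    over all segmentations of y into contiguous blocks, by dynamic programming."""
    n = len(y)
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))
    def sse(i, j):                     # squared error of fitting y[i:j] by its mean
        return (s2[j] - s2[i]) - (s1[j] - s1[i]) ** 2 / (j - i)
    cost = np.full(n + 1, np.inf)
    cost[0] = 0.0
    last = np.zeros(n + 1, dtype=int)  # start index of the final segment ending at j
    for j in range(1, n + 1):
        for i in range(j):
            c = cost[i] + sse(i, j) + beta
            if c < cost[j]:
                cost[j], last[j] = c, i
    cps, j = [], n                     # backtrack the change-point locations
    while j > 0:
        cps.append(last[j])
        j = last[j]
    return sorted(cps)[1:]             # drop the leading 0 (start of the first segment)

# toy usage: a piecewise-constant signal with change points at 50 and 100
rng = np.random.default_rng(1)
y = np.concatenate([np.zeros(50), 3.0 * np.ones(50), 1.5 * np.ones(50)])
y = y + 0.3 * rng.standard_normal(150)
print(penalized_segmentation(y, beta=2.0))
```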
An Iterative, Sketching-based Framework for Ridge Regression
Agniva Chowdhury · Jiasen Yang · Petros Drineas
Ridge regression is a variant of regularized least squares regression that is particularly suitable in settings where the number of predictor variables greatly exceeds the number of observations. We present a simple, iterative, sketching-based algorithm for ridge regression that guarantees high-quality approximations to the optimal solution vector. Our analysis builds upon two simple structural results that boil down to randomized matrix multiplication, a fundamental and well-understood primitive of randomized linear algebra. An important contribution of our work is the analysis of the behavior of subsampled ridge regression problems when the ridge leverage scores are used: we prove that accurate approximations can be achieved by a sample whose size depends on the degrees of freedom of the ridge-regression problem rather than the dimensions of the design matrix. Our experimental evaluations verify our theoretical results on both real and synthetic data.
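The flavor of a sketch-and-iterate scheme for the underdetermined regime can be conveyed with the following sketch (not necessarily the authors' exact algorithm): a Gaussian sketch of the features yields an approximate Gram matrix, whose regularized inverse preconditions a fixed-point iteration on the dual ridge problem. The sketch size, the Gaussian sketching matrix, and the iteration count are assumptions.

```python
import numpy as np

def iterative_sketched_ridge(A, b, lam, sketch_size, n_iter=30, seed=0):
    """Illustrative sketch-and-iterate solver for ridge regression when d >> n.

    The exact solution can be written in dual form as
        x* = A^T (A A^T + lam I)^{-1} b.
    A Gaussian sketch of the features gives an approximate Gram matrix A S S^T A^T;
    its regularized inverse preconditions a fixed-point iteration on the dual
    variable, and the exact Gram matrix is only touched through matrix-vector products.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.standard_normal((d, sketch_size)) / np.sqrt(sketch_size)  # feature sketch
    AS = A @ S                                                        # n x sketch_size
    M = np.linalg.inv(AS @ AS.T + lam * np.eye(n))                    # approximate inverse
    alpha = np.zeros(n)
    for _ in range(n_iter):
        grad = A @ (A.T @ alpha) + lam * alpha - b                    # exact dual gradient
        alpha -= M @ grad                                             # preconditioned step
    return A.T @ alpha                                                # primal solution

# toy comparison against the closed-form solution (many more predictors than observations)
rng = np.random.default_rng(3)
A = rng.standard_normal((50, 5000))
b = rng.standard_normal(50)
lam = 10.0
x_exact = A.T @ np.linalg.solve(A @ A.T + lam * np.eye(50), b)
x_iter = iterative_sketched_ridge(A, b, lam, sketch_size=1000)
print(np.linalg.norm(x_iter - x_exact) / np.linalg.norm(x_exact))
```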
Provable Variable Selection for Streaming Features
Jing Wang · Jie Shen · Ping Li
In large-scale machine learning applications and high-dimensional statistics, one routinely faces a large number of features, many of which are redundant. As a remedy, online feature selection has attracted increasing attention in recent years: features are revealed sequentially and their importance is evaluated on the fly. Although online feature selection is an elegant methodology, a rigorous theoretical characterization is usually challenging to obtain. In this work, we propose a provable online feature selection algorithm that utilizes the online leverage score. The selected features are then fed to $k$-means clustering, making the clustering step efficient in both memory and computation. We prove that, with high probability, performing $k$-means clustering on the selected feature space does not deviate far from the optimal clustering on the original data. Empirical results on real-world data sets demonstrate the effectiveness of our algorithm.
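A batch (non-streaming) caricature of the leverage-score idea is sketched below: columns are sampled with probability proportional to their rank-k leverage scores, and k-means is run on the selected features only. The exact-SVD leverage scores, the sampling scheme, and the use of scikit-learn's KMeans are simplifying assumptions relative to the streaming algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def leverage_score_feature_selection(X, rank, n_features, seed=0):
    """Sample columns (features) of X with probability proportional to their
    rank-`rank` leverage scores; here the scores come from one batch SVD,
    whereas a streaming algorithm would maintain them as features arrive."""
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = np.sum(Vt[:rank] ** 2, axis=0)          # one leverage score per column
    probs = scores / scores.sum()
    cols = rng.choice(X.shape[1], size=n_features, replace=False, p=probs)
    return np.sort(cols)

# toy usage: the clustering signal lives in the first 20 features, the rest are noise
rng = np.random.default_rng(4)
centers = 4.0 * rng.standard_normal((3, 20))
X = np.vstack([c + rng.standard_normal((100, 20)) for c in centers])
X = np.hstack([X, rng.standard_normal((300, 200))])  # append redundant, noisy features
cols = leverage_score_feature_selection(X, rank=3, n_features=30)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X[:, cols])
```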
Low-dimensional discriminative representations enhance machine learning methods in both performance and complexity, motivating supervised dimensionality reduction (DR), which transforms high-dimensional data into a discriminative subspace. Most DR methods require the data to be i.i.d.; in some domains, however, data naturally come in sequences whose observations are temporally correlated. We propose a DR method called LT-LDA to learn low-dimensional temporal representations. We construct the separability among sequence classes by lifting the holistic temporal structures, which are established from temporal alignments and may change in different subspaces. We jointly learn the subspace and the associated alignments by optimizing an objective that favors easily separable temporal structures, and we show that this objective is connected to the inference of alignments, which allows an iterative solution. We provide both theoretical insight and empirical evaluation on real-world sequence datasets to demonstrate the merits of our method.
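A much-simplified sketch of the alternating idea is given below: sequences are first placed on a common timeline by uniform resampling, and the procedure then alternates between an LDA-style generalized eigenproblem on the aligned frames and a DTW re-alignment of each sequence to its class template in the learned subspace. The template length, the DTW alignment, and the scatter-matrix construction are assumptions and do not reproduce LT-LDA's lifted temporal structures.

```python
import numpy as np
from scipy.linalg import eigh

def dtw_path(X, Y):
    """Dynamic-time-warping alignment path between two sequences of feature vectors."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sum((X[i - 1] - Y[j - 1]) ** 2)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:                        # backtrack the optimal warping path
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        i, j = (i - 1, j - 1) if step == 0 else ((i - 1, j) if step == 1 else (i, j - 1))
    return path[::-1]

def resample(X, length):
    """Uniform resampling to a common timeline (used only for the initial alignment)."""
    idx = np.linspace(0, len(X) - 1, length).round().astype(int)
    return X[idx]

def lt_lda_sketch(seqs, labels, dim, template_len=12, n_iter=4):
    """Alternate between an LDA-style projection on aligned frames and DTW re-alignment."""
    d = seqs[0].shape[1]
    classes = sorted(set(labels))
    aligned = [resample(X, template_len) for X in seqs]
    for _ in range(n_iter):
        # class templates on the common timeline
        templates = {c: np.mean([a for a, y in zip(aligned, labels) if y == c], axis=0)
                     for c in classes}
        # within- and between-class scatter of the aligned frames
        Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
        mu = np.mean([templates[c] for c in classes], axis=0)
        for a, y in zip(aligned, labels):
            diff = a - templates[y]
            Sw += diff.T @ diff
        for c in classes:
            n_c = sum(1 for y in labels if y == c)
            diff = templates[c] - mu
            Sb += n_c * diff.T @ diff
        # LDA step: top generalized eigenvectors of (Sb, Sw)
        vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(d))
        W = vecs[:, np.argsort(vals)[::-1][:dim]]
        # re-align every sequence to its class template in the learned subspace
        aligned = []
        for X, y in zip(seqs, labels):
            path = dtw_path(X @ W, templates[y] @ W)
            new, cnt = np.zeros((template_len, d)), np.zeros(template_len)
            for i, j in path:
                new[j] += X[i]
                cnt[j] += 1
            aligned.append(new / np.maximum(cnt, 1)[:, None])
    return W

# toy usage: two classes of variable-length 5-dimensional sequences (illustrative)
rng = np.random.default_rng(5)
seqs, labels = [], []
for y, shift in [(0, 0.0), (1, 2.0)]:
    for _ in range(10):
        T = int(rng.integers(15, 25))
        t = np.linspace(0, 1, T)[:, None]
        seqs.append(np.hstack([np.sin(2 * np.pi * t + shift), 0.5 * rng.standard_normal((T, 4))]))
        labels.append(y)
W = lt_lda_sketch(seqs, labels, dim=2)
```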