Keywords: [ Statistical Learning Theory ] [ Algorithms ] [ Meta-Learning ] [ Multitask and Transfer Learning ] [ Algorithms -> Few-Shot Learning; Algorithms ]

[
Abstract
]
Spotlight presentation:
Learning Theory 2

Wed 21 Jul 5 a.m. PDT — 6 a.m. PDT

[
Slides]

[ Paper ]

Wed 21 Jul 9 a.m. PDT — 11 a.m. PDT

Wed 21 Jul 5 a.m. PDT — 6 a.m. PDT

Abstract:
In this paper, we study meta learning for support (i.e., the set of non-zero entries) recovery in high-dimensional precision matrix estimation where we reduce the sufficient sample complexity in a novel task with the information learned from other auxiliary tasks. In our setup, each task has a different random true precision matrix, each with a possibly different support. We assume that the union of the supports of all the true precision matrices (i.e., the true support union) is small in size. We propose to pool all the samples from different tasks, and \emph{improperly} estimate a single precision matrix by minimizing the $\ell_1$-regularized log-determinant Bregman divergence. We show that with high probability, the support of the \emph{improperly} estimated single precision matrix is equal to the true support union, provided a sufficient number of samples per task $n \in O((\log N)/K)$, for $N$-dimensional vectors and $K$ tasks. That is, one requires less samples per task when more tasks are available. We prove a matching information-theoretic lower bound for the necessary number of samples, which is $n \in \Omega((\log N)/K)$, and thus, our algorithm is minimax optimal. Then for the novel task, we prove that the minimization of the $\ell_1$-regularized log-determinant Bregman divergence with the additional constraint that the support is a subset of the estimated support union could reduce the sufficient sample complexity of successful support recovery to $O(\log(|S_{\text{off}}|))$ where $|S_{\text{off}}|$ is the number of off-diagonal elements in the support union and is much less than $N$ for sparse matrices. We also prove a matching information-theoretic lower bound of $\Omega(\log(|S_{\text{off}}|))$ for the necessary number of samples.

Chat is not available.