Efficient Variance Reduction for Meta-learning

Hansi Yang · James Kwok

Room 318 - 320
Session: Transfer/Multitask/Meta Learning
Tue 19 Jul 8:45 a.m. — 8:50 a.m. PDT

Meta-learning aims to extract meta-knowledge from a large number of tasks. However, the stochastic meta-gradient can have large variance due to data sampling (within each task) and task sampling (from the whole task distribution), leading to slow convergence. In this paper, we propose a novel approach that integrates variance reduction with first-order meta-learning algorithms such as Reptile. It retains the bilevel formulation, which better captures the structure of meta-learning, but does not require storing the vast number of task-specific parameters required by general bilevel variance-reduction methods. Theoretical results show that it achieves a fast convergence rate due to variance reduction. Experiments on benchmark few-shot classification data sets demonstrate its effectiveness over state-of-the-art meta-learning algorithms with and without variance reduction.
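To make the idea concrete, here is a minimal sketch (not the paper's algorithm) of how an SVRG-style control variate can be applied to the Reptile meta-gradient on a toy 1-D regression task distribution. All names (`sample_task`, `reptile_svrg`, the snapshot batch size, learning rates) are hypothetical illustration choices: a fresh single-task meta-gradient is corrected by the difference from a snapshot point, reducing the variance contributed by task sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Toy task: 1-D linear regression with a task-specific slope
    (a hypothetical stand-in for a real task distribution)."""
    slope = rng.normal(loc=2.0, scale=0.5)
    def loss_grad(w, batch_size=8):
        x = rng.normal(size=batch_size)
        y = slope * x
        # Stochastic gradient of mean squared error 0.5*(w*x - y)^2 w.r.t. w
        return np.mean((w * x - y) * x)
    return loss_grad

def inner_adapt(w, loss_grad, inner_steps=5, inner_lr=0.1):
    """A few SGD steps on one task (the Reptile inner loop)."""
    w_task = w
    for _ in range(inner_steps):
        w_task -= inner_lr * loss_grad(w_task)
    return w_task

def reptile_meta_grad(w, loss_grad):
    """Reptile's meta-gradient estimate: initialization minus adapted weights."""
    return w - inner_adapt(w, loss_grad)

def reptile_svrg(w0, meta_steps=50, meta_lr=0.5, snapshot_tasks=64):
    """SVRG-style control variate on the Reptile meta-gradient.

    At a snapshot point w_snap, the meta-gradient is averaged over many
    tasks; each later step corrects a single-task estimate with the
    snapshot, reducing the variance from task sampling.
    """
    w = w0
    w_snap = w0
    # Large-batch meta-gradient at the snapshot point.
    snap_tasks = [sample_task() for _ in range(snapshot_tasks)]
    g_snap = np.mean([reptile_meta_grad(w_snap, t) for t in snap_tasks])
    for _ in range(meta_steps):
        t = sample_task()
        # Variance-reduced estimate: g_t(w) - g_t(w_snap) + g_snap
        g = reptile_meta_grad(w, t) - reptile_meta_grad(w_snap, t) + g_snap
        w -= meta_lr * g
    return w

w_final = reptile_svrg(w0=0.0)
```

In this toy setting the variance-reduced iterate should drift toward the mean task slope (2.0); the paper's actual method differs in retaining the bilevel formulation while avoiding per-task parameter storage.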
