Self-similar Epochs: Value in arrangement

Eliav Buchnik · Edith Cohen · Avinatan Hasidim · Yossi Matias

Pacific Ballroom #60

Keywords: [ Optimization ] [ Matrix Factorization ]


Optimization of machine learning models is commonly performed through stochastic gradient updates on randomly ordered training examples. This practice means that each fraction of an epoch comprises an independent random sample of the training data that may not preserve informative structure present in the full data. We hypothesize that the training can be more effective with self-similar arrangements that potentially allow each epoch to provide benefits of multiple ones. We study this for "matrix factorization": the common task of learning metric embeddings of entities such as queries, videos, or words from example pairwise associations. We construct arrangements that preserve the weighted Jaccard similarities of rows and columns and experimentally observe training acceleration of 3%-37% on synthetic and recommendation datasets. Principled arrangements of training examples emerge as a novel and potentially powerful enhancement to SGD that merits further exploration.
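For readers unfamiliar with the similarity measure named in the abstract, the following is a minimal sketch of the standard weighted Jaccard similarity between two nonnegative weight vectors, such as two rows of an association matrix. It illustrates the definition only and is not the authors' arrangement construction. The function name, the example rows, and the convention of returning 0.0 for two all-zero vectors are assumptions made for this illustration.

```python
def weighted_jaccard(u, v):
    """Weighted Jaccard similarity: sum of element-wise minima
    divided by sum of element-wise maxima (nonnegative weights)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    # Assumption: two all-zero vectors are given similarity 0.0.
    return num / den if den > 0 else 0.0


# Hypothetical association weights of two rows over four columns.
row_a = [3.0, 0.0, 1.0, 2.0]
row_b = [1.0, 2.0, 1.0, 0.0]
print(weighted_jaccard(row_a, row_b))
```

In this example the element-wise minima sum to 2 and the maxima sum to 8, so the similarity is 0.25. Identical rows have similarity 1, and rows with disjoint support have similarity 0.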
