

Poster in Workshop: Geometry-grounded Representation Learning and Generative Modeling

On the Matter of Embeddings Dispersion on Hyperspheres

Evgeniia Tokarchuk · Hua Chang Bakker · Vlad Niculae

Keywords: [ Representation Learning ] [ maximum separation ] [ embeddings ] [ dispersion ] [ learning embeddings on hypersphere ]


Abstract: Dispersion of embeddings on the $d$-dimensional hypersphere is the process of finding a configuration that preserves semantic information while pushing unrelated vectors away from each other, without the need for negative examples. This formulation is connected to finding a configuration of points such that the minimum distance between any two distinct points is maximized, a well-known open mathematical problem called the Tammes problem. When dealing with high-dimensional spaces and extremely large numbers of points, as in text-embedding learning, no optimal solution is typically known, in contrast to the Tammes problem, for which optimal solutions exist for particular values of $N$ and $d$. Moreover, embedding learning is mostly carried out in Euclidean space, which is at odds with the goal of directional dispersion. In this work, we revisit existing algorithms and propose new ones to find sub-optimal solutions for embedding dispersion by defining a Riemannian optimization problem on the hypersphere.
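
The abstract does not specify the authors' algorithms, but the general recipe it describes can be sketched as follows: replace the non-smooth min-distance (Tammes) objective with a smooth repulsion surrogate and run Riemannian gradient descent on the unit sphere (Euclidean gradient, tangent-space projection, retraction by normalization). The code below is a minimal illustrative sketch, not the paper's method; the function name `disperse_on_sphere`, the exponential surrogate, and all hyperparameters (`beta`, `lr`, `steps`) are assumptions chosen for the example.

```python
import numpy as np

def disperse_on_sphere(n, d, steps=1000, lr=0.05, beta=5.0, seed=0):
    """Spread n points on the unit (d-1)-sphere by minimizing a smooth
    repulsion energy sum_{i != j} exp(beta * <x_i, x_j>), a soft
    surrogate for maximizing the minimum pairwise distance."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)      # start on the sphere
    for _ in range(steps):
        W = np.exp(beta * (X @ X.T))                   # pairwise similarities
        np.fill_diagonal(W, 0.0)                       # ignore self-pairs
        egrad = beta * (W @ X)                         # Euclidean gradient (up to a constant)
        # Riemannian gradient: project onto the tangent space at each x_i.
        rgrad = egrad - np.sum(egrad * X, axis=1, keepdims=True) * X
        X = X - lr * rgrad                             # gradient descent step
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # retraction to the sphere
    return X

# Example: 12 points in R^3; report the minimum pairwise angle in degrees,
# which should grow as the points disperse.
points = disperse_on_sphere(n=12, d=3)
cos = np.clip(points @ points.T, -1.0, 1.0)
np.fill_diagonal(cos, -1.0)
print(np.degrees(np.arccos(cos.max())))
```

Note that normalizing after each step is the simplest retraction onto the sphere; other choices (e.g., the exponential map) and other surrogates for the min-distance objective fit the same template.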
