

Poster

Data-Efficient Molecular Generation with Hierarchical Textual Inversion

Seojin Kim · Jaehyun Nam · Sihyun Yu · Younghoon Shin · Jinwoo Shin


Abstract: Developing an effective molecular generation framework even with a limited number of molecules is often important for practical deployment, e.g., in drug discovery, since acquiring task-related molecular data requires expensive and time-consuming experiments. To tackle this issue, we introduce Hierarchical Textual Inversion for Molecular Generation (HI-Mol), a novel data-efficient molecular generation method. HI-Mol is inspired by the recent textual inversion technique in the visual domain, which achieves data-efficient generation by learning the common concept of a set of images via a single new text token of a pre-trained text-to-image model. However, we find that its naive adoption fails for molecules due to their complex structure, i.e., molecules sharing the same concept, such as drug-likeness, often exhibit entirely different structures. Therefore, in addition to the globally shared token, we introduce low-level tokens to incorporate cluster- or molecule-specific features of molecules. We then generate molecules using a pre-trained text-to-molecule model by interpolating the low-level tokens. Extensive experiments demonstrate the superiority of HI-Mol with notable data-efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50$\times$ less training data. We also show the effectiveness of molecules generated by HI-Mol in low-shot molecular property prediction.
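
The generation step described in the abstract (a globally shared token combined with an interpolation of low-level tokens) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the linear interpolation rule, the uniform sampling of the mixing weight, and the toy embedding dimension are all assumptions made for illustration, and the pre-trained text-to-molecule model that would consume the resulting embeddings is omitted.

```python
import random


def interpolate_tokens(token_a, token_b, lam):
    """Linearly mix two embedding vectors: lam * a + (1 - lam) * b (assumed rule)."""
    if len(token_a) != len(token_b):
        raise ValueError("embeddings must have the same dimension")
    return [lam * a + (1.0 - lam) * b for a, b in zip(token_a, token_b)]


def sample_prompt_embeddings(shared_token, low_level_tokens, rng):
    """Build one hypothetical prompt: the shared token plus a mixed low-level token.

    Two distinct low-level tokens are drawn at random and interpolated with a
    weight sampled uniformly from [0, 1). Both choices are assumptions.
    """
    i, j = rng.sample(range(len(low_level_tokens)), 2)
    lam = rng.random()
    mixed = interpolate_tokens(low_level_tokens[i], low_level_tokens[j], lam)
    return [shared_token, mixed]


# Toy stand-ins for learned embeddings (random values, not trained tokens).
rng = random.Random(0)
dim = 4
shared = [rng.gauss(0.0, 1.0) for _ in range(dim)]
low_level = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]

prompt = sample_prompt_embeddings(shared, low_level, rng)
```

In a real pipeline, `prompt` would stand in for the token embeddings fed to the text-to-molecule model; here it only shows how a new low-level token can be formed from existing ones while the shared token stays fixed.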
