Poster in Workshop: AI for Science: Scaling in AI for Scientific Discovery
Tail Extrapolation in Target-Aware Conditional Molecule Generation
Weichi Yao · Cameron Gruich · Bryan Goldsmith · Yixin Wang
Keywords: [ diffusion-based models ] [ large language models ] [ molecule generation ] [ desired property ] [ extrapolation ]
Generative models, such as diffusion-based models and large language models, have become increasingly popular in cheminformatics research and have shown promise in accelerating the discovery of molecules. However, they are hindered by data scarcity and struggle to accurately generate molecules whose desired properties lie outside the range of the training data, a task known as tail extrapolation in statistics. To this end, we propose tail-extrapolative generative models in this work. The key idea is to adapt pre-additive noise models, which provably perform tail extrapolation in classical regression tasks, to a variety of conditional generative models. Across empirical studies, we find that tail-extrapolative generative models exhibit improved extrapolation capabilities: they generate molecules whose properties align more closely with the desired targets. Furthermore, these models enhance the diversity of the generated molecules compared to existing approaches, representing an advancement in molecular design.
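The abstract does not spell out the construction, but the pre-additive noise idea it references from classical regression places the noise before the nonlinearity (y = g(x + ε)) rather than after it (y = g(x) + ε). A minimal sketch of how this might transfer to conditional generation, under the assumption that the noise is injected into the conditioning property before it enters the generator: the model then receives condition values beyond the observed training range, which is what gives it signal in the tails. All variable names below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: target property values are only observed on a bounded range,
# mimicking the data-scarcity regime described in the abstract.
c_train = rng.uniform(-1.0, 1.0, size=2000)

# Post-additive noise (classical): noise is added to the model *output*,
# so the generator is never exercised at conditions outside [-1, 1].

# Pre-additive noise (sketch of the adapted idea): perturb the
# conditioning property *before* it is fed to the generator,
# i.e. train on p(molecule | c + eps) instead of p(molecule | c).
sigma = 0.3
eps = rng.normal(0.0, sigma, size=c_train.shape)
c_pre = c_train + eps  # conditions the generator actually sees

# The pre-noised conditions cover a strictly wider range than the raw
# training targets, so a query slightly beyond the observed range still
# falls inside the support the generator was trained on.
print("raw condition range:", c_train.min(), c_train.max())
print("pre-noised range:   ", c_pre.min(), c_pre.max())
```

The design point is that the same noise scale `sigma` that smooths the conditional distribution also controls how far beyond the training range the model receives supervision, trading extrapolation reach against fidelity near the observed targets.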