Modeling Molecular Sequences with Learning-Order Autoregressive Models
Zhe Wang · Jiaxin Shi · Nicolas Heess · Michalis Titsias · Arthur Gretton · Yee-Whye Teh
Abstract
Text-based autoregressive models (ARMs) are popular for generating SMILES (Simplified Molecular Input Line Entry System) strings because of their simplicity and state-of-the-art performance, but they typically generate tokens in a fixed left-to-right order. Because the optimal ordering is less obvious for SMILES than for natural text, we developed LO-ARM (Learning-Order ARM), which learns a data-dependent generation order. Evaluated on ChEMBL, LO-ARM learns consistent and meaningful orderings that reveal molecular substructures, and it matches or surpasses state-of-the-art models, offering a well-balanced and competitive modeling option.
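To make "learning a data-dependent generation order" concrete, the toy sketch below shows one plausible shape of order-agnostic sampling. It is not the authors' implementation. Generation starts from an all-masked string. At each step an order policy picks which unfilled position to generate next, and a token model fills that position, with both choices conditioned on the partial string. `toy_order_policy` and `toy_token_model` are hypothetical stand-ins for the learned networks, and `VOCAB` is an illustrative character set. The sketch also assumes a fixed, known sequence length.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

MASK = None  # marks a position that has not been generated yet

# Hypothetical toy vocabulary of SMILES characters (illustrative only).
VOCAB = ["C", "O", "N", "(", ")", "=", "1"]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(probs: Sequence[float], rng: random.Random) -> int:
    """Draw an index from a normalized probability vector."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def toy_order_policy(partial: List[Optional[str]], free: List[int]) -> List[float]:
    """Hypothetical stand-in for a learned order policy: returns a distribution
    over the unfilled positions, here preferring neighbours of filled ones."""
    scores = []
    for pos in free:
        neighbours = [partial[j] for j in (pos - 1, pos + 1) if 0 <= j < len(partial)]
        scores.append(1.0 if any(n is not MASK for n in neighbours) else 0.0)
    return softmax(scores)


def toy_token_model(partial: List[Optional[str]], pos: int) -> List[float]:
    """Hypothetical stand-in for a learned token predictor: uniform over VOCAB."""
    return [1.0 / len(VOCAB)] * len(VOCAB)


def generate(length: int, order_policy, token_model,
             rng: random.Random) -> Tuple[List[str], List[int], float]:
    """Fill `length` masked positions one at a time in a sampled order.

    Returns the generated tokens, the order in which positions were filled,
    and the joint log-probability of (tokens, order) under the two models.
    """
    partial: List[Optional[str]] = [MASK] * length
    order: List[int] = []
    log_prob = 0.0
    for _ in range(length):
        free = [i for i, tok in enumerate(partial) if tok is MASK]
        pos_probs = order_policy(partial, free)   # where to generate next
        k = sample_categorical(pos_probs, rng)
        pos = free[k]
        tok_probs = token_model(partial, pos)     # what to put there
        v = sample_categorical(tok_probs, rng)
        partial[pos] = VOCAB[v]
        order.append(pos)
        log_prob += math.log(pos_probs[k]) + math.log(tok_probs[v])
    return [tok for tok in partial if tok is not MASK], order, log_prob


seq, order, lp = generate(8, toy_order_policy, toy_token_model, random.Random(0))
print("".join(seq), order, round(lp, 3))
```

The returned log-probability is for a sequence *together with* one particular order. The likelihood of the sequence alone would require summing over all `length!` orders, which is intractable. This is why learning-order models need some approximate training objective. How LO-ARM trains its order policy is not shown here.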