Efficient molecular design is one of the fundamental goals of computer-aided drug and materials design. In recent years, significant progress has been made in solving challenging problems across various aspects of computational molecular optimization, with an emphasis on achieving high validity, diversity, novelty, and most recently synthesizability. However, a crucial aspect that is rarely discussed is the budget spent on the optimization. If candidates are evaluated by experiment or high-fidelity simulation, as they are in realistic discovery settings, sample efficiency is paramount. In this paper, we thoroughly investigate 13 molecular design algorithms across 21 tasks within a limited oracle setting, allowing at most 10000 queries. We illustrate that most "state-of-the-art" methods fail to outperform some classic algorithms. Our results also highlight the influence of the generative action space (e.g., token-by-token, atom-by-atom, fragment-by-fragment) on performance and the necessity of multiple independent runs and hyperparameter tuning. We suggest a standard experimental benchmark to minimize the wasted effort caused by non-reproducibility, artificially poor baselines, and easily misinterpreted results.
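The limited oracle setting described in the abstract can be illustrated with a minimal sketch. Everything named here (`BudgetedOracle`, `toy_score`, `random_search`) is hypothetical and not taken from the paper. The choice not to charge repeated queries of the same candidate is also an assumption, and the string-based objective is only a stand-in for a real molecular property oracle.

```python
import random


class BudgetedOracle:
    """Hypothetical wrapper that caps the number of distinct oracle queries."""

    def __init__(self, score_fn, max_queries=10_000):
        self.score_fn = score_fn
        self.max_queries = max_queries
        # Assumption: a repeated candidate is served from the cache for free.
        self.cache = {}

    @property
    def exhausted(self):
        return len(self.cache) >= self.max_queries

    def __call__(self, candidate):
        if candidate in self.cache:
            return self.cache[candidate]
        if self.exhausted:
            raise RuntimeError("oracle budget exhausted")
        score = self.score_fn(candidate)
        self.cache[candidate] = score
        return score


def toy_score(candidate):
    # Placeholder objective: fraction of characters equal to 'C'.
    return candidate.count("C") / len(candidate)


def random_search(oracle, alphabet="CNO", length=12, seed=0):
    """Baseline optimizer: sample random strings until the budget is spent."""
    rng = random.Random(seed)
    best_candidate, best_score = None, float("-inf")
    while not oracle.exhausted:
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        score = oracle(candidate)
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate, best_score


if __name__ == "__main__":
    oracle = BudgetedOracle(toy_score, max_queries=10_000)
    candidate, score = random_search(oracle)
    print(candidate, score, len(oracle.cache))
```

Because every optimizer sees the same wrapper, methods can be compared on what they find within an identical query budget rather than on how many evaluations they could afford.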
Author Information
Wenhao Gao (Massachusetts Institute of Technology)
Tianfan Fu (Georgia Institute of Technology)
Jimeng Sun (Georgia Institute of Technology)
Connor Coley (MIT)
More from the Same Authors
-
2022 : Reinforced Genetic Algorithm for Structure-based Drug Design »
Tianfan Fu · Wenhao Gao · Connor Coley · Jimeng Sun -
2023 : A Survey on Knowledge Graphs for Healthcare: Resources, Application Progress, and Promise »
Hejie Cui · Jiaying Lu · Shiyu Wang · Ran Xu · Wenjing Ma · Shaojun Yu · Yue Yu · Xuan Kan · Tianfan Fu · Chen Ling · Joyce Ho · Fei Wang · Carl Yang -
2023 : An interpretable data augmentation framework for improving generative modeling of synthetic clinical trial data »
Afrah Shafquat · Mandis Beigi · Chufan Gao · Jason Mezey · Jimeng Sun · Jacob Aptekar -
2023 Poster: Fast Online Value-Maximizing Prediction Sets with Conformal Cost Control »
Zhen Lin · Shubhendu Trivedi · Cao Xiao · Jimeng Sun -
2022 Workshop: AI for Science »
Yuanqi Du · Tianfan Fu · Wenhao Gao · Kexin Huang · Shengchao Liu · Ziming Liu · Hanchen Wang · Connor Coley · Le Song · Linfeng Zhang · Marinka Zitnik -
2022 : Neural Scaling of Deep Chemical Models »
Connor Coley · Nathan C. Frey -
2021 Poster: Non-Autoregressive Electron Redistribution Modeling for Reaction Prediction »
Hangrui Bi · Hengyi Wang · Chence Shi · Connor Coley · Jian Tang · Hongyu Guo -
2021 Spotlight: Non-Autoregressive Electron Redistribution Modeling for Reaction Prediction »
Hangrui Bi · Hengyi Wang · Chence Shi · Connor Coley · Jian Tang · Hongyu Guo -
2020 Poster: Learning to Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning »
Sai Krishna Gottipati · Boris Sattarov · Sufeng Niu · Yashaswi Pathak · Haoran Wei · Shengchao Liu · Simon Blackburn · Karam Thomas · Connor Coley · Jian Tang · Sarath Chandar · Yoshua Bengio -
2017 Tutorial: Deep Learning for Health Care Applications: Challenges and Solutions »
Yan Liu · Jimeng Sun