$\texttt{IDEAS}$: Interpretability Driven Evolutionary Approach for the Design of Biological Sequences
Akash Pandey ⋅ Wei Chen ⋅ Sinan Keten
Abstract
Designing biological sequences such as proteins and DNA for desired properties is challenging because of vast search spaces and limited wet-lab evaluation budgets. Existing evolutionary approaches ignore sequential dependencies and rely on random mutations, which scale poorly to long sequences. In contrast, reinforcement learning (RL) and generative models explicitly model sequence structure but require large datasets to steer generation toward the target properties. These limitations call for a method that combines the sample efficiency of evolutionary approaches with the ability to exploit sequential structure. In this work, we propose a novel evolutionary approach, $\texttt{IDEAS}$, in which mutations are guided by an explainable model. The model identifies critical motifs in high-fitness sequences and uses them to mutate non-critical positions. Across six continuous-property datasets, seven baselines, and three evaluation budgets, $\texttt{IDEAS}$ achieves a 19% acceleration in design while maintaining a favorable position on the Pareto curve balancing acceleration, diversity, and novelty.
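The core idea of protecting critical positions in high-fitness sequences while mutating the rest can be illustrated with a small evolutionary loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the explainable model is replaced by a simple stand-in (`critical_positions`, hypothetical) that marks a position as critical when its residue is conserved across the elite sequences, and `toy_fitness` is a hypothetical objective over a DNA alphabet. Only non-critical positions are mutated, and elites are carried over unchanged.

```python
import random
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA alphabet for the toy example


def toy_fitness(seq: str) -> float:
    # Hypothetical objective: number of "GC" dinucleotides in the sequence.
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "GC")


def critical_positions(elites, threshold=0.8):
    """Stand-in for an explainable model (hypothetical): a position is
    'critical' if its most common residue appears in at least `threshold`
    of the elite sequences."""
    critical = set()
    for i in range(len(elites[0])):
        counts = Counter(s[i] for s in elites)
        _, top = counts.most_common(1)[0]
        if top / len(elites) >= threshold:
            critical.add(i)
    return critical


def mutate(seq, critical, rng, n_mut=1):
    # Restrict mutations to non-critical positions; fall back to all
    # positions if every position is marked critical.
    free = [i for i in range(len(seq)) if i not in critical] or list(range(len(seq)))
    chars = list(seq)
    for i in rng.sample(free, min(n_mut, len(free))):
        # Always substitute a different residue at the chosen position.
        chars[i] = rng.choice([c for c in ALPHABET if c != chars[i]])
    return "".join(chars)


def evolve(pop, fitness, generations=20, n_elite=5, seed=0):
    rng = random.Random(seed)
    for _ in range(generations):
        elites = sorted(pop, key=fitness, reverse=True)[:n_elite]
        critical = critical_positions(elites)
        children = [mutate(rng.choice(elites), critical, rng)
                    for _ in range(len(pop) - n_elite)]
        pop = elites + children  # elitism: the best sequence is never lost
    return max(pop, key=fitness)
```

Because the elites survive each generation, the best fitness in the population can never decrease; the design choice being illustrated is that the mutation budget is spent only where the (stand-in) model considers the sequence unconstrained.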