ProtDBench: A Unified Benchmark of Protein Binder Design and Evaluation
Cong Liu ⋅ Milong Ren ⋅ Jiaqi Guan ⋅ Chengyue Gong ⋅ Jinyuan Sun ⋅ Xinshi Chen ⋅ Wenzhi Xiao
Abstract
Recent advances in $\textit{de novo}$ protein binder design have enabled increasing experimental validation, yet reported $\textit{in silico}$ metrics remain difficult to interpret or compare across studies due to non-standardized evaluation protocols. We introduce $\textbf{ProtDBench}$, a standardized, throughput-aware evaluation framework for protein binder design. ProtDBench defines unified benchmark tasks, evaluation protocols, and success criteria, enabling systematic analysis of how evaluation design influences observed performance. Using a large wet-lab-annotated dataset, we analyze commonly used structure prediction models as evaluation verifiers, revealing substantial verifier-dependent bias and limited agreement under identical filtering protocols. We then benchmark representative open-source generative binder design methods across ten diverse protein targets under a fixed evaluation protocol. Beyond per-sequence success rates, ProtDBench incorporates throughput-aware metrics based on a fixed 24-hour compute budget, as well as cluster-level success criteria that account for structural diversity. Together, these results expose systematic differences induced by filtering rules and success definitions, and reveal trade-offs among computational efficiency, success rate, and structural diversity. Overall, ProtDBench provides a fair and reproducible evaluation pipeline that supports systematic, controlled comparison of protein binder design methods under realistic evaluation settings.
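The distinction between per-sequence, throughput-aware, and cluster-level success can be made concrete with a small sketch. The function names and formulas below are illustrative assumptions, not ProtDBench's actual implementation: throughput-aware success is modeled as the expected number of passing designs producible within a fixed wall-clock budget, and cluster-level success as the fraction of structural clusters containing at least one passing design.

```python
# Hypothetical sketch of throughput-aware and cluster-level success metrics.
# All names and formulas are assumptions for illustration, not the
# benchmark's actual code.

def per_sequence_success_rate(n_successes: int, n_designs: int) -> float:
    """Fraction of designed sequences passing all in silico filters."""
    return n_successes / n_designs

def throughput_aware_successes(success_rate: float,
                               designs_per_hour: float,
                               budget_hours: float = 24.0) -> float:
    """Expected number of successful designs within a fixed compute budget."""
    return success_rate * designs_per_hour * budget_hours

def cluster_level_success_rate(cluster_labels, is_success) -> float:
    """Fraction of structural clusters with at least one successful design."""
    clusters = set(cluster_labels)
    hit = {c for c, ok in zip(cluster_labels, is_success) if ok}
    return len(hit) / len(clusters)

# Under a fixed 24-hour budget, a fast low-success-rate method can beat a
# slow high-success-rate one on expected successes:
fast = throughput_aware_successes(0.02, designs_per_hour=500)  # 240.0
slow = throughput_aware_successes(0.20, designs_per_hour=20)   # 96.0
```

Such a budgeted view is why ranking methods by per-sequence success rate alone can invert the ordering one would see under realistic compute constraints.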