

Poster

Kernel-Based Evaluation of Conditional Biological Sequence Models

Pierre Glaser · Steffan Paul · Alissa M. Hummer · Charlotte Deane · Debora Marks · Alan Amin

Hall C 4-9 #1800
Thu 25 Jul 2:30 a.m. PDT — 4 a.m. PDT

Abstract:

We propose a set of kernel-based tools to evaluate the designs and tune the hyperparameters of conditional sequence models, with a focus on problems in computational biology. The backbone of our tools is a new measure of discrepancy between the true conditional distribution and the model's estimate, called the Augmented Conditional Maximum Mean Discrepancy (ACMMD). Provided that the model can be sampled from, the ACMMD can be estimated unbiasedly from data to quantify absolute model fit, integrated within hypothesis tests, and used to evaluate model reliability. We demonstrate the utility of our approach by analyzing a popular protein design model, ProteinMPNN. We are able to reject the hypothesis that ProteinMPNN fits its data for various protein families, and tune the model's temperature hyperparameter to achieve a better fit.
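
As a rough illustration of what unbiased estimation from samples can look like, the sketch below computes the standard unbiased estimator of the squared (unconditional) Maximum Mean Discrepancy between two sets of sequences. It is not the ACMMD described in the abstract, which additionally handles conditioning. The function names, the exponentiated Hamming kernel, and the restriction to equal-length sequences are all assumptions made for this example.

```python
import math


def hamming_kernel(a, b, lam=1.0):
    """Exponentiated Hamming kernel exp(-lam * #mismatches).

    Hypothetical kernel chosen for illustration; assumes equal-length sequences.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    mismatches = sum(ca != cb for ca, cb in zip(a, b))
    return math.exp(-lam * mismatches)


def mmd2_unbiased(xs, ys, kernel=hamming_kernel):
    """Unbiased U-statistic estimate of the squared MMD between samples xs and ys.

    Diagonal terms k(x_i, x_i) and k(y_j, y_j) are excluded from the
    within-sample averages, which is what makes the estimate unbiased.
    The result can be slightly negative when the two samples are similar.
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two sequences")
    k_xx = sum(
        kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    k_yy = sum(
        kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    k_xy = sum(kernel(x, y) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    data_samples = ["AAAA", "AAAC", "AACA"]    # stand-in for observed sequences
    model_samples = ["CCCC", "CCCA", "CCAC"]   # stand-in for sequences sampled from a model
    print(mmd2_unbiased(data_samples, model_samples))
```

In this toy setting, a value near zero (possibly negative) suggests the two samples are hard to distinguish under the chosen kernel, while a clearly positive value suggests a mismatch between data and model samples.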
