Identifying Efficient Queries for Black-Box Model Classification
Abstract
We consider the problem of classifying a black-box generative model based on its responses to a collection of queries. While some query sets produce strong class separation, others do not. We formalize this distinction through the discriminative factorization, a decomposition of query-based model interaction into independent statistical “directions”. Under this framework, it is possible to separate informative and uninformative queries using parameters that can be estimated from the spectral structure of a query-model matrix. On a real model auditing task, we demonstrate that query sets selected using the estimated discriminative factorization reproduce oracle query selection and improve classification efficiency without task-specific knowledge.