Workshop: Subset Selection in Machine Learning: From Theory to Applications

On Coresets For Fair Regression

Rachit Chhaya · Anirban Dasgupta · Supratim Shit · Jayesh Choudhari

Abstract: In this work, we present an algorithm to construct coresets for fair regression on a dataset of $n$ points in $d$ dimensions with a discrete-valued protected attribute. For (fair) regression with statistical parity, we first define the notion of a coreset and then give a coreset construction that incurs only a small additive error in satisfying the constraints and a relative error in the objective function. The coreset size is linear in $\ell$, the number of unique values the protected attribute can take, and its dependence on the dimension is near-linear ($d \log d$). To the best of our knowledge, this is the first coreset for the fair regression problem with statistical parity. We also empirically demonstrate the performance of our coresets on real data sets.
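
To give a rough feel for why a coreset size that grows linearly in $\ell$ is natural, the sketch below samples a weighted subset separately within each protected group, using leverage scores of the augmented matrix $[X, y]$. This is a generic, textbook-style sampling scheme written for illustration only; it is not the construction from the paper, and it does not by itself handle the statistical parity constraints. The function name `group_leverage_coreset` and the parameter `m_per_group` are hypothetical.

```python
import numpy as np

def group_leverage_coreset(X, y, groups, m_per_group, seed=0):
    """Illustrative only: per-group leverage-score sampling for least squares.

    Returns indices into the original data and importance weights.
    The total sample size is m_per_group times the number of groups.
    """
    rng = np.random.default_rng(seed)
    idx_all, w_all = [], []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        # Augment features with the response so the scores reflect the regression cost.
        A = np.column_stack([X[idx], y[idx]])
        # Leverage scores are the squared row norms of an orthonormal basis for A's columns.
        Q, _ = np.linalg.qr(A)
        lev = np.sum(Q ** 2, axis=1)
        p = lev / lev.sum()
        # Sample with replacement; weight 1 / (m * p_i) keeps the cost estimate unbiased.
        chosen = rng.choice(len(idx), size=m_per_group, replace=True, p=p)
        idx_all.append(idx[chosen])
        w_all.append(1.0 / (m_per_group * p[chosen]))
    return np.concatenate(idx_all), np.concatenate(w_all)
```

A weighted least-squares fit on the sample can then be obtained by scaling each selected row of `X` and entry of `y` by the square root of its weight before calling an ordinary solver such as `np.linalg.lstsq`.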