Workshop: Subset Selection in Machine Learning: From Theory to Applications

On Coresets For Fair Regression

Supratim Shit · Anirban Dasgupta · Rachit Chhaya · Jayesh Choudhari

Sat 24 Jul 3:25 p.m. PDT — 3:30 p.m. PDT

Abstract: In this work we present an algorithm to construct coresets for fair regression on a dataset of $n$ points in $d$ dimensions, where each point has a discrete-valued protected attribute. For (fair) regression with statistical parity, we first define the notion of a coreset and then give a coreset that incurs only a small additive error in satisfying the constraints and a relative error in the objective function. The coreset size is linear in $\ell$, the number of distinct values the protected attribute can take, and near linear in the dimension ($d \log d$). To the best of our knowledge, this is the first coreset for the fair regression problem with statistical parity. We also empirically demonstrate the performance of our coresets on real datasets.
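The abstract does not spell out the construction, so the following is only an illustrative sketch and not the authors' algorithm. It assumes a common recipe: importance-sample rows separately within each protected group, using leverage scores of $[X \mid y]$ mixed 50/50 with uniform probabilities, and reweight each sampled row by $1/(m p_i)$ so weighted sums are unbiased. Drawing one weighted sample per group makes the size linear in $\ell$, and standard leverage-score sampling needs on the order of $d \log d / \varepsilon^2$ rows for a $(1 \pm \varepsilon)$ approximation of least-squares cost. All function names, the mixing ratio, and the parity measure (spread of group-wise mean predictions) are hypothetical choices made for illustration.

```python
import numpy as np


def leverage_scores(A):
    """Row leverage scores: squared row norms of an orthonormal basis for col(A)."""
    Q, _ = np.linalg.qr(A)  # reduced QR; Q has orthonormal columns
    return np.sum(Q * Q, axis=1)


def groupwise_coreset(X, y, groups, m_per_group, seed=0):
    """Hypothetical sketch: draw m_per_group weighted rows from every protected group.

    Probabilities mix within-group leverage scores of [X | y] with the uniform
    distribution (an assumed choice); weights 1 / (m * p) make weighted sums unbiased.
    """
    rng = np.random.default_rng(seed)
    idx_out, w_out = [], []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        A = np.column_stack([X[idx], y[idx]])
        lev = leverage_scores(A)
        p = 0.5 * lev / lev.sum() + 0.5 / len(idx)  # sums to 1 by construction
        picks = rng.choice(len(idx), size=m_per_group, replace=True, p=p)
        idx_out.append(idx[picks])
        w_out.append(1.0 / (m_per_group * p[picks]))
    return np.concatenate(idx_out), np.concatenate(w_out)


def weighted_cost(X, y, w, beta):
    """Weighted least-squares objective sum_i w_i (x_i^T beta - y_i)^2."""
    r = X @ beta - y
    return float(np.sum(w * r * r))


def parity_gap(X, w, groups, beta):
    """Assumed parity measure: max minus min of weighted group-wise mean predictions."""
    means = [
        np.sum(w[groups == g] * (X[groups == g] @ beta)) / np.sum(w[groups == g])
        for g in np.unique(groups)
    ]
    return max(means) - min(means)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d, ell = 3000, 5, 3
    X = rng.normal(size=(n, d))
    groups = rng.integers(0, ell, size=n)
    y = X @ rng.normal(size=d) + 0.5 * groups + rng.normal(scale=0.1, size=n)
    idx, w = groupwise_coreset(X, y, groups, m_per_group=400)
    beta = rng.normal(size=d)
    print("coreset size:", len(idx))
    print("cost full vs coreset:",
          weighted_cost(X, y, np.ones(n), beta), weighted_cost(X[idx], y[idx], w, beta))
    print("parity gap full vs coreset:",
          parity_gap(X, np.ones(n), groups, beta), parity_gap(X[idx], w, groups[idx], beta))
```

Sampling per group, rather than over the whole dataset, guarantees that every protected value is represented, so group-wise quantities such as the parity gap remain estimable from the coreset even for small groups.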