Poster in Workshop: Geometry-grounded Representation Learning and Generative Modeling
Bias-inducing geometries: exactly solvable data model with fairness implications
Stefano Mannelli · Federica Gerace · Negar Rostamzadeh · Luca Saglietti
Keywords: [ Statistical Physics ] [ Fairness ] [ Data structure ]
Machine learning (ML) may be oblivious to human bias, but it is not immune to perpetuating it. Marginalisation and inequitable group representation are often traceable in the very data used for training, and may be reflected or even amplified by the learning models. In this abstract, we aim to clarify the role played by data geometry in the emergence of ML bias. We introduce an exactly solvable high-dimensional model of data imbalance, in which parametric control over the many bias-inducing factors allows an extensive exploration of the bias inheritance mechanism. Using the tools of statistical physics, we analytically characterise the typical properties of learning models trained in this synthetic framework and obtain exact predictions for the observables commonly employed in fairness assessment. By reducing the problem to its minimal components, we can retrace and unpack typical unfairness behaviour observed on real-world datasets.
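To make the setting more concrete, here is a minimal Monte Carlo sketch of a two-group data-imbalance model of the kind described above. It is not the authors' model, nor their analytical statistical-physics solution. The Gaussian clusters, the per-group linear labelling rules ("teachers"), the ridge-regression learner and all names and parameters (`make_groups`, `run`, `rho`, `m`, `sigma_a`, `sigma_b`, `q`) are hypothetical choices made for illustration. The sketch simulates a classifier trained on an imbalanced mixture and reports two commonly used fairness observables: the per-group accuracy gap and the demographic-parity (positive-rate) gap.

```python
import numpy as np


def make_groups(rng, d, m, sigma_a, sigma_b, q):
    """Hypothetical two-group geometry: (centroid, noise level, teacher) per group.

    Centroids sit at +/- m along a shared random unit vector; q sets the
    overlap between the two groups' labelling rules (q = 1: identical rules).
    """
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    w_a = rng.standard_normal(d)
    w_a /= np.linalg.norm(w_a)
    w_perp = rng.standard_normal(d)
    w_perp -= (w_perp @ w_a) * w_a  # Gram-Schmidt: remove the w_a component
    w_perp /= np.linalg.norm(w_perp)
    w_b = q * w_a + np.sqrt(1.0 - q**2) * w_perp  # unit vector with w_a . w_b = q
    return {"A": (m * v, sigma_a, w_a), "B": (-m * v, sigma_b, w_b)}


def sample_group(rng, n, centroid, sigma, teacher):
    """Draw x = centroid + sigma * z with z ~ N(0, I), labelled y = sign(teacher . x)."""
    x = centroid + sigma * rng.standard_normal((n, centroid.shape[0]))
    y = np.where(x @ teacher >= 0, 1.0, -1.0)
    return x, y


def fit_ridge(x, y, lam):
    """Ridge regression on +/-1 labels with an unpenalised bias term."""
    n, d = x.shape
    X = np.hstack([x, np.ones((n, 1))])
    reg = lam * np.eye(d + 1)
    reg[-1, -1] = 0.0  # do not penalise the bias
    return np.linalg.solve(X.T @ X + reg, X.T @ y)


def predict(theta, x):
    """Sign of the learned linear score (weights theta[:-1], bias theta[-1])."""
    return np.where(x @ theta[:-1] + theta[-1] >= 0, 1.0, -1.0)


def run(n=2000, d=50, rho=0.9, m=1.0, sigma_a=1.0, sigma_b=1.0, q=0.0,
        lam=1e-2, n_test=5000, seed=0):
    """Train on an imbalanced mixture (fraction rho from group A) and report
    per-group accuracy plus two fairness gaps, estimated on fresh samples."""
    rng = np.random.default_rng(seed)
    groups = make_groups(rng, d, m, sigma_a, sigma_b, q)
    n_a = int(round(rho * n))
    xa, ya = sample_group(rng, n_a, *groups["A"])
    xb, yb = sample_group(rng, n - n_a, *groups["B"])
    theta = fit_ridge(np.vstack([xa, xb]), np.concatenate([ya, yb]), lam)

    out = {}
    for g in ("A", "B"):
        xt, yt = sample_group(rng, n_test, *groups[g])
        pred = predict(theta, xt)
        out[f"acc_{g}"] = float(np.mean(pred == yt))
        out[f"pos_rate_{g}"] = float(np.mean(pred == 1.0))
    out["acc_gap"] = out["acc_A"] - out["acc_B"]           # accuracy disparity
    out["dp_gap"] = out["pos_rate_A"] - out["pos_rate_B"]  # demographic-parity difference
    return out


if __name__ == "__main__":
    for rho, q in [(0.5, 1.0), (0.95, 0.0)]:
        print(f"rho={rho}, q={q}:", run(rho=rho, q=q))
```

Under these assumptions, one would expect `run(rho=0.95, q=0.0)` to show a large accuracy gap in favour of the majority group A, since a single linear model is fitted mostly to A's labelling rule, while `run(rho=0.5, q=1.0)` should give a small gap. Sweeping one parameter at a time (`rho`, `m`, `sigma_b`, `q`) gives a crude numerical analogue of the controlled exploration of bias-inducing factors; the abstract instead describes exact analytical predictions for such observables.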