Towards Fair Sequential Decision-Making: A Causal Decomposition Approach
Abstract
Counterfactual reasoning is one of the fundamental facets of human cognition, involved in tasks such as explanation, credit assignment, blame, and responsibility. It addresses queries of the form "what would have happened had some intervention been performed, given that something else was in fact observed," which correspond to Layer 3 of the Pearl Causal Hierarchy. In this work, we examine specific counterfactual quantities, namely the counterfactual direct (Str-DE), indirect (Str-IE), and spurious (Str-SE) effects, and use them to quantify fairness in a sequential decision-making framework. Building on these measures, we formulate an online causally fair learning problem with multiple long-term constraints and study it in two settings: non-parametric contextual bandits and parametric logistic bandits. In both settings, we achieve sublinear regret and sublinear constraint-violation bounds under round-wise counterfactual fairness constraints that are unknown a priori, without assuming Slater's condition. In particular, for logistic bandits we obtain a nearly optimal regret bound whose leading term is similar to that of the unconstrained case (Zhang et al., 2025).
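To make "regret" and "constraint violation" concrete, the following is a minimal sketch under assumed definitions; the abstract does not state the paper's exact formulas. It assumes regret is the summed per-round gap between a benchmark (best fair policy) reward and the learner's reward, and that each fairness constraint i at round t is encoded as a hypothetical value g[t][i], satisfied when g[t][i] <= 0. Because the constraints are round-wise, violations are accumulated as positive parts, so over-satisfying a constraint in one round cannot cancel a violation in another. The function names are illustrative only. A bound is sublinear when these totals grow as o(T) in the horizon T, so their per-round averages vanish.

```python
from typing import List, Sequence


def cumulative_regret(opt_rewards: Sequence[float],
                      chosen_rewards: Sequence[float]) -> float:
    """Assumed definition: sum over rounds of (benchmark reward - learner reward)."""
    if len(opt_rewards) != len(chosen_rewards):
        raise ValueError("reward sequences must have equal length")
    return sum(o - c for o, c in zip(opt_rewards, chosen_rewards))


def cumulative_violations(constraint_values: Sequence[Sequence[float]]) -> List[float]:
    """Assumed round-wise definition: for each constraint i, sum_t max(g[t][i], 0).

    g[t][i] <= 0 means constraint i holds at round t (hypothetical encoding).
    Taking positive parts per round prevents slack in one round from
    offsetting a violation in another.
    """
    if not constraint_values:
        return []
    num_constraints = len(constraint_values[0])
    totals = [0.0] * num_constraints
    for g_t in constraint_values:
        if len(g_t) != num_constraints:
            raise ValueError("every round must report the same number of constraints")
        for i, g in enumerate(g_t):
            totals[i] += max(g, 0.0)
    return totals


if __name__ == "__main__":
    # Three rounds, two constraints (toy numbers, not results from the paper).
    print(cumulative_regret([1.0, 0.9, 0.8], [0.7, 0.9, 0.5]))
    print(cumulative_violations([[0.1, -0.2], [-0.05, 0.3], [0.0, 0.1]]))
```

Summing positive parts is the stricter of the two common choices; a long-term (cumulative) variant would instead take the positive part of the summed constraint values, which does allow cancellation across rounds.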