A Minimax Approach for Optimal Intervention Policy Learning with Two-Stage Outcomes
Abstract
When designing interventions to promote desired actions, two-stage agent heterogeneity -- spanning both engagement with the intervention and completion of the desired action -- poses significant challenges for identifying optimal intervention policies. Although this two-dimensional heterogeneity induces distinct agent response types with varying marginal policy returns, the existing literature typically fails to fully identify all agent types, leading to inefficient intervention allocations. To learn optimal policies that account for two-stage outcomes, we propose a minimax approach within a counterfactual principal strata framework. We design a value function that accommodates varying policy returns across six potentially non-identifiable principal strata, establish its partial identification, and minimize the worst-case value loss relative to three benchmark policies: never-treat, always-treat, and oracle. We introduce three estimators for optimal policy learning, namely Principal Outcome Regression (P-OR), Principal Inverse Propensity Scoring (P-IPS), and Principal Doubly Robust (P-DR), and provide theoretical guarantees on their unbiasedness, robustness, and regret upper bounds. Extensive numerical experiments demonstrate the effectiveness and superiority of the proposed approach.