Personalized Policy Learning through Discrete Experimentation
Zhiqi Zhang ⋅ Zhiyu Zeng ⋅ Ruohan Zhan ⋅ Dennis Zhang
Abstract
While randomized controlled trials (RCTs), or A/B tests, are the gold standard for optimizing online-platform policies, they are limited to discrete testing levels. This is suboptimal for continuous variables (e.g., prices and incentives), since it neither extrapolates to untested values nor accounts for user heterogeneity. We address this by developing Deep Learning for Policy Targeting (\textsf{DLPT}), which learns personalized continuous policies from discrete RCTs using high-dimensional features. We prove that our estimators are asymptotically unbiased and consistent, achieving a $\sqrt{n}$-regret bound. In a collaboration with a leading social media platform to optimize creator incentives, we show that \textsf{DLPT} substantially outperforms existing benchmarks.
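The core idea in the abstract, fitting an outcome model on data from discretely randomized arms and then optimizing that model over a continuous action range for each user, can be illustrated with a minimal sketch. This is not the paper's \textsf{DLPT} estimator: it is a hypothetical toy with synthetic data, a single user feature, and a hand-rolled least-squares outcome model standing in for a deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic RCT: users are randomized to one of a few discrete incentive levels.
levels = np.array([0.0, 1.0, 2.0, 3.0])      # tested arms
n = 2000
x = rng.normal(size=n)                       # one user feature (toy stand-in
                                             # for high-dimensional features)
a = rng.choice(levels, size=n)               # random arm assignment
# True outcome is concave in the incentive, with a feature-dependent peak.
peak = 1.5 + 0.5 * x
y = -(a - peak) ** 2 + rng.normal(scale=0.1, size=n)

# Fit a simple outcome model y ~ g(x, a): quadratic in the incentive with a
# feature-by-incentive interaction, estimated by least squares.
design = np.column_stack([np.ones(n), a, a ** 2, x, x * a])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def best_incentive(xi, grid=np.linspace(0.0, 3.0, 301)):
    """Optimize the fitted model over a continuous incentive range for one user."""
    preds = (coef[0] + coef[1] * grid + coef[2] * grid ** 2
             + coef[3] * xi + coef[4] * xi * grid)
    return grid[np.argmax(preds)]

# Personalized continuous recommendations can fall between the tested arms,
# e.g. best_incentive(0.5) is near the true peak 1.75 despite no 1.75 arm.
print(best_incentive(-1.0), best_incentive(0.5), best_incentive(1.0))
```

The point of the sketch is the decoupling: randomization happens only at the discrete levels, while the fitted model is queried at any point of the continuous range, personalized through the feature interaction.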