Event-Driven Reinforcement Learning for Pluralistic Alignment
Soyoung Yun ⋅ HAYOUNG OH
Abstract
Recent AI alignment research increasingly emphasizes accounting for diverse and conflicting human values rather than collapsing them into averaged preferences. However, existing reinforcement learning-based alignment approaches typically optimize a single scalar reward, which makes it difficult to address the learning instability that arises from heterogeneous preferences. This work proposes an event-driven reinforcement learning framework for pluralistic alignment that exploits discrepancy-induced dynamics rather than suppressing preference inconsistencies. Under a multi-reward set $\mathbf{R}=\{R_1,\ldots,R_k\}$, spikes and collapses in episode returns are interpreted as observable signals of conflict among the underlying preference components. Under this view, spike events selectively reinforce trajectories with positive advantages, while collapse events dampen policy fluctuations through KL regularization toward an exponential moving average (EMA) anchor. In 5-seed pilot experiments on CartPole-v1 with partial observations, the proposed FULL variant showed better sample efficiency than the control group, with an AUC of 256.85 and a solve rate of 80\%; the best performance was obtained with 1--4 additional event-driven updates. We further provide a reproducible protocol, theoretical hypotheses, and an experimental design for MountainCar-v0 and multi-reward settings.
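To make the spike/collapse mechanism concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how events are detected, so the z-score rule against running EMA statistics, the class and function names (`ReturnEventDetector`, `kl_categorical`, `event_loss`), and all hyperparameters (`beta`, `threshold`, `warmup`, `kl_coef`) are assumptions chosen for illustration.

```python
import math


class ReturnEventDetector:
    """Flags spikes and collapses of episode returns (hypothetical rule).

    Assumption: an event is a return whose z-score against running EMA
    mean/variance exceeds +/- threshold. The paper may use another rule.
    """

    def __init__(self, beta=0.9, threshold=2.0, warmup=5):
        self.beta = beta            # EMA decay for the running statistics
        self.threshold = threshold  # z-score magnitude that triggers an event
        self.warmup = warmup        # episodes observed before events can fire
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def update(self, episode_return):
        """Classify one episode return, then fold it into the statistics."""
        event = "none"
        if self.count >= self.warmup:
            std = math.sqrt(self.var) + 1e-8
            z = (episode_return - self.mean) / std
            if z > self.threshold:
                event = "spike"
            elif z < -self.threshold:
                event = "collapse"
        if self.count == 0:
            self.mean = float(episode_return)
        else:
            delta = episode_return - self.mean
            self.mean += (1.0 - self.beta) * delta
            self.var = self.beta * (self.var + (1.0 - self.beta) * delta * delta)
        self.count += 1
        return event


def kl_categorical(p, q, eps=1e-12):
    """KL(p || q) for two categorical action distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def event_loss(event, log_probs, advantages, kl_to_anchor, kl_coef=0.1):
    """Extra loss term applied only when an event fires (illustrative).

    spike    -> policy-gradient term restricted to positive advantages
    collapse -> KL penalty pulling the policy toward the EMA anchor
    """
    if event == "spike":
        return -sum(lp * max(a, 0.0) for lp, a in zip(log_probs, advantages))
    if event == "collapse":
        return kl_coef * kl_to_anchor
    return 0.0
```

In a training loop, one would call `update` once per episode and, when it returns `"spike"` or `"collapse"`, run a small number of extra gradient steps on `event_loss` in addition to the base update. How the EMA anchor itself is maintained (for example, as an exponential moving average of policy parameters) is left out of this sketch.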