Workshop: ICML Workshop on Human in the Loop Learning (HILL)

Explicable Policy Search via Preference-Based Learning under Human Biases

Ze Gong · Yu Zhang


As intelligent agents become pervasive in our lives, they are expected not only to accomplish tasks on their own but also to engage in tasks that require close collaboration with humans. In such contexts, agent behavior that is optimal in isolation, without considering the humans in the loop, may be viewed as inexplicable, resulting in degraded team performance and loss of trust. Consequently, to be seen as good team players, such agents must learn the humans' idiosyncrasies and preferences about agent behavior from human feedback and respect them during decision-making. On the other hand, human biases can skew that feedback and cause learning agents to deviate from their original design purposes, leading to severe consequences. It is therefore critical for these agents to be aware of human biases and to trade off optimality against human preferences appropriately. In this paper, we formulate the problem of Explicable Policy Search (EPS). We assume that human biases arise from the human’s belief about the agent’s domain dynamics and from the human’s reward function. Directly learning the human’s belief and reward function is possible but largely inefficient and unnecessary. We demonstrate that both can instead be encoded by a single surrogate reward function learned in a preference-based framework. With this reward function, the agent then learns a stochastic policy via maximum-entropy reinforcement learning to recover an explicable policy. We evaluate our method for EPS in a set of continuous navigation domains with synthetic human models and in an autonomous driving domain with a human subject study. The results suggest that our method can effectively generate explicable behaviors that are more desirable under various human biases.
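To make the preference-based step concrete, the sketch below fits a surrogate reward from pairwise trajectory preferences under a Bradley-Terry choice model, a standard formulation for preference-based reward learning. This is a minimal illustration, not the paper's implementation: the linear reward over trajectory features, the synthetic "human" weight vector, and the learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each trajectory is summarized by a feature vector phi,
# and the surrogate reward is assumed linear, r(traj) = w . phi(traj).
dim, n_pairs = 4, 500
w_true = rng.normal(size=dim)            # stands in for the human's hidden preferences
phi_a = rng.normal(size=(n_pairs, dim))  # features of trajectory A in each pair
phi_b = rng.normal(size=(n_pairs, dim))  # features of trajectory B in each pair

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Simulated noisy human feedback via the Bradley-Terry model:
# P(A preferred over B) = sigmoid(r(A) - r(B)).
p_a = sigmoid(phi_a @ w_true - phi_b @ w_true)
prefs = (rng.random(n_pairs) < p_a).astype(float)  # 1 means A was preferred

# Fit the surrogate reward weights by gradient descent on the
# negative log-likelihood of the observed preferences.
w = np.zeros(dim)
lr = 0.1
for _ in range(2000):
    diff = phi_a - phi_b                   # per-pair feature difference
    p = sigmoid(diff @ w)                  # predicted P(A preferred)
    grad = diff.T @ (p - prefs) / n_pairs  # gradient of the NLL
    w -= lr * grad

# Direction of the learned reward should align with the hidden preferences.
cosine = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print(round(cosine, 2))
```

In the paper's setting, the learned surrogate reward would then be handed to a maximum-entropy RL algorithm to produce the stochastic explicable policy; that second stage is omitted here.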