Poster in Workshop: New Frontiers in Learning, Control, and Dynamical Systems
Distributional Distance Classifiers for Goal-Conditioned Reinforcement Learning
Ravi Tej Akella · Benjamin Eysenbach · Jeff Schneider · Ruslan Salakhutdinov
What does it mean to find the shortest path in stochastic environments, where every strategy has a non-zero probability of failing? At the core of this question is a conflict between two seemingly natural notions of planning: maximizing the probability of reaching a goal state, and minimizing the expected number of steps to reach that goal state. Reinforcement learning (RL) methods based on minimizing the steps to a goal make an implicit assumption: that the goal is always reached, at least within some finite horizon. This assumption is violated in practical settings and can lead to highly suboptimal strategies. In this paper, we bridge the gap between these two notions of planning by estimating the probability of reaching the goal at different horizons. This is not the same as estimating the distance to the goal -- rather, probabilities convey uncertainty about ever reaching the goal at all. We then propose an algorithm for estimating these probabilities. The update rule resembles distributional RL but is used to solve (reward-free) goal-reaching tasks rather than (single) reward-maximization tasks. Taken together, we believe that our results provide a cogent framework for thinking about probabilities and distances in stochastic settings, along with a practical and effective algorithm for solving goal-reaching problems in many settings.
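To make the distinction between distances and goal-reaching probabilities concrete, the sketch below works out a tabular, dynamic-programming version of the idea: a horizon-indexed backup (in the spirit of distributional RL) that estimates the probability of reaching a goal within h steps. This is an illustrative assumption, not the authors' actual algorithm; the function `goal_reach_probs`, the transition tensor `P`, and the toy MDP are all hypothetical names introduced here for illustration.

```python
# Hypothetical tabular sketch (not the paper's exact algorithm): dynamic
# programming over horizons to estimate goal-reaching probabilities with a
# distributional-RL-style backup.
import numpy as np


def goal_reach_probs(P, goal, max_horizon):
    """P: (S, A, S) transition tensor; goal: goal state index.

    Returns probs, where probs[h, s, a] is the best achievable probability of
    reaching `goal` within h steps, starting by taking action a in state s.
    """
    S, A, _ = P.shape
    probs = np.zeros((max_horizon + 1, S, A))
    for h in range(1, max_horizon + 1):
        # Value of landing in next state s': 1 if s' is the goal, otherwise the
        # best probability of reaching the goal in the remaining h - 1 steps.
        next_value = np.where(
            np.arange(S) == goal, 1.0, probs[h - 1].max(axis=1)
        )
        probs[h] = P @ next_value  # expectation over next states
    return probs


# Toy 3-state chain where state 2 is the goal and transitions are stochastic.
P = np.zeros((3, 2, 3))
P[0, 0] = [0.9, 0.1, 0.0]   # "safe" action: rarely advances
P[0, 1] = [0.5, 0.0, 0.5]   # "risky" action: may jump straight to the goal
P[1, 0] = [0.0, 0.2, 0.8]
P[1, 1] = [1.0, 0.0, 0.0]
P[2, :, 2] = 1.0            # goal state is absorbing

probs = goal_reach_probs(P, goal=2, max_horizon=10)
print(probs[10, 0])  # reach probability within 10 steps for each action in state 0
```

Comparing `probs[h, 0]` across horizons h shows how the preferred action can change with the horizon, which a single expected-distance estimate cannot capture; the paper's method estimates such horizon-indexed probabilities with learned classifiers rather than a known transition model.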