Recursive Monte-Carlo Tree Search
Benjamin Howard, Keith Frankston
Abstract
We introduce a recursive AlphaZero-style Monte-Carlo tree search algorithm, "RMCTS". RMCTS first generates the search tree using prior policies. It then recursively re-estimates action values at each node of the search tree, starting from the leaves and working back up to the root, using the regularized optimal posterior policies from "Monte-Carlo tree search as regularized policy optimization" (Grill et al., 2020). We find that RMCTS matches or exceeds the quality of AlphaZero's MCTS-UCB in a tiny fraction of the time.
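The bottom-up re-estimation described above can be illustrated with a minimal sketch. It is not the authors' implementation. It uses the regularized policy from Grill et al. (2020), pi_bar = lambda_N * pi_theta / (alpha - q) with lambda_N = c * sqrt(N) / (|A| + N), where N is the total child visit count and alpha is chosen by bisection so that pi_bar sums to 1. The following are all assumptions made for illustration: the `Node` fields, a two-player zero-sum negamax sign convention (a child's value is negated to give the parent's action value), the hypothetical constant `c = 1.25`, using the node's own network value as a placeholder action value for unexpanded actions, and taking a node's re-estimated value to be the pi_bar-weighted average of its action values.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: list[float]   # prior policy pi_theta over actions (sums to 1)
    value: float         # network value estimate, from the perspective of the player to move
    visits: int = 1      # hypothetical visit count used for lambda_N
    children: dict[int, Node] = field(default_factory=dict)


def regularized_policy(q: list[float], prior: list[float], lam: float, iters: int = 100) -> list[float]:
    """Solve pi_bar = lam * prior / (alpha - q) with sum(pi_bar) = 1 by bisection on alpha."""
    # At alpha = lo the largest term already reaches 1 (sum >= 1); at alpha = hi every
    # term is <= prior_b (sum <= 1). The sum decreases monotonically in alpha.
    lo = max(qb + lam * pb for qb, pb in zip(q, prior))
    hi = max(q) + lam
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        total = sum(lam * pb / (alpha - qb) for qb, pb in zip(q, prior))
        if total > 1.0:
            lo = alpha  # policy sums to more than 1, so alpha is too small
        else:
            hi = alpha
    alpha = 0.5 * (lo + hi)
    pi = [lam * pb / (alpha - qb) for qb, pb in zip(q, prior)]
    s = sum(pi)
    return [p / s for p in pi]  # renormalize to remove residual bisection error


def rmcts_backup(node: Node, c: float = 1.25) -> tuple[float, list[float] | None]:
    """Re-estimate values from the leaves up; returns (value, posterior policy or None at a leaf)."""
    if not node.children:
        return node.value, None  # leaves keep their network value estimate
    num_actions = len(node.prior)
    q: list[float] = []
    n: list[int] = []
    for a in range(num_actions):
        child = node.children.get(a)
        if child is None:
            q.append(node.value)  # hypothetical placeholder for an unexpanded action
            n.append(0)
        else:
            child_value, _ = rmcts_backup(child, c)
            q.append(-child_value)  # negamax: child value is from the opponent's view
            n.append(child.visits)
    total = sum(n)
    lam = c * math.sqrt(total) / (num_actions + total)
    pi = regularized_policy(q, node.prior, lam)
    return sum(p * qa for p, qa in zip(pi, q)), pi
```

Calling `rmcts_backup(root)` returns the root's re-estimated value together with its posterior policy, which could then be used to choose a move. If every action value is equal, the posterior reduces to the prior, as expected from the closed form.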