Beyond Policy Training: Recursive Solution Search from Unannotated Videos
Abstract
Many real-world tasks are recorded as large collections of unannotated task executions, such as videos, which contain rich information about task progress but lack the supervision assumed by standard reinforcement learning (RL) pipelines. In many practical settings, the goal is not to train a reusable policy but simply to recover one feasible solution, making policy-centered learning unnecessarily costly. We propose Policy-Free Recursive Search (PFR-Search), a framework that recovers solutions directly from unannotated task executions without policy-grounded supervision or policy training. PFR-Search organizes videos into high-level task flows and performs a recursive backward-forward search over these flows to recover solutions without modeling a policy. To evaluate how efficiently policy-free search exploits unannotated data, we use RL as an evaluation interface, incorporating task-flow-aligned intrinsic rewards and comparing against video-driven RL methods. Experiments on long-horizon Minecraft tasks show that PFR-Search recovers feasible solutions from unannotated videos with minimal exploration.