Trajectory-Aware Heuristic Learning for Combinatorial Search
Abstract
Learning effective value heuristics for combinatorial search is difficult, as prior methods rely on surrogate supervision or costly downstream search to assess progress. We introduce a trajectory-aware probabilistic framework that models uncertainty in cost-to-go labels instead of treating them as fixed targets. Heuristic learning is cast as inference over state trajectories using an HMM-style model (Rabiner, 1989), where estimated depth-change dynamics define transitions and forward–backward inference yields soft supervision. To evaluate heuristic quality without search, we propose a large-scale local ranking metric that measures a model’s ability to order neighboring states. On the Rubik’s Cube, our approach consistently improves local ranking accuracy and downstream search performance under matched computational budgets.
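The abstract gives no implementation details, but the core inference step it names — forward–backward smoothing over a trajectory to obtain soft labels — is standard HMM machinery (Rabiner, 1989). The sketch below is a generic log-space forward–backward routine, not the paper's code: the state space, transition matrix, and emission scores are hypothetical stand-ins for the paper's cost-to-go values, depth-change dynamics, and model scores.

```python
import numpy as np
from scipy.special import logsumexp

def forward_backward(log_emissions, log_trans, log_prior):
    """Posterior state marginals for one trajectory (log-space HMM smoothing).

    log_emissions: (T, K) array, log p(observation_t | state = k)
    log_trans:     (K, K) array, log p(state_{t+1} = j | state_t = i)
    log_prior:     (K,)   array, log p(state_0 = k)
    Returns a (T, K) array of posteriors p(state_t = k | all observations);
    each row sums to 1 and can serve as a soft supervision target.
    """
    T, K = log_emissions.shape

    # Forward pass: log alpha_t(j) = log p(obs_1..t, state_t = j).
    log_alpha = np.empty((T, K))
    log_alpha[0] = log_prior + log_emissions[0]
    for t in range(1, T):
        log_alpha[t] = log_emissions[t] + logsumexp(
            log_alpha[t - 1][:, None] + log_trans, axis=0)

    # Backward pass: log beta_t(i) = log p(obs_{t+1}..T | state_t = i).
    log_beta = np.zeros((T, K))
    for t in range(T - 2, -1, -1):
        log_beta[t] = logsumexp(
            log_trans + (log_emissions[t + 1] + log_beta[t + 1])[None, :],
            axis=1)

    # Combine and normalize per time step to get posterior marginals.
    log_post = log_alpha + log_beta
    log_post -= logsumexp(log_post, axis=1, keepdims=True)
    return np.exp(log_post)
```

In the framework the abstract describes, the resulting per-step marginals would replace fixed cost-to-go targets, so the heuristic is trained against a distribution over plausible labels rather than a single noisy one.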