Second-Order Smooth Planning with Optimal-Transport Bellman Smoothing
Tuan Dam
Abstract
Planning with a generative model aims to estimate state values using minimal oracle calls. For entropy-regularized MDPs, SmoothCruiser exploits the smoothness of the $\operatorname{LogSumExp}$ Bellman operator to achieve $\widetilde{\mathcal{O}}(\varepsilon^{-4})$ sample complexity, but its first-order Taylor approximation limits the rate. We develop a curvature--complexity theory showing that if a Bellman aggregator has Taylor remainder of order $\beta \ge 2$, the optimal oracle complexity exponent is $2 + 2/(\beta-1)$---recovering $\widetilde{\mathcal{O}}(\varepsilon^{-4})$ for $\beta=2$ and predicting $\widetilde{\mathcal{O}}(\varepsilon^{-3})$ for $\beta=3$. To achieve $\beta=3$, we introduce an entropic optimal-transport regularizer over action distributions. The resulting OT-smoothed Bellman operator admits a closed-form expression, explicit gradient policy, and Lipschitz Hessian. We derive an unbiased estimator of the quadratic Taylor term via cross-product debiasing, enabling a second-order SmoothCruiser with $\widetilde{\mathcal{O}}(\varepsilon^{-3})$ complexity. We further propose gap-dependent variants and provide a complexity analysis and show advantage of our method.
Successful Page Load