Designing closed-loop optimal controllers for high-dimensional nonlinear systems remains a persistent challenge. Traditional methods, such as solving the Hamilton-Jacobi-Bellman equation, suffer from the curse of dimensionality. Recent studies have introduced a promising supervised learning approach, akin to imitation learning, in which deep neural networks learn from open-loop optimal control solutions.
In this talk, we explore this method and highlight a limitation of its basic form: the distribution mismatch phenomenon, induced by the controlled dynamics. To overcome it, we present an improved approach, the initial value problem enhanced sampling method. This method not only has a theoretical advantage over the basic version in the linear-quadratic regulator setting but also shows substantial numerical improvement on various high-dimensional nonlinear problems, including the optimal reaching problem of a 7-DoF manipulator. Notably, our method also outperforms the Dataset Aggregation (DAgger) algorithm, widely adopted in imitation learning, offering significant theoretical and practical advantages.
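
As a rough illustration of distribution mismatch (a toy sketch, not the setup or the method presented in the talk), consider a scalar discrete-time linear-quadratic regulator. The system parameters, the helper names `lqr_gain` and `rollout`, and the choice to model an imperfectly trained network as the optimal gain scaled by 0.8 are all assumptions made for this example. The training data consists of states along optimal trajectories, while the imperfect controller drives the system to states whose spread, relative to the training states at the same time step, grows with time.

```python
import numpy as np

# Hypothetical scalar system x[k+1] = a*x[k] + b*u[k] with stage cost
# q*x^2 + r*u^2. All numbers are illustrative assumptions.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
steps = 10


def lqr_gain(a, b, q, r, iters=500):
    """Iterate the scalar Riccati recursion; return the gain K in u = -K*x."""
    p = q
    for _ in range(iters):
        k = a * b * p / (r + b * b * p)
        p = q + a * p * (a - b * k)
    return a * b * p / (r + b * b * p)


def rollout(x0, gain, steps):
    """Simulate closed-loop states under u = -gain*x for a batch of initial states."""
    xs = [x0]
    for _ in range(steps):
        xs.append((a - b * gain) * xs[-1])
    return np.stack(xs)  # shape (steps + 1, batch)


rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)

k_opt = lqr_gain(a, b, q, r)
k_learned = 0.8 * k_opt  # assumed stand-in for an imperfectly trained network

train_states = rollout(x0, k_opt, steps)        # states present in the training data
visited_states = rollout(x0, k_learned, steps)  # states the learned controller reaches

# Spread of visited states relative to training states at each time step.
ratio = visited_states.std(axis=1) / train_states.std(axis=1)
print(ratio[[0, 5, 10]])
```

The ratio starts at 1 and grows with the time step: late in the horizon, the learned controller is queried on states that lie well outside what the open-loop optimal data covers at that time. This is the kind of gap that sampling additional data along the learned controller's own trajectories is meant to close.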