Convergence Analysis of Decentralized Hessian-/Jacobian-Free Algorithm for Nonconvex Stochastic Bilevel Optimization
Abstract
Decentralized stochastic bilevel optimization has been actively studied in recent years. However, existing work assumes that the lower-level loss function is strongly convex, which limits its applicability to many machine learning models. To address this limitation, in this paper we propose a novel decentralized stochastic first-order optimization algorithm that does not require computing Hessian or Jacobian matrices, for the setting where the lower-level loss function is nonconvex but satisfies the Polyak–Łojasiewicz (PL) condition. Moreover, unlike existing single-agent methods that add a regularization term to the lower-level loss function to artificially enforce strong convexity, our algorithm requires no such modification. It also updates all variables with a constant, single-timescale learning rate, in contrast to the time-varying, two-timescale learning-rate schedules used in prior work. To establish the convergence rate, we develop a new analysis framework that works under the PL condition alone, rather than relying on the artificial strong convexity introduced through regularization in existing single-agent methods. To the best of our knowledge, this is the first algorithm for nonconvex decentralized bilevel optimization with theoretical convergence guarantees under mild conditions. Finally, extensive experiments on hyperparameter optimization and model pruning validate the efficacy of the proposed algorithm.
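For context, a minimal sketch of the problem class described above is given below; the notation (m agents, upper-level losses f_i, lower-level losses g_i, PL constant mu) is an assumption chosen for illustration and is not quoted from the paper.

% Minimal LaTeX sketch (assumed notation): m agents cooperatively minimize an
% upper-level objective whose lower-level problem is nonconvex but PL.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\begin{align*}
  \min_{x}\ \Phi(x) &= \frac{1}{m}\sum_{i=1}^{m} f_i\bigl(x,\, y^{*}(x)\bigr),\\
  \text{s.t.}\quad y^{*}(x) &\in \operatorname*{arg\,min}_{y}\ g(x,y)
      := \frac{1}{m}\sum_{i=1}^{m} g_i(x,y).
\end{align*}
% Lower-level Polyak--\L{}ojasiewicz (PL) condition with constant $\mu > 0$,
% which permits nonconvex $g(x,\cdot)$ instead of requiring strong convexity:
\[
  \frac{1}{2}\,\bigl\|\nabla_y g(x,y)\bigr\|^{2}
  \;\ge\; \mu\Bigl(g(x,y) - \min_{y'} g(x,y')\Bigr)
  \qquad \text{for all } x,\ y.
\]
\end{document}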