Dimension-Free Adaptive Subgradient Methods with Frequent Directions
Abstract
Lay Summary
Adaptive subgradient methods, despite their superior theoretical guarantees, are often impractical for large-scale machine learning tasks due to their high computational cost. Existing works have attempted to accelerate these methods using sketching techniques; however, their performance is significantly worse than that of standard adaptive methods, which limits their applicability in practice. In this work, we first utilize a classic sketching technique, Frequent Directions, to propose an algorithm that achieves an improved theoretical guarantee. We then enhance its computational efficiency without compromising performance. Furthermore, we extend our approach to optimization problems involving matrix variables (e.g., training neural networks). By integrating our sketching technique into the existing method, we reduce the computational overhead while attaining a better theoretical guarantee.
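To make the sketching idea concrete, the following is a minimal sketch of the classic Frequent Directions algorithm (Liberty, 2013) referenced in the title, which maintains a small matrix B whose Gram matrix B^T B approximates A^T A for a stream of rows of A. This illustrates only the standalone sketching primitive, not its integration into the adaptive subgradient method proposed here; the function name and parameters are illustrative.

```python
import numpy as np

def frequent_directions(A, ell):
    """Maintain an (ell, d) sketch B with B.T @ B approximating A.T @ A.

    A: (n, d) matrix whose rows arrive as a stream; ell: sketch size.
    Assumes ell is even and ell <= d.
    """
    _, d = A.shape
    B = np.zeros((ell, d))
    for row in A:
        zero_rows = np.where(~B.any(axis=1))[0]
        if len(zero_rows) == 0:
            # Sketch is full: shrink singular values so that the
            # bottom half of the rows of B become zero again.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s[:, None] * Vt
            zero_rows = np.where(~B.any(axis=1))[0]
        # Insert the incoming row into an empty slot of the sketch.
        B[zero_rows[0]] = row
    return B

# Usage: the approximation error ||A.T @ A - B.T @ B|| is bounded
# in terms of ||A||_F^2 / (ell / 2), independent of n.
A = np.random.randn(1000, 50)
B = frequent_directions(A, ell=10)
print(np.linalg.norm(A.T @ A - B.T @ B, 2))
```

Because each update touches only an ell x d matrix rather than a d x d one, this primitive is what allows full-matrix-style adaptive updates to be approximated at a fraction of the usual cost.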