Poster
Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems
Yujun Kim · Jaeyoung Cha · Chulhee Yun
West Exhibition Hall B2-B3 #W-1020
How quickly do practical optimization algorithms approach the solution when training time is limited? Many machine learning models are trained with stochastic gradient descent (SGD), which improves performance by gradually adjusting model parameters. A practical variation called permutation-based SGD, which processes the training data in a shuffled order, is known to converge faster than uniform-sampling SGD, which selects a random sample at each step. However, these benefits typically appear only when training is sufficiently long.

We ask what happens when training is short, corresponding to the common case where the computational budget is limited. To explore this, we study a simple instance of permutation-based SGD called Incremental Gradient Descent (IGD), which repeatedly processes the data in the natural order given by the dataset. We find that in short training scenarios, IGD can be much slower than expected, even on simple problems, and that as the problem becomes more ill-conditioned, performance can degrade further.

These results show that training methods that work well over long training horizons may behave very differently, depending on problem difficulty, when time is limited. This has important implications for connecting theory and practice under constrained computational budgets.
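The three sampling schemes compared above differ only in the order in which training examples are visited. The sketch below (not taken from the paper; all function names and the toy least-squares objective are illustrative assumptions) shows that difference concretely: incremental order, reshuffled order, and i.i.d. uniform sampling.

```python
# Minimal sketch, assuming a toy least-squares problem
# f(w) = (1/n) * sum_i 0.5 * (x_i^T w - y_i)^2.
# Only the index order changes between the three schemes.
import numpy as np

def component_grad(w, X, y, i):
    """Gradient of the i-th component f_i(w) = 0.5 * (x_i^T w - y_i)^2."""
    x = X[i]
    return (x @ w - y[i]) * x

def run_epochs(X, y, scheme, epochs=3, lr=0.01, seed=0):
    """One pass over the data per epoch; `scheme` sets the visiting order."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        if scheme == "incremental":   # IGD: fixed natural order 0, 1, ..., n-1
            order = np.arange(n)
        elif scheme == "shuffled":    # permutation-based SGD: reshuffle each epoch
            order = rng.permutation(n)
        elif scheme == "uniform":     # uniform-sampling SGD: i.i.d. indices
            order = rng.integers(0, n, size=n)
        else:
            raise ValueError(scheme)
        for i in order:
            w -= lr * component_grad(w, X, y, i)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 200, 10
    X = rng.normal(size=(n, d))
    w_star = rng.normal(size=d)
    y = X @ w_star
    for scheme in ("incremental", "shuffled", "uniform"):
        w = run_epochs(X, y, scheme, epochs=3)
        print(f"{scheme:>12}: squared error {np.sum((w - w_star) ** 2):.4f}")
```

The paper's analysis concerns the regime where `epochs` is small and the problem is ill-conditioned; this toy script only illustrates how the sampling orders differ, not the paper's theoretical results.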