In large-scale time series forecasting, one often encounters datasets in which the temporal patterns of the series not only drift over time but also differ from one another. In this paper, we provably show that under such heterogeneity, training a forecasting model with commonly used stochastic optimizers (e.g., SGD) can suffer from large variance in gradient estimation, and thus incur long training times. We show that this issue can be efficiently alleviated via stratification, which allows the optimizer to sample from pre-grouped time series strata. To better trade off gradient variance against computational complexity, we further propose SCott (Stochastic Stratified Control Variate Gradient Descent), a variance-reduced SGD-style optimizer that combines stratified sampling with control variates. In theory, we provide a convergence guarantee for SCott on smooth non-convex objectives. Empirically, we evaluate SCott and other baseline optimizers on both synthetic and real-world time series forecasting problems, and demonstrate that SCott converges faster with respect to both iterations and wall-clock time.
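To make the idea concrete, here is a minimal illustrative sketch (not the paper's implementation) of combining stratified sampling with a control variate in an SGD-style loop. The toy problem, the strata, the anchor-gradient scheme, and all names below are assumptions for illustration: each stratum holds time series with its own linear pattern, and each step samples one stratum whose gradient is corrected by a control variate built from per-stratum anchor gradients at a reference point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (assumed, for illustration): K strata, each with its own
# linear pattern; a shared linear forecaster w is fit by squared loss.
K, n, d = 4, 50, 3
X = [rng.normal(size=(n, d)) for _ in range(K)]
y = [X[k] @ rng.normal(size=d) + 0.1 * rng.normal(size=n) for k in range(K)]

def grad(k, w):
    """Full gradient of stratum k's mean squared loss at w."""
    r = X[k] @ w - y[k]
    return X[k].T @ r / len(r)

def full_grad(w):
    """Gradient of the loss averaged over all strata."""
    return np.mean([grad(k, w) for k in range(K)], axis=0)

# Variance-reduced loop: keep a reference point w_ref with per-stratum
# anchor gradients; each step sample one stratum and correct its gradient
# with the control variate (anchors[k] - anchor_mean), which is unbiased
# and shrinks the variance caused by heterogeneity across strata.
w = np.zeros(d)
lr = 0.1
for epoch in range(30):
    w_ref = w.copy()
    anchors = [grad(k, w_ref) for k in range(K)]
    anchor_mean = np.mean(anchors, axis=0)
    for _ in range(K):
        k = rng.integers(K)                        # sample a stratum
        g = grad(k, w) - anchors[k] + anchor_mean  # variance-reduced estimate
        w -= lr * g

print(np.linalg.norm(full_grad(w)))  # small -> near a stationary point
```

Note that `E[grad(k, w) - anchors[k] + anchor_mean] = full_grad(w)`, so the estimator stays unbiased while the per-step variance depends on how much each stratum's gradient has changed since `w_ref`, rather than on the raw heterogeneity between strata.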
Author Information
Yucheng Lu (Cornell University)
Youngsuk Park (Amazon Research)
Lifan Chen (Amazon)
Yuyang Wang (AWS AI Labs)
Christopher De Sa (Cornell)
Dean Foster (Amazon)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: Variance Reduced Training with Stratified Sampling for Forecasting Models »
Wed. Jul 21st 02:20 -- 02:25 PM
More from the Same Authors
-
2023 Poster: CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks »
Jue Wang · Yucheng Lu · Binhang Yuan · Beidi Chen · Percy Liang · Chris De Sa · Christopher Re · Ce Zhang -
2023 Poster: STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition »
Yucheng Lu · Sally Jesmonth · Suvinay Subramanian · Oleg Rybakov · Chris De Sa · Amir Yazdanbakhsh -
2023 Poster: Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting »
Hilaf Hasson · Danielle Robinson · Yuyang Wang · Gaurav Gupta · Youngsuk Park -
2022 : MCTensor: A High-Precision Deep Learning Library with Multi-Component Floating-Point »
Tao Yu · Wentao Guo · Canal Li · Tiancheng Yuan · Christopher De Sa -
2022 : Riemannian Residual Neural Networks »
Isay Katsman · Eric Chen · Sidhanth Holalkere · Aaron Lou · Ser Nam Lim · Christopher De Sa -
2022 Poster: Domain Adaptation for Time Series Forecasting via Attention Sharing »
Xiaoyong Jin · Youngsuk Park · Danielle Robinson · Hao Wang · Yuyang Wang -
2022 Spotlight: Domain Adaptation for Time Series Forecasting via Attention Sharing »
Xiaoyong Jin · Youngsuk Park · Danielle Robinson · Hao Wang · Yuyang Wang -
2022 Poster: Low-Precision Stochastic Gradient Langevin Dynamics »
Ruqi Zhang · Andrew Wilson · Christopher De Sa -
2022 Spotlight: Low-Precision Stochastic Gradient Langevin Dynamics »
Ruqi Zhang · Andrew Wilson · Christopher De Sa -
2021 Workshop: Time Series Workshop »
Yian Ma · Ehi Nosakhare · Yuyang Wang · Scott Yang · Rose Yu -
2021 Poster: Correcting Exposure Bias for Link Recommendation »
Shantanu Gupta · Hao Wang · Zachary Lipton · Yuyang Wang -
2021 Spotlight: Correcting Exposure Bias for Link Recommendation »
Shantanu Gupta · Hao Wang · Zachary Lipton · Yuyang Wang -
2021 Poster: Top-k eXtreme Contextual Bandits with Arm Hierarchy »
Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon -
2021 Spotlight: Top-k eXtreme Contextual Bandits with Arm Hierarchy »
Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon -
2021 Poster: Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision »
Johan Björck · Xiangyu Chen · Christopher De Sa · Carla Gomes · Kilian Weinberger -
2021 Spotlight: Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision »
Johan Björck · Xiangyu Chen · Christopher De Sa · Carla Gomes · Kilian Weinberger -
2021 Poster: Optimal Complexity in Decentralized Training »
Yucheng Lu · Christopher De Sa -
2021 Oral: Optimal Complexity in Decentralized Training »
Yucheng Lu · Christopher De Sa -
2020 Poster: Moniqua: Modulo Quantized Communication in Decentralized SGD »
Yucheng Lu · Christopher De Sa -
2020 Poster: Differentiating through the Fréchet Mean »
Aaron Lou · Isay Katsman · Qingxuan Jiang · Serge Belongie · Ser Nam Lim · Christopher De Sa -
2019 Workshop: ICML 2019 Time Series Workshop »
Vitaly Kuznetsov · Scott Yang · Rose Yu · Cheng Tang · Yuyang Wang -
2019 Poster: Distributed Learning with Sublinear Communication »
Jayadev Acharya · Christopher De Sa · Dylan Foster · Karthik Sridharan -
2019 Oral: Distributed Learning with Sublinear Communication »
Jayadev Acharya · Christopher De Sa · Dylan Foster · Karthik Sridharan -
2019 Poster: SWALP : Stochastic Weight Averaging in Low Precision Training »
Guandao Yang · Tianyi Zhang · Polina Kirichenko · Junwen Bai · Andrew Wilson · Christopher De Sa -
2019 Poster: A Kernel Theory of Modern Data Augmentation »
Tri Dao · Albert Gu · Alexander J Ratner · Virginia Smith · Christopher De Sa · Christopher Re -
2019 Poster: Deep Factors for Forecasting »
Yuyang Wang · Alex Smola · Danielle Robinson · Jan Gasthaus · Dean Foster · Tim Januschowski -
2019 Poster: Improving Neural Network Quantization without Retraining using Outlier Channel Splitting »
Ritchie Zhao · Yuwei Hu · Jordan Dotzel · Christopher De Sa · Zhiru Zhang -
2019 Oral: SWALP : Stochastic Weight Averaging in Low Precision Training »
Guandao Yang · Tianyi Zhang · Polina Kirichenko · Junwen Bai · Andrew Wilson · Christopher De Sa -
2019 Oral: Improving Neural Network Quantization without Retraining using Outlier Channel Splitting »
Ritchie Zhao · Yuwei Hu · Jordan Dotzel · Christopher De Sa · Zhiru Zhang -
2019 Oral: A Kernel Theory of Modern Data Augmentation »
Tri Dao · Albert Gu · Alexander J Ratner · Virginia Smith · Christopher De Sa · Christopher Re -
2019 Oral: Deep Factors for Forecasting »
Yuyang Wang · Alex Smola · Danielle Robinson · Jan Gasthaus · Dean Foster · Tim Januschowski -
2018 Poster: Minibatch Gibbs Sampling on Large Graphical Models »
Christopher De Sa · Vincent Chen · Wong -
2018 Oral: Minibatch Gibbs Sampling on Large Graphical Models »
Christopher De Sa · Vincent Chen · Wong -
2018 Poster: Representation Tradeoffs for Hyperbolic Embeddings »
Frederic Sala · Christopher De Sa · Albert Gu · Christopher Re -
2018 Oral: Representation Tradeoffs for Hyperbolic Embeddings »
Frederic Sala · Christopher De Sa · Albert Gu · Christopher Re