Timezone: »
The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade
We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted by SGD) for this problem. We establish sharp instance-dependent excess risk upper and lower bounds for this approach. Our bounds suggest that for a large class of linear regression instances, transfer learning with $O(N^2)$ source data (and scarce or no target data) is as effective as supervised learning with $N$ target data. In addition, we show that finetuning, even with only a small amount of target data, can drastically reduce the amount of source data required by pretraining. Our theory sheds light on the effectiveness and limitation of pretraining as well as the benefits of finetuning for tackling covariate shift problems.
Author Information
Jingfeng Wu (Johns Hopkins University)
Difan Zou (UCLA)
Vladimir Braverman (Johns Hopkins University)
Quanquan Gu (University of California, Los Angeles)
Sham Kakade (Harvard University)
More from the Same Authors
-
2021 : Adversarial Robustness of Streaming Algorithms through Importance Sampling »
Vladimir Braverman · Avinatan Hasidim · Yossi Matias · Mariano Schain · Sandeep Silwal · Samson Zhou -
2021 : Benign Overfitting in Adversarially Robust Linear Classification »
Jinghui Chen · Yuan Cao · Yuan Cao · Quanquan Gu -
2021 : Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures »
Yuan Cao · Yuan Cao · Quanquan Gu · Mikhail Belkin -
2021 : Bi-directional Adaptive Communication for Heterogenous Distributed Learning »
Dmitrii Avdiukhin · Vladimir Braverman -
2021 : Gap-Dependent Unsupervised Exploration for Reinforcement Learning »
Jingfeng Wu · Vladimir Braverman · Lin Yang -
2021 : Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation »
Yue Wu · Dongruo Zhou · Quanquan Gu -
2021 : Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 : Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 : Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation »
Zixiang Chen · Dongruo Zhou · Quanquan Gu -
2023 : Robust Learning with Progressive Data Expansion Against Spurious Correlation »
Yihe Deng · Yu Yang · Baharan Mirzasoleiman · Quanquan Gu -
2023 : DiffMol: 3D Structured Molecule Generation with Discrete Denoising Diffusion Probabilistic Models »
Weitong Zhang · Xiaoyun Wang · Justin Smith · Joe Eaton · Brad Rees · Quanquan Gu -
2023 : Predicting Task Forgetting in Large Language Models »
Anat Kleiman · Jonathan Frankle · Sham Kakade · Mansheej Paul -
2023 : Borda Regret Minimization for Generalized Linear Dueling Bandits »
Yue Wu · Tao Jin · Qiwei Di · Hao Lou · Farzad Farnoud · Quanquan Gu -
2023 Poster: DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design »
Jiaqi Guan · Xiangxin Zhou · Yuwei Yang · Yu Bao · Jian Peng · Jianzhu Ma · Qiang Liu · Liang Wang · Quanquan Gu -
2023 Poster: Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes »
Jiafan He · Heyang Zhao · Dongruo Zhou · Quanquan Gu -
2023 Poster: Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation »
Yifei Min · Jiafan He · Tianhao Wang · Quanquan Gu -
2023 Poster: Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron »
Jingfeng Wu · Difan Zou · Zixiang Chen · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2023 Poster: Provable Data Subset Selection For Efficient Neural Networks Training »
Morad Tukan · Samson Zhou · Alaa Maalouf · Daniela Rus · Vladimir Braverman · Dan Feldman -
2023 Poster: Benign Overfitting in Two-layer ReLU Convolutional Neural Networks »
Yiwen Kou · Zixiang Chen · Yuanzhou Chen · Quanquan Gu -
2023 Poster: Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization »
Chris Junchi Li · Huizhuo Yuan · Gauthier Gidel · Quanquan Gu · Michael Jordan -
2023 Oral: Structure-informed Language Models Are Protein Designers »
Zaixiang Zheng · Yifan Deng · Dongyu Xue · Yi Zhou · Fei YE · Quanquan Gu -
2023 Poster: Personalized Federated Learning under Mixture of Distributions »
Yue Wu · Shuaicheng Zhang · Wenchao Yu · Yanchi Liu · Quanquan Gu · Dawei Zhou · Haifeng Chen · Wei Cheng -
2023 Poster: AutoCoreset: An Automatic Practical Coreset Construction Framework »
Alaa Maalouf · Morad Tukan · Vladimir Braverman · Daniela Rus -
2023 Poster: On Provable Copyright Protection for Generative Models »
Nikhil Vyas · Sham Kakade · Boaz Barak -
2023 Poster: Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits »
Heyang Zhao · Dongruo Zhou · Jiafan He · Quanquan Gu -
2023 Poster: Structure-informed Language Models Are Protein Designers »
Zaixiang Zheng · Yifan Deng · Dongyu Xue · Yi Zhou · Fei YE · Quanquan Gu -
2023 Poster: The Benefits of Mixup for Feature Learning »
Difan Zou · Yuan Cao · Yuanzhi Li · Quanquan Gu -
2023 Poster: Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs »
Junkai Zhang · Weitong Zhang · Quanquan Gu -
2023 Poster: Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path »
Qiwei Di · Jiafan He · Dongruo Zhou · Quanquan Gu -
2023 Poster: On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits »
Weitong Zhang · Jiafan He · Zhiyuan Fan · Quanquan Gu -
2023 Poster: Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games »
Dylan Foster · Noah Golowich · Sham Kakade -
2023 Poster: Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes »
Chenlu Ye · Wei Xiong · Quanquan Gu · Tong Zhang -
2022 Poster: Learning Stochastic Shortest Path with Linear Function Approximation »
Yifei Min · Jiafan He · Tianhao Wang · Quanquan Gu -
2022 Spotlight: Learning Stochastic Shortest Path with Linear Function Approximation »
Yifei Min · Jiafan He · Tianhao Wang · Quanquan Gu -
2022 Poster: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Poster: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs »
Yuanzhou Chen · Jiafan He · Quanquan Gu -
2022 Poster: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy -
2022 Spotlight: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Oral: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Spotlight: On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs »
Yuanzhou Chen · Jiafan He · Quanquan Gu -
2022 Spotlight: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy -
2022 Poster: Dimension-free Complexity Bounds for High-order Nonconvex Finite-sum Optimization »
Dongruo Zhou · Quanquan Gu -
2022 Poster: Inductive Biases and Variable Creation in Self-Attention Mechanisms »
Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang -
2022 Spotlight: Inductive Biases and Variable Creation in Self-Attention Mechanisms »
Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang -
2022 Spotlight: Dimension-free Complexity Bounds for High-order Nonconvex Finite-sum Optimization »
Dongruo Zhou · Quanquan Gu -
2021 : Stochastic Variance-Reduced High-order Optimization for Nonconvex Optimization »
Quanquan Gu -
2021 Workshop: Over-parameterization: Pitfalls and Opportunities »
Yasaman Bahri · Quanquan Gu · Amin Karbasi · Hanie Sedghi -
2021 Poster: On the Convergence of Hamiltonian Monte Carlo with Stochastic Gradients »
Difan Zou · Quanquan Gu -
2021 Spotlight: On the Convergence of Hamiltonian Monte Carlo with Stochastic Gradients »
Difan Zou · Quanquan Gu -
2021 Poster: Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits »
Tianyuan Jin · Jing Tang · Pan Xu · Keke Huang · Xiaokui Xiao · Quanquan Gu -
2021 Poster: MOTS: Minimax Optimal Thompson Sampling »
Tianyuan Jin · Pan Xu · Jieming Shi · Xiaokui Xiao · Quanquan Gu -
2021 Poster: Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping »
Dongruo Zhou · Jiafan He · Quanquan Gu -
2021 Poster: Logarithmic Regret for Reinforcement Learning with Linear Function Approximation »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 Spotlight: Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits »
Tianyuan Jin · Jing Tang · Pan Xu · Keke Huang · Xiaokui Xiao · Quanquan Gu -
2021 Spotlight: Logarithmic Regret for Reinforcement Learning with Linear Function Approximation »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 Spotlight: MOTS: Minimax Optimal Thompson Sampling »
Tianyuan Jin · Pan Xu · Jieming Shi · Xiaokui Xiao · Quanquan Gu -
2021 Spotlight: Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping »
Dongruo Zhou · Jiafan He · Quanquan Gu -
2021 Poster: Provable Robustness of Adversarial Training for Learning Halfspaces with Noise »
Difan Zou · Spencer Frei · Quanquan Gu -
2021 Poster: Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins »
Spencer Frei · Yuan Cao · Quanquan Gu -
2021 Poster: Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise »
Spencer Frei · Yuan Cao · Quanquan Gu -
2021 Spotlight: Provable Robustness of Adversarial Training for Learning Halfspaces with Noise »
Difan Zou · Spencer Frei · Quanquan Gu -
2021 Oral: Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins »
Spencer Frei · Yuan Cao · Quanquan Gu -
2021 Spotlight: Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise »
Spencer Frei · Yuan Cao · Quanquan Gu -
2020 Poster: A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation »
Pan Xu · Quanquan Gu -
2020 Poster: Optimization Theory for ReLU Neural Networks Trained with Normalization Layers »
Yonatan Dukler · Quanquan Gu · Guido Montufar -
2020 Poster: Coresets for Clustering in Graphs of Bounded Treewidth »
Daniel Baker · Vladimir Braverman · Lingxiao Huang · Shaofeng H.-C. Jiang · Robert Krauthgamer · Xuan Wu -
2020 Poster: Schatten Norms in Matrix Streams: Hello Sparsity, Goodbye Dimension »
Vladimir Braverman · Robert Krauthgamer · Aditya Krishnan · Roi Sinoff -
2020 Poster: Neural Contextual Bandits with UCB-based Exploration »
Dongruo Zhou · Lihong Li · Quanquan Gu -
2020 Poster: Obtaining Adjustable Regularization for Free via Iterate Averaging »
Jingfeng Wu · Vladimir Braverman · Lin Yang -
2020 Poster: On the Noisy Gradient Descent that Generalizes as SGD »
Jingfeng Wu · Wenqing Hu · Haoyi Xiong · Jun Huan · Vladimir Braverman · Zhanxing Zhu -
2020 Poster: FetchSGD: Communication-Efficient Federated Learning with Sketching »
Daniel Rothchild · Ashwinee Panda · Enayat Ullah · Nikita Ivkin · Ion Stoica · Vladimir Braverman · Joseph E Gonzalez · Raman Arora -
2019 Poster: Coresets for Ordered Weighted Clustering »
Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu -
2019 Oral: Coresets for Ordered Weighted Clustering »
Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu -
2019 Poster: On the Convergence and Robustness of Adversarial Training »
Yisen Wang · Xingjun Ma · James Bailey · Jinfeng Yi · Bowen Zhou · Quanquan Gu -
2019 Poster: The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects »
Zhanxing Zhu · Jingfeng Wu · Bing Yu · Lei Wu · Jinwen Ma -
2019 Oral: The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects »
Zhanxing Zhu · Jingfeng Wu · Bing Yu · Lei Wu · Jinwen Ma -
2019 Oral: On the Convergence and Robustness of Adversarial Training »
Yisen Wang · Xingjun Ma · James Bailey · Jinfeng Yi · Bowen Zhou · Quanquan Gu -
2019 Poster: Lower Bounds for Smooth Nonconvex Finite-Sum Optimization »
Dongruo Zhou · Quanquan Gu -
2019 Oral: Lower Bounds for Smooth Nonconvex Finite-Sum Optimization »
Dongruo Zhou · Quanquan Gu -
2018 Poster: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow »
Xiao Zhang · Simon Du · Quanquan Gu -
2018 Poster: Continuous and Discrete-time Accelerated Stochastic Mirror Descent for Strongly Convex Functions »
Pan Xu · Tianhao Wang · Quanquan Gu -
2018 Oral: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow »
Xiao Zhang · Simon Du · Quanquan Gu -
2018 Oral: Continuous and Discrete-time Accelerated Stochastic Mirror Descent for Strongly Convex Functions »
Pan Xu · Tianhao Wang · Quanquan Gu -
2018 Poster: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order »
Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang -
2018 Poster: A Primal-Dual Analysis of Global Optimality in Nonconvex Low-Rank Matrix Recovery »
Xiao Zhang · Lingxiao Wang · Yaodong Yu · Quanquan Gu -
2018 Poster: Stochastic Variance-Reduced Hamilton Monte Carlo Methods »
Difan Zou · Pan Xu · Quanquan Gu -
2018 Oral: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order »
Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang -
2018 Oral: Stochastic Variance-Reduced Hamilton Monte Carlo Methods »
Difan Zou · Pan Xu · Quanquan Gu -
2018 Oral: A Primal-Dual Analysis of Global Optimality in Nonconvex Low-Rank Matrix Recovery »
Xiao Zhang · Lingxiao Wang · Yaodong Yu · Quanquan Gu -
2018 Poster: Stochastic Variance-Reduced Cubic Regularized Newton Method »
Dongruo Zhou · Pan Xu · Quanquan Gu -
2018 Poster: Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization »
Jinghui Chen · Pan Xu · Lingxiao Wang · Jian Ma · Quanquan Gu -
2018 Oral: Stochastic Variance-Reduced Cubic Regularized Newton Method »
Dongruo Zhou · Pan Xu · Quanquan Gu -
2018 Oral: Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization »
Jinghui Chen · Pan Xu · Lingxiao Wang · Jian Ma · Quanquan Gu -
2017 Poster: Uncertainty Assessment and False Discovery Rate Control in High-Dimensional Granger Causal Inference »
Aditya Chaudhry · Pan Xu · Quanquan Gu -
2017 Poster: High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm »
Rongda Zhu · Lingxiao Wang · Chengxiang Zhai · Quanquan Gu -
2017 Poster: Clustering High Dimensional Dynamic Data Streams »
Lin Yang · Harry Lang · Christian Sohler · Vladimir Braverman · Gereon Frahling -
2017 Poster: Robust Gaussian Graphical Model Estimation with Arbitrary Corruption »
Lingxiao Wang · Quanquan Gu -
2017 Talk: High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm »
Rongda Zhu · Lingxiao Wang · Chengxiang Zhai · Quanquan Gu -
2017 Talk: Robust Gaussian Graphical Model Estimation with Arbitrary Corruption »
Lingxiao Wang · Quanquan Gu -
2017 Talk: Uncertainty Assessment and False Discovery Rate Control in High-Dimensional Granger Causal Inference »
Aditya Chaudhry · Pan Xu · Quanquan Gu -
2017 Poster: A Unified Variance Reduction-Based Framework for Nonconvex Low-Rank Matrix Recovery »
Lingxiao Wang · Xiao Zhang · Quanquan Gu -
2017 Talk: Clustering High Dimensional Dynamic Data Streams »
Lin Yang · Harry Lang · Christian Sohler · Vladimir Braverman · Gereon Frahling -
2017 Talk: A Unified Variance Reduction-Based Framework for Nonconvex Low-Rank Matrix Recovery »
Lingxiao Wang · Xiao Zhang · Quanquan Gu