Timezone: »

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
Sang Michael Xie · Tengyu Ma · Percy Liang

Thu Jul 22 05:00 PM -- 05:20 PM (PDT) @ None

We focus on prediction problems with structured outputs that are subject to output validity constraints, e.g. pseudocode-to-code translation where the code must compile. While labeled input-output pairs are expensive to obtain, "unlabeled" outputs, i.e. outputs without corresponding inputs, are freely available (e.g. code on GitHub) and provide information about output validity. Pre-training captures this structure by training a denoiser to denoise corrupted versions of unlabeled outputs. We first show that standard fine-tuning after pre-training destroys some of this structure. We then propose composed fine-tuning, which trains a predictor composed with the pre-trained denoiser. Importantly, the denoiser is fixed to preserve output structure. Like standard fine-tuning, the predictor is also initialized with the pre-trained denoiser. We prove for two-layer ReLU networks that composed fine-tuning significantly reduces the complexity of the predictor, thus improving generalization. Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative). The improvement is magnified on out-of-distribution (OOD) examples (4% and 25% relative), suggesting that reducing predictor complexity improves OOD extrapolation.

Author Information

Sang Michael Xie (Stanford University)
Tengyu Ma (Stanford University)
Percy Liang (Stanford University)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors