

Poster in Workshop: Models of Human Feedback for AI Alignment

Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping

Haoyu Wang · Guozheng Ma · Ziqiao Meng · Zeyu Qin · Li Shen · Zhong Zhang · Bingzhe Wu · Liu Liu · Yatao Bian · Tingyang Xu · Xueqian Wang · Peilin Zhao

[ Project Page ]
Fri 26 Jul 8 a.m. PDT — 8 a.m. PDT

Abstract:

Self-alignment is an effective way to reduce the cost of human annotation while maintaining promising model capability. However, existing self-alignment methods use the pretrained LLM to generate alignment datasets in a few-shot manner, which raises a question: is the pretrained LLM really a better few-shot generator than its aligned version? If not, to what extent can the aligned LLM continue to provide benefits? In this paper, we present a pioneering exploration of the impact of bootstrapping self-alignment on large language models. We identify the key role of in-context learning (ICL) examples, which serve as the only fresh data in this self-training loop and should therefore be as diverse and informative as possible. Our findings reveal that bootstrapping self-alignment markedly surpasses the single-round approach. To further exploit the capabilities of bootstrapping, we investigate and adjust the training order of the data, which further improves model performance. We discuss the collapse phenomenon that appears in later rounds and offer two explanatory viewpoints, the Data Processing Inequality and a sharpening output distribution, each accompanied by a corresponding empirical study. Based on this, we provide a validation dataset for early stopping to guard against further model collapse. Finally, we propose Step-On-Feet Tuning (SOFT), which leverages the model's continually enhanced few-shot ability to boost zero- and one-shot performance, shedding light on the overlooked potential of continually improving model self-alignment.
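The abstract describes an iterative loop: sample fresh, diverse ICL examples, generate alignment data few-shot, fine-tune on it, and stop early once a validation score degrades. The sketch below is a minimal, hypothetical rendering of that loop, not the authors' implementation: every helper callable (`sample_icl`, `generate_responses`, `fine_tune`, `evaluate`) and all parameter names are assumptions introduced for illustration.

```python
# Hypothetical sketch of bootstrapping self-alignment with early stopping.
# All helper callables are assumptions standing in for the paper's method.
from typing import Any, Callable, List

def bootstrap_self_alignment(
    model: Any,
    prompts: List[str],
    icl_pool: List[str],  # diverse ICL demonstrations: the only fresh data in the loop
    sample_icl: Callable[[List[str], int], List[str]],
    generate_responses: Callable[[Any, List[str], List[str]], List[str]],
    fine_tune: Callable[[Any, List[str], List[str]], Any],
    evaluate: Callable[[Any, List[str]], float],
    val_prompts: List[str],
    max_rounds: int = 5,
) -> Any:
    """Repeat few-shot self-generation plus fine-tuning; stop early when the
    validation score stops improving (the collapse the abstract warns about)."""
    best_score = evaluate(model, val_prompts)
    for _ in range(max_rounds):
        # Resample ICL examples each round to keep the generated data diverse.
        demos = sample_icl(icl_pool, 5)
        responses = generate_responses(model, demos, prompts)
        # (Optionally reorder the (prompt, response) pairs here, mirroring the
        # training-order adjustment the abstract mentions.)
        candidate = fine_tune(model, prompts, responses)
        score = evaluate(candidate, val_prompts)
        if score <= best_score:
            break  # early stop: further rounds risk model collapse
        model, best_score = candidate, score
    return model
```

The early stop on a held-out validation set is the guard the abstract proposes against late-stage collapse; where exactly the score plateaus would depend on the model and data, which this sketch leaves to the injected helpers.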
