Extremely large pre-trained language models (PTMs) such as GPT-3 are usually released as a service, which allows users to design task-specific prompts to query the PTMs through black-box APIs. In such a scenario, which we call Language-Model-as-a-Service (LMaaS), the gradients of PTMs are usually unavailable. Can we optimize the task prompts by only accessing the model inference APIs? This paper proposes the black-box tuning framework to optimize the continuous prompt prepended to the input text via derivative-free optimization. Instead of optimizing in the original high-dimensional prompt space, which is intractable for traditional derivative-free optimization, we perform optimization in a randomly generated subspace, exploiting the low intrinsic dimensionality of large PTMs. The experimental results show that black-box tuning with RoBERTa on a few labeled samples not only significantly outperforms manual prompts and GPT-3's in-context learning, but also surpasses the gradient-based counterparts, i.e., prompt tuning and full model tuning.
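The abstract sketches the core loop: instead of tuning the high-dimensional continuous prompt directly, a low-dimensional vector is optimized with a derivative-free method and projected into the prompt space through a fixed random matrix. Below is a minimal sketch of that idea, assuming illustrative dimensions and a hypothetical `query_api` stand-in (simulated here by a toy quadratic so the loop runs end to end); the paper uses CMA-ES as the optimizer, whereas this sketch substitutes a simpler (1+λ) random search.

```python
import numpy as np

# Illustrative sizes (not the paper's): 50 prompt tokens x 256-dim embeddings,
# optimized through a 200-dimensional random subspace.
D = 50 * 256   # dimensionality of the continuous prompt
d = 200        # low-dimensional subspace actually searched

rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0 / d, size=(D, d))  # fixed random projection, never trained

# Hypothetical stand-in for the hosted model's inference API. In LMaaS this would
# send the continuous prompt plus the task inputs to the service and return a loss
# on a few labeled samples; here it is a toy quadratic so the sketch is runnable.
_target = rng.normal(size=D)
def query_api(prompt_embedding: np.ndarray) -> float:
    return float(np.mean((prompt_embedding - _target) ** 2))

def loss(z: np.ndarray) -> float:
    # Project the low-dimensional candidate into the full prompt space, then query.
    return query_api(A @ z)

# Derivative-free search: a simple (1+lambda) random search standing in for CMA-ES.
z_best = np.zeros(d)
f_best = loss(z_best)
for _ in range(200):                                    # API-call budget
    candidates = z_best + 0.1 * rng.normal(size=(16, d))
    for z in candidates:
        f = loss(z)
        if f < f_best:
            z_best, f_best = z, f

final_prompt = A @ z_best  # continuous prompt to prepend to the input text
print(f"best surrogate loss: {f_best:.4f}")
```

The only quantities ever updated are the d entries of the low-dimensional vector; the projection matrix and the served model stay fixed, which is what makes the search feasible with gradient-free methods under a limited API-call budget.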
Author Information
Tianxiang Sun (Fudan University)
Tianxiang Sun is currently a Ph.D. student at Fudan University, working on natural language processing and machine learning.
Yunfan Shao (Fudan University)
Hong Qian (East China Normal University)
Xuanjing Huang (Fudan University)
Xuanjing Huang is a Professor in the School of Computer Science, Fudan University, Shanghai, China. Her research interests include artificial intelligence, natural language processing, information retrieval, and social media processing. She has published more than 100 papers in major computer science conferences and journals. She has also served as Program Co-Chair of EMNLP 2021, CCL 2019, CCL 2016, NLPCC 2017, and SMP 2015, as an organizer of WSDM 2015, and as Competition Chair of CIKM 2014.
Xipeng Qiu (Fudan University)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Black-Box Tuning for Language-Model-as-a-Service »
  Tue. Jul 19th through Wed. Jul 20th, Room Hall E #120
More from the Same Authors
- 2023 Poster: From Hypergraph Energy Functions to Hypergraph Neural Networks »
  Yuxin Wang · Quan Gan · Xipeng Qiu · Xuanjing Huang · David Wipf
- 2022: Incorporating Dynamic Structures into Pre-trained Language Models »
  Xuanjing Huang
- 2022 Poster: What Dense Graph Do You Need for Self-Attention? »
  Yuxin Wang · Chu-Tak Lee · Qipeng Guo · Zhangyue Yin · Yunhua Zhou · Xuanjing Huang · Xipeng Qiu
- 2022 Spotlight: What Dense Graph Do You Need for Self-Attention? »
  Yuxin Wang · Chu-Tak Lee · Qipeng Guo · Zhangyue Yin · Yunhua Zhou · Xuanjing Huang · Xipeng Qiu
- 2022 Poster: The Teaching Dimension of Regularized Kernel Learners »
  Hong Qian · Xu-Hui Liu · Chen-Xi Su · Aimin Zhou · Yang Yu
- 2022 Spotlight: The Teaching Dimension of Regularized Kernel Learners »
  Hong Qian · Xu-Hui Liu · Chen-Xi Su · Aimin Zhou · Yang Yu