Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-gated LLM
Luo Ji ⋅ Qi Qin ⋅ Ningyuan Xi ⋅ Teng Chen ⋅ Qingqing Gu ⋅ Hongyan Li
Abstract
Conventional LLMs may struggle with heterogeneous corpora and subtle changes in conditioning. While finetuning can cause the catastrophic forgetting issue, the application of meta-learning to LLMs is also limited by its complexity and poor scalability. In this paper, we activate the meta-signal $\beta$ within the SwiGLU blocks, resulting in a meta-gating mechanism that adaptively adjusts the nonlinearity of the FFN. A hypernetwork is employed to dynamically produce $\beta$ from textual conditions, providing meta-controllability over LLMs. Tested on different condition types such as task, domain, persona, and style, our method outperforms finetuning and meta-learning baselines, and generalizes reasonably to unseen tasks, condition types, and instructions. Our code can be found at https://anonymous.4open.science/r/MeGan-CAC0.
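To make the mechanism concrete, below is a minimal PyTorch sketch of a $\beta$-gated SwiGLU FFN block in which a hypernetwork maps a condition embedding to $\beta$. This is an illustrative reconstruction from the abstract alone, not the paper's actual implementation; the class name `MetaGatedSwiGLU`, the hypernetwork architecture, and the use of a pooled condition embedding `cond` are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MetaGatedSwiGLU(nn.Module):
    """SwiGLU FFN whose Swish nonlinearity is gated by a per-input beta
    produced by a hypernetwork (illustrative sketch, not the paper's code)."""

    def __init__(self, d_model: int, d_ff: int, d_cond: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)  # gate branch
        self.w_up = nn.Linear(d_model, d_ff, bias=False)    # value branch
        self.w_down = nn.Linear(d_ff, d_model, bias=False)  # output projection
        # Hypernetwork: maps a condition embedding (e.g. a pooled text
        # encoding of the task/domain/persona/style) to a scalar beta.
        self.hyper = nn.Sequential(
            nn.Linear(d_cond, d_cond),
            nn.ReLU(),
            nn.Linear(d_cond, 1),
        )

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); cond: (batch, d_cond)
        # softplus keeps beta > 0; beta controls the sharpness of
        # Swish_beta(z) = z * sigmoid(beta * z).
        beta = F.softplus(self.hyper(cond))          # (batch, 1)
        z = self.w_gate(x)                           # (batch, seq, d_ff)
        swish = z * torch.sigmoid(beta.unsqueeze(1) * z)  # beta-gated Swish
        return self.w_down(swish * self.w_up(x))


# Usage: the same block behaves differently under different conditions.
block = MetaGatedSwiGLU(d_model=64, d_ff=256, d_cond=32)
x = torch.randn(2, 10, 64)          # token hidden states
cond = torch.randn(2, 32)           # condition embeddings (e.g. persona text)
out = block(x, cond)                # (2, 10, 64)
```

As $\beta \to 0$ the gate flattens toward a linear scaling, while large $\beta$ pushes it toward a hard ReLU-like gate, so conditioning on $\beta$ lets the hypernetwork smoothly modulate the FFN's effective nonlinearity per input.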