How Do Language Models Speak Languages? A Case Study on Unintended Code-Switching
Yuxin Xiao ⋅ Zhen Huang ⋅ Wenxiao Wang ⋅ Yan Zhao ⋅ Zhihong Gu ⋅ Binbin Lin ⋅ Xiaofei He ⋅ Xu Shen ⋅ Jieping Ye
Abstract
Unintended code-switching, the phenomenon in which an LLM unexpectedly switches languages, poses a fundamental challenge to the multilingual capabilities of LLMs. However, we still lack a mechanistic account of how this failure mode is implemented inside the model: which internal components (i.e., circuits) give rise to unintended code-switching, where they emerge across layers, and how we can intervene to mitigate it. In this work, we introduce a scalable circuit discovery framework that causally localizes multilingual neurons, describes their functional patterns, and further groups them into interpretable circuits, all without additional training or manual annotation. Our findings are twofold: a) The model's ``speaking-a-language'' circuit decomposes into a language regime (detecting and maintaining language identity) and a semantic regime (retrieving language-agnostic semantics). b) Unintended code-switching arises from a regime shift: the semantic regime suppresses the language regime and overwhelms the multilingual circuit, leading the model to speak in an unintended language. To validate these findings, we fine-tune the identified language sub-circuit, reducing the code-switching rate by $20.8\%$ with minimal parameter updates ($\sim0.019\%$ of all neurons). This work serves as a preliminary exploration of the multilingual generation mechanism and offers actionable insights for targeted training of multilingual LLMs.
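To make the idea of updating only an identified sub-circuit concrete, the sketch below shows one generic way such targeted fine-tuning can be set up: freezing all parameters and masking gradients so that only a pre-selected set of MLP neuron rows receives updates. This is a minimal illustration under assumed names (the `language_circuit` mapping, parameter names, and the row-per-neuron layout are hypothetical), not the authors' implementation.

```python
# Illustrative sketch: restrict fine-tuning to a small set of neurons
# identified beforehand (e.g., by a circuit-discovery procedure).
import torch
import torch.nn as nn


def freeze_all_but_neurons(model: nn.Module, circuit: dict[str, torch.Tensor]) -> None:
    """Allow gradient updates only for the neuron rows listed in `circuit`.

    `circuit` maps a parameter name to a 1-D tensor of neuron indices.
    Assumption: each neuron corresponds to one output row of its weight matrix.
    """
    for name, param in model.named_parameters():
        if name not in circuit:
            # Everything outside the sub-circuit stays frozen.
            param.requires_grad_(False)
            continue
        mask = torch.zeros_like(param)
        mask[circuit[name]] = 1.0  # keep only the selected neuron rows
        # Zero out gradients for all non-selected entries after backward().
        param.register_hook(lambda grad, m=mask: grad * m)


# Hypothetical usage: `language_circuit` would map parameter names such as
# "model.layers.10.mlp.up_proj.weight" to the indices of their language neurons;
# after calling freeze_all_but_neurons(model, language_circuit), a standard
# fine-tuning loop updates only that tiny fraction of the model's parameters.
```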