SLIM: Secure and Efficient Inference for Large Language Models on Untrusted Devices via TEEs
Wei Wang ⋅ Zihao Guan ⋅ Xing Zhou ⋅ Yan Ding ⋅ Yusong Tan ⋅ Jie Yu ⋅ Bao Li
Abstract
Deploying large language models (LLMs) on untrusted hardware entails a risk of weight extraction, which can lead to unauthorized replication and misuse of the model. A practical approach leverages Trusted Execution Environments (TEEs) and protects model confidentiality by obfuscating the model weights. However, existing obfuscation schemes struggle to provide strong security guarantees and high performance simultaneously: schemes with security guarantees incur substantial overhead due to frequent TEE interactions, whereas schemes that achieve efficient inference are insecure. We propose SLIM, a secure inference framework that exploits the iterative structure of LLMs to let transformed representations cascade through consecutive obfuscated layers, thereby minimizing interactions with the TEE. SLIM introduces a T-Way Mixing algorithm that performs consecutive inter-vector covering using carefully constructed block-diagonal Householder matrices combined with successive random permutations, providing thorough weight obfuscation while keeping TEE-side computation lightweight. Evaluations demonstrate that SLIM provides robust security guarantees and significantly outperforms prior state-of-the-art obfuscation schemes, delivering up to a $13.80\times$ speedup while preserving inference fidelity.
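The abstract's core primitive can be illustrated with a minimal sketch: because a block-diagonal Householder matrix and a permutation matrix are both orthogonal, a layer's weights can be stored in obfuscated form while the device still computes the correct output on a correspondingly transformed input. The sketch below is an illustration of this orthogonal-transform principle under assumed dimensions, not SLIM's actual T-Way Mixing algorithm; all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def householder(d):
    # H = I - 2 v v^T / (v^T v): an orthogonal reflection matrix
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    return np.eye(d) - 2.0 * np.outer(v, v)

def block_diag_householder(dim, t):
    # Block-diagonal matrix built from t Householder blocks
    # (assumes dim is divisible by t); still orthogonal overall.
    b = dim // t
    M = np.zeros((dim, dim))
    for i in range(t):
        M[i*b:(i+1)*b, i*b:(i+1)*b] = householder(b)
    return M

dim = 8
W = rng.standard_normal((dim, dim))   # a layer's weight matrix (toy size)
Q = block_diag_householder(dim, t=2)
P = np.eye(dim)[rng.permutation(dim)] # random permutation matrix

T = P @ Q                             # combined secret transform (orthogonal)
W_obf = W @ T.T                       # obfuscated weights stored on the device

x = rng.standard_normal(dim)
y_plain = W @ x                       # ground-truth layer output
y_obf = W_obf @ (T @ x)               # device computes on transformed input
assert np.allclose(y_plain, y_obf)    # identical, since T.T @ T = I
```

Because the transformed representation `T @ x` is itself a valid input to the next obfuscated layer, such transforms can cascade through consecutive layers, which is what lets a scheme of this kind avoid a TEE round-trip per layer.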