PrivGate: Steering Contextual Integrity in LLMs via Latent Space Geometry
Abstract
Preserving Contextual Integrity (CI) is critical for privacy-preserving execution of Large Language Model (LLM) agents. However, existing agents struggle to balance the speed of direct generation against the prohibitive latency of CI-constrained reasoning. To address this, we propose PrivGate, a framework that selectively invokes explicit reasoning based on the model's internal privacy signals. Our approach is grounded in the discovery of a Privacy Manifold: models linearly encode privacy sensitivity within their residual streams, even while producing non-compliant generations. Leveraging this structure, PrivGate employs Latent Gating, a training-free mechanism that triggers explicit reasoning only when high latent risk is detected, minimizing unnecessary compute and thereby improving the efficiency-privacy trade-off. On the real-world PrivacyLens benchmark, PrivGate achieves an out-of-distribution AUROC of 0.97 in risk identification, indicating that the discovered manifold generalizes beyond its training distribution. End-to-end evaluations show that PrivGate reduces privacy leakage by 70% with less than 5% compute overhead, offering a practical path to reconciling rigorous CI requirements with the performance demands of LLM agents.
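To make Latent Gating concrete, the sketch below illustrates the mechanism described above under stated assumptions: a linear probe over residual-stream activations whose projection score gates between fast direct generation and explicit CI-constrained reasoning. All identifiers (`fit_privacy_direction`, `RISK_THRESHOLD`, the difference-of-means probe, and the synthetic activations) are hypothetical illustrations, not PrivGate's actual implementation.

```python
import numpy as np

# --- Hypothetical setup: all names and values below are illustrative, ---
# --- not taken from the PrivGate paper.                               ---
RISK_THRESHOLD = 0.0   # gate threshold on the probe score (tuned on validation data)
HIDDEN_DIM = 4096      # residual-stream width of the host LLM (assumed)

def fit_privacy_direction(acts: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Fit a linear 'privacy direction' as the difference of class means,
    a common training-light probe for linearly encoded features (assumption:
    the paper may use a different probe)."""
    mu_risky = acts[labels == 1].mean(axis=0)
    mu_safe = acts[labels == 0].mean(axis=0)
    direction = mu_risky - mu_safe
    return direction / np.linalg.norm(direction)

def latent_risk_score(activation: np.ndarray, direction: np.ndarray) -> float:
    """Project a residual-stream activation onto the privacy direction."""
    return float(activation @ direction)

def generate_with_latent_gating(activation, direction, fast_generate, ci_reasoned_generate):
    """Training-free gate: answer directly unless latent risk is high,
    in which case fall back to explicit CI-constrained reasoning."""
    if latent_risk_score(activation, direction) > RISK_THRESHOLD:
        return ci_reasoned_generate()   # slow path: explicit privacy reasoning
    return fast_generate()              # fast path: direct generation

# Toy demonstration with synthetic activations (no real LLM involved).
rng = np.random.default_rng(0)
true_dir = rng.normal(size=HIDDEN_DIM)
safe = rng.normal(size=(100, HIDDEN_DIM))
risky = rng.normal(size=(100, HIDDEN_DIM)) + 0.5 * true_dir
acts = np.vstack([safe, risky])
labels = np.concatenate([np.zeros(100), np.ones(100)])

direction = fit_privacy_direction(acts, labels)
print(generate_with_latent_gating(risky[0], direction,
                                  lambda: "direct answer",
                                  lambda: "CI-reasoned answer"))
```

In this sketch, `RISK_THRESHOLD` directly controls the efficiency-privacy trade-off: raising it routes fewer queries through the slow CI-reasoning path, mirroring the gating behavior the abstract describes.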