TAMPO: Task- and Model-Aware Automatic Prompt Optimization for Robust and Controllable Auto-Routing in LLM-based Systems
Abstract
Automatic Prompt Optimization (APO) enables Large Language Models (LLMs) to adapt to specific tasks while minimizing manual engineering costs. However, existing APO approaches either rely solely on multi-round iterative procedures or use model-specific generators tailored to a single model and objective, so they are not readily applicable to auto-routing scenarios, which must operate over diverse LLMs and balance multiple, often competing, objectives. To address this issue, we propose TAMPO, a novel task- and model-aware APO framework for auto-routing in LLM-based systems. Specifically, to capture performance variation across a broad range of tasks and models, we construct a comprehensive heterogeneity-aware dataset for training an uncertainty-aware reward model. Serving as an offline proxy, this reward model substantially mitigates reward hacking, allowing TAMPO to learn an optimal multi-objective conditional policy for robust prompt generation. Conditioned on user requirements encoded in a preference vector, this policy enables flexible control over prompt generation and supports a cost-effective deployment strategy. Extensive experiments across 86 tasks demonstrate that TAMPO maintains stable performance across diverse tasks and models, providing a robust, controllable solution for auto-routing in LLM-based systems.
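As one illustrative reading of the preference-vector formulation (a sketch under assumed linear scalarization, not necessarily TAMPO's actual training objective), the multi-objective conditioning can be viewed as follows: given user weights $w = (w_1, \ldots, w_k)$ with $\sum_i w_i = 1$, the conditional policy $\pi(\text{prompt} \mid \text{task}, \text{model}, w)$ is optimized against the scalarized offline reward

$$R(\text{prompt}; w) = \sum_{i=1}^{k} w_i \, r_i(\text{prompt}),$$

where each $r_i$ scores one objective (e.g., task accuracy, inference cost, or cross-model stability) as estimated by the uncertainty-aware reward model. Varying $w$ at inference time is what yields the flexible, user-controllable trade-off described above.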