SMILE: Extended Deep Submodular Function-Based Instruction and In-context Learning Demonstration Selection
Abstract
Prompt optimization is a key way to steer large language models when fine-tuning is impractical. However, instruction optimization (IO) and in-context learning (ICL) demonstration selection are often optimized separately and combined post hoc, implicitly assuming that a "best" instruction and a "best" demonstration set compose well. In practice, their interactions are strong, making such decoupled pipelines brittle. We propose SMILE, an efficient method that jointly selects instructions and demonstrations. Our key observation is that ICL performance exhibits consistent diminishing returns across diverse instructions. Leveraging this structure, SMILE learns an instruction-conditioned surrogate aligned with LLM feedback and instantiates it as an Extended Deep Submodular Function that captures sample--sample coverage, sample--query relevance, and sample--instruction compatibility. SMILE then performs greedy, query-adaptive selection of the instruction--demonstration pair. Experiments on six datasets and multiple LLM backbones show that SMILE consistently outperforms IO-only, ICL-only, and existing joint baselines, supporting a context engineering view of prompting: jointly optimizing interacting components rather than tuning them in isolation.
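To make the joint-selection idea concrete, the following is a minimal sketch, not the paper's actual model: for each candidate instruction, demonstrations are picked greedily under a toy monotone submodular surrogate (weighted coverage, where the weights stand in for sample--instruction compatibility), and the best-scoring instruction--demonstration pair is kept. All names, demos, and weights are illustrative assumptions.

```python
# Hypothetical sketch of SMILE-style joint selection (not the paper's code).
# Each demo is a frozenset of "skills"; each instruction assigns weights to
# skills, standing in for sample--instruction compatibility.

def coverage(selected, weights):
    # Toy monotone submodular surrogate: weighted coverage of skills.
    covered = set().union(*selected) if selected else set()
    return sum(weights.get(skill, 0.0) for skill in covered)

def greedy_select(demos, weights, k):
    # Greedy maximization: repeatedly add the demo with largest marginal gain.
    selected, remaining = [], list(demos)
    for _ in range(min(k, len(remaining))):
        best = max(
            remaining,
            key=lambda d: coverage(selected + [d], weights)
                          - coverage(selected, weights),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

def joint_select(instructions, demos, k):
    # Jointly choose the instruction-demonstration pair, rather than
    # optimizing each component in isolation.
    best = None
    for name, weights in instructions.items():
        sel = greedy_select(demos, weights, k)
        score = coverage(sel, weights)
        if best is None or score > best[0]:
            best = (score, name, sel)
    return best[1], best[2]

# Illustrative toy data (assumed, not from the paper).
demos = [
    frozenset({"math", "steps"}),
    frozenset({"math"}),
    frozenset({"format", "steps"}),
]
instructions = {
    "cot": {"math": 1.0, "steps": 1.0},
    "terse": {"format": 1.0, "math": 0.5},
}
inst, sel = joint_select(instructions, demos, k=2)
```

Because the surrogate is monotone and submodular (marginal coverage gains shrink as the selected set grows), the greedy inner loop enjoys the classical $(1 - 1/e)$ approximation guarantee, which is what makes query-adaptive selection cheap at inference time.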