Workshop: ES-FoMo: Efficient Systems for Foundation Models

Less is More: Using Multiple LLMs for Applications with Lower Costs

Lingjiao Chen · Matei Zaharia · James Zou


Large language models (LLMs) are increasingly used to answer queries, but their costs vary significantly. This study examines the pricing structures of popular LLM APIs, such as GPT-4, ChatGPT, and J1-Jumbo, and finds substantial differences in fees. To reduce the expense of running LLMs over large volumes of queries and text, we propose three strategies: prompt adaptation, LLM approximation, and LLM cascade. We present FrugalGPT, an adaptable LLM cascade that selects which combinations of LLMs to use, reducing costs by up to 98% while matching or improving the accuracy of individual LLMs. This work lays a foundation for sustainable and efficient LLM use and offers practical techniques for users.
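The LLM cascade strategy can be illustrated with a minimal sketch. It assumes a fixed ordering of models from cheapest to most expensive, a per-model acceptance threshold, and a scoring function that rates how reliable an answer looks. The tier names, prices, toy model functions, and scorer below are all hypothetical stand-ins for real API calls. FrugalGPT is described as selecting the LLM combination adaptively, whereas this sketch fixes the order and thresholds by hand. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    """One model in the cascade (hypothetical stand-in for an LLM API)."""
    name: str
    cost_per_query: float            # hypothetical relative price per call
    answer: Callable[[str], str]     # hypothetical model: query -> answer
    threshold: float                 # accept this tier's answer if score >= threshold


def cascade(query: str,
            tiers: List[Tier],
            score: Callable[[str, str], float]) -> Tuple[str, float, str]:
    """Query tiers from cheapest to most expensive and stop at the first
    answer the scorer accepts. Returns (answer, total cost, tier used)."""
    if not tiers:
        raise ValueError("cascade needs at least one tier")
    total_cost = 0.0
    for i, tier in enumerate(tiers):
        answer = tier.answer(query)
        total_cost += tier.cost_per_query   # every model that is called is paid for
        is_last = i == len(tiers) - 1
        # The final tier's answer is returned unconditionally.
        if is_last or score(query, answer) >= tier.threshold:
            return answer, total_cost, tier.name
    raise AssertionError("unreachable")


# Toy usage (hypothetical models): a cheap model that only knows "2+2"
# and an expensive model that also knows a geography fact.
small = Tier("small-llm", 1.0, lambda q: "4" if q == "2+2" else "unsure", 0.8)
large = Tier("large-llm", 50.0, lambda q: "Paris" if "France" in q else "4", 0.0)


def trust(query: str, answer: str) -> float:
    """Hypothetical scorer: distrust the literal answer 'unsure'."""
    return 0.0 if answer == "unsure" else 0.9


print(cascade("2+2", [small, large], trust))                 # ('4', 1.0, 'small-llm')
print(cascade("Capital of France?", [small, large], trust))  # ('Paris', 51.0, 'large-llm')
```

The expensive model is paid for only when the cheaper model's answer is rejected, so the average cost per query depends on how often the scorer accepts an early answer.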

