

Poster in Workshop: ES-FoMo: Efficient Systems for Foundation Models

Less is More: Using Multiple LLMs for Applications with Lower Costs

Lingjiao Chen · Matei Zaharia · James Zou


Abstract:

Large language models (LLMs) are increasingly queried for a fee, and the associated costs vary significantly. This study investigates the pricing structures of popular LLM APIs, such as GPT-4, ChatGPT, and J1-Jumbo, revealing substantial fee differences. To mitigate the expense of using LLMs on large collections of queries and text, we propose three strategies: prompt adaptation, LLM approximation, and LLM cascade. We present FrugalGPT, an adaptable LLM cascade that intelligently selects LLM combinations to reduce costs by up to 98% while matching or improving the accuracy of individual LLMs. This work establishes a foundation for sustainable and efficient LLM utilization, offering valuable insights and practical techniques for users.
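To make the cascade strategy concrete, here is a minimal sketch in Python. It assumes a list of models ordered from cheapest to most expensive and a hypothetical `score` function that estimates answer reliability; FrugalGPT learns such a scorer and the model ordering, whereas here both are placeholders supplied by the caller.

```python
from typing import Callable, List, Tuple

# A cascade tries LLMs from cheapest to most expensive and stops as soon
# as a quality scorer judges the current answer reliable enough. The model
# list, scorer, and threshold below are illustrative placeholders, not the
# trained components described in the paper.

Model = Tuple[str, float, Callable[[str], str]]  # (name, cost per query, query function)


def llm_cascade(
    prompt: str,
    models: List[Model],                  # ordered cheapest -> most expensive
    score: Callable[[str, str], float],   # estimates answer reliability in [0, 1]
    threshold: float = 0.9,
) -> Tuple[str, float]:
    """Return the first answer whose score clears the threshold, plus total cost."""
    total_cost = 0.0
    answer = ""
    for name, cost, query in models:
        answer = query(prompt)
        total_cost += cost
        if score(prompt, answer) >= threshold:
            # A cheaper model's answer was judged good enough; stop early.
            return answer, total_cost
    # No answer cleared the threshold; fall back to the last (most capable) model.
    return answer, total_cost
```

The cost savings come from the early exit: queries that a cheap model handles well never reach the expensive model, and the threshold trades off cost against accuracy.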
