Poster in Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models

Training-Free Semantic Deferrals for Open-Ended LLM Cascades
Duncan Soiffer · Steven Kolawole · Virginia Smith
Abstract:

Existing cascade systems struggle with open-ended text generation, where evaluation is difficult because many valid outputs exist and no ground-truth references are available. We propose using semantic agreement between multiple model outputs as a training-free deferral signal, and we evaluate semantic similarity metrics against token-level confidence across translation, summarization, question answering, and reading comprehension tasks. We show that semantic signals indicate when deferral is appropriate more reliably than token-level methods and are resilient to heterogeneous model quality.
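The deferral rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it samples several outputs from the smaller model, scores their mutual agreement, and defers to the larger model when agreement is low. The `jaccard` word-overlap score is a toy stand-in for a real semantic similarity metric (the paper evaluates proper semantic metrics), and `should_defer` and its threshold are hypothetical names chosen for this sketch.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    # Toy stand-in for a semantic similarity metric; a real system
    # would use e.g. embedding cosine similarity between outputs.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def should_defer(samples: list[str], threshold: float = 0.5) -> bool:
    """Defer to the larger model when the small model's sampled
    outputs disagree (mean pairwise similarity below threshold)."""
    sims = [jaccard(a, b) for a, b in combinations(samples, 2)]
    agreement = sum(sims) / len(sims)
    return agreement < threshold

# Mutually consistent samples -> keep the small model's answer.
consistent = ["the cat sat on the mat",
              "the cat sat on a mat",
              "the cat sat on the mat"]
# Divergent samples -> low agreement, defer to the larger model.
divergent = ["the cat sat on the mat",
             "dogs chase red balls",
             "rain falls in spring"]
print(should_defer(consistent))  # False
print(should_defer(divergent))   # True
```

The key property, as the abstract notes, is that this signal is training-free: it needs no ground-truth references or learned router, only agreement among sampled outputs.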