Poster in Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models

Training-Free Semantic Deferrals for Open-Ended LLM Cascades
Duncan Soiffer · Steven Kolawole · Virginia Smith
Abstract:

Existing cascade systems struggle with open-ended text generation, where evaluation is difficult because many valid outputs exist and no ground-truth references are available. We propose using semantic agreement between multiple model outputs as a training-free deferral signal, and we evaluate semantic similarity metrics against token-level confidence across translation, summarization, question answering, and reading comprehension tasks. We show that semantic signals indicate when deferral is appropriate more reliably than token-level methods and are resilient to heterogeneous model quality.
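The deferral rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it samples several outputs from the smaller model, scores their mutual agreement, and defers to the larger model when agreement is low. The `jaccard` word-overlap score is a toy stand-in for a real semantic similarity metric (the paper evaluates proper semantic metrics), and `should_defer` and its threshold are hypothetical names chosen for this sketch.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    # Toy stand-in for a semantic similarity metric; a real system
    # would use e.g. embedding cosine similarity between outputs.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def should_defer(samples: list[str], threshold: float = 0.5) -> bool:
    """Defer to the larger model when the small model's sampled
    outputs disagree (mean pairwise similarity below threshold)."""
    sims = [jaccard(a, b) for a, b in combinations(samples, 2)]
    agreement = sum(sims) / len(sims)
    return agreement < threshold

# Mutually consistent samples -> keep the small model's answer.
consistent = ["the cat sat on the mat",
              "the cat sat on a mat",
              "the cat sat on the mat"]
# Divergent samples -> low agreement, defer to the larger model.
divergent = ["the cat sat on the mat",
             "dogs chase red balls",
             "rain falls in spring"]
print(should_defer(consistent))  # False
print(should_defer(divergent))   # True
```

The key property, as the abstract notes, is that this signal is training-free: it needs no ground-truth references or learned router, only agreement among sampled outputs.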