Skip to yearly menu bar Skip to main content

Workshop: ES-FoMo: Efficient Systems for Foundation Models

On Robustness-Accuracy Characterization of Large Language Models using Synthetic Datasets

Ching-Yun (Irene) Ko · Pin-Yu Chen · Payel Das · Yung-Sung Chuang · Luca Daniel


Despite the impressive capability of large language models (LLMs) in solving different downstream tasks, new concerns about proper performance evaluation have been raised, especially for test-data leakage caused by accidentally including them during pretraining, or by indirectly exposing them through API calls for evaluation. Motivated by these, in this paper, we propose a new evaluation workflow that generates steerable synthetic language datasets and proxy tasks for benchmarking the performance of pertained LLMs on sentence classification tasks. This approach allows for better characterization of the joint analysis on the robustness and accuracy of LLMs without risking sensitive information leakage. Verified on various pretrained LLMs, the proposed approach demonstrates promising high correlation with real downstream performance.

Chat is not available.