On Robustness-Accuracy Characterization of Large Language Models using Synthetic Datasets
Ching-Yun (Irene) Ko · Pin-Yu Chen · Payel Das · Yung-Sung Chuang · Luca Daniel

Despite the impressive capability of large language models (LLMs) in solving a variety of downstream tasks, new concerns have been raised about proper performance evaluation, especially test-data leakage caused by accidentally including test data during pretraining or by indirectly exposing it through API calls for evaluation. Motivated by these concerns, we propose a new evaluation workflow that generates steerable synthetic language datasets and proxy tasks for benchmarking the performance of pretrained LLMs on sentence classification tasks. This approach allows for a better joint characterization of the robustness and accuracy of LLMs without risking sensitive information leakage. Verified on various pretrained LLMs, the proposed approach demonstrates a promisingly high correlation with real downstream performance.

Author Information

Ching-Yun (Irene) Ko (MIT)
Pin-Yu Chen (IBM Research AI)
Payel Das (IBM Research AI)
Yung-Sung Chuang (MIT CSAIL)

Hi! I'm a second-year PhD student in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, where I work with Jim Glass. My research interests broadly cover deep learning techniques for natural language processing and speech processing. In particular, I aim to utilize the ability of machines to help people efficiently grasp large amounts of information in text/audio form. Previously, I was an undergraduate student in Electrical Engineering at National Taiwan University. I joined the Speech Processing Lab, supervised by Hung-Yi Lee and Lin-shan Lee, and the Machine Intelligence Understanding Lab, supervised by Yun-Nung (Vivian) Chen. I received the NTU Presidential Award for top 5% students four times in 2018-2020 and the Irving T. Ho Memorial Scholarship in 2018 and 2019. Here is my Curriculum Vitae.

Luca Daniel (Massachusetts Institute of Technology)

More from the Same Authors