Surely You’re Lying, Mr. Model: Improving and Analyzing CCS
Naomi Bashkansky · Chloe Loughridge · Chuyue Tang
Abstract
Contrast-Consistent Search (CCS; Burns et al., 2022) is a method for eliciting latent knowledge without supervision. In this paper, we explore several directions for improving CCS. We use conjunctive logic to make CCS fully unsupervised. We investigate which factors contribute to CCS’s poor performance on autoregressive models. Replicating Belrose & Mallen (2023), we improve CCS’s performance on autoregressive models and study the effect of multi-shot context. Finally, replicating Halawi et al. (2023), we add early-exit baselines to the original CCS experiments to better characterize where CCS techniques add value.