

Poster

Stealing part of a production language model

Nicholas Carlini · Daniel Paleka · Krishnamurthy Dvijotham · Thomas Steinke · Jonathan Hayase · A. Feder Cooper · Katherine Lee · Matthew Jagielski · Milad Nasr · Arthur Conmy · Eric Wallace · David Rolnick · Florian Tramer

Hall C 4-9 #2308
Best Paper
[ Paper PDF ]

Abstract: We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack.
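The hidden-dimension recovery rests on a simple linear-algebra observation: the model's output logits are a linear projection of an h-dimensional hidden state, so logit vectors collected across many prompts span a subspace of rank at most h, and that rank reveals the hidden dimension. Below is a minimal sketch of this idea in numpy; `get_logits` is a hypothetical stand-in for an API client (the paper reconstructs full logit vectors from restricted outputs such as top-k logprobs and logit-bias queries rather than reading them directly).

```python
import numpy as np

def get_logits(prompt: str) -> np.ndarray:
    """Hypothetical helper: return the model's full logit vector
    (length = vocabulary size V) for `prompt`. A real attack must
    reconstruct this from restricted API outputs."""
    raise NotImplementedError

def estimate_hidden_dim(prompts: list[str]) -> int:
    """Estimate the hidden dimension h from n > h logit vectors."""
    # Each logit vector is W @ h(x) for a fixed V x h projection
    # matrix W, so the rows of Q lie in an h-dimensional subspace.
    Q = np.stack([get_logits(p) for p in prompts])  # shape (n, V)
    s = np.linalg.svd(Q, compute_uv=False)          # singular values, descending
    # The rank shows up as a sharp drop in the singular values;
    # count the ones above a numerical-noise threshold.
    return int(np.sum(s > s[0] * 1e-6))
```

Once h is known, a singular value decomposition of the same query matrix also yields the projection matrix up to an invertible h-by-h linear transformation, which is the "up to symmetries" caveat in the abstract.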
