Poster
in
Workshop: 2nd Workshop on Formal Verification of Machine Learning
Meaning in Language Models: A Formal Semantics Approach
Charles Jin · Martin Rinard
We present a framework for studying the emergence of meaning in language models based on the formal semantics of programs. Working with programs enables us to precisely define concepts relevant to meaning in language (e.g., correctness and semantics), making this domain well-suited as an intermediate testbed for characterizing the presence (or absence) of meaning in language models. Specifically, we first train a Transformer model on a corpus of programs, then probe the trained model's hidden states as it completes a program given a specification. Our findings include evidence that (1) the model's hidden states linearly encode an abstraction of the program semantics, (2) such encodings emerge nearly in lockstep with the model's ability to generate correct code during training, and (3) the model learns to generate correct programs that are, on average, shorter than those in the training set. In summary, this paper does not propose any new techniques for improving language models, but develops an experimental framework for, and provides insights into, the acquisition and representation of (formal) meaning in language models.
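The linear probing described above can be illustrated with a minimal sketch. The data below is synthetic: the actual programs, Transformer, and semantic labels from the paper are replaced by hypothetical hidden-state vectors whose classes are separable, and the probe is fit by least squares against one-hot targets.

```python
# Minimal linear-probe sketch (synthetic stand-in for the paper's setup).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden states: n examples of dimension d, where each
# semantic class occupies a different region of activation space.
n, d, n_classes = 300, 16, 3
labels = rng.integers(0, n_classes, size=n)
centers = rng.normal(size=(n_classes, d))
hidden = centers[labels] + 0.1 * rng.normal(size=(n, d))

# A linear probe is a linear map from hidden states to class scores;
# here it is fit by least squares against one-hot class targets.
onehot = np.eye(n_classes)[labels]
X = np.hstack([hidden, np.ones((n, 1))])  # append a bias column
W, *_ = np.linalg.lstsq(X, onehot, rcond=None)

# High probe accuracy indicates the states linearly encode the labels.
pred = np.argmax(X @ W, axis=1)
accuracy = (pred == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

In the paper's setting, the probe's accuracy at recovering an abstraction of the program state from the model's hidden activations is the evidence for finding (1).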