

Poster

Isolated Causal Effects of Natural Language

Victoria Lin · Louis-Philippe Morency · Eli Ben-Michael

East Exhibition Hall A-B #E-1906
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

As language technologies become widespread, it is important to understand how changes in language affect reader perceptions and behaviors. These relationships may be formalized as the isolated causal effect of some focal language-encoded intervention (e.g., factual inaccuracies) on an external outcome (e.g., readers' beliefs). In this paper, we introduce a formal estimation framework for isolated causal effects of language. We show that a core challenge of estimating isolated effects is the need to approximate all non-focal language outside of the intervention. Drawing on the principle of omitted variable bias, we provide measures for evaluating the quality of both non-focal language approximations and isolated effect estimates themselves. We find that poor approximation of non-focal language can lead to bias in the corresponding isolated effect estimates due to omission of relevant variables, and we show how to assess the sensitivity of effect estimates to such bias along the two key axes of fidelity and overlap. In experiments on semi-synthetic and real-world data, we validate the ability of our framework to correctly recover isolated effects and demonstrate the utility of our proposed measures.
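The omitted variable bias principle named in the abstract can be illustrated with a small simulation. The sketch below is a textbook linear example, not the estimator from the paper: it assumes a single numeric non-focal feature `z`, a continuous focal intervention `t` correlated with it, and a linear outcome `y`. The function names and the parameters `tau`, `gamma`, and `rho` are hypothetical and chosen only for illustration.

```python
import random


def simulate(n=20000, seed=0, tau=1.0, gamma=2.0, rho=0.5):
    """Draw (t, z, y) rows from an assumed linear model.

    z: stand-in for non-focal language (hypothetical single feature)
    t: focal intervention, correlated with z through rho
    y: outcome, with true isolated effect tau of t and effect gamma of z
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        t = rho * z + rng.gauss(0, 1)
        y = tau * t + gamma * z + rng.gauss(0, 1)
        rows.append((t, z, y))
    return rows


def _centered_sums(rows):
    """Return centered sums of squares and cross-products for t, z, y."""
    n = len(rows)
    mt = sum(r[0] for r in rows) / n
    mz = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    s_tt = sum((t - mt) ** 2 for t, _, _ in rows)
    s_zz = sum((z - mz) ** 2 for _, z, _ in rows)
    s_tz = sum((t - mt) * (z - mz) for t, z, _ in rows)
    s_ty = sum((t - mt) * (y - my) for t, _, y in rows)
    s_zy = sum((z - mz) * (y - my) for _, z, y in rows)
    return s_tt, s_zz, s_tz, s_ty, s_zy


def short_slope(rows):
    """OLS slope of y on t alone, i.e. with z omitted."""
    s_tt, _, _, s_ty, _ = _centered_sums(rows)
    return s_ty / s_tt


def long_slope(rows):
    """OLS coefficient on t when z is also included as a regressor."""
    s_tt, s_zz, s_tz, s_ty, s_zy = _centered_sums(rows)
    return (s_zz * s_ty - s_tz * s_zy) / (s_tt * s_zz - s_tz ** 2)


if __name__ == "__main__":
    rows = simulate()
    print("z omitted: ", round(short_slope(rows), 3))
    print("z included:", round(long_slope(rows), 3))
```

Under these assumed parameters, the population slope with `z` omitted is `tau + gamma * rho / (rho**2 + 1) = 1.8`, while the true effect `tau` is 1.0. The gap between the two is the bias from leaving out a relevant variable.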

Lay Summary:

Science often seeks to understand whether one variable (an intervention) causes another (an outcome), like whether a new medication actually improves health or whether a certain type of conversation affects mood. This process is called causal inference, which helps separate real cause-and-effect relationships from coincidences.

Scientists have developed many methods to study cause and effect in structured data settings like numbers in a table. However, the growing availability of language data, such as experiences that people share online or in text conversations, opens new opportunities for discovery. For example, if thousands of people mention feeling better after taking a specific medicine, we might wonder whether the medicine is truly responsible or whether other factors are at play.

A major challenge with language is that words naturally convey multiple ideas. Someone discussing blood pressure medication may also mention changes to diet and exercise, making it hard to pinpoint which of the three is actually causing the improvement. We address this issue with a new causal inference method that is able to fully isolate the effect of an intervention expressed in language, ensuring conclusions are scientifically sound.
