Gaming Consensus: Coordinated Manipulation in Crowdsourced Fact-Checking
Abstract
Crowdsourced fact-checking systems have been widely adopted by major social media companies such as X, Meta, TikTok, and Google with the aim of combating misinformation at scale without relying on centralized editorial control. These systems share a common underlying algorithm: a bridging mechanism, based on matrix factorization, that surfaces notes indicating misinformation only when they receive support from diverse ideological groups rather than a simple majority. Although this algorithm is designed to be robust against traditional brigading, we demonstrate an attack in which coordinated users strategically fabricate diverse agreement in the system's latent space to manipulate the scoring algorithm. We validate this attack on real-world production data and find that the scores of a surprisingly large number of notes can be manipulated with a small number (< 10) of coordinated votes, raising the risk that adversaries could surface arbitrary notes on these platforms. We complement these findings with a theoretical analysis of voting strategies that surface arbitrary notes, revealing counterintuitive properties of the system: for instance, rating a note as "Not Helpful" can increase its helpfulness score. Finally, we develop a cost model quantifying manipulation effort and discuss potential mitigations. Following a responsible disclosure process, X's Community Notes team acknowledged this attack and has deployed mitigations based on our findings. We hope this work spurs further research into the robustness of crowdsourced fact-checking systems and, more broadly, of bridging-based consensus mechanisms.
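The bridging mechanism referenced in the abstract can be sketched as follows. This is a minimal illustration, not the production Community Notes scorer: the model form r_un ≈ μ + b_u + b_n + f_u·f_n and the use of the note intercept b_n as the helpfulness score follow the publicly documented Community Notes algorithm, but the data, dimensions, and hyperparameters below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 6 users: 3 in ideological group "+", 3 in group "-". 2 notes.
# Ratings: 1.0 = Helpful, 0.0 = Not Helpful.
#                  note_A  note_B
ratings = np.array([
    [1.0, 1.0],  # group "+" users rate both notes Helpful
    [1.0, 1.0],
    [1.0, 1.0],
    [1.0, 0.0],  # group "-" users rate note_A Helpful, note_B Not Helpful
    [1.0, 0.0],
    [1.0, 0.0],
])

n_users, n_notes = ratings.shape
mu = 0.0                                # global intercept
b_u = np.zeros(n_users)                 # user intercepts
b_n = np.zeros(n_notes)                 # note intercepts (the helpfulness score)
f_u = rng.normal(0.0, 0.1, n_users)     # 1-D user factors (ideological axis)
f_n = rng.normal(0.0, 0.1, n_notes)     # 1-D note factors

lr, lam = 0.05, 0.03                    # learning rate, L2 regularization
for _ in range(2000):
    # Predicted rating: mu + b_u + b_n + f_u * f_n
    pred = mu + b_u[:, None] + b_n[None, :] + np.outer(f_u, f_n)
    err = pred - ratings
    # Gradient steps on squared error with L2 penalties
    mu -= lr * err.mean()
    b_u -= lr * (err.mean(axis=1) + lam * b_u)
    b_n -= lr * (err.mean(axis=0) + lam * b_n)
    f_u -= lr * ((err * f_n[None, :]).mean(axis=1) + lam * f_u)
    f_n -= lr * ((err * f_u[:, None]).mean(axis=0) + lam * f_n)

# note_A, supported across the ideological divide, earns the higher intercept;
# note_B's one-sided support is absorbed by the factor term instead.
print(b_n)
```

This toy fit shows why bridging resists simple brigading: polarized support is explained away by the latent-factor term f_u·f_n, so only cross-partisan agreement raises the note intercept that the platform thresholds on. The attack in this paper targets exactly that latent space, fabricating the appearance of diverse agreement.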