Poster in Affinity Event: LatinX in AI (LXAI) Research at ICML 2021

Community pooling: LDA topic modeling in Twitter

Federico Albanese

Abstract

Social networks play a fundamental role in the propagation of information and news. Characterizing the content of messages is therefore vital for tasks such as fake news detection and personalized message recommendation. However, Twitter posts are short and often less coherent than other text documents, which makes it challenging to apply text mining algorithms effectively. We propose a new pooling scheme for topic modeling in Twitter that groups tweets into a single document when their authors belong to the same community in the retweet network. Our findings contribute to an improved methodology for identifying the latent topics in a Twitter dataset without modifying the basic machinery of a topic decomposition model. In particular, we used Latent Dirichlet Allocation (LDA) and showed empirically that this novel method achieves better results than previous pooling methods in terms of cluster quality, document retrieval, supervised machine learning classification, and overall run time.
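
The pooling step described above can be illustrated with a minimal Python sketch. It assumes that each author has already been assigned to a community of the retweet network by some community detection method, which the abstract does not name. The function `community_pooling`, the `author_community` mapping, the toy tweets, and the fallback for authors without a community are all hypothetical choices made for illustration, not the author's implementation.

```python
from collections import defaultdict


def community_pooling(tweets, author_community):
    """Merge tweets into one document per retweet-network community.

    tweets: iterable of (author, text) pairs.
    author_community: dict mapping author -> community id, assumed to be
        precomputed on the retweet network (the method is not specified here).
    """
    pooled = defaultdict(list)
    for author, text in tweets:
        # Assumption: an author with no known community gets a document
        # of their own, keyed by ("author", name).
        key = author_community.get(author, ("author", author))
        pooled[key].append(text)
    # Concatenate the tweets of each group into a single pooled document.
    return {key: " ".join(texts) for key, texts in pooled.items()}


# Toy data, invented for this example.
tweets = [
    ("alice", "new vaccine trial results"),
    ("bob", "vaccine rollout starts monday"),
    ("carol", "derby final tonight"),
    ("dave", "what a goal in the derby"),
    ("erin", "hello world"),
]
author_community = {"alice": 0, "bob": 0, "carol": 1, "dave": 1}

documents = community_pooling(tweets, author_community)
```

The pooled documents can then be passed to any standard LDA implementation in place of the individual tweets, which is consistent with the abstract's point that the topic model itself is left unchanged.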
