Poster in Workshop: DMLR Workshop: Data-centric Machine Learning Research
Predicting Article Time Periods with Text2Time: A Transformer-based Approach
KARTHICK GUNASEKARAN
The prediction of the publication period of textual documents, such as news articles, represents a significant and relatively understudied problem within the realm of natural language processing. Determining the year in which a news article was published holds relevance in various domains, including historical research, sentiment analysis, and media monitoring. In this research, our focus is on investigating the prediction of publication periods specifically for news articles, leveraging their textual content. To tackle this challenge, we curated an extensive labeled dataset consisting of over 350,000 news articles published by The New York Times over a span of six decades. This dataset forms the foundation of our investigation. Our approach involves utilizing a pretrained BERT model that has been fine-tuned for the task of text classification, specifically tailored for time period prediction. The performance of our model surpasses our initial expectations, demonstrating impressive results in accurately classifying news articles into their respective publication decades. Through rigorous evaluation, our model outperforms the baseline model for this relatively unexplored task of predicting time periods based on textual content. This research sheds light on the potential for effectively predicting the publication periods of news articles and presents promising outcomes achieved by leveraging a pretrained BERT model fine-tuned for time period classification. The results obtained contribute to the advancement of this underexplored task, demonstrating the viability and accuracy of time prediction from textual data.
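As an illustration of how decade classification can be framed, the following minimal sketch maps publication years to decade class labels and scores predictions by accuracy. It is not the authors' code: the abstract does not state the exact year range or label scheme, so `FIRST_DECADE`, `NUM_DECADES`, and all function names here are hypothetical choices made for the example. Labels of this kind could serve as targets for a fine-tuned text classifier.

```python
# Hypothetical label scheme: the abstract only says "six decades", so the
# starting decade below is an illustrative assumption, not a reported fact.
FIRST_DECADE = 1960
NUM_DECADES = 6


def year_to_label(year: int) -> int:
    """Map a publication year to a decade class index (0 = first decade)."""
    label = (year - FIRST_DECADE) // 10
    if not 0 <= label < NUM_DECADES:
        raise ValueError(f"year {year} is outside the assumed range")
    return label


def label_to_decade(label: int) -> str:
    """Render a class index as a readable decade name."""
    if not 0 <= label < NUM_DECADES:
        raise ValueError(f"label {label} is not a valid class index")
    return f"{FIRST_DECADE + 10 * label}s"


def accuracy(predicted: list[int], actual: list[int]) -> float:
    """Fraction of articles whose predicted decade matches the true decade."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    years = [1963, 1987, 2004]
    labels = [year_to_label(y) for y in years]
    print([label_to_decade(lab) for lab in labels])
```

Treating the task as classification over decades, rather than regression over years, keeps the output space small and matches the abstract's description of classifying articles into publication decades.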