Computer Science > Computation and Language
Title: News Category Dataset
(Submitted on 23 Sep 2022 (v1), last revised 6 Oct 2022 (this version, v3))
Abstract: People rely on news to know what is happening around the world and to inform their daily lives. Today, with fake news proliferating, a large-scale, high-quality source of authentic news articles with published category information is valuable for learning the natural-language syntax and semantics of authentic news. In this work, we present the News Category Dataset, which contains around 210k news headlines from 2012 to 2022 obtained from HuffPost, along with useful metadata to enable various NLP tasks. We also derive some novel insights from the dataset and describe its existing and potential applications.
Submission history
From: Rishabh Misra
[v1] Fri, 23 Sep 2022 06:13:16 GMT (9038kb,D)
[v2] Mon, 3 Oct 2022 21:28:21 GMT (9038kb,D)
[v3] Thu, 6 Oct 2022 20:43:53 GMT (9038kb,D)