Title: Forecasting COVID-19 Caseloads Using Unsupervised Embedding Clusters of Social Media Posts

Abstract: We present a novel approach incorporating transformer-based language models into infectious disease modelling. Text-derived features are quantified by tracking high-density clusters of sentence-level representations of Reddit posts within specific US states' COVID-19 subreddits. We benchmark these clustered embedding features against features extracted from other high-quality datasets. In a threshold-classification task, we show that they outperform all other feature types at predicting upward trend signals, a significant result for infectious disease modelling in areas where epidemiological data is unreliable. Subsequently, in a time-series forecasting task we fully utilise the predictive power of the caseload and compare the relative strengths of using different supplementary datasets as covariate feature sets in a transformer-based time-series model.
Comments: NAACL 2022
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Cite as: arXiv:2205.10408 [cs.CL]
  (or arXiv:2205.10408v1 [cs.CL] for this version)
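The abstract describes two steps: turning cluster membership of post embeddings into features, and labelling upward caseload trends for a threshold-classification task. A minimal sketch of both follows. It assumes each post has already been embedded and assigned a cluster id by some density-based clustering step, with -1 marking unclustered posts. The function names, the share-of-posts feature, the relative-growth label, and the horizon and threshold values are all illustrative assumptions, not the authors' definitions.

```python
from collections import Counter, defaultdict


def daily_cluster_shares(posts, n_clusters):
    """Turn (day, cluster_id) pairs into one feature vector per day.

    Each vector holds the share of that day's posts falling in each cluster.
    A cluster_id of -1 (assumed to mean "noise / unclustered") counts toward
    the daily total but gets no feature of its own.
    """
    counts = defaultdict(Counter)
    for day, cluster_id in posts:
        counts[day][cluster_id] += 1

    features = {}
    for day in sorted(counts):
        total = sum(counts[day].values())
        features[day] = [counts[day][c] / total for c in range(n_clusters)]
    return features


def upward_trend_labels(cases, horizon, threshold):
    """Label day t with 1 if the caseload `horizon` days later has grown by
    more than `threshold` (relative growth), else 0. Hypothetical definition.
    """
    labels = []
    for t in range(len(cases) - horizon):
        base = cases[t]
        growth = (cases[t + horizon] - base) / base if base > 0 else 0.0
        labels.append(1 if growth > threshold else 0)
    return labels


# Toy data: cluster ids would come from clustering sentence embeddings.
posts = [
    ("2020-07-01", 0), ("2020-07-01", 1), ("2020-07-01", -1),
    ("2020-07-02", 1), ("2020-07-02", 1),
]
features = daily_cluster_shares(posts, n_clusters=2)
labels = upward_trend_labels([100, 110, 150, 140], horizon=1, threshold=0.2)
```

The daily feature vectors could then be paired with the trend labels to train any standard classifier, or supplied as covariates alongside the caseload series in a forecasting model.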

Submission history

From: Felix Drinkall
[v1] Fri, 20 May 2022 18:59:04 GMT (733kb,D)