Title: Using novel data and ensemble models to improve automated labeling of Sustainable Development Goals

Abstract: A number of labeling systems based on text have been proposed to help monitor work on the United Nations (UN) Sustainable Development Goals (SDGs). Here, we present a systematic comparison of systems using a variety of text sources and show that systems differ considerably in their sensitivity (i.e., true-positive rate) and specificity (i.e., true-negative rate), have systematic biases (e.g., are more sensitive to specific SDGs relative to others), and are susceptible to the type and amount of text analyzed. We then show that an ensemble model that pools labeling systems alleviates some of these limitations, exceeding the labeling performance of all currently available systems. We conclude that researchers and policymakers should care about the choice of labeling system and that ensemble methods should be favored when drawing conclusions about the absolute and relative prevalence of work on the SDGs based on automated methods.
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2301.11353 [cs.CL]
  (or arXiv:2301.11353v2 [cs.CL] for this version)

Submission history

From: Dirk U. Wulff
[v1] Wed, 25 Jan 2023 07:44:46 GMT (7382kb,D)
[v2] Wed, 1 Feb 2023 12:44:44 GMT (7382kb,D)
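
The two evaluation measures named in the abstract, and the general idea of pooling several labeling systems, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the toy labels, and in particular the simple majority vote, which stands in for the ensemble model and is not claimed to be the authors' method.

```python
from typing import Sequence


def sensitivity_specificity(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> tuple[float, float]:
    """Return (sensitivity, specificity) for one SDG's binary labels.

    sensitivity = true-positive rate = TP / (TP + FN)
    specificity = true-negative rate = TN / (TN + FP)
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def majority_vote(system_labels: Sequence[Sequence[bool]]) -> list[bool]:
    """Pool several systems: label a document if more than half of them do.

    Hypothetical stand-in for an ensemble; not the model from the paper.
    """
    n_systems = len(system_labels)
    return [sum(votes) * 2 > n_systems for votes in zip(*system_labels)]


# Made-up labels for six documents and one SDG (illustrative only).
truth = [True, True, False, False, True, False]
system_a = [True, False, False, False, True, True]
system_b = [True, True, True, False, False, False]
system_c = [False, True, False, True, True, False]

pooled = majority_vote([system_a, system_b, system_c])
for name, labels in [("A", system_a), ("B", system_b),
                     ("C", system_c), ("pooled", pooled)]:
    sens, spec = sensitivity_specificity(truth, labels)
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy data each individual system makes two errors, while the pooled labels happen to match the reference labels exactly. That outcome is a property of the invented example and says nothing about the size of the effect reported in the paper.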
