Title: Data-to-Value: An Evaluation-First Methodology for Natural Language Projects

Abstract: Big data, i.e. the collection, storage and processing of data at scale, has recently become possible due to the arrival of clusters of commodity computers powered by application-level distributed parallel operating systems like HDFS/Hadoop/Spark, and such infrastructures have revolutionized data mining at scale. To help data mining projects succeed more consistently, several methodologies have been developed (e.g. CRISP-DM, SEMMA, KDD), but these do not account for (1) very large scales of processing, (2) textual (unstructured) data, i.e. Natural Language Processing (NLP, "text analytics"), and (3) non-technical considerations (e.g. legal, ethical and project management aspects).
To address these shortcomings, a new methodology called "Data to Value" (D2V) is introduced. It is guided by a detailed catalog of questions, so that big data text analytics project teams do not become disconnected from the topic when facing the rather abstract box-and-arrow diagrams commonly associated with methodologies.
Comments: 9 pages, 6 figures, 4 tables
Subjects: Computation and Language (cs.CL); Methodology (stat.ME)
MSC classes: 91B02, 68U15, 68T50, 62H99
ACM classes: I.2.7; D.2.9; I.7.m; H.0
Cite as: arXiv:2201.07725 [cs.CL]
  (or arXiv:2201.07725v1 [cs.CL] for this version)

Submission history

From: Jochen L. Leidner
[v1] Wed, 19 Jan 2022 17:04:52 GMT (1506kb,D)
