Title: Elements and Principles of Data Analysis

Abstract: The data revolution has led to an increased interest in the practice of data analysis. As a result, there has been a proliferation of "data science" training programs. Because data science has previously been defined as an intersection of already-established fields or a union of emerging technologies, the following problems arise: (1) there is little agreement about what data science is; (2) data science becomes secondary to established fields in a university setting; and (3) it is difficult to have discussions about what it means to learn about data science, to teach data science courses, and to be a data scientist. To address these problems, we propose to define the field from first principles, based on the activities of people who analyze data, with a language and taxonomy for describing a data analysis in a manner that spans disciplines. Here, we describe the elements and principles of data analysis. This leads to two insights: it suggests a formal mechanism to evaluate data analyses based on objective characteristics, and it provides a framework to teach students how to build data analyses. We argue that the elements and principles of data analysis lay the foundational framework for a more general theory of data science.
Comments: 27 pages, 9 figures, 1 table
Subjects: Applications (stat.AP)
Cite as: arXiv:1903.07639 [stat.AP]
  (or arXiv:1903.07639v1 [stat.AP] for this version)

Submission history

From: Stephanie Hicks
[v1] Mon, 18 Mar 2019 18:04:25 GMT (2464kb,D)
[v2] Fri, 26 Jul 2019 00:55:25 GMT (1916kb,D)