Title: Elements and Principles for Characterizing Variation between Data Analyses

Abstract: The data revolution has led to an increased interest in the practice of data analysis. For a given problem, there can be significant or subtle differences in how a data analyst constructs or creates a data analysis, including differences in the choice of methods, tooling, and workflow. In addition, data analysts can prioritize (or not) certain objective characteristics in a data analysis, leading to differences in the quality or experience of the data analysis, such as an analysis that is more or less reproducible or an analysis that is more or less exhaustive. However, data analysts currently lack a formal mechanism to compare and contrast what makes analyses different from each other. To address this problem, we introduce a vocabulary to describe and characterize variation between data analyses. We denote this vocabulary as the elements and principles of data analysis, and we use them to describe the fundamental concepts for the practice and teaching of creating a data analysis. This leads to two insights: it suggests a formal mechanism to evaluate data analyses based on objective characteristics, and it provides a framework to teach students how to build data analyses.
Comments: 14 pages, 7 figures, 1 table
Subjects: Applications (stat.AP)
Cite as: arXiv:1903.07639 [stat.AP]
  (or arXiv:1903.07639v2 [stat.AP] for this version)

Submission history

From: Stephanie Hicks
[v1] Mon, 18 Mar 2019 18:04:25 GMT (2464kb,D)
[v2] Fri, 26 Jul 2019 00:55:25 GMT (1916kb,D)
