Title: Statistical Validity and Consistency of Big Data Analytics: A General Framework

Abstract: Advances in informatics and technology have triggered the generation of huge volumes of data whose management and analysis vary in complexity. Big Data analytics is the practice of revealing hidden aspects of such data and making inferences from them. Although efficient algorithms and systems appear to make the storage, retrieval, and management of Big Data feasible, concerns about statistical consistency remain to be addressed in view of the specific characteristics of such data. Since Big Data does not conform to standard analytic methods, existing statistical theory and tools need proper modification. Here we propose, with illustrations, a general statistical framework and an algorithmic principle for Big Data analytics that ensure the statistical accuracy of the conclusions. The proposed framework has the potential to advance Big Data analytics in the right direction. The partition-repetition approach proposed here is broad enough to encompass all practical data analytic problems.
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Methodology (stat.ME)
Cite as: arXiv:1803.10901 [cs.DB]
  (or arXiv:1803.10901v1 [cs.DB] for this version)

Submission history

From: Bikram Karmakar
[v1] Thu, 29 Mar 2018 02:15:03 GMT (16kb)