Data Analysis, Statistics and Probability

New submissions

[ total of 8 entries: 1-8 ]

New submissions for Tue, 25 Feb 20

[1]  arXiv:2002.09530 [pdf, other]
Title: Using machine learning to separate hadronic and electromagnetic interactions in the GlueX forward calorimeter
Comments: 12 pages, 10 figures, submitted to JINST
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Instrumentation and Detectors (physics.ins-det)

The GlueX forward calorimeter is an array of 2800 lead glass modules that was constructed to detect photons produced in the decays of hadrons. A background to this process originates from hadronic interactions in the calorimeter, which, in some instances, can be difficult to distinguish from low energy photon interactions. Machine learning techniques were applied to the classification of particle interactions in the GlueX forward calorimeter. The algorithms were trained on data using decays of the $\omega$ meson, which contain both true photons and charged particles that interact with the calorimeter. Algorithms were evaluated on efficiency, rate of false positives, run time, and implementation complexity. An algorithm that utilizes a multi-layer perceptron neural net was deployed in the GlueX software stack and provides a signal efficiency of 85% with a background rejection of 60% for an inclusive $\pi^0$ data sample for an intermediate quality constraint.
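The two figures of merit quoted in this abstract, signal efficiency and background rejection, can be illustrated with a minimal sketch. It assumes a classifier that outputs one score per calorimeter shower and a single cut on that score; the function name, the scores, and the threshold are hypothetical and are not taken from the GlueX software.

```python
def efficiency_and_rejection(signal_scores, background_scores, threshold):
    """Return (signal efficiency, background rejection) for a score cut.

    A shower is kept as a photon candidate when its score is >= threshold.
    Efficiency is the fraction of true signal showers kept; rejection is the
    fraction of true background showers discarded.
    """
    signal_kept = sum(1 for s in signal_scores if s >= threshold)
    background_cut = sum(1 for s in background_scores if s < threshold)
    return (signal_kept / len(signal_scores),
            background_cut / len(background_scores))


# Hypothetical classifier outputs, for illustration only.
photon_scores = [0.9, 0.8, 0.7, 0.6, 0.3]
hadron_scores = [0.1, 0.2, 0.4, 0.65, 0.75]

efficiency, rejection = efficiency_and_rejection(photon_scores, hadron_scores, 0.5)
print(f"signal efficiency = {efficiency:.2f}, background rejection = {rejection:.2f}")
```

Raising the threshold trades efficiency for rejection, which is presumably what a choice of "quality constraint" amounts to in practice.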

[2]  arXiv:2002.09865 [pdf, other]
Title: Bryan's Maximum Entropy Method -- diagnosis of a flawed argument and its remedy
Comments: 7 pages, 1 figure
Subjects: Data Analysis, Statistics and Probability (physics.data-an); High Energy Physics - Lattice (hep-lat); High Energy Physics - Phenomenology (hep-ph); Nuclear Theory (nucl-th); Computational Physics (physics.comp-ph)

The Maximum Entropy Method (MEM) is a popular data analysis technique, based on Bayesian inference, which has found various applications in the research literature. While the MEM itself is well grounded in statistics, I argue that its state-of-the-art implementation, suggested originally by Bryan, artificially restricts its solution space. This restriction leads to a systematic error often unaccounted for in contemporary MEM studies. Since previously published arguments on the shortcoming of Bryan's MEM have recently been questioned in arXiv:2001.10205, this paper will carefully revisit Bryan's train of thought, point out its flaw in applying linear algebra arguments to an inherently non-linear problem and suggest possible ways to overcome it.

Cross-lists for Tue, 25 Feb 20

[3]  arXiv:2002.09713 (cross-list from stat.OT) [pdf, other]
Title: Connections between statistical practice in elementary particle physics and the severity concept as discussed in Mayo's Statistical Inference as Severe Testing
Comments: 25 pages including 4 figures
Subjects: Other Statistics (stat.OT); High Energy Physics - Experiment (hep-ex); Data Analysis, Statistics and Probability (physics.data-an)

For many years, philosopher-of-statistics Deborah Mayo has been advocating the concept of severe testing as a key part of hypothesis testing. Her recent book, Statistical Inference as Severe Testing, is a comprehensive exposition of her arguments in the context of a historical study of many threads of statistical inference, both frequentist and Bayesian. Her foundational point of view is called error statistics, emphasizing frequentist evaluation of the errors called Type I and Type II in the Neyman-Pearson theory of frequentist hypothesis testing. Since the field of elementary particle physics (also known as high energy physics) has strong traditions in frequentist inference, one might expect that something like the severity concept was independently developed in the field. Indeed, I find that, at least operationally (numerically), we high-energy physicists have long interpreted data in ways that map directly onto severity. Whether or not we subscribe to Mayo's philosophical interpretations of severity is a more complicated story that I do not address here.
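As a rough numerical illustration of severity, the sketch below assumes the textbook case of a one-sided test on the mean of Gaussian data with known standard deviation. In that setting the severity of the claim "mu > mu1", given an observed sample mean, is the probability of a sample mean no larger than the observed one if mu were exactly mu1. The function names and numbers are hypothetical, and this is not the calculation from the paper itself.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def severity_mu_greater(sample_mean, mu1, sigma, n):
    """Severity of the claim 'mu > mu1' for Gaussian data with known sigma.

    Assumed setting: n independent measurements, one-sided test of the mean.
    The result is P(sample mean <= observed sample mean) evaluated at mu = mu1.
    """
    standard_error = sigma / math.sqrt(n)
    return normal_cdf((sample_mean - mu1) / standard_error)


# Hypothetical measurement: mean 0.4 from 25 readings with sigma = 1.
for mu1 in (0.0, 0.2, 0.4):
    sev = severity_mu_greater(0.4, mu1, sigma=1.0, n=25)
    print(f"SEV(mu > {mu1:.1f}) = {sev:.3f}")
```

In this simple case the values of mu1 at which the severity reaches a chosen level coincide numerically with one-sided confidence limits, which is the kind of operational correspondence the abstract alludes to.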

[4]  arXiv:2002.09770 (cross-list from physics.soc-ph) [pdf, other]
Title: Allotaxonometry and rank-turbulence divergence: A universal instrument for comparing complex systems
Comments: 22 pages, 7 figures, 1 table; online appendices: this http URL
Subjects: Physics and Society (physics.soc-ph); Data Analysis, Statistics and Probability (physics.data-an)

Complex systems often comprise many kinds of components which vary over many orders of magnitude in size: populations of cities in countries, individual and corporate wealth in economies, species abundance in ecologies, word frequency in natural language, and node degree in complex networks. Comparisons of component size distributions for two complex systems, or a system with itself at two different time points, generally employ information-theoretic instruments, such as Jensen-Shannon divergence. We argue that these methods lack transparency and adjustability, and should not be applied when component probabilities are non-sensible or are problematic to estimate. Here, we introduce 'allotaxonometry' along with 'rank-turbulence divergence', a tunable instrument for comparing any two (Zipfian) ranked lists of components. We analytically develop our rank-based divergence in a series of steps, and then establish a rank-based allotaxonograph which pairs a map-like histogram for rank-rank pairs with an ordered list of components according to divergence contribution. We explore the performance of rank-turbulence divergence for a series of distinct settings including: language use on Twitter and in books, species abundance, baby name popularity, market capitalization, performance in sports, mortality causes, and job titles. We provide a series of supplementary flipbooks which demonstrate the tunability and storytelling power of rank-based allotaxonometry.
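For context, the Jensen-Shannon divergence named in this abstract as the conventional instrument can be sketched as follows. This illustrates the baseline the authors argue against, not their rank-turbulence divergence; the function names and the word counts are hypothetical.

```python
import math


def normalize(counts):
    """Turn a dict of component counts into a dict of probabilities."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits between two probability dicts.

    Components missing from one system are given probability zero there.
    The result lies between 0 (identical) and 1 (no shared components).
    """
    divergence = 0.0
    for key in set(p) | set(q):
        p_k = p.get(key, 0.0)
        q_k = q.get(key, 0.0)
        m_k = 0.5 * (p_k + q_k)  # mixture distribution
        if p_k > 0.0:
            divergence += 0.5 * p_k * math.log2(p_k / m_k)
        if q_k > 0.0:
            divergence += 0.5 * q_k * math.log2(q_k / m_k)
    return divergence


# Hypothetical word counts for two small corpora.
corpus_a = normalize({"the": 50, "of": 30, "physics": 15, "entropy": 5})
corpus_b = normalize({"the": 45, "of": 25, "biology": 20, "entropy": 10})
print(f"JSD = {jensen_shannon_divergence(corpus_a, corpus_b):.3f} bits")
```

The sketch makes the abstract's objection concrete: every term requires a component probability, so the measure depends on those probabilities being meaningful and well estimated.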

[5]  arXiv:2002.10440 (cross-list from astro-ph.IM) [pdf, other]
Title: Modeling Aerial Gamma-Ray Backgrounds using Non-negative Matrix Factorization
Comments: 14 pages, 12 figures, accepted for publication in IEEE Transactions on Nuclear Science
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Applied Physics (physics.app-ph); Data Analysis, Statistics and Probability (physics.data-an)

Airborne gamma-ray surveys are useful for many applications, ranging from geology and mining to public health and nuclear security. In all these contexts, the ability to decompose a measured spectrum into a linear combination of background source terms can provide useful insights into the data and lead to improvements over techniques that use spectral energy windows. Multiple methods for the linear decomposition of spectra exist but are subject to various drawbacks, such as allowing negative photon fluxes or requiring detailed Monte Carlo modeling. We propose using Non-negative Matrix Factorization (NMF) as a data-driven approach to spectral decomposition. Using aerial surveys that include flights over water, we demonstrate that the mathematical approach of NMF finds physically relevant structure in aerial gamma-ray background, namely that measured spectra can be expressed as the sum of nearby terrestrial emission, distant terrestrial emission, and radon and cosmic emission. These NMF background components are compared to the background components obtained using Noise-Adjusted Singular Value Decomposition (NASVD), which contain negative photon fluxes and thus do not represent emission spectra in as straightforward a way. Finally, we comment on potential areas of research that are enabled by NMF decompositions, such as new approaches to spectral anomaly detection and data fusion.
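The decomposition idea in this abstract can be sketched with a small pure-Python NMF. It uses the standard multiplicative update rules for a squared-error objective, which is an assumption: the paper's choice of loss and solver is not stated in the abstract. The spectra are synthetic and all names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Sum of squared differences between v and the product w h."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, iterations=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix v (spectra x bins) as w h.

    w holds non-negative weights per spectrum and h holds non-negative
    component spectra. Multiplicative updates keep both factors non-negative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Synthetic example: five 4-bin spectra mixed from two made-up components.
components = [[50.0, 30.0, 10.0, 5.0], [5.0, 10.0, 30.0, 50.0]]
mixing = [[1.0, 0.0], [0.8, 0.2], [0.5, 0.5], [0.2, 0.8], [0.0, 1.0]]
spectra = matmul(mixing, components)

weights, found = nmf(spectra, rank=2)
print("residual squared error:", squared_error(spectra, weights, found))
```

Because every entry of both factors stays non-negative, each recovered component can be read as a count spectrum, which is the property the abstract contrasts with NASVD components that contain negative fluxes.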

Replacements for Tue, 25 Feb 20

[6]  arXiv:1708.08794 (replaced) [pdf, other]
Title: Impact of non-stationarity on hybrid ensemble filters: A study with a doubly stochastic advection-diffusion-decay model
Comments: The accepted version of the published article
Journal-ref: Quarterly Journal of the Royal Meteorological Society, 2019, v. 145, N 722, 2255-2271
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Atmospheric and Oceanic Physics (physics.ao-ph); Geophysics (physics.geo-ph)
[7]  arXiv:1907.11674 (replaced) [pdf, other]
Title: Reducing the dependence of the neural network function to systematic uncertainties in the input space
Journal-ref: Comput Softw Big Sci 4, 5 (2020)
Subjects: Data Analysis, Statistics and Probability (physics.data-an)
[8]  arXiv:1910.11571 (replaced) [pdf, other]
Title: Report from RAMP challenge on fast vertexing
Authors: Florian Reiss
Comments: 7 pages, 7 figures, Connecting The Dots and Workshop on Intelligent Trackers 2019
Subjects: Instrumentation and Detectors (physics.ins-det); High Energy Physics - Experiment (hep-ex); Data Analysis, Statistics and Probability (physics.data-an)