Data Analysis, Statistics and Probability

New submissions

[ total of 4 entries: 1-4 ]

New submissions for Thu, 18 Apr 24

[1]  arXiv:2404.10903 [pdf, ps, other]
Title: Superior Polymeric Gas Separation Membrane Designed by Explainable Graph Machine Learning
Subjects: Materials Science (cond-mat.mtrl-sci); Chemical Physics (physics.chem-ph); Data Analysis, Statistics and Probability (physics.data-an)

Gas separation using polymer membranes promises to dramatically drive down the energy, carbon, and water intensity of traditional thermally driven separation, but developing the membrane materials is challenging. Here, we demonstrate a novel graph machine learning (ML) strategy to guide the experimental discovery of synthesizable polymer membranes with performances simultaneously exceeding the empirical upper bounds in multiple industrially important gas separation tasks. Two predicted candidates are synthesized and experimentally validated to perform beyond the upper bounds for multiple gas pairs (O2/N2, H2/CH4, and H2/N2). Notably, the O2/N2 separation selectivity is 1.6-6.7 times higher than existing polymer membranes. The molecular origin of the high performance is revealed by combining the inherent interpretability of our ML model, experimental characterization, and molecule-level simulation. Our study presents a unique explainable ML-experiment combination to tackle challenging energy material design problems in general, and the discovered polymers are beneficial for industrial gas separation.
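The abstract does not define the "empirical upper bounds". A common convention, assumed here, is a Robeson-type trade-off line alpha = beta / P_A^lambda, where P_A is the permeability of the faster gas and alpha = P_A / P_B is the ideal selectivity. A membrane "exceeds the bound" when its (P_A, alpha) point lies above that line. The function names and the beta, lambda values in this minimal sketch are hypothetical placeholders, not fitted constants for O2/N2 or any other gas pair.

```python
def ideal_selectivity(p_fast: float, p_slow: float) -> float:
    """Ideal selectivity alpha_A/B = P_A / P_B (both in the same units, e.g. Barrer)."""
    return p_fast / p_slow


def upper_bound_selectivity(p_fast: float, beta: float, lam: float) -> float:
    """Selectivity on an assumed Robeson-type bound: alpha = beta / P_A**lam."""
    return beta / p_fast ** lam


def exceeds_upper_bound(p_fast: float, p_slow: float, beta: float, lam: float) -> bool:
    """True if the (P_A, alpha) point lies strictly above the assumed bound line."""
    return ideal_selectivity(p_fast, p_slow) > upper_bound_selectivity(p_fast, beta, lam)


# Hypothetical numbers for illustration only (not measured or fitted values).
print(exceeds_upper_bound(100.0, 10.0, beta=50.0, lam=0.5))  # alpha 10 vs bound 5 -> True
```

"Simultaneously exceeding" the bounds for several gas pairs would mean this check passes for each pair, each with its own beta and lambda.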

[2]  arXiv:2404.11348 [pdf, ps, other]
Title: Farthest Point Sampling in Property Designated Chemical Feature Space as a General Strategy for Enhancing the Machine Learning Model Performance for Small Scale Chemical Dataset
Authors: Yuze Liu, Xi Yu
Comments: 9 pages, 5 figures
Subjects: Chemical Physics (physics.chem-ph); Data Analysis, Statistics and Probability (physics.data-an)

Machine learning model development in chemistry and materials science often grapples with the challenge of small-scale, unbalanced labelled datasets, a common limitation in scientific experiments. These dataset imbalances can precipitate overfitting and diminish model generalization. Our study explores the efficacy of the farthest point sampling (FPS) strategy within targeted chemical feature spaces, demonstrating its capacity to generate well-distributed training datasets and consequently enhance model performance. We rigorously evaluated this strategy across various machine learning models, including artificial neural networks (ANN), support vector machines (SVM), and random forests (RF), using datasets encapsulating physicochemical properties such as standard boiling points and enthalpy of vaporization. Our findings reveal that FPS-based models consistently surpass those trained via random sampling, exhibiting superior predictive accuracy and robustness, alongside a marked reduction in overfitting. This improvement is particularly pronounced for smaller training datasets and is attributable to increased diversity within the training data's chemical feature space. Consequently, FPS emerges as a universally effective and adaptable approach for building high-performance machine learning models from the small and biased experimental datasets prevalent in chemistry and materials science.
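The abstract does not spell out the sampling procedure. The standard greedy form of FPS, assumed here to be the variant meant, starts from one point and repeatedly adds the candidate whose distance to its nearest already-selected point is largest. This is a minimal NumPy sketch. It assumes each molecule is already encoded as a numeric feature vector. How the "property designated" feature space is constructed is not described, so Euclidean distance on pre-scaled features is an assumption, as are the function name and the `start` parameter.

```python
import numpy as np


def farthest_point_sampling(X, k, start=0):
    """Greedy FPS (hypothetical helper): return k row indices of X that are spread apart.

    X: (n, d) array-like of feature vectors, assumed already scaled.
    k: number of points to select (1 <= k <= n).
    start: index of the first selected point.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")

    selected = [int(start)]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    min_dist[start] = -np.inf  # never re-pick a selected point

    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # farthest from the current selection
        selected.append(nxt)
        # Update nearest-selected distances with the newly added point.
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
        min_dist[nxt] = -np.inf

    return selected


print(farthest_point_sampling([[0.0], [1.0], [2.0], [10.0]], k=3))  # [0, 3, 2]
```

Keeping a running array of nearest-selected distances makes each step O(n·d), so choosing k points costs O(k·n·d) instead of recomputing all pairwise distances. In a workflow like the one described, the selected indices would form the training set and the remaining points a held-out set.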

Replacements for Thu, 18 Apr 24

[3]  arXiv:2310.16116 (replaced) [pdf, other]
Title: Precise Cosmological Constraints from BOSS Galaxy Clustering with a Simulation-Based Emulator of the Wavelet Scattering Transform
Comments: 27 pages, 17 figures, 4 tables. Updated to match version accepted in PRD
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Astrophysics of Galaxies (astro-ph.GA); Instrumentation and Methods for Astrophysics (astro-ph.IM); High Energy Physics - Phenomenology (hep-ph); Data Analysis, Statistics and Probability (physics.data-an)

[4]  arXiv:2401.13495 (replaced) [pdf, other]
Title: Detecting local perturbations of networks in a latent hyperbolic space
Subjects: Quantitative Methods (q-bio.QM); Data Analysis, Statistics and Probability (physics.data-an)
