Applications

New submissions

[ total of 16 entries: 1-16 ]

New submissions for Wed, 8 May 24

[1]  arXiv:2405.03814 [pdf, other]
Title: Stochastic behavior of an n-node blockchain under cyber attacks from multiple hackers with random re-setting times
Subjects: Applications (stat.AP)

This paper investigates the stochastic behavior of an n-node blockchain which is continuously monitored and faces non-stop cyber attacks from multiple hackers. The blockchain will start being re-set once hacking is detected, forfeiting previous efforts of all hackers. It is assumed the re-setting process takes a random amount of time. Multiple independent hackers will keep attempting to hack into the blockchain until one of them succeeds. For arbitrary distributions of the hacking times, detecting times, and re-setting times, we derive the instantaneous functional probability, the limiting functional probability, and the mean functional time of the blockchain. Moreover, we establish that these quantities are increasing functions of the number of nodes, formalizing the intuition that the more nodes a blockchain has the more secure it is.

[2]  arXiv:2405.03874 [pdf, ps, other]
Title: Non-locality and Spillover Effects of Residential Flood Damage on Community Recovery: Insights from High-resolution Flood Claim and Mobility Data
Subjects: Applications (stat.AP)

Examining the relationship between vulnerability of the built environment and community recovery is crucial for understanding disaster resilience. Yet, this relationship is rather neglected in the existing literature due to previous limitations in the availability of empirical datasets needed for such analysis. In this study, we combine fine-resolution flood damage claims data (composed of both insured and uninsured losses) and human mobility data (composed of millions of movement trajectories) during the 2017 Hurricane Harvey in Harris County, Texas, to specify the extent to which vulnerability of the built environment (i.e., flood property damage) affects community recovery (based on the speed of human mobility recovery) locally and regionally. We examine this relationship using spatial lag, spatial reach, and spatial decay models to measure the extent of spillover effects of residential damage on community recovery. The findings show that: first, the severity of residential damage significantly affects the speed of community recovery. A greater extent of residential damage suppresses community recovery not only locally but also in the surrounding areas. Second, the spatial spillover effect of residential damage on community recovery speed decays with distance from the highly damaged areas. Third, spatial areas display heterogeneous spatial decay coefficients, which are associated with urban structure features such as the density of points-of-interest facilities and roads. These findings provide a novel data-driven characterization of the spatial diffusion of residential flood damage effects on community recovery and move us closer to a better understanding of complex spatial processes that shape community resilience to hazards. This study also provides valuable insights for emergency managers and public officials seeking to mitigate the non-local effects of residential damage.

[3]  arXiv:2405.04269 [pdf, other]
Title: An Analysis of Sea Level Spatial Variability by Topological Indicators and $k$-means Clustering Algorithm
Subjects: Applications (stat.AP)

The time-series data of sea level rise and fall contains crucial information on the variability of sea level patterns. Traditional $k$-means clustering is commonly used for categorizing regional variability of sea level; however, its results are not robust against a number of factors. This study analyzed fourteen datasets of monthly sea level in fourteen shoreline regions of Peninsular Malaysia. We applied a hybrid approach that combines a clustering technique for data categorization with a topological data analysis method to enhance the performance of our clustering analysis. Specifically, our approach utilized persistent homology and $k$-means/$k$-means++ clustering. The fourteen datasets from fourteen tide gauge stations were categorized into classes based on a prior categorization determined by topological information and on the probabilities, yielded by $k$-means/$k$-means++ clustering, that data points belong to certain groups. Our results demonstrated that our method significantly improves the performance of traditional clustering techniques.

[4]  arXiv:2405.04507 [pdf, other]
Title: New allometric models for the USA create a step-change in forest carbon estimation, modeling, and mapping
Authors: Lucas K. Johnson (1), Michael J. Mahoney (1), Grant Domke (2), Colin M. Beier (1) ((1) State University of New York College of Environmental Science and Forestry, (2) USDA Forest Service)
Comments: Manuscript: 16 pages, 7 figures; Supplements: 3 pages, 2 figures; Submitted to: Remote Sensing of Environment
Subjects: Applications (stat.AP)

The United States national forest inventory (NFI) serves as the foundation for forest aboveground biomass (AGB) and carbon accounting across the nation. These data enable design-based estimates of forest carbon stocks and stock-changes at state and regional levels, but also serve as inputs to model-based approaches for characterizing forest carbon stocks and stock-changes at finer resolutions. Although NFI tree and plot-level data are often treated as truth in these models, they are in fact estimates based on regional species-group models known collectively as the Component Ratio Method (CRM). In late 2023 the Forest Inventory and Analysis (FIA) program introduced a new National Scale Volume and Biomass Estimators (NSVB) system to replace CRM nationwide and offer more precise and accurate representations of forest AGB and carbon. Given the prevalence of model-based AGB studies relying on FIA, there is concern about the transferability of methods from CRM to NSVB models, as well as the comparability of existing CRM AGB products (e.g. maps) to new and forthcoming NSVB AGB products. To begin addressing these concerns we compared previously published CRM AGB maps to new maps produced using identical methods with NSVB AGB reference data. Our results suggest that models relying on passive satellite imagery (e.g. Landsat) provide acceptable estimates of point-in-time NSVB AGB and carbon stocks, but fail to accurately quantify growth in mature closed-canopy forests. We highlight that existing estimates, models, and maps based on FIA reference data are no longer compatible with NSVB, and recommend new methods as well as updated models and maps for accommodating this step-change. Our collective ability to adopt NSVB in our modeling and mapping workflows will help us provide the most accurate spatial forest carbon data possible in order to better inform local management and decision making.

Cross-lists for Wed, 8 May 24

[5]  arXiv:2405.03734 (cross-list from cs.HC) [pdf, other]
Title: FOKE: A Personalized and Explainable Education Framework Integrating Foundation Models, Knowledge Graphs, and Prompt Engineering
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Applications (stat.AP)

Integrating large language models (LLMs) and knowledge graphs (KGs) holds great promise for revolutionizing intelligent education, but challenges remain in achieving personalization, interactivity, and explainability. We propose FOKE, a Forest Of Knowledge and Education framework that synergizes foundation models, knowledge graphs, and prompt engineering to address these challenges. FOKE introduces three key innovations: (1) a hierarchical knowledge forest for structured domain knowledge representation; (2) a multi-dimensional user profiling mechanism for comprehensive learner modeling; and (3) an interactive prompt engineering scheme for generating precise and tailored learning guidance.
We showcase FOKE's application in programming education, homework assessment, and learning path planning, demonstrating its effectiveness and practicality. Additionally, we implement Scholar Hero, a real-world instantiation of FOKE. Our research highlights the potential of integrating foundation models, knowledge graphs, and prompt engineering to revolutionize intelligent education practices, ultimately benefiting learners worldwide. FOKE provides a principled and unified approach to harnessing cutting-edge AI technologies for personalized, interactive, and explainable educational services, paving the way for further research and development in this critical direction.

[6]  arXiv:2405.03879 (cross-list from stat.ML) [pdf, other]
Title: Scalable Amortized GPLVMs for Single Cell Transcriptomics Data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Genomics (q-bio.GN); Applications (stat.AP)

Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models lack effectiveness in clustering cell types. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs. This model matches the performance of the leading single-cell variational inference (scVI) approach on synthetic and real-world COVID datasets and effectively incorporates cell-cycle and batch information to reveal more interpretable latent structures as we demonstrate on an innate immunity dataset.

[7]  arXiv:2405.04352 (cross-list from econ.GN) [pdf, other]
Title: Return to Office and the Tenure Distribution
Comments: 6 figures, 3 tables, 18 pages
Subjects: General Economics (econ.GN); Applications (stat.AP)

With the official end of the COVID-19 pandemic, debates about the return to office have taken center stage among companies and employees. Despite their ubiquity, the economic implications of return to office policies are not fully understood. Using 260 million resumes matched to company data, we analyze the causal effects of such policies on employees' tenure and seniority levels at three of the largest US tech companies: Microsoft, SpaceX, and Apple. Our estimation procedure is nonparametric and captures the full heterogeneity of tenure and seniority of employees in a distributional synthetic controls framework. We estimate a reduction in counterfactual tenure that increases for employees with longer tenure. Similarly, we document a leftward shift in the seniority distribution towards positions below the senior level. These shifts appear to be driven by employees leaving for larger firms that are direct competitors. Our results suggest that return to office policies can lead to an outflow of senior employees, posing a potential threat to the productivity, innovation, and competitiveness of the wider firm.

[8]  arXiv:2405.04487 (cross-list from stat.CO) [pdf, other]
Title: UQ state-dependent framework for seismic fragility assessment of industrial components
Subjects: Computation (stat.CO); Applications (stat.AP)

Recently, there has been increased interest in assessing the seismic fragility of industrial plants and process equipment. This is reflected in the growing number of studies, community-funded research projects and experimental campaigns on the matter. Nonetheless, the complexity of the problem and its inherent modelling, coupled with a general scarcity of available data on process equipment, has limited the development of risk assessment methods. In fact, these limitations have led to the creation of simplified and quick-to-run models. In this context, we propose an innovative framework for developing state-dependent fragility functions. This new methodology combines limited data with the power of metamodelling and statistical techniques, namely polynomial chaos expansions (PCE) and bootstrapping. We first validated the framework on a simplified and inexpensive-to-run MDoF system endowed with Bouc-Wen hysteresis. Then, we tested it on a real nonstructural industrial process component. Specifically, we applied the state-dependent fragility framework to a critical vertical tank of a multicomponent full-scale 3D steel braced frame (BF). The seismic performance of the BF endowed with process components was captured by means of a shake table campaign within the European SPIF project. Finally, we derived state-dependent fragility functions based on the combination of PCE and bootstrap at a greatly reduced computational cost.

Replacements for Wed, 8 May 24

[9]  arXiv:2404.01469 (replaced) [pdf, other]
Title: A group testing based exploration of age-varying factors in chlamydia infections among Iowa residents
Subjects: Applications (stat.AP); Methodology (stat.ME)
[10]  arXiv:2012.00180 (replaced) [pdf, other]
Title: Anisotropic local constant smoothing for change-point regression function estimation
Comments: 30 pages, 12 figures. Parts of this Original Manuscript are in an article published by Taylor & Francis in the Journal of Applied Statistics on April 20th 2024, available at this https URL
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)
[11]  arXiv:2311.02043 (replaced) [pdf, other]
Title: Bayesian Quantile Regression with Subset Selection: A Posterior Summarization Perspective
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)
[12]  arXiv:2312.05429 (replaced) [pdf, ps, other]
Title: Mitigating Nonlinear Algorithmic Bias in Binary Classification
Comments: 5 pages, 3 figures, 12 tables. arXiv admin note: text overlap with arXiv:2310.12421
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Applications (stat.AP)
[13]  arXiv:2312.08927 (replaced) [pdf, other]
Title: Limit Order Book Dynamics and Order Size Modelling Using Compound Hawkes Process
Comments: Presented at Market Microstructure 2023, Accepted at Quantitative Finance Workshop 2024. To be submitted for publication to a journal
Subjects: Trading and Market Microstructure (q-fin.TR); Computational Engineering, Finance, and Science (cs.CE); Computational Finance (q-fin.CP); Applications (stat.AP)
[14]  arXiv:2401.02048 (replaced) [pdf, other]
Title: Random Effect Restricted Mean Survival Time Model
Subjects: Methodology (stat.ME); Applications (stat.AP)
[15]  arXiv:2402.01932 (replaced) [pdf, other]
Title: A Virtual Solar Wind Monitor at Mars with Uncertainty Quantification using Gaussian Processes
Comments: submitted to JGR: Machine Learning and Computation
Subjects: Space Physics (physics.space-ph); Applications (stat.AP)
[16]  arXiv:2404.17626 (replaced) [pdf, other]
Title: Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Applications (stat.AP); Computation (stat.CO)