Computational Engineering, Finance, and Science

Title: Statistical detection of format dialects using the weighted Dowker complex
15 pages, 11 figures, 5 tables
This paper provides an experimentally validated, probabilistic model of file behavior when consumed by a set of pre-existing parsers. File behavior is measured by way of a standardized set of Boolean "messages" produced as the files are read. By thresholding the posterior probability that a file exhibiting a particular set of messages is from a particular dialect, our model yields a practical classification algorithm for two dialects. We demonstrate that this thresholding algorithm for two dialects can be bootstrapped from a training set consisting primarily of one dialect. Both the (parametric) theoretical and the (non-parametric) empirical distributions of file behaviors for one dialect yield good classification performance, and outperform classification based on simply counting messages.
Our theoretical framework relies on statistical independence of messages within each dialect. Violations of this assumption are detectable and allow a format analyst to identify "boundaries" between dialects. A format analyst can therefore greatly reduce the number of files they need to consider when crafting new criteria for dialect detection, since they need only consider the files that exhibit ambiguous message patterns.
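As a rough illustration of the posterior-thresholding idea described above, the following is a minimal sketch and not the paper's implementation. It assumes each message is an independent Bernoulli variable within a dialect, as the abstract states; the per-message probabilities, the prior, the threshold, and all function names are hypothetical.

```python
import math


def log_likelihood(messages, probs):
    """Log-probability of a Boolean message vector, assuming independent messages.

    probs[i] is the (assumed) probability that message i fires in this dialect;
    each value must lie strictly between 0 and 1.
    """
    total = 0.0
    for fired, p in zip(messages, probs):
        total += math.log(p if fired else 1.0 - p)
    return total


def posterior_dialect_a(messages, probs_a, probs_b, prior_a=0.5):
    """Posterior probability that a file belongs to dialect A (Bayes' rule)."""
    log_a = math.log(prior_a) + log_likelihood(messages, probs_a)
    log_b = math.log(1.0 - prior_a) + log_likelihood(messages, probs_b)
    # Subtract the larger log term before exponentiating to avoid underflow.
    top = max(log_a, log_b)
    weight_a = math.exp(log_a - top)
    weight_b = math.exp(log_b - top)
    return weight_a / (weight_a + weight_b)


def classify(messages, probs_a, probs_b, prior_a=0.5, threshold=0.5):
    """Label a file "A" when the posterior for dialect A reaches the threshold."""
    posterior = posterior_dialect_a(messages, probs_a, probs_b, prior_a)
    return "A" if posterior >= threshold else "B"


# Hypothetical per-message firing probabilities for two dialects.
PROBS_A = [0.9, 0.1]
PROBS_B = [0.2, 0.8]
label = classify([True, False], PROBS_A, PROBS_B)
```

Files whose posterior falls near the threshold are the ones with ambiguous message patterns, which is the subset an analyst would inspect first.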

Title: Data-Driven Innovation: What Is It
Authors: Jianxi Luo
The future of innovation processes is anticipated to be more data-driven, empowered by ubiquitous digitalization, increasing data accessibility, and rapid advances in machine learning, artificial intelligence, and computing technologies. While the data-driven innovation (DDI) paradigm is emerging, it has yet to be formally defined and theorized, and it is often confused with several other data-related phenomena. This paper defines and crystallizes "data-driven innovation" as a formal innovation process paradigm, dissects its value creation, and distinguishes it from data-driven optimization (DDO), data-based innovation (DBI), and traditional innovation processes that rely purely on human intelligence. With real-world examples and theoretical framing, I elucidate what DDI entails and how it addresses uncertainty and enhances creativity in the innovation process, and I present a process-based taxonomy of data-driven innovation approaches. On this basis, I recommend strategies and actions for innovators, companies, R&D organizations, and governments to enact data-driven innovation.

Title: Scalable $k$-d trees for distributed data
34 pages, 3 figures; submitted for publication
Data structures known as $k$-d trees have numerous applications in scientific computing, particularly in areas of modern statistics and data science such as range search in decision trees, clustering, nearest-neighbor search, and local regression. In this article we present a scalable mechanism to construct $k$-d trees for distributed data, based on approximating medians for each recursive subdivision of the data. We provide theoretical guarantees on the quality of the approximation, along with a simulation study quantifying the accuracy and scalability of the proposed approach in practice.
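The recursive subdivision with approximate medians can be sketched on a single machine as follows. This is only an illustration under stated assumptions: the abstract does not say how the medians are approximated, so the sketch takes the median of a random subsample as a stand-in, and the names `approx_median`, `build_kd_tree`, `leaf_size`, and `sample_size` are hypothetical.

```python
import random


def approx_median(values, sample_size, rng):
    """Approximate the median by the median of a random subsample (an assumption)."""
    if len(values) <= sample_size:
        sample = sorted(values)
    else:
        sample = sorted(rng.sample(values, sample_size))
    return sample[len(sample) // 2]


def build_kd_tree(points, depth=0, leaf_size=4, sample_size=32, rng=None):
    """Build a k-d tree as nested dicts, splitting on an approximate median."""
    rng = rng or random.Random(0)
    if len(points) <= leaf_size:
        return {"leaf": True, "points": points}
    axis = depth % len(points[0])  # cycle through the coordinate axes
    split = approx_median([p[axis] for p in points], sample_size, rng)
    left = [p for p in points if p[axis] < split]
    right = [p for p in points if p[axis] >= split]
    if not left or not right:
        # Degenerate split (for example, many duplicate coordinates): stop here.
        return {"leaf": True, "points": points}
    return {
        "leaf": False,
        "axis": axis,
        "split": split,
        "left": build_kd_tree(left, depth + 1, leaf_size, sample_size, rng),
        "right": build_kd_tree(right, depth + 1, leaf_size, sample_size, rng),
    }


def count_points(node):
    """Total number of points stored in the leaves below a node."""
    if node["leaf"]:
        return len(node["points"])
    return count_points(node["left"]) + count_points(node["right"])


# Hypothetical data: 200 random points in the unit square.
_data_rng = random.Random(1)
POINTS = [(_data_rng.random(), _data_rng.random()) for _ in range(200)]
TREE = build_kd_tree(POINTS)
```

In a distributed setting each worker would hold only part of `points`, which is where an approximate median becomes attractive, since an exact median would require gathering or sorting all values for each split.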

Title: Nonlinear material identification of heterogeneous isogeometric Kirchhoff-Love shells
37 pages, 18 figures, 5 tables
Journal-ref: Computer Methods in Applied Mechanics and Engineering, Volume 390, 114442, 2022

Title: Allocation of locally generated electricity in renewable energy communities
16 pages, 8 figures, 5 tables, 3 algorithms, submitted to IEEE Access
