Introduction to Rare-Event Predictive Modeling for Inferential Statisticians -- A Hands-On Application in the Prediction of Breakthrough Patents

Hain, Daniel; Jurowetzki, Roman

(Submitted on 30 Mar 2020 (v1), last revised 1 Mar 2021 (this version, v2))

Abstract: Recent years have seen substantial development of quantitative methods, led mostly by the computer science community with the goal of building better machine learning applications, mainly focused on predictive modeling. However, economic, management, and technology forecasting research has so far been hesitant to apply predictive modeling techniques and workflows. In this paper, we introduce a machine learning (ML) approach to quantitative analysis geared towards optimizing predictive performance, contrasting it with standard practices in inferential statistics, which focus on producing good parameter estimates. We discuss the potential synergies between the two fields against the backdrop of this seeming incompatibility of targets. We discuss fundamental concepts in predictive modeling, such as out-of-sample model validation, variable and model selection, generalization, and hyperparameter tuning procedures. We provide a hands-on introduction to predictive modeling for a quantitative social science audience while aiming to demystify computer science jargon. We use the illustrative example of patent quality estimation, which should be a familiar topic of interest in the scientometrics community, guiding the reader through various model classes and procedures for data pre-processing, modeling, and validation. We start with more familiar, easy-to-interpret model classes (Logit and Elastic Nets), continue with less familiar non-parametric approaches (Classification Trees, Random Forest, Gradient Boosted Trees), and finally present artificial neural network architectures, first a simple feed-forward network and then a deep autoencoder geared towards rare-event prediction.
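
As a rough illustration of the out-of-sample validation idea mentioned in the abstract, the following sketch holds out a stratified test set and scores a trivial "always negative" baseline on it. It is not taken from the paper: the toy data (1,000 observations with a 2% positive rate), the helper names `stratified_split`, `accuracy`, and `precision_recall`, and the 25% test share are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import random


def stratified_split(labels, test_share=0.25, seed=0):
    """Split indices into train and test, sampling each class separately
    so the rare class keeps the same share in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_share)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1).
    Both are reported as 0.0 when their denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 20 rare positives among 1,000 observations.
labels = [1] * 20 + [0] * 980
train_idx, test_idx = stratified_split(labels)
y_test = [labels[i] for i in test_idx]

# Baseline that never predicts the rare event.
always_negative = [0] * len(y_test)

print("test size:", len(y_test), "positives:", sum(y_test))
print("accuracy:", accuracy(y_test, always_negative))
print("precision, recall:", precision_recall(y_test, always_negative))
```

On this toy data the baseline is right on 98% of the held-out cases while detecting none of the positives, which is why rare-event models are usually judged on class-specific measures such as precision and recall rather than accuracy alone.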

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2003.13441 [cs.LG]
	(or arXiv:2003.13441v2 [cs.LG] for this version)

Submission history

From: Daniel Hain PhD.
[v1] Mon, 30 Mar 2020 13:06:25 GMT (2572kb)
[v2] Mon, 1 Mar 2021 14:36:32 GMT (3444kb)
