Multilevel orthogonal Bochner function subspaces with applications to robust machine learning

Castrillon-Candas, Julio Enrique; Liu, Dingning; Yang, Sicheng; Kon, Mark



(Submitted on 4 Oct 2021 (v1), last revised 25 Aug 2023 (this version, v4))

Abstract: In our approach, we consider the data as instances of a random field within a relevant Bochner space. Our key observation is that the classes can predominantly reside in two distinct subspaces. To uncover the separation between these classes, we employ the Karhunen-Loeve expansion and construct the appropriate subspaces. The novel features forming the bases of these subspaces are constructed by applying a coordinate transformation based on recent Functional Data Analysis theory for anomaly detection. The associated signal decomposition is an exact hierarchical tensor product expansion with known optimality properties for approximating stochastic processes (random fields) with finite dimensional function spaces. Using a hierarchical finite dimensional expansion of the nominal class, a series of orthogonal nested subspaces is constructed for detecting anomalous signal components. Projection coefficients of input data in these subspaces are then used to train a Machine Learning (ML) classifier. Because the signal is split into nominal and anomalous projection components, clearer separation surfaces between the classes arise. In fact, we show that with a sufficiently accurate estimation of the covariance structure of the nominal class, a sharp classification can be obtained. This is particularly advantageous for large unbalanced datasets. We demonstrate this on a number of high-dimensional datasets. The approach yields significant increases in the accuracy of ML methods compared to using the same ML algorithm with the original feature data. Our tests on the Alzheimer's Disease ADNI dataset show a dramatic increase in accuracy (from 48% to 89%). Furthermore, tests using unbalanced semi-synthetic datasets created from the benchmark GCM dataset confirm increased accuracy as the dataset becomes more unbalanced.
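
The nominal/anomalous split described in the abstract can be illustrated with a heavily simplified, single-level sketch. This is not the paper's multilevel construction: it assumes finite-dimensional vectors, approximates the Karhunen-Loeve basis of the nominal class by the leading right singular vectors of the centered nominal samples, and treats everything outside that span as the "anomalous" component. The function names (`fit_nominal_subspace`, `split_features`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_nominal_subspace(X_nominal, k):
    """Estimate a truncated KL (PCA) basis from nominal-class samples.

    Assumption: the leading k right singular vectors of the centered data
    stand in for the nominal-class KL eigenfunctions.
    """
    mean = X_nominal.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_nominal - mean, full_matrices=False)
    return mean, Vt[:k]  # rows of the basis are orthonormal

def split_features(X, mean, basis):
    """Return nominal projection coefficients and the residual norm.

    The residual is the part of each sample orthogonal to the nominal
    subspace; its norm serves here as a single anomaly feature.
    """
    centered = X - mean
    coeffs = centered @ basis.T
    residual = centered - coeffs @ basis
    return coeffs, np.linalg.norm(residual, axis=1)

# Synthetic data (hypothetical): nominal samples lie in a k-dimensional
# subspace; anomalous samples add a fixed component orthogonal to it.
rng = np.random.default_rng(0)
d, k = 20, 3
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # orthonormal columns
nominal = rng.normal(size=(200, k)) @ Q[:, :k].T
anomalous = rng.normal(size=(50, k)) @ Q[:, :k].T + 2.0 * Q[:, k]

mean, basis = fit_nominal_subspace(nominal, k)
_, r_nominal = split_features(nominal, mean, basis)
_, r_anomalous = split_features(anomalous, mean, basis)
print(r_nominal.max(), r_anomalous.min())
```

In this toy setting the residual norm is numerically zero for nominal samples and equal to the size of the injected orthogonal component for anomalous ones, so the classes separate along that one feature. With real data, the coefficients and residual features would be passed to a classifier rather than thresholded directly.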

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
MSC classes:	62R10, 60G35, 62-08, 60G60, 65F25, 46B09
Cite as:	arXiv:2110.01729 [stat.ML]
	(or arXiv:2110.01729v4 [stat.ML] for this version)

Submission history

From: Julio Castrillon PhD
[v1] Mon, 4 Oct 2021 22:01:01 GMT (555kb,D)
[v2] Wed, 5 Oct 2022 17:18:40 GMT (961kb,D)
[v3] Tue, 13 Jun 2023 15:22:04 GMT (1860kb,D)
[v4] Fri, 25 Aug 2023 18:23:26 GMT (823kb,D)
