Computer Science > Machine Learning
Title: Data-driven root-cause analysis for distributed system anomalies
(Submitted on 20 May 2016 (v1), last revised 31 May 2018 (this version, v2))
Abstract: Modern distributed cyber-physical systems encounter a wide variety of anomalies and, in many cases, are vulnerable to catastrophic fault propagation because of strong connectivity among the subsystems. Root-cause analysis then becomes intractable, since complex fault propagation mechanisms combine with diverse operating modes. This paper presents a new data-driven framework for root-cause analysis that addresses these issues. The framework is based on a spatiotemporal feature extraction scheme for distributed cyber-physical systems, built on the concept of symbolic dynamics, for discovering and representing causal interactions among the subsystems of a complex system. We present two approaches for root-cause analysis: sequential state switching ($S^3$, based on the free energy concept of a Restricted Boltzmann Machine, RBM) and artificial anomaly association ($A^3$, a multi-class classification framework using deep neural networks, DNN). Synthetic data from cases with failed pattern(s) and an anomalous node are simulated to validate the proposed approaches, and their performance is compared with that of vector autoregressive (VAR) model-based root-cause analysis. A real dataset based on the Tennessee Eastman process (TEP) is also used for validation. The results show that: (1) the $S^3$ and $A^3$ approaches obtain high accuracy in root-cause analysis and successfully handle multiple nominal operation modes, and (2) the proposed tool-chain is scalable while maintaining high accuracy.
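For readers unfamiliar with the free energy concept mentioned for $S^3$, the sketch below computes the standard free energy of a binary RBM, $F(v) = -b^\top v - \sum_j \log(1 + e^{c_j + \sum_i v_i W_{ij}})$. It is only an illustration of that textbook quantity: the abstract does not specify how $S^3$ uses it, and the names `rbm_free_energy`, `W`, `b`, and `c` are assumptions made here, not identifiers from the paper.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def rbm_free_energy(v, W, b, c):
    """Free energy of a binary RBM for a visible vector v.

    W[i][j] couples visible unit i with hidden unit j;
    b and c are the visible and hidden biases (illustrative names).
    """
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_term = 0.0
    for j in range(len(c)):
        # Total input to hidden unit j given the visible vector.
        activation = c[j] + sum(v[i] * W[i][j] for i in range(len(v)))
        hidden_term += softplus(activation)
    return -visible_term - hidden_term
```

A lower free energy corresponds to a visible configuration that the RBM assigns higher probability, which is why the quantity is commonly used to score how well an observation fits a trained model.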
Submission history
From: Chao Liu
[v1] Fri, 20 May 2016 16:17:59 GMT (336kb)
[v2] Thu, 31 May 2018 02:10:51 GMT (1060kb)