Computer Science > Machine Learning
Title: Shell Language Processing: Unix command parsing for Machine Learning
(Submitted on 6 Jul 2021 (v1), last revised 7 Jul 2022 (this version, v3))
Abstract: In this article, we present a Shell Language Preprocessing (SLP) library, which implements tokenization and encoding directed at parsing Unix and Linux shell commands. We describe the rationale behind the need for a new approach, with specific examples of when conventional Natural Language Processing (NLP) pipelines fail. Furthermore, we evaluate our methodology on a security classification task against widely accepted information and communications technology (ICT) tokenization techniques and achieve a significant improvement in F1 score, from 0.392 to 0.874.
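As a rough illustration of the kind of failure the abstract refers to, the sketch below contrasts a naive whitespace split with a shell-aware split on a single command. It uses only Python's standard-library `shlex` module, not the SLP library itself, and the command string is a hypothetical example rather than one taken from the paper.

```python
import shlex

# Hypothetical example command, chosen for illustration only.
command = "grep -r 'secret key' /var/log | wc -l"

# Naive whitespace split, as a generic text pipeline might apply:
# the quoted argument is broken into two separate tokens.
naive_tokens = command.split()

# Shell-aware split from the standard library: quoting is respected,
# so the quoted argument stays a single token.
shell_tokens = shlex.split(command)

print(naive_tokens)
print(shell_tokens)
```

The naive split yields fragments such as `'secret` and `key'`, while the shell-aware split keeps `secret key` together as one argument. A tokenizer built for shell commands would presumably need to handle this and further syntax (pipes, redirections, flags) that `shlex` alone does not interpret.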
Submission history
From: Dmitrijs Trizna
[v1] Tue, 6 Jul 2021 07:34:16 GMT (52kb,D)
[v2] Mon, 11 Apr 2022 06:39:41 GMT (43kb)
[v3] Thu, 7 Jul 2022 13:31:35 GMT (7kb)