Computer Science > Computation and Language
Title: Automatic Rule Induction for Efficient Semi-Supervised Learning
(Submitted on 18 May 2022 (this version), latest version 14 Oct 2022 (v5))
Abstract: Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data. Meanwhile, pretrained transformer models act as black-box correlation engines that are difficult to explain and sometimes behave unreliably. In this paper, we propose tackling both of these challenges via Automatic Rule Induction (ARI), a simple and general-purpose framework for the automatic discovery and integration of symbolic rules into pretrained transformer models. First, we extract weak symbolic rules from low-capacity machine learning models trained on small amounts of labeled data. Next, we use an attention mechanism to integrate these rules into high-capacity pretrained transformer models. Last, the rule-augmented system becomes part of a self-training framework to boost supervision signal on unlabeled data. These steps can be layered beneath a variety of existing weak supervision and semi-supervised NLP algorithms in order to improve performance and interpretability. Experiments across nine sequence classification and relation extraction tasks suggest that ARI can improve state-of-the-art methods with no manual effort and minimal computational overhead.
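The first step described in the abstract, extracting weak symbolic rules from a small labeled set, can be illustrated with a minimal sketch. This is not the paper's implementation: a count-based precision filter stands in for the low-capacity model, and the names `induce_rules`, `apply_rules`, `min_support`, and `min_precision` are hypothetical. The attention-based integration and self-training steps are not shown.

```python
from collections import Counter, defaultdict

ABSTAIN = None  # a rule set that does not fire makes no prediction


def induce_rules(texts, labels, min_support=2, min_precision=0.9):
    """Return {token: label} rules meaning 'if token occurs, predict label'.

    Assumption: a simple precision filter over unigram counts is used here
    as a stand-in for a trained low-capacity model.
    """
    token_label_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        # Count each token once per example.
        for token in set(text.lower().split()):
            token_label_counts[token][label] += 1

    rules = {}
    for token, counts in token_label_counts.items():
        label, hits = counts.most_common(1)[0]
        support = sum(counts.values())
        # Keep only tokens seen often enough and tied strongly to one label.
        if support >= min_support and hits / support >= min_precision:
            rules[token] = label
    return rules


def apply_rules(rules, text):
    """Majority vote over the rules that fire; abstain if none fire."""
    votes = Counter(rules[t] for t in set(text.lower().split()) if t in rules)
    return votes.most_common(1)[0][0] if votes else ABSTAIN


# Toy labeled data, invented purely for illustration.
texts = ["great movie", "great acting", "terrible plot", "terrible movie"]
labels = ["pos", "pos", "neg", "neg"]
rules = induce_rules(texts, labels)
```

On this toy data, "movie" appears under both labels and is filtered out, while "great" and "terrible" survive as rules. Text containing neither token yields an abstention, which is where the pretrained model and unlabeled data would take over in the full pipeline.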
Submission history
From: Reid Pryzant
[v1] Wed, 18 May 2022 16:50:20 GMT (445kb,D)
[v2] Thu, 19 May 2022 16:18:40 GMT (445kb,D)
[v3] Fri, 20 May 2022 16:42:21 GMT (446kb,D)
[v4] Tue, 11 Oct 2022 20:32:49 GMT (645kb,D)
[v5] Fri, 14 Oct 2022 17:10:39 GMT (645kb,D)