Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data

Li, Tian; Chen, Xiang; Dong, Zhen; Yu, Weijiang; Yan, Yijun; Keutzer, Kurt; Zhang, Shanghang

(Submitted on 20 Jun 2022)

Abstract: Domain-adaptive text classification is a challenging problem for large-scale pretrained language models because they often require expensive additional labeled data to adapt to new domains. Existing work usually fails to leverage the implicit relationships among words across domains. In this paper, we propose a novel method, called Domain Adaptation with Structured Knowledge (DASK), to enhance domain adaptation by exploiting word-level semantic relationships. DASK first builds a knowledge graph to capture the relationships between pivot terms (domain-independent words) and non-pivot terms in the target domain. Then, during training, DASK injects pivot-related knowledge graph information into source domain texts. For the downstream task, these knowledge-injected texts are fed into a BERT variant capable of processing knowledge-injected textual data. Thanks to the knowledge injection, our model learns domain-invariant features for non-pivots according to their relationships with pivots. DASK ensures that the pivots behave in a domain-invariant way by dynamically inferring them via the polarity scores of candidate pivots during training with pseudo-labels. We validate DASK on a wide range of cross-domain sentiment classification tasks and observe up to 2.9% absolute performance improvement over baselines for 20 different domain pairs. Code will be made available at this https URL
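
The pipeline described in the abstract (pick pivots by polarity, link them to target-domain non-pivots, inject those links into source texts) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not define the polarity score, the graph construction, or the injection format, so the score formula, the co-occurrence rule, the bracket notation, the threshold, and all function names below are assumptions made purely for illustration. The BERT variant that consumes the injected text is omitted.

```python
from collections import Counter, defaultdict


def polarity_scores(docs, labels):
    """Toy polarity per word: (pos - neg) / (pos + neg) over document counts.

    ASSUMPTION: this formula is illustrative; the abstract does not give one.
    """
    pos, neg = Counter(), Counter()
    for text, label in zip(docs, labels):
        words = set(text.lower().split())
        (pos if label == 1 else neg).update(words)
    return {w: (pos[w] - neg[w]) / (pos[w] + neg[w]) for w in set(pos) | set(neg)}


def select_pivots(src_docs, src_labels, tgt_docs, tgt_pseudo_labels, threshold=0.5):
    """Keep words that are strongly polar, with the same sign, in both domains.

    Target labels are pseudo-labels (model predictions), not gold labels.
    """
    src = polarity_scores(src_docs, src_labels)
    tgt = polarity_scores(tgt_docs, tgt_pseudo_labels)
    return {
        w
        for w in src.keys() & tgt.keys()
        if src[w] * tgt[w] > 0 and min(abs(src[w]), abs(tgt[w])) >= threshold
    }


def build_graph(tgt_docs, pivots, src_vocab):
    """Link each pivot to target-only words that co-occur with it in a document.

    ASSUMPTION: document-level co-occurrence stands in for the paper's graph.
    """
    graph = defaultdict(set)
    for text in tgt_docs:
        words = set(text.lower().split())
        for pivot in words & pivots:
            graph[pivot] |= {w for w in words if w not in pivots and w not in src_vocab}
    return dict(graph)


def inject_knowledge(text, graph):
    """Append a pivot's related terms in brackets right after the pivot."""
    out = []
    for token in text.split():
        out.append(token)
        related = graph.get(token.lower())
        if related:
            out.append("[" + " ".join(sorted(related)) + "]")
    return " ".join(out)


# Tiny made-up corpora: movie reviews (source) and electronics reviews (target).
src_docs = ["great plot and great acting", "boring plot and awful acting"]
src_labels = [1, 0]
tgt_docs = ["great battery and sharp screen", "awful battery and blurry screen"]
tgt_pseudo = [1, 0]  # hypothetical pseudo-labels

pivots = select_pivots(src_docs, src_labels, tgt_docs, tgt_pseudo)
src_vocab = {w for d in src_docs for w in d.lower().split()}
graph = build_graph(tgt_docs, pivots, src_vocab)
print(inject_knowledge(src_docs[0], graph))
```

In this toy setting "great" and "awful" come out as pivots because they carry the same polarity in both domains, while "and" is rejected for having no polarity. In a real training loop the pseudo-labels would change as the model improves, so pivot selection would be rerun periodically rather than once.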

Subjects:	Computation and Language (cs.CL)
Journal reference:	IJCAI-ECAI 2022
Cite as:	arXiv:2206.09591 [cs.CL]
	(or arXiv:2206.09591v1 [cs.CL] for this version)

Submission history

From: Tian Li
[v1] Mon, 20 Jun 2022 06:38:51 GMT (746kb,D)
