

Title: Weakly Supervised Text Classification using Supervision Signals from a Language Model

Abstract: Solving text classification in a weakly supervised manner is important for real-world applications where human annotations are scarce. In this paper, we propose to query a masked language model with cloze-style prompts to obtain supervision signals. We design a prompt that combines the document itself with "this article is talking about [MASK]." A masked language model can generate words for the [MASK] token, and these generated words, which summarize the content of the document, can be used as supervision signals. We propose a latent variable model that simultaneously learns a word distribution learner, which associates generated words with pre-defined categories, and a document classifier, without using any annotated data. Evaluation on three datasets, AGNews, 20Newsgroups, and UCINews, shows that our method can outperform baselines by 2%, 4%, and 3%, respectively.
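
The prompting step described in the abstract can be illustrated with a minimal Python sketch. It only shows how a document might be combined with the cloze template and how words generated for [MASK] could be tied to categories. The function names, the seed lexicon, and the simple vote-counting rule are all assumptions made for illustration; in particular, the voting rule is a stand-in and is not the latent variable model proposed in the paper. The generated words are passed in as a plain list here, whereas in practice they would come from a masked language model.

```python
from collections import Counter

# Cloze template quoted in the abstract.
PROMPT_SUFFIX = "this article is talking about [MASK]."


def build_cloze_prompt(document: str, mask_token: str = "[MASK]") -> str:
    """Append the cloze template to the document text.

    mask_token lets the caller swap in the mask string a given
    tokenizer expects (an assumption; the abstract only shows [MASK]).
    """
    suffix = PROMPT_SUFFIX.replace("[MASK]", mask_token)
    return f"{document.strip()} {suffix}"


# Hypothetical seed lexicon: NOT from the paper, purely illustrative.
SEED_WORDS = {
    "sports": {"football", "sports", "soccer", "basketball"},
    "business": {"business", "finance", "economy", "stocks"},
}


def vote_category(generated_words, seed_words=SEED_WORDS):
    """Pick the category whose seed words match the most generated words.

    A naive stand-in for the paper's learned word-to-category
    association; returns None when no generated word matches.
    """
    counts = Counter()
    for word in generated_words:
        for category, seeds in seed_words.items():
            if word.lower() in seeds:
                counts[category] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    prompt = build_cloze_prompt("The home team won the cup final on penalties.")
    print(prompt)
    # Words a masked language model might generate for [MASK] (made up here).
    print(vote_category(["football", "soccer", "economy"]))
```

A fixed lexicon like this requires hand-picked words per category, which is the kind of manual effort the abstract's approach avoids by learning the association between generated words and categories from unlabeled data.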
Comments: 11 pages, 1 figure
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2205.06604 [cs.CL]
  (or arXiv:2205.06604v1 [cs.CL] for this version)

Submission history

From: Ziqian Zeng
[v1] Fri, 13 May 2022 12:57:15 GMT (959kb,D)
