Title: Keyword Extraction from Short Texts with a Text-To-Text Transfer Transformer

Abstract: The paper explores the relevance of the Text-To-Text Transfer Transformer language model (T5) for Polish (plT5) to the task of intrinsic and extrinsic keyword extraction from short text passages. The evaluation is carried out on the new Polish Open Science Metadata Corpus (POSMAC), a collection of 216,214 abstracts of scientific publications compiled in the CURLICAT project and released with this paper. We compare the results obtained by four different methods, namely plT5kw, extremeText, TermoPL, and KeyBERT, and conclude that the plT5kw model yields particularly promising results for both frequent and sparsely represented keywords. Furthermore, a plT5kw keyword generation model trained on POSMAC also seems to produce highly useful results in cross-domain text labelling scenarios. We discuss the performance of the model on news stories and phone-based dialog transcripts, which represent text genres and domains extrinsic to the dataset of scientific abstracts. Finally, we also attempt to characterize the challenges of evaluating a text-to-text model on both intrinsic and extrinsic keyword extraction.
Comments: Accepted to ACIIDS 2022. The proceedings of ACIIDS 2022 will be published by Springer in the series Lecture Notes in Artificial Intelligence (LNAI) and Communications in Computer and Information Science (CCIS).
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2209.14008 [cs.CL]
  (or arXiv:2209.14008v2 [cs.CL] for this version)

Submission history

From: Agnieszka Mikołajczyk-Bareła
[v1] Wed, 28 Sep 2022 11:31:43 GMT (385kb,D)
[v2] Mon, 17 Oct 2022 11:04:18 GMT (126kb,D)
