Computer Science > Computation and Language
Title: SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization
(Submitted on 27 Nov 2019 (v1), last revised 29 Nov 2019 (this version, v2))
Abstract: This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles. We show that model-generated summaries of dialogues achieve higher ROUGE scores than model-generated summaries of news, which contrasts with the judgement of human evaluators. This suggests that the challenging task of abstractive dialogue summarization requires dedicated models and non-standard quality measures. To our knowledge, our study is the first attempt to introduce a high-quality corpus of chat dialogues, manually annotated with abstractive summaries, that can be used by the research community for further studies.
Submission history
From: Iwona Mochol
[v1] Wed, 27 Nov 2019 15:54:55 GMT (2802kb,A)
[v2] Fri, 29 Nov 2019 09:55:22 GMT (2923kb,A)