Benchmarking Large Language Models for News Summarization

Zhang, Tianyi; Ladhak, Faisal; Durmus, Esin; Liang, Percy; McKeown, Kathleen; Hashimoto, Tatsunori B.

(Submitted on 31 Jan 2023)

Abstract: Large language models (LLMs) have shown promise for automatic summarization, but the reasons behind their successes are poorly understood. By conducting a human evaluation on ten LLMs across different pretraining methods, prompts, and model scales, we make two important observations. First, we find that instruction tuning, and not model size, is the key to the LLM's zero-shot summarization capability. Second, existing studies have been limited by low-quality references, leading to underestimates of human performance and lower few-shot and finetuning performance. To better evaluate LLMs, we perform human evaluation over high-quality summaries we collect from freelance writers. Despite major stylistic differences such as the amount of paraphrasing, we find that LLM summaries are judged to be on par with human-written summaries.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2301.13848 [cs.CL]
	(or arXiv:2301.13848v1 [cs.CL] for this version)

Submission history

From: Tianyi Zhang
[v1] Tue, 31 Jan 2023 18:46:19 GMT (715kb,D)
