Human and Automatic Detection of Generated Text

Ippolito, Daphne; Duckworth, Daniel; Callison-Burch, Chris; Eck, Douglas

Authors: Daphne Ippolito, Daniel Duckworth, Chris Callison-Burch, Douglas Eck

(Submitted on 2 Nov 2019 (this version), latest version 7 May 2020 (v2))

Abstract: With the advent of generative models with a billion parameters or more, it is now possible to automatically generate vast amounts of human-sounding text. This raises the questions of just how human-like machine-generated text is, and how long a text excerpt needs to be for both humans and automatic discriminators to be able to reliably detect that it was machine-generated. In this paper, we conduct a thorough investigation of how choices such as sampling strategy and text excerpt length can impact the performance of automatic detection methods as well as human raters. We find that the sampling strategies that result in more human-like text according to human raters create distributional differences from human-written text that make detection easy for automatic discriminators.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1911.00650 [cs.CL]
	(or arXiv:1911.00650v1 [cs.CL] for this version)

Submission history

From: Daniel Duckworth [view email]
[v1] Sat, 2 Nov 2019 04:52:00 GMT (1347kb,D)
[v2] Thu, 7 May 2020 21:32:16 GMT (2130kb,D)
