Paraphrase Detection: Human vs. Machine Content

Becker, Jonas; Wahle, Jan Philip; Ruas, Terry; Gipp, Bela

(Submitted on 24 Mar 2023)

Abstract: The growing prominence of large language models, such as GPT-4 and ChatGPT, has led to increased concerns over academic integrity due to the potential for machine-generated content and paraphrasing. Although studies have explored the detection of human- and machine-paraphrased content, the comparison between these types of content remains underexplored. In this paper, we conduct a comprehensive analysis of various datasets commonly employed for paraphrase detection tasks and evaluate an array of detection methods. Our findings highlight the strengths and limitations of different detection methods in terms of performance on individual datasets, revealing a lack of suitable machine-generated datasets that can be aligned with human expectations. Our main finding is that human-authored paraphrases exceed machine-generated ones in terms of difficulty, diversity, and similarity, implying that automatically generated texts are not yet on par with human-level performance. Transformers emerged as the most effective method across datasets, with TF-IDF excelling on semantically diverse corpora. Additionally, we identify four datasets as the most diverse and challenging for paraphrase detection.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2303.13989 [cs.CL]
	(or arXiv:2303.13989v1 [cs.CL] for this version)

Submission history

From: Jonas Becker
[v1] Fri, 24 Mar 2023 13:25:46 GMT (6976kb,D)
