Human Perception of Audio Deepfakes

Müller, Nicolas M.; Pizzi, Karla; Williams, Jennifer

doi:10.1145/3552466.3556531

Authors: Nicolas M. Müller, Karla Pizzi, Jennifer Williams

(Submitted on 20 Jul 2021 (v1), last revised 6 Oct 2022 (this version, v6))

Abstract: The recent emergence of deepfakes has brought manipulated and generated content to the forefront of machine learning research. Automatic detection of deepfakes has seen many new machine learning techniques; however, human detection capabilities are far less explored. In this paper, we present results from comparing the abilities of humans and machines to detect audio deepfakes used to imitate someone's voice. For this, we use a web-based application framework formulated as a game. Participants were asked to distinguish between real and fake audio samples. In our experiment, 472 unique users competed against a state-of-the-art AI deepfake detection algorithm for a total of 14912 rounds of the game. We find that humans and deepfake detection algorithms share similar strengths and weaknesses, both struggling to detect certain types of attacks. This is in contrast to the superhuman performance of AI in many application areas such as object detection or face recognition. Concerning human success factors, we find that IT professionals have no advantage over non-professionals, but native speakers have an advantage over non-native speakers. Additionally, we find that older participants tend to be more susceptible than younger ones. These insights may be helpful when designing future cybersecurity training for humans as well as developing better detection algorithms.

Comments:	Published at DDAM, the First International Workshop on Deepfake Detection for Audio Multimedia, at ACM Multimedia 2022
Subjects:	Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
DOI:	10.1145/3552466.3556531
Cite as:	arXiv:2107.09667 [cs.HC]
	(or arXiv:2107.09667v6 [cs.HC] for this version)

Submission history

From: Nicolas Michael Müller
[v1] Tue, 20 Jul 2021 09:19:42 GMT (339kb,D)
[v2] Thu, 30 Sep 2021 13:15:52 GMT (321kb,D)
[v3] Mon, 28 Mar 2022 12:32:45 GMT (1471kb,D)
[v4] Mon, 1 Aug 2022 07:51:21 GMT (1552kb,D)
[v5] Wed, 5 Oct 2022 08:22:11 GMT (1605kb,D)
[v6] Thu, 6 Oct 2022 07:49:41 GMT (1605kb,D)
