VenoMave: Targeted Poisoning Against Speech Recognition

Aghakhani, Hojjat; Schönherr, Lea; Eisenhofer, Thorsten; Kolossa, Dorothea; Holz, Thorsten; Kruegel, Christopher; Vigna, Giovanni

Computer Science > Sound

(Submitted on 21 Oct 2020 (v1), last revised 20 Apr 2023 (this version, v3))

Abstract: Despite remarkable improvements, automatic speech recognition remains susceptible to adversarial perturbations. Compared to attacks on standard machine learning architectures, attacks on speech recognition are significantly more challenging, especially since the inputs to a speech recognition system are time series that contain both acoustic and linguistic properties of speech. Extracting all recognition-relevant information requires more complex pipelines and an ensemble of specialized components. Consequently, an attacker needs to consider the entire pipeline. In this paper, we present VenoMave, the first training-time poisoning attack against speech recognition. We pursue the same goal as the predominantly studied evasion attacks: leading the system to an incorrect, attacker-chosen transcription of a target audio waveform. In contrast to evasion attacks, however, we assume that the attacker can only manipulate a small part of the training data and does not alter the target audio waveform at runtime. We evaluate our attack on two datasets: TIDIGITS and Speech Commands. When poisoning less than 0.17% of the dataset, VenoMave achieves attack success rates of more than 80.0%, without access to the victim's network architecture or hyperparameters. In a more realistic scenario, when the target audio waveform is played over the air in different rooms, VenoMave maintains a success rate of up to 73.3%. Finally, VenoMave achieves an attack transferability rate of 36.4% between two different model architectures.

Subjects:	Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2010.10682 [cs.SD]
	(or arXiv:2010.10682v3 [cs.SD] for this version)

Submission history

From: Hojjat Aghakhani
[v1] Wed, 21 Oct 2020 00:30:08 GMT (553kb,D)
[v2] Mon, 25 Oct 2021 17:28:34 GMT (6568kb,D)
[v3] Thu, 20 Apr 2023 21:21:04 GMT (7011kb,D)
