Audio and Speech Processing

Authors and titles for recent submissions

[ total of 50 entries: 1-25 | 26-50 ]

Wed, 5 Oct 2022

[1]  arXiv:2210.01677 [pdf, ps, other]
Title: The DKU-DukeECE Diarization System for the VoxCeleb Speaker Recognition Challenge 2022
Comments: arXiv admin note: substantial text overlap with arXiv:2109.02002
Subjects: Audio and Speech Processing (eess.AS)
[2]  arXiv:2210.01273 [pdf, other]
Title: An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification
Comments: Accepted by SLT2022
Subjects: Audio and Speech Processing (eess.AS)
[3]  arXiv:2210.01719 (cross-list from cs.SD) [pdf, other]
Title: Learning the Spectrogram Temporal Resolution for Audio Classification
Comments: Under review. Code open-sourced at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[4]  arXiv:2210.01703 (cross-list from cs.SD) [pdf, other]
Title: Improving Label-Deficient Keyword Spotting Using Self-Supervised Pretraining
Authors: Holger Severin Bovbjerg (1), Zheng-Hua Tan (1) ((1) Aalborg University)
Comments: 8 pages, 3 figures, 4 tables, Submitted to Northern Lights Deep Learning Conference 2023
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[5]  arXiv:2210.01512 (cross-list from cs.CL) [pdf, other]
Title: Code-Switching without Switching: Language Agnostic End-to-End Speech Translation
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6]  arXiv:2210.01448 (cross-list from cs.SD) [pdf, other]
Title: Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings
Comments: SIGGRAPH Asia 2022 (Journal Track); Project Page: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Graphics (cs.GR); Audio and Speech Processing (eess.AS)
[7]  arXiv:2210.01353 (cross-list from cs.SD) [pdf, other]
Title: Pay Self-Attention to Audio-Visual Navigation
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[8]  arXiv:2210.01256 (cross-list from cs.SD) [pdf, ps, other]
Title: And what if two musical versions don't share melody, harmony, rhythm, or lyrics ?
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[9]  arXiv:2210.01205 (cross-list from cs.LG) [pdf]
Title: Diagnosis of Parkinson's Disease Based on Voice Signals Using SHAP and Hard Voting Ensemble Method
Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

Tue, 4 Oct 2022

[10]  arXiv:2210.01029 [pdf, other]
Title: WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Comments: Accepted to IEEE SLT 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[11]  arXiv:2210.00943 [pdf, other]
Title: Simple Pooling Front-ends For Efficient Audio Classification
Comments: Submitted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
[12]  arXiv:2210.00889 [pdf, other]
Title: Learnable Acoustic Frontends in Bird Activity Detection
Comments: Submitted and presented at IWAENC, September 2022, 5 Pages, 1 Figure, 3 Tables
Subjects: Audio and Speech Processing (eess.AS)
[13]  arXiv:2210.00434 [pdf, other]
Title: Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)
[14]  arXiv:2210.00417 [pdf, other]
Title: Voice Spoofing Countermeasures: Taxonomy, State-of-the-art, experimental analysis of generalizability, open challenges, and the way forward
Subjects: Audio and Speech Processing (eess.AS); Computers and Society (cs.CY); Sound (cs.SD)
[15]  arXiv:2210.00378 [pdf, other]
Title: Optimized Decoders for Mixed-Order Ambisonics
Authors: Aaron Heller (1), Eric Benjamin (2), Fernando Lopez-Lezcano (3) ((1) Artificial Intelligence Center, SRI International, (2) Surround Research, (3) Center for Computer Research in Music and Acoustics (CCRMA), Stanford University)
Comments: 9 pages, 10 figures
Journal-ref: Paper 10507, 150th Audio Engineering Society Convention, May 2021
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[16]  arXiv:2210.00367 [pdf, other]
Title: A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[17]  arXiv:2210.00263 [pdf, other]
Title: Fine-tuning Wav2vec for Vocal-burst Emotion Recognition
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[18]  arXiv:2210.00259 [pdf, other]
Title: Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications
Comments: 5 pages, submitted to INTERSPEECH 2022
Journal-ref: Proc. Interspeech 2022, 4083-4087
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19]  arXiv:2210.00117 [pdf, other]
Title: Blind Signal Dereverberation for Machine Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[20]  arXiv:2210.00077 [pdf, other]
Title: E-Branchformer: Branchformer with Enhanced merging for speech recognition
Comments: Accepted to SLT 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[21]  arXiv:2210.01116 (cross-list from cs.RO) [pdf, other]
Title: That Sounds Right: Auditory Self-Supervision for Dynamic Robot Manipulation
Comments: Videos and audio data are best seen on our project website: audio-robot-learning.github.io
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22]  arXiv:2210.00753 (cross-list from cs.SD) [pdf, other]
Title: Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection
Comments: Accepted by SLT 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[23]  arXiv:2210.00721 (cross-list from cs.SD) [pdf, other]
Title: Efficient acoustic feature transformation in mismatched environments using a Guided-GAN
Comments: 20 pages, 8 figures, 9 tables
Journal-ref: Speech Communication, 143, pp.10-20 (2022)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24]  arXiv:2210.00705 (cross-list from cs.CL) [pdf, other]
Title: SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Comments: Accepted to IEEE SLT 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25]  arXiv:2210.00169 (cross-list from cs.SD) [pdf, other]
Title: Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition
Comments: Published in INTERSPEECH 2022
Journal-ref: Proc. Interspeech 2022, 1691-1695
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
