
Sound

Authors and titles for recent submissions

[ total of 82 entries: 1-25 | 26-50 | 51-75 | 76-82 ]
[ showing 25 entries per page: fewer | more | all ]

Fri, 9 Jun 2023

[1]  arXiv:2306.05350 [pdf, other]
Title: PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models
Comments: This work was accepted to the 11th International Conference on Affective Computing and Intelligent Interaction (ACII), 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2]  arXiv:2306.05284 [pdf, other]
Title: Simple and Controllable Music Generation
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3]  arXiv:2306.05279 [pdf, other]
Title: Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition
Subjects: Sound (cs.SD)
[4]  arXiv:2306.04956 [pdf, other]
Title: Adaptive Fake Audio Detection with Low-Rank Model Squeezing
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[5]  arXiv:2306.05374 (cross-list from physics.med-ph) [pdf, other]
Title: Towards Ultrasound Tongue Image prediction from EEG during speech production
Comments: accepted at Interspeech 2023
Subjects: Medical Physics (physics.med-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[6]  arXiv:2306.05358 (cross-list from cs.CR) [pdf, other]
Title: Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced Driver-Assistance System
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7]  arXiv:2306.05320 (cross-list from cs.CL) [pdf, other]
Title: KIT's Multilingual Speech Translation System for IWSLT 2023
Comments: IWSLT 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD)
[8]  arXiv:2306.05245 (cross-list from eess.AS) [pdf, other]
Title: Matching Latent Encoding for Audio-Text based Keyword Spotting
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[9]  arXiv:2306.05088 (cross-list from cs.CL) [pdf, other]
Title: The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN
Authors: Zheng Yuan (1 and 2), Aldo Pastore (1 and 2), Dorina de Jong (1 and 2), Hao Xu (3), Luciano Fadiga (1 and 2), Alessandro D'Ausilio (1 and 2) ((1) Istituto Italiano di Tecnologia, Italy, (2) Università degli Studi di Ferrara, Italy, (3) University of California San Diego, USA)
Comments: Accepted at INTERSPEECH 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10]  arXiv:2306.05004 (cross-list from eess.AS) [pdf, other]
Title: VIFS: An End-to-End Variational Inference for Foley Sound Synthesis
Comments: DCASE 2023 Challenge Task 7
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[11]  arXiv:2306.04987 (cross-list from eess.AS) [pdf, other]
Title: Two-stage Autoencoder Neural Network for 3D Speech Enhancement
Comments: 5 pages, 5 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[12]  arXiv:2306.04980 (cross-list from cs.CL) [pdf, other]
Title: Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models
Comments: Accepted by InterSpeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.16029
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13]  arXiv:2306.04655 (cross-list from eess.SP) [pdf]
Title: Modulation Classification Through Deep Learning Using Resolution Transformed Spectrograms
Comments: 15 pages, 12 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 8 Jun 2023 (showing first 12 of 15 entries)

[14]  arXiv:2306.04628 [pdf, other]
Title: Systematic Analysis of Music Representations from BERT
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[15]  arXiv:2306.04368 [pdf, other]
Title: Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation
Comments: Accepted to Interspeech 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[16]  arXiv:2306.04301 [pdf, other]
Title: Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Comments: Accepted at Interspeech 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17]  arXiv:2306.04286 [pdf, other]
Title: A Mask Free Neural Network for Monaural Speech Enhancement
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[18]  arXiv:2306.04268 [pdf, other]
Title: Multi-microphone Automatic Speech Segmentation in Meetings Based on Circular Harmonics Features
Authors: Théo Mariotte (LAUM, LIUM), Anthony Larcher (LIUM), Silvio Montrésor (LAUM), Jean-Hugh Thomas (LAUM)
Comments: Interspeech 2023, International Speech Communication Association (ISCA), Aug 2023, Dublin, Ireland
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[19]  arXiv:2306.04148 [pdf, other]
Title: SANGEET: A XML based Open Dataset for Research in Hindustani Sangeet
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[20]  arXiv:2306.04143 [pdf, other]
Title: RISC: A Corpus for Shout Type Classification and Shout Intensity Prediction
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21]  arXiv:2306.04428 (cross-list from cs.CL) [pdf, other]
Title: Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
Comments: Accepted at INTERSPEECH 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22]  arXiv:2306.04374 (cross-list from cs.CL) [pdf, other]
Title: Label Aware Speech Representation Learning For Language Identification
Comments: Accepted at Interspeech 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23]  arXiv:2306.04306 (cross-list from cs.CL) [pdf, other]
Title: Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes
Authors: Kevin Glocker (1), Aaricia Herygers (1), Munir Georges (1 and 2) ((1) AImotion Bavaria Technische Hochschule Ingolstadt, (2) Intel Labs Germany)
Comments: 5 pages, 2 figures, 2 tables, accepted to INTERSPEECH 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24]  arXiv:2306.04276 (cross-list from physics.ao-ph) [pdf]
Title: Test experiments with distributed acoustic sensing and hydrophone arrays for locating underwater sound sources
Comments: Data description
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS); Biological Physics (physics.bio-ph)
[25]  arXiv:2306.04233 (cross-list from cs.CL) [pdf, other]
Title: Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization
Comments: Accepted by Interspeech 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
