Audio and Speech Processing

Authors and titles for recent submissions

[ total of 33 entries: 1-25 | 26-33 ]

Fri, 3 Feb 2023

[1]  arXiv:2302.01090 (cross-list from cs.SD) [pdf, other]
Title: Goniometers are a Powerful Acoustic Feature for Music Information Retrieval Tasks
Authors: Tim Ziemer
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[2]  arXiv:2302.00868 (cross-list from cs.SD) [pdf, other]
Title: Speech Enhancement for Virtual Meetings on Cellular Networks
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[3]  arXiv:2302.00836 (cross-list from cs.CL) [pdf, other]
Title: Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition
Comments: The 13th International Symposium on Chinese Spoken Language Processing (ISCSLP 2022)
Journal-ref: Published in ISCSLP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4]  arXiv:2302.00765 (cross-list from cs.CL) [pdf, other]
Title: Visually Grounded Keyword Detection and Localisation for Low-Resource Languages
Comments: PhD dissertation, University of Stellenbosch, 108 pages, submitted and accepted 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 2 Feb 2023

[5]  arXiv:2302.00646 (cross-list from cs.SD) [pdf, other]
Title: Epic-Sounds: A Large-scale Dataset of Actions That Sound
Comments: 6 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6]  arXiv:2302.00286 (cross-list from cs.SD) [pdf, other]
Title: Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training
Comments: arXiv admin note: text overlap with arXiv:2206.10805
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Wed, 1 Feb 2023

[7]  arXiv:2301.13341 [pdf, other]
Title: Neural Target Speech Extraction: An Overview
Comments: Submitted to IEEE Signal Processing Magazine on Apr. 25, 2022, and accepted on Jan. 12, 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8]  arXiv:2301.13662 (cross-list from cs.SD) [pdf, other]
Title: InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9]  arXiv:2301.13507 (cross-list from cs.IR) [pdf, ps, other]
Title: An Analysis of Classification Approaches for Hit Song Prediction using Engineered Metadata Features with Lyrics and Audio Features
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10]  arXiv:2301.13383 (cross-list from cs.SD) [pdf, other]
Title: An Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music Generation
Comments: This is a draft before submitted to TISMIR as a journal paper
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11]  arXiv:2301.13380 (cross-list from cs.SD) [pdf, other]
Title: Automated Time-frequency Domain Audio Crossfades using Graph Cuts
Journal-ref: Late Breaking/Demo at the 20th International Society for Music Information Retrieval, Delft, The Netherlands, 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[12]  arXiv:2301.13267 (cross-list from cs.SD) [pdf]
Title: ArchiSound: Audio Generation with Diffusion
Authors: Flavio Schneider
Comments: Master Thesis at ETH Zurich
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

Tue, 31 Jan 2023 (showing first 13 of 17 entries)

[13]  arXiv:2301.13057 [pdf, other]
Title: MYRiAD: A Multi-Array Room Acoustic Database
Comments: submitted for publication
Subjects: Audio and Speech Processing (eess.AS)
[14]  arXiv:2301.12808 [pdf, other]
Title: Real-Time Acoustic Perception for Automotive Applications
Subjects: Audio and Speech Processing (eess.AS)
[15]  arXiv:2301.12596 [pdf, other]
Title: Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[16]  arXiv:2301.12363 [pdf, other]
Title: KalmanNet: A Learnable Kalman Filter for Acoustic Echo Cancellation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17]  arXiv:2301.12258 [pdf, other]
Title: Cross-domain Neural Pitch and Periodicity Estimation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[18]  arXiv:2301.13003 (cross-list from cs.CL) [pdf, other]
Title: Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation
Comments: 5 pages; Keywords: speech recognition, continuous integrate-and-fire, knowledge distillation, contrastive learning, pre-trained language models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19]  arXiv:2301.12686 (cross-list from cs.LG) [pdf, other]
Title: GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20]  arXiv:2301.12662 (cross-list from cs.SD) [pdf, other]
Title: SingSong: Generating musical accompaniments from singing
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[21]  arXiv:2301.12661 (cross-list from cs.SD) [pdf, other]
Title: Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Comments: Audio samples are available at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[22]  arXiv:2301.12525 (cross-list from cs.SD) [pdf, other]
Title: Composer's Assistant: Interactive Transformers for Multi-Track MIDI Infilling
Comments: 16 pages, 7 figures, 3 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[23]  arXiv:2301.12503 (cross-list from cs.SD) [pdf, other]
Title: AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Comments: Demo and implementation at this https URL Evaluation toolbox at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[24]  arXiv:2301.12354 (cross-list from cs.SD) [pdf, other]
Title: Artistic Curve Steganography Carried by Musical Audio
Comments: 18 pages, 14 figures, in Proceedings of EvoMUSART 2023
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[25]  arXiv:2301.12343 (cross-list from cs.SD) [pdf, other]
Title: Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
