Sound

Authors and titles for recent submissions

[ total of 48 entries: 1-29 | 30-48 ]

Fri, 24 Mar 2023

[1]  arXiv:2303.13336 [pdf, other]
Title: Audio Diffusion Model for Speech Synthesis: A Survey on Text To Speech and Speech Enhancement in Generative AI
Comments: 18 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[2]  arXiv:2303.13272 [pdf, other]
Title: Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism
Comments: Accepted to ICASSP 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[3]  arXiv:2303.13072 [pdf, other]
Title: Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognit
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[4]  arXiv:2303.12984 [pdf, other]
Title: LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models
Comments: 5 pages, accepted to ICASSP 2023, project page: this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5]  arXiv:2303.13471 (cross-list from cs.CV) [pdf, other]
Title: Egocentric Audio-Visual Object Localization
Comments: Accepted by CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6]  arXiv:2303.13453 (cross-list from eess.AS) [pdf, other]
Title: Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization in TV
Comments: Paper accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes, Greece
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7]  arXiv:2303.13243 (cross-list from eess.AS) [pdf, other]
Title: Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8]  arXiv:2303.13027 (cross-list from eess.AS) [pdf, other]
Title: Weighted Pressure and Mode Matching for Sound Field Reproduction: Theoretical and Experimental Comparisons
Comments: Accepted to Journal of Audio Engineering Society, Special Issue on Spatial Audio
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9]  arXiv:2303.12930 (cross-list from cs.CV) [pdf, other]
Title: Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Comments: Accepted by CVPR2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10]  arXiv:2303.12908 (cross-list from eess.AS) [pdf, other]
Title: Self-supervised Learning with Speech Modulation Dropout
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Thu, 23 Mar 2023

[11]  arXiv:2303.12692 [pdf]
Title: Dual-Quaternions: Theory and Applications in Sound
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12]  arXiv:2303.12300 [pdf, other]
Title: Exploring Turkish Speech Recognition via Hybrid CTC/Attention Architecture and Multi-feature Fusion Network
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[13]  arXiv:2303.12659 (cross-list from cs.AI) [pdf, other]
Title: Posthoc Interpretation via Quantization
Comments: * Equal contribution
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14]  arXiv:2303.12337 (cross-list from cs.MM) [pdf, other]
Title: Music-Driven Group Choreography
Comments: Accepted to CVPR 2023
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15]  arXiv:2303.12187 (cross-list from eess.AS) [pdf, other]
Title: Practice of the conformer enhanced AUDIO-VISUAL HUBERT on Mandarin and English
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)

Wed, 22 Mar 2023

[16]  arXiv:2303.11816 [pdf, other]
Title: Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning
Comments: ICASSP 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17]  arXiv:2303.11692 [pdf, other]
Title: ByteCover3: Accurate Cover Song Identification on Short Queries
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[18]  arXiv:2303.11510 [pdf, other]
Title: ICASSP 2023 Deep Speech Enhancement Challenge
Comments: 6 pages, 1 figure. arXiv admin note: text overlap with arXiv:2202.13288
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19]  arXiv:2303.12002 (cross-list from eess.AS) [pdf, other]
Title: End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[20]  arXiv:2303.11607 (cross-list from cs.CL) [pdf, other]
Title: Transformers in Speech Processing: A Survey
Comments: under-review
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21]  arXiv:2303.11551 (cross-list from cs.CV) [pdf, other]
Title: ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers
Comments: Paper accepted at ICASSP 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)

Tue, 21 Mar 2023 (showing first 8 of 23 entries)

[22]  arXiv:2303.11020 [pdf, other]
Title: Dual-stream Time-Delay Neural Network with Dynamic Global Filter for Speaker Verification
Comments: 13 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[23]  arXiv:2303.10912 [pdf, other]
Title: Exploring Representation Learning for Small-Footprint Keyword Spotting
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24]  arXiv:2303.10897 [pdf, other]
Title: Relate auditory speech to EEG by shallow-deep attention-based network
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[25]  arXiv:2303.10757 [pdf, other]
Title: Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Comments: ICASSP 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[26]  arXiv:2303.10667 [pdf, other]
Title: Audio-Text Models Do Not Yet Leverage Natural Language
Comments: Copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27]  arXiv:2303.10539 [pdf, other]
Title: Textless Speech-to-Music Retrieval Using Emotion Similarity
Comments: To Appear IEEE ICASSP 2023
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[28]  arXiv:2303.10446 [pdf, other]
Title: A Content Adaptive Learnable Time-Frequency Representation For Audio Signal Processing
Comments: 5 pages, 4 figures. 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing, Rhodes, Greece
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[29]  arXiv:2303.10445 [pdf, other]
Title: EarCough: Enabling Continuous Subject Cough Event Detection on Hearables
Comments: This paper has been accepted by ACM CHI 2023
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)