Sound

Authors and titles for recent submissions

Fri, 24 Mar 2023

[1]  arXiv:2303.13336 [pdf, other]
Title: Audio Diffusion Model for Speech Synthesis: A Survey on Text To Speech and Speech Enhancement in Generative AI
Comments: 18 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[2]  arXiv:2303.13272 [pdf, other]
Title: Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism
Comments: Accepted to ICASSP 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[3]  arXiv:2303.13072 [pdf, other]
Title: Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognit
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[4]  arXiv:2303.12984 [pdf, other]
Title: LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models
Comments: 5 pages, accepted to ICASSP 2023, project page available
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5]  arXiv:2303.13471 (cross-list from cs.CV) [pdf, other]
Title: Egocentric Audio-Visual Object Localization
Comments: Accepted by CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6]  arXiv:2303.13453 (cross-list from eess.AS) [pdf, other]
Title: Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization in TV
Comments: Paper accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes, Greece
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7]  arXiv:2303.13243 (cross-list from eess.AS) [pdf, other]
Title: Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8]  arXiv:2303.13027 (cross-list from eess.AS) [pdf, other]
Title: Weighted Pressure and Mode Matching for Sound Field Reproduction: Theoretical and Experimental Comparisons
Comments: Accepted to Journal of Audio Engineering Society, Special Issue on Spatial Audio
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9]  arXiv:2303.12930 (cross-list from cs.CV) [pdf, other]
Title: Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Comments: Accepted by CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10]  arXiv:2303.12908 (cross-list from eess.AS) [pdf, other]
Title: Self-supervised Learning with Speech Modulation Dropout
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Thu, 23 Mar 2023

[11]  arXiv:2303.12692 [pdf]
Title: Dual-Quaternions: Theory and Applications in Sound
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12]  arXiv:2303.12300 [pdf, other]
Title: Exploring Turkish Speech Recognition via Hybrid CTC/Attention Architecture and Multi-feature Fusion Network
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[13]  arXiv:2303.12659 (cross-list from cs.AI) [pdf, other]
Title: Posthoc Interpretation via Quantization
Comments: * Equal contribution
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14]  arXiv:2303.12337 (cross-list from cs.MM) [pdf, other]
Title: Music-Driven Group Choreography
Comments: Accepted to CVPR 2023
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15]  arXiv:2303.12187 (cross-list from eess.AS) [pdf, other]
Title: Practice of the conformer enhanced AUDIO-VISUAL HUBERT on Mandarin and English
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)

Wed, 22 Mar 2023

[16]  arXiv:2303.11816 [pdf, other]
Title: Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning
Comments: ICASSP 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17]  arXiv:2303.11692 [pdf, other]
Title: ByteCover3: Accurate Cover Song Identification on Short Queries
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[18]  arXiv:2303.11510 [pdf, other]
Title: ICASSP 2023 Deep Speech Enhancement Challenge
Comments: 6 pages, 1 figure. arXiv admin note: text overlap with arXiv:2202.13288
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19]  arXiv:2303.12002 (cross-list from eess.AS) [pdf, other]
Title: End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[20]  arXiv:2303.11607 (cross-list from cs.CL) [pdf, other]
Title: Transformers in Speech Processing: A Survey
Comments: Under review
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21]  arXiv:2303.11551 (cross-list from cs.CV) [pdf, other]
Title: ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers
Comments: Paper accepted at ICASSP 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)

Tue, 21 Mar 2023 (showing first 4 of 23 entries)

[22]  arXiv:2303.11020 [pdf, other]
Title: Dual-stream Time-Delay Neural Network with Dynamic Global Filter for Speaker Verification
Comments: 13 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[23]  arXiv:2303.10912 [pdf, other]
Title: Exploring Representation Learning for Small-Footprint Keyword Spotting
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24]  arXiv:2303.10897 [pdf, other]
Title: Relate auditory speech to EEG by shallow-deep attention-based network
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[25]  arXiv:2303.10757 [pdf, other]
Title: Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Comments: ICASSP 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)