Sound

Authors and titles for recent submissions, skipping first 26

[ total of 66 entries: 1-25 | 2-26 | 27-51 | 52-66 ]

Thu, 16 Mar 2023 (continued, showing last 4 of 19 entries)

[27]  arXiv:2303.08372 (cross-list from eess.AS) [pdf, other]
Title: Target Sound Extraction with Variable Cross-modality Clues
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[28]  arXiv:2303.08343 (cross-list from eess.AS) [pdf, ps, other]
Title: Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models
Comments: Accepted to IEEE ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[29]  arXiv:2303.08295 (cross-list from eess.SP) [pdf, other]
Title: A large-scale multimodal dataset of human speech recognition
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[30]  arXiv:2303.08268 (cross-list from cs.RO) [pdf, other]
Title: Chat with the Environment: Interactive Multimodal Perception using Large Language Models
Comments: See website at this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 15 Mar 2023 (showing first 21 of 29 entries)

[31]  arXiv:2303.08026 [pdf, other]
Title: A Study on Bias and Fairness In Deep Speaker Recognition
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[32]  arXiv:2303.07902 [pdf, other]
Title: BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33]  arXiv:2303.07794 [pdf, ps, other]
Title: DiffuseRoll: Multi-track multi-category music generation based on diffusion model
Authors: Hongfei Wang
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[34]  arXiv:2303.07711 [pdf, other]
Title: Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[35]  arXiv:2303.07687 [pdf, other]
Title: Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[36]  arXiv:2303.07682 [pdf, other]
Title: QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[37]  arXiv:2303.07667 [pdf, other]
Title: Improving Music Genre Classification from multi-modal properties of music and genre correlations Perspective
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD)
[38]  arXiv:2303.07643 [pdf, other]
Title: Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification
Comments: Accepted by ICASSP 2023. International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[39]  arXiv:2303.07626 [pdf, other]
Title: CAT: Causal Audio Transformer for Audio Classification
Comments: Accepted to ICASSP 2023
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[40]  arXiv:2303.07578 [pdf, ps, other]
Title: VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation
Comments: Presentation accepted at ICASSP 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[41]  arXiv:2303.08052 (cross-list from eess.AS) [pdf, other]
Title: Localizing Spatial Information in Neural Spatiospectral Filters
Comments: Submitted to the 31st European Signal Processing Conference (EUSIPCO 2023), Helsinki, Finland. 5 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[42]  arXiv:2303.08027 (cross-list from eess.AS) [pdf, other]
Title: A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition
Comments: 5 pages, 3 figures, 5 tables
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[43]  arXiv:2303.08019 (cross-list from eess.AS) [pdf, other]
Title: Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection
Comments: 5 pages, 3 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[44]  arXiv:2303.08005 (cross-list from eess.AS) [pdf, other]
Title: Native Multi-Band Audio Coding within Hyper-Autoencoded Reconstruction Propagation Networks
Comments: Accepted to ICASSP 2023. For resources and examples, see this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[45]  arXiv:2303.07924 (cross-list from cs.LG) [pdf, other]
Title: Improving Accented Speech Recognition with Multi-Domain Training
Comments: 5 pages, 2 figures. Accepted to ICASSP 2023
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[46]  arXiv:2303.07816 (cross-list from eess.AS) [pdf, other]
Title: Multi-Channel Masking with Learnable Filterbank for Sound Source Separation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[47]  arXiv:2303.07739 (cross-list from eess.SP) [pdf, other]
Title: Detecting post-stroke aphasia using EEG-based neural envelope tracking of natural speech
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48]  arXiv:2303.07704 (cross-list from eess.AS) [pdf, other]
Title: TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[49]  arXiv:2303.07650 (cross-list from cs.CL) [pdf, other]
Title: Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features
Comments: Accepted by ICASSP 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50]  arXiv:2303.07624 (cross-list from cs.CL) [pdf, other]
Title: I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Comments: Accepted at ICASSP 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[51]  arXiv:2303.07621 (cross-list from eess.AS) [pdf, other]
Title: Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)