Sound

Authors and titles for cs.SD in Mar 2023

[ total of 232 entries; showing entries 1-25 ]
[1]  arXiv:2303.00204 [pdf, other]
Title: PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2]  arXiv:2303.00264 [pdf, other]
Title: Distance-based Weight Transfer from Near-field to Far-field Speaker Verification
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3]  arXiv:2303.00332 [pdf, other]
Title: CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4]  arXiv:2303.00502 [pdf, other]
Title: On the Audio-visual Synchronization for Lip-to-Speech Synthesis
Authors: Zhe Niu, Brian Mak
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[5]  arXiv:2303.00510 [pdf, other]
Title: A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[6]  arXiv:2303.00747 [pdf, other]
Title: WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Comments: Accepted to INTERSPEECH 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7]  arXiv:2303.01125 [pdf, other]
Title: Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification
Comments: Submitted to Data & Knowledge Engineering in Dec. 2023. Copyright may be transferred without notice
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8]  arXiv:2303.01126 [pdf, other]
Title: Speaker-Aware Anti-Spoofing
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[9]  arXiv:2303.01211 [pdf, other]
Title: Learning From Yourself: A Self-Distillation Method for Fake Speech Detection
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[10]  arXiv:2303.01507 [pdf, other]
Title: Defending against Adversarial Audio via Diffusion Model
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11]  arXiv:2303.01508 [pdf, other]
Title: Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[12]  arXiv:2303.01639 [pdf, other]
Title: WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions
Authors: Jun Rekimoto
Comments: ACM CHI 2023 paper
Journal-ref: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23-28, 2023
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[13]  arXiv:2303.01664 [pdf, other]
Title: Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Comments: Accepted to WASPAA 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[14]  arXiv:2303.01665 [pdf, other]
Title: LooperGP: A Loopable Sequence Model for Live Coding Performance using GuitarPro Tablature
Comments: The Version of Record of this contribution is published in Proceedings of EvoMUSART: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) 2023
Journal-ref: EvoMUSART: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) 2023
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[15]  arXiv:2303.01694 [pdf, other]
Title: DWFormer: Dynamic Window transFormer for Speech Emotion Recognition
Comments: 4 pages, 5 figures, 3 tables, accepted by the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[16]  arXiv:2303.01812 [pdf, other]
Title: Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Comments: ICASSP 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17]  arXiv:2303.01864 [pdf, ps, other]
Title: Spectrogram Inversion for Audio Source Separation via Consistency, Mixing, and Magnitude Constraints
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18]  arXiv:2303.01875 [pdf, other]
Title: Decoding and Visualising Intended Emotion in an Expressive Piano Performance
Comments: Extended version of Late-Breaking Demo Session paper accepted at ISMIR 2022 (23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19]  arXiv:2303.01879 [pdf, other]
Title: Low-Complexity Audio Embedding Extractors
Comments: In Proceedings of the 31st European Signal Processing Conference, EUSIPCO 2023. Source Code available at: this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20]  arXiv:2303.01884 [pdf, other]
Title: AutoMatch: A Large-scale Audio Beat Matching Benchmark for Boosting Deep Learning Assistant Video Editing
Comments: 11 pages, 5 figures
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[21]  arXiv:2303.02348 [pdf, other]
Title: The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22]  arXiv:2303.02396 [pdf, other]
Title: A General Framework for Learning Procedural Audio Models of Environmental Sounds
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[23]  arXiv:2303.02599 [pdf, ps, other]
Title: Hybrid Y-Net Architecture for Singing Voice Separation
Comments: Submitted for EUSIPCO23: 5 Pages, 7 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24]  arXiv:2303.02665 [pdf, other]
Title: Heterogeneous Graph Learning for Acoustic Event Classification
Comments: arXiv admin note: text overlap with arXiv:2207.07935
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[25]  arXiv:2303.02673 [pdf, other]
Title: Time-frequency Network for Robust Speaker Recognition
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)