Sound

Authors and titles for cs.SD in Jun 2022, skipping first 75

[ total of 221 entries: 1-25 | 26-50 | 51-75 | 76-100 | 101-125 | 126-150 | 151-175 | ... | 201-221 ]
[76]  arXiv:2206.12469 [pdf, other]
Title: Burst2Vec: An Adversarial Multi-Task Approach for Predicting Emotion, Age, and Origin from Vocal Bursts
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[77]  arXiv:2206.12494 [pdf, other]
Title: Multitask vocal burst modeling with ResNets and pre-trained paralinguistic Conformers
Comments: To be published in the ICML Expressive Vocalizations Workshop & Competition 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[78]  arXiv:2206.12513 [pdf, other]
Title: Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification
Comments: Proceedings of INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[79]  arXiv:2206.12559 [pdf, other]
Title: Self-supervised Context-aware Style Representation for Expressive Speech Synthesis
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[80]  arXiv:2206.12563 [pdf, other]
Title: Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
Comments: To be published at the ICML Expressive Vocalizations Workshop and Competition (ExVo Generate) held in conjunction with the 39th International Conference on Machine Learning
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[81]  arXiv:2206.12568 [pdf, other]
Title: Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction
Journal-ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[82]  arXiv:2206.12662 [pdf, other]
Title: Synthesizing Personalized Non-speech Vocalization from Discrete Speech Representations
Authors: Chin-Cheng Hsu
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[83]  arXiv:2206.12829 [pdf, other]
Title: On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
Comments: Accepted at SPCOM 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[84]  arXiv:2206.13021 [pdf, other]
Title: Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion
Comments: Accepted at INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85]  arXiv:2206.13071 [pdf, other]
Title: Uncertainty Calibration for Deep Audio Classifiers
Comments: Accepted by InterSpeech 2022, the first two authors contributed equally
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[86]  arXiv:2206.13085 [pdf, other]
Title: Sound Model Factory: An Integrated System Architecture for Generative Audio Modelling
Journal-ref: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) (pp. 308-322). Springer, Cham. 2022
Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[87]  arXiv:2206.13101 [pdf, other]
Title: SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning
Comments: This paper is accepted by Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[88]  arXiv:2206.13110 [pdf, other]
Title: Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire
Comments: Signal Processing Letters 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89]  arXiv:2206.13136 [pdf]
Title: A two-stage full-band speech enhancement model with effective spectral compression mapping
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90]  arXiv:2206.13476 [pdf, other]
Title: Impact of Acoustic Event Tagging on Scene Classification in a Multi-Task Learning Framework
Comments: Accepted at ISCA Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[91]  arXiv:2206.13611 [pdf, other]
Title: ClearBuds: Wireless Binaural Earbuds for Learning-Based Speech Enhancement
Comments: 12 pages, Published in Mobisys 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92]  arXiv:2206.13689 [pdf, other]
Title: Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech Separation
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93]  arXiv:2206.13691 [pdf, other]
Title: Dummy Prototypical Networks for Few-Shot Open-Set Keyword Spotting
Comments: Proceedings of INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[94]  arXiv:2206.13700 [pdf, other]
Title: Domain Agnostic Few-shot Learning for Speaker Verification
Comments: Proceedings of INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[95]  arXiv:2206.13708 [pdf, other]
Title: Personalized Keyword Spotting through Multi-task Learning
Comments: Proceedings of INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[96]  arXiv:2206.13817 [pdf, other]
Title: Comparison of Speech Representations for the MOS Prediction System
Comments: 5 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[97]  arXiv:2206.13909 [pdf, other]
Title: QTI Submission to DCASE 2021: residual normalization for device-imbalanced acoustic scene classification with efficient design
Comments: tech report; won 1st place in DCASE2021 challenge. arXiv admin note: substantial text overlap with arXiv:2111.06531
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[98]  arXiv:2206.13979 [pdf, other]
Title: Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection
Comments: Proceedings of INTERSPEECH 2022 (Updated version: corrected ASVspoof dataset description)
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[99]  arXiv:2206.14659 [pdf, other]
Title: Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[100]  arXiv:2206.14723 [pdf, other]
Title: DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With Autoencoding Generative Adversarial Networks
Comments: 7 pages, 2 figures, 3 tables, ICML2022 Machine Learning for Audio Synthesis (MLAS) Workshop
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)