Sound
Authors and titles for cs.SD in Jun 2022, skipping first 50
- [51] arXiv:2206.08297 [pdf, other]
  Title: GoodBye WaveNet -- A Language Model for Raw Audio with Context of 1/2 Million Samples
  Authors: Prateek Verma
  Comments: 12 pages, 1 figure. Technical Report at Stanford University. Ongoing Work
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [52] arXiv:2206.08312 [pdf, other]
  Title: SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
  Authors: Changan Chen, Carl Schissler, Sanchit Garg, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman
  Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [53] arXiv:2206.08317 [pdf, other]
  Title: Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
  Comments: 5 pages, 3 figures, accepted by INTERSPEECH 2022
  Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [54] arXiv:2206.09131 [pdf, other]
  Title: Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion
  Authors: Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-yi Lee, Helen Meng
  Comments: Accepted by Odyssey 2022
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [55] arXiv:2206.09142 [pdf, other]
  Title: Redundancy Reduction Twins Network: A Training framework for Multi-output Emotion Regression
  Comments: 5 pages, accepted by ICML Exvo workshop
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [56] arXiv:2206.09298 [pdf, ps, other]
  Title: GMM based multi-stage Wiener filtering for low SNR speech enhancement
  Comments: 5 pages, 3 figures, submitted to a conference
  Subjects: Sound (cs.SD); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
- [57] arXiv:2206.09920 [pdf, other]
- [58] arXiv:2206.10175 [pdf, other]
  Title: A Multi-grained based Attention Network for Semi-supervised Sound Event Detection
  Journal-ref: INTERSPEECH 2022
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [59] arXiv:2206.10256 [pdf, other]
  Title: Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS
  Comments: 5 pages, 3 figures, Accepted for INTERSPEECH2022
  Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
- [60] arXiv:2206.10349 [pdf, ps, other]
  Title: Joint Analysis of Acoustic Scenes and Sound Events Based on Multitask Learning with Dynamic Weight Adaptation
  Comments: Submitted to Acoustical Science and Technology
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [61] arXiv:2206.10421 [pdf, other]
  Title: Rethinking Audio-visual Synchronization for Active Speaker Detection
  Comments: Accepted by IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2022)
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [62] arXiv:2206.10695 [pdf, other]
  Title: Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations
  Comments: Accepted by the ICML Expressive Vocalizations Workshop and Competition 2022
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [63] arXiv:2206.10805 [pdf, other]
  Title: Jointist: Joint Learning for Multi-instrument Transcription and Its Applications
  Authors: Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Amy Hung, Ju-Chiang Wang, Dorien Herremans
  Comments: Submitted to ISMIR
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [64] arXiv:2206.11049 [pdf, other]
  Title: Dynamic Restrained Uncertainty Weighting Loss for Multitask Learning of Vocal Expression
  Authors: Meishu Song, Zijiang Yang, Andreas Triantafyllopoulos, Xin Jing, Vincent Karas, Xie Jiangjian, Zixing Zhang, Yamamoto Yoshiharu, Bjoern W. Schuller
  Comments: 5 pages
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [65] arXiv:2206.11066 [pdf, other]
  Title: Radio2Speech: High Quality Speech Recovery from Radio Frequency Signals
  Comments: Accepted to INTERSPEECH 2022
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [66] arXiv:2206.11260 [pdf, other]
  Title: Few-shot Long-Tailed Bird Audio Recognition
  Comments: LifeCLEF2022 (best paper award)
  Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [67] arXiv:2206.11567 [pdf]
  Title: Restoring speech intelligibility for hearing aid users with deep learning
  Authors: Peter Udo Diehl, Yosef Singer, Hannes Zilly, Uwe Schönfeld, Paul Meyer-Rachner, Mark Berry, Henning Sprekeler, Elias Sprengel, Annett Pudszuhn, Veit M. Hofmann
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
- [68] arXiv:2206.11632 [pdf, other]
  Title: Formant Estimation and Tracking using Probabilistic Heat-Maps
  Comments: interspeech 2022
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [69] arXiv:2206.11643 [pdf, ps, other]
  Title: Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus
  Comments: Interspeech 2022 Accepted. arXiv admin note: text overlap with arXiv:2111.14479
  Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
- [70] arXiv:2206.11699 [pdf, ps, other]
  Title: The SJTU X-LANCE Lab System for CNSRC 2022
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [71] arXiv:2206.11968 [pdf, other]
  Title: Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track
  Journal-ref: Proceedings of the ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [72] arXiv:2206.12038 [pdf, other]
  Title: BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping
  Authors: Gasser Elbanna, Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Karl El Hajal, Milos Cernak
  Comments: Submitted to HEAR-PMLR 2021
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [73] arXiv:2206.12229 [pdf, other]
  Title: Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech
  Comments: Accepted to IEEE SLT 2022
  Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [74] arXiv:2206.12230 [pdf, other]
  Title: Deformable CNN and Imbalance-Aware Feature Learning for Singing Technique Classification
  Comments: Accepted to INTERSPEECH2022
  Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [75] arXiv:2206.12320 [pdf, other]
  Title: PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis
  Authors: Kubilay Can Demir, Matthias May, Axel Schmid, Michael Uder, Katharina Breininger, Tobias Weise, Andreas Maier, Seung Hee Yang
  Comments: 8 pages, 4 figures, Text, Speech and Dialogue 2022 Conference
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [76] arXiv:2206.12469 [pdf, other]
  Title: Burst2Vec: An Adversarial Multi-Task Approach for Predicting Emotion, Age, and Origin from Vocal Bursts
  Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [77] arXiv:2206.12494 [pdf, other]
  Title: Multitask vocal burst modeling with ResNets and pre-trained paralinguistic Conformers
  Comments: To be published in the ICML Expressive Vocalizations Workshop & Competition 2022 (this https URL)
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [78] arXiv:2206.12513 [pdf, other]
  Title: Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification
  Comments: Proceedings of INTERSPEECH 2022
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [79] arXiv:2206.12559 [pdf, other]
  Title: Self-supervised Context-aware Style Representation for Expressive Speech Synthesis
  Comments: Accepted by Interspeech 2022
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [80] arXiv:2206.12563 [pdf, other]
  Title: Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
  Comments: To be published at the ICML Expressive Vocalizations Workshop and Competition (ExVo Generate) held in conjunction with the 39th International Conference on Machine Learning
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [81] arXiv:2206.12568 [pdf, other]
  Title: Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction
  Journal-ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [82] arXiv:2206.12662 [pdf, other]
  Title: Synthesizing Personalized Non-speech Vocalization from Discrete Speech Representations
  Authors: Chin-Cheng Hsu
  Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [83] arXiv:2206.12829 [pdf, other]
  Title: On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
  Comments: Accepted at SPCOM 2022
  Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [84] arXiv:2206.13021 [pdf, other]
  Title: Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion
  Comments: Accepted at INTERSPEECH 2022
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [85] arXiv:2206.13071 [pdf, other]
  Title: Uncertainty Calibration for Deep Audio Classifiers
  Comments: Accepted by InterSpeech 2022, the first two authors contributed equally
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [86] arXiv:2206.13085 [pdf, other]
  Title: Sound Model Factory: An Integrated System Architecture for Generative Audio Modelling
  Journal-ref: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) (pp. 308-322). Springer, Cham. 2022
  Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
- [87] arXiv:2206.13101 [pdf, other]
  Title: SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning
  Comments: This paper is accepted by Interspeech 2022
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [88] arXiv:2206.13110 [pdf, other]
  Title: Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire
  Comments: Signal Processing Letters 2022
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [89] arXiv:2206.13136 [pdf]
  Title: A two-stage full-band speech enhancement model with effective spectral compression mapping
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [90] arXiv:2206.13476 [pdf, other]
  Title: Impact of Acoustic Event Tagging on Scene Classification in a Multi-Task Learning Framework
  Comments: Accepted at ISCA Interspeech 2022
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
- [91] arXiv:2206.13611 [pdf, other]
  Title: ClearBuds: Wireless Binaural Earbuds for Learning-Based Speech Enhancement
  Authors: Ishan Chatterjee, Maruchi Kim, Vivek Jayaram, Shyamnath Gollakota, Ira Kemelmacher-Shlizerman, Shwetak Patel, Steven M. Seitz
  Comments: 12 pages, Published in Mobisys 2022
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [92] arXiv:2206.13689 [pdf, other]
  Title: Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech Separation
  Comments: Accepted by Interspeech 2022
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [93] arXiv:2206.13691 [pdf, other]
  Title: Dummy Prototypical Networks for Few-Shot Open-Set Keyword Spotting
  Comments: Proceedings of INTERSPEECH 2022
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [94] arXiv:2206.13700 [pdf, other]
  Title: Domain Agnostic Few-shot Learning for Speaker Verification
  Comments: Proceedings of INTERSPEECH 2022
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [95] arXiv:2206.13708 [pdf, other]
  Title: Personalized Keyword Spotting through Multi-task Learning
  Comments: Proceedings of INTERSPEECH 2022
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [96] arXiv:2206.13817 [pdf, other]
  Title: Comparison of Speech Representations for the MOS Prediction System
  Comments: 5 pages, 4 figures
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [97] arXiv:2206.13909 [pdf, other]
  Title: QTI Submission to DCASE 2021: residual normalization for device-imbalanced acoustic scene classification with efficient design
  Comments: tech report; won 1st place in DCASE2021 challenge. arXiv admin note: substantial text overlap with arXiv:2111.06531
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [98] arXiv:2206.13979 [pdf, other]
  Title: Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection
  Comments: Proceedings of INTERSPEECH 2022 (Updated version: corrected ASVspoof dataset description)
  Subjects: Sound (cs.SD); Machine Learning (cs.LG)
- [99] arXiv:2206.14659 [pdf, other]
  Title: Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss
  Subjects: Sound (cs.SD); Computation and Language (cs.CL); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
- [100] arXiv:2206.14723 [pdf, other]
  Title: DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With Autoencoding Generative Adversarial Networks
  Comments: 7 pages, 2 figures, 3 tables, ICML2022 Machine Learning for Audio Synthesis (MLAS) Workshop, for sound examples visit this https URL
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)