Sound

Authors and titles for cs.SD in Jan 2023

[ total of 104 entries: 1-104 ]
[1]  arXiv:2301.00508 [pdf, other]
Title: EmoGator: A New Open Source Vocal Burst Dataset with Baseline Machine Learning Classification Methodologies
Authors: Fred W. Buhl
Comments: 12 pages, 4 tables, 2 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2]  arXiv:2301.01162 [pdf, other]
Title: Language Models are Drummers: Drum Composition with Natural Language Pre-Training
Comments: Accepted to the 1st workshop on Creative AI across Modalities in AAAI 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[3]  arXiv:2301.01378 [pdf, other]
Title: An ensemble-based framework for mispronunciation detection of Arabic phonemes
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4]  arXiv:2301.01578 [pdf, other]
Title: Validity in Music Information Research Experiments
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5]  arXiv:2301.02385 [pdf, other]
Title: Multi-Genre Music Transformer -- Composing Full Length Musical Piece
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6]  arXiv:2301.02732 [pdf, ps, other]
Title: Multimodal Lyrics-Rhythm Matching
Comments: Accepted by 2022 IEEE International Conference on Big Data (IEEE Big Data 2022)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7]  arXiv:2301.02884 [pdf, other]
Title: TunesFormer: Forming Irish Tunes with Control Codes by Bar Patching
Comments: 6 pages, 1 figure, 1 table, accepted by HCMIR 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8]  arXiv:2301.02886 [pdf, other]
Title: Perceptual-Neural-Physical Sound Matching
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[9]  arXiv:2301.03206 [pdf, other]
Title: Introducing Model Inversion Attacks on Automatic Speaker Recognition
Comments: for associated pdf, see this https URL
Journal-ref: Proc. 2nd Symposium on Security and Privacy in Speech Communication, 2022
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[10]  arXiv:2301.03751 [pdf, other]
Title: Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation
Comments: Under review
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11]  arXiv:2301.03801 [pdf, other]
Title: UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[12]  arXiv:2301.04320 [pdf, other]
Title: Rethinking complex-valued deep neural networks for monaural speech enhancement
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13]  arXiv:2301.04388 [pdf, other]
Title: Perceive and predict: self-supervised speech representation based loss functions for speech enhancement
Comments: 4 pages, accepted at ICASSP 2023
Journal-ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[14]  arXiv:2301.04488 [pdf, other]
Title: WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15]  arXiv:2301.05898 [pdf, ps, other]
Title: Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[16]  arXiv:2301.05908 [pdf, other]
Title: An Order-Complexity Model for Aesthetic Quality Assessment of Symbolic Homophony Music Scores
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[17]  arXiv:2301.06078 [pdf, ps, other]
Title: Training one model to detect heart and lung sound events from single point auscultations
Comments: 14 pages, 8 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18]  arXiv:2301.06211 [pdf, ps, other]
Title: What artificial intelligence might teach us about the origin of human language
Comments: ICPHS2023 Conference Submission. 5 pages
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[19]  arXiv:2301.06277 [pdf, ps, other]
Title: Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings
Comments: ACCEPTED by NCMMSC 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[20]  arXiv:2301.06468 [pdf, other]
Title: Msanii: High Fidelity Music Synthesis on a Shoestring Budget
Authors: Kinyugo Maina
Comments: 15 pages, 8 figures, for demo see this https URL and for code, see this https URL, this paper is a work in progress
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21]  arXiv:2301.06735 [pdf, other]
Title: Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer
Comments: accepted by interspeech 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[22]  arXiv:2301.07491 [pdf, ps, other]
Title: The Newsbridge -Telecom SudParis VoxCeleb Speaker Recognition Challenge 2022 System Description
Authors: Yannis Tevissen (ARMEDIA-SAMOVAR), Jérôme Boudy (ARMEDIA-SAMOVAR), Frédéric Petitpont
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[23]  arXiv:2301.07665 [pdf, other]
Title: An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24]  arXiv:2301.07851 [pdf, other]
Title: From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition
Comments: Submitted to ICASSP 2023. The project was initiated in May 2022 during a research internship at Google Research
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[25]  arXiv:2301.07939 [pdf, other]
Title: THLNet: two-stage heterogeneous lightweight network for monaural speech enhancement
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26]  arXiv:2301.07978 [pdf, other]
Title: SpotHitPy: A Study For ML-Based Song Hit Prediction Using Spotify
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27]  arXiv:2301.08620 [pdf, other]
Title: Adjoint-Based Identification of Sound Sources for Sound Reinforcement and Source Localization
Journal-ref: Notes on Numerical Fluid Mechanics and Multidisciplinary Design, vol 145. Springer (2021)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Fluid Dynamics (physics.flu-dyn)
[28]  arXiv:2301.09027 [pdf, other]
Title: Cellular Network Speech Enhancement: Removing Background and Transmission Noise
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[29]  arXiv:2301.09362 [pdf, other]
Title: A Comprehensive Survey on Heart Sound Analysis in the Deep Learning Era
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30]  arXiv:2301.10015 [pdf, other]
Title: Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics
Comments: arXiv admin note: substantial text overlap with arXiv:2011.06380
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[31]  arXiv:2301.10183 [pdf, other]
Title: Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[32]  arXiv:2301.10335 [pdf, other]
Title: Multilingual Multiaccented Multispeaker TTS with RADTTS
Comments: 5 pages, submitted to ICASSP 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[33]  arXiv:2301.10477 [pdf, other]
[34]  arXiv:2301.10587 [pdf, other]
Title: On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems
Comments: Accepted to ICASSP 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35]  arXiv:2301.11325 [pdf, other]
Title: MusicLM: Generating Music From Text
Comments: Supplementary material at this https URL and this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[36]  arXiv:2301.12084 [pdf, other]
Title: Automated Arrangements of Multi-Part Music for Sets of Monophonic Instruments
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[37]  arXiv:2301.12209 [pdf, other]
Title: who is snoring? snore based user recognition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38]  arXiv:2301.12343 [pdf, other]
Title: Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[39]  arXiv:2301.12354 [pdf, other]
Title: Artistic Curve Steganography Carried by Musical Audio
Comments: 18 pages, 14 figures, in Proceedings of EvoMUSART 2023
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[40]  arXiv:2301.12503 [pdf, other]
Title: AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Comments: Accepted by ICML 2023. Demo and implementation at this https URL Evaluation toolbox at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[41]  arXiv:2301.12525 [pdf, other]
Title: Composer's Assistant: An Interactive Transformer for Multi-Track MIDI Infilling
Comments: 12 pages, 6 figures, 3 tables. To be published in ISMIR 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[42]  arXiv:2301.12661 [pdf, other]
Title: Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Comments: Audio samples are available at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[43]  arXiv:2301.12662 [pdf, other]
Title: SingSong: Generating musical accompaniments from singing
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[44]  arXiv:2301.13267 [pdf, ps, other]
Title: ArchiSound: Audio Generation with Diffusion
Authors: Flavio Schneider
Comments: Master Thesis at ETH Zurich
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[45]  arXiv:2301.13380 [pdf, other]
Title: Automated Time-frequency Domain Audio Crossfades using Graph Cuts
Journal-ref: Late Breaking/Demo at the 20th International Society for Music Information Retrieval, Delft, The Netherlands, 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[46]  arXiv:2301.13383 [pdf, other]
Title: An Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music Generation
Comments: This is a draft before submitted to TISMIR as a journal paper
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[47]  arXiv:2301.13662 [pdf, other]
Title: InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Comments: Submit to TASLP
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48]  arXiv:2301.00142 (cross-list from cs.HC) [pdf, other]
Title: Computational Charisma -- A Brick by Brick Blueprint for Building Charismatic Artificial Intelligence
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[49]  arXiv:2301.00304 (cross-list from cs.CL) [pdf, other]
Title: Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems A case study for Modern Greek
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50]  arXiv:2301.00591 (cross-list from cs.CL) [pdf, other]
Title: Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling
Comments: Accepted at ICASSP 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[51]  arXiv:2301.01020 (cross-list from cs.CL) [pdf, other]
Title: Supervised Acoustic Embeddings And Their Transferability Across Languages
Comments: Presented at ICNLSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52]  arXiv:2301.01456 (cross-list from cs.CV) [pdf, other]
Title: Audio-Visual Efficient Conformer for Robust Speech Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53]  arXiv:2301.02111 (cross-list from cs.CL) [pdf, other]
Title: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Comments: Working in progress
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[54]  arXiv:2301.02184 (cross-list from cs.CV) [pdf, other]
Title: Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Comments: Accepted to CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55]  arXiv:2301.03238 (cross-list from cs.CL) [pdf, ps, other]
Title: MAQA: A Multimodal QA Benchmark for Negation
Comments: NeurIPS 2022 SyntheticData4ML Workshop
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[56]  arXiv:2301.04474 (cross-list from cs.CV) [pdf, other]
Title: Speech Driven Video Editing via an Audio-Conditioned Diffusion Model
Comments: 8 Pages, code and project page available here: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[57]  arXiv:2301.06267 (cross-list from cs.CV) [pdf, other]
Title: Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
Comments: CVPR 2023. Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[58]  arXiv:2301.06375 (cross-list from cs.MM) [pdf, other]
Title: OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[59]  arXiv:2301.06475 (cross-list from cs.CL) [pdf, ps, other]
Title: Using Kaldi for Automatic Speech Recognition of Conversational Austrian German
Comments: 10 pages, 2 figures, 4 tables
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60]  arXiv:2301.06916 (cross-list from cs.CL) [pdf, other]
Title: Automated speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting
Comments: 24 pages, 5 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Applications (stat.AP)
[61]  arXiv:2301.07087 (cross-list from cs.CL) [pdf, other]
Title: MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module
Comments: Accepted to SSW 12: this https URL
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[62]  arXiv:2301.07829 (cross-list from cs.HC) [pdf, other]
Title: Warning: Humans Cannot Reliably Detect Speech Deepfakes
Journal-ref: PLoS ONE 18(8) (2023): e0285333
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63]  arXiv:2301.08145 (cross-list from cs.IR) [pdf, other]
Title: Music Playlist Title Generation Using Artist Information
Comments: AAAI-23 Workshop on Creative AI Across Modalities
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64]  arXiv:2301.08562 (cross-list from cs.LG) [pdf, other]
Title: Latent Autoregressive Source Separation
Comments: Accepted to AAAI 2023
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65]  arXiv:2301.08730 (cross-list from cs.CV) [pdf, other]
Title: Novel-View Acoustic Synthesis
Comments: Accepted at CVPR 2023. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66]  arXiv:2301.08810 (cross-list from cs.CL) [pdf, other]
Title: Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67]  arXiv:2301.09080 (cross-list from cs.MM) [pdf, other]
Title: Dance2MIDI: Dance-driven multi-instruments music generation
Comments: has been accepted by Computational Visual Media Journal
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68]  arXiv:2301.09099 (cross-list from cs.CL) [pdf, ps, other]
Title: Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69]  arXiv:2301.10047 (cross-list from cs.GR) [pdf, other]
Title: DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model
Comments: 13 pages, 3 figures
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[70]  arXiv:2301.10056 (cross-list from cs.CR) [pdf, ps, other]
Title: Side Eye: Characterizing the Limits of POV Acoustic Eavesdropping from Smartphone Cameras with Rolling Shutters and Movable Lenses
Journal-ref: 2023 IEEE Symposium on Security and Privacy
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71]  arXiv:2301.10180 (cross-list from cs.CL) [pdf, other]
Title: A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72]  arXiv:2301.10295 (cross-list from cs.CV) [pdf, other]
Title: Object Segmentation with Audio Context
Comments: Research project for Introduction to Deep Learning (11785) at Carnegie Mellon University
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73]  arXiv:2301.10314 (cross-list from cs.HC) [pdf, other]
Title: WhisperWand: Simultaneous Voice and Gesture Tracking Interface
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74]  arXiv:2301.10606 (cross-list from cs.CL) [pdf, other]
Title: A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation
Comments: This is the full version of our submission to ICASSP 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75]  arXiv:2301.11716 (cross-list from cs.CL) [pdf, other]
Title: Pre-training for Speech Translation: CTC Meets Optimal Transport
Comments: ICML 2023 (oral presentation). This version fixed URLs, updated affiliations & acknowledgements, and improved formatting
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[76]  arXiv:2301.11757 (cross-list from cs.CL) [pdf, other]
Title: Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[77]  arXiv:2301.11975 (cross-list from cs.LG) [pdf, other]
Title: Byte Pair Encoding for Symbolic Music
Comments: EMNLP 2023, source code: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[78]  arXiv:2301.12331 (cross-list from cs.CL) [pdf, other]
Title: Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79]  arXiv:2301.12686 (cross-list from cs.LG) [pdf, other]
Title: GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80]  arXiv:2301.13003 (cross-list from cs.CL) [pdf, other]
Title: Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation
Comments: Accepted by INTERSPEECH 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81]  arXiv:2301.13507 (cross-list from cs.IR) [pdf, ps, other]
Title: An Analysis of Classification Approaches for Hit Song Prediction using Engineered Metadata Features with Lyrics and Audio Features
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82]  arXiv:2301.00448 (cross-list from eess.AS) [pdf, other]
Title: Unsupervised Acoustic Scene Mapping Based on Acoustic Features and Dimensionality Reduction
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[83]  arXiv:2301.00646 (cross-list from eess.AS) [pdf, other]
Title: Addressing the Selection Bias in Voice Assistance: Training Voice Assistance Model in Python with Equal Data Selection
Subjects: Audio and Speech Processing (eess.AS); Multiagent Systems (cs.MA); Robotics (cs.RO); Sound (cs.SD)
[84]  arXiv:2301.00833 (cross-list from eess.AS) [pdf, other]
Title: Hyperuniform disordered parametric loudspeaker array
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Applied Physics (physics.app-ph)
[85]  arXiv:2301.01361 (cross-list from eess.AS) [pdf, other]
Title: Modeling the Rhythm from Lyrics for Melody Generation of Pop Song
Comments: Published in ISMIR 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[86]  arXiv:2301.01595 (cross-list from quant-ph) [pdf, other]
Title: Quantum Representations of Sound: from mechanical waves to quantum circuits
Comments: 29 pages,26 figures. Accompanying Python package is available: this https URL
Subjects: Quantum Physics (quant-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87]  arXiv:2301.02214 (cross-list from eess.AS) [pdf, other]
Title: Automatic Sound Event Detection and Classification of Great Ape Calls Using Neural Networks
Comments: This paper is published as: Jiang, Zifan, Adrian Soldati, Isaac Schamberg, Adriano R. Lameira and Steven Moran. Automatic Sound Event Detection and Classification of Great Ape Calls Using Neural Networks. In Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS 2023), 3100-3104, Prague, Czech Republic
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[88]  arXiv:2301.02262 (cross-list from eess.AS) [pdf, other]
Title: Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation
Comments: 5 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[89]  arXiv:2301.02736 (cross-list from eess.AS) [pdf, other]
Title: Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[90]  arXiv:2301.04606 (cross-list from eess.AS) [pdf, other]
Title: Modelling low-resource accents without accent-specific TTS frontend
Comments: The first two authors contributed equally to this work. In Review. Samples available on this https URL
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[91]  arXiv:2301.05025 (cross-list from math.HO) [pdf, other]
Title: Topological data analysis hearing the shapes of drums and bells
Authors: Guo-Wei Wei
Comments: 4 pages, 2 figures
Subjects: History and Overview (math.HO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92]  arXiv:2301.05295 (cross-list from eess.AS) [pdf, other]
Title: Rock Guitar Tablature Generation via Natural Language Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[93]  arXiv:2301.05868 (cross-list from eess.AS) [pdf, other]
Title: Modulation spectral features for speech emotion recognition using deep neural networks
Comments: Accepted for publication in Elsevier's Speech Communication Journal
Journal-ref: Volume 146, January 2023, Pages 53-69
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[94]  arXiv:2301.06458 (cross-list from eess.AS) [pdf, other]
Title: Multi-resolution location-based training for multi-channel continuous speech separation
Comments: Submitted to ICASSP 23
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[95]  arXiv:2301.07173 (cross-list from eess.AS) [pdf, other]
Title: Towards Voice Reconstruction from EEG during Imagined Speech
Comments: 9 pages, 4 figures, accepted paper of AAAI 2023 in main track
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Sound (cs.SD); Signal Processing (eess.SP)
[96]  arXiv:2301.08925 (cross-list from eess.AS) [pdf, other]
Title: New Challenges for Content Privacy in Speech and Audio
Comments: Accepted for publication in ISCA SPSC Symposium 2022
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[97]  arXiv:2301.09198 (cross-list from eess.AS) [pdf, other]
Title: Estimation of Source and Receiver Positions, Room Geometry and Reflection Coefficients From a Single Room Impulse Response
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[98]  arXiv:2301.10210 (cross-list from eess.AS) [pdf, ps, other]
Title: Perceptual evaluation of listener envelopment using spatial granular synthesis
Comments: Submitted to the Journal of the Audio Engineering Society (JAES)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[99]  arXiv:2301.11176 (cross-list from eess.AS) [pdf, ps, other]
Title: A simple model for pink noise from amplitude modulations
Comments: 12 pages, 9 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Classical Physics (physics.class-ph)
[100]  arXiv:2301.11276 (cross-list from eess.AS) [pdf, other]
Title: BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition
Authors: Will Rieger
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[101]  arXiv:2301.11446 (cross-list from eess.AS) [pdf, other]
Title: On granularity of prosodic representations in expressive text-to-speech
Comments: Accepted to IEEE SLT 2022
Journal-ref: 2022 IEEE Spoken Language Technology Workshop (SLT), pp. 892-899
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[102]  arXiv:2301.12258 (cross-list from eess.AS) [pdf, other]
Title: Cross-domain Neural Pitch and Periodicity Estimation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[103]  arXiv:2301.12363 (cross-list from eess.AS) [pdf, other]
Title: NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation
Comments: The term of the algorithm is renamed because it conflicts with an existing KalmanNet algorithm proposed by Revach et. al. (arXiv:2107.10043); Accepted by ASRU 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[104]  arXiv:2301.13341 (cross-list from eess.AS) [pdf, other]
Title: Neural Target Speech Extraction: An Overview
Comments: Submitted to IEEE Signal Processing Magazine on Apr. 25, 2022, and accepted on Jan. 12, 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)