
Audio and Speech Processing

Authors and titles for recent submissions

[ total of 47 entries: 1-25 | 26-47 ]

Fri, 15 Nov 2019

[1]  arXiv:1911.06149 [pdf, other]
Title: Emotional Voice Conversion using multitask learning with Text-to-speech
Comments: 4 pages, 3 figures, submitted to ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[2]  arXiv:1911.05733 [pdf, other]
Title: The phonetic bases of vocal expressed emotion: natural versus acted
Comments: 6 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[3]  arXiv:1911.06266 (cross-list from cs.SD) [pdf, ps, other]
Title: Speaker independence of neural vocoders and their effect on parametric resynthesis speech enhancement
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4]  arXiv:1911.06245 (cross-list from cs.SD) [pdf, other]
Title: Scene-Aware Audio Rendering via Deep Acoustic Analysis
Comments: conditionally accepted to IEEE VR 2020 Journal Track
Subjects: Sound (cs.SD); Graphics (cs.GR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[5]  arXiv:1911.05894 (cross-list from cs.SD) [pdf, other]
Title: Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision
Comments: This extended version of an ICASSP 2020 submission under the same title has an added figure and additional discussion for easier consumption
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[6]  arXiv:1911.05833 (cross-list from cs.SD) [pdf, other]
Title: Emotion and Theme Recognition in Music with Frequency-Aware RF-Regularized CNNs
Comments: MediaEval'19, 27-29 October 2019, Sophia Antipolis, France
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[7]  arXiv:1804.00003 (cross-list from stat.AP) [pdf, other]
Title: Spectral Estimation of Plasma Fluctuations I: Comparison of Methods
Comments: Missing Figures
Journal-ref: Physics of Plasmas, Volume 1, Issue 3, March 1994, pp.485-500
Subjects: Applications (stat.AP); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); Statistics Theory (math.ST)

Thu, 14 Nov 2019

[8]  arXiv:1911.05560 [pdf]
Title: Enhanced Voice Post Processing Using Voice Decoder Guidance Indicators
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9]  arXiv:1911.05504 [pdf, other]
Title: 3-D Feature and Acoustic Modeling for Far-Field Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[10]  arXiv:1911.05659 (cross-list from eess.SP) [pdf, other]
Title: M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues
Subjects: Signal Processing (eess.SP); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11]  arXiv:1911.05187 (cross-list from cs.CV) [pdf, other]
Title: AI in Pursuit of Happiness, Finding Only Sadness: Multi-Modal Facial Emotion Recognition Challenge
Authors: Carl Norman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[12]  arXiv:1911.05186 (cross-list from cs.CV) [pdf, other]
Title: TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation
Comments: submitted to ICASSP 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[13]  arXiv:1803.04078 (cross-list from stat.ME) [pdf, other]
Title: Minimum bias multiple taper spectral estimation
Journal-ref: I.E.E.E. Trans. Signal Processing 43, pp. 188-195 (1995)
Subjects: Methodology (stat.ME); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); Statistics Theory (math.ST)
[14]  arXiv:1803.04075 (cross-list from stat.ME) [pdf, ps, other]
Title: Kernel estimation of the instantaneous frequency
Authors: Kurt S. Riedel
Journal-ref: I.E.E.E. Trans. Signal Processing 42, pp. 2644-2649 (1994)
Subjects: Methodology (stat.ME); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); Statistics Theory (math.ST)
[15]  arXiv:1803.03995 (cross-list from stat.ME) [pdf, ps, other]
Title: Adaptive Smoothing of the Log-Spectrum with Multiple Tapering
Journal-ref: IEEE Trans. Signal Process., vol. 44, no. 7, pp. 1794-1800, Jul. 1996
Subjects: Methodology (stat.ME); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); Statistics Theory (math.ST)
[16]  arXiv:1803.03897 (cross-list from stat.ME) [pdf, ps, other]
Title: Optimal Data-based Kernel Estimation of Evolutionary Spectra
Authors: Kurt S. Riedel
Journal-ref: IEEE Transactions on Signal Processing ( Volume: 41, Issue: 7, Jul 1993 ) Page(s): 2439 - 2447
Subjects: Methodology (stat.ME); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

Wed, 13 Nov 2019 (showing first 9 of 13 entries)

[17]  arXiv:1911.04952 [pdf, other]
Title: 'Warriors of the Word' -- Deciphering Lyrical Topics in Music and Their Connection to Audio Feature Dimensions Based on a Corpus of Over 100,000 Metal Songs
Comments: Corrected typo in abstract (subsample)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[18]  arXiv:1911.04908 [pdf, other]
Title: Non-Autoregressive Transformer Automatic Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[19]  arXiv:1911.04890 [pdf, other]
Title: Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
Authors: Takaki Makino (1), Hank Liao (1), Yannis Assael (2), Brendan Shillingford (2), Basilio Garcia (1), Otavio Braga (1), Olivier Siohan (1) ((1) Google Inc. (2) DeepMind)
Comments: Will be presented at the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[20]  arXiv:1911.04862 [pdf, other]
Title: An End-to-end Approach for Lexical Stress Detection based on Transformer
Comments: Submission to ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[21]  arXiv:1911.04808 [pdf, other]
Title: Detection of speech events and speaker characteristics through photo-plethysmographic signal neural processing
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[22]  arXiv:1911.04666 [pdf, other]
Title: Segment Relevance Estimation for Audio Analysis and Weakly-Labelled Classification
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[23]  arXiv:1911.04973 (cross-list from cs.SD) [pdf, other]
Title: Using musical relationships between chord labels in automatic chord extraction tasks
Comments: Accepted for publication in ISMIR, 2018
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24]  arXiv:1911.04972 (cross-list from cs.LG) [pdf, other]
Title: Multi-Step Chord Sequence Prediction Based on Aggregated Multi-Scale Encoder-Decoder Network
Comments: Accepted for publication in MLSP, 2019
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[25]  arXiv:1911.04824 (cross-list from cs.IR) [pdf, other]
Title: Music Auto-tagging Using CNNs and Mel-spectrograms With Reduced Frequency and Time Resolution
Comments: Submitted to IEEE 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects: Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
