Audio and Speech Processing

Authors and titles for recent submissions

[ total of 44 entries: 1-44 ]

Fri, 21 Feb 2020

[1]  arXiv:2002.08933 [pdf, other]
Title: Wavesplit: End-to-End Speech Separation by Speaker Clustering
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[2]  arXiv:2002.08926 [pdf, ps, other]
Title: Imputer: Sequence Modelling via Imputation and Dynamic Programming
Comments: preprint
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[3]  arXiv:2002.08796 [pdf, ps, other]
Title: iSEGAN: Improved Speech Enhancement Generative Adversarial Networks
Authors: Deepak Baby
Comments: A short report on improving SEGAN performance
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[4]  arXiv:2002.08742 [pdf, other]
Title: Disentangled Speech Embeddings using Cross-modal Self-supervision
Comments: To appear in ICASSP 2020. The first three authors contributed equally to this work
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[5]  arXiv:2002.08688 [pdf, other]
Title: An empirical study of Conv-TasNet
Comments: In proceedings of ICASSP2020
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[6]  arXiv:2002.08700 (cross-list from cs.CV) [pdf, other]
Title: Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks
Comments: 9 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[7]  arXiv:2002.08582 (cross-list from cs.SD) [pdf, ps, other]
Title: Convergence-guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's t Distribution
Comments: 5 pages, 3 figures, to appear in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

Thu, 20 Feb 2020

[8]  arXiv:2002.08249 [pdf, other]
Title: Workshop Report: Detection and Classification in Marine Bioacoustics with Deep Learning
Comments: 13 pages, 1 figure, 1 table
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[9]  arXiv:2002.08267 (cross-list from cs.CL) [pdf]
Title: Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation
Comments: 10 pages, 4 figures, 6 tables
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10]  arXiv:2002.08126 (cross-list from cs.CL) [pdf, ps, other]
Title: Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11]  arXiv:2002.08125 (cross-list from cs.LG) [pdf, other]
Title: Gradient-Adjusted Neuron Activation Profiles for Comprehensive Introspection of Convolutional Speech Recognition Models
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

Wed, 19 Feb 2020

[12]  arXiv:2002.07629 [pdf, other]
Title: Multi-Task Siamese Neural Network for Improving Replay Attack Detection
Comments: Submitted to INTERSPEECH 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[13]  arXiv:2002.07590 [pdf]
Title: Speech Emotion Recognition using Support Vector Machine
Subjects: Audio and Speech Processing (eess.AS); Information Retrieval (cs.IR); Sound (cs.SD)
[14]  arXiv:2002.07450 [pdf, other]
Title: Multitask Learning with Capsule Networks for Speech-to-Intent Applications
Comments: To be published in ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS)
[15]  arXiv:2002.07677 (cross-list from cs.SD) [pdf]
Title: Performance Analysis of Adaptive Noise Cancellation for Speech Signal
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

Tue, 18 Feb 2020

[16]  arXiv:2002.07065 [pdf]
Title: Acoustic Scene Classification Using Bilinear Pooling on Time-liked and Frequency-liked Convolution Neural Network
Comments: Included in the conference proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2019), Xiamen
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[17]  arXiv:2002.06637 [pdf, other]
Title: Real-time binaural speech separation with preserved spatial cues
Comments: To appear in ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[18]  arXiv:2002.06595 [pdf, other]
Title: Speech-to-Singing Conversion in an Encoder-Decoder Framework
Comments: Accepted at IEEE ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[19]  arXiv:2002.06312 [pdf, other]
Title: Small energy masking for improved neural network training for end-to-end speech recognition
Comments: Accepted at ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[20]  arXiv:2002.06279 [pdf, other]
Title: A Comparison of Pooling Methods on LSTM Models for Rare Acoustic Event Classification
Comments: Accepted to ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[21]  arXiv:2002.06239 [pdf, other]
Title: Boosted Locality Sensitive Hashing: Discriminative Binary Codes for Source Separation
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[22]  arXiv:2002.06220 [pdf, other]
Title: Speaker Diarization with Region Proposal Network
Comments: Accepted to ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23]  arXiv:2002.07016 (cross-list from cs.SD) [pdf, other]
Title: Meta-learning Extractors for Music Source Separation
Comments: Camera-ready version for ICASSP 2020; the source files are published at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24]  arXiv:2002.06817 (cross-list from cs.SD) [pdf, other]
Title: Addressing the confounds of accompaniments in singer identification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[25]  arXiv:2002.06778 (cross-list from cs.SD) [pdf, other]
Title: Lifter Training and Sub-band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials
Comments: 5 pages, to appear in IEEE International Conference on Acoustics, Speech, and Signal Processing 2020 (ICASSP 2020)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26]  arXiv:2002.06758 (cross-list from cs.SD) [pdf, other]
Title: Interactive Text-to-Speech via Semi-supervised Style Transfer Learning
Comments: Version 0
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27]  arXiv:2002.06353 (cross-list from cs.CV) [pdf, other]
Title: UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[28]  arXiv:2002.06328 (cross-list from cs.SD) [pdf]
Title: Many-to-Many Voice Conversion using Conditional Cycle-Consistent Adversarial Networks
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Mon, 17 Feb 2020

[29]  arXiv:2002.06165 [pdf, other]
Title: Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR
Comments: To appear in Proc. ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[30]  arXiv:2002.06049 [pdf]
Title: An Adaptive X-vector Model for Text-independent Speaker Verification
Comments: 6 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[31]  arXiv:2002.05994 [pdf, ps, other]
Title: Sound Event Localization based on Sound Intensity Vector Refined By DNN-Based Denoising and Source Separation
Comments: 5 pages, 3 figures, to appear in IEEE ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[32]  arXiv:2002.05879 [pdf, other]
Title: Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function
Comments: accepted to the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[33]  arXiv:2002.05873 [pdf, other]
Title: Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention
Comments: 5 pages, to appear in IEEE ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[34]  arXiv:2002.05865 [pdf, other]
Title: A Sequence Matching Network for Polyphonic Sound Event Localization and Detection
Comments: to be published in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[35]  arXiv:2002.05843 [pdf, other]
Title: Real-time speech enhancement using equilibriated RNN
Comments: To appear in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[36]  arXiv:2002.05832 [pdf, other]
Title: Phase reconstruction based on recurrent phase unwrapping with deep neural networks
Comments: To appear at the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[37]  arXiv:2002.05831 [pdf, other]
Title: Consistency-aware multi-channel speech enhancement using deep neural networks
Comments: To appear at the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[38]  arXiv:2002.06033 (cross-list from cs.SD) [pdf, other]
Title: Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances
Comments: Submitted to Odyssey 2020
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[39]  arXiv:2002.06021 (cross-list from cs.SD) [pdf, other]
Title: Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training
Comments: arXiv admin note: substantial text overlap with arXiv:1907.07398
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40]  arXiv:2002.06016 (cross-list from cs.SD) [pdf, other]
Title: DNN-Based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays
Authors: Nicolas Furnon (LORIA, MULTISPEECH), Romain Serizel (LORIA, MULTISPEECH), Irina Illina (LORIA, MULTISPEECH), Slim Essid (LTCI)
Comments: Submitted to ICASSP2020
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41]  arXiv:2002.06012 (cross-list from cs.CL) [pdf, other]
Title: Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems
Comments: Accepted for ICASSP 2020 (Submitted: October 21, 2019)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42]  arXiv:2002.05967 (cross-list from cs.CL) [pdf, ps, other]
Title: Integrating Discrete and Neural Features via Mixed-feature Trans-dimensional Random Field Language Models
Authors: Silin Gao (1), Zhijian Ou (1), Wei Yang (2), Huifang Xu (3) ((1) Tsinghua University, (2) State Grid Customer Service Center, (3) China Electric Power Research Institute)
Comments: 5 pages, 2 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[43]  arXiv:2002.05955 (cross-list from cs.CL) [pdf, other]
Title: A Data Efficient End-To-End Spoken Language Understanding Architecture
Comments: Accepted to ICASSP 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44]  arXiv:2002.05848 (cross-list from cs.SD) [pdf, ps, other]
Title: Sound Event Detection by Multitask Learning of Sound Events and Scenes with Soft Scene Labels
Comments: Accepted to ICASSP 2020
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)