Title: The IBM 2016 English Conversational Telephone Speech Recognition System
(Submitted on 27 Apr 2016 (v1), last revised 22 Jun 2016 (this version, v2))
Abstract: We describe a collection of acoustic and language modeling techniques that lowered the word error rate of our English conversational telephone LVCSR system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation test set. On the acoustic side, we use a score fusion of three strong models: recurrent nets with maxout activations, very deep convolutional nets with 3x3 kernels, and bidirectional long short-term memory nets which operate on FMLLR and i-vector features. On the language modeling side, we use an updated model "M" and hierarchical neural network LMs.
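The abstract does not say how the three acoustic models' scores are fused. A common approach, assumed here for illustration only, is a frame-level weighted sum of log scores over shared output states. In the sketch below, the function name `fuse_scores`, the toy scores, and the fusion weights are all hypothetical. In a real recognizer the fused scores would feed the decoder search; the per-frame argmax only shows how the fusion changes each frame's ranking.

```python
import math
from typing import List, Sequence


def fuse_scores(model_scores: Sequence[Sequence[Sequence[float]]],
                weights: Sequence[float]) -> List[List[float]]:
    """Weighted sum of per-frame, per-state log scores from several models.

    model_scores[m][t][s] is the log score of state s at frame t from model m.
    weights[m] is a hypothetical fusion weight for model m (weights sum to 1).
    """
    if len(model_scores) != len(weights):
        raise ValueError("need one weight per model")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights should sum to 1")
    num_frames = len(model_scores[0])
    num_states = len(model_scores[0][0])
    # All models must score the same frames over the same state inventory.
    for scores in model_scores:
        if len(scores) != num_frames or any(len(r) != num_states for r in scores):
            raise ValueError("all models must share frame and state dimensions")
    fused = []
    for t in range(num_frames):
        fused.append([
            sum(w * scores[t][s] for w, scores in zip(weights, model_scores))
            for s in range(num_states)
        ])
    return fused


# Toy log scores (2 frames x 3 states) standing in for the three model types.
rnn = [[-1.0, -2.0, -3.0], [-3.0, -1.0, -2.0]]
cnn = [[-2.0, -1.0, -3.0], [-3.0, -2.0, -1.0]]
blstm = [[-1.0, -3.0, -2.0], [-2.0, -1.0, -3.0]]

fused = fuse_scores([rnn, cnn, blstm], weights=[0.25, 0.25, 0.5])
# Highest-scoring state per frame after fusion.
best_states = [max(range(len(row)), key=row.__getitem__) for row in fused]
print(fused)        # [[-1.25, -2.25, -2.5], [-2.5, -1.25, -2.25]]
print(best_states)  # [0, 1]
```

In practice, fusion weights like these would typically be tuned on held-out data rather than fixed by hand.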
Submission history
From: George Saon
[v1] Wed, 27 Apr 2016 21:00:03 GMT (63kb,D)
[v2] Wed, 22 Jun 2016 16:30:37 GMT (62kb,D)