Title: MelHuBERT: A simplified HuBERT on Mel spectrogram

Abstract: Self-supervised models have had great success in learning speech representations that generalize to various downstream tasks. HuBERT, in particular, achieves strong performance while being relatively simple to train compared with other models. However, the original experimental setting is computationally expensive, which hinders reproducibility. It is also unclear why certain design decisions were made, such as the ad-hoc loss function, and whether these decisions affect the learned representations. We propose MelHuBERT, a simplified version of HuBERT that takes Mel spectrograms as input, significantly reducing computation and memory consumption. We study several aspects of training, including the loss function, multi-stage training, and streaming options. The result is an efficient yet performant model that can be trained on a single GPU.
Comments: Submitted to ICASSP 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2211.09944 [cs.CL]
  (or arXiv:2211.09944v1 [cs.CL] for this version)

Submission history

From: Hao Tang
[v1] Thu, 17 Nov 2022 23:38:29 GMT (522kb,D)
