We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

eess

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Sound

Title: Few-Shot Audio-Visual Learning of Environment Acoustics

Abstract: Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener, with implications for various applications in AR, VR, and robotics. Whereas traditional methods to estimate RIRs assume dense geometry and/or sound measurements throughout the environment, we explore how to infer RIRs based on a sparse set of images and echoes observed in the space. Towards that goal, we introduce a transformer-based method that uses self-attention to build a rich acoustic context, then predicts RIRs of arbitrary query source-receiver locations through cross-attention. Additionally, we design a novel training objective that improves the match in the acoustic signature between the RIR predictions and the targets. In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs, outperforming state-of-the-art methods and -- in a major departure from traditional methods -- generalizing to novel environments in a few-shot manner. Project: this http URL
Comments: Accepted to NeurIPS 2022
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2206.04006 [cs.SD]
  (or arXiv:2206.04006v2 [cs.SD] for this version)

Submission history

From: Sagnik Majumder [view email]
[v1] Wed, 8 Jun 2022 16:38:24 GMT (3671kb,D)
[v2] Thu, 24 Nov 2022 16:52:50 GMT (6108kb,D)

Link back to: arXiv, form interface, contact.