Title: Audio to Body Dynamics

Abstract: We present a method that takes as input audio of violin or piano playing and outputs a video of skeleton predictions, which are then used to animate an avatar. The key idea is to create an animation of an avatar that moves its hands the way a pianist or violinist would, from audio alone. Fully detailed, correct arm and finger motion is the goal; however, it is not clear whether body movement can be predicted from music at all. In this paper, we present the first result showing that natural body dynamics can indeed be predicted. We built an LSTM network trained on violin and piano recital videos uploaded to the Internet. The predicted points are applied to a rigged avatar to create the animation.
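
The mapping described in the abstract, from a sequence of audio frames to a sequence of skeleton points through an LSTM, can be illustrated with a small sketch. This is not the authors' network: it is a minimal, untrained single-layer LSTM written in NumPy, and every size and name in it (13 audio features per frame, 25 two-dimensional keypoints, 64 hidden units, the `TinyLSTM` class) is an assumption chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTM:
    """Hypothetical single-layer LSTM with a linear readout (random, untrained weights)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell and output gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        # Linear layer mapping the hidden state to flattened keypoint coordinates.
        self.W_out = rng.normal(0.0, 0.1, size=(n_out, n_hidden))
        self.b_out = np.zeros(n_out)
        self.n_hidden = n_hidden

    def forward(self, frames):
        """Map (T, n_in) audio features to (T, n_out) outputs, one per frame."""
        h = np.zeros(self.n_hidden)  # hidden state
        c = np.zeros(self.n_hidden)  # cell state
        outputs = []
        for x in frames:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(self.W_out @ h + self.b_out)
        return np.stack(outputs)


# Assumed sizes, not taken from the paper.
N_FEATURES, N_KEYPOINTS, N_FRAMES = 13, 25, 100

model = TinyLSTM(N_FEATURES, 64, 2 * N_KEYPOINTS)
# Random stand-in for per-frame audio features.
audio_features = np.random.default_rng(1).normal(size=(N_FRAMES, N_FEATURES))
# One (x, y) pair per keypoint per frame.
skeleton = model.forward(audio_features).reshape(N_FRAMES, N_KEYPOINTS, 2)
print(skeleton.shape)
```

In a trained system the weights would be fitted so that the predicted points match keypoints extracted from recital videos, and the per-frame points would then drive a rigged avatar; neither step is shown here.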
Comments: Link with videos this https URL
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
Journal reference: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
DOI: 10.1109/CVPR.2018.00790
Cite as: arXiv:1712.09382 [eess.AS]
  (or arXiv:1712.09382v1 [eess.AS] for this version)

Submission history

From: Ira Kemelmacher-Shlizerman
[v1] Tue, 19 Dec 2017 23:45:00 GMT (9232kb,D)
