Title: Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Zero Shot Action Generation

Abstract: We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets are terse and carry minimal information. By carefully crafting prompts for LLMs, we generate richer, more fine-grained descriptions of each action. We show that using these detailed descriptions in place of the original action phrases leads to better alignment of the text and motion spaces. Our experiments show qualitative and quantitative improvements in the quality of motions synthesized by recent text-to-motion models. Code, pretrained models and sample videos will be made available at this https URL
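
The prompting step described in the abstract might look roughly like the following minimal sketch. It is not the authors' implementation: the prompt wording, the names `PROMPT_TEMPLATE`, `build_prompt` and `enrich_action`, and the `llm` callable are all hypothetical, introduced only to illustrate turning a short action phrase into detailed descriptions that a text-to-motion model could consume.

```python
from typing import Callable, List

# Hypothetical prompt wording; the abstract does not give the actual prompts.
PROMPT_TEMPLATE = (
    "Describe in detail how a person's body parts move "
    "when they perform the action: {action}"
)


def build_prompt(action_phrase: str) -> str:
    """Wrap a terse action phrase (e.g. 'jump') in the prompt template."""
    return PROMPT_TEMPLATE.format(action=action_phrase.strip())


def enrich_action(
    action_phrase: str,
    llm: Callable[[str], str],
    k: int = 1,
) -> List[str]:
    """Query the LLM k times with the same prompt and return k descriptions.

    `llm` is any function mapping a prompt string to generated text;
    it stands in for whichever model client is actually used.
    """
    prompt = build_prompt(action_phrase)
    return [llm(prompt) for _ in range(k)]


if __name__ == "__main__":
    # Stand-in for a real LLM so the sketch runs offline.
    def echo_llm(prompt: str) -> str:
        return "[generated description for] " + prompt

    for description in enrich_action("wave right hand", echo_llm, k=2):
        print(description)
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client; the returned descriptions would replace the original action phrase as input to the text encoder of a text-to-motion model.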
Comments: WIP. Code, pretrained models and sample videos will be made available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)
Cite as: arXiv:2211.15603 [cs.CV]
  (or arXiv:2211.15603v2 [cs.CV] for this version)

Submission history

From: Ravi Kiran Sarvadevabhatla
[v1] Mon, 28 Nov 2022 17:57:48 GMT (2416kb,D)
[v2] Wed, 30 Nov 2022 13:13:29 GMT (2417kb,D)
