Actions ~ Transformations

Wang, Xiaolong; Farhadi, Ali; Gupta, Abhinav

(Submitted on 2 Dec 2015 (v1), last revised 26 Jul 2016 (this version, v2))

Abstract: What defines an action like "kicking ball"? We argue that the true meaning of an action lies in the change or transformation an action brings to the environment. In this paper, we propose a novel representation for actions by modeling an action as a transformation which changes the state of the environment before the action happens (precondition) to the state after the action (effect). Motivated by recent advances in video representation using deep learning, we design a Siamese network which models the action as a transformation on a high-level feature space. We show that our model gives improvements on standard action recognition datasets including UCF101 and HMDB51. More importantly, our approach is able to generalize beyond learned action categories and shows a significant performance improvement in cross-category generalization on our new ACT dataset.
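The idea of an action as a transformation from a precondition state to an effect state can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the transformation or the comparison is parameterized, so the per-action linear maps, the cosine distance, the two-dimensional embeddings, and the action names `rotate` and `keep` below are all assumptions chosen for illustration. In the paper the embeddings would come from the two branches of the Siamese network rather than being written by hand.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def classify(precondition, effect, transforms):
    """Pick the action whose transform maps the precondition closest to the effect.

    `transforms` maps a hypothetical action name to an assumed linear map.
    """
    distances = {
        name: cosine_distance(matvec(matrix, precondition), effect)
        for name, matrix in transforms.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Hypothetical 2-D embeddings: the effect is the precondition rotated by 90 degrees.
transforms = {
    "rotate": [[0.0, -1.0], [1.0, 0.0]],  # assumed transformation for one action
    "keep": [[1.0, 0.0], [0.0, 1.0]],     # identity: the state does not change
}
precondition = [1.0, 0.0]
effect = [0.0, 1.0]
best_action, distances = classify(precondition, effect, transforms)
print(best_action, distances)
```

Under these assumptions, recognition reduces to asking which action's transformation best explains the observed change of state, which is also why such a model could in principle be applied to state changes from categories it was not trained on.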

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1512.00795 [cs.CV]
	(or arXiv:1512.00795v2 [cs.CV] for this version)

Submission history

From: Xiaolong Wang
[v1] Wed, 2 Dec 2015 18:17:32 GMT (4086kb,D)
[v2] Tue, 26 Jul 2016 04:51:49 GMT (4218kb,D)
