Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows

Rao, Anyi; Jiang, Xuekun; Wang, Sichen; Guo, Yuwei; Liu, Zihao; Dai, Bo; Pang, Long; Wu, Xiaoyu; Lin, Dahua; Jin, Libiao


(Submitted on 17 Oct 2022)

Abstract: The ability to choose an appropriate camera view among multiple cameras plays a vital role in TV show delivery. However, the lack of high-quality training data makes it hard to discover the underlying statistical patterns and apply intelligent processing. To address this, we first collect a novel benchmark for this setting covering four diverse scenarios (concerts, sports games, gala shows, and contests), where each scenario contains 6 synchronized tracks recorded by different cameras. The benchmark contains 88 hours of raw video that contribute to 14 hours of edited video. Based on this benchmark, we further propose a new approach, the temporal and contextual transformer, which uses cues from historical shots and other views to make shot-transition decisions and predict which view to use. Extensive experiments show that our method outperforms existing methods on the proposed multi-camera editing benchmark.
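To make the task formulation concrete, here is a minimal, untrained Python sketch of attention-based view selection. It is not the authors' temporal and contextual transformer. The function names (`attend`, `choose_view`), the additive fusion, and the fixed scoring vector `head` are hypothetical stand-ins for components a trained model would learn. At each step, every candidate view attends over features of previously selected shots (the temporal cue) and over the other cameras (the contextual cue). The fused features are then scored, and a cut is signalled when the best-scoring view differs from the one currently on air.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: Sequence[Sequence[float]]) -> Vector:
    """Scaled dot-product attention that reuses the keys as values."""
    d = len(query)
    if not keys:
        return [0.0] * d  # no context available (e.g. empty history)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)]


def choose_view(
    history: Sequence[Sequence[float]],
    views: Sequence[Sequence[float]],
    head: Sequence[float],
    current: int,
) -> Tuple[int, bool, List[float]]:
    """Return (chosen view index, whether to cut, per-view probabilities).

    history: features of previously selected shots (temporal cue).
    views:   features of every synchronized camera at this step (contextual cue).
    head:    hypothetical scoring weights; a real model would learn these.
    current: index of the view currently on air.
    """
    scores = []
    for i, v in enumerate(views):
        temporal = attend(v, history)
        others = [u for j, u in enumerate(views) if j != i]
        contextual = attend(v, others)
        # Hypothetical residual-style fusion of the view with both summaries.
        fused = [a + b + c for a, b, c in zip(v, temporal, contextual)]
        scores.append(dot(fused, head))
    probs = softmax(scores)
    best = max(range(len(views)), key=lambda i: probs[i])
    return best, best != current, probs
```

For example, with `history = [[1.0, 0.0], [0.9, 0.1]]`, three views `[[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]`, `head = [1.0, 0.0]`, and view 0 on air, the sketch picks view 1 and signals a cut. Keeping the transition decision as a comparison against the on-air view mirrors the two outputs named in the abstract: whether to cut, and which view to use.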

Comments:	Extended Abstract of ECCV 2022 Workshop on AI for Creative Video Editing and Understanding
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Cite as:	arXiv:2210.08737 [cs.CV]
	(or arXiv:2210.08737v1 [cs.CV] for this version)

Submission history

From: Anyi Rao
[v1] Mon, 17 Oct 2022 04:11:23 GMT (10238kb,D)
