Contrastive Graph Multimodal Model for Text Classification in Videos

Liu, Ye; Lu, Changchong; Lin, Chen; Yin, Di; Ren, Bo

(Submitted on 6 Jun 2022)

Abstract: The extraction of text information in videos serves as a critical step towards semantic understanding of videos. It usually involves two steps: (1) text recognition and (2) text classification. To localize texts in videos, we can resort to a large number of text recognition methods based on OCR technology. However, to our knowledge, no existing work focuses on the second step, video text classification, which limits the guidance available to downstream tasks such as video indexing and browsing. In this paper, we are the first to address this new task of video text classification by fusing multimodal information to deal with the challenging scenario where different types of video texts may be confused with various colors, unknown fonts and complex layouts. In addition, we tailor a specific module called CorrelationNet to reinforce feature representation by explicitly extracting layout information. Furthermore, contrastive learning is utilized to explore inherent connections between samples using plentiful unlabeled videos. Finally, we construct a new well-defined industrial dataset from the news domain, called TI-News, which is dedicated to building and evaluating video text recognition and classification applications. Extensive experiments on TI-News demonstrate the effectiveness of our method.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2206.02343 [cs.CV]
	(or arXiv:2206.02343v1 [cs.CV] for this version)

Submission history

From: Ye Liu [view email]
[v1] Mon, 6 Jun 2022 04:06:21 GMT (5065kb,D)
