
Title: Finding Person Relations in Image Data of the Internet Archive

Abstract: The multimedia content of the World Wide Web is growing rapidly and contains valuable information for many applications in different domains. For this reason, the Internet Archive initiative has been gathering billions of time-versioned web pages since the mid-nineties. However, this huge amount of data is rarely labeled with appropriate metadata, so automatic approaches are required to enable semantic search. Normally, the textual content of the Internet Archive is used to extract entities and their possible relations across domains such as politics and entertainment, whereas image and video content is usually neglected. In this paper, we introduce a system for person recognition in the image content of web news stored in the Internet Archive. The system thus complements entity recognition in text and allows researchers and analysts to track media coverage and relations of persons more precisely. Based on a deep learning face recognition approach, we propose a system that automatically detects persons of interest and gathers sample material, which is subsequently used to identify them in the image data of the Internet Archive. We evaluate the performance of the face recognition system on a standard benchmark dataset and demonstrate the feasibility of the approach with two use cases.
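The abstract does not say how detected faces are matched against the gathered sample material or how person relations are derived. A minimal sketch, assuming faces have already been detected and turned into embedding vectors by some deep face recognition model (not shown), is nearest-neighbour matching against one mean embedding per person using cosine similarity, followed by counting how often two identified persons appear in the same image. The function names, the 0.6 threshold, and the mean-embedding and co-occurrence strategies are hypothetical illustrations, not the method described in the paper.

```python
import numpy as np
from itertools import combinations
from collections import Counter


def build_references(samples):
    """Hypothetical: average each person's sample embeddings into one unit vector.

    samples maps a person name to a list of face embeddings (the gathered sample material).
    """
    refs = {}
    for name, embs in samples.items():
        mean = np.mean(np.asarray(embs, dtype=float), axis=0)
        refs[name] = mean / np.linalg.norm(mean)
    return refs


def identify(face_emb, refs, threshold=0.6):
    """Return the best-matching person by cosine similarity, or None below the threshold."""
    v = np.asarray(face_emb, dtype=float)
    v = v / np.linalg.norm(v)
    best_name, best_sim = None, -1.0
    for name, ref in refs.items():
        sim = float(v @ ref)  # cosine similarity, since both vectors are unit length
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None


def person_relations(images, refs, threshold=0.6):
    """Count pairs of identified persons that co-occur in the same image.

    images is a list where each entry holds the face embeddings detected in one image.
    """
    pairs = Counter()
    for faces in images:
        names = {identify(f, refs, threshold) for f in faces}
        names.discard(None)  # ignore faces that match no person of interest
        for a, b in combinations(sorted(names), 2):
            pairs[(a, b)] += 1
    return pairs
```

Using one mean embedding per person keeps matching cheap; a threshold lets faces of people outside the set of persons of interest be rejected rather than forced onto the closest name.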
Subjects: Digital Libraries (cs.DL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Journal reference: In: Méndez E., Crestani F., Ribeiro C., David G., Lopes J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science, vol 11057. Springer, Cham
DOI: 10.1007/978-3-030-00066-0_20
Cite as: arXiv:1806.08246 [cs.DL]
  (or arXiv:1806.08246v2 [cs.DL] for this version)

Submission history

From: Eric Müller-Budack
[v1] Thu, 21 Jun 2018 13:48:21 GMT (609kb,D)
[v2] Tue, 28 May 2019 13:04:14 GMT (609kb,D)
