Title: Serial Speakers: a Dataset of TV Series

Authors: Xavier Bost (LIA), Vincent Labatut (LIA), Georges Linares (LIA)
Abstract: For over a decade, TV series have been drawing increasing interest, both from the audience and from various academic fields. But while most viewers are hooked on the continuous plots of TV serials, the few annotated datasets available to researchers focus on standalone episodes of classical TV series. We aim to fill this gap by providing the multimedia/speech processing communities with Serial Speakers, an annotated dataset of 161 episodes from three popular American TV serials: Breaking Bad, Game of Thrones and House of Cards. Serial Speakers is suitable both for investigating multimedia retrieval in realistic use-case scenarios and for addressing lower-level speech-related tasks in especially challenging conditions. We publicly release annotations for every speech turn (boundaries, speaker) and scene boundary, along with annotations for shot boundaries, recurring shots, and interacting speakers in a subset of episodes. Because of copyright restrictions, the textual content of the speech turns is encrypted in the public version of the dataset, but we provide users with a simple online tool to recover the plain text from their own subtitle files.
Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
Journal reference: 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 4256-4264, May 2020, Marseille, France
Cite as: arXiv:2002.06923 [cs.IR]
  (or arXiv:2002.06923v1 [cs.IR] for this version)

Submission history

From: Xavier Bost
[v1] Mon, 17 Feb 2020 12:51:21 GMT
