Title: HC4: A New Suite of Test Collections for Ad Hoc CLIR

Abstract: HC4 is a new suite of test collections for ad hoc Cross-Language Information Retrieval (CLIR), with Common Crawl News documents in Chinese, Persian, and Russian, topics in English and in the document languages, and graded relevance judgments. New test collections are needed because existing CLIR test collections, built by pooling traditional CLIR runs, have systematic gaps in their relevance judgments when used to evaluate neural CLIR methods. The HC4 collections contain 60 topics and about half a million documents for each of Chinese and Persian, and 54 topics and five million documents for Russian. Active learning, seeded with interactive search and judgment, was used to determine which documents to annotate. Documents were judged on a three-grade relevance scale. This paper describes the design and construction of the new test collections and provides baseline results that demonstrate their utility for evaluating systems.
Comments: 16 pages, 2 figures, accepted at ECIR 2022
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as: arXiv:2201.09992 [cs.IR]
  (or arXiv:2201.09992v1 [cs.IR] for this version)

Submission history

From: Eugene Yang
[v1] Mon, 24 Jan 2022 22:52:11 GMT
