Title: SimRelUz: Similarity and Relatedness scores as a Semantic Evaluation dataset for Uzbek language

Abstract: Semantic relatedness between words is one of the core concepts in natural language processing, which makes semantic evaluation an important task. In this paper, we present a dataset for evaluating semantic models: SimRelUz, a collection of similarity and relatedness scores for word pairs in the low-resource Uzbek language. The dataset consists of more than a thousand word pairs, carefully selected based on their morphological features, occurrence frequency, and semantic relation, and annotated by eleven native Uzbek speakers of different age groups and genders. We also paid attention to rare words and out-of-vocabulary words, so that the robustness of semantic models can be evaluated thoroughly.
Comments: Final version, published in the proceedings of SIGUL workshop of LREC 2022
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2205.06072 [cs.CL]
  (or arXiv:2205.06072v1 [cs.CL] for this version)

Submission history

From: Elmurod Kuriyozov
[v1] Thu, 12 May 2022 13:11:28 GMT