Title: Multilingual Text Classification for Dravidian Languages

Abstract: As the fourth largest language family in the world, the Dravidian languages have become a research hotspot in natural language processing (NLP). Although the family comprises a large number of languages, relatively few publicly available resources exist for them. Moreover, text classification is a basic NLP task, and how to apply it across multiple Dravidian languages remains a major difficulty in Dravidian NLP. To address these problems, we propose a multilingual text classification framework for the Dravidian languages. On the one hand, the framework uses the LaBSE pre-trained model as its base model. To address the problem of text information bias in multi-task learning, we propose using the MLM strategy to select language-specific words and adversarial training to perturb them. On the other hand, because the model cannot adequately recognize and utilize the correlation among languages, we further propose a language-specific representation module to enrich the model's semantic information. The experimental results demonstrate that the proposed framework achieves strong performance on multilingual text classification tasks, with each strategy yielding some improvement.
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2112.01705 [cs.CL]
  (or arXiv:2112.01705v1 [cs.CL] for this version)

Submission history

From: Nankai Lin
[v1] Fri, 3 Dec 2021 04:26:49 GMT (662kb)
