Title: Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation

Abstract: Although some linguists (Rusmali et al., 1985; Crouch, 2009) have attempted to describe the morphology and syntax of Minangkabau, information processing in this language is still absent because annotated resources are scarce. In this work, we release two Minangkabau corpora, one for sentiment analysis and one for machine translation, harvested and constructed from Twitter and Wikipedia. We conduct the first computational linguistics study of the Minangkabau language, employing classic machine learning and sequence-to-sequence models such as LSTM and Transformer. Our first experiments show that classification performance on Minangkabau text drops significantly when the text is classified with a model trained on Indonesian. In the machine translation experiment, by contrast, a simple word-to-word translation using a bilingual dictionary outperforms the LSTM and Transformer models in terms of BLEU score.
Comments: Accepted at PACLIC 2020 - The 34th Pacific Asia Conference on Language, Information and Computation
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2009.09309 [cs.CL]
  (or arXiv:2009.09309v1 [cs.CL] for this version)

Submission history

From: Fajri Koto
[v1] Sat, 19 Sep 2020 22:13:27 GMT (541kb,D)
