Computer Science > Computation and Language
Title: Amharic Text Clustering Using Encyclopedic Knowledge with Neural Word Embedding
(Submitted on 31 Mar 2021 (v1), last revised 22 Sep 2022 (this version, v2))
Abstract: In this digital era, people in almost every discipline use automated systems that generate information as documents in different natural languages. As a result, there is growing interest in better solutions for finding, organizing, and analyzing these documents. In this paper, we propose a system that clusters Amharic text documents using Encyclopedic Knowledge (EK) with neural word embedding. EK enables the representation of related concepts, and neural word embedding allows us to handle the contexts of the relatedness. During the clustering process, all text documents pass through preprocessing stages. Enriched features are then extracted from each document by mapping it to EK and the word embedding model, and a TF-IDF weighted vector of the enriched features is generated. Finally, the documents are clustered using the popular spherical K-means algorithm. The proposed system is tested with an Amharic text corpus and Amharic Wikipedia data. Test results show that using EK with word embedding for document clustering improves average accuracy over using EK alone. Furthermore, changing the size of the class has a significant effect on accuracy.
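The pipeline outlined in the abstract (feature enrichment, TF-IDF weighting, spherical K-means) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The `enrich` helper and its `related` lookup table are hypothetical stand-ins for the EK and word-embedding mapping, the smoothed IDF formula and the choice of seed documents are assumptions, and the English tokens are placeholders for preprocessed Amharic terms.

```python
import math
from collections import Counter, defaultdict


def enrich(tokens, related):
    """Append related concepts to a token list.

    Hypothetical stand-in for the EK / word-embedding lookup; `related`
    is an assumed dict mapping a term to a list of related terms.
    """
    enriched = list(tokens)
    for token in tokens:
        enriched.extend(related.get(token, []))
    return enriched


def normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def tfidf_vectors(docs):
    """Build unit-length TF-IDF vectors from a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption; the abstract does not give the formula).
        vec = {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, k, iterations=20, seeds=None):
    """Cluster unit vectors by cosine similarity, keeping centroids unit-length."""
    # Seeding with the first k documents keeps the sketch deterministic.
    seeds = list(range(k)) if seeds is None else seeds
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [-1] * len(vectors)
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if total:  # keep the old centroid if the cluster is empty
                centroids[j] = normalize(total)
    return labels


# Placeholder documents and a toy relatedness table (both hypothetical).
docs = [
    ["football", "team", "goal", "match"],
    ["economy", "bank", "market", "trade"],
    ["team", "match", "player", "goal"],
    ["bank", "trade", "inflation", "market"],
]
related = {"goal": ["football"], "inflation": ["economy"]}
enriched_docs = [enrich(doc, related) for doc in docs]
print(spherical_kmeans(tfidf_vectors(enriched_docs), k=2))  # [0, 1, 0, 1]
```

Spherical K-means differs from standard K-means in that documents and centroids are kept at unit length and assignment uses cosine similarity, which suits sparse TF-IDF vectors where document length should not dominate.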
Submission history
From: Dessalew Yohannes
[v1] Wed, 31 Mar 2021 05:37:33 GMT (397kb)
[v2] Thu, 22 Sep 2022 14:46:31 GMT (0kb,I)