Title: A Comparison of Synthetic Oversampling Methods for Multi-class Text Classification

Authors: Anna Glazkova
Abstract: This paper compares oversampling methods for multi-class topic classification. The SMOTE algorithm underlies one of the most popular oversampling methods: it selects an example of a minority class together with one of its nearest minority-class neighbors and generates a new synthetic example by interpolating between them. The paper compares basic SMOTE with two of its modifications (Borderline-SMOTE and ADASYN) and with random oversampling on a text classification task. The classifiers considered are the k-nearest neighbors algorithm (KNN), the support vector machine (SVM), and three types of neural networks: a feedforward network, a long short-term memory network (LSTM), and a bidirectional LSTM. These machine learning algorithms are combined with different text representations, and the synthetic oversampling methods are compared across these combinations. In most cases, oversampling can significantly improve classification quality. The study concludes that, for this task, the quality of the KNN and SVM algorithms is more affected by class imbalance than that of the neural networks.
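To make the SMOTE step concrete, here is a minimal pure-Python sketch, assuming each document has already been converted into a fixed-length numeric vector (for example a TF-IDF or averaged-embedding representation). The names `smote`, `random_oversample`, and `balance_with_smote` are hypothetical illustrations, not the paper's code and not the imbalanced-learn API. Each synthetic example lies on the segment between a minority example and one of its k nearest minority-class neighbors, while random oversampling simply duplicates existing examples.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(minority: List[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest minority-class neighbors of minority[i], excluding itself."""
    others = [j for j in range(len(minority)) if j != i]
    others.sort(key=lambda j: euclidean(minority[i], minority[j]))
    return others[:k]


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority examples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    neighbors = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # pick a minority example
        j = rng.choice(neighbors[i])      # pick one of its k nearest minority neighbors
        gap = rng.random()                # interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def random_oversample(minority: List[Vector], n_new: int, seed: int = 0) -> List[Vector]:
    """Baseline: duplicate randomly chosen minority examples (copies, not references)."""
    rng = random.Random(seed)
    return [list(rng.choice(minority)) for _ in range(n_new)]


def balance_with_smote(data: Dict[str, List[Vector]], k: int = 5, seed: int = 0) -> Dict[str, List[Vector]]:
    """Multi-class balancing: pad every class up to the size of the largest class."""
    target = max(len(v) for v in data.values())
    out: Dict[str, List[Vector]] = {}
    for offset, (label, vecs) in enumerate(sorted(data.items())):
        extra = target - len(vecs)
        new = smote(vecs, extra, k, seed + offset) if extra > 0 else []
        out[label] = list(vecs) + new
    return out
```

For example, `balance_with_smote({"sport": sport_vecs, "science": science_vecs})` would return both classes with the same number of vectors. Borderline-SMOTE and ADASYN keep the same interpolation step but change which minority examples are used as seeds: Borderline-SMOTE restricts seeds to minority examples near the class boundary, and ADASYN generates more samples around minority examples that are harder to learn. Neither variant is implemented in this sketch.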
Comments: 12 pages, 5 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
MSC classes: 68T50
ACM classes: I.7.2; I.2.7
Cite as: arXiv:2008.04636 [cs.CL]
  (or arXiv:2008.04636v1 [cs.CL] for this version)

Submission history

From: Anna Glazkova
[v1] Tue, 11 Aug 2020 11:41:53 GMT (555kb)