We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Machine Learning

Title: Sequential Targeting: an incremental learning approach for data imbalance in text classification

Abstract: Classification tasks require a balanced distribution of data to ensure the learner to be trained to generalize over all classes. In real-world datasets, however, the number of instances vary substantially among classes. This typically leads to a learner that promotes bias towards the majority group due to its dominating property. Therefore, methods to handle imbalanced datasets are crucial for alleviating distributional skews and fully utilizing the under-represented data, especially in text classification. While addressing the imbalance in text data, most methods utilize sampling methods on the numerical representation of the data, which limits its efficiency on how effective the representation is. We propose a novel training method, Sequential Targeting(ST), independent of the effectiveness of the representation method, which enforces an incremental learning setting by splitting the data into mutually exclusive subsets and training the learner adaptively. To address problems that arise within incremental learning, we apply elastic weight consolidation. We demonstrate the effectiveness of our method through experiments on simulated benchmark datasets (IMDB) and data collected from NAVER.
Comments: 9 pages, 7 figures, submitted to the journal of Expert Systems with Applications
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as: arXiv:2011.10216 [cs.LG]
  (or arXiv:2011.10216v2 [cs.LG] for this version)

Submission history

From: Joel Jang [view email]
[v1] Fri, 20 Nov 2020 04:54:00 GMT (359kb,D)
[v2] Mon, 23 Nov 2020 02:33:08 GMT (359kb,D)

Link back to: arXiv, form interface, contact.