Lifelong Learning Natural Language Processing Approach for Multilingual Data Classification

Kozal, Jędrzej; Leś, Michał; Zyblewski, Paweł; Ksieniewicz, Paweł; Woźniak, Michał

Full-text links:

Download:

Current browse context:

cs.CL

< prev | next >

new | recent | 2206

Computer Science > Computation and Language

Title: Lifelong Learning Natural Language Processing Approach for Multilingual Data Classification

Authors: Jędrzej Kozal, Michał Leś, Paweł Zyblewski, Paweł Ksieniewicz, Michał Woźniak

(Submitted on 25 May 2022)

Abstract: The abundance of information in digital media, which in today's world is the main source of knowledge about current events for the masses, makes it possible to spread disinformation on a larger scale than ever before. Consequently, there is a need to develop novel fake news detection approaches capable of adapting to changing factual contexts and generalizing previously or concurrently acquired knowledge. To deal with this problem, we propose a lifelong learning-inspired approach, which allows for fake news detection in multiple languages and the mutual transfer of knowledge acquired in each of them. Both classical feature extractors, such as Term frequency-inverse document frequency or Latent Dirichlet Allocation, and integrated deep NLP (Natural Language Processing) BERT (Bidirectional Encoder Representations from Transformers) models paired with MLP (Multilayer Perceptron) classifier, were employed. The results of experiments conducted on two datasets dedicated to the fake news classification task (in English and Spanish, respectively), supported by statistical analysis, confirmed that utilization of additional languages could improve performance for traditional methods. Also, in some cases supplementing the deep learning method with classical ones can positively impact obtained results. The ability of models to generalize the knowledge acquired between the analyzed languages was also observed.

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2206.11867 [cs.CL]
	(or arXiv:2206.11867v1 [cs.CL] for this version)

Submission history

From: Jędrzej Kozal [view email]
[v1] Wed, 25 May 2022 10:34:04 GMT (112kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2206.11867

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Computation and Language

Title: Lifelong Learning Natural Language Processing Approach for Multilingual Data Classification

Submission history