Title: Comparing Performance of Different Linguistically-Backed Word Embeddings for Cyberbullying Detection

Abstract: In most cases, word embeddings are learned only from raw tokens or, in some cases, lemmas. This includes pre-trained language models like BERT. To investigate the potential of capturing deeper relations between lexical items and structures, and to filter out redundant information, we propose to preserve morphological, syntactic and other types of linguistic information by combining them with the raw tokens or lemmas. For example, this means including part-of-speech or dependency information within the lexical features. The word embeddings can then be trained on these combinations instead of on raw tokens alone. It is also possible to apply this method later to the pre-training of large language models and possibly enhance their performance. This would aid in tackling problems that are more sophisticated from the point of view of linguistic representation, such as cyberbullying detection.
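The combination step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the annotations are hand-written `(token, lemma, POS, dependency)` tuples standing in for real tagger/parser output, and the `combine` helper and the `lemma_POS` joining format are hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical pre-annotated corpus: each word is a (token, lemma, POS, dependency)
# tuple, the kind of output a tagger/parser could produce. Tags are hand-written here.
FIELD_INDEX = {"token": 0, "lemma": 1, "pos": 2, "dep": 3}


def combine(sentence, fields=("lemma", "pos"), sep="_"):
    """Join the selected annotation fields of each word into one feature string."""
    return [sep.join(word[FIELD_INDEX[f]] for f in fields) for word in sentence]


corpus = [
    [("They", "they", "PRON", "nsubj"), ("fired", "fire", "VERB", "ROOT"),
     ("him", "he", "PRON", "dobj")],
    [("The", "the", "DET", "det"), ("fire", "fire", "NOUN", "nsubj"),
     ("spread", "spread", "VERB", "ROOT")],
]

# Lemma-only features merge the verb and noun senses of "fire" into one item...
lemma_vocab = Counter(w[FIELD_INDEX["lemma"]] for s in corpus for w in s)
# ...while lemma+POS features keep them apart as separate vocabulary items.
combined = [combine(s) for s in corpus]
combined_vocab = Counter(f for s in combined for f in s)

print(combined[0])           # ['they_PRON', 'fire_VERB', 'he_PRON']
print(lemma_vocab["fire"])   # 2
print(sorted(k for k in combined_vocab if k.startswith("fire")))  # ['fire_NOUN', 'fire_VERB']
```

The resulting lists of combined strings have the same shape as ordinary tokenized sentences, so they could be passed to any embedding trainer that accepts such input (for instance gensim's `Word2Vec(sentences=combined)`); which fields to join, and how, is a design choice the sketch leaves open via the `fields` parameter.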
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Journal reference: Proceedings of the 2021 International Workshop on Modern Science and Technology, September 29, 2021
DOI: 10.19000/0002000095
Cite as: arXiv:2206.01950 [cs.CL]
  (or arXiv:2206.01950v1 [cs.CL] for this version)

Submission history

From: Juuso Eronen
[v1] Sat, 4 Jun 2022 09:11:41 GMT (76kb)
