Title: Speech Detection Task Against Asian Hate: BERT the Central, While Data-Centric Studies the Crucial

Authors: Xin Lian
Abstract: As the COVID-19 pandemic continues, hatred against Asians, and against Chinese people in particular, is intensifying in countries outside Asia. There is an urgent need to detect and prevent hate speech towards Asians effectively. In this work, we first create COVID-HATE-2022, a dataset of 2,025 annotated tweets fetched in early February 2022 and labeled according to specific criteria, and we present a comprehensive collection of the hate and non-hate tweet scenarios it contains. Second, we fine-tune the BERT model on the relevant datasets and demonstrate several strategies for "cleaning" the tweets. Third, we investigate the performance of advanced fine-tuning strategies using various model-centric and data-centric approaches. We show that both kinds of approach generally improve performance and that the data-centric ones outperform the others, which demonstrates the feasibility and effectiveness of data-centric approaches in the associated tasks.
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2206.02114 [cs.CL]
  (or arXiv:2206.02114v2 [cs.CL] for this version)

Submission history

From: Xin Lian
[v1] Sun, 5 Jun 2022 07:41:24 GMT (62kb,D)
[v2] Sun, 21 Aug 2022 15:22:03 GMT (51kb,D)
