Title: Lexical representations for unsupervised event detection in a stream of tweets: a study on French and English corpora

Authors: Béatrice Mazoyer (MICS), Nicolas Hervé (INA), Céline Hudelot (MICS), Julia Cage (ECON)
Abstract: In this work, we evaluate the performance of recent text embeddings for the automatic detection of events in a stream of tweets. We model this task as a dynamic clustering problem. Our experiments are conducted on a publicly available corpus of tweets in English and on a similar dataset in French annotated by our team. We show that recent techniques based on deep neural networks (ELMo, Universal Sentence Encoder, BERT, SBERT), although promising in many applications, are poorly suited to this task. We also experiment with different types of fine-tuning to improve these results on French data. Finally, we present a detailed analysis of the results, showing the superiority of tf-idf approaches for this task.
Comments: in French. Extraction et Gestion des connaissances, EGC 2020, Jan 2020, Bruxelles, France
Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Cite as: arXiv:2001.04139 [cs.IR]
  (or arXiv:2001.04139v1 [cs.IR] for this version)

Submission history

From: Beatrice Mazoyer
[v1] Mon, 13 Jan 2020 10:25:49 GMT (26kb)
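
The abstract frames event detection as dynamic clustering over tweet representations and reports that tf-idf works best. It does not describe the algorithm, so the following is only a minimal illustrative sketch and not the authors' implementation. It assumes a nearest-neighbour scheme in which each incoming tweet joins the cluster of its most similar predecessor when the cosine similarity reaches a threshold, and otherwise opens a new cluster. The function names, the smoothed idf formula, the threshold of 0.3, and the toy tweets are all hypothetical. For brevity the idf weights are computed over the whole toy corpus up front, which a true streaming system could not do.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real tweets need more care).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one L2-normalised sparse tf-idf vector (dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed idf (assumed variant): ln((1 + n) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    # Vectors are already L2-normalised, so the dot product is the cosine.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def cluster_stream(vectors, threshold=0.3):
    """Label vectors in arrival order using a nearest-neighbour threshold."""
    labels = []
    next_label = 0
    for i, vec in enumerate(vectors):
        best_sim, best_j = 0.0, None
        for j in range(i):  # compare only with tweets that arrived earlier
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_sim, best_j = sim, j
        if best_j is not None and best_sim >= threshold:
            labels.append(labels[best_j])  # join the neighbour's cluster
        else:
            labels.append(next_label)      # open a new cluster (new event)
            next_label += 1
    return labels


# Hypothetical toy stream: two tweets per event.
tweets = [
    "earthquake hits city centre",
    "strong earthquake hits the city",
    "team wins championship final",
    "championship final won by home team",
]
labels = cluster_stream(tfidf_vectors(tweets), threshold=0.3)
print(labels)
```

On this toy stream the two earthquake tweets end up sharing one label and the two championship tweets another. In practice the threshold would need tuning on annotated data, and the comparison would usually be restricted to a sliding window of recent tweets to keep the cost bounded.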