Title: Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings

Abstract: We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm. Our model uses a combination of sparse and dense document representations, aggregates document-cluster similarity along these multiple representations and makes the clustering decision using a neural classifier. The weighted document-cluster similarity model is learned using a novel adaptation of the triplet loss into a linear classification objective. We show that the use of a suitable fine-tuning objective and external knowledge in pre-trained transformer models yields significant improvements in the effectiveness of contextual embeddings for clustering. Our model achieves a new state-of-the-art on a standard stream clustering dataset of English documents.
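The abstract describes the clustering loop only at a high level. Below is a minimal sketch of a streaming K-means-style decision, built on our own assumptions. Each incoming document is compared with every existing cluster along two representations: a sparse one (bag-of-words counts) and a dense one (an embedding vector). The per-representation similarities are combined with a weight vector, and the document either joins the best-scoring cluster or starts a new one. The names `stream_cluster` and `Cluster`, the fixed weights and bias, and the toy vectors are all hypothetical. The paper learns its similarity weights and makes the decision with a neural classifier. This sketch replaces both with a fixed linear score thresholded at zero.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def cosine_sparse(a, b):
    """Cosine similarity between two sparse vectors stored as dicts/Counters."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cosine_dense(a, b):
    """Cosine similarity between two dense vectors stored as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    # Summed member vectors; cosine is scale-invariant, so the sum
    # gives the same similarity as the centroid (mean) would.
    sparse: Counter = field(default_factory=Counter)
    dense: list = field(default_factory=list)
    members: list = field(default_factory=list)

    def add(self, doc_id, sparse, dense):
        self.sparse.update(sparse)
        self.dense = list(dense) if not self.dense else [x + y for x, y in zip(self.dense, dense)]
        self.members.append(doc_id)


def stream_cluster(docs, weights=(1.0, 1.0), bias=-1.2):
    """Hypothetical single-pass clustering: weights/bias are illustrative, not learned."""
    clusters, assignments = [], []
    for doc_id, sparse, dense in docs:
        best_idx, best_score = None, float("-inf")
        for i, c in enumerate(clusters):
            sims = (cosine_sparse(sparse, c.sparse), cosine_dense(dense, c.dense))
            # Linear aggregation of the per-representation similarities.
            score = bias + sum(w * s for w, s in zip(weights, sims))
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is not None and best_score > 0:
            clusters[best_idx].add(doc_id, sparse, dense)   # join existing cluster
            assignments.append(best_idx)
        else:
            new = Cluster()                                 # non-parametric: open a new cluster
            new.add(doc_id, sparse, dense)
            clusters.append(new)
            assignments.append(len(clusters) - 1)
    return assignments, clusters


# Toy stream: two related earthquake stories, then an unrelated election story.
docs = [
    ("d1", Counter({"earthquake": 2, "japan": 1}), [0.9, 0.1]),
    ("d2", Counter({"earthquake": 1, "japan": 1, "tsunami": 1}), [0.85, 0.2]),
    ("d3", Counter({"election": 2, "vote": 1}), [0.1, 0.95]),
]
assignments, clusters = stream_cluster(docs)
print(assignments)  # → [0, 0, 1]
```

Because the number of clusters is never fixed in advance, the threshold (here the bias) controls how readily new clusters are opened. In the paper that decision is learned rather than hand-set.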
Comments: To appear in Proceedings of The 16th Conference of the European Chapter of the Association for Computational Linguistics
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
ACM classes: I.2.7
Cite as: arXiv:2101.11059 [cs.CL]
  (or arXiv:2101.11059v1 [cs.CL] for this version)

Submission history

From: Muthu Kumar Chandrasekaran
[v1] Tue, 26 Jan 2021 19:58:30 GMT (17493kb,D)