Title: Probabilistic Random Indexing for Continuous Event Detection

Abstract: This paper explores a novel variant of Random Indexing (RI) representations for encoding language data in a dynamic setting where events occur continuously. Because the size of one-hot encodings grows linearly with the vocabulary, they do not scale to online use with high volumes of dynamic data. Existing pre-trained embedding models, on the other hand, are not suited to detecting new events because of the dynamic nature of the text data. We address this by imposing a probability distribution on the number of randomized entries in an RI representation, which yields a class of RI representations. We also provide a rigorous analysis, in terms of the probability of orthogonality, of how well these representations encode semantic information. Building on these ideas, we propose an algorithm that is log-linear in the vocabulary size and tracks the semantic relationship of a query word to other words in order to suggest events relevant to that word. We ran simulations with the proposed algorithm on tweet data specific to three different events and present our findings. The proposed probabilistic RI representations are found to be much faster and more scalable than Bag of Words (BoW) embeddings while maintaining accuracy in depicting semantic relationships.
Comments: 8 pages, 12 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as: arXiv:2008.12552 [cs.LG]
  (or arXiv:2008.12552v3 [cs.LG] for this version)
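
The abstract describes RI representations in which the number of randomized entries is itself drawn from a probability distribution, and it evaluates them through the probability of orthogonality. The following is a minimal sketch of that idea, not the paper's algorithm. The binomial distribution for the entry count, the ±1 entry values, and all names (`sample_index_vector`, `dot`, `estimate_orthogonality`, `max_nonzero`, `p`) are assumptions made here for illustration only.

```python
import random


def sample_index_vector(dim, max_nonzero, p, rng):
    """Return a sparse index vector as a dict {position: +1 or -1}.

    Assumption: the number of nonzero entries is drawn from
    Binomial(max_nonzero, p). The abstract does not state which
    distribution the paper uses. Requires max_nonzero <= dim.
    """
    # Draw the entry count as a sum of max_nonzero Bernoulli(p) trials.
    k = sum(rng.random() < p for _ in range(max_nonzero))
    # Choose k distinct positions and give each a random sign.
    positions = rng.sample(range(dim), k)
    return {pos: rng.choice((-1, 1)) for pos in positions}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(sign * v[pos] for pos, sign in u.items() if pos in v)


def estimate_orthogonality(dim, max_nonzero, p, trials, seed=0):
    """Monte Carlo estimate of the probability that two independently
    sampled index vectors have a dot product of exactly zero."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u = sample_index_vector(dim, max_nonzero, p, rng)
        v = sample_index_vector(dim, max_nonzero, p, rng)
        if dot(u, v) == 0:
            hits += 1
    return hits / trials
```

Calling, for example, `estimate_orthogonality(dim=1000, max_nonzero=20, p=0.5, trials=2000)` returns an empirical fraction of orthogonal pairs. Storing vectors as dicts keeps memory proportional to the number of nonzero entries rather than to `dim`, which is the property that distinguishes sparse random index vectors from one-hot encodings whose size grows with the vocabulary.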

Submission history

From: Yashank Singh
[v1] Fri, 28 Aug 2020 09:37:39 GMT (5149kb,D)
[v2] Wed, 30 Sep 2020 13:51:27 GMT (5150kb,D)
[v3] Thu, 9 Dec 2021 06:48:20 GMT (932kb,D)