Title: Parameterizing Kterm Hashing

Abstract: Kterm Hashing provides an innovative approach to novelty detection on massive data streams. Previous research focused on maximizing the efficiency of Kterm Hashing and succeeded in scaling First Story Detection to Twitter-size data streams without sacrificing detection accuracy. In this paper, we focus on improving the effectiveness of Kterm Hashing. Traditionally, all kterms are considered equally important when calculating a document's degree of novelty with respect to the past. We believe that certain kterms are more important than others and hypothesize that uniform kterm weights are sub-optimal for determining novelty in data streams. To validate our hypothesis, we parameterize Kterm Hashing by assigning weights to kterms based on their characteristics. Our experiments apply Kterm Hashing in a First Story Detection setting and reveal that parameterized Kterm Hashing can surpass state-of-the-art detection accuracy and significantly outperform the uniformly weighted approach.
Comments: Kterm Hashing, Novelty Detection, First Story Detection
Subjects: Information Retrieval (cs.IR)
Journal reference: SIGIR '18, July 2018, Ann Arbor, MI, USA
DOI: 10.1145/3209978.3210101
Cite as: arXiv:2208.01340 [cs.IR]
  (or arXiv:2208.01340v1 [cs.IR] for this version)

Submission history

From: Dominik Wurzer
[v1] Tue, 2 Aug 2022 10:01:02 GMT (98kb)
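
Illustrative sketch: the abstract contrasts uniform kterm weights with parameterized ones. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes a kterm is an unordered combination of up to k distinct terms from a document, that seen kterms are recorded in a fixed-size hashed memory, and that novelty is the weighted share of a document's kterms not yet in that memory. The names `kterms`, `slot`, `novelty`, and `weight`, as well as the use of MD5, are hypothetical choices made for illustration; the abstract does not specify which kterm characteristics drive the weights.

```python
import hashlib
from itertools import combinations


def kterms(tokens, k=2):
    """Yield every combination of 1..k distinct tokens (assumed kterm definition)."""
    vocab = sorted(set(tokens))
    for size in range(1, k + 1):
        for combo in combinations(vocab, size):
            yield combo


def slot(kterm, n_slots):
    """Map a kterm to a position in the memory (MD5 is an illustrative choice)."""
    digest = hashlib.md5(" ".join(kterm).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_slots


def novelty(tokens, memory, k=2, weight=lambda kt: 1.0):
    """Return the weighted share of unseen kterms, then record all kterms.

    The default weight of 1.0 for every kterm corresponds to the uniform case;
    passing another function is a stand-in for the parameterized case.
    """
    total = 0.0
    unseen = 0.0
    slots = []
    for kt in kterms(tokens, k):
        w = weight(kt)
        s = slot(kt, len(memory))
        total += w
        if not memory[s]:
            unseen += w
        slots.append(s)
    # Update the memory only after scoring, so a document is not compared with itself.
    for s in slots:
        memory[s] = 1
    return unseen / total if total else 0.0


memory = bytearray(1 << 16)
first = novelty(["earthquake", "hits", "city"], memory)   # nothing seen before
repeat = novelty(["earthquake", "hits", "city"], memory)  # everything seen before
# Hypothetical weighting: compound kterms count more than single terms.
weighted = novelty(["earthquake", "hits", "town"], memory, weight=lambda kt: float(len(kt)))
```

With an empty memory the first document scores 1.0, and an exact repeat scores 0.0. The weighted call only shows where a non-uniform scheme would plug in; the length-based weight is a placeholder, not a scheme taken from the paper.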
