
Information Retrieval

New submissions

[ total of 6 entries: 1-6 ]

New submissions for Wed, 1 Dec 21

[1]  arXiv:2111.15023 [pdf, ps, other]
Title: Georacle: Enabling Geospatially Aware Smart Contracts
Authors: Taha Azzaoui
Subjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR)

Smart contracts have enabled a paradigm shift in computing by leveraging decentralized networks of trust to achieve consensus at scale. Oracle networks further extend the power of smart contracts by solving the so-called "oracle problem". Such networks enable smart contracts to make use of the vast amount of pre-existing data available on the web today without jeopardizing the integrity of the underlying network of trust. By leveraging oracle networks, smart contracts can make decisions based on data corresponding to the physical world. To this end, we introduce Georacle, an oracle service that enables geospatially aware smart contracts in a way that respects the space-constrained nature of blockchain environments. Contracts can query the location of objects in a given area, map between street addresses and coordinates, and retrieve the geometry of a desired region of space while conserving gas consumption and avoiding unnecessary data processing.

[2]  arXiv:2111.15068 [pdf, other]
Title: MISS: Multi-Interest Self-Supervised Learning Framework for Click-Through Rate Prediction
Comments: Accepted by ICDE2022
Subjects: Information Retrieval (cs.IR)

CTR prediction is essential for modern recommender systems. Ranging from early factorization machines to deep learning based models in recent years, existing CTR methods focus on capturing useful feature interactions or mining important behavior patterns. Despite their effectiveness, we argue that these methods suffer from the risk of label sparsity (i.e., the user-item interactions are highly sparse with respect to the feature space), label noise (i.e., the collected user-item interactions are usually noisy), and the underuse of domain knowledge (i.e., the pairwise correlations between samples). To address these challenging problems, we propose a novel Multi-Interest Self-Supervised learning (MISS) framework which enhances the feature embeddings with interest-level self-supervision signals. With the help of two novel CNN-based multi-interest extractors, self-supervision signals are discovered with full consideration of different interest representations (point-wise and union-wise), interest dependencies (short-range and long-range), and interest correlations (inter-item and intra-item). Based on that, contrastive learning losses are further applied to the augmented views of interest representations, which effectively improves the feature representation learning. Furthermore, our proposed MISS framework can be used as a plug-in component with existing CTR prediction models and further boost their performance. Extensive experiments on three large-scale datasets show that MISS significantly outperforms the state-of-the-art models, by up to 13.55% in AUC, and also enjoys good compatibility with representative deep CTR models.
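
The abstract does not say which contrastive loss MISS applies to its augmented views. As an illustration only, the sketch below uses InfoNCE, a common choice that is an assumption here and not confirmed by the source. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss; view_a[i] and view_b[i] form the positive pair.

    Every other row of view_b acts as a negative for view_a[i].
    This is a generic loss, assumed here, not the MISS authors' code.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over positive and negatives.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy "interest representations" for two augmented views (made-up values).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, [[0.9, 0.1], [0.1, 0.9]])
shuffled = info_nce(anchors, [[0.1, 0.9], [0.9, 0.1]])
```

With these toy inputs the loss is small when matching rows are similar (`aligned`) and large when the positives are swapped (`shuffled`), which is the behavior a contrastive objective is meant to reward.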

Cross-lists for Wed, 1 Dec 21

[3]  arXiv:2111.15629 (cross-list from cs.SI) [pdf, other]
Title: DiPD: Disruptive event Prediction Dataset from Twitter
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Riots and protests, if they get out of control, can cause havoc in a country. We have seen examples of this, such as the BLM movement, climate strikes, the CAA movement, and many more, which caused disruption to a large extent. Our motive behind creating this dataset was to use it to develop machine learning systems that can give their users insight into trending events and alert them about events that could lead to disruption in the nation. If any event starts going out of control, it can be handled and mitigated by monitoring it before the matter escalates. This dataset collects tweets of past or ongoing events known to have caused disruption and labels these tweets as 1. We also collect tweets that are considered non-eventful and label them as 0 so that they can also be used to train a classification system. The dataset contains 94855 records of unique events and 168706 records of unique non-events, giving the dataset a total of 263561 records. We extract multiple features from the tweets, such as the user's follower count and the user's location, to understand the impact and reach of the tweets. This dataset might be useful in various event-related machine learning problems such as event classification, event recognition, and so on.
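
The record counts above can be checked with a few lines of arithmetic, which also shows how unbalanced the two labels are. The inverse-frequency class weights are an illustrative assumption about how one might train on such data. They are not something the abstract describes, and the variable names are hypothetical.

```python
# Counts taken from the abstract: label 1 = disruptive event, label 0 = non-event.
EVENT_TWEETS = 94855
NON_EVENT_TWEETS = 168706

total = EVENT_TWEETS + NON_EVENT_TWEETS  # matches the stated 263561 records
event_share = EVENT_TWEETS / total       # fraction of tweets labeled 1

# Assumed inverse-frequency weights: n_samples / (n_classes * class_count).
class_weights = {
    1: total / (2 * EVENT_TWEETS),
    0: total / (2 * NON_EVENT_TWEETS),
}
```

Roughly a third of the records carry label 1, so a classifier trained without reweighting or resampling may lean toward predicting the non-event class.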

Replacements for Wed, 1 Dec 21

[4]  arXiv:2111.12929 (replaced) [pdf, other]
Title: Unbiased Pairwise Learning to Rank in Recommender Systems
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
[5]  arXiv:2111.14106 (replaced) [pdf]
Title: Enhancing Keyphrase Extraction from Academic Articles with their Reference Information
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Digital Libraries (cs.DL)
[6]  arXiv:2111.11249 (replaced) [pdf, ps, other]
Title: LeQua@CLEF2022: Learning to Quantify
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
