EGGS: A Flexible Approach to Relational Modeling of Social Network Spam

Brophy, Jonathan; Lowd, Daniel

(Submitted on 14 Jan 2020 (v1), last revised 28 Jan 2020 (this version, v2))

Abstract: Social networking websites face a constant barrage of spam, unwanted messages that distract, annoy, and even defraud honest users. These messages tend to be very short, making them difficult to identify in isolation. Furthermore, spammers disguise their messages to look legitimate, tricking users into clicking on links and tricking spam filters into tolerating their malicious behavior. Thus, some spam filters examine relational structure in the domain, such as connections among users and messages, to better identify deceptive content. However, even when it is used, relational structure is often exploited in an incomplete or ad hoc manner. In this paper, we present Extended Group-based Graphical models for Spam (EGGS), a general-purpose method for classifying spam in online social networks. Rather than labeling each message independently, we group related messages together when they have the same author, the same content, or other domain-specific connections. To reason about related messages, we combine two popular methods: stacked graphical learning (SGL) and probabilistic graphical models (PGM). Both methods capture the idea that messages are more likely to be spammy when related messages are also spammy, but they do so in different ways; SGL uses sequential classifier predictions and PGMs use probabilistic inference. We apply our method to four different social network domains. EGGS is more accurate than an independent model in most experimental settings, especially when the correct label is uncertain. For the PGM implementation, we compare Markov logic networks to probabilistic soft logic and find that both work well with neither one dominating, and the combination of SGL and PGMs usually performs better than either on its own.
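The grouping and stacking ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes messages are dictionaries with hypothetical `author` and `text` fields, that an independent model has already produced a spam score for each message, and that a fixed weighted average (the hypothetical `weight` parameter) stands in for the classifier that stacked graphical learning would retrain at each level. Probabilistic inference with a PGM is not shown.

```python
from collections import defaultdict


def build_groups(messages, keys=("author", "text")):
    """Map (field, value) -> indices of all messages sharing that value."""
    groups = defaultdict(list)
    for i, msg in enumerate(messages):
        for key in keys:
            groups[(key, msg[key])].append(i)
    return groups


def relational_features(messages, scores, keys=("author", "text")):
    """One feature per relation: mean current score of the *other* group members.

    A message with no related messages falls back to its own score,
    so isolated messages are left unchanged by the stacking step.
    """
    groups = build_groups(messages, keys)
    features = []
    for i, msg in enumerate(messages):
        row = []
        for key in keys:
            others = [j for j in groups[(key, msg[key])] if j != i]
            if others:
                row.append(sum(scores[j] for j in others) / len(others))
            else:
                row.append(scores[i])
        features.append(row)
    return features


def stacked_scores(messages, base_scores, weight=0.5, levels=1):
    """Blend each independent score with its relational features.

    Assumption: a fixed blend replaces the learned classifier that a real
    stacked model would train on [original features + relational features].
    """
    scores = list(base_scores)
    for _ in range(levels):
        feats = relational_features(messages, scores)
        scores = [
            (1 - weight) * base + weight * sum(row) / len(row)
            for base, row in zip(base_scores, feats)
        ]
    return scores


if __name__ == "__main__":
    demo_messages = [
        {"author": "a", "text": "buy now"},
        {"author": "a", "text": "hello"},
        {"author": "b", "text": "buy now"},
        {"author": "c", "text": "hi there"},
    ]
    demo_base = [0.9, 0.5, 0.5, 0.1]  # hypothetical independent-model scores
    print(stacked_scores(demo_messages, demo_base))
```

In this toy setup, the uncertain messages that share an author or text with a high-scoring message are pulled toward spam, while the unrelated message keeps its original score. This mirrors the abstract's claim that relational modeling helps most when the correct label is uncertain.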

Comments:	10 pages, 6 figures, 5 tables. STARAI 2020
Subjects:	Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2001.04909 [cs.SI]
	(or arXiv:2001.04909v2 [cs.SI] for this version)

Submission history

From: Jonathan Brophy
[v1] Tue, 14 Jan 2020 17:06:13 GMT (307kb,D)
[v2] Tue, 28 Jan 2020 22:10:00 GMT (161kb,D)
