Title: Are We All in a Truman Show? Spotting Instagram Crowdturfing through Self-Training

Abstract: In 2021, Influencer Marketing generated more than $13 billion. Companies and major brands advertise their products on Social Media, especially Instagram, through Influencers, i.e., people with high popularity and the ability to influence the masses. Usually, more popular and visible influencers are paid more for their collaborations. As a result, many services have emerged to boost profiles' popularity, engagement, or visibility, mainly through bots or fake accounts. Researchers have focused on recognizing such unnatural activities in different social networks with high success. However, real people have recently started participating in such boosting activities using their real accounts for monetary rewards, generating inauthentic content that is very difficult to detect. To date, no work has attempted to detect this new phenomenon, known as crowdturfing (CT), on Instagram.
In this work, we are the first to propose a CT engagement detector on Instagram. Our algorithm leverages profiles' characteristics through semi-supervised learning to spot accounts involved in CT activities. In contrast to the supervised methods employed so far to detect fake accounts, a semi-supervised approach takes advantage of the vast quantities of unlabeled data on social media to yield better results. We purchased and studied 1293 CT profiles from 11 providers to build our self-training classifier, which reached 95% accuracy. Finally, we ran our model in the wild to detect and analyze the CT engagement of 20 mega-influencers (i.e., with more than one million followers), discovering that more than 20% of their engagement was artificial. We analyzed the profiles and comments of people involved in CT engagement, showing how difficult it is to spot these activities using only the generated content.
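The self-training idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the base classifier, the profile features, or the confidence rule, so this example assumes a nearest-centroid classifier, two hypothetical numeric features per profile, and a hypothetical confidence threshold. All function names are illustrative.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroids(X, y):
    """Return the mean feature vector for each class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_with_confidence(cents, x):
    """Return (label, confidence) using the two nearest centroids.

    Confidence is 1 - d_nearest / (d_nearest + d_second): about 0.5 when
    the point is equidistant, close to 1 when it sits on a centroid.
    Assumes at least two classes.
    """
    ranked = sorted((dist(x, c), label) for label, c in cents.items())
    (d1, label), (d2, _) = ranked[0], ranked[1]
    total = d1 + d2
    conf = 1.0 if total == 0 else 1.0 - d1 / total
    return label, conf


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Grow the labeled set with confident pseudo-labels, then refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(cents, x)
            if conf >= threshold:
                # Confident prediction: treat it as a label from now on.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:
            break  # nothing new was learned, or no unlabeled data left
    return centroids(X_lab, y_lab)


# Toy data with two hypothetical features per profile.
# Label 0 = genuine profile, label 1 = CT profile (illustrative only).
X_labeled = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
y_labeled = [0, 0, 1, 1]
X_unlabeled = [[0.9, 1.1], [5.1, 4.9], [3.0, 3.0]]

model = self_train(X_labeled, y_labeled, X_unlabeled)
label, confidence = predict_with_confidence(model, [4.5, 4.7])
print(label, round(confidence, 3))
```

In this toy run the two unlabeled points near the class clusters receive pseudo-labels, while the ambiguous point in the middle stays unlabeled because its confidence remains below the threshold. A real pipeline would swap in a stronger base classifier and features derived from actual profile characteristics.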
Subjects: Social and Information Networks (cs.SI)
Cite as: arXiv:2206.12904 [cs.SI]
  (or arXiv:2206.12904v1 [cs.SI] for this version)

Submission history

From: Pier Paolo Tricomi
[v1] Sun, 26 Jun 2022 15:32:31 GMT (1056kb,D)