DocumentCode :
2135218
Title :
Snooping Wikipedia vandals with MapReduce
Author :
Spina, Michele ; Rossi, Dario ; Sozio, Mauro ; Maniu, Silviu ; Cautis, Bogdan
Author_Institution :
LINCS, Paris, France
fYear :
2015
fDate :
8-12 June 2015
Firstpage :
1146
Lastpage :
1151
Abstract :
In this paper, we present and validate an algorithm that accurately identifies anomalous behavior in online collaborative social networks, based on users' interactions with their peers. We focus on Wikipedia, where accurate ground truth for the classification of vandals can be reliably gathered by manual inspection of the page edit history. We develop distributed crawler and classifier tasks, both implemented in MapReduce, with which we are able to explore a very large dataset consisting of over 5 million articles collaboratively edited by 14 million authors, resulting in over 8 billion pairwise interactions. We represent Wikipedia as a signed network, where positive arcs imply constructive interaction between editors. We then isolate a set of high-reputation editors (i.e., nodes having many positive incoming links) and classify the remaining ones based on their interactions with high-reputation editors. We demonstrate that our approach is not only practically relevant (due to the size of our dataset), but also feasible (as it requires few MapReduce iterations) and accurate (over 95% true positive rate). At the same time, we are able to classify only about half of the dataset's editors (recall of 50%), a limitation for which we outline some solutions under study.
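The two-step idea in the abstract (isolate high-reputation editors, then label the rest by their interactions with them) can be illustrated with a small single-machine sketch. This is not the authors' MapReduce implementation: the function name `classify_editors`, the threshold `min_positive_in`, the arc direction (source judges target), and the sign-sum decision rule are all assumptions made for illustration.

```python
from collections import defaultdict


def classify_editors(arcs, min_positive_in=2):
    """Illustrative sketch only; every parameter here is hypothetical.

    arcs: iterable of (source, target, sign), sign being +1 or -1.
    Returns (trusted, labels), where labels maps each non-trusted editor
    to "good", "vandal", or "unclassified".
    """
    arcs = list(arcs)  # the arcs are scanned twice

    # Pass 1: count positive incoming arcs per editor. In a MapReduce
    # setting this resembles mapping each arc to (target, sign) and
    # reducing by target.
    positive_in = defaultdict(int)
    editors = set()
    for src, dst, sign in arcs:
        editors.update((src, dst))
        if sign > 0:
            positive_in[dst] += 1

    # Assumed rule: "high reputation" means at least min_positive_in
    # positive incoming arcs.
    trusted = {e for e in editors if positive_in[e] >= min_positive_in}

    # Pass 2: for every remaining editor, sum the signs of the arcs
    # coming from trusted editors.
    score = defaultdict(int)
    for src, dst, sign in arcs:
        if src in trusted and dst not in trusted:
            score[dst] += sign

    labels = {}
    for e in editors - trusted:
        s = score.get(e, 0)
        if s > 0:
            labels[e] = "good"
        elif s < 0:
            labels[e] = "vandal"
        else:
            # No (or balanced) interaction with trusted editors.
            labels[e] = "unclassified"
    return trusted, labels


arcs = [("a", "h", 1), ("b", "h", 1), ("h", "v", -1), ("h", "g", 1), ("a", "x", 1)]
trusted, labels = classify_editors(arcs)
```

In this toy input only `h` reaches the assumed reputation threshold, so `v` and `g` receive a label while `a`, `b`, and `x` stay unclassified. That mirrors, in miniature, why an approach of this kind can be accurate on the editors it labels yet leave many editors without a label.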
Keywords :
Crawlers; Electronic publishing; Encyclopedias; Internet; Reliability; Social network services;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communications (ICC), 2015 IEEE International Conference on
Conference_Location :
London, United Kingdom
Type :
conf
DOI :
10.1109/ICC.2015.7248477
Filename :
7248477