DocumentCode :
3673685
Title :
Link Analysis of Wikipedia Documents Using MapReduce
Author :
Vasa Hardik;Vasudevan Anirudh;Palanisamy Balaji
Author_Institution :
Sch. of Inf. Sci., Univ. of Pittsburgh, Pittsburgh, PA, USA
fYear :
2015
Firstpage :
582
Lastpage :
588
Abstract :
Wikipedia, a collaborative and user-driven encyclopedia, is considered to be the largest content thesaurus on the web, expanding into a massive database housing a huge amount of information. In this paper, we present the design and implementation of a MapReduce-based Wikipedia link analysis system that provides a hierarchical examination of document connectivity in Wikipedia and captures the semantic relationships between the articles. Our system consists of a Wikipedia crawler, a MapReduce-based distributed parser, and the link analysis techniques. The results produced by this study are then modelled against web Key Performance Indicators (KPIs) for link-structure interpretation. We find that Wikipedia has a remarkable capability as a corpus for content correlation with respect to connectivity among articles. Link analysis and semantic structuration of Wikipedia not only provide an ergonomic report of the tier-based link hierarchy of Wikipedia articles but also reflect the general cognition of the semantic relationships between them. The results of our analysis are aimed at providing valuable insights for evaluating the accuracy and the content scalability of Wikipedia through its link schematics.
Keywords :
"Encyclopedias","Electronic publishing","Internet","Crawlers","Web pages","Accuracy"
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration (IRI), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/IRI.2015.92
Filename :
7301030