Title :
An improvement of weighted PageRank to handle the zero link similarity
Author :
Sang-yeon Lee ; Young-gi Kim ; Seok-Jong Lee ; Keon Myung Lee
Author_Institution :
Dept. of Comput. Sci., Chungbuk Nat. Univ., Cheongju, South Korea
Abstract :
The well-known PageRank algorithm uses the link structure to compute a quality rank for each page. In its basic form, a page distributes its probability equally among the pages it links to. Weighted PageRank algorithms extend this by assigning different weights to the outgoing links of a page, and some of them use inter-page similarity as the weight. In Korean web pages, we have found that the similarity between linked pages is sometimes zero because of characteristics of the language. This paper proposes an improved weighted PageRank algorithm that can handle such zero inter-page similarities. The proposed method was implemented with the MapReduce paradigm for big data handling, evaluated on Korean Wikipedia pages, and compared with two other methods.
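To make the zero-similarity problem concrete, here is a minimal single-machine sketch of a similarity-weighted PageRank. The abstract does not state how the proposed method treats zero similarities, so the additive smoothing term `eps` used here is an assumption chosen for illustration, not the authors' method; the function name `weighted_pagerank`, its parameters, and the toy graph are likewise hypothetical, and the MapReduce implementation is not reproduced.

```python
def weighted_pagerank(links, sim, damping=0.85, eps=0.01, iters=50):
    """Similarity-weighted PageRank with additive smoothing (illustrative only).

    links: dict mapping a page to the list of pages it links to.
    sim:   dict mapping (source, target) to a similarity score >= 0.
    eps:   assumed smoothing constant added to every link weight so that
           a zero-similarity link still carries a small share of rank.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)

    # Normalise the smoothed similarities of each page's out-links so they
    # form a probability distribution over its neighbours.
    trans = {}
    for p in pages:
        raw = {q: sim.get((p, q), 0.0) + eps for q in links.get(p, [])}
        total = sum(raw.values())
        trans[p] = {q: w / total for q, w in raw.items()}

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not trans[p])
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for q, w in trans[p].items():
                new[q] += damping * rank[p] * w
        rank = new
    return rank


# Toy graph: the links A -> B and B -> C have zero similarity.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
sim = {("A", "B"): 0.0, ("A", "C"): 0.6, ("B", "C"): 0.0, ("C", "A"): 0.3}
ranks = weighted_pagerank(links, sim)
```

Without smoothing (`eps=0`), page B's only out-link has weight zero, so its weights cannot be normalised, and page A would pass nothing to B. With a small positive `eps`, every link keeps a nonzero transition probability while higher-similarity links still receive most of the rank.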
Keywords :
Big Data; Web sites; parallel processing; probability; Big Data handling; Korean Web pages; Korean Wikipedia; MapReduce paradigm; quality rank; weighted PageRank algorithms; zero inter-page similarities; zero link similarity; Clustering algorithms; Electronic publishing; Encyclopedias; Internet; Vectors; Web pages; MapReduce; PageRank; Similarity; TFIDF; Weighted PageRank
Conference_Title :
Soft Computing and Intelligent Systems (SCIS), 2014 Joint 7th International Conference on and Advanced Intelligent Systems (ISIS), 15th International Symposium on
DOI :
10.1109/SCIS-ISIS.2014.7044873