Title :
An MCL-Based Text Mining Approach for Namesake Disambiguation on the Web
Author :
Anwar, Toni ; Abulaish, Muhammad
Author_Institution :
Center of Excellence in Inf. Assurance, King Saud Univ., Riyadh, Saudi Arabia
Abstract :
In this paper, we propose a Markov Clustering (MCL) based text mining approach for namesake disambiguation on the Web. The novelty of the proposed technique lies in modeling the collection of web pages as a weighted graph and applying MCL to crystallize it into clusters, each containing the web pages related to one particular namesake individual. To cluster web pages retrieved through search engines, the proposed method focuses on three broad and realistic aspects: content overlapping, structure overlapping, and local context overlapping. The efficacy of the proposed method is demonstrated through experimental evaluations on standard datasets.
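The clustering step described in the abstract can be illustrated with a minimal sketch of generic Markov Clustering on a weighted page-similarity graph. This is not the authors' implementation: the abstract does not say how the three overlap measures are combined into edge weights, so the matrix `similarity` below is made-up illustrative data, and the function names (`mcl`, `normalize_columns`, `matmul`) and the parameter values (`inflation=2.0`, self-loop weight 1.0) are assumptions chosen for the example.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                out[i][j] = m[i][j] / s
    return out


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights, inflation=2.0, max_iter=100, tol=1e-9):
    """Generic MCL: alternate expansion and inflation until the matrix
    stops changing, then read clusters off the non-empty rows."""
    n = len(weights)
    m = [[float(v) for v in row] for row in weights]
    for i in range(n):
        if m[i][i] == 0.0:
            m[i][i] = 1.0  # assumed self-loop weight, a common MCL convention
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)                    # expansion: two-step random walk
        inflated = [[v ** inflation for v in row]  # inflation: favor strong flows
                    for row in expanded]
        nxt = normalize_columns(inflated)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    clusters = set()
    for i in range(n):
        members = tuple(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(clusters)


if __name__ == "__main__":
    # Hypothetical similarities between five retrieved pages (illustrative only).
    similarity = [
        [0.0, 0.9, 0.8, 0.0, 0.0],
        [0.9, 0.0, 0.7, 0.0, 0.0],
        [0.8, 0.7, 0.0, 0.1, 0.0],
        [0.0, 0.0, 0.1, 0.0, 0.9],
        [0.0, 0.0, 0.0, 0.9, 0.0],
    ]
    print(mcl(similarity))
```

Each returned tuple lists the page indices assigned to one cluster, which in the namesake setting would correspond to one individual. A higher `inflation` value generally yields finer-grained clusters; the value used here is only a common default, not one reported in the paper.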
Keywords :
Internet; Markov processes; data mining; graph theory; pattern clustering; text analysis; MCL-based text mining approach; Markov clustering-based text mining approach; Web namesake disambiguation; Web pages collection; cluster Web page retrieval; content overlapping; local context overlapping; standard datasets; structure overlapping; weighted graph structure; Markov clustering; Namesake disambiguation; Text mining; Web content mining; Web people search
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2012 IEEE/WIC/ACM International Conferences on
Conference_Location :
Macau
Print_ISBN :
978-1-4673-6057-9
DOI :
10.1109/WI-IAT.2012.239