DocumentCode
2117843
Title
An MCL-Based Text Mining Approach for Namesake Disambiguation on the Web
Author
Anwar, Toni ; Abulaish, Muhammad
Author_Institution
Center of Excellence in Inf. Assurance, King Saud Univ., Riyadh, Saudi Arabia
Volume
1
fYear
2012
fDate
4-7 Dec. 2012
Firstpage
40
Lastpage
44
Abstract
In this paper, we propose a Markov Clustering (MCL) based text mining approach for namesake disambiguation on the Web. The novelty of the proposed technique lies in modeling the collection of web pages as a weighted graph and applying MCL to crystallize it into clusters, each containing the web pages related to a particular namesake individual. To cluster web pages retrieved through search engines, the proposed method focuses on three broad and realistic aspects: content overlapping, structure overlapping, and local context overlapping. The efficacy of the proposed method is demonstrated through experimental evaluations on standard datasets.
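To illustrate the clustering step the abstract describes, the following is a minimal, hedged sketch of generic Markov Clustering over a weighted page-similarity graph. The abstract does not say how content, structure, and local-context overlap are combined into edge weights, so the weight matrix is assumed to be given as input. The function names (`mcl`, `normalize_columns`, `matmul`) and the parameter defaults (expansion power 2, inflation 2.0, pruning threshold) are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (columns that are all zero stay zero)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                out[i][j] = m[i][j] / s
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product, used for the expansion step."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights: Matrix, inflation: float = 2.0, max_iter: int = 100,
        tol: float = 1e-6, prune: float = 1e-5) -> List[Set[int]]:
    """Generic MCL sketch: weights[i][j] is an assumed similarity between pages i and j."""
    n = len(weights)
    # Add self-loops so every page keeps some probability mass on itself.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)  # column-stochastic transition matrix
    for _ in range(max_iter):
        prev = m
        m = matmul(m, m)  # expansion: two-step random walk
        # Inflation: raise surviving entries to a power; drop tiny ones (pruning).
        m = [[v ** inflation if v > prune else 0.0 for v in row] for row in m]
        m = normalize_columns(m)
        delta = max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n))
        if delta < tol:
            break
    # Each non-empty (attractor) row lists the pages that converge to it.
    clusters: List[Set[int]] = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > prune}
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

For example, with a hypothetical graph in which pages 0, 1, 2 share edges with one another and pages 3, 4 share an edge only with each other, `mcl(weights)` separates them into `[{0, 1, 2}, {3, 4}]`, i.e. one cluster per presumed namesake individual. In the approach described above, the inflation parameter would control cluster granularity: larger values tend to yield more, smaller clusters.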
Keywords
Internet; Markov processes; data mining; graph theory; pattern clustering; text analysis; MCL-based text mining approach; Markov clustering-based text mining approach; Web namesake disambiguation; Web pages collection; cluster Web page retrieval; content overlapping; local context overlapping; standard datasets; structure overlapping; weighted graph structure; Markov clustering; Namesake disambiguation; Text mining; Web content mining; Web people search;
fLanguage
English
Publisher
IEEE
Conference_Title
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2012 IEEE/WIC/ACM International Conferences on
Conference_Location
Macau
Print_ISBN
978-1-4673-6057-9
Type
conf
DOI
10.1109/WI-IAT.2012.239
Filename
6511863