An MCL-Based Text Mining Approach for Namesake Disambiguation on the Web

Author

Anwar, Toni; Abulaish, Muhammad

Author_Institution

Center of Excellence in Information Assurance, King Saud Univ., Riyadh, Saudi Arabia

Volume

1

fYear

2012

fDate

4-7 Dec. 2012

Firstpage

40

Lastpage

44

Abstract

In this paper, we propose a Markov Clustering (MCL) based text mining approach for namesake disambiguation on the Web. The novelty of the proposed technique lies in modeling the collection of web pages as a weighted graph and applying MCL to crystallize it into clusters, each containing the web pages related to one particular namesake individual. To cluster web pages retrieved through search engines, the proposed method focuses on three broad and realistic aspects: content overlapping, structure overlapping, and local context overlapping. The efficacy of the proposed method is demonstrated through experimental evaluations on standard datasets.
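
The pipeline summarized in the abstract (a weighted graph over web pages, then MCL) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the three overlap scores are combined, so the equal-weight sum in `combine_overlaps`, the inflation value of 2.0, the self-loop weight of 1.0, and the toy similarity values are all assumptions made for illustration. The MCL loop itself is the standard alternation of expansion (matrix squaring) and inflation (element-wise power followed by column normalization).

```python
def combine_overlaps(content, structure, context, coeffs=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical edge weight: a weighted sum of the three overlap scores.
    The equal coefficients are an assumption, not taken from the paper."""
    a, b, c = coeffs
    return a * content + b * structure + c * context


def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def matmul(a, b):
    """Plain square-matrix product, used for the expansion step."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster the nodes of a weighted graph given as a symmetric matrix."""
    n = len(weights)
    m = [row[:] for row in weights]
    for i in range(n):
        m[i][i] = max(m[i][i], 1.0)  # self-loops (assumed weight 1.0)
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = matmul(m, m)                                # expansion
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row with non-negligible entries is an attractor; its support
    # (the columns it attracts) forms one cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy data (invented): pages 0-2 and pages 3-5 are mutually similar,
    # with one weak link between page 2 and page 3.
    strong = combine_overlaps(0.9, 0.9, 0.9)
    weak = combine_overlaps(0.1, 0.1, 0.1)
    w = [[0.0] * 6 for _ in range(6)]
    for group in ([0, 1, 2], [3, 4, 5]):
        for i in group:
            for j in group:
                if i != j:
                    w[i][j] = strong
    w[2][3] = w[3][2] = weak
    print(mcl(w))
```

Each returned cluster would correspond to the pages attributed to one namesake. A higher `inflation` value tends to yield more, smaller clusters, so in practice it would need tuning against labeled data.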

Keywords

Internet; Markov processes; data mining; graph theory; pattern clustering; text analysis; MCL-based text mining approach; Markov clustering-based text mining approach; Web namesake disambiguation; Web pages collection; cluster Web page retrieval; content overlapping; local context overlapping; standard datasets; structure overlapping; weighted graph structure; Markov clustering; Namesake disambiguation; Text mining; Web content mining; Web people search

fLanguage

English

Publisher

IEEE

Conference_Titel

Web Intelligence and Intelligent Agent Technology (WI-IAT), 2012 IEEE/WIC/ACM International Conferences on

Conference_Location

Macau

Print_ISBN

978-1-4673-6057-9

Type

conf

DOI

10.1109/WI-IAT.2012.239

Filename

6511863