• DocumentCode
    2117843
  • Title
    An MCL-Based Text Mining Approach for Namesake Disambiguation on the Web
  • Author
    Anwar, Toni ; Abulaish, Muhammad
  • Author_Institution
    Center of Excellence in Inf. Assurance, King Saud Univ., Riyadh, Saudi Arabia
  • Volume
    1
  • fYear
    2012
  • fDate
    4-7 Dec. 2012
  • Firstpage
    40
  • Lastpage
    44
  • Abstract
    In this paper, we propose a Markov Clustering (MCL) based text mining approach for namesake disambiguation on the Web. The novelty of the proposed technique lies in modeling the collection of web pages as a weighted graph and applying MCL to crystallize it into clusters, each containing the web pages related to one particular namesake individual. To cluster the web pages retrieved through search engines, the proposed method focuses on three broad and realistic aspects: content overlap, structure overlap, and local context overlap. The efficacy of the proposed method is demonstrated through experimental evaluations on standard datasets.
  • Keywords
    Internet; Markov processes; data mining; graph theory; pattern clustering; text analysis; MCL-based text mining approach; Markov clustering-based text mining approach; Web namesake disambiguation; Web pages collection; cluster Web page retrieval; content overlapping; local context overlapping; standard datasets; structure overlapping; weighted graph structure; Markov clustering; Namesake disambiguation; Text mining; Web content mining; Web people search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology (WI-IAT), 2012 IEEE/WIC/ACM International Conferences on
  • Conference_Location
    Macau
  • Print_ISBN
    978-1-4673-6057-9
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2012.239
  • Filename
    6511863
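
The abstract describes modeling retrieved web pages as a weighted graph and applying MCL to split it into per-person clusters. The sketch below illustrates generic Markov Clustering on such a graph; it is not the authors' implementation. The function names, the inflation value, the thresholds, and the edge weights (which stand in for some combined content, structure, and local context overlap score) are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product, adequate for a small example."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(weights, inflation=2.0, max_iter=100, tol=1e-9):
    """Generic MCL: alternate expansion and inflation until the matrix settles.

    `weights` is a symmetric matrix of pairwise page-similarity scores
    (hypothetical values here; the paper's weighting scheme is not reproduced).
    """
    n = len(weights)
    # Add self-loops, a common MCL convention, then make columns stochastic.
    m = normalize_columns(
        [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)                      # expansion: random-walk step
        inflated = normalize_columns(                # inflation: boost strong flows
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass act as attractors; their non-zero columns form a cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarity scores for five pages returned for one person name.
# Pages 0-2 overlap with each other, pages 3-4 overlap with each other,
# and there are no edges between the two groups.
weights = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
]
clusters = markov_cluster(weights)
print(clusters)
```

Each returned list holds the indices of pages grouped together, which in the setting of the abstract would correspond to one namesake individual. A higher `inflation` value generally yields finer-grained clusters, so it is the main parameter to tune in a sketch like this.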