• DocumentCode
    694712
  • Title
    A Three-Stage Clustering Framework Based on Multiple Feature Combination for Chinese Person Name Disambiguation
  • Author
    Fei Wang ; Yi Yang ; Zhaocai Ma ; Lian Li
  • Author_Institution
    Sch. of Inf. Sci. & Eng., Lanzhou Univ., Lanzhou, China
  • fYear
    2013
  • fDate
    7-8 Dec. 2013
  • Firstpage
    103
  • Lastpage
    109
  • Abstract
    To resolve name ambiguity and improve the performance of person name disambiguation, we propose a three-stage clustering algorithm. In the first stage, organizations and locations (OLs) are used to cluster documents about the same person, so that texts with greater resemblance are assigned to the same category; this stage is simply document clustering based on OL similarity. In the second stage, the clustered documents serve as a new data source from which novel features (such as co-author names) are extracted, and these newly extracted features are used to further cluster the documents. In addition, we propose a method that addresses name ambiguity by constructing social networks from the relationships among co-authors. In the third stage, Web pages are further clustered using a content-based hierarchical agglomerative clustering (HAC) algorithm, and the useful content, including titles, abstracts, and keywords (TAKs), is then analyzed to disambiguate the ambiguous names. Experimental results show that the three-stage clustering algorithm effectively improves the performance of person name disambiguation.
  • Keywords
    natural language processing; pattern clustering; social networking (online); text analysis; Chinese person name disambiguation performance enhancement; OL similarity; TAK; Web page clustering; ambiguous name disambiguation; co-author names; co-author relationships; content-based HAC algorithm; content-based hierarchical agglomerative clustering algorithm; data source; document clustering; feature extraction; multiple feature combination; name ambiguity problems; organization-and-location; social network construction; three-stage clustering framework; title-and-abstract-and-keywords; useful content analyzing; Abstracts; Clustering algorithms; Educational institutions; Feature extraction; Organizations; Social network services; Vectors; hierarchical agglomerative clustering; person name disambiguation; social networks;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Cloud Computing Companion (ISCC-C), 2013 International Conference on
  • Conference_Location
    Guangzhou
  • Type
    conf
  • DOI
    10.1109/ISCC-C.2013.33
  • Filename
    6973577