DocumentCode
694712
Title
A Three-Stage Clustering Framework Based on Multiple Feature Combination for Chinese Person Name Disambiguation
Author
Fei Wang ; Yi Yang ; Zhaocai Ma ; Lian Li
Author_Institution
Sch. of Inf. Sci. & Eng., Lanzhou Univ., Lanzhou, China
fYear
2013
fDate
7-8 Dec. 2013
Firstpage
103
Lastpage
109
Abstract
To resolve name ambiguity and improve the performance of person name disambiguation, we propose a three-stage clustering algorithm in this paper. In the first stage, organizations and locations (OLs) are used to cluster documents about the same person, so that texts with greater resemblance are assigned to one category; this stage is simply document clustering based on the similarity of OLs. In the second stage, the clustered documents are used as a new data source from which novel features (such as co-author names) are extracted. These newly extracted features are used to perform additional clustering between documents. In addition, a method is proposed to resolve name ambiguity by constructing social networks based on the relationships among co-authors. In the third stage, Web pages are further clustered using a content-based hierarchical agglomerative clustering (HAC) algorithm, and the useful content, including titles, abstracts, and keywords (TAKs), is analyzed to disambiguate the ambiguous names. Experimental results show that our three-stage clustering algorithm effectively enhances the performance of person name disambiguation.
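The staged idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each document is reduced to three feature sets (OLs, co-author names, TAK terms), that similarity is Jaccard overlap, and that every stage applies the same average-link agglomerative merge with a fixed threshold. The names `jaccard`, `agglomerate`, `three_stage`, the thresholds, and the example documents are all hypothetical, and the social-network construction of the second stage is simplified to plain co-author overlap.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(clusters, features, threshold):
    """Average-link agglomerative merge (assumed linkage, not from the paper).

    Repeatedly merges the most similar pair of clusters and stops when the
    best pair falls below the threshold.
    """
    clusters = [set(c) for c in clusters]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [jaccard(features[x], features[y])
                    for x in clusters[i] for y in clusters[j]]
            sim = sum(sims) / len(sims)
            if sim > best_sim:
                best_sim, best_pair = sim, (i, j)
        if best_sim < threshold:
            break
        i, j = best_pair          # i < j, so deleting j keeps i valid
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


def three_stage(docs, thresholds=(0.5, 0.3, 0.3)):
    """Run the three stages in order: OLs, then co-authors, then TAKs.

    Each stage starts from the clusters produced by the previous one.
    The threshold values are illustrative assumptions.
    """
    clusters = [{doc_id} for doc_id in docs]
    for key, threshold in zip(("ols", "coauthors", "taks"), thresholds):
        features = {doc_id: set(docs[doc_id][key]) for doc_id in docs}
        clusters = agglomerate(clusters, features, threshold)
    return clusters


# Hypothetical documents that all mention the same ambiguous name.
example_docs = {
    "d1": {"ols": {"Univ A", "City A"}, "coauthors": {"P"},
           "taks": {"clustering", "names"}},
    "d2": {"ols": {"Univ A", "City A"}, "coauthors": {"P", "Q"},
           "taks": {"clustering", "names", "web"}},
    "d3": {"ols": {"City B"}, "coauthors": {"P", "R"},
           "taks": {"clustering", "names"}},
    "d4": {"ols": {"City D"}, "coauthors": {"S"},
           "taks": {"biology"}},
    "d5": {"ols": {"City C"}, "coauthors": {"T"},
           "taks": {"clustering", "names"}},
}

result = three_stage(example_docs)
print(sorted(sorted(cluster) for cluster in result))
```

In this toy input, d1 and d2 share OLs, d3 is linked to them only through co-authors, and d5 only through TAK terms, so each stage contributes a merge that the earlier stages could not make.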
Keywords
natural language processing; pattern clustering; social networking (online); text analysis; Chinese person name disambiguation performance enhancement; OL similarity; TAK; Web page clustering; ambiguous name disambiguation; co-author names; co-author relationships; content-based HAC algorithm; content-based hierarchical agglomerative clustering algorithm; data source; document clustering; feature extraction; multiple feature combination; name ambiguity problems; organization-and-location; social network construction; three-stage clustering framework; title-and-abstract-and-keywords; useful content analyzing; Abstracts; Clustering algorithms; Educational institutions; Feature extraction; Organizations; Social network services; Vectors; hierarchical agglomerative clustering; person name disambiguation; social networks;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Science and Cloud Computing Companion (ISCC-C), 2013 International Conference on
Conference_Location
Guangzhou
Type
conf
DOI
10.1109/ISCC-C.2013.33
Filename
6973577