DocumentCode
1419027
Title
A Unified Probabilistic Framework for Name Disambiguation in Digital Library
Author
Tang, Jie ; Fong, A.C.M. ; Wang, Bo ; Zhang, Jing
Author_Institution
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Volume
24
Issue
6
fYear
2012
fDate
6/1/2012 12:00:00 AM
Firstpage
975
Lastpage
987
Abstract
Despite years of research, the name ambiguity problem remains largely unresolved. Outstanding issues include how to capture all information for name disambiguation in a unified approach, and how to determine the number of people K in the disambiguation process. In this paper, we formalize the problem in a unified probabilistic framework, which incorporates both attributes and relationships. Specifically, we define a disambiguation objective function for the problem and propose a two-step parameter estimation algorithm. We also investigate a dynamic approach for estimating the number of people K. Experiments show that our proposed framework significantly outperforms four baseline methods of using clustering algorithms and two other previous methods. Experiments also indicate that the number K automatically found by our method is close to the actual number.
Keywords
digital libraries; parameter estimation; pattern clustering; probability; clustering algorithms; digital library; name disambiguation; parameter estimation algorithm; unified probabilistic framework; Clustering algorithms; Databases; Heuristic algorithms; Hidden Markov models; Marine vehicles; Partitioning algorithms; Probabilistic logic; Digital libraries; database applications; heterogeneous databases; information search and retrieval
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2011.13
Filename
5680902