DocumentCode :
480741
Title :
Name Disambiguation Boosted by Latent Topics from Web Directories
Author :
Vu, Quang Minh ; Takasu, Atsuhiro ; Adachi, Jun
Author_Institution :
National Institute of Informatics, Tokyo
Volume :
1
fYear :
2008
fDate :
9-12 Dec. 2008
Firstpage :
697
Lastpage :
703
Abstract :
Search results for personal name queries often contain documents relevant to several different people, because a personal name is frequently shared by more than one person. To distinguish these people in the search results, the contexts relevant to each person must be extracted from the documents. However, because Web documents are noisy and the text related to a person may be short, extracting these contexts effectively is difficult. We propose a new method that uses Web directories as additional information, which makes topic terms in documents easier to recognize and the contexts of people easier to extract. First, we apply latent Dirichlet allocation (LDA) to extract latent topics from Web directories. Then, the extracted topics are used to recognize the topics contained in documents with ambiguous names, so that common-context measures between documents can be calculated more effectively. Our experiments, conducted with Web documents about real people and several well-known Web directories, show that our approach disambiguates personal names better than several conventional approaches, such as the vector space model approach and the named entity recognition approach.
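The two-step pipeline in the abstract (learn topics from Web directory pages, then use them to compare name-ambiguous documents) can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify their inference procedure or similarity measure. The assumptions are: LDA is trained by collapsed Gibbs sampling on tokenized directory pages; a new document's topic mixture is estimated by maximum-likelihood EM "folding-in" with the topic-word distributions held fixed; documents are compared by cosine similarity of their topic mixtures; and documents are grouped by single-link clustering above a threshold. The names `train_lda`, `infer_topics`, `cosine`, `cluster`, and the parameter values are hypothetical.

```python
import math
import random
from collections import defaultdict


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=300, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).
    Returns (index, phi): index maps word -> column, phi[k][v] = P(word v | topic k)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                       # total tokens per topic
    z = []                                       # current topic of every token
    for d, doc in enumerate(docs):               # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1; n_kw[k][index[w]] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                n_dk[d][k] -= 1; n_kw[k][wi] -= 1; n_k[k] -= 1  # remove token
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][wi] + beta)
                           / (n_k[t] + V * beta) for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1; n_kw[k][wi] += 1; n_k[k] += 1  # add it back
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(num_topics)]
    return index, phi


def infer_topics(doc, index, phi, iters=50):
    """EM folding-in: estimate a document's topic mixture with phi fixed.
    Words outside the directory vocabulary (e.g. the ambiguous name) are ignored."""
    K = len(phi)
    words = [index[w] for w in doc if w in index]
    theta = [1.0 / K] * K
    if not words:
        return theta
    for _ in range(iters):
        counts = [0.0] * K
        for wi in words:
            p = [theta[k] * phi[k][wi] for k in range(K)]  # E-step
            s = sum(p)
            for k in range(K):
                counts[k] += p[k] / s
        theta = [c / len(words) for c in counts]           # M-step
    return theta


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster(thetas, threshold=0.9):
    """Single-link grouping: documents whose topic mixtures have cosine
    similarity >= threshold are assumed to refer to the same person."""
    parent = list(range(len(thetas)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(thetas)):
        for j in range(i + 1, len(thetas)):
            if cosine(thetas[i], thetas[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(thetas)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Usage on toy data (hypothetical directory pages and search results).
directory = [
    "baseball team pitcher game season".split(),
    "team game season league pitcher".split(),
    "piano orchestra concert music composer".split(),
    "music concert composer orchestra violin".split(),
]
index, phi = train_lda(directory, num_topics=2)
results = [
    "jordan pitcher game team".split(),
    "jordan season league game".split(),
    "jordan piano concert music".split(),
]
print(cluster([infer_topics(d, index, phi) for d in results]))
```

Here the name "jordan" never appears in the directory vocabulary, so each document's topic mixture is determined only by the surrounding topic terms. This mirrors the abstract's point that directory-derived topics supply context that short, noisy pages lack. A real system would use far larger directories, more topics, and a tuned threshold.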
Keywords :
Internet; document handling; information retrieval; Web directory; Web documents; context measurements; latent Dirichlet allocation; latent topics; name ambiguity documents; name disambiguation; named entity recognition; personal name query; vector space model approach; Context modeling; Data mining; Feature extraction; Frequency; Informatics; Intelligent agent; Linear discriminant analysis; Search engines; Web sites; World Wide Web; Personal name disambiguation; document similarity; knowledge base; latent topic extraction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-0-7695-3496-1
Type :
conf
DOI :
10.1109/WIIAT.2008.171
Filename :
4740532