DocumentCode :
2711239
Title :
Iterative Set Expansion of Named Entities Using the Web
Author :
Wang, Richard C. ; Cohen, William W.
Author_Institution :
Language Technol. Inst., Carnegie Mellon Univ. Pittsburgh, Pittsburgh, PA
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
1091
Lastpage :
1096
Abstract :
Set expansion refers to expanding a partial set of "seed" objects into a more complete set. One system that does set expansion is SEAL (set expander for any language), which expands entities automatically by utilizing resources from the Web in a language-independent fashion. In a previous study, SEAL showed good set expansion performance using three seed entities; however, when given a larger set of seeds (e.g., ten), SEAL's expansion method performs poorly. In this paper, we present iterative SEAL (iSEAL), which allows a user to provide many seeds. Briefly, iSEAL makes several calls to SEAL, each call using a small number of seeds. We also show that iSEAL can be used in a "bootstrapping" manner, where each call to SEAL uses a mixture of user-provided and self-generated seeds. We show that the bootstrapping version of iSEAL obtains better results than SEAL even when using fewer user-provided seeds. In addition, we compare the performance of various ranking algorithms used in iSEAL, and show that the choice of ranking method has a small effect on performance when all seeds are user-provided, but a large effect when iSEAL is bootstrapped. In particular, we show that random walk with restart is nearly as good as Bayesian sets with user-provided seeds, and performs best with bootstrapped seeds.
Keywords :
Bayes methods; Internet; iterative methods; Bayesian sets; Web; bootstrapped seeds; bootstrapping version; iSEAL; iterative SEAL; iterative set expansion; named entities; random walk; seed entities; self-generated seeds; set expander; Bayesian methods; Data mining; HTML; Markup languages; Motion pictures; Natural languages; Seals; TV; USA Councils; Watches; bootstrapping; named entities; seal; set expansion;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location :
Pisa
ISSN :
1550-4786
Print_ISBN :
978-0-7695-3502-9
Type :
conf
DOI :
10.1109/ICDM.2008.145
Filename :
4781230