DocumentCode
2632707
Title
Mining the URLs: An Approach to Measure the Similarities between Named-Entities
Author
Liu, Hui ; Zhao, Jinglei ; Lu, Ruzhan
Author_Institution
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai
fYear
2008
fDate
18-20 June 2008
Firstpage
75
Lastpage
75
Abstract
Measuring the similarity between named-entities is foundational to a number of practical applications, such as information extraction and query expansion. In this paper the authors study similarity measures between two named-entities. In particular, the authors are interested in fine-grained similarity differences between named-entities within a single class, such as "novelist". Unlike previous work on named-entity associations, this paper proposes a novel Web mining method that depends solely on the URLs returned by a search engine when the named-entities are used as queries. The problem of measuring the similarity between two named-entities is thereby converted into that of measuring the similarity between two URL sets. Evaluations show that the method achieves good results in two experiments.
Keywords
Internet; data mining; query processing; search engines; URL; Web mining method; information extraction; named-entities; query expansion; search engine; similarity measure; Application software; Computer science; Data mining; Information analysis; Natural language processing; Pattern analysis; Search engines; Taxonomy; Uniform resource locators; Web mining
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
Conference_Location
Dalian, Liaoning
Print_ISBN
978-0-7695-3161-8
Electronic_ISBN
978-0-7695-3161-8
Type
conf
DOI
10.1109/ICICIC.2008.362
Filename
4603264