DocumentCode :
2717906
Title :
Google based name search: Resolving mixed entities on the web
Author :
On, Byung-Won ; Lee, Ingyu
Author_Institution :
Sch. of Inf. Syst., Singapore Manage. Univ., Singapore, Singapore
fYear :
2009
fDate :
1-4 Nov. 2009
Firstpage :
1
Lastpage :
6
Abstract :
When non-unique values are used as identifiers of entities, homonyms can cause confusion. In particular, when parts of the "names" of entities are used as their identifiers, the problem is often referred to as the mixed entity resolution problem, where the goal is to sort out entities that are erroneously merged due to name homonyms (e.g., if only the last name is used as an identifier, one cannot distinguish "Vannevar Bush" from "George Bush"). The mixed entity resolution problem is especially common in Web data. For instance, a Google search for a product name (e.g., Oracle) returns a mixture of web pages due to name homonyms (e.g., Oracle Database, Oracle Audio, Oracle Academy, etc.). In this paper, we present a practical system for resolving such mixed entities on the Web. To develop such a system, we propose a Web service based interface, an unsupervised clustering scheme, and cluster ranking algorithms. In particular, since the correct number of clusters is often unknown, we study a state-of-the-art unsupervised clustering solution based on propagation of pairwise similarities of entities. Our claim is empirically validated via experimentation, showing that our approach outperforms the main competing solution.
Keywords :
Web services; Web sites; data mining; information retrieval; pattern clustering; search engines; user interfaces; Google based name search; Web data; Web service based interface; cluster ranking algorithm; entities identifier; mixed entity resolution problem; name homonym; nonunique values; pairwise similarities propagation; unsupervised clustering scheme; Audio databases; Auditory displays; Clustering algorithms; Educational institutions; Information management; Management information systems; Search engines; Uniform resource locators; Web pages; Web services;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Information Management, 2009. ICDIM 2009. Fourth International Conference on
Conference_Location :
Ann Arbor, MI
Print_ISBN :
978-1-4244-4253-9
Electronic_ISBN :
978-1-4244-4254-6
Type :
conf
DOI :
10.1109/ICDIM.2009.5356763
Filename :
5356763