Title :
Disambiguation Algorithm for People Search on the Web
Author :
Kalashnikov, Dmitri V. ; Mehrotra, Sharad ; Chen, Zhaoqi ; Nuray-Turan, Rabia ; Ashish, Naveen
Author_Institution :
Dept. of Comput. Sci., California Univ., Irvine, CA
Abstract :
In this paper we develop a disambiguation algorithm and then study its impact on People Search. The proposed algorithm first uses extraction techniques to automatically extract 'significant' entities such as the names of other persons, organizations, and locations on each Web page. In addition, it extracts and parses HTML and Web-related data on each Web page, such as hyperlinks and email addresses. The algorithm then views all this information in a unified way: as an entity-relationship (ER) graph where entities (e.g., people, organizations, locations, Web pages) are interconnected via relationships (e.g., 'Web page-mentions-person', relationships derived from hyperlinks, etc.). The algorithm gains its power by being able to analyze several types of information: attributes associated with the entities (e.g., TF/IDF for Web pages) and, most importantly, direct and indirect interconnections that exist among entities in the ER graph. We next outline our approach in Section 2 and then compare it with state-of-the-art solutions in Section 3.
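The entity-relationship view described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes the entities have already been extracted, treats two Web pages as connected when they share at least one entity (an indirect interconnection of length two), and groups connected pages with union-find. The input data and the names `build_er_graph`, `shared_entities`, `cluster_pages`, and `min_shared` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: entities already extracted from each Web page.
pages = {
    "page1": {"person:Alice Smith", "org:UC Irvine", "loc:Irvine"},
    "page2": {"org:UC Irvine", "email:jdoe@example.edu"},
    "page3": {"person:Bob Jones", "loc:Boston"},
}


def build_er_graph(pages):
    """Return an undirected adjacency map linking each page to its entities."""
    graph = defaultdict(set)
    for page, entities in pages.items():
        for entity in entities:
            graph[page].add(entity)   # 'Web page-mentions-entity' edge
            graph[entity].add(page)   # same edge, reverse direction
    return graph


def shared_entities(graph, page_a, page_b):
    """Entities that connect two pages through a path of length two."""
    return graph[page_a] & graph[page_b]


def cluster_pages(pages, min_shared=1):
    """Group pages linked by at least min_shared common entities."""
    graph = build_er_graph(pages)
    parent = {p: p for p in pages}

    def find(p):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(pages, 2):
        if len(shared_entities(graph, a, b)) >= min_shared:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = defaultdict(set)
    for p in pages:
        clusters[find(p)].add(p)
    return sorted(sorted(c) for c in clusters.values())


print(cluster_pages(pages))  # [['page1', 'page2'], ['page3']]
```

Here `page1` and `page2` end up in one cluster because both mention the same organization, while `page3` stays alone. A fuller treatment would also weigh entity attributes (such as TF/IDF) and longer connection paths, which this sketch leaves out.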
Keywords :
Web sites; information retrieval; HTML; People Search; Web page; World Wide Web; disambiguation algorithm; entity-relationship graph; extraction techniques; Clustering algorithms; Computer science; Data mining; Information analysis; Internet; Machine learning; Middleware; Search engines; Web pages; Web search;
Conference_Title :
IEEE 23rd International Conference on Data Engineering (ICDE 2007)
Conference_Location :
Istanbul
Print_ISBN :
1-4244-0802-4
Electronic_ISBN :
1-4244-0803-2
DOI :
10.1109/ICDE.2007.368987