DocumentCode :
3154096
Title :
How do they compare? Automatic identification of comparable entities on the Web
Author :
Jain, Alpa ; Pantel, Patrick
Author_Institution :
Yahoo! Labs., Sunnyvale, CA, USA
fYear :
2011
fDate :
3-5 Aug. 2011
Firstpage :
228
Lastpage :
233
Abstract :
People love comparing things: from home mortgages and digital cameras to travel destinations and political philosophies. Today, we are mostly limited to browsing documents after issuing comparative queries to Web search engines, such as “15-year vs. 30-year mortgage”, “Nikon D90 / Canon 40D”, “Oahu or Maui”, and “communism vs. fascism”. There is an opportunity to improve the search experience by automatically offering comparisons to users. In this paper, we propose a first step towards this goal of comparative analysis by mining a broad class of comparable entities from search query logs and a large Web crawl. Example comparables that we extract include medicines, appliances, electronics, vacation destinations, and many more. We present an extensive empirical analysis showing that our methods generate comparables with high precision and recall, and that Web search query logs are a better source for mining such entities than Web pages, which are typically used for extraction tasks. We further compare the performance of our methods with “related entities” reported by Google Sets, and show a gain of 39% in average precision and a gain of 30% in NDCG.
Keywords :
data mining; online front-ends; query processing; search engines; Google set; NDCG; Web entities mining; Web page; Web search engine; Web search query logs; automatic identification; digital camera; document browsing; home mortgage; large Web crawl; political philosophy; travel destination; vacation destination; Calculators; Data mining; Learning systems; Loans and mortgages; Noise measurement; Semantics; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Reuse and Integration (IRI), 2011 IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4577-0964-7
Electronic_ISBN :
978-1-4577-0965-4
Type :
conf
DOI :
10.1109/IRI.2011.6009551
Filename :
6009551