DocumentCode
1230751
Title
Searching with numbers
Author
Agrawal, Rakesh ; Srikant, Ramakrishnan
Author_Institution
IBM Almaden Res. Center, San Jose, CA, USA
Volume
15
Issue
4
fYear
2003
Firstpage
855
Lastpage
870
Abstract
A large fraction of the useful Web is comprised of specification documents that largely consist of (attribute name, numeric value) pairs embedded in text. Examples include product information, classified advertisements, resumes, etc. The approach taken in the past to search these documents by first establishing correspondences between values and their names has achieved limited success because of the difficulty of extracting this information from free text. We propose a new approach that does not require this correspondence to be accurately established. Provided the data has "low reflectivity", we can do effective search even if the values in the data have not been assigned attribute names and the user has omitted attribute names in the query. We give algorithms and indexing structures for implementing the search. We also show how hints (i.e., imprecise, partial correspondences) from automatic data extraction techniques can be incorporated into our approach for better accuracy on high reflectivity data sets. Finally, we validate our approach by showing that we get high precision in our answers on real data sets from a variety of domains.
Keywords
Internet; Web sites; indexing; information retrieval; search engines; World Wide Web; attribute name; automatic data extraction techniques; data extraction; document searching; heterogeneous databases; high reflectivity data sets; hints; information extraction; searching with numbers; specification documents; text; Data engineering; Data mining; Databases; Design engineering; Moon; PROM; Reflectivity; Resumes
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2003.1209004
Filename
1209004