DocumentCode :
3269289
Title :
EShopMonitor: a Web content monitoring tool
Author :
Agrawal, Neeraj ; Ananthanarayanan, Rema ; Gupta, Rahul ; Joshi, Sachindra ; Krishnapuram, Raghu ; Negi, Sumit
Author_Institution :
IBM India Res. Lab., IIT, New Delhi, India
fYear :
2004
fDate :
30 March-2 April 2004
Firstpage :
817
Lastpage :
820
Abstract :
Data presented on commerce sites runs into thousands of pages and is typically delivered from multiple back-end sources. This makes it difficult to identify incorrect, anomalous, or interesting data, such as $9.99 airfares, missing links, drastic price changes, and the addition of new products or promotions. We describe a system that monitors Web sites automatically and generates various types of reports so that site content can be monitored and its quality maintained. Our solution consists of a site crawler that crawls dynamic pages, an information miner that learns to extract useful information from the pages based on examples provided by the user, and a reporter that the user can configure to answer specific queries. The tool can also be used to identify price trends and new products or promotions at competitor sites. A pilot run of the tool has been successfully completed at the ibm.com site.
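The abstract does not say how the reporter computes its reports. The following is a minimal sketch of one kind of check it describes (new products, drastic price changes, suspiciously low prices such as a $9.99 airfare). It assumes that the information miner has already reduced each crawl to a mapping from product ID to price. The function `compare_snapshots`, its `change_threshold` and `price_floor` parameters, and the sample product IDs are hypothetical and are not taken from EShopMonitor.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical report structure; not EShopMonitor's actual format."""
    new_products: list[str] = field(default_factory=list)
    removed_products: list[str] = field(default_factory=list)
    drastic_changes: list[tuple[str, float, float]] = field(default_factory=list)
    suspicious_prices: list[tuple[str, float]] = field(default_factory=list)


def compare_snapshots(old: dict[str, float], new: dict[str, float],
                      change_threshold: float = 0.5,
                      price_floor: float = 10.0) -> Report:
    """Compare two crawl snapshots (product ID -> price).

    change_threshold: relative change that counts as "drastic" (assumed value).
    price_floor: prices below this are flagged as suspicious (assumed value).
    """
    report = Report()
    # Products that appear only in the newer crawl.
    report.new_products = sorted(new.keys() - old.keys())
    # Products that disappeared, e.g. because of a missing link.
    report.removed_products = sorted(old.keys() - new.keys())
    # Relative price change for products present in both crawls.
    for pid in sorted(old.keys() & new.keys()):
        before, after = old[pid], new[pid]
        if before > 0 and abs(after - before) / before >= change_threshold:
            report.drastic_changes.append((pid, before, after))
    # Absolute sanity check on current prices.
    for pid, price in sorted(new.items()):
        if price < price_floor:
            report.suspicious_prices.append((pid, price))
    return report


if __name__ == "__main__":
    old = {"fare-DEL-BOM": 120.0, "tp-x40": 1500.0, "promo-a": 50.0}
    new = {"fare-DEL-BOM": 9.99, "tp-x40": 1450.0, "promo-b": 30.0}
    print(compare_snapshots(old, new))
```

With the sample data, "fare-DEL-BOM" is flagged both as a drastic change and as a suspicious price, "promo-b" is reported as new, and "promo-a" is reported as removed. A configurable reporter, as the abstract describes, would presumably expose thresholds like these to the user rather than hard-coding them.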
Keywords :
Internet; Web sites; content management; data mining; electronic commerce; query processing; EShopMonitor; Web content monitoring tool; competitor sites; information miner; site crawler; Business; Computerized monitoring; Crawlers; Data mining; Databases; Graphical user interfaces; Law; Legal factors; Robustness; Web pages;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Data Engineering, 2004. Proceedings. 20th International Conference on
ISSN :
1063-6382
Print_ISBN :
0-7695-2065-0
Type :
conf
DOI :
10.1109/ICDE.2004.1320055
Filename :
1320055