Title :
EShopMonitor: a Web content monitoring tool
Author :
Agrawal, Neeraj ; Ananthanarayanan, Rema ; Gupta, Rahul ; Joshi, Sachindra ; Krishnapuram, Raghu ; Negi, Sumit
Author_Institution :
IBM India Res. Lab., IIT, New Delhi, India
Date :
30 March-2 April 2004
Abstract :
Data presented on commerce sites runs into thousands of pages and is typically delivered from multiple back-end sources. This makes it difficult to identify incorrect, anomalous, or interesting data such as $9.99 air fares, missing links, drastic changes in prices, and the addition of new products or promotions. We describe a system that monitors Web sites automatically and generates various types of reports so that the content of the site can be monitored and its quality maintained. The solution we designed and implemented consists of a site crawler that crawls dynamic pages, an information miner that learns to extract useful information from the pages based on examples provided by the user, and a reporter that the user can configure to answer specific queries. The tool can also be used to identify price trends and new products or promotions at competitor sites. A pilot run of the tool has been successfully completed at the ibm.com site.
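As an illustration of the kind of check the reporter might run, the following minimal sketch compares two snapshots of extracted prices and flags suspiciously low prices, drastic price changes, new products, and removed products. It is not the authors' implementation: the function name compare_snapshots, the Finding record, the {product: price} snapshot format, and the thresholds max_change and min_price are all assumptions made for this example, and the crawling and learned extraction steps are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    kind: str     # "low_price", "price_change", "new_product", or "removed_product"
    product: str
    detail: str


def compare_snapshots(previous, current, max_change=0.5, min_price=10.0):
    """Compare two {product: price} snapshots and return a list of findings.

    max_change and min_price are hypothetical thresholds chosen for illustration.
    """
    findings = []
    for product, price in sorted(current.items()):
        # Flag prices below a floor, e.g. an air fare listed at 9.99.
        if price < min_price:
            findings.append(Finding("low_price", product, f"{price:.2f}"))
        # A product absent from the earlier snapshot is reported as new.
        if product not in previous:
            findings.append(Finding("new_product", product, f"{price:.2f}"))
            continue
        # Flag relative changes larger than max_change (0.5 means 50%).
        old = previous[product]
        if old > 0 and abs(price - old) / old > max_change:
            findings.append(
                Finding("price_change", product, f"{old:.2f} -> {price:.2f}")
            )
    # Products that disappeared since the earlier snapshot.
    for product in sorted(previous.keys() - current.keys()):
        findings.append(
            Finding("removed_product", product, f"{previous[product]:.2f}")
        )
    return findings


if __name__ == "__main__":
    before = {"fare-DEL-BOM": 120.0, "laptop": 1000.0, "mouse": 20.0}
    after = {"fare-DEL-BOM": 9.99, "laptop": 1050.0, "keyboard": 45.0}
    for finding in compare_snapshots(before, after):
        print(finding.kind, finding.product, finding.detail)
```

In practice the thresholds would presumably be configured per product category by the user, in line with the configurable reporter described above.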
Keywords :
Internet; Web sites; content management; data mining; electronic commerce; query processing; EShopMonitor; Web content monitoring tool; competitor sites; information miner; site crawler; Business; Computerized monitoring; Crawlers; Databases; Graphical user interfaces; Law; Legal factors; Robustness; Web pages
Conference_Title :
Data Engineering, 2004. Proceedings. 20th International Conference on
Print_ISBN :
0-7695-2065-0
DOI :
10.1109/ICDE.2004.1320055