Title :
Cost-benefit analysis of Web bag in a Web warehouse
Author :
Bhowmick, Sourav S. ; Madria, Sanjay ; Ng, Wee-Keong ; Lim, Ee-Peng
Author_Institution :
Centre for Adv. Inf. Syst., Nanyang Technol. Univ., Singapore
Abstract :
Sets and bags are closely related structures and have been studied in relational databases. A bag differs from a set in that it is sensitive to the number of times an element occurs, while a set is not. In this paper, we introduce the concept of a Web bag in the context of a World Wide Web warehouse called WHOWEDA (WareHouse Of WEb DAta), which we are currently building. Informally, a Web bag is a Web table that allows multiple occurrences of identical Web types. A Web bag helps one discover useful knowledge from a Web table, such as visible documents or Web sites (i.e., documents or sites that can be reached by many paths), luminous documents (i.e., documents with many outgoing links) and luminous paths (i.e., frequently traversed paths). In this paper, we provide a cost-benefit analysis of materializing Web bags as compared to Web tables with distinct Web tuples.
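The set/bag distinction and the three kinds of discovered knowledge named in the abstract can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: each Web tuple is reduced to a path of document URLs, and a Web bag is obtained by projecting a Web table onto one node while keeping duplicates. The helper names (`project`, `visible_documents`, `luminous_documents`, `luminous_links`), the thresholds and the example URLs are hypothetical, and "luminous paths" are approximated here by single links that occur in many Web tuples.

```python
from collections import Counter

# Simplifying assumption (not WHOWEDA's real model): a Web tuple is reduced to
# a path, i.e. a tuple of document URLs from a start document to an end one.
Path = tuple[str, ...]


def project(table: set[Path], position: int, keep_duplicates: bool):
    """Project every Web tuple onto the document at `position`.

    keep_duplicates=False gives set semantics (distinct results only);
    keep_duplicates=True gives bag semantics via a Counter that records how
    many Web tuples projected onto the same document.
    """
    projected = [path[position] for path in table]
    return Counter(projected) if keep_duplicates else set(projected)


def visible_documents(table: set[Path], min_paths: int) -> set[str]:
    """Documents at the end of at least `min_paths` distinct paths (fan-in)."""
    bag = project(table, -1, keep_duplicates=True)
    return {doc for doc, n in bag.items() if n >= min_paths}


def luminous_documents(table: set[Path], min_links: int) -> set[str]:
    """Documents with at least `min_links` distinct outgoing links (fan-out)."""
    # Collect each (source, target) link once, even if several paths share it.
    links = {link for path in table for link in zip(path, path[1:])}
    fan_out = Counter(src for src, _ in links)
    return {doc for doc, n in fan_out.items() if n >= min_links}


def luminous_links(table: set[Path], min_tuples: int) -> set[tuple[str, str]]:
    """Hypothetical stand-in for luminous paths: links used by many Web tuples."""
    # Bag semantics: each link is counted once per Web tuple that uses it.
    usage = Counter(link for path in table for link in set(zip(path, path[1:])))
    return {link for link, n in usage.items() if n >= min_tuples}


EXAMPLE_TABLE: set[Path] = {
    ("a.html", "b.html", "d.html"),
    ("a.html", "c.html", "d.html"),
    ("a.html", "c.html", "e.html"),
}

if __name__ == "__main__":
    print(project(EXAMPLE_TABLE, -1, keep_duplicates=False))  # set: counts lost
    print(project(EXAMPLE_TABLE, -1, keep_duplicates=True))   # bag: counts kept
    print(visible_documents(EXAMPLE_TABLE, 2))
    print(luminous_documents(EXAMPLE_TABLE, 2))
    print(luminous_links(EXAMPLE_TABLE, 2))
```

Projecting onto the end document with set semantics yields only `d.html` and `e.html` and discards the fact that `d.html` is reached by two paths; the bag keeps that multiplicity, which is what makes the visibility query answerable. Keeping those duplicates is exactly the materialization cost that the paper weighs against its benefit.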
Keywords :
cost-benefit analysis; data mining; data structures; data warehouses; information resources; search engines; WHOWEDA; Web bags; Web tables; Web tuples; World Wide Web warehouse; element occurrence; fan-in; fan-out; frequently traversed paths; identical Web types; luminous documents; luminous paths; outgoing links; useful knowledge discovery; visible Web sites; visible documents; Computer science; Cost benefit analysis; Current measurement; Educational technology; Hard disks; Information systems; Read only memory; Search engines; World Wide Web;
Conference_Title :
Database Engineering and Applications, 1999. IDEAS '99. International Symposium Proceedings
Conference_Location :
Montreal, Que.
Print_ISBN :
0-7695-0265-2
DOI :
10.1109/IDEAS.1999.787249