Title : 
Putting the World Wide Web into a data warehouse: a DWH-based approach to Web analysis
         
        
            Author : 
Rauber, Andreas ; Witvoet, Oliver ; Aschenbrenner, Andreas ; Bruckner, Robert
         
        
            Author_Institution : 
Dept. of Software Technol. & Interactive Syst., Vienna Univ. of Technol., Austria
         
        
            Abstract : 
The World-Wide Web, due to its sheer size and dynamics, has turned into one of the most fascinating and important data sources for large-scale analysis and investigation, ranging from content-based information location and the dynamics of change to community analysis. Yet most projects so far rely on special-purpose tools optimized for a given task, providing only limited flexibility. In this paper, we propose a data warehouse-based approach to analyzing the World-Wide Web. Information contained in the Web pages, meta data on the documents, and information acquired from additional sources such as the WHOIS database are integrated into a multidimensional view of the Web. The resulting system allows for flexible analysis of the various characteristics of the Web. Results from a prototypical study of the Austrian national Web space as part of the AOLA project demonstrate the potential of the presented approach.
         
        
            Keywords : 
Internet; Web sites; data mining; data warehouses; integrated software; meta data; AOLA project; Austrian national Web space; WHOIS database; Web analysis; Web pages; World-Wide Web; data sources; data warehouse; document meta data; integrated database; large-scale analysis; multidimensional view; Algorithm design and analysis; Information analysis; Information resources; Interactive systems; Large-scale systems; Space technology; World Wide Web
         
        
        
        
            Conference_Title : 
Database and Expert Systems Applications, 2002. Proceedings. 13th International Workshop on
         
        
        
            Print_ISBN : 
0-7695-1668-8
         
        
        
            DOI : 
10.1109/DEXA.2002.1045999