Title : 
High Efficiency of Hybrid Resumption in Distributed Data Warehouses
         
        
            Author : 
Gorawski, Marcin ; Marks, Pawel
         
        
            Author_Institution : 
Inst. of Comput. Sci., Silesian Univ. of Technol., Gliwice
         
        
        
        
        
        
            Abstract : 
ETL processes are sometimes interrupted by a failure. In such cases, one of the interrupted extraction resumption algorithms is usually used. In this paper we present a modified Design-Resume (DR) algorithm extended to handle ETL processes containing many loading nodes. We use the DR algorithm to resume a distributed data warehouse load process. The key feature of this algorithm is that it imposes no additional overhead on the normal ETL process. In our work we modify the algorithm to work with more than one loading node and combine it with a staging technique, which increases the efficiency of the resumption process. We call the combined algorithm the hybrid resumption algorithm. Based on the results of the tests performed, we discuss the benefits of our improvements.
         
        
            Keywords : 
data warehouses; distributed databases; ETL processes; distributed data warehouse load process; hybrid resumption algorithm; interrupted extraction resumption algorithms; modified Design-Resume algorithm; Algorithm design and analysis; Checkpointing; Computer science; Data mining; Data warehouses; Hardware; Java; Performance evaluation; Resumes; Testing;
         
        
        
        
            Conference_Title : 
Database and Expert Systems Applications, 2005. Proceedings. Sixteenth International Workshop on
         
        
            Conference_Location : 
Copenhagen
         
        
        
            Print_ISBN : 
0-7695-2424-9
         
        
        
            DOI : 
10.1109/DEXA.2005.108