Title : 
Efficiently Maintaining the Fast Updated Sequential Pattern Trees With Sequence Deletion
         
        
            Author : 
Lin, Jerry Chun-Wei ; Wensheng Gan ; Tzung-Pei Hong
         
        
            Author_Institution : 
Innovative Inf. Ind. Res. Center, Harbin Inst. of Technol., Shenzhen, China
         
        
        
        
        
        
        
            Abstract : 
Among the discovered knowledge, sequential-pattern mining is used to discover the frequent subsequences from a sequence database. Most research handles the static database in batch mode to discover the desired sequential patterns. In the past, the fast updated (FUP) and Fast UPdated 2 (FUP2) concepts were adopted to, respectively, maintain and update the discovered sequential patterns with sequence insertion and sequence deletion based on the designed FUP sequential pattern (FUSP)-tree structure. Based on the FUP or FUP2 concepts, original customer sequences are required to be rescanned if it is necessary to maintain and update the unpromising (small) sequences from the original database. In the past, pre-large concept was designed to keep the prelarge itemsets as the buffer to avoid the database rescan each time whether transaction insertion or deletion in the dynamic databases. In this paper, the prelarge concept is adopted to handle the discovered sequential patterns with sequence deletion. An FUSP tree is first built to keep only the frequent 1-sequences from the original database. The prelarge 1-sequences are also kept in a set for later maintenance approach. When some sequences are deleted from the original database, the proposed algorithm is then performed to divide the kept frequent 1-sequences and prelarge 1-sequences from the original database and the mined 1-sequences from the deleted customer sequences into three parts with nine cases. Each case is then processed by the designed algorithm to maintain and update the built FUSP tree. When the number of deleted customer sequences is smaller than the safety bound of the prelarge concept, the original customer sequences are unnecessary to be rescanned, but the sequential patterns can still be actually maintained and updated. Experiments are conducted to show the performance of the proposed algorithm in terms of execution time and the number of tree nodes.
         
        
            Keywords : 
data mining; tree data structures; FUP sequential pattern; FUP2 concepts; FUSP-tree structure; fast updated 2 concepts; fast updated sequential pattern trees; frequent 1-sequences; mined 1-sequences; prelarge 1-sequences; prelarge concept; sequence database; sequence deletion; sequence insertion; sequential-pattern mining; Algorithm design and analysis; Data mining; Database systems; Pattern recognition; Sequential analysis; Tree data structures; FUSP-tree structure; Sequence deletion; data mining; dynamic databases; pre-large concept;
         
        
        
            Journal_Title : 
Access, IEEE
         
        
        
        
        
            DOI : 
10.1109/ACCESS.2014.2373433