Title :
Parallel processing of approximate sequence matching using disk-based suffix tree on multi-core CPU
Author :
Watanuki, Yosuke ; Tamura, Keiichi ; Kitakami, Hajime ; Takahashi, Y.
Author_Institution :
Grad. Sch. of Inf. Sci., Hiroshima City Univ., Hiroshima, Japan
Abstract :
Suffix trees, which are trie structures that present the suffixes of given sequences (e.g., strings), are widely used for sequence search in different application domains such as, text data mining, web intelligence, bioinformatics and computational biology. In particular, suffix trees are useful in bioinformatics applications, because they can search similar sub-sequences and extract frequent sequence patterns efficiently. In recent years, efficient construction of a suffix tree that allows faster sequence searches has become one of the most important challenges, because the number and size of the data that are stored in sequence databases have been increasing exponentially. This paper proposes a novel parallelization model for approximate sequence matching that uses disk-based suffix trees, which are built on hard disks not on memory, on a multi-core CPU. In the proposed parallelization model, we divide an entire sequence database into two or more sub-databases called partitions. For each partition, we build a suffix tree and define a task as an approximate sequence matching on one suffix tree. Moreover, the proposed parallelization model involves a multiple buffering management system to avoid conflicts among CPU-cores. We evaluated the proposed parallelization model using an actual amino acid sequence database on a PC. The experimental results show a substantial improvement in computation performance.
Keywords :
multiprocessing systems; parallel processing; pattern matching; trees (mathematics); Web intelligence; actual amino acid sequence database; approximate sequence matching; bioinformatics; central processing unit; computational biology; disk-based suffix tree; frequent sequence patterns extraction; multicore CPU; parallel processing; parallelization model; sequence search; text data mining; Amino acids; Bioinformatics; Hard disks; Indexes; Instruction sets; Parallel processing; approximate sequence matching; buffer management; multicore CPU; parallel processing; suffix tree;
Conference_Titel :
Computational Intelligence & Applications (IWCIA), 2013 IEEE Sixth International Workshop on
Conference_Location :
Hiroshima
Print_ISBN :
978-1-4673-5725-8
DOI :
10.1109/IWCIA.2013.6624801