مرکز منطقه ای اطلاع رساني علوم و فناوري - Parallel processing of approximate sequence matching using disk-based suffix tree on multi-core CPU

DocumentCode :

3420368

Title :

Parallel processing of approximate sequence matching using disk-based suffix tree on multi-core CPU

Author :

Watanuki, Yosuke ; Tamura, Keiichi ; Kitakami, Hajime ; Takahashi, Y.

Author_Institution :

Grad. Sch. of Inf. Sci., Hiroshima City Univ., Hiroshima, Japan

fYear :

2013

fDate :

13-13 July 2013

Firstpage :

137

Lastpage :

142

Abstract :

Suffix trees, which are trie structures that present the suffixes of given sequences (e.g., strings), are widely used for sequence search in different application domains such as, text data mining, web intelligence, bioinformatics and computational biology. In particular, suffix trees are useful in bioinformatics applications, because they can search similar sub-sequences and extract frequent sequence patterns efficiently. In recent years, efficient construction of a suffix tree that allows faster sequence searches has become one of the most important challenges, because the number and size of the data that are stored in sequence databases have been increasing exponentially. This paper proposes a novel parallelization model for approximate sequence matching that uses disk-based suffix trees, which are built on hard disks not on memory, on a multi-core CPU. In the proposed parallelization model, we divide an entire sequence database into two or more sub-databases called partitions. For each partition, we build a suffix tree and define a task as an approximate sequence matching on one suffix tree. Moreover, the proposed parallelization model involves a multiple buffering management system to avoid conflicts among CPU-cores. We evaluated the proposed parallelization model using an actual amino acid sequence database on a PC. The experimental results show a substantial improvement in computation performance.

Keywords :

multiprocessing systems; parallel processing; pattern matching; trees (mathematics); Web intelligence; actual amino acid sequence database; approximate sequence matching; bioinformatics; central processing unit; computational biology; disk-based suffix tree; frequent sequence patterns extraction; multicore CPU; parallel processing; parallelization model; sequence search; text data mining; Amino acids; Bioinformatics; Hard disks; Indexes; Instruction sets; Parallel processing; approximate sequence matching; buffer management; multicore CPU; parallel processing; suffix tree;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Computational Intelligence & Applications (IWCIA), 2013 IEEE Sixth International Workshop on

Conference_Location :

Hiroshima

ISSN :

1883-3977

Print_ISBN :

978-1-4673-5725-8

Type :

conf

DOI :

10.1109/IWCIA.2013.6624801

Filename :

6624801

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3420368