Title :
Mining Frequent Contiguous Sequence Patterns in Biological Sequences
Author :
Kang, Tae Ho ; Yoo, Jae Soo ; Kim, Hak Yong
Author_Institution :
Chungbuk Nat. Univ., Cheongju
Abstract :
Biological sequences such as DNA and amino acid sequences typically contain a large number of items. Their contiguous sequences ordinarily consist of hundreds of frequent items or more. In biological sequence analysis (BSA), frequent contiguous sequence search is one of the most important operations. Many studies have addressed efficient mining of sequential patterns. More recently, the MacosVSpan algorithm was proposed, based on the idea of the prefixSpan algorithm, to significantly reduce its recursive processing. However, the algorithm is inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
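To make the terms in the abstract concrete, the following is a minimal sketch of what "maximal frequent contiguous sequences" means. It is not the fixed-length spanning-tree method proposed in the paper, nor MacosVSpan; it is a naive level-wise search written only for illustration. The function names `frequent_contiguous` and `maximal_only`, the toy DNA strings, and the definition of support as "number of input sequences containing the pattern" are all assumptions made here.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_support):
    """Return {pattern: support} for every contiguous pattern (substring)
    contained in at least `min_support` of the input sequences.

    Assumption: support counts sequences, not occurrences.
    """
    frequent = {}
    length = 1
    while True:
        # Map each candidate substring of the current length to the
        # set of sequence ids that contain it.
        holders = defaultdict(set)
        for sid, seq in enumerate(sequences):
            for i in range(len(seq) - length + 1):
                window = seq[i:i + length]
                # Pruning: a contiguous pattern can only be frequent
                # if its prefix (one item shorter) is frequent.
                if length == 1 or window[:-1] in frequent:
                    holders[window].add(sid)
        level = {p: len(ids) for p, ids in holders.items()
                 if len(ids) >= min_support}
        if not level:
            return frequent
        frequent.update(level)
        length += 1


def maximal_only(frequent):
    """Keep only patterns that are not contained in a longer frequent pattern."""
    patterns = list(frequent)
    return {p: c for p, c in frequent.items()
            if not any(p != q and p in q for q in patterns)}


if __name__ == "__main__":
    toy = ["ACGTAC", "TACGTT", "GGACGT"]  # hypothetical toy data
    print(maximal_only(frequent_contiguous(toy, min_support=3)))
```

On the toy input, "ACGT" is the only maximal pattern present in all three sequences. The nested scan grows expensive as patterns get longer, which is the kind of cost on long biological sequences that the abstract says motivates a more efficient method.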
Keywords :
DNA; biology computing; data mining; molecular biophysics; proteins; sequences; MacosVSpan algorithm; amino acid sequences; biological sequences; mining; prefixSpan algorithm; sequence patterns; Amino acids; Biochemistry; Bioinformatics; DNA computing; Databases; Genetics; Information analysis; Pattern analysis; biological sequence analysis; sequential pattern mining
Conference_Title :
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-1509-0
DOI :
10.1109/BIBE.2007.4375640