Title :
Genome sequence clustering using hybrid method: Self-organizing map and frequent max substring techniques
Author :
Chumwatana, Todsanai
Author_Institution :
Fac. of Inf. Technol., Rangsit Univ., Pathumthani, Thailand
Abstract :
This paper proposes a genome sequence clustering method that combines two techniques, the self-organizing map (SOM) and the frequent max substring technique, to improve the efficiency of information retrieval. The proposed technique appears to be a promising alternative for clustering large numbers of genome sequences in large sequence databases. To illustrate the technique, an experiment on clustering genome sequences is presented. First, the frequent max substring technique is applied to enumerate the interesting patterns, called frequent max substrings, from the genome sequences. Then, these frequent max substrings are used as terms, together with their frequencies, to form a vector for each sequence. Finally, the self-organizing map is applied to the vectors from the previous step to generate a cluster map. The resulting cluster map shows which genome sequences are similar to one another and which are different.
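The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names, the toy sequences, and all parameters (`min_freq`, `max_len`, grid size, learning rate) are assumptions made for illustration. In particular, `frequent_max_substrings` is a brute-force stand-in that keeps substrings occurring at least `min_freq` times and not contained in any longer frequent substring; the paper's own enumeration technique may differ.

```python
import math
import random
from collections import Counter


def count_occurrences(seq, pattern):
    """Count (possibly overlapping) occurrences of pattern in seq."""
    return sum(1 for i in range(len(seq) - len(pattern) + 1)
               if seq.startswith(pattern, i))


def frequent_max_substrings(sequences, min_freq=2, max_len=6):
    """Brute-force stand-in (assumption): substrings of length 2..max_len
    whose total frequency is >= min_freq and that are not contained in
    any longer frequent substring."""
    counts = Counter()
    for seq in sequences:
        for length in range(2, max_len + 1):
            for i in range(len(seq) - length + 1):
                counts[seq[i:i + length]] += 1
    frequent = {s for s, c in counts.items() if c >= min_freq}
    return sorted(s for s in frequent
                  if not any(s != t and s in t for t in frequent))


def sequence_vectors(sequences, terms):
    """One vector per sequence: the frequency of each term in it."""
    return [[float(count_occurrences(seq, t)) for t in terms]
            for seq in sequences]


def best_matching_unit(weights, v):
    """Grid node whose weight vector is closest to v (squared Euclidean)."""
    return min(weights,
               key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], v)))


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Small online SOM with a Gaussian neighbourhood and linear decay."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.1)
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return weights


# Toy data (hypothetical, not from the paper's experiment).
sequences = ["ACGTACGTAC", "ACGTACGTTT", "GGCCGGCCGG", "GGCCGGCCAA"]
terms = frequent_max_substrings(sequences)
vectors = sequence_vectors(sequences, terms)
som = train_som(vectors)
cluster_map = {seq: best_matching_unit(som, v)
               for seq, v in zip(sequences, vectors)}
```

Here `cluster_map` assigns each sequence to a grid node; sequences that land on the same or neighbouring nodes are the ones the map treats as similar. The brute-force enumeration is quadratic in sequence length and is only suitable for small examples.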
Keywords :
bioinformatics; genomics; pattern clustering; self-organising feature maps; sequences; string matching; vectors; SOM; frequent max substring; genome sequence clustering; self-organizing map; sequence vector; Abstracts; Bioinformatics; Biological cells; DNA; Genomics; Mice; Neurons; Frequent Max Substring; Genome Sequence; Neuron Network; Self-Organizing Map; Sequence Clustering;
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2013 International Conference on
Conference_Location :
Tianjin
DOI :
10.1109/ICMLC.2013.6890863