Title of article :
Integrative Discovery of Multifaceted Sequence Patterns by Frame-Relayed Search and Hybrid PSO-ANN
Author/Authors :
Liou, Sing-Wu National Yunlin University of Science and Technology - Graduate School of Engineering Science and Technology, Taiwan , Wang, Chia-Ming National Yunlin University of Science and Technology - Graduate School of Engineering Science and Technology, Taiwan , Huang, Yin-Fu National Yunlin University of Science and Technology - Graduate School of Computer Science and Information Engineering, Taiwan
From page :
742
To page :
764
Abstract :
For de novo pattern mining in genomic sequences, the main issues are constructing a pattern definition model (PDM) and mining sequence patterns (MSP). The representation of a PDM and the discovery of patterns are functionally dependent, so performance depends on the adopted PDM. The popular PDMs provide only descriptive patterns; they lack multifaceted considerations. Many existing MSP methods are tied to exclusively devised PDMs, and these specialized, sophisticated models make the mined results hard to reuse. In this research, an integrative pattern mining system is proposed, which consists of a computation-oriented PDM (CO-PDM) and general-purpose MSP (GP-MSP) methods. The CO-PDM defines four computational concerns (CCs) as facets of MSP: expression (E), location (L), range (R) and weight (W), which are integrated into a frame-relayed pattern model (FRPM). The GP-MSP uses a frame-relayed search strategy to resolve the E, L and R concerns (ELR-CCs) first, with the aid of a critical-parameter automating (CPA) procedure; the W-CC is then determined by hybridizing particle swarm optimization (PSO) and an artificial neural network (ANN). The proposed FRPM and GP-MSP were implemented and applied to 22,448 human introns; in the results, all the well-known patterns were recovered and some new ones were also discovered. Furthermore, the effectiveness of the identified patterns was verified by a two-layered k-nearest neighbor (k-NN) classifier; the average precision and recall are 0.88 and 0.92, respectively. Based on this case study, the integrative PDM-MSP system is believed to be effective and reliable, and the proposed CO-PDM and GP-MSP are expected to be widely applicable and reusable for mining sequence patterns in eukaryotic protein-coding genes.
Keywords :
pattern mining , multifaceted sequence patterns , computation-oriented pattern definition model , computational concerns , frame-relayed pattern model
Journal title :
Journal of J.UCS (Journal of Universal Computer Science)
Record number :
2661555