Title :
Fuzzy filtering systems for performing environment improvement of computational DNA motif discovery
Author :
Wang, Dianhui; Tapan, Sarwar
Author_Institution :
Dept. of Comput. Sci. & Eng., La Trobe Univ., Melbourne, VIC, Australia
Abstract :
DNA datasets exhibit a considerably low signal-to-noise ratio, which limits the ability of computational motif discovery tools to achieve satisfactory performance. Reducing the search space and increasing the signal-to-noise ratio by filtering can therefore provide these tools with a better performing environment. This paper proposes unsupervised fuzzy filtering systems that aim to remove a large portion of the k-mers that are less relevant to potential motif instances, where relevance is judged by location overlap with those instances in the given sequences. The Relative Model Mismatch Score (RMMS), a new quantitative metric for measuring the quality of motif models, is employed to support the proposed filtering. A modified fuzzy c-means clustering algorithm with an initialization strategy is then used to group the k-mers, and the complement of the fuzzified RMMS is used to rank them for data filtering. Experimental results on eight real DNA datasets show that the proposed filtering systems remove approximately (85 ± 5)% of the data samples while maintaining a high retention rate of relevant k-mers. As a data pre-processing component, this filtering therefore improves the performing environment of motif discovery tools, since the filtered datasets have a much smaller cardinality and a higher signal-to-noise ratio than the original datasets.
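The filtering pipeline summarized above can be illustrated with a minimal Python sketch under stated assumptions; it is not the authors' method. It uses plain (Bezdek) fuzzy c-means with random initial memberships instead of the paper's modified algorithm and initialization strategy, an assumed one-hot encoding of k-mers, and a hypothetical per-k-mer score in [0, 1] standing in for the fuzzified RMMS, whose definition is not given in this record. The helper names (extract_kmers, one_hot, fuzzy_c_means, filter_by_complement) are hypothetical, and keep_fraction=0.15 merely mirrors the reported removal of about 85% of the data.

```python
import numpy as np

BASES = "ACGT"


def extract_kmers(seqs, k):
    """Return every overlapping k-mer and its (sequence index, start offset)."""
    kmers, positions = [], []
    for i, s in enumerate(seqs):
        for j in range(len(s) - k + 1):
            kmers.append(s[j:j + k])
            positions.append((i, j))
    return kmers, positions


def one_hot(kmers):
    """Assumed encoding: each k-mer becomes a flat binary vector of length 4*k."""
    k = len(kmers[0])
    X = np.zeros((len(kmers), 4 * k))
    for r, kmer in enumerate(kmers):
        for p, base in enumerate(kmer):
            X[r, 4 * p + BASES.index(base)] = 1.0
    return X


def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with random initial memberships (NOT the paper's
    modified version). Returns centers (c x d) and memberships U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # squared Euclidean distance from every k-mer to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                   # guard against zero distance
        inv = d2 ** (-1.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def filter_by_complement(scores, keep_fraction=0.15):
    """Rank items by the complement 1 - score (score in [0, 1], lower = better
    match) and return the sorted indices of the top keep_fraction."""
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    order = np.argsort(-(1.0 - scores), kind="stable")
    return np.sort(order[:n_keep])


# Usage on toy sequences (illustrative data, not from the paper).
seqs = ["ACGTACGTGG", "TTACGTACCA"]
kmers, positions = extract_kmers(seqs, k=4)
X = one_hot(kmers)
centers, U = fuzzy_c_means(X, c=3)
# Hypothetical stand-in score: 1 - strongest membership (NOT the fuzzified RMMS).
scores = 1.0 - U.max(axis=1)
kept = filter_by_complement(scores, keep_fraction=0.15)
print([kmers[i] for i in kept])
```

Clustering and ranking are kept as separate functions because the abstract does not specify how cluster memberships feed into the ranking; in this sketch that link is left to the caller, and any score oriented so that lower means a better match can be substituted.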
Keywords :
biocomputing; filtering theory; fuzzy reasoning; unsupervised learning; DNA dataset; computational DNA motif discovery; data filtering systems; data pre-processing component; filtered dataset cardinality; fuzzified RMMS; fuzzy c-means clustering algorithm; group k-mers; motif discovery tools; potential motif instances; relative motif model mismatch score; search space reduction; signal-to-noise ratio; unsupervised fuzzy filtering systems; Clustering algorithms; Computational modeling; DNA; Encoding; Hamming distance; Measurement; Signal to noise ratio;
Conference_Title :
Fuzzy Systems (FUZZ), 2010 IEEE International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6919-2
DOI :
10.1109/FUZZY.2010.5584550