Averaging measurement strategies for identifying single nucleotide polymorphisms from redundant data sets

Author

Wang, Tai-Chun ; Taheri, Javid ; Zomaya, Albert Y.

Author_Institution

Centre for Distrib. & High Performance Comput., Univ. of Sydney, Sydney, NSW, Australia

fYear

2011

fDate

27-30 Dec. 2011

Firstpage

67

Lastpage

74

Abstract

Single nucleotide polymorphism (SNP) studies have been an active topic of research in the life sciences in recent years. Because SNPs are abundant, stable, and sometimes related to specific diseases, they have been widely selected as biomarkers for multi-purpose research. As traditional methods for identifying SNPs are time-consuming and expensive, discovering SNPs from expressed sequence tags (ESTs) has become an efficient alternative. Because most EST databases do not store quality/trace files together with EST reads, methods such as Phard, which require the corresponding sequence quality files, are not suitable for this purpose. Thus, computational methods that can obtain reliable SNPs without trace/quality information are still essential. We have developed a pipeline framework, called PFSNP, to reveal reliable SNPs from EST data sets without the accompanying trace/quality files. PFSNP deploys several strategies in this framework, such as a modified neighborhood quality standard measurement and fuzzy logic. It also automatically adjusts the slide window to fit different data set conditions efficiently. PFSNP is demonstrated by identifying SNPs from two subgroups of Oryza sativa with two different strategies, as well as from zebrafish. Based on our experimental results, PFSNP obtains more reliable results than existing methods.

Keywords

bioinformatics; data mining; diseases; fuzzy logic; pipeline processing; Oryza sativa; PFSNP; expressed sequence tags; fuzzy logic; life sciences; modified neighborhood quality standard measurement; multi-purpose research; pipeline framework; redundant data sets; sequences quality files; single nucleotide polymorphism identification; slide window; Data mining; Databases; Fuzzy logic; Fuzzy systems; Genomics; Pipelines; Reliability;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Systems and Applications (AICCSA), 2011 9th IEEE/ACS International Conference on

Conference_Location

Sharm El-Sheikh

ISSN

2161-5322

Print_ISBN

978-1-4577-0475-8

Electronic_ISBN

2161-5322

Type

conf

DOI

10.1109/AICCSA.2011.6126593

Filename

6126593