• DocumentCode
    2960255
  • Title
    Averaging measurement strategies for identifying single nucleotide polymorphisms from redundant data sets
  • Author
    Wang, Tai-Chun ; Taheri, Javid ; Zomaya, Albert Y.
  • Author_Institution
    Centre for Distrib. & High Performance Comput., Univ. of Sydney, Sydney, NSW, Australia
  • fYear
    2011
  • fDate
    27-30 Dec. 2011
  • Firstpage
    67
  • Lastpage
    74
  • Abstract
    Single nucleotide polymorphism (SNP) studies have been an active topic of research in the life sciences in recent years. Because SNPs are abundant, stable, and sometimes linked to specific diseases, they have been widely selected as biomarkers for multi-purpose research. As traditional methods for identifying SNPs are time-consuming and expensive, discovering SNPs from expressed sequence tags (ESTs) has become an efficient alternative. Because most EST databases do not store quality/trace files together with EST reads, methods such as Phred, which require the corresponding sequence quality files, are not suitable for further research. Thus, computational methods that can obtain reliable SNPs without trace/quality information are still essential. We have developed a pipeline framework, called PFSNP, to reveal reliable SNPs from EST data sets without associated trace/quality files. PFSNP deploys several strategies in this framework, such as a modified neighborhood quality standard measurement and fuzzy logic. It also automatically adjusts the sliding window to fit different data set conditions efficiently. PFSNP is demonstrated by identifying SNPs from two subgroups of Oryza sativa with two different strategies, as well as from zebrafish. Based on our experimental results, PFSNP obtains more reliable results than existing methods.
  • Keywords
    bioinformatics; data mining; diseases; fuzzy logic; pipeline processing; Oryza sativa; PFSNP; expressed sequence tags; fuzzy logic; life sciences; modified neighborhood quality standard measurement; multi-purpose research; pipeline framework; redundant data sets; sequences quality files; single nucleotide polymorphism identification; slide window; Data mining; Databases; Fuzzy logic; Fuzzy systems; Genomics; Pipelines; Reliability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Systems and Applications (AICCSA), 2011 9th IEEE/ACS International Conference on
  • Conference_Location
    Sharm El-Sheikh
  • ISSN
    2161-5322
  • Print_ISBN
    978-1-4577-0475-8
  • Electronic_ISBN
    2161-5322
  • Type
    conf
  • DOI
    10.1109/AICCSA.2011.6126593
  • Filename
    6126593