• DocumentCode
    2707899
  • Title
    Investigation into effectiveness of rough sets in prediction of enzyme and protein structure classes
  • Author
    Newby, Chris ; Yang, Yingjie ; Seker, Huseyin
  • Author_Institution
    Dept. of Health Sci., Leicester Univ., Leicester, UK
  • fYear
    2009
  • fDate
    14-19 June 2009
  • Firstpage
    2243
  • Lastpage
    2249
  • Abstract
    Among the various methods for protein function prediction, rough sets have recently been applied to the prediction of protein structural classes. However, this was a blind application to a single, small data set of high homology, and it did not investigate the various parameters of the rough set method. The aim of this paper is therefore to study rough sets in this area through comprehensive and consistent analysis, and then to present a practical strategy for rough set-based protein function prediction. To achieve this aim, three different data sets were considered: the first for prediction of the six main enzyme classes, and the other two for prediction of structural classes. Boolean reasoning, entropy scaling and equal frequency binning were used for discretization, along with two methods for producing reducts and rules: genetic and Johnson's algorithms. The predictive accuracies were poor for the enzyme data set, whereas the method performed better at predicting the protein structural classes. The data set with low homology also produced poorer accuracies than the data set with high homology. Furthermore, the various parameters and methods used in the rough set approach were sensitive to the problem being addressed, to whether the data set was of low or high homology, and to the number of features. The results appear to indicate that the equal frequency-based approach combined with the genetic algorithm yields higher prediction accuracy. However, other methods, such as Boolean reasoning with the genetic algorithm, are also found to be promising. Further investigation will provide a practical strategy that can be used in rough set-based protein function prediction as well as in other areas of bioinformatics.
  • Keywords
    Boolean functions; biology computing; enzymes; genetic algorithms; rough set theory; Boolean reasoning; Johnson algorithm; bioinformatics; blind application; entropy scaling; enzyme classes; enzyme dataset; enzyme structure class prediction; equal frequency binning; genetic algorithm; homology; protein structural classes; protein structure class prediction; rough set-based protein function prediction; rough sets; Biochemistry; Bioinformatics; Data mining; Frequency; Fuzzy sets; Information systems; Neural networks; Proteins; Rough sets; Uncertainty;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2009. IJCNN 2009. International Joint Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-3548-7
  • Electronic_ISBN
    1098-7576
  • Type
    conf
  • DOI
    10.1109/IJCNN.2009.5178695
  • Filename
    5178695