• DocumentCode
    3418657
  • Title
    Feature selection and classification of protein subfamilies using Rough Sets
  • Author
    Rahman, Shuzlina Abdul ; Abu Bakar, Azuraliza ; Hussein, Z.A.M.
  • Author_Institution
    Dept. of Sci. & Syst. Manage., Univ. Kebangsaan Malaysia, Bangi, Malaysia
  • Volume
    01
  • fYear
    2009
  • fDate
    5-7 Aug. 2009
  • Firstpage
    32
  • Lastpage
    35
  • Abstract
    Machine learning methods are known to be inefficient when faced with many features that are unnecessary for rule discovery. To cope with this issue, many methods have been proposed for selecting important features. Among them is feature selection, which selects a subset of discriminative features or attributes for model building; it helps avoid overfitting, improves model performance, and yields faster and more reliable models. This paper proposes a new method based on rough set algorithms, a rule-based data mining approach, to select the important features in bioinformatics datasets. Amino acid compositions are used as conditional features for the classification task. However, our results indicate that all amino acid composition features are equally important, so selecting features is unnecessary. We do confirm the need for balanced classes in classifying protein function by demonstrating an increase of more than 15% in accuracy.
  • Keywords
    biology computing; data mining; pattern classification; proteins; rough set theory; bioinformatics datasets; feature selection; machine learning methods; protein subfamilies classification; rough sets; rule discovery; rule-based data mining method; Amino acids; Bioinformatics; Conference management; Data mining; Informatics; Information management; Machine learning; Protein engineering; Rough sets; Sequences; Feature Selection; Protein Function Classification; Rough Sets;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical Engineering and Informatics, 2009. ICEEI '09. International Conference on
  • Conference_Location
    Selangor
  • Print_ISBN
    978-1-4244-4913-2
  • Type
    conf
  • DOI
    10.1109/ICEEI.2009.5254822
  • Filename
    5254822