DocumentCode :
3262929
Title :
Feature selection from protein primary sequence database using Enhanced QuickReduct Fuzzy-Rough set
Author :
Chandran, C.P.
Author_Institution :
Dept. of Comput. Sci., Ayya Nadar Janaki Ammal Coll., Sivakasi
fYear :
2008
fDate :
26-28 Aug. 2008
Firstpage :
111
Lastpage :
114
Abstract :
Feature extraction and feature selection have become a clear need in many bioinformatics applications. In this paper, features are extracted from a protein primary single sequence database, based on amino acid composition and k-mer patterns (k-tuples), and feature selection is then carried out on the extracted features. Since the rough QuickReduct has not yet been applied to protein sequence data sets, the enhanced QuickReduct feature selection (EQRFS) algorithm using fuzzy-rough sets is proposed. Rough set theory deals with uncertainty and vagueness of an information system in data mining. Fuzzy-rough based feature selection provides a means by which discrete or real-valued noisy data, or a mixture of both, can be effectively reduced. The experiments are carried out on protein primary single sequence data sets derived from PDB under the SCOP classification, based on structural class predictions such as all-alpha, all-beta, alpha+beta and alpha/beta.
Keywords :
bioinformatics; data mining; feature extraction; fuzzy set theory; rough set theory; SCOP classification; amino acid composition; bioinformatics applications; data mining; enhanced QuickReduct feature selection algorithm; enhanced QuickReduct fuzzy-rough set; feature extraction; k-mer patterns; k-tuples; protein primary single sequence database; structural class predictions; Amino acids; Bioinformatics; Data mining; Feature extraction; Information systems; Noise reduction; Protein sequence; Rough sets; Spatial databases; Uncertainty; Feature Selection; Fuzzy-Rough Set; Protein primary sequence database; QuickReduct;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Granular Computing, 2008. GrC 2008. IEEE International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4244-2512-9
Electronic_ISBN :
978-1-4244-2513-6
Type :
conf
DOI :
10.1109/GRC.2008.4664758
Filename :
4664758