Title :
A dissimilarity-based classifier for generalized sequences by a granular computing approach
Author :
Rizzi, Antonello ; Possemato, Francesca ; Livi, Lorenzo ; Sebastiani, Azzurra ; Giuliani, Alessandro ; Mascioli, Fabio Massimo Frattale
Author_Institution :
Dept. of Inf. Eng., Electron., & Telecommun., SAPIENZA Univ. of Rome, Rome, Italy
Abstract :
In this paper we propose a classifier for generalized sequences that is conceived in the granular computing framework. The classification system processes the input sequences of objects by means of a suitable interplay between dissimilarity-based and clustering-based techniques. The core data mining engine retrieves information granules that are used to represent the input sequences as feature vectors. This representation makes it possible to address the original sequence classification problem with standard pattern recognition tools. We have evaluated the generalization capability of the system on a case study concerning the protein folding problem. In the considered dataset, the entire E. coli proteome was screened for the prediction of relative protein solubility on the basis of the amino acid sequence alone. We report the analysis of the dataset under different settings, showing interesting test set classification accuracy results. The developed system also makes it possible to extract knowledge from the considered training set through the analysis of the retrieved information granules.
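The abstract states that retrieved information granules are used to turn each input sequence into a feature vector, but it does not specify the dissimilarity measure, how granules are obtained, or how the vector is built. The sketch below is therefore only an illustration under explicit assumptions, not the authors' method: it assumes the Levenshtein (edit) distance as the dissimilarity, fixed-length sliding windows as the candidate subsequences, and granules given as hypothetical `(representative, radius)` pairs standing in for the output of the clustering stage. Each feature counts how many windows of the sequence fall within a granule's radius (a histogram-style embedding). The function names `levenshtein` and `symbolic_histogram` are hypothetical.

```python
from typing import List, Sequence, Tuple

def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

# Assumption: a granule is a representative subsequence plus an acceptance radius,
# e.g. a cluster prototype found by some clustering step (not shown here).
Granule = Tuple[str, int]

def symbolic_histogram(seq: str, granules: Sequence[Granule], window: int) -> List[int]:
    """Map a sequence to one count per granule: the number of length-`window`
    substrings of `seq` whose edit distance to the granule's representative
    is at most the granule's radius."""
    windows = [seq[i:i + window] for i in range(len(seq) - window + 1)]
    return [sum(1 for w in windows if levenshtein(w, rep) <= radius)
            for rep, radius in granules]

# Usage with hand-picked (hypothetical) granules on a toy amino acid string.
features = symbolic_histogram("AAGKVLAAHKVL", [("KVL", 0), ("AAG", 1)], window=3)
print(features)  # [2, 2]: "KVL" twice exactly; "AAG" and "AAH" within radius 1
```

Once every sequence is mapped to a fixed-length vector of this kind, any standard vector-space classifier can be trained on the result, which is the step the abstract refers to as dealing with the sequence problem "through standard pattern recognition tools".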
Keywords :
biology computing; data mining; granular computing; pattern classification; pattern clustering; proteins; vectors; E. Coli proteome; clustering based technique; core data mining engine; dissimilarity based technique; dissimilarity-based classifier; feature vectors; generalization capability; generalized sequences; granular computing approach; information granule retrieval; input sequence representation; knowledge extraction; pattern recognition tools; protein folding problem; protein relative solubility prediction; pure amino acids sequence basis; sequence classification problem; suited interplay; test set classification; Data mining; Feature extraction; Histograms; Mathematical model; Optimization; Proteins; Training; Granular computing and modeling; Protein folding prediction; Sequence representation and classification;
Conference_Title :
Neural Networks (IJCNN), The 2013 International Joint Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4673-6128-6
DOI :
10.1109/IJCNN.2013.6707041