Title :
Data mining framework for protein function prediction
Author :
Rahman, Shuzlina Abdul ; Hussein, Zeti Azura Mohamed ; Bakar, Azuraliza Abu
Author_Institution :
Faculty of Information Technology and Quantitative Sciences, Universiti Teknologi MARA, 44500 Shah Alam, Selangor, Malaysia
Abstract :
Determining the functions of uncharacterized proteins from their sequences remains a challenge despite the growing number of prediction methods. This is due to the inherent limitations of current tools and databases and to the ambiguity in how function is defined. Additionally, standard methods of functional assignment, which rely on sequence alignment against genes of known function, often fail to find significant matches. This paper proposes a machine learning framework for predicting protein function irrespective of sequence similarity. The framework aims to provide a workflow for predicting protein function that combines data mining and machine learning algorithms. It has three main components: pre-processing, model development, and testing and evaluation. The study is expected to yield a new feature selection method for predicting protein functional classes and to complement the existing conventional methods of functional assignment.
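The three components named in the abstract (pre-processing, model development, and testing and evaluation) can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not specify features, classifier, or data. The sketch assumes amino acid composition as the alignment-free feature, a nearest-centroid classifier as a stand-in model, and a handful of made-up toy sequences with hypothetical class labels ("basic", "acidic") purely for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Pre-processing (assumed feature): 20-dim amino acid frequency vector."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[a] / total for a in AMINO_ACIDS]


def train_centroids(sequences, labels):
    """Model development (assumed model): mean feature vector per class."""
    sums, sizes = {}, Counter(labels)
    for seq, label in zip(sequences, labels):
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, v in enumerate(composition(seq)):
            acc[i] += v
    return {lab: [v / sizes[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, seq):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    vec = composition(seq)

    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda lab: dist(centroids[lab]))


def accuracy(centroids, sequences, labels):
    """Testing and evaluation: fraction of held-out sequences classified correctly."""
    hits = sum(predict(centroids, s) == y for s, y in zip(sequences, labels))
    return hits / len(labels)


# Toy, made-up sequences and hypothetical labels (illustration only).
train_seqs = ["KKKRRKKR", "RRKKRKRK", "DDEEDEDE", "EEDDEDDE"]
train_labels = ["basic", "basic", "acidic", "acidic"]
test_seqs = ["KRKRKK", "DEDEED"]
test_labels = ["basic", "acidic"]

model = train_centroids(train_seqs, train_labels)
print("held-out accuracy:", accuracy(model, test_seqs, test_labels))
```

Because the features are computed from each sequence alone, no alignment against a reference database is needed, which is the property the abstract emphasizes. A real study would replace the toy data with curated annotations and add a feature selection step before model development.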
Keywords :
Data mining; Databases; Information systems; Information technology; Learning systems; Machine learning; Machine learning algorithms; Prediction methods; Protein engineering; Sequences;
Conference_Title :
Information Technology, 2008. ITSim 2008. International Symposium on
Conference_Location :
Kuala Lumpur, Malaysia
Print_ISBN :
978-1-4244-2327-9
Electronic_ISBN :
978-1-4244-2328-6
DOI :
10.1109/ITSIM.2008.4631683