Title :
Classifying Protein Sequences Using Regularized Multi-Task Learning
Author :
Charuvaka, Anveshi ; Rangwala, Huzefa
Author_Institution :
Dept. of Comput. Sci., George Mason Univ., Fairfax, VA, USA
Abstract :
Classification problems in which several learning tasks are organized hierarchically pose a special challenge because the hierarchical structure of the problems must be taken into account. Multi-task learning (MTL) provides a framework for dealing with such interrelated learning tasks. When two different hierarchical sources organize similar information, their combined knowledge can, in principle, be exploited to further improve classification performance. We have studied this problem in the context of protein structure classification by integrating the learning process for two hierarchical protein structure classification databases, SCOP and CATH. Our goal is to accurately predict whether a given protein belongs to a particular class in these hierarchies using only its amino acid sequence. We have used recent developments in multi-task learning to solve the interrelated classification problems, and we have evaluated how the various relationships between tasks affect classification performance. Our evaluations show that learning schemes that use both classification databases outperform schemes that use only one of them.
Keywords :
bioinformatics; learning (artificial intelligence); molecular biophysics; molecular configurations; pattern classification; proteins; CATH; SCOP; amino acid sequences; classification databases; classification performance; hierarchical protein structure classification database; hierarchical source organisation; interrelated classification problems; interrelated learning tasks; learning process; protein sequence classification; regularized multitask learning; Classification; Feature extraction; Learning systems; Proteins; Sequential analysis; Multi-task learning; protein structure classification
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2014.2338303