DocumentCode
659585
Title
A novel integrated method for human multiplex protein subcellular localization prediction
Author
Hong Gu ; Junzhe Cao
Author_Institution
Sch. of Control Sci. & Eng., Dalian Univ. of Technol., Dalian, China
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
58
Lastpage
62
Abstract
Protein subcellular localization prediction based on machine learning is a research focus in bioinformatics. The rapid growth of protein sequences in databases makes it hard for experts alone to label enough protein samples to train a learner that gives satisfactory prediction results. This paper proposes a novel integrated method for human multiplex protein subcellular localization prediction. To avoid manually evaluating and labeling the large volume of unseen proteins, the method uses an active sample selection algorithm to pick out protein samples with non-experimental labels as supplementary training data. These samples help train an ensemble predictor, which comprises a protein identifying module, a single-label classifier and a multilabel classifier. Numerical experiments show the effectiveness of the proposed approach.
Keywords
Big Data; bioinformatics; learning (artificial intelligence); molecular biophysics; pattern classification; proteins; active sample selection algorithm; big data evaluation; big data labeling; bioinformatics; ensemble predictor; human multiplex protein subcellular localization prediction; machine learning; multilabel classifier; novel integrated method; protein identifying module; protein sequence growth; single-label classifier; Amino acids; Bioinformatics; Classification algorithms; Multiplexing; Prediction algorithms; Proteins; Training; active learning; big data; multiplex protein; protein subcellular localization; transductive learning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691734
Filename
6691734