• DocumentCode
    179755
  • Title

    Ensemble random projection for multi-label classification with application to protein subcellular localization

  • Author

    Shibiao Wan ; Man-Wai Mak ; Bai Zhang ; Yue Wang ; Sun-Yuan Kung

  • Author_Institution
    Dept. of Electron. & Inf. Eng., Hong Kong Polytech. Univ., Hong Kong, China
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    5999
  • Lastpage
    6003
  • Abstract
    The curse of dimensionality severely restricts the predictive power of multi-label classification systems. High-dimensional feature vectors may contain redundant or irrelevant information, causing the classification systems to suffer from overfitting. To address this problem, this paper proposes a dimensionality-reduction method that applies random projection (RP) to construct an ensemble of multi-label classifiers. The merits of the proposed method are demonstrated through a multi-label protein classification task. Specifically, high-dimensional feature vectors are extracted from protein sequences using the gene ontology (GO) and Swiss-Prot databases. The feature vectors are then projected onto lower-dimensional spaces by random projection matrices whose elements are drawn from a distribution with zero mean and unit variance. The transformed low-dimensional vectors are classified by an ensemble of one-vs-rest multi-label support vector machine (SVM) classifiers, each corresponding to one of the RP matrices. The scores obtained from the ensemble are then fused to predict the subcellular localization of proteins. Experimental results suggest that the proposed method can reduce the feature dimension sevenfold while improving classification performance.
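
The pipeline described in the abstract (random projection, one classifier set per projection matrix, score fusion) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the matrix entries are drawn from a standard Gaussian (the abstract only requires zero mean and unit variance), the 1/sqrt(d_low) scaling is a common convention rather than something the abstract states, fusion is a plain average, and a simple centroid-difference linear scorer stands in for the one-vs-rest SVMs so that the example needs only the standard library. All function names and the toy data are hypothetical.

```python
import random

def rp_matrix(d_low, d_high, rng):
    # Assumption: i.i.d. N(0, 1) entries (zero mean, unit variance).
    return [[rng.gauss(0.0, 1.0) for _ in range(d_high)] for _ in range(d_low)]

def project(R, x):
    # Map x from d_high to d_low dimensions; 1/sqrt(d_low) scaling is an assumption.
    scale = len(R) ** -0.5
    return [scale * sum(r * v for r, v in zip(row, x)) for row in R]

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fit_scorers(X, Y, n_labels):
    # Stand-in for one-vs-rest SVMs: one linear scorer per label, built from
    # the difference of class centroids. Assumes every label has at least one
    # positive and one negative training sample.
    scorers = []
    for k in range(n_labels):
        pos = mean([x for x, y in zip(X, Y) if k in y])
        neg = mean([x for x, y in zip(X, Y) if k not in y])
        w = [p - q for p, q in zip(pos, neg)]
        b = -sum(wi * (p + q) / 2 for wi, p, q in zip(w, pos, neg))
        scorers.append((w, b))
    return scorers

def fit_ensemble(X, Y, n_labels, d_low, n_members, seed=0):
    # One random projection matrix and one set of per-label scorers per member.
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_members):
        R = rp_matrix(d_low, len(X[0]), rng)
        ensemble.append((R, fit_scorers([project(R, x) for x in X], Y, n_labels)))
    return ensemble

def fused_scores(ensemble, x):
    # Fuse by averaging each label's score over all ensemble members.
    n_labels = len(ensemble[0][1])
    totals = [0.0] * n_labels
    for R, scorers in ensemble:
        z = project(R, x)
        for k, (w, b) in enumerate(scorers):
            totals[k] += sum(wi * zi for wi, zi in zip(w, z)) + b
    return [t / len(ensemble) for t in totals]

def predict(ensemble, x):
    # Multi-label decision: every label with a positive fused score;
    # fall back to the top-scoring label if none is positive (an assumption).
    scores = fused_scores(ensemble, x)
    labels = {k for k, s in enumerate(scores) if s > 0}
    return labels or {max(range(len(scores)), key=scores.__getitem__)}

# Toy data: six 20-dimensional samples, two labels, some samples carrying both.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(6)]
Y = [{0}, {1}, {0, 1}, {0}, {1}, {0, 1}]
ensemble = fit_ensemble(X, Y, n_labels=2, d_low=5, n_members=3)
labels = predict(ensemble, X[0])
```

In practice the centroid scorer would be replaced by trained SVMs, and the number of ensemble members and the projected dimension would be tuned on held-out data.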
  • Keywords
    cellular biophysics; medical signal processing; proteins; support vector machines; SVM; Swiss-Prot databases; dimensionality-reduction method; gene ontology; multilabel classification systems; multilabel classifiers; multilabel protein classification; protein subcellular localization; proteins; support vector machine classifiers; Accuracy; Conferences; Feature extraction; Ontologies; Proteins; Support vector machines; Vectors; Dimension reduction; Multi-label classification; Protein subcellular localization; Random projection; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type

    conf

  • DOI
    10.1109/ICASSP.2014.6854755
  • Filename
    6854755