Title :
Dimension reduction for p53 protein recognition by using incremental partial least squares
Author :
Xue-Qiang Zeng ; Guo-Zheng Li
Author_Institution :
Comput. Center, Nanchang Univ., Nanchang, China
Abstract :
p53 is an important tumor suppressor protein: mutated p53 has been found in many kinds of human cancers, and restoring active p53 can lead to tumor regression. In recent years, more and more data have been extracted from biophysical simulations, so modelling the transcriptional activity of mutant p53 suffers from a huge number of instances and a very high feature dimension. Incremental feature extraction is an effective way to facilitate the analysis of such large-scale data. However, most current incremental feature extraction methods are not suitable for processing big data with a high feature dimension. In addition, feature extraction methods should improve the performance of subsequent classification. Incremental feature extraction methods therefore need to be both more efficient and more effective. Partial Least Squares (PLS) has been demonstrated to be an effective dimension reduction technique for classification, but how to apply PLS to big data is still an open problem. In this paper, we design a highly efficient and powerful algorithm named Incremental Partial Least Squares (IPLS), which performs a two-stage extraction process. In the first stage, the PLS target function is made incremental by updating the historical mean, and the leading projection direction is extracted. In the second stage, the remaining projection directions are calculated through the equivalence between the PLS vectors and the Krylov sequence. We compare IPLS with state-of-the-art incremental feature extraction methods such as Incremental Principal Component Analysis, Incremental Maximum Margin Criterion and Incremental Inter-class Scatter on real p53 protein data. Empirical results show that IPLS performs better than these methods in terms of balanced classification accuracy.
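The two-stage process described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' IPLS. It assumes a binary task with labels encoded as +1/-1 (a single-response PLS1 setting), and the class name `IncrementalPLSSketch` and its methods are hypothetical. Stage 1 streams the samples once, updating the historical means and the centred feature/label cross-product s, whose direction is the leading PLS projection direction. Stage 2 relies on the known result that the PLS1 weight vectors span the Krylov sequence s, A s, A^2 s, ... (with A the centred scatter matrix of the features). In this sketch each additional direction costs one more pass over the data; that design choice is an assumption, not something the abstract states.

```python
import numpy as np


class IncrementalPLSSketch:
    """Hypothetical two-stage incremental PLS1 sketch (not the authors' IPLS).

    Stage 1: one streaming pass keeps running means and the centred
    cross-product s = sum_i (x_i - mean_x) * (y_i - mean_y); s / |s| is the
    leading PLS projection direction.
    Stage 2: further directions form an orthonormal basis of the Krylov
    sequence s, A s, A^2 s, ... with A = sum_i (x_i - mean_x)(x_i - mean_x)^T.
    Each product A v takes one extra pass but only O(d) memory (A is never stored).
    """

    def __init__(self, n_features):
        self.n = 0
        self.mean_x = np.zeros(n_features)
        self.mean_y = 0.0
        self.s = np.zeros(n_features)  # centred cross-product X_c^T y_c

    def partial_fit(self, x, y):
        """Stage 1: absorb one sample, updating the historical means."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        dx = x - self.mean_x                  # deviation from the old mean
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.s += dx * (y - self.mean_y)      # Welford-style co-moment update

    def _apply_scatter(self, stream, v):
        """One pass computing A v = sum_i (x_i - mean_x) * ((x_i - mean_x) . v)."""
        out = np.zeros_like(v)
        for x in stream:
            xc = np.asarray(x, dtype=float) - self.mean_x
            out += xc * (xc @ v)
        return out

    def directions(self, make_stream, k):
        """Stage 2: orthonormal (Arnoldi-style) basis of the Krylov space K_k(A, s).

        make_stream() must return a fresh iterable over the feature vectors.
        """
        basis = [self.s / np.linalg.norm(self.s)]  # leading projection direction
        for _ in range(k - 1):
            v = self._apply_scatter(make_stream(), basis[-1])
            for b in basis:                   # modified Gram-Schmidt
                v -= (b @ v) * b
            norm = np.linalg.norm(v)
            if norm < 1e-12:                  # Krylov space exhausted
                break
            basis.append(v / norm)
        return np.column_stack(basis)


# Usage on synthetic data (not p53 data): 200 samples, 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)

model = IncrementalPLSSketch(n_features=50)
for xi, yi in zip(X, y):                      # stage 1: single streaming pass
    model.partial_fit(xi, yi)
W = model.directions(lambda: iter(X), k=3)    # stage 2: two extra passes
Z = (X - model.mean_x) @ W                    # reduced 200 x 3 features
```

Multiplying A by the most recent orthonormal direction (as in the Arnoldi process), rather than orthonormalizing the raw powers A^i s, spans the same subspace while avoiding the poor conditioning of the raw Krylov vectors. The reduced features `Z` would then be passed to a classifier.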
Keywords :
data analysis; feature extraction; least squares approximations; medical computing; pattern classification; principal component analysis; proteins; tumours; IPLS; Incremental Inter-class Scatter; Incremental Maximum Margin Criterion; Incremental Partial Least Squares; Incremental Principal Component Analysis; Krylov sequence; balanced classification accuracy; biophysical simulations; data extraction; data processing; dimension reduction technique; human cancers; incremental feature extraction; incremental partial least squares; mutated p53; p53 protein recognition; transcriptional activity; tumor regression; tumor suppressor protein; two-stage extraction process; Cancer; Data handling; Data storage systems; Feature extraction; Information management; Proteins; Vectors; Big Data; Feature Extraction; Incremental Learning; Partial Least Squares; p53 Protein;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Conference_Location :
Shanghai
DOI :
10.1109/BIBM.2013.6732522