Title :
Business email classification using incremental subspace learning
Author :
Min Li ; Youngja Park ; Rui Ma ; He Yuan Huang
Author_Institution :
IBM Res. China, China
Abstract :
We consider a new text classification task: classifying enterprise email messages into sensitive business topics. Identifying sensitive topics in email messages helps enterprises safeguard critical data such as intellectual property and trade secrets. We apply incremental PCA (Principal Component Analysis) to email representation; it learns a feature subspace incrementally and effectively reduces the feature dimensionality. A linear SVM (Support Vector Machine) is then used to learn the classification models. We validate our approach with 5,000 emails extracted from the Enron email set. Experimental results show that SVM outperforms other classification methods, and that incremental PCA yields a substantial reduction in processing time and a slight increase in classification accuracy compared to SVM with all the features.
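The pipeline described in the abstract (incremental subspace learning followed by a linear SVM) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's `IncrementalPCA` and `LinearSVC`, whose update rule may differ from the incremental PCA variant in the paper, and it runs on synthetic vectors rather than the Enron data. The sizes, batch length, and variable names are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for per-email term-weight vectors (NOT the Enron data).
# All sizes here are illustrative assumptions.
n_emails, n_terms, n_components, batch_size = 600, 200, 20, 100
labels = rng.integers(0, 2, size=n_emails)
features = rng.normal(size=(n_emails, n_terms))
# Shift a few "terms" for the positive class so the two classes are separable.
features[:, :5] += 2.0 * labels[:, None]

# Learn the feature subspace incrementally, one batch of emails at a time.
ipca = IncrementalPCA(n_components=n_components)
for start in range(0, n_emails, batch_size):
    ipca.partial_fit(features[start:start + batch_size])

# Project every email into the reduced subspace.
reduced = ipca.transform(features)

# Train a linear SVM on the reduced features; hold out the last 100 emails.
clf = LinearSVC()
clf.fit(reduced[:500], labels[:500])
accuracy = clf.score(reduced[500:], labels[500:])
print(reduced.shape, accuracy)
```

Because `partial_fit` only sees one batch at a time, the full feature matrix never has to be decomposed at once, which is the property the abstract credits for the reduced processing time.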
Keywords :
electronic mail; feature extraction; learning (artificial intelligence); pattern classification; principal component analysis; security of data; support vector machines; text analysis; Enron email set; business email classification; classification models; email extraction; email representation; enterprise email message classification; feature dimensionality; feature subspace; incremental PCA; incremental principal component analysis; incremental subspace learning; intellectual properties; linear SVM; linear support vector machine; sensitive business topics; sensitive topic identification; substantial reduction; text classification task; trade secrets; Accuracy; Companies; Electronic mail; Feature extraction; Principal component analysis; Support vector machines;
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
Print_ISBN :
978-1-4673-2216-4