DocumentCode :
594792
Title :
Business email classification using incremental subspace learning
Author :
Min Li ; Youngja Park ; Rui Ma ; He Yuan Huang
Author_Institution :
IBM Res. China, China
fYear :
2012
fDate :
11-15 Nov. 2012
Firstpage :
625
Lastpage :
628
Abstract :
We consider a new text classification task: classifying enterprise email messages into sensitive business topics. Identifying sensitive topics in email messages is important for enterprises that need to safeguard critical data such as intellectual property and trade secrets. We apply incremental PCA (Principal Component Analysis) to email representation; it learns a feature subspace incrementally and effectively reduces the feature dimensionality. A linear SVM (Support Vector Machine) is then used to learn the classification models. We validate our approach with 5,000 emails extracted from the Enron email set. Experimental results show that SVM outperforms other classification methods, and that incremental PCA yields a substantial reduction in processing time and a slight increase in classification accuracy compared to SVM with all the features.
Keywords :
electronic mail; feature extraction; learning (artificial intelligence); pattern classification; principal component analysis; security of data; support vector machines; text analysis; Enron email set; business email classification; classification models; email extraction; email representation; enterprise email message classification; feature dimensionality; feature subspace; incremental PCA; incremental principal component analysis; incremental subspace learning; intellectual properties; linear SVM; linear support vector machine; sensitive business topics; sensitive topic identification; substantial reduction; text classification task; trade secrets; Accuracy; Companies; Electronic mail; Feature extraction; Principal component analysis; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
ISSN :
1051-4651
Print_ISBN :
978-1-4673-2216-4
Type :
conf
Filename :
6460212