DocumentCode :
2792622
Title :
Experiments on Supervised Learning Algorithms for Text Categorization
Author :
Namburu, Setu Madhavi ; Tu, Haiying ; Luo, Jianhui ; Pattipati, Krishna R.
Author_Institution :
Dept. of Electr. & Comput. Eng., Connecticut Univ., Storrs, CT
fYear :
2005
fDate :
5-12 March 2005
Firstpage :
1
Lastpage :
8
Abstract :
Modern information society is facing the challenge of handling massive volumes of online documents, news, intelligence reports, and so on. How to use this information accurately and in a timely manner has become a major concern in many areas. While general information may also include images and voice, we focus on the categorization of text data in this paper. We provide a brief overview of the information processing flow for text categorization, and discuss two supervised learning algorithms, viz., support vector machines (SVM) and partial least squares (PLS), which have been successfully applied in other domains, e.g., fault diagnosis. While SVM has been well explored for binary classification and was reported as an efficient algorithm for text categorization, PLS has not yet been applied to text categorization. Our experiments are conducted on three data sets: the Reuters-21578 dataset about corporate mergers and data acquisitions (ACQ), WebKB, and 20-Newsgroups. Results show that the performance of PLS is comparable to SVM in text categorization. A major drawback of SVM for multi-class categorization is that it requires a voting scheme based on the results of pair-wise classification. PLS does not have this drawback and could be a better candidate for multi-class text categorization.
Keywords :
classification; learning (artificial intelligence); least squares approximations; support vector machines; text analysis; 20-Newsgroups; Reuters-21578 dataset; WebKB; binary classification; corporate mergers; data acquisitions; information processing flow; multiclass categorization; pair-wise classification; partial least squares; supervised learning algorithms; support vector machines; text categorization; voting scheme; Corporate acquisitions; Data acquisition; Document handling; Fault diagnosis; Information processing; Least squares methods; Supervised learning; Support vector machine classification; Support vector machines; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Aerospace Conference, 2005 IEEE
Conference_Location :
Big Sky, MT
Print_ISBN :
0-7803-8870-4
Type :
conf
DOI :
10.1109/AERO.2005.1559612
Filename :
1559612