Title :
A Comparative Study on Supervised and Unsupervised Learning Approaches for Multilingual Text Categorization
Author :
Lee, Chung-Hong ; Yang, Hsin-Chang ; Chen, Ting-Chung ; Ma, Sheng-Min
Author_Institution :
Dept. of Electr. Eng., Nat. Kaohsiung Univ. of Appl. Sci.
fDate :
Aug. 30 2006-Sept. 1 2006
Abstract :
Users of internationally distributed information networks increasingly need tools and methods that enable them to discover, retrieve, and categorize relevant information, in whatever language and form it may have been stored. This need has drawn together diverse research communities around the problem of multilingual text categorization. In this work we compare and evaluate the performance of leading supervised and unsupervised approaches to multilingual text categorization, using various performance measures and standard document corpora. For simplicity, we selected support vector machines (SVM) and latent semantic indexing (LSI) as representatives of supervised and unsupervised methods, respectively. The preliminary results show that our platform models, which include both supervised and unsupervised learning methods, have potential for multilingual text categorization.
Keywords :
indexing; semantic Web; support vector machines; text analysis; unsupervised learning; SVM; distributed information network; information retrieval; latent semantic indexing; multilingual text categorization; supervised learning; Convergence; Humans; Large scale integration; Measurement standards; Natural languages; Text categorization
Conference_Titel :
Innovative Computing, Information and Control, 2006. ICICIC '06. First International Conference on
Conference_Location :
Beijing
Print_ISBN :
0-7695-2616-0
DOI :
10.1109/ICICIC.2006.189