Title :
Enhancing techniques for efficient topic hierarchy integration
Author :
Tsay, Jyh-Jong ; Chang, Chi-Feng ; Chen, Hsuan-Yu ; Lin, Ching-Han
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chung Cheng Univ., Chiayi, Taiwan
Abstract :
We study the problem of integrating documents from different sources into a comprehensive topic hierarchy. Our objective is to develop efficient techniques that improve the accuracy of traditional categorization methods by incorporating categorization information provided by the data sources into the categorization process. On the World Wide Web, such categorization information is often available from the information sources themselves. We present several enhancing techniques that use this information to improve traditional methods such as naive Bayes and support vector machines (SVM). Experiments on collections from Openfind and Yam, and from Google and Yahoo!, popular Web sites in Taiwan and the USA, respectively, show that our techniques significantly improve classification accuracy: for example, from 55% to 66% for naive Bayes and from 57% to 67% for SVM on the data set collected from Yam and Openfind.
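The abstract does not spell out the enhancing techniques themselves, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one plausible way to use source categorization information: a multinomial naive Bayes classifier whose class prior is conditioned on the category the document had at its source. The class name `SourceAwareNaiveBayes`, the toy training data, the category names, and the smoothing constant are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class SourceAwareNaiveBayes:
    """Multinomial naive Bayes with a prior conditioned on the source category.

    Illustrative assumption only: the paper's actual techniques are not
    described in the abstract.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs):
        # docs: iterable of (tokens, source_category, target_category)
        self.word_counts = defaultdict(Counter)  # target -> word counts
        self.pair_counts = defaultdict(Counter)  # source -> target counts
        self.target_counts = Counter()           # target -> document count
        self.vocab = set()
        for tokens, source, target in docs:
            self.word_counts[target].update(tokens)
            self.pair_counts[source][target] += 1
            self.target_counts[target] += 1
            self.vocab.update(tokens)
        self.targets = sorted(self.target_counts)
        return self

    def _log_prior(self, target, source, use_source):
        # Use P(target | source) when the source category was seen in
        # training; otherwise fall back to the plain prior P(target).
        if use_source and source in self.pair_counts:
            counts = self.pair_counts[source]
        else:
            counts = self.target_counts
        total = sum(counts.values())
        smoothed = counts[target] + self.alpha
        return math.log(smoothed / (total + self.alpha * len(self.targets)))

    def _log_likelihood(self, tokens, target):
        counts = self.word_counts[target]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        # Tokens outside the training vocabulary are ignored.
        return sum(
            math.log((counts[t] + self.alpha) / denom)
            for t in tokens
            if t in self.vocab
        )

    def predict(self, tokens, source=None, use_source=True):
        scores = {
            c: self._log_prior(c, source, use_source)
            + self._log_likelihood(tokens, c)
            for c in self.targets
        }
        return max(scores, key=scores.get)


# Hypothetical toy data: (tokens, source category, target category).
train = [
    (["stock", "market", "bank"], "Finance", "Business"),
    (["bank", "loan", "rate"], "Finance", "Business"),
    (["river", "bank", "fishing"], "Outdoors", "Recreation"),
    (["trail", "river", "camping"], "Outdoors", "Recreation"),
]
model = SourceAwareNaiveBayes().fit(train)

# The ambiguous word "bank" is resolved differently once the source
# category is taken into account.
print(model.predict(["bank"], use_source=False))
print(model.predict(["bank"], source="Outdoors"))
```

In this toy setting the text alone leans toward one target category, while the source category shifts the prior enough to change the decision. Conditioning the prior is only one of several ways such information could be injected; adding the source category as an extra feature would be another.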
Keywords :
Bayes methods; Web sites; document handling; support vector machines; Naive Bayes method; World-Wide Web; categorization information; data set; data sources; document integration; information sources; Bayesian methods; Classification tree analysis; Computer science; Councils; Decision trees; Neural networks; Niobium; Support vector machine classification; Text categorization
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
DOI :
10.1109/ICDM.2003.1251001