• DocumentCode
    2370940
  • Title
    Enhancing techniques for efficient topic hierarchy integration
  • Author
    Tsay, Jyh-Jong ; Chang, Chi-Feng ; Chen, Hsuan-Yu ; Lin, Ching-Han
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Chung Cheng Univ., Chiayi, Taiwan
  • fYear
    2003
  • fDate
    19-22 Nov. 2003
  • Firstpage
    657
  • Lastpage
    660
  • Abstract
    We study the problem of integrating documents from different sources into a comprehensive topic hierarchy. Our objective is to develop efficient techniques that improve the accuracy of traditional categorization methods by incorporating categorization information provided by the data sources into the categorization process. On the World-Wide Web, such categorization information is often available from the information sources. We present several enhancing techniques that use this information to improve traditional methods such as naive Bayes and support vector machines (SVM). Experiments on collections from Openfind and Yam, and from Google and Yahoo!, popular Web sites in Taiwan and the USA, respectively, show that our techniques significantly improve classification accuracy: for example, from 55% to 66% for naive Bayes and from 57% to 67% for SVM on the data set collected from Yam and Openfind.
  • Keywords
    Bayes methods; Web sites; document handling; support vector machines; Naive Bayes method; World-Wide Web; categorization information; data set; data sources; document integration; information sources; Bayesian methods; Classification tree analysis; Computer science; Councils; Decision trees; Neural networks; Niobium; Support vector machine classification; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
  • Print_ISBN
    0-7695-1978-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2003.1251001
  • Filename
    1251001