• DocumentCode
    3046990
  • Title

    Document Categorization with Entropy Based TF/IDF Classifier

  • Author

    Lu, Yi-Hong ; Huang, Yan

  • Author_Institution
    Coll. of Inf. Eng., Zhejiang Univ. of Technology, Hangzhou, China
  • Volume
    4
  • fYear
    2009
  • fDate
    19-21 May 2009
  • Firstpage
    269
  • Lastpage
    273
  • Abstract
    The task of text categorization is to assign a given text document to one or more predefined categories. The high availability of digital data requires methods for processing it automatically, and the day-by-day growth of this data creates a need for faster and better text classifiers. This paper focuses on classifying data in the context of text categorization. It reports a study conducted on the 20 Newsgroups dataset, using TFIDF for document categorization. Feature selection is added to improve the categorization. The results achieved using this algorithm are very promising when compared to conventional methods with features chosen on the basis of bag-of-words text.
  • Keywords
    text analysis; TF/IDF classifier; bag-of-words text; document categorization; entropy; feature selection; mutual gain information; text categorization; Availability; Clustering algorithms; Computer science; Educational institutions; Information filtering; Information filters; Intelligent systems; Mutual information; TFIDF
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems, 2009. GCIS '09. WRI Global Congress on
  • Conference_Location
    Xiamen
  • Print_ISBN
    978-0-7695-3571-5
  • Type

    conf

  • DOI
    10.1109/GCIS.2009.311
  • Filename
    5209295