• DocumentCode
    1971310
  • Title
    Document classification efficiency of phrase-based techniques
  • Author
    Kapalavayi, Nagesh; Murthy, S. N Jayaram; Hu, Gongzhu
  • Author_Institution
    Dept. of Comput. Sci., Central Michigan Univ., Mount Pleasant, MI
  • fYear
    2009
  • fDate
    10-13 May 2009
  • Firstpage
    174
  • Lastpage
    178
  • Abstract
    Due to the exponential growth of text documents available in digital form, it is important to develop techniques for automatically classifying documents by their textual content. Earlier document classification techniques used keyword-based features and related statistics, and achieved good results on certain datasets. More recently, some of these techniques have been extended to include phrase-based and concept-based features to achieve better results. Because the datasets used by different research groups have markedly different characteristics, the efficiency of these methods cannot be compared directly. In this paper, we present a study that uses a single dataset to compare the efficiency of a phrase-based technique with that of keyword-based techniques. The results show conclusively that phrase-based features are very effective in document classification.
  • Keywords
    classification; statistical analysis; text analysis; document classification; keyword based feature; phrase based technique; statistical dataset; text document; textual content; Computer science; Data engineering; Data mining; Databases; Information retrieval; Natural language processing; Programming profession; Statistics; Synthetic aperture sonar; Text mining; document classification; keyword-based and phrase-based features; text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Systems and Applications, 2009. AICCSA 2009. IEEE/ACS International Conference on
  • Conference_Location
    Rabat
  • Print_ISBN
    978-1-4244-3807-5
  • Electronic_ISBN
    978-1-4244-3806-8
  • Type
    conf
  • DOI
    10.1109/AICCSA.2009.5069321
  • Filename
    5069321