• DocumentCode
    3273803
  • Title

    Document Classification through Building Specified N-Gram

  • Author

    Ko, Byeongkyu ; Choi, Dongjin ; Choi, Chang ; Choi, Junho ; Kim, Pankoo

  • Author_Institution
    Dept. of Comput. Eng., Chosun Univ., Gwangju, South Korea
  • fYear
    2012
  • fDate
    4-6 July 2012
  • Firstpage
    171
  • Lastpage
    176
  • Abstract
    This paper proposes a method for classifying textual documents using a specified n-gram data set. Web documents hold great potential, and the amount of valuable information on the web has grown consistently over the years. Because the web is so large, finding the documents that match what users want has become more difficult. Many approaches have been suggested to overcome this obstacle, and the most important task among them is classifying textual documents into predefined categories. Although many statistical approaches have been introduced over the years, none has yet provided a perfect solution. In this paper, we suggest a method for textual document classification using an n-gram model. N-gram frequency data has great potential for finding similarities between documents. For this reason, we construct our own n-gram data sets from research papers. When an unknown document is submitted, the system extracts n-grams from it. The n-grams from the unknown document are then compared with the n-grams in the previously built data sets using the proposed similarity measurement. The precision rate of this method reaches 86%.
  • Keywords
    Internet; pattern classification; statistical analysis; text analysis; Web documents; document similarity measurement; n-gram data set frequency extraction; precision rate; statistical approaches; textural document classification; Buildings; Computers; Databases; Google; HTML; Support vector machines; Training; Document Classification; N-gram; NLP; Statistical Language Modeling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2012 Sixth International Conference on
  • Conference_Location
    Palermo
  • Print_ISBN
    978-1-4673-1328-5
  • Type

    conf

  • DOI
    10.1109/IMIS.2012.142
  • Filename
    6296850
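
The pipeline outlined in the abstract (build per-category n-gram frequency sets, extract n-grams from an unknown document, compare them with a similarity measure) could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the proposed similarity measurement, so cosine similarity over n-gram counts is swapped in as a stand-in, and the function names, whitespace tokenization, and bigram default are all hypothetical rather than taken from the paper.

```python
from collections import Counter
from math import sqrt


def extract_ngrams(text, n=2):
    """Count word-level n-grams in a text (simple lowercase whitespace tokenization)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram frequency Counters.

    ASSUMPTION: a stand-in for the paper's similarity measurement,
    which the abstract does not specify.
    """
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(document, category_profiles, n=2):
    """Return the category whose n-gram profile is most similar to the document."""
    doc_ngrams = extract_ngrams(document, n)
    return max(
        category_profiles,
        key=lambda cat: cosine_similarity(doc_ngrams, category_profiles[cat]),
    )


# Toy category profiles; the paper builds its data sets from research papers.
profiles = {
    "nlp": extract_ngrams("statistical language model for text classification"),
    "networks": extract_ngrams("wireless sensor network routing protocol"),
}
predicted = classify("a statistical language model", profiles)
```

In this toy setup, `predicted` is the label of the profile that shares the most bigrams with the input. A real system would need far larger per-category data sets and whatever measure the paper actually defines.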