• DocumentCode
    590946
  • Title
    Improvement in automatic classification of Persian documents by means of Naïve Bayes and Representative Vector
  • Author
    Jafari, Aghil; Hosseinejad, M.; Amiri, Ali
  • Author_Institution
    Islamic Azad Univ. of Zanjan, Zanjan, Iran
  • fYear
    2011
  • fDate
    13-14 Oct. 2011
  • Firstpage
    226
  • Lastpage
    229
  • Abstract
    A Representative Vector is a vector of related words together with the degree of their relationships. This paper examines the effect of using such vectors on the automatic classification of Persian documents. In the proposed method, documents are first preprocessed: extra words are removed and word stems are found. Next, features are extracted for each category using a known feature-extraction method. The Representative Vector built from these features then contributes additional, more specific words that better represent each category. Experimental results show that removing extra words, adding a few words from the Representative Vectors, and using a well-known classifier such as Naïve Bayes together increase Precision and Recall significantly.
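    The pipeline outlined in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy English stop-word list and suffix stripper stand in for the paper's Persian resources, the k most frequent stems stand in for the unnamed feature-extraction method, and the "degree of relationship" is assumed to be document co-occurrence within a category. All function names (`preprocess`, `select_features`, `representative_vector`, `expand`, `train`, `precision_recall`) are hypothetical.

```python
import math
from collections import Counter

# Toy stand-ins (assumptions): the paper's Persian stop-word list, stemmer,
# and feature-selection method are not specified in this record.
STOP_WORDS = {"the", "a", "of", "and", "in"}
SUFFIXES = ("ing", "s")


def preprocess(text):
    """Drop stop words ("extra words") and strip one suffix as a crude stemmer."""
    tokens = []
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                break
        tokens.append(word)
    return tokens


def select_features(docs, k):
    """Assumed rule: keep the k most frequent stems in a category's documents."""
    counts = Counter(t for doc in docs for t in doc)
    return [w for w, _ in counts.most_common(k)]


def representative_vector(docs, features, threshold):
    """Map each feature to related words and their degree of relationship.

    Assumed degree: the share of the category's documents containing the
    feature that also contain the word; only degrees >= threshold are kept.
    """
    doc_sets = [set(doc) for doc in docs]
    vector = {}
    for f in features:
        with_f = [s for s in doc_sets if f in s]
        co = Counter(w for s in with_f for w in s if w != f)
        vector[f] = {w: c / len(with_f) for w, c in co.items()
                     if c / len(with_f) >= threshold}
    return vector


def expand(features, vector, max_new):
    """Add at most max_new related words to the features, strongest degree first."""
    candidates = {}
    for related in vector.values():
        for w, degree in related.items():
            if w not in features:
                candidates[w] = max(candidates.get(w, 0.0), degree)
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return set(features) | set(ranked[:max_new])


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over a fixed vocabulary."""

    def fit(self, docs_by_class, vocabulary):
        self.vocab = set(vocabulary)
        total_docs = sum(len(d) for d in docs_by_class.values())
        self.log_prior, self.log_lik = {}, {}
        for c, docs in docs_by_class.items():
            counts = Counter(t for doc in docs for t in doc if t in self.vocab)
            denom = sum(counts.values()) + len(self.vocab)
            self.log_prior[c] = math.log(len(docs) / total_docs)
            self.log_lik[c] = {w: math.log((counts[w] + 1) / denom) for w in self.vocab}
        return self

    def predict(self, doc):
        # Tokens outside the expanded vocabulary are ignored.
        scores = {c: self.log_prior[c]
                  + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
                  for c in self.log_prior}
        return max(scores, key=scores.get)


def train(raw_by_class, k=3, threshold=0.5, max_new=2):
    """Preprocess, select and expand features per category, then fit Naive Bayes."""
    docs_by_class = {c: [preprocess(t) for t in texts] for c, texts in raw_by_class.items()}
    vocabulary = set()
    for docs in docs_by_class.values():
        feats = select_features(docs, k)
        vocabulary |= expand(feats, representative_vector(docs, feats, threshold), max_new)
    return NaiveBayes().fit(docs_by_class, vocabulary)


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one category label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    actual = sum(1 for t in y_true if t == label)
    return (tp / predicted if predicted else 0.0, tp / actual if actual else 0.0)
```

    For example, `train({"sport": [...], "science": [...]})` returns a model whose `predict(preprocess(text))` gives a category label. The expansion step only changes which words the classifier counts, so Naive Bayes itself stays unchanged; this mirrors the abstract's claim that a few added words refine each category's representation.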
  • Keywords
    Bayes methods; classification; document handling; Naive Bayes; Persian documents; automatic classification model; feature extraction; representative vector; Computers; Educational institutions; Information retrieval; Semantics; Support vector machine classification; Text categorization; Vectors; Documents Classification; Naïve Bayes Classifier; Representative Vector; Stemming;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Knowledge Engineering (ICCKE), 2011 1st International eConference on
  • Conference_Location
    Mashhad
  • Print_ISBN
    978-1-4673-5712-8
  • Type
    conf
  • DOI
    10.1109/ICCKE.2011.6413355
  • Filename
    6413355