• DocumentCode
    2347411
  • Title
    Automatic classification of documents by formality
  • Author
    Abu Sheikha, Fadi; Inkpen, Diana
  • Author_Institution
    SITE, Univ. of Ottawa, Ottawa, ON, Canada
  • fYear
    2010
  • fDate
    21-23 Aug. 2010
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    This paper addresses the task of classifying documents as formal or informal in style. We studied the main characteristics of each style in order to choose features that allow us to train classifiers that distinguish between the two styles. We built our data set by collecting documents of both styles from different sources. We tested several classification algorithms, namely Decision Trees, Naïve Bayes, and Support Vector Machines, to choose the classifier that leads to the best classification results. We performed attribute selection in order to determine the contribution of each feature to our model.
  • Keywords
    decision trees; document handling; pattern classification; support vector machines; Naïve Bayes; automatic classification; documents classification; Formal Style; Informal Style; Text Classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-6896-6
  • Type
    conf
  • DOI
    10.1109/NLPKE.2010.5587767
  • Filename
    5587767