• DocumentCode
    3359122
  • Title
    Automatic Text Classification of sports blog data
  • Author
    Dalal, Mita K.; Zaveri, Mukesh A.
  • Author_Institution
    Inf. Technol. Dept., Sarvajanik Coll. of Eng. & Technol., Surat, India
  • fYear
    2012
  • fDate
    11-13 Jan. 2012
  • Firstpage
    219
  • Lastpage
    222
  • Abstract
    Automatic Text Classification is a semi-supervised machine learning task that automatically assigns a given text document to a set of pre-defined categories based on the features extracted from its textual content. This paper attempts to automatically classify the textual entries made by bloggers on various sports blogs into the appropriate category of sport through steps such as pre-processing, feature extraction and naïve Bayesian classification. Empirical evaluation of this technique yielded a classification accuracy of approximately 87% over the test set. In addition to classifying the textual entries of sports blogs, it is proposed that the extracted features themselves be further classified under more meaningful heads, which results in the generation of a semantic resource that lends greater understanding to the classification task. This semantic resource can be used for data mining requirements that arise in the future.
  • Keywords
    Bayes methods; Web sites; data mining; feature extraction; learning (artificial intelligence); pattern classification; semantic Web; sport; text analysis; automatic text classification; data mining requirements; feature extraction; naïve Bayesian classification; semantic resource; semi-supervised machine learning task; sports blog data; text document; Accuracy; Bayesian methods; Blogs; Feature extraction; Semantics; Text categorization; Training; automatic text classification; feature extraction; heuristics; intelligent data mining; machine learning; naïve Bayes classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing, Communications and Applications Conference (ComComAp), 2012
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4577-1717-8
  • Type
    conf
  • DOI
    10.1109/ComComAp.2012.6154802
  • Filename
    6154802
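
The pipeline named in the abstract (pre-processing, feature extraction, naïve Bayesian classification) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a multinomial naïve Bayes model with add-one smoothing over bag-of-words counts, and the stopword list, the training sentences and the names `tokenize`, `train` and `classify` are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; the record does not say which one the paper used.
STOPWORDS = {"the", "a", "an", "in", "of", "to", "and", "was", "is", "on", "for", "from"}


def tokenize(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def train(labelled_docs):
    """Feature extraction: per-class word counts plus class frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log posterior under naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    # Words never seen in training carry no evidence, so they are skipped.
    tokens = [t for t in tokenize(text) if t in vocab]
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set (invented for this sketch, not the paper's blog data).
TRAIN_DATA = [
    ("The batsman scored a century and the bowler took five wickets", "cricket"),
    ("A late wicket and a quick innings from the batsman", "cricket"),
    ("The striker scored a goal from a penalty kick", "football"),
    ("The goalkeeper saved the penalty and the striker missed", "football"),
]

model = train(TRAIN_DATA)
print(classify("The bowler took a wicket", model))
print(classify("Goalkeeper and striker both went for the penalty", model))
```

Working in log space keeps the product of many small word probabilities from underflowing; a real evaluation such as the one reported in the abstract would also need a held-out test set, which this sketch omits.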