• DocumentCode
    3734255
  • Title
    Effect of different feature types on age based classification of short texts
  • Author
    Avar Pentel
  • Author_Institution
    Institute of Informatics, Tallinn University, Tallinn, Estonia
  • fYear
    2015
  • fDate
    7/1/2015 12:00:00 AM
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    The aim of the current study is to compare the effect of three different feature types on age-based classification of short texts averaging 85 words per author. Besides the widely used word and character n-grams, text readability features are proposed as an alternative. By readability features we mean various ratios of text elements, such as characters per word, words per sentence, etc. Support Vector Machines, Logistic Regression, and Bayesian algorithms were used to build the models. The most effective features were readability features and character n-grams. The model generated by the Support Vector Machine with the combined feature set yielded an F-score of 0.968. An age prediction application was built using a model with readability features.
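    The readability features described above are ratios of text elements. The following is a minimal sketch of two of them (characters per word, words per sentence), assuming a simple regex tokenization. The function name `readability_features` and the tokenization rules are hypothetical illustrations, not the study's actual feature extractor.

```python
import re


def readability_features(text: str) -> dict[str, float]:
    """Compute two ratio-style readability features for one text.

    Assumption (hypothetical): sentences end at runs of '.', '!' or '?',
    and words are runs of letters, digits, or apostrophes.
    """
    # Split into sentences and drop empty fragments (e.g. after a final '.').
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)

    n_words = len(words)
    n_sentences = max(len(sentences), 1)  # avoid division by zero
    n_chars = sum(len(w) for w in words)  # characters counted inside words only

    return {
        "chars_per_word": n_chars / n_words if n_words else 0.0,
        "words_per_sentence": n_words / n_sentences,
    }


if __name__ == "__main__":
    print(readability_features("I like cats. They are nice!"))
```

    Each text then becomes a small numeric vector that could be passed to a classifier such as an SVM; the full feature set used in the study may differ.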
  • Keywords
    "Feature extraction","Support vector machines","Indexes","Classification algorithms","Training","Logistics"
  • Publisher
    ieee
  • Conference_Titel
    Information, Intelligence, Systems and Applications (IISA), 2015 6th International Conference on
  • Type
    conf
  • DOI
    10.1109/IISA.2015.7388069
  • Filename
    7388069