  • DocumentCode
    650200
  • Title
    Predicting latent attributes of Twitter user by employing lexical features
  • Author
    Siswanto, Elisafina; Khodra, Masayu Leylia

  • Author_Institution
    Sch. of Electr. Eng. & Inf., Inst. Teknol. Bandung, Bandung, Indonesia
  • fYear
    2013
  • fDate
    7-8 Oct. 2013
  • Firstpage
    176
  • Lastpage
    180
  • Abstract
    The rapid growth of social media in Indonesia, especially Twitter, has produced a large amount of user-generated text in the form of tweets. Since Twitter only provides the name and location of its users, we develop a classification system that predicts latent attributes of a Twitter user from his or her tweets. A latent attribute is an attribute that is not stated directly. Our system predicts the age and job attributes of Twitter users who write in Indonesian. The classification model is developed by employing lexical features and three learning algorithms (Naïve Bayes, SVM, and Random Forest). The experimental results show that the SVM method produces the best accuracy for balanced data.
  • Keywords
    Bayes methods; feature extraction; learning (artificial intelligence); natural language processing; pattern classification; social networking (online); support vector machines; text analysis; trees (mathematics); Indonesian language; SVM method; Twitter user; age attribute; classification model; classification system; job attribute; latent attribute predicting; learning algorithm; lexical features; naive Bayes; random forest; social media; tweets; user generated text; user location; user name; Twitter; age; classification; job; lexical; machine learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology and Electrical Engineering (ICITEE), 2013 International Conference on
  • Conference_Location
    Yogyakarta
  • Print_ISBN
    978-1-4799-0423-5
  • Type
    conf
  • DOI
    10.1109/ICITEED.2013.6676234
  • Filename
    6676234
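
The abstract names lexical features and a Naïve Bayes classifier but gives no implementation details. The following is a minimal sketch of that general idea, not the authors' system. It assumes word-unigram counts as the lexical features and Laplace (add-one) smoothing. The class `NaiveBayes`, the function `tokenize`, and the English toy tweets and job labels are all hypothetical and are not taken from the paper's data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word tokens (unigram lexical features)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing (illustrative)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # number of tweets per class
        self.word_counts = defaultdict(Counter)   # per-class word frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: invented tweets with an invented "job" attribute.
train_texts = [
    "exam tomorrow need to finish homework",
    "lecture cancelled going to campus library",
    "meeting with client then office report",
    "deadline at office boss wants the report",
]
train_labels = ["student", "student", "employee", "employee"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction_a = model.predict("homework and exam at campus")
prediction_b = model.predict("client meeting at the office")
```

The same feature counts could be passed to an SVM or Random Forest implementation for comparison; the record does not state which libraries or feature variants the paper used.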