Title :
Predicting latent attributes of Twitter user by employing lexical features
Author :
Siswanto, Elisafina ; Khodra, Masayu Leylia
Author_Institution :
Sch. of Electr. Eng. & Inf., Inst. Teknol. Bandung, Bandung, Indonesia
Abstract :
The rapid growth of social media in Indonesia, especially Twitter, has produced a large amount of user-generated text in the form of tweets. Since Twitter only provides the name and location of its users, we develop a classification system that predicts latent attributes of a Twitter user based on the user's tweets. A latent attribute is an attribute that is not stated directly. Our system predicts the age and job attributes of Twitter users who write in Indonesian. The classification model is developed by employing lexical features and three learning algorithms (Naïve Bayes, SVM, and Random Forest). The experimental results indicate that SVM produces the best accuracy on balanced data.
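The idea of classifying users from lexical features can be illustrated with a minimal sketch. It assumes the simplest possible lexical feature (whitespace-separated word counts) and implements only a multinomial Naïve Bayes classifier with add-one smoothing. The tweets, the labels "teen" and "adult", and the names tokenize and NaiveBayes are all hypothetical and chosen for illustration; this is not the authors' feature set, data, or implementation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lexical features are lowercase whitespace-separated tokens.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior of the class.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                # Smoothed log likelihood of the token given the class.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: invented tweets with invented age labels.
train_texts = [
    "besok ujian sekolah pr banyak",
    "main game sama teman sekolah",
    "rapat kantor sampai malam",
    "gaji bulan ini buat cicilan rumah",
]
train_labels = ["teen", "teen", "adult", "adult"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("ujian sekolah besok"))
print(model.predict("rapat kantor"))
```

A realistic system would aggregate many tweets per user, use richer lexical features, and compare this baseline against SVM and Random Forest on held-out data, as the abstract describes.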
Keywords :
Bayes methods; feature extraction; learning (artificial intelligence); natural language processing; pattern classification; social networking (online); support vector machines; text analysis; trees (mathematics); Indonesian language; SVM method; Twitter user; age attribute; classification model; classification system; job attribute; latent attribute predicting; learning algorithm; lexical features; naive Bayes; random forest; social media; tweets; user generated text; user location; user name; Twitter; age; classification; job; lexical; machine learning
Conference_Title :
Information Technology and Electrical Engineering (ICITEE), 2013 International Conference on
Conference_Location :
Yogyakarta
Print_ISBN :
978-1-4799-0423-5
DOI :
10.1109/ICITEED.2013.6676234