DocumentCode
650200
Title
Predicting latent attributes of Twitter user by employing lexical features
Author
Siswanto, Elisafina ; Khodra, Masayu Leylia
Author_Institution
Sch. of Electr. Eng. & Inf., Inst. Teknol. Bandung, Bandung, Indonesia
fYear
2013
fDate
7-8 Oct. 2013
Firstpage
176
Lastpage
180
Abstract
The rapid growth of social media, especially Twitter in Indonesia, has produced a large amount of user-generated text in the form of tweets. Since Twitter provides only the name and location of its users, we develop a classification system that predicts latent attributes of Twitter users based on their tweets. A latent attribute is an attribute that is not stated directly. Our system predicts the age and job attributes of Twitter users who write in Indonesian. The classification model is developed by employing lexical features and three learning algorithms (Naïve Bayes, SVM, and Random Forest). Based on the experimental results, the SVM method produces the best accuracy on balanced data.
Keywords
Bayes methods; feature extraction; learning (artificial intelligence); natural language processing; pattern classification; social networking (online); support vector machines; text analysis; trees (mathematics); Indonesian language; SVM method; Twitter user; age attribute; classification model; classification system; job attribute; latent attribute predicting; learning algorithm; lexical features; naive Bayes; random forest; social media; tweets; user generated text; user location; user name; Twitter; age; classification; job; lexical; machine learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology and Electrical Engineering (ICITEE), 2013 International Conference on
Conference_Location
Yogyakarta
Print_ISBN
978-1-4799-0423-5
Type
conf
DOI
10.1109/ICITEED.2013.6676234
Filename
6676234