Title :
Twitter gender classification using user unstructured information
Author :
Marco Vicente;Fernando Batista;Joao Paulo Carvalho
Author_Institution :
INESC-ID, ISCTE-IUL, Lisboa, Portugal
Abstract :
This paper describes an approach to automatically detect the gender of Twitter users, based only on clues provided by their profile information in an unstructured form. A number of features that capture phenomena specific to Twitter users are proposed and evaluated on a dataset of about 242K English-language users. Different supervised and unsupervised approaches are used to assess the performance of the proposed features, including Naive Bayes variants, Logistic Regression, Support Vector Machines, Fuzzy c-Means clustering, and K-means. An unsupervised approach based on Fuzzy c-Means proved to be very suitable for this task, returning the correct gender for about 96% of the users.
Keywords :
"Feature extraction","Twitter","Dictionaries","Accuracy","Support vector machines","Blogs"
Conference_Title :
Fuzzy Systems (FUZZ-IEEE), 2015 IEEE International Conference on
DOI :
10.1109/FUZZ-IEEE.2015.7338102