Title :
Big data feature selection and projection for gender prediction based on user web behaviour
Author :
Gulsen, Esra ; Gunduz, Hakan ; Cataltepe, Zehra ; Serinol, Levent
Author_Institution :
Bilgisayar Muhendisligi Bolumu, Istanbul Tek. Univ., Istanbul, Turkey
Abstract :
Prediction of a visitor's gender and other demographic information helps with the presentation of appropriate content to the user. In this paper, we perform gender prediction based on Turkish users' web log data. Our methods use three different sets of features, namely the URLs (Uniform Resource Locators), the textual contents and the DMOZ (from directory.mozilla.org) hierarchies of the pages visited by each user. Since the input dataset is sparse and high-dimensional, we first apply Information Gain and Chi-square based feature selection. We use a MapReduce based approach to compute these feature relevance measures. We also apply the stochastic singular value decomposition (SSVD) feature projection method. We present gender classification results, based on these feature selection and projection methods, using the Logistic Regression classifier. Using the Logistic Regression classifier on the selected URL features results in the best performance.
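The two feature relevance measures named in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' MapReduce implementation, which this record does not show. The function names, the binary "user visited this URL" encoding and the toy data are all assumptions made for illustration. Both scores depend only on feature/label co-occurrence counts, and such counts are additive across data partitions, which is one plausible reason they suit a MapReduce computation.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(feature, labels):
    """Information gain of a discrete feature with respect to the class labels."""
    n = len(labels)
    base = entropy(list(Counter(labels).values()))
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(list(Counter(subset).values()))
    return base - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for value, f_count in feature_totals.items():
        for label, y_count in label_totals.items():
            expected = f_count * y_count / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def select_top_k(columns, labels, score, k):
    """Return the names of the k highest-scoring feature columns."""
    ranked = sorted(columns, key=lambda name: score(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: four users, two URL features (1 = visited).
labels = ["F", "F", "M", "M"]
columns = {
    "url_a": [1, 1, 0, 0],  # separates the two classes perfectly
    "url_b": [1, 0, 1, 0],  # independent of the class
}

for name, column in columns.items():
    print(name, information_gain(column, labels), chi_square(column, labels))
print(select_top_k(columns, labels, information_gain, k=1))
```

On this toy input both measures rank url_a above url_b. In a setting like the one the abstract describes, the retained columns would then be passed to a classifier such as logistic regression.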
Keywords :
Big Data; gender issues; pattern classification; regression analysis; singular value decomposition; Big Data feature selection; Big Data projection; DMOZ feature; MapReduce based approach; SSVD feature projection method; Turkish user; URL feature; chi-square based feature selection; demographic information; feature relevance measures; gender prediction; information gain; logistic regression classifier; sparse high-dimensional input dataset; stochastic singular value decomposition; textual contents feature; uniform resource locator; user Web behaviour; user content; Big data; Internet; Logistics; Principal component analysis; Stochastic processes; Uniform resource locators; Turkish web mining; chi-square; feature selection; gender prediction; information gain; multimodal classification; singular value decomposition;
Conference_Title :
Signal Processing and Communications Applications Conference (SIU), 2015 23rd
Conference_Location :
Malatya
DOI :
10.1109/SIU.2015.7130141