DocumentCode :
2871529
Title :
Big data feature selection and projection for gender prediction based on user web behaviour
Author :
Gulsen, Esra ; Gunduz, Hakan ; Cataltepe, Zehra ; Serinol, Levent
Author_Institution :
Dept. of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
fYear :
2015
fDate :
16-19 May 2015
Firstpage :
1545
Lastpage :
1548
Abstract :
Predicting a visitor's gender and other demographic information helps in presenting appropriate content to the user. In this paper, we perform gender prediction based on Turkish users' web log data. Our methods use three different sets of features, namely the URLs (Uniform Resource Locators), the textual contents, and the DMOZ (from directory.mozilla.org) hierarchies of the pages visited by each user. Since we have a sparse, high-dimensional input dataset, we first apply Information Gain and Chi-square based feature selection. We use a MapReduce-based approach to compute these feature relevance measures. We also apply the stochastic singular value decomposition (SSVD) feature projection method. We present gender classification results, based on these feature selection and projection methods, using the Logistic Regression classifier. Using the Logistic Regression classifier on the selected URL features results in the best performance.
Keywords :
Big Data; gender issues; pattern classification; regression analysis; singular value decomposition; Big Data feature selection; Big Data projection; DMOZ feature; MapReduce based approach; SSVD feature projection method; Turkish user; URL feature; chi-square based feature selection; demographic information; feature relevance measures; gender prediction; information gain; logistic regression classifier; sparse high-dimensional input dataset; stochastic singular value decomposition; textual contents feature; uniform resource locator; user Web behaviour; user content; Internet; Logistics; Principal component analysis; Stochastic processes; Uniform resource locators; Turkish web mining; chi-square; feature selection; multimodal classification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal Processing and Communications Applications Conference (SIU), 2015 23rd
Conference_Location :
Malatya
Type :
conf
DOI :
10.1109/SIU.2015.7130141
Filename :
7130141