Title :
Content-Based Methods for Predicting Web-Site Demographic Attributes
Author :
Kabbur, Santosh ; Han, Eui-Hong ; Karypis, George
Author_Institution :
Dept. of Comput. Sci., Univ. of Minnesota, Twin Cities, MN, USA
Abstract :
Demographic information plays an important role in gaining valuable insights about a web-site's user base and is used extensively to target online advertisements and promotions. This paper investigates machine-learning approaches for predicting the demographic attributes of web-sites using information derived from their content and their hyperlinked structure, without relying on any information obtained directly or indirectly from the web-site's users. Such methods are important because users are becoming increasingly concerned about sharing their personal and behavioral information on the Internet. Regression-based approaches for predicting demographic attributes are developed and studied; they use different content-derived features, different ways of building the prediction models, and different ways of aggregating web-page-level predictions that take the web's hyperlinked structure into account. In addition, a matrix-approximation-based approach is developed for coupling the predictions of individual regression models into a model that predicts the probability mass function of the attribute. Extensive experiments show that these methods achieve an RMSE of 8-10% and provide insights into how best to train and apply such models.
Keywords :
Internet; Web sites; advertising data processing; approximation theory; content-based retrieval; demography; learning (artificial intelligence); matrix algebra; regression analysis; Internet; Web hyperlinked structure; Web site demographic attributes prediction; Web-page level predictions; content based method; hyperlinked structure; machine-learning approaches; matrix-approximation based approach; online advertisements; probability mass function; regression based approach; Content Based Models; Demographic Attribute Prediction; Inlink Count; Probability Mass Function; Regression;
Conference_Title :
Data Mining (ICDM), 2010 IEEE 10th International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9131-5
ISSN :
1550-4786
DOI :
10.1109/ICDM.2010.97
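The abstract describes aggregating web-page-level predictions in a way that takes the hyperlink structure into account, and the keywords list "Inlink Count". The Python sketch below is a minimal illustration under stated assumptions, not the authors' method. It assumes each page's prediction is a vector over demographic bins (for example, age groups). It weights each page by a hypothetical factor of 1 + inlink count, clips negative regression outputs to zero, and renormalizes the weighted sum into a site-level probability mass function. The clip-and-renormalize step is a simple stand-in for the paper's matrix-approximation coupling, which is not reproduced here. The function names `aggregate_site_pmf` and `rmse` are hypothetical.

```python
import math


def aggregate_site_pmf(page_preds, inlinks):
    """Combine page-level predictions into one site-level PMF.

    Hypothetical scheme (an assumption, not the paper's formula):
    each page is weighted by 1 + its inlink count, negative regression
    outputs are clipped to zero, and the weighted sum is renormalized.
    """
    n_bins = len(page_preds[0])
    totals = [0.0] * n_bins
    for pred, links in zip(page_preds, inlinks):
        weight = 1.0 + links  # pages with more inlinks count more
        for k, p in enumerate(pred):
            totals[k] += weight * max(p, 0.0)  # clip negative outputs
    total_mass = sum(totals)
    if total_mass == 0.0:
        # No usable signal: fall back to a uniform distribution.
        return [1.0 / n_bins] * n_bins
    return [t / total_mass for t in totals]


def rmse(pred, target):
    """Root-mean-square error between two equal-length vectors."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


if __name__ == "__main__":
    # Two pages, two bins; the first page has 3 inlinks, the second none.
    site = aggregate_site_pmf([[0.6, 0.4], [0.2, 0.8]], [3, 0])
    print([round(x, 4) for x in site])           # [0.52, 0.48]
    print(round(rmse(site, [0.5, 0.5]), 4))      # 0.02
```

Weighting by inlink count lets heavily linked pages, which are plausibly visited more often, dominate the site-level estimate. Under the scheme above, a page with zero inlinks still contributes with weight 1.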