DocumentCode :
2209816
Title :
On the Computation of Stochastic Search Variable Selection in Linear Regression with UDFs
Author :
Navas, Mario ; Ordonez, Carlos ; Baladandayuthapani, Veerabhadran
Author_Institution :
Dept. of Comput. Sci., Univ. of Houston, Houston, TX, USA
fYear :
2010
fDate :
13-17 Dec. 2010
Firstpage :
941
Lastpage :
946
Abstract :
Computing Bayesian statistics with traditional techniques is extremely slow, especially when large data sets have to be exported from a relational DBMS. We propose algorithms for large-scale processing of stochastic search variable selection (SSVS) for linear regression that can work entirely inside a DBMS. The traditional SSVS algorithm requires multiple scans of the input data in order to compute a regression model. Due to our optimizations, SSVS can be done in either one scan over the input table for a large number of records with sufficient statistics, or one scan per iteration for high-dimensional data. We consider storage layouts that efficiently exploit DBMS parallel processing of aggregate functions. Experimental results demonstrate correctness, convergence and performance of our algorithms. Finally, the algorithms show good scalability for data with a very large number of records, or a very high number of dimensions.
Keywords :
Bayes methods; data mining; iterative methods; optimisation; regression analysis; relational databases; stochastic processes; Bayesian statistics; SSVS algorithm; UDF; data scanning; linear regression; parallel processing; relational DBMS; stochastic search variable selection; user defined function; variable selection
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2010 IEEE 10th International Conference on
Conference_Location :
Sydney, NSW
ISSN :
1550-4786
Print_ISBN :
978-1-4244-9131-5
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2010.79
Filename :
5694065