DocumentCode :
909414
Title :
Bayesian Classifiers Programmed in SQL
Author :
Ordonez, Carlos ; Pitchaimalai, Sasi K.
Author_Institution :
Dept. of Comput. Sci., Univ. of Houston, Houston, TX, USA
Volume :
22
Issue :
1
Year :
2010
Firstpage :
139
Lastpage :
144
Abstract :
The Bayesian classifier is a fundamental classification technique. In this work, we focus on programming Bayesian classifiers in SQL. We introduce two classifiers: naive Bayes and a classifier based on class decomposition using K-means clustering. We consider two complementary tasks: model computation and scoring a data set. We study several layouts for tables and several indexing alternatives. We analyze how to transform equations into efficient SQL queries and introduce several query optimizations. We conduct experiments with real and synthetic data sets to evaluate classification accuracy, query optimizations, and scalability. Our Bayesian classifier is more accurate than naive Bayes and decision trees. Distance computation is significantly accelerated with horizontal layout for tables, denormalization, and pivoting. We also compare naive Bayes implementations in SQL and C++: SQL is about four times slower. Our Bayesian classifier in SQL achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.
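The two tasks named in the abstract, model computation and scoring, can be illustrated with a minimal sketch. It assumes a Gaussian naive Bayes model, a horizontal table layout (one column per dimension), and SQLite driven from Python. The table names X, NB, and S, the column names, and the toy data are hypothetical and are not taken from the paper, whose actual queries and optimizations may differ.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register ln() explicitly, since SQLite math functions are a compile-time option.
conn.create_function("ln", 1, math.log)
cur = conn.cursor()

# Hypothetical horizontal layout: one row per point, one column per dimension,
# plus the class label g.
cur.execute("CREATE TABLE X (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL, g INTEGER)")
cur.executemany(
    "INSERT INTO X (x1, x2, g) VALUES (?, ?, ?)",
    [(1.0, 2.0, 0), (1.2, 1.8, 0), (0.8, 2.2, 0),
     (4.0, 5.0, 1), (4.2, 5.1, 1), (3.8, 4.9, 1)],
)

# Model computation: a single GROUP BY pass yields, per class, the count,
# the mean, and the (population) variance of each dimension.
cur.execute("""
    CREATE TABLE NB AS
    SELECT g,
           COUNT(*) AS n,
           AVG(x1) AS m1, AVG(x1 * x1) - AVG(x1) * AVG(x1) AS v1,
           AVG(x2) AS m2, AVG(x2 * x2) - AVG(x2) * AVG(x2) AS v2
    FROM X
    GROUP BY g
""")

# Scoring: cross join every point with every class and compute the Gaussian
# log-likelihood plus ln(n), which equals the log prior up to a constant.
cur.execute("""
    CREATE TABLE S AS
    SELECT X.i AS i, NB.g AS g,
           ln(NB.n)
           - 0.5 * (ln(NB.v1) + (X.x1 - NB.m1) * (X.x1 - NB.m1) / NB.v1
                  + ln(NB.v2) + (X.x2 - NB.m2) * (X.x2 - NB.m2) / NB.v2) AS score
    FROM X CROSS JOIN NB
""")

# Pick the class with the highest score for each point.
cur.execute("""
    SELECT S.i, S.g
    FROM S
    JOIN (SELECT i, MAX(score) AS best FROM S GROUP BY i) B
      ON S.i = B.i AND S.score = B.best
    ORDER BY S.i
""")
predicted = [g for _, g in cur.fetchall()]
print(predicted)
```

On this toy data the two classes are well separated, so each point should be assigned its own training label. The sketch scores the training table itself only for brevity; in practice the scored table would be a separate data set.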
Keywords :
Bayes methods; C++ language; SQL; pattern classification; pattern clustering; query processing; Bayesian classifier; K-means clustering; SQL queries; data set scoring; distance computation; linear scalability; model computation; naive Bayes; query optimization; Classification; K-means
Language :
English
Journal_Title :
IEEE Transactions on Knowledge and Data Engineering
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2009.127
Filename :
4967589