Title :
Fuzzy Naive Bayes classifier based on fuzzy clustering
Author :
Tang, Yongchuan ; Pan, Wuming ; Li, Haiming ; Xu, Yang
Author_Institution :
Dept. of Appl. Math., Southwest Jiaotong Univ., Sichuan, China
Abstract :
Despite its unrealistic independence assumption, the Naive Bayes classifier is remarkably successful in practice. The Naive Bayes classifier assumes that all variables are nominal, that is, that each variable takes a finite number of values. In large databases, however, the variables (or fields) often take continuous values or have a large number of numerical values, so many researchers have studied the discretization (or crisp partitioning) of the domains of continuous variables. We generalize the Naive Bayes classifier to the case in which a fuzzy partition of the variable domains is used instead of discretization, so that each variable in the Fuzzy Naive Bayes classifier can take a linguistic value, or fuzzy set. This paper proposes a method for estimating the conditional probabilities in the Fuzzy Naive Bayes classifier from the observed data set, and presents a method for predicting the class label of each numeric input using the Fuzzy Naive Bayes classifier. In the training phase, the training data (the feature variables only, without class labels) are first clustered in an unsupervised way by fuzzy c-means or a similar algorithm. The optimal cluster centers of the training data are then used to determine the fuzzy partition of the feature variable space. This generalization can decrease the complexity of learning an optimal discretization, which the classical Naive Bayes classifier often faces, reduce the loss of information caused by discretization, and increase the ability to deal with imprecise data and large databases. The classifier has been tested on some well-known classification problems from the machine learning field; the results show that the Fuzzy Naive Bayes classifier is an effective alternative tool for classification problems with continuous variables.
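The pipeline outlined in the abstract (unsupervised fuzzy c-means, a fuzzy partition built from the cluster centers, conditional probability estimation, prediction) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the partition of each feature is obtained by projecting the cluster centers onto that feature's axis, memberships use the standard fuzzy c-means formula, P(fuzzy set | class) is estimated as the average membership over the samples of that class, and a numeric input is scored by weighting those probabilities with its own memberships. The function names (from_distances, fuzzy_c_means, train, predict) are hypothetical.

```python
import math
from collections import defaultdict


def from_distances(d, m=2.0):
    """Turn distances to the centers into FCM-style memberships that sum to 1."""
    if 0.0 in d:  # the point coincides with at least one center
        hits = d.count(0.0)
        return [1.0 / hits if dk == 0.0 else 0.0 for dk in d]
    inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]  # requires fuzzifier m > 1
    total = sum(inv)
    return [v / total for v in inv]


def fuzzy_c_means(points, c, m=2.0, iters=50):
    """Plain fuzzy c-means on unlabeled points; returns c cluster centers."""
    n, dim = len(points), len(points[0])
    # Deterministic start (an assumption): c points spread evenly through the data.
    centers = [list(points[k * (n - 1) // max(c - 1, 1)]) for k in range(c)]
    for _ in range(iters):
        U = [from_distances([math.dist(p, ck) for ck in centers], m) for p in points]
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            total = sum(w)
            if total > 0.0:  # keep the old center if no point supports it
                centers[k] = [sum(w[i] * points[i][j] for i in range(n)) / total
                              for j in range(dim)]
    return centers


def train(X, y, centers, m=2.0):
    """Estimate class priors and P(fuzzy set | class) for every feature."""
    n_feat = len(X[0])
    # Assumed partition: project the cluster centers onto each feature axis.
    feat_centers = [[ck[i] for ck in centers] for i in range(n_feat)]
    counts = defaultdict(int)
    sums = defaultdict(lambda: [[0.0] * len(centers) for _ in range(n_feat)])
    for xs, label in zip(X, y):
        counts[label] += 1
        for i, x in enumerate(xs):
            u = from_distances([abs(x - c) for c in feat_centers[i]], m)
            for k, uk in enumerate(u):
                sums[label][i][k] += uk
    prior = {c: counts[c] / len(y) for c in counts}
    # Average membership per class plays the role of a conditional probability.
    cond = {c: [[s / counts[c] for s in row] for row in sums[c]] for c in counts}
    return feat_centers, prior, cond


def predict(x, model, m=2.0):
    """Return the class with the largest fuzzy Naive Bayes score for input x."""
    feat_centers, prior, cond = model
    best, best_score = None, -1.0
    for c, p in prior.items():
        score = p
        for i, xi in enumerate(x):
            u = from_distances([abs(xi - ck) for ck in feat_centers[i]], m)
            score *= sum(pk * uk for pk, uk in zip(cond[c][i], u))
        if score > best_score:
            best, best_score = c, score
    return best
```

A typical call sequence would be centers = fuzzy_c_means(X, c), then model = train(X, y, centers), then predict(x, model). Note that the class labels y enter only in train, which matches the abstract's statement that clustering uses the feature variables without class labels.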
Keywords :
Bayes methods; belief networks; fuzzy logic; fuzzy set theory; learning (artificial intelligence); pattern classification; pattern clustering; probability; uncertainty handling; very large databases; Fuzzy Naive Bayes classifier; belief network; classification; conditional probabilities; fuzzy c-means; fuzzy clustering; fuzzy set; imprecise data; independence assumption; large databases; linguistic value; machine learning; nominal variables; Bayesian methods; Clustering algorithms; Databases; Fuzzy control; Fuzzy logic; Fuzzy set theory; Fuzzy sets; Humans; Mathematics; Uncertainty;
Conference_Title :
Systems, Man and Cybernetics, 2002 IEEE International Conference on
Print_ISBN :
0-7803-7437-1
DOI :
10.1109/ICSMC.2002.1176401