Title :
The Convergence Behavior of Naive Bayes on Large Sparse Datasets
Author :
Xiang Li;Charles X. Ling;Huaimin Wang
Author_Institution :
Comput. Sci. Dept., Univ. of Western Ontario, London, ON, Canada
Abstract :
Large, sparse datasets with many missing values are common in the big data era. Naive Bayes is a good classification algorithm for such datasets, as its time and space complexity scale well with the number of non-missing values. However, several important questions about the behavior of naive Bayes are yet to be answered. For example, how do different missing-data mechanisms, data sparseness, and the number of attributes systematically affect the learning curves and convergence? Recent work on classifying large and sparse real-world datasets could not address these questions, mainly because the missing-data mechanisms of these datasets were not taken into account. In this paper, we propose two novel data missing and expansion mechanisms to answer these questions. We use the data missing mechanisms to generate large and sparse data with various properties, and study the entire learning curve and convergence behavior of naive Bayes. We make several observations, which are verified through detailed theoretical study. Our results are useful for learning from large sparse data in practice.
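The claim that the cost of naive Bayes depends on the number of non-missing values can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes categorical attributes, rows stored as dictionaries holding only observed values, missing values simply skipped in both training and prediction, and Laplace smoothing. The names `train_sparse_nb` and `predict_sparse_nb` are hypothetical.

```python
import math
from collections import defaultdict


def train_sparse_nb(rows, labels):
    """Count statistics from rows given as {attribute: value} dicts.

    Only observed (non-missing) entries are stored in each dict, so the
    loop below does work proportional to the number of non-missing values.
    """
    class_counts = defaultdict(int)
    # value_counts[(cls, attr)][value] -> number of training rows
    value_counts = defaultdict(lambda: defaultdict(int))
    values_seen = defaultdict(set)  # distinct values observed per attribute
    for row, y in zip(rows, labels):
        class_counts[y] += 1
        for attr, val in row.items():
            value_counts[(y, attr)][val] += 1
            values_seen[attr].add(val)
    return class_counts, value_counts, values_seen


def predict_sparse_nb(model, row):
    """Return the class with the highest log-posterior score.

    Assumption: attributes missing from `row` are ignored rather than
    treated as a separate value.
    """
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)
        for attr, val in row.items():
            counts = value_counts.get((y, attr), {})
            n_obs = sum(counts.values())
            k = len(values_seen.get(attr, ())) or 1
            # Laplace-smoothed estimate of P(attr = val | class = y)
            score += math.log((counts.get(val, 0) + 1) / (n_obs + k))
        if score > best_score:
            best, best_score = y, score
    return best


rows = [{"a": 1}, {"a": 1, "b": 0}, {"a": 0}, {"a": 0, "c": 1}]
labels = ["pos", "pos", "neg", "neg"]
model = train_sparse_nb(rows, labels)
print(predict_sparse_nb(model, {"a": 1}))
```

Skipping missing entries is only one possible treatment. The abstract's point is that the mechanism by which values go missing matters, so this choice should be read as an illustrative assumption rather than a recommendation.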
Keywords :
"Prototypes","Convergence","Motion pictures","Upper bound","Big data","Training","Complexity theory"
Conference_Title :
Data Mining (ICDM), 2015 IEEE International Conference on
DOI :
10.1109/ICDM.2015.53