DocumentCode :
2164542
Title :
Visualizing Distributions and Classification Accuracy
Author :
Groth, Dennis P.
Author_Institution :
Indiana Univ. Sch. of Informatics, Bloomington, IN
fYear :
2006
fDate :
5-7 July 2006
Firstpage :
389
Lastpage :
394
Abstract :
Data mining is the search for novel, actionable information within data. The number of records in the data being analyzed is only one factor, and perhaps a small one, in determining the complexity of a given data mining technique. Most complexity in data mining arises from the distribution of values contained in the data, not from the number of records. In this paper, we utilize straightforward histogram-based visualizations to gain insight into how a well-studied data mining technique, the naive-Bayes classifier, performs under various discretization schemes for both continuous and discrete values. The resulting visualization system provides users with a tool that describes the underlying model of the data used by the classifier. Exploratory visualizations of the distributions of training data can be selected based on expert domain knowledge and then combined to apply to the test data.
Keywords :
Bayes methods; data mining; data visualisation; pattern classification; discretization scheme; expert domain knowledge; histogram-based visualization; naive-Bayes classifier; Association rules; Cleaning; Data analysis; Data visualization; Informatics; Performance gain; Remuneration; Testing; Training data
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Visualization, 2006. IV 2006. Tenth International Conference on
Conference_Location :
London, England
ISSN :
1550-6037
Print_ISBN :
0-7695-2602-0
Type :
conf
DOI :
10.1109/IV.2006.129
Filename :
1648290