DocumentCode
2164542
Title
Visualizing Distributions and Classification Accuracy
Author
Groth, Dennis P.
Author_Institution
Indiana Univ. Sch. of Informatics, Bloomington, IN
fYear
2006
fDate
5-7 July 2006
Firstpage
389
Lastpage
394
Abstract
Data mining is the search for novel, actionable information within data. It is important to note that the number of records in the data being analyzed is only one (and perhaps a small) factor in determining the complexity of a given data mining technique. Most complexity in data mining arises from the distribution of values contained in the data, not the number of records. In this paper, we utilize straightforward histogram-based visualizations to gain insight into how a well-studied data mining technique, the naive-Bayes classifier, performs under various discretization schemes for both continuous and discrete values. The resulting visualization system provides users with a tool that describes the underlying model of the data used by the classifier. Exploratory visualizations of the distributions of training data can be selected based on expert domain knowledge and then combined to apply to the test data.
Keywords
Bayes methods; data mining; data visualisation; pattern classification; discretization scheme; expert domain knowledge; histogram-based visualization; naive-Bayes classifier; Association rules; Cleaning; Data analysis; Data mining; Data visualization; Informatics; Performance gain; Remuneration; Testing; Training data
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Visualization, 2006. IV 2006. Tenth International Conference on
Conference_Location
London, England
ISSN
1550-6037
Print_ISBN
0-7695-2602-0
Type
conf
DOI
10.1109/IV.2006.129
Filename
1648290