  • DocumentCode
    2164542
  • Title
    Visualizing Distributions and Classification Accuracy
  • Author
    Groth, Dennis P.
  • Author_Institution
    Indiana Univ. Sch. of Informatics, Bloomington, IN
  • fYear
    2006
  • fDate
    5-7 July 2006
  • Firstpage
    389
  • Lastpage
    394
  • Abstract
    Data mining is the search for novel, actionable information within data. The number of records in the data being analyzed is only one factor, and perhaps a small one, in determining the complexity of a given data mining technique. Most complexity in data mining arises from the distribution of values contained in the data, not from the number of records. In this paper, we use straightforward histogram-based visualizations to gain insight into how a well-studied data mining technique, the naive-Bayes classifier, performs under various discretization schemes for both continuous and discrete values. The resulting visualization system provides users with a tool that describes the underlying model of the data used by the classifier. Exploratory visualizations of the distributions of training data can be selected based on expert domain knowledge and then combined to apply to the test data.
  • Keywords
    Bayes methods; data mining; data visualisation; pattern classification; discretization scheme; expert domain knowledge; histogram-based visualization; naive-Bayes classifier; Association rules; Cleaning; Data analysis; Data visualization; Informatics; Performance gain; Remuneration; Testing; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Visualization, 2006. IV 2006. Tenth International Conference on
  • Conference_Location
    London, England
  • ISSN
    1550-6037
  • Print_ISBN
    0-7695-2602-0
  • Type
    conf
  • DOI
    10.1109/IV.2006.129
  • Filename
    1648290