  • DocumentCode
    2595978
  • Title
    Medical Datamining with a New Algorithm for Feature Selection and Naive Bayesian Classifier
  • Author
    Abraham, Ranjit; Simha, Jay B.; Iyengar, S.S.
  • fYear
    2007
  • fDate
    17-20 Dec. 2007
  • Firstpage
    44
  • Lastpage
    49
  • Abstract
    Much research in data mining has gone into improving the predictive accuracy of statistical classifiers through discretization and feature selection. As a probability-based statistical classification method, the Naive Bayesian classifier has gained wide popularity despite its assumption that attributes are conditionally independent of one another given the class label. In this paper we propose a new feature selection algorithm, CHI-WSS, to improve the classification accuracy of Naive Bayes on medical datasets. The proposed algorithm uses discretization and simplifies 'wrapper'-based feature selection by reducing feature dimensionality: irrelevant and least relevant features are eliminated using chi-square statistics. Experimental results on 17 medical datasets suggest that, on average, CHI-WSS gave the best results. To compare the performance of statistical classifiers we use two established measures, classification accuracy (or error rate) and the area under the ROC curve (AUC), and show that the proposed algorithm with the generative Naive Bayesian classifier is, on average, more effective than the discriminative models logistic regression and support vector machines.
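    The abstract describes CHI-WSS only at a high level: discretize, eliminate irrelevant and least relevant features by chi-square, then run a wrapper search driven by Naive Bayes. The Python sketch below illustrates that general pipeline on already-discretized data. It is not the authors' implementation. The function names (`chi_square`, `nb_fit`, `nb_predict`, `loo_accuracy`, `chi_wrapper_select`), the chi-square threshold, the greedy forward search in chi-square rank order, add-one (Laplace) smoothing, and leave-one-out accuracy as the wrapper criterion are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def chi_square(feature_col, labels):
    """Pearson chi-square statistic between one discrete feature and the class."""
    n = len(labels)
    joint = Counter(zip(feature_col, labels))
    f_counts, c_counts = Counter(feature_col), Counter(labels)
    stat = 0.0
    for v, fv in f_counts.items():
        for c, cc in c_counts.items():
            expected = fv * cc / n  # count expected if feature and class were independent
            stat += (joint[(v, c)] - expected) ** 2 / expected
    return stat


def nb_fit(rows, labels, features):
    """Count-based Naive Bayes model over discrete (already discretized) features."""
    class_counts = Counter(labels)
    cond = defaultdict(Counter)  # (feature, class) -> counts of feature values
    values = defaultdict(set)    # feature -> distinct values seen in training
    for row, y in zip(rows, labels):
        for f in features:
            cond[(f, y)][row[f]] += 1
            values[f].add(row[f])
    return class_counts, cond, values


def nb_predict(model, row, features):
    """Return the class with the highest log posterior, using add-one smoothing."""
    class_counts, cond, values = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, cc in class_counts.items():
        score = math.log(cc / n)  # log prior
        for f in features:
            score += math.log((cond[(f, c)][row[f]] + 1) / (cc + len(values[f])))
        if score > best_score:
            best, best_score = c, score
    return best


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of Naive Bayes restricted to `features`."""
    correct = 0
    for i in range(len(rows)):
        model = nb_fit(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:], features)
        correct += nb_predict(model, rows[i], features) == labels[i]
    return correct / len(rows)


def chi_wrapper_select(rows, labels, threshold=0.0):
    """Hypothetical CHI-WSS-style selection: chi-square filter, then greedy wrapper."""
    scores = {f: chi_square([r[f] for r in rows], labels) for f in range(len(rows[0]))}
    # Filter step (assumed rule): drop features whose chi-square is <= threshold.
    ranked = sorted((f for f in scores if scores[f] > threshold), key=lambda f: -scores[f])
    # Wrapper step: add a feature only if it improves leave-one-out NB accuracy.
    selected, best_acc = [], loo_accuracy(rows, labels, [])
    for f in ranked:
        acc = loo_accuracy(rows, labels, selected + [f])
        if acc > best_acc:
            selected, best_acc = selected + [f], acc
    return selected, best_acc


if __name__ == "__main__":
    # Toy discretized data: feature 0 predicts the class, feature 1 is constant,
    # feature 2 is only weakly associated with the class.
    rows = [(0, 1, 0), (0, 1, 1), (0, 1, 0), (1, 1, 1), (1, 1, 0), (1, 1, 1)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(chi_wrapper_select(rows, labels))  # → ([0], 1.0)
```

    In this toy run the constant feature 1 has a chi-square of 0 and is removed by the filter, so the wrapper only evaluates features 0 and 2; feature 2 is never added because feature 0 alone already reaches perfect leave-one-out accuracy. Pre-filtering is what makes the wrapper cheaper: the expensive cross-validated search runs over fewer candidate features.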
  • Keywords
    belief networks; classification; data mining; feature extraction; medical administrative data processing; statistical analysis; support vector machines; feature selection; logistic regression; medical data mining; naive Bayesian classifier; statistical classification; support vector machine; Accuracy; Area measurement; Bayesian methods; Computer science; Data mining; Databases; Information technology; Machine learning; Probability distribution; Statistics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    10th International Conference on Information Technology (ICIT 2007)
  • Conference_Location
    Orissa
  • Print_ISBN
    0-7695-3068-0
  • Type
    conf
  • DOI
    10.1109/ICIT.2007.41
  • Filename
    4418266