• DocumentCode
    1520649
  • Title
    An improved naive Bayesian classifier technique coupled with a novel input solution method [rainfall prediction]
  • Author
    Liu, James N K ; Li, Bavy N L ; Dillon, Tharam S.
  • Author_Institution
    Dept. of Comput., Hong Kong Polytech. Univ., Hung Hom, China
  • Volume
    31
  • Issue
    2
  • fYear
    2001
  • fDate
    5/1/2001 12:00:00 AM
  • Firstpage
    249
  • Lastpage
    256
  • Abstract
    Data mining is the study of how to determine underlying patterns in the data to help make optimal decisions on computers when the database involved is voluminous, hard to characterize accurately and constantly changing. It deploys techniques based on machine learning alongside more conventional methods. These techniques can generate decision or prediction models based on actual historical data. Therefore, they represent true evidence-based decision support. Rainfall prediction is a good problem to solve by data mining techniques. This paper proposes an improved naive Bayes classifier (INBC) technique and explores the use of genetic algorithms (GAs) for the selection of a subset of input features in classification problems. It then carries out a comparison with several other techniques. It compares the following algorithms on real meteorological data in Hong Kong: (1) genetic algorithms with average classification or general classification (GA-AC and GA-C), (2) C4.5 with pruning, and (3) INBC with relative frequency or initial probability density (INBC-RF and INBC-IPD). Two simple schemes are proposed to construct a suitable data set for improving their performance. Scheme I uses all the basic input parameters for rainfall prediction. Scheme II uses the optimal subset of input variables which are selected by a GA. The results show that, among the methods we compared, INBC achieved about a 90% accuracy rate on the rain/no-rain classification problems. This method also attained reasonable performance on rainfall prediction with three-level depth and five-level depth, which are around 65%-70%.
  • Keywords
    Bayes methods; data mining; forecasting theory; genetic algorithms; geophysics computing; learning (artificial intelligence); pattern classification; probability; rain; software performance evaluation; temporal databases; very large databases; weather forecasting; 3-level depth; 5-level depth; C4.5; Hong Kong; accuracy; algorithm performance; average classification; constantly changing data; data mining; decision models; evidence-based decision support; general classification; genetic algorithms; historical data; improved naive Bayesian classifier; initial probability density; input feature subset selection; input parameters; input solution method; large database; machine learning; meteorological data; optimal decisions; optimal input variables subset; prediction models; pruning; rainfall prediction; relative frequency; underlying pattern determination; Bayesian methods; Data mining; Databases; Frequency; Genetic algorithms; Input variables; Machine learning; Meteorology; Predictive models; Rain;
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1094-6977
  • Type
    jour
  • DOI
    10.1109/5326.941848
  • Filename
    941848
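
The abstract names a naive Bayes classifier with relative-frequency estimates but does not give its details. As a rough illustration only, the following is a minimal sketch of a plain categorical naive Bayes classifier, not the authors' INBC. The function names, the add-one smoothing term `alpha`, and the toy humidity/pressure data are all assumptions made for this example and do not come from the paper.

```python
from collections import Counter, defaultdict
import math


def train(rows, labels):
    """Collect the counts needed for relative-frequency estimates."""
    class_counts = Counter(labels)
    cond_counts = Counter()      # (feature index, value, label) -> count
    values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            cond_counts[(i, v, label)] += 1
            values[i].add(v)
    return class_counts, cond_counts, values


def predict(model, row, alpha=1.0):
    """Return the label with the highest log posterior score.

    alpha is an assumed add-one (Laplace) smoothing term so that
    unseen feature values do not produce a zero probability.
    """
    class_counts, cond_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior as a relative frequency
        for i, v in enumerate(row):
            num = cond_counts[(i, v, label)] + alpha
            den = n + alpha * len(values[i])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical discretized observations: (humidity, pressure).
rows = [
    ("high", "low"), ("high", "low"), ("high", "high"),
    ("low", "high"), ("low", "high"), ("low", "low"),
]
labels = ["rain", "rain", "rain", "no-rain", "no-rain", "no-rain"]

model = train(rows, labels)
print(predict(model, ("high", "low")))
print(predict(model, ("low", "high")))
```

In the spirit of Scheme II, a selected feature subset could be applied by dropping the unselected columns from `rows` before calling `train`; how the paper's GA chooses that subset is not specified in this record.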