• DocumentCode
    2516366
  • Title

    Building a new taxonomy for data discretization techniques

  • Author

    Bakar, Afarulrazi Abu ; Othman, Zulaiha Ali ; Shuib, Nor Liyana Mohd

  • Author_Institution
    Data Min. & Optimization Res. Group, Univ. Kebangsaan Malaysia, Bangi, Malaysia
  • fYear
    2009
  • fDate
    27-28 Oct. 2009
  • Firstpage
    132
  • Lastpage
    140
  • Abstract
    Data preprocessing is an important step in data mining. It is used to resolve various types of problems in a large dataset in order to produce quality data. It consists of four steps: data cleaning, integration, reduction and transformation. The literature shows that each preprocessing step comprises various techniques. To produce quality data, a data miner must choose the most appropriate technique at every step of data preprocessing. In this study, we focus on data reduction, particularly data discretization, as one important data preprocessing step. Data reduction involves reducing the data distribution by mapping the range of continuous data onto a smaller set of values or categories. Data discretization plays a major role in reducing the number of attribute intervals of data values. Finding an appropriate number of discrete values improves the performance of data mining models, particularly in terms of classification accuracy. This paper proposes a four-level taxonomy of data discretization, as follows: hierarchical and non-hierarchical; splitting, merging and combination; supervised and unsupervised combinations; and binning, statistic, entropy and other related techniques. The taxonomy is based on a detailed review of previous discretization techniques. More than fifty techniques are investigated, and the structure of the discretization approach is outlined. Guidelines on how to use the proposed taxonomy are also discussed.
  • Keywords
    data mining; data reduction; data cleaning; data discretization techniques; data distribution reduction; data integration; data preprocessing; data transformation; Artificial intelligence; Cleaning; Computer science; Entropy; Information technology; Merging; Statistics; Taxonomy; Data Discretization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining and Optimization, 2009. DMO '09. 2nd Conference on
  • Conference_Location
    Kajang
  • Print_ISBN
    978-1-4244-4944-6
  • Type

    conf

  • DOI
    10.1109/DMO.2009.5341896
  • Filename
    5341896