• DocumentCode
    2576021
  • Title
    A high speed decision tree classifier algorithm for huge dataset
  • Author
    Thangaparvathi, B. ; Anandhavalli, D. ; Mercy Shalinie, S.
  • Author_Institution
    Dept. of Comput. Sci., Thiagarajar Coll. of Eng.-Madurai, Madurai, India
  • fYear
    2011
  • fDate
    3-5 June 2011
  • Firstpage
    695
  • Lastpage
    700
  • Abstract
    Knowledge discovery is an important tool for intelligent businesses, transforming data into useful information that can increase business revenue. Data mining techniques support automatic exploration of data, attempt to classify the patterns and trends in the data, and infer decision rules from those patterns. Classification of a dataset is an important data mining function and is a supervised machine learning procedure. The scalability and efficiency of the classifier algorithm become a major concern when the dataset is large and the algorithm requires many passes over it. In this paper, we present a scalable decision tree algorithm for classifying large datasets at high processing speed, which requires only one scan over the dataset. It overcomes a drawback of the RainForest algorithm, which addresses the scalability issue but requires a pass over the dataset at each level of decision tree construction. The proposed algorithm significantly reduces the I/O cost and sorts numerical attributes only once, which leads to better running time. According to the experimental results, our algorithm takes less execution time than the RainForest algorithm and can also adopt any attribute selection method, by which the accuracy of the decision tree is improved.
  • Keywords
    data mining; decision trees; learning (artificial intelligence); pattern classification; IO cost reduction; RainForest algorithm; acyclic graph; attribute selection method; business revenue; data mining techniques; dataset classification; dataset parsing; decision tree classifier algorithm; decision tree construction; intelligent business; knowledge discovery; numerical attributes; supervised machine learning procedure; Automatic voltage control; Classification algorithms; Data structures; Databases; Decision trees; Partitioning algorithms; Prediction algorithms; Classification; Data mining; Decision tree; Performance; RainForest algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Recent Trends in Information Technology (ICRTIT), 2011 International Conference on
  • Conference_Location
    Chennai, Tamil Nadu
  • Print_ISBN
    978-1-4577-0588-5
  • Type
    conf
  • DOI
    10.1109/ICRTIT.2011.5972267
  • Filename
    5972267