• DocumentCode
    659587
  • Title
    Data chaos: An entropy based MapReduce framework for scalable learning
  • Author
    Jiaoyan Chen ; Huajun Chen ; Xi Chen ; Guozhou Zheng ; Zhaohui Wu
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    71
  • Lastpage
    78
  • Abstract
    Chaos of data is the total unpredictability of all the data elements, and can be quantified by Shannon entropy. In this paper, we first propose an entropy-based theoretical framework for machine learning, which states that chaos in the sample data decreases and the learned rules improve as learning progresses. However, applying the theoretical framework is usually time consuming, because groups of rules must be trained iteratively and data chaos must be recalculated in each iteration. To implement the theoretical framework for scalable learning, we propose a MapReduce-based distributed computational framework. In a case study of classification, the framework trains multiple classifiers in parallel and calculates the chaos of the sample set during each iteration, then resamples a small subset of the samples with the highest entropy for training in the next iteration, reducing chaos in the sample data as quickly as possible. On typical classification benchmarks, our experiments measure the entropy in the sample data and show that the theoretical framework is sound and can help improve the accuracy of machine learning. Meanwhile, the computational framework achieves high efficiency and scalability for large-scale learning on a Hadoop cluster.
  • Keywords
    entropy; learning (artificial intelligence); parallel processing; pattern classification; Hadoop cluster; MapReduce based distributed computational framework; Shannon entropy; classification benchmark; computational framework; data chaos; data element total unpredictability; entropy based MapReduce framework; entropy based theoretic framework; large scale learning; machine learning; multiple classifier parallel training; scalable learning; Accuracy; Benchmark testing; Chaos; Entropy; Prediction algorithms; Training; Uncertainty; Chaos; Entropy; Machine Learning; MapReduce;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691736
  • Filename
    6691736
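
The abstract describes quantifying data chaos with Shannon entropy and resampling the highest-entropy samples for the next training iteration. The following is a minimal single-machine sketch of that idea, not the authors' MapReduce implementation. It assumes each sample comes with a predicted class-probability distribution, and that "total chaos" can be taken as the sum of per-sample entropies; the function names `shannon_entropy`, `total_chaos`, and `highest_entropy_subset` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of one discrete probability distribution."""
    # Zero-probability outcomes contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def total_chaos(distributions: Sequence[Sequence[float]]) -> float:
    """Assumed definition: chaos of a sample set is the sum of per-sample entropies."""
    return sum(shannon_entropy(d) for d in distributions)


def highest_entropy_subset(distributions: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k samples whose predicted distributions have the highest entropy."""
    ranked = sorted(
        range(len(distributions)),
        key=lambda i: shannon_entropy(distributions[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical class-probability predictions for three samples.
    predictions = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]]
    print("total chaos (bits):", total_chaos(predictions))
    print("samples to train on next:", highest_entropy_subset(predictions, 2))
```

In a distributed setting, the per-sample entropy computation would be a natural fit for a map step and the ranking or summation for a reduce step, though the record gives no detail on how the paper partitions this work.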