  • DocumentCode
    659592
  • Title
    Meta-learning for large scale machine learning with MapReduce
  • Author
    Xuan Liu; Xiaoguang Wang; S. Matwin; Nathalie Japkowicz
  • Author_Institution
    Sch. of EECS, Univ. of Ottawa, Ottawa, ON, Canada
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    105
  • Lastpage
    110
  • Abstract
    We have entered the big data age. Knowledge extraction from massive data is becoming increasingly rewarding and urgent. MapReduce provides a feasible framework for programming machine learning algorithms as Map and Reduce functions, and its relatively simple programming interface has helped to solve the scalability problems of machine learning algorithms. However, the framework has an obvious weakness: it does not support iterations. This makes it difficult for algorithms that require iterations to fully exploit the efficiency of MapReduce. In this paper, we propose applying meta-learning programmed with MapReduce, which avoids parallelizing the machine learning algorithms themselves while still improving their scalability to big datasets. Experiments conducted on Hadoop in fully distributed mode on Amazon EC2 demonstrate that our algorithm, PML, significantly reduces the training computational complexity as the number of computing nodes increases, while achieving lower error rates than training on a single node. A comparison of PML with a contemporary parallelized AdaBoost algorithm, AdaBoost.PL, shows that PML achieves lower error rates.
  • Keywords
    Big Data; application program interfaces; knowledge acquisition; learning (artificial intelligence); parallel programming; Amazon EC2; Map function; MapReduce; PML; Reduce function; big dataset scalability improvement; computational complexity; computing nodes; error rates; fully-distributed Hadoop mode; knowledge extraction; large-scale machine learning algorithm programming; machine learning algorithm scalability problems; massive data; meta-learning; programming interface; Algorithm design and analysis; Classification algorithms; Computational modeling; Error analysis; Machine learning algorithms; Training; Training data; AdaBoost; MapReduce; big data; meta-learning; parallel computing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691741
  • Filename
    6691741