• DocumentCode
    3585377
  • Title

    A Massively Parallel Processing for the Multiple Linear Regression

  • Author

    Adjout, Moufida Rehab ; Boufares, Faouzi

  • Author_Institution
    Lab. LIPN, Paris 13 Univ., Villetaneuse, France
  • fYear
    2014
  • Firstpage
    666
  • Lastpage
    671
  • Abstract
    The amount of data generated by traditional business activities has resulted in data warehouses with sizes of up to petabytes. The ability to analyze this torrent of data will become the basis of competition and growth for individual firms, through ever-narrower segmentation of customers, improved decision-making, and the unearthing of valuable insights that would otherwise remain hidden. For this purpose, the large size of the data to be processed requires high-performance analytical systems running on distributed environments. Because the data is so large, it affects the types of algorithms we are willing to consider. Standard analytics algorithms therefore need to be adapted to take advantage of cloud computing models, which provide scalability and flexibility. This work illustrates an implementation of a parallel version of multiple linear regression that can extract coefficients from large amounts of data at large scale, based on the MapReduce framework. The parallel processing of multiple linear regression is based on the QR decomposition and the ordinary least squares method, adapted to MapReduce. Our platform is deployed on the Amazon EMR cloud. Experimental results demonstrate that our parallel version of multiple linear regression can efficiently handle very large datasets on commodity hardware, with good performance on different evaluation criteria, including the number, size, and structure of machines in the cluster.
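The abstract does not give the algorithm itself, so the following is only a minimal single-process sketch of how ordinary least squares via QR decomposition might be split into map and reduce steps. It is not the authors' implementation. The function names (`qr_r`, `map_block`, `reduce_blocks`, `solve_coefficients`), the use of Givens rotations, and the trick of factoring the augmented block [X | y] are all assumptions made for illustration; the design matrix is also assumed to have full column rank.

```python
import math

def qr_r(rows):
    """Return the square upper-triangular factor R of a matrix (list of rows),
    computed with Givens rotations. Q is never formed."""
    a = [[float(v) for v in r] for r in rows]
    n = len(a[0])
    while len(a) < n:                 # pad short blocks with zero rows
        a.append([0.0] * n)
    for j in range(n):
        for i in range(j + 1, len(a)):
            if a[i][j] == 0.0:
                continue
            r = math.hypot(a[j][j], a[i][j])
            c, s = a[j][j] / r, a[i][j] / r
            a[j][j], a[i][j] = r, 0.0  # rotation zeroes the entry below the pivot
            for k in range(j + 1, n):
                top, bot = a[j][k], a[i][k]
                a[j][k] = c * top + s * bot
                a[i][k] = -s * top + c * bot
    return a[:n]

def map_block(x_rows, y_vals):
    """Map step (hypothetical): factor one block of the augmented matrix [X | y]."""
    return qr_r([list(x) + [y] for x, y in zip(x_rows, y_vals)])

def reduce_blocks(r_factors):
    """Reduce step (hypothetical): stack the per-block R factors and factor again."""
    return qr_r([row for r in r_factors for row in r])

def solve_coefficients(r_aug):
    """Back-substitute R * beta = z, where r_aug = [[R, z], [0, rho]]."""
    p = len(r_aug) - 1
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        acc = r_aug[i][p] - sum(r_aug[i][k] * beta[k] for k in range(i + 1, p))
        beta[i] = acc / r_aug[i][i]   # assumes full column rank (nonzero pivot)
    return beta

# Synthetic, noise-free data: y = 1 + 2*x1 + 3*x2, with an intercept column.
points = [(0, 1), (1, 0), (2, 3), (3, 1), (4, 5), (5, 2), (1, 4), (2, 2)]
X = [[1.0, x1, x2] for x1, x2 in points]
y = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in points]

# Two blocks stand in for the input splits that separate mappers would see.
blocks = [(X[:4], y[:4]), (X[4:], y[4:])]
beta = solve_coefficients(reduce_blocks([map_block(bx, by) for bx, by in blocks]))
print([round(b, 6) for b in beta])
```

Each mapper emits only a small (p+1) x (p+1) triangular factor regardless of how many rows its block holds, which is what makes this style of decomposition attractive when the number of observations is far larger than the number of predictors.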
  • Keywords
    cloud computing; data mining; data warehouses; least mean squares methods; parallel processing; regression analysis; Cloud Amazon EMR; MapReduce framework; QR decomposition; cloud computing; data warehouses; distributed environment; high-performance analytical system; massively parallel processing; multiple linear regression; ordinary least squares method; Algorithm design and analysis; Big data; Computational modeling; Linear regression; Matrix decomposition; Parallel processing; Scalability; Big Data; Cloud Computing; Data mining; Hadoop; MapReduce; Multiple linear regression; Predictive analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal-Image Technology and Internet-Based Systems (SITIS), 2014 Tenth International Conference on
  • Type

    conf

  • DOI
    10.1109/SITIS.2014.26
  • Filename
    7081613