• DocumentCode
    53343
  • Title
    A Parallel Matrix-Based Method for Computing Approximations in Incomplete Information Systems
  • Author
    Junbo Zhang ; Jian-Syuan Wong ; Yi Pan ; Tianrui Li
  • Author_Institution
    Sch. of Inf. Sci. & Technol., Southwest Jiaotong Univ., Chengdu, China
  • Volume
    27
  • Issue
    2
  • fYear
    2015
  • fDate
    Feb. 1 2015
  • Firstpage
    326
  • Lastpage
    339
  • Abstract
    As the volume of data grows at an unprecedented rate, large-scale data mining and knowledge discovery present a tremendous challenge. Rough set theory, which has been used successfully in solving problems in pattern recognition, machine learning, and data mining, centers around the idea that a set of distinct objects may be approximated via a lower and upper bound. In order to obtain the benefits that rough sets can provide for data mining and related tasks, efficient computation of these approximations is vital. The recently introduced cloud computing model, MapReduce, has gained a lot of attention from the scientific community for its applicability to large-scale data analysis. In previous research, we proposed a MapReduce-based method for computing approximations in parallel, which can efficiently process complete data but fails in the case of missing (incomplete) data. To address this shortcoming, three different parallel matrix-based methods are introduced to process large-scale, incomplete data. All of them are built on MapReduce and implemented on Twister, a lightweight MapReduce runtime system. The proposed parallel methods are then experimentally shown to be efficient for processing large-scale data.
  • Keywords
    approximation theory; cloud computing; data mining; matrix algebra; rough set theory; Twister; approximation method; cloud computing; incomplete information system; knowledge discovery; large-scale data analysis; large-scale data mining; lightweight MapReduce runtime system; parallel matrix-based method; rough set theory; Approximation algorithms; Approximation methods; Computational modeling; Data mining; Information systems; Rough sets; Vectors; MapReduce; Rough sets; data mining; incomplete information systems; matrix;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2014.2330821
  • Filename
    6834786