• DocumentCode
    259116
  • Title
    Efficient data transfer scheme using word-pair-encoding-based compression for large-scale text-data processing
  • Author
    Waidyasooriya, Hasitha Muthumala ; Ono, Daisuke ; Hariyama, Masanori ; Kameyama, Michitaka
  • Author_Institution
    Grad. Sch. of Inf. Sci., Tohoku Univ., Sendai, Japan
  • fYear
    2014
  • fDate
    17-20 Nov. 2014
  • Firstpage
    639
  • Lastpage
    642
  • Abstract
    Large-scale data processing is common in fields such as data mining and genome mapping. To accelerate such processing, graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) are used. However, the long data transfer time between the accelerator and the host computer is a major performance bottleneck. In this paper, we use a word-pair-encoding method to compress the data down to 25% of its original size. The encoded data can be decoded from any position without decoding the whole data file. For some algorithms, the encoded data can be processed without decoding. Using text search based on the Burrows-Wheeler algorithm, we show that the data amount and transfer time can be reduced by over 70%.
  • Keywords
    data compression; data mining; encoding; field programmable gate arrays; graphics processing units; text analysis; Burrows-Wheeler algorithm based text search; FPGA; GPU; data transfer scheme; data-mining; encoded data; field-programmable gate-array; genome mapping; graphic accelerator units; large-scale text-data processing; performance bottleneck; word-pair-encoding-based compression; Arrays; Bioinformatics; Data compression; Data transfer; Encoding; Genomics; Graphics processing units; Succinct data structures; big data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Circuits and Systems (APCCAS), 2014 IEEE Asia Pacific Conference on
  • Conference_Location
    Ishigaki
  • Type
    conf
  • DOI
    10.1109/APCCAS.2014.7032862
  • Filename
    7032862
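
The abstract states that word-pair-encoded data can be decoded from any position without decoding the whole file. The record does not give the encoding itself, so the following is only a minimal sketch under stated assumptions: adjacent words are grouped into non-overlapping pairs, each distinct pair is replaced by one integer code, and all codes occupy one slot each so that a word's position maps directly to a slot. The function names `build_pair_table`, `encode`, and `decode_at` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_pair_table(words):
    """Assign an integer code to each distinct non-overlapping word pair.

    Assumption for illustration: more frequent pairs get smaller codes.
    """
    pairs = [tuple(words[i:i + 2]) for i in range(0, len(words), 2)]
    code_to_pair = [pair for pair, _ in Counter(pairs).most_common()]
    pair_to_code = {pair: code for code, pair in enumerate(code_to_pair)}
    return pair_to_code, code_to_pair


def encode(words, pair_to_code):
    """Replace every non-overlapping pair of words with its code."""
    return [pair_to_code[tuple(words[i:i + 2])]
            for i in range(0, len(words), 2)]


def decode_at(encoded, code_to_pair, index):
    """Recover the word at `index` by reading a single code.

    Each code covers exactly two word positions, so the slot is
    index // 2 and the word within the pair is index % 2.
    """
    pair = code_to_pair[encoded[index // 2]]
    return pair[index % 2]


if __name__ == "__main__":
    words = "the cat sat on the cat sat on the mat".split()
    pair_to_code, code_to_pair = build_pair_table(words)
    encoded = encode(words, pair_to_code)
    # Ten words are stored as five codes plus the pair table.
    print(len(words), len(encoded))
    # Random access: only one code is read to recover word 7.
    print(decode_at(encoded, code_to_pair, 7))
```

In this sketch the random-access property comes from the one-code-per-pair layout. The size reduction and the search on encoded data reported in the abstract depend on details of the paper's scheme that are not modelled here.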