• DocumentCode
    1968594
  • Title
    Development of word-based text compression algorithm for Indonesian language document
  • Author
    Sinaga, Ardiles ; Adiwijaya ; Nugroho, Hertog
  • Author_Institution
    Telkom Univ., Bandung, Indonesia
  • fYear
    2015
  • fDate
    27-29 May 2015
  • Firstpage
    450
  • Lastpage
    454
  • Abstract
    Information technology is growing very rapidly, particularly in data handling. Data is a valuable asset for everyone, especially for large companies with branches in several locations. Transmitting data from headquarters to branch offices requires such companies to provide good tools for the task, including tools that compress data to reduce its size. The main idea of the word-based encoding is to extract each word of the source text and check whether it contains capital letters, and then whether it contains a symbol or number. Particles are separated from the base word using a stemming algorithm. Symbols, numbers, and affixes are indexed in the basic dictionary. The base word is also looked up in the basic dictionary; if there is no match, the word is stored in the supplement dictionary. Experiments were conducted on text files ranging in size from about 10K bytes to 500K bytes, using 16-bit codewords. The results show that the compression ratio of the proposed method is comparable to that of previous methods, while its processing time is much better than that of the Reversed Sequence of Characters on LZW method.
  • Keywords
    data compression; text analysis; Indonesian language document; characters reversed sequence; compression ratio; data handling; data transmission; information technology; stemming algorithm; word-based encoding; word-based text compression algorithm; Companies; Compression algorithms; Conferences; Data compression; Dictionaries; Encoding; Data Compression; LZW; Stemming; Tree Structure; Word-Based;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Communication Technology (ICoICT), 2015 3rd International Conference on
  • Conference_Location
    Nusa Dua
  • Type
    conf
  • DOI
    10.1109/ICoICT.2015.7231466
  • Filename
    7231466
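
The word-based encoding outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary contents, the `<CAP>` flag entry, the toy suffix-stripping stemmer, and all function names are assumptions made for illustration. Only the overall flow (capital check, symbol/number check, particle separation, basic dictionary lookup, supplement dictionary fallback, 16-bit codewords) follows the abstract.

```python
import re
import struct

# Hypothetical, tiny stand-ins for the dictionaries described in the abstract.
BASIC_WORDS = ["data", "buku", "baca", "kirim"]
PARTICLES = ["lah", "kah", "pun", "nya"]   # affix entries (assumed list)
SYMBOLS = [" ", ".", ",", "?"]
CAP_FLAG = "<CAP>"                         # assumed marker for an initial capital


def build_basic_dictionary():
    """Index the flag, symbols, digits, affixes and base words in one table."""
    entries = [CAP_FLAG] + SYMBOLS + [str(d) for d in range(10)]
    entries += PARTICLES + BASIC_WORDS
    return {entry: index for index, entry in enumerate(entries)}


def split_particle(word, basic):
    """Toy stemmer: strip one trailing particle if the rest is a known base word."""
    for particle in PARTICLES:
        if word.endswith(particle) and word[:-len(particle)] in basic:
            return word[:-len(particle)], particle
    return word, None


def encode(text):
    """Return (packed 16-bit codewords, supplement dictionary) for `text`."""
    basic = build_basic_dictionary()
    supplement = {}
    codes = []

    def code_for(token):
        if token in basic:
            return basic[token]
        # No match in the basic dictionary: store in the supplement dictionary.
        if token not in supplement:
            supplement[token] = len(basic) + len(supplement)
        return supplement[token]

    # Tokens are runs of ASCII letters, single digits, or single other characters.
    for token in re.findall(r"[A-Za-z]+|\d|[^A-Za-z\d]", text):
        if token.isalpha():
            # Simplification: only an initial capital is recorded; other
            # capitalisation is lost when the word is lowercased.
            if token[0].isupper():
                codes.append(basic[CAP_FLAG])
            base, particle = split_particle(token.lower(), basic)
            codes.append(code_for(base))
            if particle:
                codes.append(basic[particle])
        else:
            codes.append(code_for(token))

    # ">H" is a big-endian unsigned 16-bit integer; struct.pack raises
    # struct.error if any code exceeds 65535.
    return struct.pack(f">{len(codes)}H", *codes), supplement


packed, supplement = encode("Bukunya data.")
print(struct.unpack(f">{len(packed) // 2}H", packed), supplement)
```

A real encoder would also need to transmit the supplement dictionary with the codewords and would use a full Indonesian stemmer rather than the single-suffix rule assumed here.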