DocumentCode
1968594
Title
Development of word-based text compression algorithm for Indonesian language document
Author
Sinaga, Ardiles ; Adiwijaya ; Nugroho, Hertog
Author_Institution
Telkom Univ., Bandung, Indonesia
fYear
2015
fDate
27-29 May 2015
Firstpage
450
Lastpage
454
Abstract
Information technology is growing very rapidly, particularly in data handling. Data is a valuable asset for everyone, especially for large companies with branches in several locations. Because data must be transmitted from headquarters to branch offices, such companies need good tools for the task, including tools that compress data to reduce its size. The main idea of word-based encoding is to extract each word of the source text and check whether it contains capital letters. The word is then checked for symbols or numbers. Particles are separated from the basic word using a stemming algorithm. Symbols, numbers, and affixes are indexed in the basic dictionary. The basic word is also looked up in the basic dictionary; if there is no match, the word is stored in the supplement dictionary. The experiment was conducted on text files ranging in size from about 10K bytes to 500K bytes, with 16-bit codewords. The results show that the compression ratio of the proposed method is comparable to that of previous methods, while its processing time is much better than that of the Reversed Sequence of Characters on LZW method.
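The encoding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary contents (`BASE_ENTRIES`), the `<CAP>` capitalisation flag, the tokenizer, and the naive particle stripper `split_particle` are all hypothetical stand-ins chosen for illustration, and the abstract does not say how the real stemming algorithm or dictionaries are built. The sketch only shows the general shape: flag capitals, split off a particle, look each piece up in a basic dictionary, fall back to a supplement dictionary, and emit 16-bit codewords.

```python
import re
import struct

# Hypothetical toy basic dictionary: a capitalisation flag, a few symbols,
# three particles, and three Indonesian root words (assumed, not from the paper).
BASE_ENTRIES = ["<CAP>", " ", ".", ",", "-lah", "-kah", "-nya",
                "buku", "baca", "saya"]
PARTICLES = ("lah", "kah", "nya")
PARTICLE_ENTRIES = {"-" + p for p in PARTICLES}


def tokenize(text):
    # ASCII letter runs, digit runs, or any other single character.
    return re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d]", text)


def split_particle(word):
    # Naive stand-in for a stemming algorithm: strip one trailing particle
    # when at least three characters remain. It can split real roots wrongly,
    # but the round trip stays lossless.
    for p in PARTICLES:
        if word.endswith(p) and len(word) > len(p) + 2:
            return word[:-len(p)], "-" + p
    return word, None


def encode(text):
    base = {entry: i for i, entry in enumerate(BASE_ENTRIES)}
    supplement = {}  # tokens missing from the basic dictionary
    codes = []

    def code_for(token):
        if token in base:
            return base[token]
        if token not in supplement:
            supplement[token] = len(base) + len(supplement)
        return supplement[token]

    for tok in tokenize(text):
        if tok.isascii() and tok.isalpha():
            if tok[0].isupper():
                # Record the capital as a flag, then index the lowercase form.
                codes.append(base["<CAP>"])
                tok = tok[0].lower() + tok[1:]
            root, particle = split_particle(tok)
            codes.append(code_for(root))
            if particle:
                codes.append(code_for(particle))
        else:
            codes.append(code_for(tok))

    if len(base) + len(supplement) > 0x10000:
        raise ValueError("more entries than 16-bit codewords can address")
    # Each codeword is packed as one big-endian unsigned 16-bit integer.
    return struct.pack(f">{len(codes)}H", *codes), supplement


def decode(payload, supplement):
    table = list(BASE_ENTRIES) + sorted(supplement, key=supplement.get)
    codes = struct.unpack(f">{len(payload) // 2}H", payload)
    out, capitalise = [], False
    for code in codes:
        entry = table[code]
        if entry == "<CAP>":
            capitalise = True
        elif entry in PARTICLE_ENTRIES:
            out.append(entry[1:])  # re-attach the particle to the previous word
        else:
            if capitalise:
                entry = entry[0].upper() + entry[1:]
                capitalise = False
            out.append(entry)
    return "".join(out)


if __name__ == "__main__":
    data, extra = encode("Saya baca bukunya.")
    print(len(data), "bytes;", "supplement:", extra)
    print(decode(data, extra))
```

In this toy setting the supplement dictionary has to travel with the payload so the decoder can rebuild the table; how the paper stores or transmits it is not stated in the abstract.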
Keywords
data compression; text analysis; Indonesian language document; characters reversed sequence; compression ratio; data handling; data transmission; information technology; stemming algorithm; word-based encoding; word-based text compression algorithm; Companies; Compression algorithms; Conferences; Data compression; Dictionaries; Encoding; Data Compression; LZW; Stemming; Tree Structure; Word-Based;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information and Communication Technology (ICoICT), 2015 3rd International Conference on
Conference_Location
Nusa Dua
Type
conf
DOI
10.1109/ICoICT.2015.7231466
Filename
7231466
Link To Document