Title :
An Experiment Study on Text Transformation for Compression Using Stoplists and Frequent Words
Author :
Tadrat, Jirapond ; Boonjing, Veera
Author_Institution :
King Mongkut's Inst. of Technol. Ladkrabang, Bangkok
Abstract :
The paper presents a new text transformation algorithm suitable for embedding in compression algorithms. To improve text compression, the algorithm replaces words with predefined codes. Previous work relied on a huge dictionary that exhaustively lists words. The new algorithm instead uses a stoplist and/or a list of frequent words. The research devised several encoding schemes for such a list and then tested these schemes with different compression algorithms on standard texts. The results show that each scheme improves compression when used with specific compression algorithms.
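The abstract does not spell out the paper's encoding schemes, so the following is only a minimal sketch of the general idea: a word-replacement transform applied before a general-purpose compressor. It assumes one hypothetical scheme in which each listed word becomes an escape character followed by a single symbol chosen by the word's rank. The names `ESC`, `CODE_CHARS`, `WORD_LIST`, `build_codebook`, `transform`, `inverse_transform` and `compressed_sizes` are illustrative and do not come from the paper. The word list is a made-up stand-in for a real stoplist. Python's `zlib`, `bz2` and `lzma` modules stand in for "different compression algorithms" and are not claimed to be the ones the authors evaluated.

```python
import bz2
import lzma
import re
import string
import zlib

# Hypothetical escape character; assumed never to occur in the input text.
ESC = "\x01"
# Hypothetical code alphabet: one symbol per listed word (at most 52 words).
CODE_CHARS = string.ascii_lowercase + string.ascii_uppercase

# Hypothetical stoplist / frequent-word list, ordered by assumed frequency.
# Words of length <= 2 are left out: a two-character code would not shorten them.
WORD_LIST = ["the", "and", "that", "with", "for", "this", "from", "have",
             "which", "there", "their", "would", "about", "been"]


def build_codebook(words):
    """Assign each word a two-character code: ESC + one symbol by rank."""
    if len(words) > len(CODE_CHARS):
        raise ValueError("word list longer than code alphabet")
    encode = {w: ESC + CODE_CHARS[i] for i, w in enumerate(words)}
    decode = {code: w for w, code in encode.items()}
    return encode, decode


def transform(text, encode):
    """Replace listed words (exact, case-sensitive) with their codes."""
    if ESC in text:
        raise ValueError("escape character occurs in input")
    # Maximal runs of letters and non-letters; joining them restores the text.
    tokens = re.findall(r"[A-Za-z]+|[^A-Za-z]+", text)
    return "".join(encode.get(tok, tok) for tok in tokens)


def inverse_transform(coded, decode):
    """Expand every ESC + symbol code back to the word it stands for."""
    return re.sub(re.escape(ESC) + "[A-Za-z]",
                  lambda m: decode[m.group(0)], coded)


def compressed_sizes(data):
    """Compressed size in bytes under three stand-in compressors."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    encode, decode = build_codebook(WORD_LIST)
    sample = ("This is the text that the transform reads, and the words "
              "which have been listed are replaced with short codes. ") * 50
    coded = transform(sample, encode)
    assert inverse_transform(coded, decode) == sample  # lossless round trip
    for label, s in (("original", sample), ("transformed", coded)):
        print(label, len(s), compressed_sizes(s.encode("utf-8")))
```

Because the transform is reversible, it can be placed in front of any compressor and undone after decompression. Whether it actually helps depends on the compressor, which matches the abstract's observation that each scheme pays off only with specific compression algorithms.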
Keywords :
data compression; text analysis; dictionary; frequent word; predefined codes; stoplist word; text compression algorithm; text transformation algorithm; Compression algorithms; Computer science; Dictionaries; Encoding; Information technology; Laboratories; Mathematics; Natural languages; Software systems; Systems engineering and theory; LIPT; LPT; RLPT; SCLPT; Star encoding; Text preprocessing; Text transformation
Conference_Title :
Fifth International Conference on Information Technology: New Generations, 2008 (ITNG 2008)
Conference_Location :
Las Vegas, NV
Print_ISBN :
0-7695-3099-0
DOI :
10.1109/ITNG.2008.178