DocumentCode :
3288681
Title :
An Experiment Study on Text Transformation for Compression Using Stoplists and Frequent Words
Author :
Tadrat, Jirapond ; Boonjing, Veera
Author_Institution :
King Mongkut's Inst. of Technol. Ladkrabang, Bangkok
fYear :
2008
fDate :
7-9 April 2008
Firstpage :
709
Lastpage :
713
Abstract :
The paper presents a new text transformation algorithm suitable for embedding in compression algorithms. To improve text compression performance, the algorithm replaces words with predefined codes. Instead of using a huge dictionary containing an exhaustive set of words, as in previous work, the new algorithm uses a list of stoplist words and/or frequent words. The study devised different encoding schemes for such a list and then ran experiments using these schemes with different compression algorithms on standard texts. The results show that each scheme improves compression when used with specific compression algorithms.
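The word-replacement idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's stoplist, code assignment, or encoding schemes, so everything below is assumed for illustration only: the eight-word `STOPLIST`, the `ESCAPE` byte, the two-character codes, and the helper names `transform`, `inverse`, and `compressed_size`. The sketch only shows a reversible substitution applied before a general-purpose compressor (here `zlib`).

```python
import re
import zlib

# Hypothetical stoplist; the paper's actual word list is not given in the abstract.
STOPLIST = ["the", "of", "and", "to", "in", "is", "that", "for"]

# Assumed escape character; the input text must not contain it.
ESCAPE = "\x01"

# Each stoplist word maps to ESCAPE plus one letter (an assumed scheme).
ENCODE = {w: ESCAPE + chr(ord("A") + i) for i, w in enumerate(STOPLIST)}
DECODE = {code: w for w, code in ENCODE.items()}

WORD = re.compile(r"[A-Za-z]+")
CODE = re.compile(re.escape(ESCAPE) + r".")


def transform(text: str) -> str:
    """Replace exact (case-sensitive) stoplist words with their codes."""
    if ESCAPE in text:
        raise ValueError("input already contains the escape character")
    return WORD.sub(lambda m: ENCODE.get(m.group(0), m.group(0)), text)


def inverse(text: str) -> str:
    """Restore the original text from the transformed text."""
    return CODE.sub(lambda m: DECODE[m.group(0)], text)


def compressed_size(text: str) -> int:
    """Size in bytes after zlib compression, for comparing variants."""
    return len(zlib.compress(text.encode("utf-8")))
```

Calling `compressed_size(text)` and `compressed_size(transform(text))` on the same input gives a rough way to compare the two; whether the transformed text compresses better depends on the input and the compressor, which is consistent with the abstract's remark that each scheme helps only with specific compression algorithms.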
Keywords :
data compression; text analysis; dictionary; frequent word; predefined codes; stoplist word; text compression algorithm; text transformation algorithm; Compression algorithms; Computer science; Dictionaries; Encoding; Information technology; Laboratories; Mathematics; Natural languages; Software systems; Systems engineering and theory; LIPT; LPT; RLPT; SCLPT; Star encoding; Text preprocessing; Text transformation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology: New Generations, 2008. ITNG 2008. Fifth International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
0-7695-3099-0
Type :
conf
DOI :
10.1109/ITNG.2008.178
Filename :
4492565