DocumentCode
2945582
Title
Improving PPM Algorithm Using Dictionaries
Author
Yichuan Hu ; Jianzhong Zhang ; Farooq Khan ; Ying Li
Author_Institution
Dept. of ESE, Univ. of Pennsylvania, Philadelphia, PA, USA
fYear
2011
fDate
29-31 March 2011
Firstpage
459
Lastpage
459
Abstract
We propose a method to improve the traditional character-based PPM text compression algorithm for natural languages. Treating a text file as a sequence of alternating words and non-words, the basic idea of our algorithm is to encode non-words and prefixes of words using character-based context models, and to encode suffixes of words using dictionary models. By using dictionary models, the algorithm can encode multiple characters as a single unit, and thus improve compression efficiency. The advantages of the proposed algorithm are: 1) it does not require any text preprocessing; 2) it does not need any explicit codeword to signal a switch between context and dictionary models; 3) it can be applied to any character-based PPM algorithm without incurring much additional computational cost. Details about the algorithm are described below.
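The word / non-word split and the prefix / suffix division described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the definition of a word as a run of ASCII letters, the fixed prefix length of 2, the prefix-keyed dictionary, and the names split_words_nonwords and plan_encoding. The sketch only labels which model would handle each piece; it does no PPM or arithmetic coding and does not show how the switch between models is signaled.

```python
import re

WORD = r"[A-Za-z]+"  # assumption: a word is a maximal run of ASCII letters


def split_words_nonwords(text):
    """Split text into alternating word / non-word tokens without losing characters."""
    return re.findall(WORD + r"|[^A-Za-z]+", text)


def plan_encoding(text, dictionary, prefix_len=2):
    """Label each piece of text with the model that would encode it.

    `dictionary` maps a word prefix to the set of suffixes known for it
    (a hypothetical layout; the abstract does not say how the dictionary is built).
    """
    plan = []
    for token in split_words_nonwords(text):
        if re.fullmatch(WORD, token) is None:
            # Non-words always go through the character-based context model.
            plan.append(("context", token))
            continue
        prefix, suffix = token[:prefix_len], token[prefix_len:]
        if suffix and suffix in dictionary.get(prefix, ()):
            # Prefix via the context model, whole suffix as one dictionary symbol.
            plan.append(("context", prefix))
            plan.append(("dictionary", suffix))
        else:
            # Unknown suffix: fall back to character-based coding for the whole word.
            plan.append(("context", token))
    return plan


if __name__ == "__main__":
    toy_dictionary = {"th": {"e"}, "ca": {"t"}}
    for model, piece in plan_encoding("the cat, the dog", toy_dictionary):
        print(model, repr(piece))
```

In this toy layout, concatenating the pieces in order reproduces the input text, which is the property a lossless coder built on such a split would need.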
Keywords
data compression; dictionaries; natural language processing; text analysis; alternating words; character based PPM text compression algorithm; character based context models; dictionary models; natural languages; non words; words suffixes; Computational modeling; Context; Context modeling; Data compression; Decoding; Dictionaries; Encoding; Dictionary model; Markov model; PPM; Text compression
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Compression Conference (DCC), 2011
Conference_Location
Snowbird, UT
ISSN
1068-0314
Print_ISBN
978-1-61284-279-0
Type
conf
DOI
10.1109/DCC.2011.63
Filename
5749516