DocumentCode :
3432348
Title :
Study of Japanese text compression
Author :
Satoh, Noriko ; Morihara, Takashi ; Okada, Yoshiyuki ; Yoshida, Shigeru
Author_Institution :
Fujitsu Labs. Ltd., Atsugi, Japan
fYear :
1997
fDate :
25-27 Mar 1997
Firstpage :
467
Abstract :
Summary form only given. The Japanese language has several thousand distinct characters, and the character code length is 16 bits. In such documents the 16-bit units are interrelated. Conventional text compression employs 8-bit sampling because the compressed object is usually English text. We investigated compression schemes based on 16-bit sampling, expecting it to improve the compression performance. In Japanese text, where words are short, statistical schemes with a PPM model provide better compression ratios than slide dictionary schemes. So we investigated 16-bit sampling based on statistical schemes with a PPM model. We show that the 16-bit sampling scheme provides good compression ratios in short documents under several tens of kilobytes, such as office reports. The processing speed is also better.
Keywords :
data compression; document image processing; image coding; image sampling; statistical analysis; word processing; 16 bit; 16-bit sampling; Japanese language; Japanese text compression; PPM model; character code length; compression performance; compression ratios; documents; office reports; processing speed; statistical schemes; Dictionaries; Encoding; Huffman coding; Natural languages; Sampling methods; Table lookup;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference, 1997. DCC '97. Proceedings
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-8186-7761-9
Type :
conf
DOI :
10.1109/DCC.1997.582134
Filename :
582134