Title :
Pattern-based content lossless compression of Chinese document images
Author :
Tsui, Maggie M K ; Liew, Alan Wee-chung ; Yan, Hong
Author_Institution :
Dept. of Comput. Eng. & Inf. Technol., City Univ. of Hong Kong, Kowloon, China
Abstract :
Compression of scanned text document images is important in modern document management, communications and retrieval systems. However, most existing compression techniques have been studied extensively only for documents in English or similar alphabet-based languages. In this paper, we propose a content-lossless scheme for compression of Chinese text documents. This method exploits the radical structure unique to Chinese characters to minimize the size of compressed documents. Our method consists of two main parts. The first part is the development of a radical pattern library. The second part uses the radical pattern library to match character patterns in a document. The technique has been tested with many Chinese text document images with good results.
Keywords :
character recognition; data compression; document image processing; image coding; image matching; Chinese document images; character pattern matching; content-lossless scheme; document management; pattern-based compression; radical pattern library; retrieval systems; scanned text document images; Australia; Engineering management; Image coding; Information retrieval; Information technology; Libraries; Natural languages; Pattern analysis; Pattern matching; Technology management;
Conference_Titel :
Intelligent Multimedia, Video and Speech Processing, 2004. Proceedings of 2004 International Symposium on
Print_ISBN :
0-7803-8687-6
DOI :
10.1109/ISIMP.2004.1434137