Title :
Content-based indexing and retrieval method of Chinese document images
Author :
He, Yaodong ; Jiang, Zao ; Liu, Bing ; Zhao, Hong
Author_Institution :
Software Center, Northeast Univ. of Technol., Shenyang, China
Abstract :
In Chinese information retrieval, indexing a Chinese text document is straightforward: we only need to segment the text into phrases. When the document is a Chinese document image (a non-ASCII file), we may first convert the image into a text file using Chinese optical character recognition (OCR) and then index the text with an information retrieval algorithm. However, OCR is time-consuming, which can reduce retrieval efficiency. This paper proposes an indexing method based on stroke density codes. The document image is first segmented to obtain the individual Chinese character images; the stroke density of each character image is then calculated; finally, the stroke density code of the character image is obtained. The indexing method is fast and robust to noise. This paper also presents a retrieval method for Chinese document images based on this index, and discusses the use of the index and retrieval method for duplicate detection. We have demonstrated the validity of the indexing method through its application to keyword spotting and duplicate detection.
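The abstract does not say how stroke density is measured or how it is turned into a code, so the following is only a minimal sketch under stated assumptions: each segmented character is a binary image (1 = stroke pixel), density is the fraction of stroke pixels in each cell of a small grid, and each density is quantized to a few levels. The function names, the 2x2 grid, and the 4 quantization levels are all hypothetical choices for illustration, not the authors' method.

```python
from collections import defaultdict


def stroke_density_code(char_img, grid=2, levels=4):
    """Return a tuple of quantized per-cell stroke densities.

    char_img is a list of rows of 0/1 values (1 = stroke pixel).
    grid and levels are illustrative assumptions, not values from the paper.
    """
    h, w = len(char_img), len(char_img[0])
    code = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            area = (y1 - y0) * (x1 - x0)
            ink = sum(char_img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            density = ink / area if area else 0.0
            # Quantize density in [0, 1] to an integer level in [0, levels - 1].
            code.append(min(int(density * levels), levels - 1))
    return tuple(code)


def build_index(docs):
    """Map each code to the (doc_id, position) pairs where it occurs.

    docs is {doc_id: [char_img, ...]} with characters in reading order.
    """
    index = defaultdict(list)
    for doc_id, chars in docs.items():
        for pos, img in enumerate(chars):
            index[stroke_density_code(img)].append((doc_id, pos))
    return index


def spot_keyword(index, keyword_imgs):
    """Return (doc_id, start) pairs where the keyword's code sequence occurs."""
    codes = [stroke_density_code(img) for img in keyword_imgs]
    if not codes:
        return []
    hits = []
    for doc_id, pos in index.get(codes[0], []):
        if all((doc_id, pos + k) in index.get(c, []) for k, c in enumerate(codes)):
            hits.append((doc_id, pos))
    return hits


# Toy 4x4 "character images" standing in for segmented characters.
full = [[1, 1, 1, 1] for _ in range(4)]
half = [[1, 1, 0, 0] for _ in range(4)]
empty = [[0, 0, 0, 0] for _ in range(4)]

docs = {"a": [full, half, empty], "b": [half, full]}
index = build_index(docs)
matches = spot_keyword(index, [full, half])
```

Because matching compares short integer codes rather than recognized characters, no OCR pass is needed at index time. A coarse code like this one will map many different characters to the same value, so a practical system would presumably need a finer code or a verification step.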
Keywords :
content-based retrieval; database indexing; document image processing; image segmentation; optical character recognition; visual databases; Chinese document images; OCR; content-based indexing; duplicate detection; information retrieval; keyword spotting; stroke density; text document segmentation; text file; Books; Helium; Image databases; Image recognition; Image retrieval; Indexing; Optical character recognition software; Robustness
Conference_Title :
Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99), 1999
Conference_Location :
Bangalore
Print_ISBN :
0-7695-0318-7
DOI :
10.1109/ICDAR.1999.791880