Title :
Retrieval of degraded Chinese document based on fuzzy coding strategy
Author :
Xia Yong ; Jia Xu-Hui ; Wang Kuan-Quan
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
Abstract :
Because the recognition rate for degraded Chinese documents is low, retrieval performs poorly when it relies directly on OCR output. This paper presents a fuzzy coding strategy to improve retrieval performance: many character classes with similar shapes are clustered, and each cluster is indexed by a pseudo code. To simplify testing, the paper also presents a method for generating ground truth for document images and for synthesizing degraded document images. One real OCR text collection and two synthesized document image collections are used for performance evaluation, and the results confirm the validity of the method.
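The clustering-and-indexing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster table, the `#k` pseudo-code format, the bigram index, and all function names (`build_code_table`, `encode`, `build_index`, `search`) are assumptions made for illustration, and the paper presumably derives its clusters from shape similarity rather than from a hand-written list.

```python
from collections import defaultdict

# Hypothetical clusters of visually similar characters (illustrative only;
# a real system would derive these from shape similarity or OCR confusions).
CLUSTERS = [
    {"己", "已", "巳"},
    {"未", "末"},
    {"日", "曰"},
]


def build_code_table(clusters):
    """Map every clustered character to the pseudo code shared by its cluster."""
    table = {}
    for idx, cluster in enumerate(clusters):
        code = f"#{idx}"  # assumed pseudo-code format
        for ch in cluster:
            table[ch] = code
    return table


CODE_TABLE = build_code_table(CLUSTERS)


def encode(text):
    """Replace each character by its pseudo code; other characters map to themselves."""
    return tuple(CODE_TABLE.get(ch, ch) for ch in text)


def build_index(docs, n=2):
    """Build an inverted index from encoded character n-grams to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        codes = encode(text)
        for i in range(len(codes) - n + 1):
            index[codes[i:i + n]].add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of candidate documents containing every encoded n-gram of the query."""
    codes = encode(query)
    grams = [codes[i:i + n] for i in range(len(codes) - n + 1)]
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Simulated OCR output: document 1 originally read "未来已来",
# but the recognizer confused 未 with 末 and 已 with 己.
ocr_docs = {1: "末来己来", 2: "周末愉快"}
index = build_index(ocr_docs)
hits = search(index, "未来")  # the query uses the correct characters
```

Because the query and the OCR text are encoded with the same table, the misrecognized document is still found. The trade-off in this sketch is that characters within one cluster become indistinguishable, so a query for one member of a cluster also matches the others.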
Keywords :
document image processing; fuzzy set theory; image coding; image retrieval; optical character recognition; OCR text collection; degraded Chinese document retrieval; fuzzy coding strategy; ground-truth generate; imaged document; pseudo code; synthesized degraded document image; Degradation; Encoding; Image retrieval; Indexing; Optical character recognition software; Performance evaluation; Text analysis; Retrieval of degraded Chinese document; Synthesis of degraded document
Conference_Title :
Systems and Informatics (ICSAI), 2012 International Conference on
Conference_Location :
Yantai
Print_ISBN :
978-1-4673-0198-5
DOI :
10.1109/ICSAI.2012.6223602