Title :
Multilingual document recognition research and its application in China
Author :
Peng, Liangrui ; Liu, Changsong ; Ding, Xiaoqing ; Wang, Hua
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing
Abstract :
This paper presents research on multilingual document recognition technology and its application in China, which is useful for building multilingual digital libraries. The key technologies and general system framework of multilingual OCR (optical character recognition) are summarized, based on previous research on Chinese, Japanese, Korean, and English and on recent advances for Tibetan, Uighur, Kazakh, Kirghiz, Arabic, and Mongolian. The key technologies include statistical character recognition, structural analysis for similar character discrimination, character segmentation, script identification, and post-processing. Applying multilingual document recognition systems to digital library and Web site content construction will benefit people who use various languages to retrieve knowledge.
Keywords :
digital libraries; document image processing; natural languages; optical character recognition; Arabic language; Chinese language; English language; Japanese language; Kazakh language; Kirghiz language; Korean language; Mongolian language; Tibetan language; Uighur language; Web site content construction; character segmentation; knowledge retrieval; multilingual digital library; multilingual document recognition; multilingual optical character recognition; post-processing; script identification; similar character discrimination structural analysis; statistical character recognition; Books; Character recognition; Cultural differences; Image converters; Image retrieval; Natural languages; Optical character recognition software; Paper technology; Software libraries; Text recognition;
Conference_Title :
Document Image Analysis for Libraries, 2006. DIAL '06. Second International Conference on
Conference_Location :
Lyon
Print_ISBN :
0-7695-2531-8
DOI :
10.1109/DIAL.2006.27