DocumentCode :
2530052
Title :
Document digitization technology and its application for digital library in China
Author :
Ding, Xiaoqing ; Wen, Di ; Peng, Liangrui ; Liu, Changsong
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
fYear :
2004
fDate :
2004
Firstpage :
46
Lastpage :
53
Abstract :
We introduce research on document digitization technology and its applications in constructing digital libraries in China. We focus on two major objectives of document digitization technologies: performance and efficiency. Taking the representative TH-OCR product as an example, we present up-to-date research achievements in China on both kernel OCR technologies and peripheral technologies. The kernel technologies include high-performance multilingual (Chinese, Japanese, Korean and English) text recognition and layout analysis, understanding and reconstruction; the peripheral technologies include the network document digitization workflow and intelligent proofreading, which greatly improve efficiency. TH-OCR applications produce two types of final digital document: one is a reconstructed electronic document carrying the full text and layout information of the original paper-based document; the other is a multilevel document with the OCR output text layer under the image layer. Numerous applications indicate that current technologies can greatly facilitate the mass-volume digitization labour involved in building digital library infrastructure.
Keywords :
digital libraries; document image processing; optical character recognition; text analysis; TH-OCR product; digital library; document digitization technology; electronic document; intelligent proofreading; kernel OCR technology; layout analysis; mass-volume digitization labour; multilingual character recognition; network document digitization workflow; paper-based document; peripheral technology; text recognition; Automation; Books; Character recognition; Humans; Image reconstruction; Intelligent networks; Kernel; Laboratories; Optical character recognition software; Software libraries;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Image Analysis for Libraries, 2004. Proceedings. First International Workshop on
Print_ISBN :
0-7695-2088-X
Type :
conf
DOI :
10.1109/DIAL.2004.1263236
Filename :
1263236