DocumentCode
2691385
Title
A Chinese OCR spelling check approach based on statistical language models
Author
Zhuang, Li ; Bao, Ta ; Zhu, Xiaoyan ; Wang, Chunheng ; Naoi, Satoshi
Author_Institution
DCST, Tsinghua Univ., Beijing, China
Volume
5
fYear
2004
fDate
10-13 Oct. 2004
Firstpage
4727
Abstract
This work describes an effective spelling check approach for Chinese OCR based on a new multi-knowledge statistical language model. The model combines a conventional n-gram language model with a new LSA (latent semantic analysis) language model, so that both local information (syntax) and global information (semantics) are used. Furthermore, similar Chinese characters are used in the Viterbi search process to expand the candidate list so that it includes more possibly correct results. With this approach, the best recognition accuracy increases from 79.3% to 91.9%, a 60.9% reduction in errors.
Keywords
maximum likelihood estimation; natural languages; optical character recognition; spelling aids; Chinese optical character recognition; Viterbi search process; latent semantic analysis language; spelling check; statistical language models; Character recognition; Computer errors; Engines; Image recognition; Information analysis; Natural languages; Optical character recognition software; Optical computing; Probability; Text recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man and Cybernetics, 2004 IEEE International Conference on
ISSN
1062-922X
Print_ISBN
0-7803-8566-7
Type
conf
DOI
10.1109/ICSMC.2004.1401278
Filename
1401278