Title :
Keyword Spotting in Online Chinese Handwritten Documents with Candidate Scoring Based on Semi-CRF Model
Author :
Heng Zhang ; Xiang-Dong Zhou ; Cheng-Lin Liu
Author_Institution :
Inst. of Autom., Nat. Lab. of Pattern Recognition, Beijing, China
Abstract :
For text-query-based keyword spotting in handwritten Chinese documents, the index is usually organized as a candidate lattice to overcome the ambiguity of character segmentation. Each edge in the lattice denotes a candidate character associated with a candidate class. A similarity score between the candidate character and its class is calculated on each edge, and the similarity between a query word and the handwriting is obtained by combining these edge scores. In this paper, we propose a document indexing method using semi-Markov conditional random fields (semi-CRFs), which provide a principled framework for fusing information from different contexts. To speed up retrieval and save storage space, the lattice is first pruned by a forward-backward approach. On the reduced lattice, we estimate the character similarity scores based on the semi-CRF model. Experimental results on the large handwriting database CASIA-OLHWDB demonstrate the effectiveness of the proposed method.
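The forward-backward pruning step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each lattice edge carries a single combined log-score (in the paper this would come from the semi-CRF's fused context features), that nodes are segmentation points numbered from left to right, and that an edge is pruned when its posterior probability falls below a fixed threshold. The function names `edge_posteriors` and `prune_lattice` and the threshold value are hypothetical.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def edge_posteriors(num_nodes, edges):
    """Posterior probability of each edge in a left-to-right lattice.

    edges: list of (start, end, label, log_score) with start < end.
    Node 0 is the start of the text line, node num_nodes - 1 the end.
    """
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    for idx, (start, end, _label, _score) in enumerate(edges):
        outgoing[start].append(idx)
        incoming[end].append(idx)

    # Forward pass: log-sum of all partial paths from node 0 to each node.
    alpha = [NEG_INF] * num_nodes
    alpha[0] = 0.0
    for node in range(1, num_nodes):
        terms = [alpha[edges[i][0]] + edges[i][3]
                 for i in incoming[node] if alpha[edges[i][0]] != NEG_INF]
        if terms:
            alpha[node] = logsumexp(terms)

    # Backward pass: log-sum of all partial paths from each node to the end.
    beta = [NEG_INF] * num_nodes
    beta[-1] = 0.0
    for node in range(num_nodes - 2, -1, -1):
        terms = [edges[i][3] + beta[edges[i][1]]
                 for i in outgoing[node] if beta[edges[i][1]] != NEG_INF]
        if terms:
            beta[node] = logsumexp(terms)

    log_z = alpha[-1]  # log of the total score over all complete paths
    posteriors = []
    for start, end, _label, score in edges:
        if alpha[start] == NEG_INF or beta[end] == NEG_INF:
            posteriors.append(0.0)  # edge lies on no complete path
        else:
            posteriors.append(math.exp(alpha[start] + score + beta[end] - log_z))
    return posteriors


def prune_lattice(num_nodes, edges, threshold=0.1):
    """Keep only edges whose posterior reaches the (assumed) threshold."""
    posteriors = edge_posteriors(num_nodes, edges)
    return [edge for edge, p in zip(edges, posteriors) if p >= threshold]
```

In this sketch the edge posterior doubles as a context-aware similarity score for the candidate character, so a query word could be scored by combining the posteriors of the edges that match its characters. How the scores are combined is a design choice that this sketch leaves open.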
Keywords :
Markov processes; document image processing; handwritten character recognition; indexing; lattice theory; query processing; sensor fusion; text analysis; CASIA-OLHWDB database; candidate scoring; character similarity scores; document indexing method; forward-backward pruning; handwriting database; information fusion; lattice; online Chinese handwritten documents; semi-CRF model; semi-Markov conditional random fields; text-query-based keyword spotting; Character recognition; Computational modeling; Context; Indexes; Lattices; Pragmatics; keyword spotting
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.118