• DocumentCode
    2631918
  • Title

    A Markov language model in Chinese text recognition

  • Author

    Lee, Hsi-Jian ; Tung, Cheng-Huang ; Chien, Che-Hui Chang

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • fYear
    1993
  • fDate
    20-22 Oct 1993
  • Firstpage
    72
  • Lastpage
    75
  • Abstract
    A two-stage Chinese text recognition system is presented. In the first stage, each Chinese character is segmented nonuniformly into 10 strips horizontally and vertically. Three statistical features, viz. crossing counts, peripheral background area, and contour line length, are then extracted to form a 60-dimension feature vector. A feature matching method based on the city-block distance metric selects the N nearest neighbors as candidates for each input character from the reference template base, which consists of 5,401 frequently used Chinese characters. In the second stage, a 3-part-of-speech (tri-POS) Markov language model is employed to select the most promising characters from all candidate characters in an input sentence. Dynamic programming is applied to find, among all candidate sentences for an input sentence, the sentence hypothesis whose part-of-speech sequence has the maximum likelihood of occurring. The tri-POS contextual information is estimated from a tagged corpus.
  • Keywords
    Markov processes; character recognition; dynamic programming; feature extraction; heuristic programming; model-based reasoning; natural languages; Chinese text recognition; Markov language model; candidate sentences; city-block distance metric; contour line length; crossing counts; dynamic programming; feature matching method; feature vector; input sentence; nearest neighbors; nonuniform character segmentation; part-of-speech sequence; peripheral background area; reference template base; sentence hypothesis; statistical features; strips; tagged corpus; tri-POS contextual information; Character recognition; Image segmentation; Maximum likelihood estimation; Natural languages; Nearest neighbor searches; Probability; Spatial databases; Strips; Text recognition; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
  • Conference_Location
    Tsukuba Science City
  • Print_ISBN
    0-8186-4960-7
  • Type

    conf

  • DOI
    10.1109/ICDAR.1993.395779
  • Filename
    395779
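
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy templates, the toy trigram table, and the simplification that every candidate character carries exactly one part-of-speech tag are all assumptions made for illustration. Stage 1 ranks templates by city-block (L1) distance; stage 2 runs a dynamic-programming search over tag pairs to find the candidate sentence whose tag sequence has the highest trigram log-likelihood.

```python
import math


def city_block(u, v):
    """City-block (L1) distance between two equal-length feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def top_n_candidates(feature, templates, n):
    """Return the n template characters nearest to `feature`.

    `templates` maps a character to its reference feature vector
    (hypothetical layout; the record does not describe the template base format).
    """
    return sorted(templates, key=lambda ch: city_block(feature, templates[ch]))[:n]


def best_sentence(candidates, pos_of, trigram_logp, start="<s>"):
    """Pick one character per position so the tag sequence is most likely.

    candidates   : list of candidate lists, one list per character position
    pos_of       : dict mapping a character to a single POS tag (assumption)
    trigram_logp : function (t2, t1, t0) -> log P(t0 | t2, t1)
    The DP state is the pair of the two most recent tags.
    """
    best = {(start, start): (0.0, [])}
    for options in candidates:
        nxt = {}
        for (t2, t1), (score, path) in best.items():
            for ch in options:
                t0 = pos_of[ch]
                s = score + trigram_logp(t2, t1, t0)
                key = (t1, t0)
                # Keep only the best hypothesis ending in this tag pair.
                if key not in nxt or s > nxt[key][0]:
                    nxt[key] = (s, path + [ch])
        best = nxt
    score, path = max(best.values(), key=lambda sp: sp[0])
    return "".join(path), score


# Toy data, invented for this sketch only.
TOY_TRIGRAMS = {
    ("<s>", "<s>", "N"): 0.9,
    ("<s>", "N", "V"): 0.9,
    ("N", "V", "N"): 0.9,
}


def toy_logp(t2, t1, t0):
    """Look up a toy trigram probability, with a small floor for unseen trigrams."""
    return math.log(TOY_TRIGRAMS.get((t2, t1, t0), 0.01))
```

In a real system the trigram probabilities would be estimated from a tagged corpus, as the abstract states, and the mapping from characters to tags would have to handle words and ambiguous tags, which this sketch does not attempt.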