• DocumentCode
    430780
  • Title
    Using projection and loop for segmentation of touching Thai typewritten
  • Author
    Watcharabutsarakham, Sarin
  • Author_Institution
    Nat. Electron. & Comput. Technol. Center, Pathumthani, Thailand
  • Volume
    1
  • fYear
    2004
  • fDate
    26-29 Oct. 2004
  • Firstpage
    504
  • Abstract
    This paper proposes a segmentation technique for touching Thai typewritten characters. Thai characters vary in size and position within a sentence, and a Thai word is composed of consonants, vowels and tones. Touching can occur in both the horizontal and vertical directions. The proposed technique uses structural characteristics to detect suitable segmentation points in both directions. The segmentation process consists of four steps. First the height and then the position of characters are used to identify character zones. Next, size and both horizontal and vertical projections are used to classify the types of touching. Lastly, touching characters are segmented using the directions and positions identified in the previous steps. The edge of touching characters is used to identify the edge of two isolated characters. The technique was tested on documents from both electronic typewriters and manual portable typewriters. A segmentation accuracy of 95.4% was obtained on two hundred sentences from typewritten thesis documents.
  • Keywords
    character recognition; document image processing; edge detection; image segmentation; Thai characters; character height; character position; character recognition; character zones; consonants; tones; touching Thai typewritten character segmentation; touching characters; vowels; Character recognition; Image segmentation; Isolation technology; Manuals; Optical character recognition software; Speech recognition; Writing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications and Information Technology, 2004. ISCIT 2004. IEEE International Symposium on
  • Print_ISBN
    0-7803-8593-4
  • Type
    conf
  • DOI
    10.1109/ISCIT.2004.1412896
  • Filename
    1412896
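
The abstract mentions horizontal and vertical projections as one cue for classifying touching characters. As a rough illustration only, and not the paper's actual procedure, the sketch below computes both projections of a small binary image and flags low-ink columns as candidate cut points. The function names, the `max_ink` threshold, and the toy image are all assumptions made for this example.

```python
def vertical_projection(image):
    """Count foreground (1) pixels in each column of a binary image."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def horizontal_projection(image):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def candidate_cuts(profile, max_ink=1):
    """Return interior indices where the profile is a local minimum
    with at most `max_ink` foreground pixels (hypothetical threshold)."""
    cuts = []
    for i in range(1, len(profile) - 1):
        left, here, right = profile[i - 1], profile[i], profile[i + 1]
        is_valley = here <= left and here <= right and (here < left or here < right)
        if is_valley and here <= max_ink:
            cuts.append(i)
    return cuts


# Toy image (assumed data): two 3x3 blobs joined by a single pixel in column 3.
image = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]

columns = vertical_projection(image)
rows = horizontal_projection(image)
cuts = candidate_cuts(columns, max_ink=1)
```

Here the thin bridge between the two blobs shows up as a valley in the column profile, so it is the only candidate cut. A real system would combine this with the zone, size, and edge information the abstract describes before deciding where to split.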