• DocumentCode
    596344
  • Title
    A clustering strategy for touching characters in Korean and English printed text segmentation
  • Author
    Wahyono ; Kang-Hyun Jo
  • Author_Institution
    Dept. of Electr. Eng., Univ. of Ulsan, Ulsan, South Korea
  • fYear
    2012
  • fDate
    26-28 Nov. 2012
  • Firstpage
    23
  • Lastpage
    25
  • Abstract
    This paper proposes a clustering-based segmentation method for mixed Korean and English printed text that contains touching characters. In the first step, the vertical projection of the text image is computed and a clustering process is performed on it. The cluster with the smallest mean value is then used to select candidate segmentation points, which in turn produce candidate bounding boxes. Each box is verified against Korean and English character characteristics; boxes that do not conform are split or merged with one another. Merging is based on Korean vowel characteristics, since the Korean alphabet consists of several symbols, while splitting is done by clustering a local vertical projection. The proposed method achieves a correct segmentation rate of 99.36% on non-touching characters and 99.25% on touching characters. These results show that the clustering strategy is very effective for the touching-character problem in mixed Korean and English printed text. It also speeds up segmentation, because the method does not need a character recognizer to verify bounding boxes.
  • Keywords
    character recognition; image segmentation; natural language processing; pattern clustering; robot vision; text analysis; English printed text segmentation method; Korean alphabet; Korean printed text segmentation method; Korean vowel characteristics; candidate bounding boxes; candidate segmentation point; local vertical projection clustering strategy; segmentation process; segmentation rate; touching characters; vertical image text projection; Ambient intelligence; Character recognition; Clustering algorithms; Image segmentation; Robots; Text recognition; Writing; Character Recognition; Clustering; Segmentation; Touching Character;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Ubiquitous Robots and Ambient Intelligence (URAI), 2012 9th International Conference on
  • Conference_Location
    Daejeon
  • Print_ISBN
    978-1-4673-3111-1
  • Electronic_ISBN
    978-1-4673-3110-4
  • Type
    conf
  • DOI
    10.1109/URAI.2012.6462921
  • Filename
    6462921
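
The first step described in the abstract (vertical projection, clustering, and taking the lowest-mean cluster as candidate cut columns) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the clustering algorithm, so a plain 1-D k-means with k = 2 is assumed here, and the function names (vertical_projection, kmeans_1d, candidate_boxes) are hypothetical. The later verification, merging, and splitting steps are not covered.

```python
def vertical_projection(image):
    """Count foreground pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def kmeans_1d(values, k=2, iterations=20):
    """Plain 1-D k-means (assumed clustering method); needs k >= 2.

    Returns (labels, centers).
    """
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly over the value range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


def candidate_boxes(image, k=2):
    """Return inclusive (start, end) column ranges between candidate cut columns."""
    profile = vertical_projection(image)
    labels, centers = kmeans_1d(profile, k)
    # Columns in the cluster with the smallest mean are candidate cut points.
    low = min(range(k), key=lambda c: centers[c])
    boxes, start = [], None
    for x, lab in enumerate(labels):
        if lab != low and start is None:
            start = x                      # a new box begins
        elif lab == low and start is not None:
            boxes.append((start, x - 1))   # the box ends before the cut column
            start = None
    if start is not None:
        boxes.append((start, len(labels) - 1))
    return boxes


# Toy 3-row image: the first two glyphs touch through one pixel in column 3.
sample = [
    [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1],
]
print(candidate_boxes(sample))  # [(0, 2), (4, 6), (9, 10)]
```

In the toy image, column 3 still contains ink, so a rule that only cuts at empty columns would miss it; the clustering places it in the low-projection cluster and separates the two touching glyphs. A uniform projection yields no boxes in this sketch, which a fuller version would need to handle.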