  • DocumentCode
    3315604
  • Title
    Simultaneous word segmentation from document images using recursive morphological closing transform
  • Author
    Chen, Su ; Haralick, Robert M. ; Phillips, Ihsin T.
  • Author_Institution
    Dept. of Electr. Eng., Washington Univ., Seattle, WA, USA
  • Volume
    2
  • fYear
    1995
  • fDate
    14-16 Aug 1995
  • Firstpage
    761
  • Abstract
    This paper describes a word segmentation algorithm based on the recursive morphological closing transform. The algorithm is trainable for any given document image population and is capable of detecting words on a document image simultaneously. We describe an experimental protocol to train and evaluate our word segmentation algorithm on a set of layout ground-truthed document images. We also discuss a method to compare two sets of word bounding boxes, one from the ground truth and the other from the output of the word segmentation algorithm, and to compute the numbers of miss, false, correct, splitting, merging and spurious detections. The experimental results demonstrate that, under the optimal algorithm parameter settings, the correct word detection percentage is about 95% on both the training and testing image populations. If splitting and merging detections are included, the detection percentage is about 99.4%.
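The comparison of ground-truth and detected word bounding boxes mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the matching rule, so everything here is an assumption: the function name `compare_boxes`, the overlap threshold `min_frac`, the rule that two boxes match when their intersection covers at least that fraction of the smaller box, and the choice to count each category per ground-truth word (with false detections counted per detected box). It is not the authors' procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x0, y0, x1, y1), with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def area(b: Box) -> int:
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def overlap(a: Box, b: Box) -> int:
    """Area of the intersection of two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def compare_boxes(truth: List[Box], detected: List[Box],
                  min_frac: float = 0.1) -> Dict[str, int]:
    """Count detection categories (assumed definitions, see lead-in)."""
    g_to_d: List[List[int]] = [[] for _ in truth]
    d_to_g: List[List[int]] = [[] for _ in detected]
    for i, g in enumerate(truth):
        for j, d in enumerate(detected):
            smaller = min(area(g), area(d))
            # Assumed matching rule: intersection covers enough of the smaller box.
            if smaller > 0 and overlap(g, d) / smaller >= min_frac:
                g_to_d[i].append(j)
                d_to_g[j].append(i)

    counts = {"correct": 0, "miss": 0, "false": 0,
              "splitting": 0, "merging": 0, "spurious": 0}
    for i, ds in enumerate(g_to_d):
        if not ds:
            counts["miss"] += 1          # no detected box matches this word
        elif len(ds) == 1 and d_to_g[ds[0]] == [i]:
            counts["correct"] += 1       # one-to-one match
        elif len(ds) > 1 and all(d_to_g[j] == [i] for j in ds):
            counts["splitting"] += 1     # one word covered by several boxes
        elif len(ds) == 1 and all(g_to_d[k] == ds for k in d_to_g[ds[0]]):
            counts["merging"] += 1       # several words covered by one box
        else:
            counts["spurious"] += 1      # any other many-to-many pattern
    # Detected boxes that match no ground-truth word.
    counts["false"] = sum(1 for gs in d_to_g if not gs)
    return counts


truth = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10),
         (60, 0, 70, 10), (80, 0, 90, 10)]
detected = [(0, 0, 10, 10), (20, 0, 24, 10), (26, 0, 30, 10),
            (40, 0, 70, 10), (100, 0, 110, 10)]
result = compare_boxes(truth, detected)
```

In this toy input the first word is detected correctly, the second is split into two boxes, the third and fourth are merged into one box, the fifth is missed, and the last detected box matches nothing.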
  • Keywords
    document image processing; image segmentation; mathematical morphology; protocols; visual databases; document image population; document images; layout ground-truthed document images; optimal algorithm parameter settings; protocol; recursive morphological closing transform; simultaneous word segmentation; Computer science; Data mining; Image resolution; Image segmentation; Markov random fields; Pixel; Shape; White spaces;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
  • Conference_Location
    Montreal, Que.
  • Print_ISBN
    0-8186-7128-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.1995.602014
  • Filename
    602014