• DocumentCode
    1266035
  • Title
    Automatic Feature Extraction and Text Recognition From Scanned Topographic Maps
  • Author
    Pezeshk, Aria; Tutwiler, Richard L.
  • Author_Institution
    Appl. Res. Lab., Pennsylvania State Univ., University Park, PA, USA
  • Volume
    49
  • Issue
    12
  • fYear
    2011
  • Firstpage
    5047
  • Lastpage
    5063
  • Abstract
    A system for automatic extraction of various feature layers and recognition of the text content of scanned topographic maps is presented here. Linear features which are often intersecting with the text are first extracted using a novel line representation method and a set of directional morphological operations. Other graphical objects are then removed in several stages to obtain a text-only image. A custom defect model is subsequently used to create an artificial training set for a Hidden Markov Model-based character recognition engine. Finally, the recovered text is recognized using this multifont segmentation-free optical character recognition (OCR). Extensive testing is conducted to assess the performance of different stages of the proposed system. Furthermore, our custom OCR is shown to achieve a 94% recognition rate for the extracted text, thereby outperforming a commercial OCR used as a benchmark.
  • Keywords
    cartography; geographic information systems; geophysical image processing; automatic feature extraction; character recognition engine; directional morphological operations; graphical objects; hidden Markov models (HMMs); optical character recognition; scanned topographic maps; text recognition; text-only image; feature extraction; graphics; image color analysis; image edge detection; image segmentation; document analysis and recognition; map segmentation; mathematical morphology
  • fLanguage
    English
  • Journal_Title
    Geoscience and Remote Sensing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0196-2892
  • Type
    jour
  • DOI
    10.1109/TGRS.2011.2157697
  • Filename
    5942154