  • DocumentCode
    3692971
  • Title
    Document image classification using SEMCON
  • Author
    Zenun Kastrati; Ali Shariq Imran
  • Author_Institution
    Faculty of Computer Science and Media Technology, Gjøvik University College, Norway
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper, we propose a new semantics- and context-based framework for document image classification. The framework is composed of two main modules. The first is the text analysis module (TAM), which processes document images and extracts words from them; the second is SEMCON, a semantic and contextual objective metric. From the list of words extracted by TAM, SEMCON identifies the noun terms, assigns contextual and semantic meaning to them, and then uses those terms to classify the documents. The scope of this paper is limited to presenting the proposed framework and testing the approach on a limited test dataset. Our preliminary results are very promising and suggest that the proposed framework can be used effectively to classify document images.
  • Keywords
    "Semantics","Feature extraction","Databases","Text analysis","Context","Optical character recognition software","Visualization"
  • Publisher
    IEEE
  • Conference_Title
    2015 20th Symposium on Signal Processing, Images and Computer Vision (STSIVA)
  • Type
    conf
  • DOI
    10.1109/STSIVA.2015.7330427
  • Filename
    7330427