DocumentCode
3692971
Title
Document image classification using SEMCON
Author
Zenun Kastrati;Ali Shariq Imran
Author_Institution
Faculty of Computer Science and Media Technology, Gjovik University College, Norway
fYear
2015
Firstpage
1
Lastpage
6
Abstract
In this paper, we propose a new semantic and context-based framework for document image classification. The framework is composed of two main modules. The first is the text analysis module (TAM), which processes document images and extracts words from them. The second is SEMCON, a semantic and contextual objective metric. From the list of words extracted by TAM, SEMCON identifies the noun terms, attaches contextual and semantic meaning to them, and then uses those terms to classify documents. The scope of this paper is limited to describing the proposed framework and testing the approach on a limited test dataset. Our preliminary results are very promising and suggest that the proposed framework can be used effectively to classify document images.
Keywords
"Semantics","Feature extraction","Databases","Text analysis","Context","Optical character recognition software","Visualization"
Publisher
IEEE
Conference_Titel
Signal Processing, Images and Computer Vision (STSIVA), 2015 20th Symposium on
Type
conf
DOI
10.1109/STSIVA.2015.7330427
Filename
7330427