Title :
Automatic extraction of correlation-entropy features for text document analysis directly in run-length compressed domain
Author :
Mohammed Javed;P. Nagabhushan;B.B. Chaudhuri
Author_Institution :
Department of Studies in Computer Science, University of Mysore, 570006, India
Abstract :
Automatic feature extraction plays a pivotal role in determining the overall performance of any Document Image Analysis system. Such systems conventionally operate on uncompressed images, even though most real-time systems, such as fax machines, digital libraries and e-governance applications, accrue and archive documents in compressed form for storage and transfer efficiency. This implies that compressed documents must be decompressed before any operation or analysis can be carried out, which demands additional computing resources. This limitation of existing systems motivates the search for feature extraction techniques that work directly on compressed documents, and ultimately the design of a document analysis system that operates entirely in the compressed domain. Therefore, this research work proposes to extract novel correlation-entropy features directly from run-length compressed TIFF documents. The work further investigates methods that demonstrate some straightforward applications of the proposed features to compressed document image analysis, such as text and non-text component detection, followed by compressed text line segmentation and characterization. All of these operations are carried out on the compressed version of the printed text document, without any decompression stage. Finally, the reported experimental results validate the developed algorithms and illustrate that the proposed features are powerful in distinguishing compressed text from non-text components.
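To make the idea of working "directly in run-length compressed domain" concrete, the sketch below shows one way a binary image row is represented as alternating white/black run lengths (the convention used by CCITT-style TIFF compression, where the first run is white), and how a simple entropy value can be computed from those runs without reconstructing pixels. This is a hypothetical illustration only: the functions `encode_row` and `run_entropy` are illustrative names, and the entropy here is a plain Shannon entropy of run-length values, not the correlation-entropy feature defined in the paper, whose exact formulation the abstract does not give.

```python
import math
from collections import Counter


def encode_row(pixels):
    """Run-length encode one binary row (0 = white, 1 = black).

    Runs alternate white/black and always start with a white run, so a
    row that begins with black gets a leading zero-length white run.
    """
    runs = []
    current = 0  # colour of the run currently being counted
    length = 0
    for p in pixels:
        if p == current:
            length += 1
        else:
            runs.append(length)  # close the previous run
            current = p
            length = 1
    runs.append(length)  # close the final run
    return runs


def run_entropy(runs):
    """Shannon entropy (bits) of the run-length values in one compressed row.

    Hypothetical stand-in feature: zero-length runs are ignored, and only
    the run lengths are used, so no pixel-level decompression is needed.
    """
    nonzero = [r for r in runs if r > 0]
    if not nonzero:
        return 0.0
    counts = Counter(nonzero)
    total = len(nonzero)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    image = [
        [0, 0, 1, 1, 0, 0, 1, 0],  # row with several black strokes
        [1, 1, 1, 1, 1, 1, 1, 1],  # solid black row
        [0, 0, 0, 0, 0, 0, 0, 0],  # blank row
    ]
    for row in image:
        runs = encode_row(row)
        print(runs, round(run_entropy(runs), 3))
```

Rows crossing many short strokes yield many varied runs and hence higher entropy, while blank or solid rows collapse to a single run with zero entropy; this is the general intuition for why run statistics can separate text-like regions from others, though the paper's actual features may differ.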
Keywords :
"Image coding","Image segmentation"
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
DOI :
10.1109/ICDAR.2015.7333714