Title :
Summarization of JBIG2 Compressed Indian Language Textual Images
Author :
Garain, Utpal ; Datta, Alok K. ; Bhattacharya, U. ; Parui, S.K.
Author_Institution :
Indian Stat. Inst., Kolkata
Abstract :
This paper presents a method for automatically summarizing JBIG2-coded textual images without optical character recognition (OCR). Compressed images are partially decompressed (less than 10% of the uncompressed image size), and text lines and words are marked. A few features are computed for each sentence. Based on the feature values, each sentence is then marked as a summary sentence or not. The system finally generates a set of sentences as the summary and ranks the sentences within it. The experiments use Indian language text images. Test results show a sentence selection efficiency of about 56% when judged against summaries generated by humans. A nonparametric (distribution-free) rank statistic gives a correlation coefficient of 0.28 as a measure of the (minimum) strength of the association between the sentence rankings produced by the machine and by humans.
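The two evaluation figures quoted in the abstract can be illustrated with a minimal sketch. The abstract names neither the exact definition of sentence selection efficiency nor the specific rank statistic, so both choices here are assumptions: efficiency is taken as the fraction of human-selected sentences that the machine also selected, and Spearman's rank correlation stands in for the unnamed nonparametric statistic. The function names and the sample data are hypothetical.

```python
def selection_efficiency(machine, human):
    """Fraction of human-selected sentences also selected by the machine.

    Assumed definition; the abstract does not spell out the formula.
    """
    human_set = set(human)
    if not human_set:
        raise ValueError("human summary must contain at least one sentence")
    return len(human_set & set(machine)) / len(human_set)


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items.

    Stand-in for the nonparametric rank statistic, which the abstract
    does not name.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    # Sum of squared rank differences over all items.
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical sentence indices chosen for a summary.
machine_summary = [1, 3, 5, 7]
human_summary = [1, 2, 3, 9]
efficiency = selection_efficiency(machine_summary, human_summary)

# Hypothetical ranks assigned to the same five sentences.
machine_ranks = [1, 2, 3, 4, 5]
human_ranks = [2, 1, 4, 3, 5]
rho = spearman_rho(machine_ranks, human_ranks)
```

With this toy data the machine recovers two of the four human-selected sentences, and the two rankings disagree only by swaps of adjacent ranks, so the correlation stays high. Real rankings with ties would need a tie-aware formula.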
Keywords :
data compression; document image processing; image coding; natural languages; Indian language textual image summarization; JBIG2 compressed textual image; nonparametric distribution-free rank statistic; Character recognition; Humans; Image coding; Image retrieval; Information retrieval; Libraries; Optical character recognition software; Prototypes; Statistical distributions; Testing
Conference_Title :
Pattern Recognition, 2006. ICPR 2006. 18th International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2521-0
DOI :
10.1109/ICPR.2006.1090