DocumentCode
3488489
Title
A Pair-Copula Based Scheme for Text Extraction from Digital Images
Author
Roy, Anirban ; Parui, Swapan K. ; Roy, Utpal
Author_Institution
CVPR Unit, Indian Stat. Inst., Kolkata, India
fYear
2013
fDate
25-28 Aug. 2013
Firstpage
892
Lastpage
896
Abstract
This paper presents a statistical model based scheme for automatic extraction of text components from digital images. The work consists of two tasks. First, we segment a color image by applying a pair-copula based mixture model. This produces a number of spatially connected components, some of which may be text. From each of these components, we extract features that can discriminate text from non-text components. The feature vectors arising from text components are assumed to be random samples from a pair-copula based multivariate distribution. The parameters of this distribution can be estimated from training text samples (i.e., connected components). Here, we use the ICDAR 2011 "Born-Digital Images" data set, since it provides such ground-truth text components. We estimate the distribution parameters from the feature vectors obtained from these training text components. The remaining task is to infer whether a test sample is a text component. We apply a non-parametric statistical hypothesis test to assess whether a test sample is generated from the known multivariate distribution. If so, we may regard the sample as text. Our results on the ICDAR 2011 "Born-Digital Images" data set are satisfactory.
Keywords
feature extraction; image colour analysis; image segmentation; statistical testing; text detection; ICDAR 2011; automatic extraction; color image segmentation; connected components; digital images; feature vectors; nonparametric statistical hypothesis testing; pair-copula based multivariate distribution; pair-copula based scheme; statistical model; text components; text extraction; Digital images; Distribution functions; Feature extraction; Image color analysis; Image segmentation; Training; Vectors;
fLanguage
English
Publisher
ieee
Conference_Titel
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location
Washington, DC
ISSN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2013.182
Filename
6628747