Title :
Performance Evaluation and Benchmarking of Six Texture-Based Feature Sets for Segmenting Historical Documents
Author :
Mehri, M. ; Mhiri, M. ; Heroux, P. ; Gomez-Kramer, P. ; Mahjoub, M.A. ; Mullot, R.
Author_Institution :
L3i, Univ. of La Rochelle, La Rochelle, France
Abstract :
Recently, texture-based features have been used for segmenting digitized historical document images. These methods have been shown to work effectively with no a priori knowledge. Moreover, they have been shown to be robust when applied to degraded documents under different noise levels and types. In this paper, an approach for evaluating and comparing texture-based feature sets for segmenting historical documents is presented. We aim to determine which texture features are better suited, on the one hand, to separating graphical regions from textual ones and, on the other hand, to discriminating text across a variety of fonts and scales. For this purpose, six well-known and widely used texture-based feature sets (autocorrelation function, Grey Level Co-occurrence Matrix, Gabor filters, 3-level Haar wavelet transform, 3-level wavelet transform using 3-tap Daubechies filter, and 3-level wavelet transform using 4-tap Daubechies filter) are evaluated and compared on a large corpus of historical documents. Additional insight into the computation time and complexity of each texture-based feature set is given. Qualitative and numerical experiments are also reported to demonstrate the performance of each texture-based feature set.
Keywords :
Gabor filters; Haar transforms; computational complexity; correlation methods; document image processing; feature extraction; filtering theory; history; image segmentation; image texture; matrix algebra; performance evaluation; set theory; text analysis; wavelet transforms; 3-level Haar wavelet transform; 3-level wavelet transform; 3-tap Daubechies filter; 4-tap Daubechies filter; Gabor filters; autocorrelation function; computation complexity; computation time; digitized historical document image segmentation; graphical region segmentation; grey level cooccurrence matrix; noise levels; noise types; performance benchmarking; performance evaluation; text discrimination; texture-based feature set evaluation; Complexity theory; Correlation; Feature extraction; Graphics; Image segmentation; Wavelet transforms; Historical digitized document images; Multiscale approach; Segmentation; Texture;
Conference_Title :
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location :
Stockholm
DOI :
10.1109/ICPR.2014.497