Title :
Unsupervised categorization of heterogeneous text images based on fractals
Author :
Khelifi, Badreddine ; Zaghden, Nizar ; Alimi, Mohamed Adel ; Mullot, Remy
Author_Institution :
Dept. of Electr. Eng., Univ. of Sfax, Sfax, Tunisia
Abstract :
This paper deals with text extraction from heterogeneous documents for document categorization and indexing. The goal of this work is to find similar text regions based on their fonts. Text regions are extracted first, and font matching is then performed using fractal descriptors. Experiments are carried out on both maps and ancient documents.
Keywords :
character sets; distributed databases; feature extraction; fractals; image classification; image matching; text analysis; document categorisation; document indexing; font matching; fractal descriptor; heterogeneous text image; text extraction; unsupervised categorization; Data mining; Focusing; Fractals; Graphics; Indexing; Machine intelligence; Noise robustness; Optical character recognition software; Paper technology; Text analysis
Conference_Title :
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
Conference_Location :
Tampa, FL
Print_ISBN :
978-1-4244-2174-9
ISSN :
1051-4651
DOI :
10.1109/ICPR.2008.4761176