DocumentCode :
3186966
Title :
A knowledge-based approach for textual information extraction from mixed text/graphics complex document images
Author :
Chen, Yen-Lin
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Taipei Univ. of Technol., Taipei, Taiwan
fYear :
2010
fDate :
10-13 Oct. 2010
Firstpage :
3270
Lastpage :
3277
Abstract :
A new knowledge-based technique for extracting and identifying text-lines from various real-life mixed text/graphics complex document images is presented in this paper. The proposed technique first decomposes the document image into distinct object planes to separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. A knowledge-based text extraction and identification method is then applied to the resulting planes to obtain text-lines with different characteristics in each plane. The proposed system offers high flexibility and expandability, since coping with further types of real-life and future complex document images only requires adding new rules. Experimental and comparative results demonstrate the effectiveness and advantages of the proposed knowledge-based technique in extracting text-lines with various illuminations, sizes, and font styles from various types of mixed text/graphics complex document images.
Keywords :
computer graphics; document handling; document image processing; information retrieval; knowledge based systems; text analysis; homogeneous objects; knowledge based approach; mixed text-graphics complex document image; nontext object; text lines identification; textual information extraction; textual region; Image segmentation; Document analysis; complex document images; knowledge-based systems; region segmentation; text extraction;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems Man and Cybernetics (SMC), 2010 IEEE International Conference on
Conference_Location :
Istanbul
ISSN :
1062-922X
Print_ISBN :
978-1-4244-6586-6
Type :
conf
DOI :
10.1109/ICSMC.2010.5642309
Filename :
5642309