DocumentCode :
2546789
Title :
The extraction of text/graphs from degraded documents
Author :
Yen, Shwu-Huey ; Chen, Yi-Jin ; Lin, Hui-Jen ; Wang, Chia-Jen
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Tamkang Univ., Tamsui, Taiwan
fYear :
2004
fDate :
5-7 Jan. 2004
Firstpage :
181
Lastpage :
186
Abstract :
This paper presents a method for improving the quality of degraded documents through noise removal and text enhancement. The histogram of a degraded document is analyzed to find the approximate gray-value ranges of text, graph (i.e. photograph), and background pixels. Once the graph pixels are identified, they are replaced by background pixels. The agent-growing method described by S. H. Yen and M. C. Shih (2000) is then applied to smooth the noisy background, yielding a document in which text and background are clearly readable. Finally, the graph pixels are restored, so that the resulting document has text of much better quality and any photographs preserved. Experiments verifying the efficacy of the proposed method, and comparisons with some existing techniques, are also presented.
Keywords :
character recognition; feature extraction; noise; software agents; text analysis; agent-growing method; background pixels; degraded documents; graph pixels; gray-value; histogram; noise removal; photograph pixels; text enhancing; Background noise; Computer science; Councils; Data mining; Degradation; Electronic mail; Histograms; Neural networks; Protection; Smoothing methods;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia Modelling Conference, 2004. Proceedings. 10th International
Print_ISBN :
0-7695-2084-7
Type :
conf
DOI :
10.1109/MULMM.2004.1264984
Filename :
1264984