DocumentCode :
3141237
Title :
Extraction of type style based meta-information from imaged documents
Author :
Garain, U. ; Chaudhuri, B.B.
Author_Institution :
Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Calcutta, India
fYear :
1999
fDate :
20-22 Sep 1999
Firstpage :
341
Lastpage :
344
Abstract :
Extraction of some meta-information from printed documents without an OCR approach is considered. It can be statistically verified that important terms in articles are printed in italic, bold and all-capital styles. Detection of these type styles helps in the automatic extraction of lines containing titles, authors' names, subtitles and references, as well as sentences in which important terms occur. It also helps improve OCR performance when reading italic text. Some experimental results on the performance of the approach on good-quality as well as degraded document images are presented.
Keywords :
character sets; document image processing; experimental results; line extraction; meta-information; printed documents; sentences; statistics; terms; type style information; Computer vision; Data mining; Image converters; Optical character recognition software; Pattern recognition; Postal services; Pressing; Read only memory; Search engines; Sections
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
0-7695-0318-7
Type :
conf
DOI :
10.1109/ICDAR.1999.791794
Filename :
791794