Regional Information Center for Science and Technology (RICEST) - Using character shape coding for information retrieval

DocumentCode :

2193738

Title :

Using character shape coding for information retrieval

Author :

Smeaton, A.F. ; Spitz, A.L.

Author_Institution :

Sch. of Comput. Applications, Dublin City Univ., Ireland

Volume :

fYear :

1997

fDate :

18-20 Aug 1997

Firstpage :

974

Abstract :

In conventional information retrieval the task of finding users' search terms in a document is simple. When the document is not available in machine readable format, optical character recognition (OCR) can usually be performed. We have developed a technique for performing information retrieval on document images in such a manner that the accuracy has great utility. The method makes generalisations about the images of characters, then performs classification of these and agglomerates the resulting character shape codes into word tokens based on character shape coding. These are sufficiently specific in their representation of the underlying words to allow reasonable performance of retrieval. Using a collection of over 250 Mbytes of document texts and queries with known relevance assessments, we present a series of experiments to determine how various parameters in the retrieval strategy affect retrieval performance and we obtain a surprisingly good result.
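
The pipeline described in the abstract (classify each character into a coarse shape class, join the classes into a word token, retrieve on those tokens) can be illustrated with a toy sketch. This is not the authors' implementation: it works on ordinary text rather than document images, and the shape classes (`A`, `x`, `g`, `i`, `j`), the letter sets, and the function names `shape_code`, `word_token`, `build_index` and `search` are all assumptions made for illustration, since the record does not list the actual code set.

```python
# Toy illustration only: real character shape coding works on page images.
# The letter sets and class names below are assumed, not taken from the paper.
ASCENDERS = set("bdfhklt")   # lowercase letters rising above the x-height
DESCENDERS = set("gpqy")     # lowercase letters dropping below the baseline


def shape_code(ch: str) -> str:
    """Map one character to a coarse, hypothetical shape class."""
    if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
        return "A"
    if ch in DESCENDERS:
        return "g"
    if ch == "i":
        return "i"           # x-height body with a dot
    if ch == "j":
        return "j"           # descender with a dot
    if ch.isalpha():
        return "x"           # fits within the x-height
    return ""                # punctuation and symbols are dropped


def word_token(word: str) -> str:
    """Agglomerate per-character shape codes into one word token."""
    return "".join(shape_code(c) for c in word)


def build_index(docs: dict) -> dict:
    """Build an inverted index from word tokens to document ids."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            token = word_token(word)
            if token:
                index.setdefault(token, set()).add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents sharing at least one word token with the query."""
    hits = set()
    for word in query.split():
        hits |= index.get(word_token(word), set())
    return hits


docs = {1: "character shape coding", 2: "optical recognition"}
index = build_index(docs)
matches = search(index, "shape")
```

Because several words can share a token (in this sketch "cat" and "oat" both become `xxA`), matching is ambiguous by design; the abstract's claim is that tokens are still specific enough for reasonable retrieval.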

Keywords :

document image processing; image classification; image coding; information retrieval; optical character recognition; software performance evaluation; OCR; character shape coding; classification; document images; document texts; information retrieval; machine readable format; optical character recognition; performance; queries; relevance assessments; search terms; word tokens; Character recognition; Computer applications; Computer interfaces; Humans; Image retrieval; Information retrieval; Knowledge representation; Natural languages; Optical character recognition software; Shape

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997

Conference_Location :

Ulm

Print_ISBN :

0-8186-7898-4

Type :

conf

DOI :

10.1109/ICDAR.1997.620655

Filename :

620655

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2193738