DocumentCode
3785505
Title
Content-based retrieval of historical Ottoman documents stored as textual images
Author
E. Saykol;A.K. Sinop;U. Gudukbay;O. Ulusoy;A.E. Cetin
Author_Institution
Dept. of Comput. Eng., Bilkent Univ., Ankara, Turkey
Volume
13
Issue
3
fYear
2004
Firstpage
314
Lastpage
325
Abstract
There is an accelerating demand for access to the visual content of documents stored in historical and cultural archives. The availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. A framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library (codebook) of the symbols occurring in a document; the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. Features in the wavelet and spatial domains, based on the angular and distance span of shapes, are used to extract the symbols. To perform content-based retrieval in the historical archives, a query is specified as a rectangular region in an input image, and the same symbol-extraction process is applied to the query region. Queries are processed on the codebooks of the documents, and the query images are located in the resulting documents using the pointers in the textual images. The query process does not require decompression of the images. The new content-based retrieval framework is also applicable to many other document archives using different scripts.
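The codebook idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `compress` and `query`, the toy bitmaps, and the `(index, x, y)` pointer layout are assumptions made for illustration, and exact bitmap equality stands in for the wavelet- and spatial-domain shape features that the paper uses to match symbols.

```python
from typing import Dict, List, Tuple

Symbol = Tuple[Tuple[int, ...], ...]  # toy binary bitmap of one symbol (assumption)
Pointer = Tuple[int, int, int]        # (codebook index, x, y) (assumed layout)


def compress(symbols: List[Tuple[Symbol, int, int]]) -> Tuple[List[Symbol], List[Pointer]]:
    """Replace each located symbol with a pointer into a codebook."""
    codebook: List[Symbol] = []
    index_of: Dict[Symbol, int] = {}
    pointers: List[Pointer] = []
    for bitmap, x, y in symbols:
        if bitmap not in index_of:
            # First occurrence: store the bitmap once in the codebook.
            index_of[bitmap] = len(codebook)
            codebook.append(bitmap)
        pointers.append((index_of[bitmap], x, y))
    return codebook, pointers


def query(codebook: List[Symbol], pointers: List[Pointer],
          query_symbol: Symbol) -> List[Tuple[int, int]]:
    """Locate a query symbol by searching the codebook, not the pixels."""
    # Exact equality is a stand-in for feature-based similarity matching.
    matches = {i for i, s in enumerate(codebook) if s == query_symbol}
    # The pointers give the positions directly, so nothing is decompressed.
    return [(x, y) for idx, x, y in pointers if idx in matches]


# Toy document: symbol A at two positions and symbol B at one.
A: Symbol = ((1, 0), (1, 1))
B: Symbol = ((0, 1), (1, 1))
codebook, pointers = compress([(A, 0, 0), (B, 5, 0), (A, 10, 0)])
hits = query(codebook, pointers, A)
```

Here the query touches only the codebook entries and the pointer list, which is the property the abstract describes as querying without decompression.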
Keywords
"Image retrieval","Content based retrieval","Multimedia databases","Image coding","Acceleration","Cultural differences","Image processing","Image databases","Visual databases","Libraries"
Journal_Title
IEEE Transactions on Image Processing
Publisher
IEEE
ISSN
1057-7149
Type
jour
DOI
10.1109/TIP.2003.821114
Filename
1278356