DocumentCode
1994393
Title
Document image retrieval based on 2D density distributions of terms with pseudo relevance feedback
Author
Kise, Koichi ; Wuotang, Yin ; Matsumoto, Keinosuke
Author_Institution
Dept. of Comput. & Syst. Sci., Osaka Prefecture Univ., Japan
fYear
2003
fDate
3-6 Aug. 2003
Firstpage
488
Abstract
Document image retrieval is the task of retrieving document images relevant to a user's query. Most existing methods based on word-level indexing rely on the "bag of words" representation, which originated in the field of information retrieval. This paper presents a new representation of documents that uses additional information about the location of words in pages to improve retrieval performance. We consider a page relevant to a query if it contains the query's terms densely. This notion is embodied as density distributions of terms calculated in the proposed method. Its performance is further improved with the help of "pseudo relevance feedback", i.e., a method of expanding a query by analyzing pages. Experimental results on English document images show that the proposed method is superior to conventional methods of electronic document retrieval at recall levels 0.0-0.6.
Keywords
document image processing; image retrieval; relevance feedback; text analysis; visual databases; 2D density distribution; BOW; DIB; bag of words; document image database; document image retrieval; information retrieval; page analysis; pseudo relevance feedback; word-level indexing; Distributed computing; Feedback; Image databases; Image retrieval; Indexing; Information retrieval; Optical character recognition software; Paper technology; Performance analysis; Space technology;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN
0-7695-1960-1
Type
conf
DOI
10.1109/ICDAR.2003.1227713
Filename
1227713