Title :
Efficient signature file methods for text retrieval
Author :
Lee, Dik Lun ; Kim, Young Man ; Patel, Gaurav
Author_Institution :
Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
fDate :
6/1/1995
Abstract :
Signature files have been studied extensively as an access method for textual databases, and many approaches have been proposed for searching signature files efficiently. However, different methods make different assumptions and use different performance measures, which makes their performance difficult to compare. In this paper, we study three basic methods proposed in the literature: the indexed descriptor file, the two-level superimposed coding scheme, and the partitioned signature file approach. The contribution of this paper is twofold. First, we present a uniform analytical performance model so that the methods can be compared fairly and consistently. The analysis shows that the two-level superimposed coding scheme, when stored in a transposed file, has the best performance. Second, we extend the two-level superimposed coding method into a multilevel superimposed coding method; we obtain the optimal number of levels for the multilevel method and show that, for databases of reasonable size, the optimal value is much larger than 2, the value assumed by the two-level method. The accuracy of the analytical formula is demonstrated by simulation.
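The abstract builds on superimposed coding and on storing signatures in a transposed file. The Python sketch below illustrates only those two general ideas. It is not the paper's two-level or multilevel scheme, and it is not the paper's cost model. The signature width `F`, the bits-per-word count `M`, the SHA-256-based hashing, and every function name are hypothetical choices made for illustration. The false-drop estimate uses the common textbook approximation and is not the paper's analytical formula.

```python
import hashlib

F = 64  # hypothetical signature width in bits
M = 3   # hypothetical number of bits set per word (must be <= F)

def word_signature(word: str, f: int = F, m: int = M) -> int:
    """Set m distinct bit positions chosen by hashing the word (hypothetical hash)."""
    sig, i = 0, 0
    while bin(sig).count("1") < m:
        digest = hashlib.sha256(f"{word}:{i}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % f)
        i += 1
    return sig

def block_signature(words, f: int = F, m: int = M) -> int:
    """Superimpose (bitwise OR) the word signatures of a text block."""
    sig = 0
    for w in words:
        sig |= word_signature(w, f, m)
    return sig

def may_contain(block_sig: int, query_words, f: int = F, m: int = M) -> bool:
    """Sequential check: every query bit must be set; false drops are possible."""
    q = block_signature(query_words, f, m)
    return (block_sig & q) == q

def bit_slices(block_sigs, f: int = F):
    """Transposed (bit-sliced) file: slice j holds bit j of every block signature."""
    slices = [0] * f
    for b, sig in enumerate(block_sigs):
        for j in range(f):
            if (sig >> j) & 1:
                slices[j] |= 1 << b
    return slices

def search_sliced(slices, n_blocks: int, query_words, f: int = F, m: int = M):
    """AND together only the slices for bits set in the query signature."""
    q = block_signature(query_words, f, m)
    candidates = (1 << n_blocks) - 1
    for j in range(f):
        if (q >> j) & 1:
            candidates &= slices[j]
    return [b for b in range(n_blocks) if (candidates >> b) & 1]

def false_drop_estimate(d: int, f: int = F, m: int = M) -> float:
    """Textbook approximation for a one-word query against a block of d words."""
    p_bit_set = 1 - (1 - m / f) ** d
    return p_bit_set ** m

docs = [["signature", "file", "retrieval"],
        ["inverted", "index", "text"],
        ["superimposed", "coding", "signature"]]
sigs = [block_signature(d) for d in docs]
slices = bit_slices(sigs)
print(search_sliced(slices, len(docs), ["signature"]))
```

The printed list always includes blocks 0 and 2, because superimposed coding never causes a false dismissal. Block 1 may also appear as a false drop, depending on the hash. The transposed layout matters because a query touches only the `M`-or-so slices for its set bits instead of scanning all `F` bits of every signature. This is the access pattern the abstract credits for the transposed two-level scheme's advantage.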
Keywords :
information retrieval; access method; indexed descriptor file; partitioned signature file approach; performance measures; signature file methods; simulation; text retrieval; textual databases; two-level superimposed coding scheme; Analytical models; Chemicals; Cities and towns; Computer Society; DNA; Hardware; Multimedia databases; Performance analysis; Performance evaluation; Search methods
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on