Title :
A Comparative Study on the Combination of Multiple Retrieval Systems
Author :
Chun-Yi Liu ; Chuan-Yi Tang ; Hsu, D. Frank
Author_Institution :
Dept. of Comput. Sci., Nat. Tsing Hua Univ., Hsinchu, Taiwan
Abstract :
It is known that combining multiple information retrieval systems can, in many cases, improve performance over that of the individual systems. It is also known that in these cases the improvement of the combined system is mainly due to (a) the performance of each individual system and (b) the diversity between the individual systems. However, quantifying these two conditions remains a challenging problem. In this paper, we investigate these issues using five TREC datasets, TREC 2-6 (1993-97). Six systems in each dataset are selected either at random or by precision. We then compare the performance of combining the six systems selected at random vs. by precision from each of these datasets. It is demonstrated that, in each of the five datasets, the sum x + y is larger for positive cases (the combination of A and B performs better than or equal to the individual systems) than for negative cases (all other cases), where x is the performance ratio Pl/Ph and y is the diversity between A and B, both normalized to [0, 1]. In addition, it is demonstrated that combinations of t systems, t = 2, 3, 4, 5, and 6, overall perform better on the six systems selected by precision than on the six systems selected at random.
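The quantities named in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: Pl and Ph are taken to be the lower and higher of the two systems' performances, the diversity y is supplied as an already normalized number (the abstract does not say how it is computed), and the combination is a plain average of scores that are assumed to be normalized already. All function names are hypothetical.

```python
def performance_ratio(p_a, p_b):
    """x = Pl / Ph, assuming Pl is the lower and Ph the higher performance."""
    p_low, p_high = min(p_a, p_b), max(p_a, p_b)
    if p_high <= 0:
        raise ValueError("at least one system needs positive performance")
    return p_low / p_high


def xy_sum(p_a, p_b, diversity):
    """x + y, where y is a diversity value already normalized to [0, 1]."""
    if not 0.0 <= diversity <= 1.0:
        raise ValueError("diversity must lie in [0, 1]")
    return performance_ratio(p_a, p_b) + diversity


def is_positive_case(p_a, p_b, p_combined):
    """Positive case: the combination is at least as good as both systems."""
    return p_combined >= max(p_a, p_b)


def score_combination(scores_a, scores_b):
    """Rank shared documents by average score (an assumed combination rule).

    Scores are assumed to be normalized to [0, 1] beforehand.
    Ties are broken by document id so the output is deterministic.
    """
    shared = scores_a.keys() & scores_b.keys()
    combined = {d: (scores_a[d] + scores_b[d]) / 2 for d in shared}
    return sorted(combined, key=lambda d: (-combined[d], d))


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(xy_sum(0.25, 0.5, 0.5))
    print(is_positive_case(0.25, 0.5, 0.55))
    print(score_combination({"d1": 0.8, "d2": 0.5}, {"d1": 0.25, "d2": 1.0}))
```

In this sketch a pair of systems with similar performance (x near 1) and high diversity (y near 1) yields a sum near 2, which is the region where the abstract reports positive cases to be more common.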
Keywords :
information retrieval; information retrieval system evaluation; random processes; TREC-2 dataset; TREC-3 dataset; TREC-4 dataset; TREC-5 dataset; TREC-6 dataset; combined system performance improvement; diversity normalization; individual systems; multiple information retrieval systems; negative cases; performance ratio normalization; positive cases; precision; random choice; Computers; Data integration; Diversity reception; Educational institutions; Electronic mail; Informatics; cognitive diversity; rank combination; rank-score characteristic (RSC) function; score combination
Conference_Title :
Pervasive Systems, Algorithms and Networks (ISPAN), 2012 12th International Symposium on
Conference_Location :
San Marcos, TX
Print_ISBN :
978-1-4673-5064-8
DOI :
10.1109/I-SPAN.2012.31