DocumentCode
2465144
Title
Automatic construction of an evaluation dataset from wisdom of the crowds for information retrieval applications
Author
Wang, Chieh-Jen ; Huang, Hung-Sheng ; Chen, Hsin-Hsi
Author_Institution
Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
fYear
2012
fDate
14-17 Oct. 2012
Firstpage
490
Lastpage
495
Abstract
A benchmark evaluation dataset that reflects users' real-world search behavior is indispensable for evaluating the performance of information retrieval applications. A typical evaluation dataset consists of a document set, a topic set, and relevance judgments. Preparing such a dataset manually requires considerable human effort, and human-made topics may not fully capture users' real search needs. This paper aims to construct an evaluation dataset automatically from the wisdom of the crowds in search query logs for information retrieval applications. We begin by collecting the documents clicked in search query logs, selecting suitable queries as topics, sampling documents from the document collection for each query, and estimating the multi-level relevance of the sampled documents based on click count, normalized count, and average count functions. Three learning-to-rank algorithms (linear regression, SVMRank, and FRank) are trained and tested on the machine-made evaluation dataset. We compare their performance with that on MQ2007, a test collection from LETOR, which is a well-known human-made benchmark dataset for learning to rank. The experimental results show that the performance trends are similar on the machine-made and human-made evaluation datasets, which demonstrates that the proposed models can construct an evaluation dataset of quality similar to that of a human-made one.
Keywords
document handling; information retrieval; learning (artificial intelligence); regression analysis; sampling methods; support vector machines; user interfaces; FRank algorithm; SVM algorithm; average count function; click count function; document collection; document sampling; document set; human-made evaluation dataset; information retrieval application; learning-to-rank algorithm; linear regression algorithm; machine-made evaluation dataset; normalized count function; relevance judgment; search query log; topic set; user search behavior; user search need; Humans; Information retrieval; Linear regression; Measurement; Predictive models; Testing; Training; evaluation dataset construction; retrieval evaluation; search query logs analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on
Conference_Location
Seoul
Print_ISBN
978-1-4673-1713-9
Electronic_ISBN
978-1-4673-1712-2
Type
conf
DOI
10.1109/ICSMC.2012.6377772
Filename
6377772