DocumentCode
2348603
Title
Bagging to find better expansion words
Author
Wang, Bingqing ; Zhou, Yaqian ; Qiu, Xipeng ; Zhang, Qi ; Huang, Xuanjing
Author_Institution
Sch. of Comput. Sci., Fudan Univ., Shanghai, China
fYear
2010
fDate
21-23 Aug. 2010
Firstpage
1
Lastpage
8
Abstract
Supervised learning has been applied to query expansion: a model is trained to predict the “goodness” or “utility” of an expansion term to the retrieval system. Many features measure the relatedness between an expansion word and the query, and these can be incorporated into supervised learning to select expansion terms. The training data set is generated automatically by a tricky method, and this method can be affected by many factors. A severe problem, not discussed in previous work, is that the distribution of the features is query-dependent. When the feature distributions differ, it is questionable to merge the training instances and use the whole data set to train one single model. In this paper, we first investigate the statistical distribution of the auto-generated training data and show the problems in the training data set. Based on this analysis, we propose using the bagging method to ensemble several regression models in order to obtain a better supervised model for making predictions on expansion terms. We conducted experiments on the TREC benchmark test collections. Our analysis of the training data reveals some interesting phenomena about query expansion techniques. The experimental results also show that the bagging approach can achieve state-of-the-art retrieval performance on the standard TREC data set.
Keywords
learning (artificial intelligence); query processing; regression analysis; statistical distributions; TREC benchmark test collections; bagging method; query expansion techniques; regression models; statistical distribution; supervised learning; training data set; Variable speed drives; Bagging; Query Expansion; Regression Learning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-6896-6
Type
conf
DOI
10.1109/NLPKE.2010.5587826
Filename
5587826