DocumentCode
2377885
Title
Search scripts mining from wisdom of the crowds
Author
Wang, Chieh-Jen ; Chen, Hsin-Hsi
Author_Institution
Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
fYear
2011
fDate
9-12 Oct. 2011
Firstpage
878
Lastpage
883
Abstract
This paper mines sequences of actions, called search scripts, from query logs that record the search experiences of a large number of users. Search scripts can be applied to predict users' search needs, improve retrieval effectiveness, recommend advertisements, and so on. Information quality, topic diversity, query ambiguity, and URL relevancy are the major challenges in search script mining. In this paper, we calculate the relevance of URLs, adopt Open Directory Project (ODP) categories to disambiguate queries and URLs, explore various features and clustering algorithms for intent clustering, and identify critical actions from each intent cluster to form a search script. Experiments show that the model based on a complete-link hierarchical clustering algorithm with the features of query terms, relevant URLs, and disambiguated ODP categories performs best. Search scripts are generated from the best model. When only search scripts containing a single intent are considered correct, the accuracy of the action identification algorithm is 0.4650. If search scripts containing a major intent are also counted as correct, the accuracy increases to 0.7315.
Keywords
data mining; pattern clustering; query processing; URL relevancy; action identification algorithm; action sequence mining; advertisement recommendation; complete link hierarchical clustering algorithm; crowd wisdom; information quality; open directory project categories; query ambiguity; query logs; retrieval effectiveness improvement; search script mining; topic diversity; user search need prediction; Accuracy; Clustering algorithms; Noise; Predictive models; Search engines; Sports equipment; Web pages; mining web logs; search script; web search enhancement;
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference on
Conference_Location
Anchorage, AK
ISSN
1062-922X
Print_ISBN
978-1-4577-0652-3
Type
conf
DOI
10.1109/ICSMC.2011.6083762
Filename
6083762