DocumentCode :
2781257
Title :
Chinese query expansion based on user log clustering
Author :
Jia, Shufang ; Li, Lei
Author_Institution :
Center for Intell. Sci. & Technol., Beijing Univ. of Posts & Telecommun., Beijing, China
fYear :
2009
fDate :
6-8 Nov. 2009
Firstpage :
446
Lastpage :
451
Abstract :
Most previous query expansion research is based on pseudo-relevant documents. In this study, we present a novel expansion method that clusters real user logs. Because not all clicked pages are suitable for query expansion, we de-noise the clicked results by reliability to improve performance. After HTML tags are removed, the page body contents are clustered, and the cluster centers cover various aspects of the original query. The terms used in log queries can provide a better choice of features, from the user's point of view, for summarizing the Web pages that were clicked from these queries. Therefore, the associated queries, reverse queries, Web page titles, and keyword phrases are combined with the cluster centers to obtain high-quality expansion terms for new queries. We also propose a new terminology extraction method using Baidu Baike, which identifies and extracts terminology phrases based on this manually edited online dictionary.
Keywords :
Web sites; data mining; hypermedia markup languages; query processing; Baidu Baike; Chinese query expansion; HTML labels removal; Web page denoising; keyword phrases; manual edited online dictionary; page body contents; pseudo relevant documents; terminology phrase extraction; terminology phrase identification; user log clustering; Computer science; Data mining; Dictionaries; HTML; Information retrieval; Large scale integration; Noise reduction; Search engines; Terminology; Web pages; Baike terminology extraction; LSI clustering; Query expansion; log mining; webpage de-noising;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Network Infrastructure and Digital Content, 2009. IC-NIDC 2009. IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-4898-2
Electronic_ISBN :
978-1-4244-4900-6
Type :
conf
DOI :
10.1109/ICNIDC.2009.5360836
Filename :
5360836