Title :
A search log clustering algorithm based on the idea of hierarchy
Author :
Shu-Sheng Hou ; Han Jin ; Yang Wei ; Xu Wang ; Bao-Quan Fan ; Jin-Mao Wei
Author_Institution :
Coll. of Inf. Tech. Sci., Nankai Univ., Tianjin, China
Abstract :
Analysis of search logs is becoming increasingly important. A search query may contain several keywords, so its text can belong to more than one category. This paper presents a new algorithm, the Sequential Clustering Algorithm, for clustering search logs. Unlike many other clustering algorithms, the proposed algorithm can assign one record to multiple categories while balancing time complexity, clustering reliability, and the parameters involved. Both properties are achieved through text combination and text backtracking. Text combination forms the feature of each category automatically, and text backtracking gives previously processed texts the opportunity to be compared with newly created categories. In the experiments, the proposed algorithm and a general hierarchical clustering algorithm were applied to the clustering of search log texts. The results suggest that the proposed algorithm can improve clustering performance.
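The abstract does not give the algorithm's details, so the following is only a minimal sketch of how text combination and text backtracking might fit together. The Jaccard similarity, the threshold value, the union-based category feature, and the names jaccard and sequential_cluster are all assumptions made for illustration, not the paper's definitions.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword sets (assumed measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def sequential_cluster(queries, threshold=0.3):
    """Hypothetical one-pass clustering that allows multiple categories per query.

    Returns a list of categories, each a set of query indices.
    """
    texts = [set(q.lower().split()) for q in queries]
    features = []  # keyword set describing each category
    members = []   # query indices assigned to each category

    for i, words in enumerate(texts):
        # Compare the query with every existing category feature.
        matched = [c for c, f in enumerate(features)
                   if jaccard(words, f) >= threshold]
        if matched:
            for c in matched:
                members[c].add(i)
                # Text combination (assumed form): merge the query's
                # keywords into the category feature.
                features[c] |= words
        else:
            # No category is close enough, so open a new one.
            features.append(set(words))
            members.append({i})
            new = len(features) - 1
            # Text backtracking (assumed form): give earlier queries a
            # chance to join the newly created category. The feature is
            # left unchanged here to keep the sketch order-independent.
            for j in range(i):
                if jaccard(texts[j], features[new]) >= threshold:
                    members[new].add(j)
    return members
```

With the default threshold, calling sequential_cluster(["paris flights", "paris hotels", "hotels london"]) places the second query in both resulting categories: it joins the first category directly and is added to the second one during backtracking.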
Keywords :
data analysis; query formulation; clustering reliability; hierarchical clustering algorithm; search log clustering algorithm; search log texts; search query; sequential clustering algorithm; text backtracking; text combination; time complexity; Abstracts; Hierarchical clustering; Search log clustering; Web data mining
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2013 International Conference on
Conference_Location :
Tianjin
DOI :
10.1109/ICMLC.2013.6890446