Title :
An improved measuring similarity for short text snippets and its application in clustering search engine
Author :
Li, Zhao ; Peng, Hong ; Peng, Peng ; Jia, Xi-ping ; Wang, Jia-bing
Author_Institution :
Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou
Abstract :
Measuring the similarity of short text snippets plays an important role in information retrieval and natural language processing, yet it remains a challenging task for snippets such as search queries. In this paper, we develop a new similarity measure that further improves the accuracy of semantic similarity for short text snippets, especially when content is insufficient, as with Web page snippets. We then apply our similarity measure, combined with information entropy, in a clustering search engine to automatically determine the best number of clusters. We also rank the clusters with our method and illustrate the results.
Keywords :
entropy; information retrieval; search engines; semantic Web; text analysis; Web page snippets; clustering search engine; information entropy; natural language processing; short text snippets; similarity measure; Cybernetics; Data mining; Kernel; Machine learning; Web pages; Web search; Semantic similarity; clustering; search engine
Conference_Title :
Machine Learning and Cybernetics, 2008 International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2095-7
Electronic_ISBN :
978-1-4244-2096-4
DOI :
10.1109/ICMLC.2008.4620658