Regional Information Center for Science and Technology - A clustering retrieval system of Chinese information

DocumentCode :

3301526

Title :

A clustering retrieval system of Chinese information

Author :

Sha, Xin-Guang ; Liu, Yuan-Chao ; Liu, Ming ; Wang, Xiao-long

Author_Institution :

Intell. Technol. & Natural Language Process. Lab., Harbin Inst. of Technol., Harbin

fYear :

2008

fDate :

19-22 Oct. 2008

Firstpage :

Lastpage :

Abstract :

With tremendous and ever-growing amounts of electronic documents from the World Wide Web and digital libraries, it is becoming increasingly difficult to find the information people really want. To simplify the search process, people use clustering methods to browse search results. However, traditional Chinese information clustering techniques are inadequate because they do not generate clusters with highly readable themes. This paper reformulates the clustering problem as a salient phrase ranking problem. Given a query and the ranked list of related documents (typically titles and snippets) returned by a Web search engine, the method first extracts salient phrases as candidate cluster themes and ranks them with an SVR (support vector regression) model learned from human-labeled training data. The documents are then assigned to the relevant salient phrases to form candidate clusters, and the final clusters are generated by merging these candidate clusters. The paper also investigates a suitable format for displaying the final cluster themes, to help users find documents of interest easily. Experimental results verified that the method is feasible and effective.
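The abstract outlines a four-step pipeline (extract phrases, rank them by regression, assign documents, merge candidate clusters) but does not give the features, the trained model, or the merging rule. The Python sketch below is a minimal illustration under loudly labeled assumptions, not the authors' implementation: snippets are assumed to be already word-segmented into whitespace-separated tokens (real Chinese text would first need a word segmenter); the two features (`coverage`, `length`) and the weights in the hypothetical `WEIGHTS` table stand in for an SVR trained on human-labeled data; and the overlap-based merge threshold is an illustrative choice. A linear-kernel SVR predicts with a function of the form w·x + b, which is why a hand-weighted linear scorer is used as the placeholder.

```python
from collections import defaultdict

# HYPOTHETICAL weights standing in for a trained linear SVR (prediction = w . x + b).
WEIGHTS = {"coverage": 1.0, "length": 0.2}
BIAS = 0.0


def candidate_phrases(docs, max_n=3, min_df=2):
    """Map every word n-gram (n <= max_n) to the set of document indices containing it."""
    index = defaultdict(set)
    for i, doc in enumerate(docs):
        words = doc.split()  # ASSUMPTION: snippets are already word-segmented
        for n in range(1, max_n + 1):
            for j in range(len(words) - n + 1):
                index[" ".join(words[j:j + n])].add(i)
    # Keep only phrases shared by at least min_df documents.
    return {p: ids for p, ids in index.items() if len(ids) >= min_df}


def score(phrase, ids, n_docs):
    """Linear score over two toy features (placeholder for the learned SVR)."""
    features = {"coverage": len(ids) / n_docs, "length": len(phrase.split())}
    return BIAS + sum(WEIGHTS[k] * v for k, v in features.items())


def cluster(query, docs, top_k=10, overlap=0.5):
    """Rank salient phrases, form one candidate cluster per phrase, then merge."""
    phrases = candidate_phrases(docs)
    phrases.pop(query, None)  # the query itself covers every result; not a useful theme
    ranked = sorted(phrases, key=lambda p: (-score(p, phrases[p], len(docs)), p))[:top_k]
    clusters = []  # each entry: {"labels": [theme, ...], "docs": {indices}}
    for phrase in ranked:
        ids = phrases[phrase]
        for c in clusters:
            # HYPOTHETICAL merge rule: shared documents relative to the smaller set.
            if len(ids & c["docs"]) / min(len(ids), len(c["docs"])) >= overlap:
                c["labels"].append(phrase)  # higher-ranked label stays first (the theme)
                c["docs"] |= ids
                break
        else:
            clusters.append({"labels": [phrase], "docs": set(ids)})
    return clusters


if __name__ == "__main__":
    snippets = ["apple iphone launch", "apple iphone price",
                "apple inc stock", "apple inc earnings"]
    for c in cluster("apple", snippets):
        print(c["labels"][0], sorted(c["docs"]))
```

With the toy snippets above, "apple inc" and "inc" cover the same two documents and collapse into one cluster, as do "apple iphone" and "iphone"; the longer, higher-scoring phrase is kept first so it can be displayed as the cluster theme.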

Keywords :

Internet; document handling; merging; natural language processing; pattern clustering; query processing; regression analysis; search engines; support vector machines; Chinese information clustering techniques; Web search engine; World Wide Web; candidate clusters merging; clustering retrieval system; digital libraries; document querying; electronic documents; salient phrase ranking problem; support vector regression; Clustering methods; Data mining; Humans; Information retrieval; Merging; Search engines; Software libraries; Training data; Web search; Web sites; Salient phrase; document clustering; performance of clustering theme;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4244-4515-8

Electronic_ISBN :

978-1-4244-2780-2

Type :

conf

DOI :

10.1109/NLPKE.2008.4906815

Filename :

4906815

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3301526