Regional Information Center for Science and Technology (RICeST) - Research on mixture language model-based document clustering

DocumentCode :

3262886

Title :

Research on mixture language model-based document clustering

Author :

Wen, Jian ; Li, Zhoujun

Author_Institution :

Comput. Sch., Nat. Univ. of Defence Technol., Changsha

fYear :

2008

fDate :

26-28 Aug. 2008

Firstpage :

649

Lastpage :

652

Abstract :

Language modeling with semantic smoothing has been proposed as an effective way to improve the quality of document clustering. However, the existing semantic smoothing model is not effective for partitional clustering because it cannot assign suitable weights to "general" words in a collection. In this paper, inspired by the mixture probability model, we put forward a mixture language model for document clustering. The new model alleviates the effect of "general" words while also integrating context information and resolving polysemy in a document. Based on the new model, an EM algorithm for partitional clustering is presented. The experimental results show that our algorithms improve cluster quality more effectively than previous methods.
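
Illustrative note (not part of the original record): the abstract gives no equations, so the following is only a minimal sketch of the general idea, assuming a fixed-weight two-component mixture in which each cluster's unigram model is combined with a collection-wide background model that absorbs "general" words, and documents are hard-assigned to clusters as in partitional clustering. It does not implement the paper's semantic smoothing or context handling. All function names, the weight `lam`, and the seeding rule are hypothetical choices made for this example.

```python
import math
from collections import Counter


def background_model(docs):
    """Collection-wide unigram distribution; it absorbs 'general' words."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def fit_cluster_model(counts, bg, lam, iters=20):
    """EM for p_k(w) in the mixture (1 - lam) * p_k(w) + lam * bg(w)."""
    total = sum(counts.values())
    p = {w: c / total for w, c in counts.items()}
    for _ in range(iters):
        # E-step: probability that an occurrence of w came from the cluster
        # model rather than from the background model.
        z = {w: (1 - lam) * p[w] / ((1 - lam) * p[w] + lam * bg[w])
             for w in counts}
        # M-step: renormalize the expected cluster-generated counts.
        norm = sum(counts[w] * z[w] for w in counts)
        p = {w: counts[w] * z[w] / norm for w in counts}
    return p


def doc_loglik(doc, p, bg, lam):
    """Log-likelihood of a tokenized document under one cluster's mixture."""
    return sum(math.log((1 - lam) * p.get(w, 0.0) + lam * bg[w]) for w in doc)


def cluster(docs, k, lam=0.5, rounds=10):
    """Hard-assignment clustering of tokenized documents into k clusters."""
    bg = background_model(docs)
    # Assumption: deterministic seeding with k evenly spaced documents.
    groups = [[docs[i * len(docs) // k]] for i in range(k)]
    assign = None
    for _ in range(rounds):
        models = [fit_cluster_model(Counter(w for d in g for w in d), bg, lam)
                  for g in groups]
        new_assign = [
            max(range(k), key=lambda j: doc_loglik(d, models[j], bg, lam))
            for d in docs
        ]
        if new_assign == assign:  # assignments stable: stop early
            break
        assign = new_assign
        groups = [[d for d, a in zip(docs, assign) if a == j]
                  for j in range(k)]
    return assign


docs = [s.split() for s in [
    "the cat chased the mouse",
    "the cat ate the mouse",
    "the stock market fell",
    "the stock market rose",
]]
print(cluster(docs, 2))
```

In this toy run the frequent word "the" is largely explained by the background model, so the cluster models concentrate on topic-specific words; a larger `lam` discounts common words more strongly.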

Keywords :

document handling; natural language processing; cluster quality; context information; language modeling; mixture language model-based document clustering; mixture probability model; partitional clustering; polysemy problems; semantic smoothing model; Clustering algorithms; Clustering methods; Context modeling; Frequency; Information retrieval; Partitioning algorithms; Probability; Smoothing methods; Vocabulary;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Granular Computing, 2008. GrC 2008. IEEE International Conference on

Conference_Location :

Hangzhou

Print_ISBN :

978-1-4244-2512-9

Electronic_ISBN :

978-1-4244-2513-6

Type :

conf

DOI :

10.1109/GRC.2008.4664755

Filename :

4664755

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3262886