Regional Information Center for Science and Technology

DocumentCode :

3273415

Title :

Clustering Based on Context Similarity

Author :

Kovács, László ; Répási, Tibor ; Baksa-Varga, Erika ; Barabás, Péter

Author_Institution :

Univ. of Miskolc, Miskolc, Hungary

fYear :

2008

fDate :

8-10 Nov. 2008

Firstpage :

157

Lastpage :

165

Abstract :

The discovery of word categories is an important step in statistical grammar induction systems. Word categories can be considered as clusters containing words with similar grammatical or semantic behavior. Given a metric space of words, a clustering algorithm places similar words into the same cluster and dissimilar words into different clusters. In this paper we propose an approximate word clustering method based on context similarity. The context of a word is defined here as the set of sentences containing the word. The similarity of two words is measured by the similarity of the corresponding context sets. For the calculation of the context-based distance of two words, a hierarchical agglomerative clustering algorithm has been developed, and is presented here.
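
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how two context sets are compared or which linkage is used, so the sketch assumes that each context is the sentence with the target word replaced by a placeholder, that context sets are compared with Jaccard distance, and that clusters are merged by average linkage until a distance threshold is exceeded. The function names, the sample sentences, and the threshold value are all hypothetical.

```python
from itertools import combinations


def context_sets(sentences):
    """Map each word to its set of contexts.

    Assumption: a context is the sentence with the word itself masked by "_",
    so that two different words can share a context.
    """
    contexts = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            masked = tuple(tokens[:i] + ["_"] + tokens[i + 1:])
            contexts.setdefault(word, set()).add(masked)
    return contexts


def jaccard_distance(a, b):
    """Assumed set distance: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative(contexts, threshold):
    """Average-linkage agglomerative clustering over words.

    Merging stops when the closest pair of clusters is farther apart
    than `threshold` (a hypothetical stopping rule).
    """
    clusters = [frozenset([w]) for w in sorted(contexts)]

    def linkage(c1, c2):
        # Mean pairwise distance between the words of the two clusters.
        pairs = [(u, v) for u in c1 for v in c2]
        total = sum(jaccard_distance(contexts[u], contexts[v]) for u, v in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical toy corpus, not taken from the paper.
sentences = [
    "the cat sleeps",
    "the dog sleeps",
    "the cat eats",
    "the dog eats",
]
clusters = agglomerative(context_sets(sentences), threshold=0.5)
```

On this toy corpus, "cat" and "dog" share both masked contexts, as do "sleeps" and "eats", so each pair ends up in one cluster while "the" stays on its own. The pairwise search is quadratic in the number of clusters at every step, which is the kind of cost an approximate method would aim to reduce.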

Keywords :

grammars; natural language processing; statistical analysis; clustering algorithm; context based distance; context similarity; grammatical behavior; hierarchical agglomerative clustering; semantic behavior; statistical grammar induction systems; word categories; word clustering; Artificial intelligence; Biomedical computing; Biomedical equipment; Biomedical measurements; Clustering algorithms; Clustering methods; Computational efficiency; Medical services; Testing; Text mining; approximate clustering algorithms; context based clustering; context similarity measures; grammar induction systems

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Complexity and Intelligence of the Artificial and Natural Complex Systems, Medical Applications of the Complex Systems, Biomedical Computing, 2008. CANS '08. First International Conference on

Conference_Location :

Targu Mures, Mures

Print_ISBN :

978-0-7695-3621-7

Type :

conf

DOI :

10.1109/CANS.2008.26

Filename :

5231455

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3273415