DocumentCode :
3273415
Title :
Clustering Based on Context Similarity
Author :
Kovács, László ; Répási, Tibor ; Baksa-Varga, Erika ; Barabás, Péter
Author_Institution :
Univ. of Miskolc, Miskolc, Hungary
fYear :
2008
fDate :
8-10 Nov. 2008
Firstpage :
157
Lastpage :
165
Abstract :
The discovery of word categories is an important step in statistical grammar induction systems. Word categories can be considered as clusters containing words with similar grammatical or semantic behavior. Having a metric space of words, the clustering algorithm will place similar words into the same cluster, whereas dissimilar ones are clustered into different groups. In this paper we propose an approximate word clustering method based on context similarity. The context of a word is defined here as the set of sentences containing the word. The similarity of two words is measured with the similarity of the corresponding context sets. For the calculation of the context-based distance of two words, a hierarchical agglomerative clustering algorithm has been developed, and is presented here.
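Illustrative sketch (not from the paper): the abstract defines a word's context as the set of sentences containing it and compares words through their context sets, but it does not state the similarity measure or the linkage rule. The Python sketch below therefore makes several assumptions: each context sentence is stored with the target word masked by a placeholder, context sets are compared with Jaccard distance, and clusters are merged by naive average linkage until a distance threshold is exceeded. The names `build_contexts`, `jaccard_distance`, `agglomerative` and `threshold` are hypothetical.

```python
from itertools import combinations


def build_contexts(sentences):
    """Map each word to the set of sentences containing it.

    Assumption: the word itself is replaced by "_" so that two words
    used in the same sentence frames end up with overlapping context sets.
    """
    contexts = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for pos, word in enumerate(tokens):
            masked = tuple(tokens[:pos] + ["_"] + tokens[pos + 1:])
            contexts.setdefault(word, set()).add(masked)
    return contexts


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| (an assumed measure, not the paper's)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative(contexts, threshold):
    """Naive average-linkage agglomerative clustering of words.

    Starts with one cluster per word and repeatedly merges the closest
    pair until the smallest inter-cluster distance exceeds `threshold`.
    """
    clusters = [[word] for word in sorted(contexts)]

    def linkage(c1, c2):
        dists = [jaccard_distance(contexts[u], contexts[v])
                 for u in c1 for v in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        # i < j is guaranteed by combinations, so deleting j keeps i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


corpus = ["the cat sleeps", "the dog sleeps", "a cat eats", "a dog eats"]
word_contexts = build_contexts(corpus)
print(agglomerative(word_contexts, threshold=0.5))
```

On this toy corpus "cat" and "dog" occur in identical masked sentence frames, so their distance is 0 and they are merged, while every other word stays in its own cluster. The pairwise search is quadratic in the number of clusters at each step, which is acceptable only for small vocabularies.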
Keywords :
grammars; natural language processing; statistical analysis; clustering algorithm; context based distance; context similarity; grammatical behavior; hierarchical agglomerative clustering; semantic behavior; statistical grammar induction systems; word categories; word clustering; Artificial intelligence; Biomedical computing; Biomedical equipment; Biomedical measurements; Clustering algorithms; Clustering methods; Computational efficiency; Medical services; Testing; Text mining; approximate clustering algorithms; context based clustering; context similarity measures; grammar induction systems;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Complexity and Intelligence of the Artificial and Natural Complex Systems, Medical Applications of the Complex Systems, Biomedical Computing, 2008. CANS '08. First International Conference on
Conference_Location :
Targu Mures, Mures
Print_ISBN :
978-0-7695-3621-7
Type :
conf
DOI :
10.1109/CANS.2008.26
Filename :
5231455