Title :
Clustering Based on Context Similarity
Author :
Kovács, László ; Répási, Tibor ; Baksa-Varga, Erika ; Barabás, Péter
Author_Institution :
Univ. of Miskolc, Miskolc, Hungary
Abstract :
The discovery of word categories is an important step in statistical grammar induction systems. Word categories can be considered as clusters containing words with similar grammatical or semantic behavior. Given a metric space of words, a clustering algorithm places similar words into the same cluster and dissimilar words into different clusters. In this paper we propose an approximate word clustering method based on context similarity. The context of a word is defined here as the set of sentences containing the word. The similarity of two words is measured by the similarity of the corresponding context sets. A hierarchical agglomerative clustering algorithm has been developed for calculating the context-based distance of two words, and is presented here.
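The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. The abstract does not say how two context sets are compared, so the following choices are assumptions made only for illustration: each sentence is stored with the target word masked by `_`, context sets are compared with the Jaccard distance, clusters are merged by average linkage, and merging stops at a hypothetical `max_distance` threshold. All function names are hypothetical.

```python
from itertools import combinations


def build_contexts(sentences):
    """Map each word to the set of sentences containing it.

    Assumption: the word itself is masked with "_" so that two different
    words occurring in otherwise identical sentences share a context.
    """
    contexts = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            masked = tuple(tokens[:i] + ["_"] + tokens[i + 1:])
            contexts.setdefault(word, set()).add(masked)
    return contexts


def context_distance(a, b):
    """Jaccard distance between two context sets (an assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_words(sentences, max_distance=0.5):
    """Naive agglomerative clustering with average linkage."""
    contexts = build_contexts(sentences)
    clusters = [[word] for word in sorted(contexts)]

    def linkage(c1, c2):
        # Average pairwise context distance between two clusters.
        total = sum(
            context_distance(contexts[x], contexts[y]) for x in c1 for y in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > max_distance:
            break  # remaining clusters are too dissimilar to merge
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    corpus = [
        "the cat sleeps",
        "the dog sleeps",
        "the cat eats",
        "the dog eats",
    ]
    print(cluster_words(corpus))
```

On this toy corpus, "cat" and "dog" occur in identical masked sentences and end up in one cluster, as do "eats" and "sleeps". The pairwise search is recomputed at every merge step, which is acceptable for a demonstration but would be too slow for a realistic vocabulary.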
Keywords :
grammars; natural language processing; statistical analysis; clustering algorithm; context based distance; context similarity; grammatical behavior; hierarchical agglomerative clustering; semantic behavior; statistical grammar induction systems; word categories; word clustering; Artificial intelligence; Biomedical computing; Biomedical equipment; Biomedical measurements; Clustering algorithms; Clustering methods; Computational efficiency; Medical services; Testing; Text mining; approximate clustering algorithms; context based clustering; context similarity measures; grammar induction systems;
Conference_Title :
Complexity and Intelligence of the Artificial and Natural Complex Systems, Medical Applications of the Complex Systems, Biomedical Computing, 2008. CANS '08. First International Conference on
Conference_Location :
Targu Mures, Mures
Print_ISBN :
978-0-7695-3621-7
DOI :
10.1109/CANS.2008.26