DocumentCode :
2379353
Title :
Using SOM based graph clustering for extracting main ideas from documents
Author :
Phuc, Do ; Hung, Mai Xuan
Author_Institution :
VNU-HCM, Univ. of Inf. Technol., Ho Chi Minh City
fYear :
2008
fDate :
13-17 July 2008
Firstpage :
209
Lastpage :
214
Abstract :
This paper presents a graph clustering system that groups similar documents and extracts their main ideas. Clustering documents requires a model for representing them. Traditional approaches use a word-set or vector-based model, which discards important structural information such as word position and the semantic relations between words in a document. Recently, several works have represented documents as graphs. We build each graph by analyzing the co-occurrence and position of pairs of words within a section of the document. Once the documents are represented as graphs, we use a self-organizing map (SOM) with a two-dimensional output layer to group the graphs. One advantage of SOM is that it clusters the data without requiring the number of clusters to be specified. In addition, the two-dimensional SOM output layer can be shown on a computer display, which helps users access similar documents. The graph distance is based on the maximum common subgraph (mcs), which is discovered by a maximal frequent subgraph algorithm, and the update operation for neurons on the SOM output layer is based on weighted mean graphs and a genetic algorithm.
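As an illustration only, the following minimal sketch shows one way a co-occurrence graph and an mcs-based graph distance could be computed. It is not the authors' implementation. It rests on several assumptions that the abstract does not state: each word is a uniquely labeled node (so the maximum common subgraph reduces to the intersection of node and edge sets), graph size is counted as nodes plus edges, the distance is normalized by the larger graph, and word position is ignored. The function names build_graph, mcs_size and graph_distance are hypothetical.

```python
from itertools import combinations


def build_graph(sections):
    """Build an undirected word graph from a list of section strings.

    Nodes are words; an edge joins two words that co-occur in the same
    section. Word position is ignored here (a simplifying assumption).
    """
    nodes, edges = set(), set()
    for section in sections:
        words = set(section.lower().split())
        nodes.update(words)
        # Sort so each undirected edge has one canonical (a, b) form.
        for a, b in combinations(sorted(words), 2):
            edges.add((a, b))
    return nodes, edges


def graph_size(graph):
    """Size of a graph, assumed here to be node count plus edge count."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs_size(g1, g2):
    """Size of the maximum common subgraph of two graphs.

    With unique node labels, the mcs is simply the shared nodes and
    shared edges, so no subgraph isomorphism search is needed.
    """
    return len(g1[0] & g2[0]) + len(g1[1] & g2[1])


def graph_distance(g1, g2):
    """Assumed distance: 1 - |mcs(g1, g2)| / max(|g1|, |g2|)."""
    denom = max(graph_size(g1), graph_size(g2))
    if denom == 0:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - mcs_size(g1, g2) / denom


if __name__ == "__main__":
    doc_a = build_graph(["graph clustering"])
    doc_b = build_graph(["graph mining"])
    print(graph_distance(doc_a, doc_b))
```

Under these assumptions the distance is 0 for identical graphs and 1 for graphs that share no words, which is the kind of dissimilarity a SOM could use to pick the best-matching neuron for an input graph.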
Keywords :
genetic algorithms; graph theory; pattern clustering; self-organising feature maps; text analysis; SOM based graph clustering; document clustering; document main idea extraction; document structural information; genetic algorithm; graph clustering system; maximum common subgraph; neuron operation; self organizing map; weighted mean graph; word cooccurrence; word position; word semantic relation; Cities and towns; Clustering algorithms; Computer displays; Data mining; Frequency; Genetic algorithms; Information technology; Neural networks; Neurons; Organizing; Genetic algorithm; Graph Distance; Graph clustering; Kohonen neural network; Weighted means graphs;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Research, Innovation and Vision for the Future, 2008. RIVF 2008. IEEE International Conference on
Conference_Location :
Ho Chi Minh City
Print_ISBN :
978-1-4244-2379-8
Electronic_ISBN :
978-1-4244-2380-4
Type :
conf
DOI :
10.1109/RIVF.2008.4586357
Filename :
4586357