Title :
Similar document detection using self-organizing maps
Author :
Lensu, Anssi ; Koikkalainen, Pasi
Author_Institution :
Dept. of Math. Inf. Technol., Jyvaskyla Univ., Finland
Abstract :
This paper describes how similar free-form textual documents can be matched using self-organizing maps (SOMs). The analysis chain has three parts: first, similar words are located using an alphabet occurrence coding and a SOM; second, three-word contexts are clustered using codes obtained from the word SOM to build a context map; and third, whole documents are clustered using codes from the context SOM. Although this work is inspired by the WEBSOM method, it differs substantially, since our goal was to build a fast system that is tolerant of the special features of different languages.
Keywords :
document handling; image matching; information retrieval; self-organising feature maps; WEBSOM method; alphabet occurrence coding; context map; free-form textual documents; self-organizing maps; similar document detection; Algorithm design and analysis; Computer vision; Data analysis; Data mining; Humans; Information technology; Neural networks; Self organizing feature maps; Text analysis; Web sites
Conference_Title :
Knowledge-Based Intelligent Information Engineering Systems, 1999. Third International Conference
Conference_Location :
Adelaide, SA
Print_ISBN :
0-7803-5578-4
DOI :
10.1109/KES.1999.820147
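
Illustrative note on the abstract's first step: the abstract does not specify how the alphabet occurrence coding is computed, so the following is only a minimal sketch under stated assumptions. It assumes a 26-letter Latin alphabet and encodes each word as a unit-length vector of letter counts; the names `occurrence_code` and `similarity` are hypothetical and do not come from the paper. Training the word SOM on these codes is omitted.

```python
import math
import string

# Assumption: a plain 26-letter Latin alphabet; the paper's alphabet may differ.
ALPHABET = string.ascii_lowercase


def occurrence_code(word):
    """Return a unit-length vector of letter counts for `word` (hypothetical coding)."""
    counts = [0.0] * len(ALPHABET)
    for ch in word.lower():
        idx = ALPHABET.find(ch)
        if idx >= 0:  # skip characters outside the assumed alphabet
            counts[idx] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    # An empty or all-foreign word stays as the zero vector.
    return [c / norm for c in counts] if norm > 0 else counts


def similarity(u, v):
    """Dot product; equals cosine similarity when both codes are unit length."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    # Words sharing most letters score higher than unrelated words.
    print(similarity(occurrence_code("map"), occurrence_code("maps")))
    print(similarity(occurrence_code("map"), occurrence_code("document")))
```

A pure count vector ignores letter order, so anagrams receive identical codes; the coding actually used in the paper may handle this differently.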