• DocumentCode
    351086
  • Title
    Similar document detection using self-organizing maps
  • Author
    Lensu, Anssi; Koikkalainen, Pasi
  • Author_Institution
    Dept. of Math. Inf. Technol., Jyvaskyla Univ., Finland
  • fYear
    1999
  • fDate
    36495
  • Firstpage
    174
  • Lastpage
    177
  • Abstract
    This paper describes how similar free-form textual documents can be matched using self-organizing maps (SOMs). The analysis chain consists of three stages: first, similar words are located using an alphabet occurrence coding and a SOM; second, three-word contexts are clustered using codes obtained from the word SOM to build a context map; and third, whole documents are clustered using codes from the context SOM. Although this work is inspired by the WEBSOM method, it differs substantially, since our goal was to build a fast system that is tolerant of the special features of different languages.
  • Keywords
    document handling; image matching; information retrieval; self-organising feature maps; WEBSOM method; alphabet occurrence coding; context map; free-form textual documents; self-organizing maps; similar document detection; Algorithm design and analysis; Computer vision; Data analysis; Data mining; Humans; Information technology; Neural networks; Self organizing feature maps; Text analysis; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge-Based Intelligent Information Engineering Systems, 1999. Third International Conference
  • Conference_Location
    Adelaide, SA
  • Print_ISBN
    0-7803-5578-4
  • Type
    conf
  • DOI
    10.1109/KES.1999.820147
  • Filename
    820147