• DocumentCode
    1930663
  • Title
    Unsupervised clustering of symbol strings
  • Author
    Flanagan, John A.
  • Author_Institution
    Nokia Research Center, Espoo, Finland
  • Volume
    4
  • fYear
    2003
  • fDate
    20-24 July 2003
  • Firstpage
    3250
  • Abstract
    The Symbol String Clustering Map (SCM) is introduced as a very simple but effective algorithm for clustering strings of symbols in an unsupervised manner. The clustering is based on iterative learning of the input data symbol strings. The learning uses the winner-take-all (WTA) principle and hence requires a similarity measure between symbol strings. A novel and efficient average-based similarity measure is defined. Unsupervised generation of the data cluster structure results from a lateral inhibition function applied to the update of adjacent nodes on the SCM lattice. A simple coding method for converting time sequences of symbols into symbol strings for use in the SCM is described. The SCM is shown to generate clusters for symbol string data sets.
  • Keywords
    data mining; nomenclature; pattern clustering; string matching; unsupervised learning; Symbol String Clustering Map; average based similarity measure; coding method; input data symbol strings; iterative learning; lateral inhibition function; time sequences; unsupervised data cluster structure generation; unsupervised symbol string clustering; winner take all; Clustering algorithms; Feature extraction; Iterative algorithms; Lattices; Nearest neighbor searches; Pattern recognition; Probability density function; Random variables; Samarium; Tin
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2003. Proceedings of the International Joint Conference on
  • ISSN
    1098-7576
  • Print_ISBN
    0-7803-7898-9
  • Type
    conf
  • DOI
    10.1109/IJCNN.2003.1224094
  • Filename
    1224094