• DocumentCode
    2480631
  • Title
    Bag of Characters and SOM Clustering for Script Recognition and Writer Identification
  • Author
    Marinai, Simone ; Miotti, Beatrice ; Soda, Giovanni
  • Author_Institution
    Dipt. di Sist. e Inf., Univ. di Firenze, Firenze, Italy
  • fYear
    2010
  • fDate
    23-26 Aug. 2010
  • Firstpage
    2182
  • Lastpage
    2185
  • Abstract
    In this paper, we describe a general approach for script (and language) recognition in printed documents and for writer identification in handwritten documents. The method is based on a bag-of-visual-words strategy in which the visual words correspond to characters and the clustering is obtained by means of Self-Organizing Maps (SOM). Unknown pages (words, in the case of script recognition) are classified by comparing their vectorial representations with those of a training set using cosine similarity. The comparison is improved by a similarity score that takes into account the SOM organization of the cluster centroids. Promising results are presented for both printed documents and handwritten musical scores.
  • Keywords
    document image processing; handwritten character recognition; pattern clustering; self-organising feature maps; SOM clustering; bag of visual word strategy; cosine similarity; handwritten documents; handwritten musical scores; printed documents; script recognition; self organizing maps; writer identification; Artificial neural networks; Character recognition; Feature extraction; Indexing; Text analysis; Visualization; Script Recognition; Self-Organizing Map; Writer Identification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2010 20th International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-7542-1
  • Type
    conf
  • DOI
    10.1109/ICPR.2010.534
  • Filename
    5595942
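
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It assumes each character has already been assigned to a cell of a trained SOM, so a page is just a list of cell indices. The function names, the 4-neighbour smoothing, and the `neighbor_weight` parameter are hypothetical choices made for illustration; the abstract does not specify how the SOM-aware similarity score is computed, so this should not be read as the authors' method.

```python
import math
from collections import Counter


def bag_of_characters(cell_ids, n_cells):
    """Histogram of SOM cell indices: one count per cluster (visual word)."""
    counts = Counter(cell_ids)
    return [float(counts.get(i, 0)) for i in range(n_cells)]


def cosine_similarity(a, b):
    """Plain cosine similarity between two histograms (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom > 0 else 0.0


def smooth_over_som(hist, rows, cols, neighbor_weight=0.5):
    """Assumed scheme: add a weighted share of each 4-connected grid
    neighbour's count, so characters landing in adjacent SOM cells
    still contribute to the similarity."""
    out = list(hist)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    out[r * cols + c] += neighbor_weight * hist[nr * cols + nc]
    return out


def classify(query_hist, train_hists, train_labels, rows, cols):
    """Return the label of the training histogram most similar to the query."""
    q = smooth_over_som(query_hist, rows, cols)
    scores = [cosine_similarity(q, smooth_over_som(h, rows, cols))
              for h in train_hists]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_labels[best]
```

With a 2x2 map, two pages whose characters fall in adjacent cells have a plain cosine similarity of zero, while the smoothed histograms give a positive score. That is the kind of effect a map-aware comparison is meant to capture.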