  • DocumentCode
    2145293
  • Title
    Segmentation and Normalisation in Grapheme Codebooks
  • Author
    Gilliam, Tara ; Wilson, Richard C. ; Clark, John A.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of York, York, UK
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    613
  • Lastpage
    617
  • Abstract
    The grapheme codebook is a high-performing technique for offline writer identification. This paper considers whether the de facto standards for initial grapheme extraction are optimal for both modern and historical datasets. We examine the construction and representation of the graphemes that comprise the codebook, testing three segmentation methods and two grapheme size normalisation methods on two datasets: a 93-writer IAM dataset and a 43-writer medieval English dataset. The standard minima-split segmentation is compared to a complementary segmentation method that preserves ligature shapes, as well as to the union of the two methods. Classification performance for each method is compared over a range of codebook sizes. We demonstrate that grapheme aspect ratio is not always a writer-specific feature, and that preserving the character body shape in segmentation is more informative than preserving cursive text ligatures.
  • Keywords
    character recognition; image classification; image segmentation; text analysis; classification performance; cursive text ligatures; de facto standards; grapheme codebooks; grapheme extraction; graphemes representation; ligature shapes; normalisation methods; offline writer identification; segmentation methods; Accuracy; Feature extraction; Hidden Markov models; Image segmentation; Ink; Text analysis; Codebook; Grapheme; Segmentation; Writer identification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.129
  • Filename
    6065384