• DocumentCode
    1783736
  • Title
    A new hybrid binarization method based on Kmeans
  • Author
    Soua, Mahmoud; Kachouri, Rostom; Akil, Mohamed
  • Author_Institution
    LIGM, ESIEE Paris, Noisy-Le-Grand, France
  • fYear
    2014
  • fDate
    21-23 May 2014
  • Firstpage
    118
  • Lastpage
    123
  • Abstract
    Document binarization is a fundamental preprocessing step for Optical Character Recognition (OCR): it separates the foreground text from the document background. In this article, we propose a novel binarization technique that combines local and global approaches using the Kmeans clustering algorithm. The proposed Hybrid Binarization based on Kmeans (HBK) performs robust binarization of scanned documents. Several experiments show that HBK improves binarization quality while minimizing distortion. Moreover, it outperforms several well-known state-of-the-art methods in OCR evaluation.
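    The abstract does not specify how HBK combines its local and global stages, so the sketch below illustrates only the general ingredient it names: clustering gray-level pixel intensities with Kmeans (k = 2) and labeling each pixel by its nearest cluster center. It is a hedged, global-only illustration, not the authors' HBK method. The function names `kmeans_two_clusters` and `binarize_global`, the extreme-value initialization, and the 0 = text / 1 = background convention are assumptions made for this example.

```python
from typing import List, Tuple


def kmeans_two_clusters(values: List[float], max_iter: int = 100) -> Tuple[float, float]:
    """1-D Kmeans with k=2 on pixel intensities.

    Hypothetical helper (not from the paper). Returns (dark_center, light_center).
    """
    # Assumption: initialize the two centers at the darkest and lightest intensities.
    c_dark, c_light = float(min(values)), float(max(values))
    for _ in range(max_iter):
        # Assignment step: each value joins the nearer center (ties go to the dark cluster).
        dark = [v for v in values if abs(v - c_dark) <= abs(v - c_light)]
        light = [v for v in values if abs(v - c_dark) > abs(v - c_light)]
        # Update step: move each center to the mean of its cluster (keep it if the cluster is empty).
        new_dark = sum(dark) / len(dark) if dark else c_dark
        new_light = sum(light) / len(light) if light else c_light
        # Stop once the centers no longer move.
        if new_dark == c_dark and new_light == c_light:
            break
        c_dark, c_light = new_dark, new_light
    return c_dark, c_light


def binarize_global(image: List[List[int]]) -> List[List[int]]:
    """Global Kmeans binarization of a grayscale image given as rows of 0-255 values.

    Hypothetical helper: labels each pixel 0 (foreground/text) or 1 (background).
    """
    pixels = [p for row in image for p in row]
    c_dark, c_light = kmeans_two_clusters(pixels)
    return [[0 if abs(p - c_dark) <= abs(p - c_light) else 1 for p in row] for row in image]
```

    For example, `binarize_global([[30, 40, 200], [210, 35, 220]])` converges to centers 35.0 and 210.0 and returns `[[0, 0, 1], [1, 0, 1]]`. A purely global clustering like this can fail under uneven illumination or degraded backgrounds, which is the usual motivation for adding a local stage as the abstract describes; how HBK does so is not stated here.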
  • Keywords
    document image processing; learning (artificial intelligence); optical character recognition; pattern clustering; HBK method; Kmeans clustering algorithm; OCR evaluation; binarization quality; binarization technique; distortion amount; document binarization; hybrid binarization method; optical character recognition; scanned documents; Character recognition; Clustering algorithms; Distortion measurement; Histograms; Optical character recognition software; Optical distortion; Robustness; Kmeans; OCR; Scanned documents; binarization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications, Control and Signal Processing (ISCCSP), 2014 6th International Symposium on
  • Conference_Location
    Athens
  • Type
    conf
  • DOI
    10.1109/ISCCSP.2014.6877830
  • Filename
    6877830