• DocumentCode
    2021810
  • Title
    Automatic Detection of Document Script and Orientation
  • Author
    Lu, Shijian; Tan, Chew Lim
  • Author_Institution
    Nat. Univ. of Singapore, Singapore
  • Volume
    1
  • fYear
    2007
  • fDate
    23-26 Sept. 2007
  • Firstpage
    237
  • Lastpage
    241
  • Abstract
    This paper presents an identification technique that automatically detects the underlying script and orientation of scanned document images. In the proposed technique, document script and orientation are identified using stroke density and stroke distribution, which convert each document image into a document vector. For each script at each orientation, a number of reference document vectors are first constructed. The script and orientation of a query document are then determined by the K-nearest neighbor algorithm, according to the similarity between the query document vector and the multiple pre-constructed reference document vectors. Experiments show that the proposed technique is tolerant to document skew and able to detect the orientation of documents in different scripts.
  • Keywords
    document image processing; image retrieval; vectors; K-nearest neighbor algorithm; document orientation automatic detection; document script automatic detection; query document vector; scanned document images; stroke density; Character recognition; Engines; Filtering; Filters; Image analysis; Image converters; Optical character recognition software; Pixel; Statistical distributions; Text analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
  • Conference_Location
    Parana
  • ISSN
    1520-5363
  • Print_ISBN
    978-0-7695-2822-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.2007.4378711
  • Filename
    4378711
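The abstract's classification step (K-nearest neighbor voting over reference document vectors, one set per script/orientation pair) can be illustrated with a minimal sketch. It assumes the document vectors have already been computed, and it uses Euclidean distance as the dissimilarity measure; the abstract does not specify the paper's actual similarity measure or the vector layout, so `knn_classify`, the `(script, orientation)` label tuples, and the toy vectors are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, references, k=3):
    """Return the (script, orientation) label chosen by majority vote of the k nearest references.

    references: list of (vector, label) pairs, where label is a hypothetical
    (script, orientation) tuple such as ("Latin", 0).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not references:
        raise ValueError("at least one reference vector is required")
    # Rank reference vectors from nearest to farthest from the query vector.
    ranked = sorted(references, key=lambda ref: euclidean(query, ref[0]))
    # Count labels among the k nearest neighbours.
    votes = Counter(label for _, label in ranked[:k])
    # On equal counts, most_common keeps first-encountered order,
    # so a tie goes to the label of the nearer neighbour.
    return votes.most_common(1)[0][0]
```

With toy two-dimensional vectors for upright and upside-down Latin text, a query close to the upright references is assigned `("Latin", 0)`. Real document vectors would be longer and derived from stroke density and distribution as described in the abstract.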