  • DocumentCode
    2149169
  • Title
    Towards Searchable Digital Urdu Libraries - A Word Spotting Based Retrieval Approach
  • Author
    Abidi, Ali; Siddiqi, Imran; Khurshid, Khurram
  • Author_Institution
    Nat. Univ. of Sci. & Technol. (MCS-NUST), Islamabad, Pakistan
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    1344
  • Lastpage
    1348
  • Abstract
    Libraries in South Asia hold huge collections of valuable printed documents in Urdu, and it is of interest to digitize these collections to make them more accessible. The unavailability of an OCR for Urdu, however, limits the concept of a digital Urdu library to the scanning of documents only, offering a very limited search facility based on manually assigned tags. We address this issue by proposing a word spotting based keyword search method for information retrieval in digitized collections of printed Urdu documents. The proposed method is based on segmenting Urdu text into partial words and representing each partial word by a set of features. To search for a specific word (or phrase), the user provides a query in the form of an image. By comparing the features of the partial words in the query image with those already indexed, the system provides the user with a list of documents containing occurrences of the queried word. The system, evaluated on 50 Urdu documents, exhibited a recall of 95.17% and a precision of 94.3%.
  • Keywords
    digital libraries; document image processing; information retrieval; library automation; OCR; South Asia libraries; Urdu text segmentation; digitized collection; printed Urdu documents; query image; searchable digital Urdu library; word spotting based keyword search; word spotting based retrieval approach; Feature extraction; Image segmentation; Indexing; Libraries; Optical character recognition software; Sorting; Vectors; Dynamic Time Warping; Urdu digital libraries; Word Spotting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.270
  • Filename
    6065529
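
The matching step described in the abstract (comparing features of the query's partial words against indexed ones) can be illustrated with a minimal sketch. It assumes Dynamic Time Warping as the comparison, since the keywords list it, but the record does not say how the paper applies it. The names `dtw_distance` and `spot`, the `(doc_id, features)` index layout, and the `threshold` parameter are all hypothetical, and each partial word is assumed to be a sequence of numeric feature vectors.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two sequences of feature vectors.

    Illustrative only: Euclidean local distance, no window constraint,
    no length normalization.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two feature vectors
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def spot(query, index, threshold):
    """Return sorted ids of documents holding a partial word close to the query.

    `index` is an iterable of (doc_id, features) pairs (hypothetical layout).
    """
    hits = set()
    for doc_id, features in index:
        if dtw_distance(query, features) <= threshold:
            hits.add(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    index = [
        ("doc1", [[0.0], [0.0], [1.0]]),  # stretched copy of the query
        ("doc2", [[5.0], [5.0]]),         # unrelated partial word
    ]
    print(spot([[0.0], [1.0]], index, threshold=0.5))
```

The threshold trades recall against precision: a looser value retrieves more true occurrences but also more false matches.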