• DocumentCode
    1918986
  • Title
    Abstract: Digitization and Search: A Non-Traditional Use of HPC
  • Author
    Diesendruck, L. ; Marini, Luigi ; Kooper, R. ; Kejriwal, Mayank ; McHenry, Kenton
  • Author_Institution
    Nat. Center for Supercomput. Applic., Univ. of Illinois at Urbana-Champaign, Champaign, IL, USA
  • fYear
    2012
  • fDate
    10-16 Nov. 2012
  • Firstpage
    1460
  • Lastpage
    1461
  • Abstract
    We describe our efforts to provide automated search of handwritten content in digitized document archives. To carry out the search we use a computer vision technique called word spotting. Word spotting is a form of content-based image retrieval that avoids the still-difficult task of directly recognizing text. The user searches with a query image containing handwritten text, and the system ranks a database of images by how similar their content looks to the query. Making this search capability available on an archive requires three computationally expensive pre-processing steps. We augment this automated portion of the process with a passive crowdsourcing element that mines queries from the system's users in order to improve the results of future queries. We benchmark the proposed framework on 1930s Census data, a collection of roughly 3.6 million forms and 7 billion individual units of information.
  • Keywords
    computer vision; content-based retrieval; document image processing; image retrieval; information retrieval systems; parallel processing; text detection; visual databases; Census data; HPC; automated search; computer vision technique; content based image retrieval; digitized document archives; handwritten content; handwritten text; image database ranking; passive crowd sourcing element; query image; query mining; text recognition; word spotting; Big Data; Digitization; Indexing Text;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:
  • Conference_Location
    Salt Lake City, UT
  • Print_ISBN
    978-1-4673-6218-4
  • Type
    conf
  • DOI
    10.1109/SC.Companion.2012.259
  • Filename
    6496042
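The abstract describes word spotting as ranking a database of word images by visual similarity to a query image, with past user queries reused to improve later searches. The sketch below illustrates only that idea and is not the authors' method. The column ink-density feature, the Euclidean distance, the `QueryLog` store, and the ids such as `form1_word3` are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical feature (an assumption, not the paper's): the column-wise ink
# density of a binarized word image, resampled to a fixed length so that word
# images of different widths can be compared.
def column_profile(image, bins=8):
    """image: list of rows, each a list of 0/1 pixels (1 = ink)."""
    height, width = len(image), len(image[0])
    cols = [sum(row[c] for row in image) / height for c in range(width)]
    # Nearest-neighbor resampling to `bins` values.
    return [cols[(i * width) // bins] for i in range(bins)]

def rank(query_vec, database):
    """Return database ids ordered from most to least similar to the query."""
    return sorted(database, key=lambda k: math.dist(query_vec, database[k]))

class QueryLog:
    """Hypothetical passive crowdsourcing store: query images that users have
    searched with are remembered under a text label, so later searches for
    that label can draw on every logged exemplar."""

    def __init__(self):
        self.exemplars = defaultdict(list)

    def record(self, label, query_vec):
        self.exemplars[label].append(query_vec)

    def search(self, label, database):
        examples = self.exemplars.get(label, [])
        if not examples:
            return []
        # Score each database image by its distance to the closest exemplar.
        return sorted(
            database,
            key=lambda k: min(math.dist(e, database[k]) for e in examples),
        )

if __name__ == "__main__":
    left = [[1, 1, 0, 0], [1, 1, 0, 0]]   # ink on the left half
    right = [[0, 0, 1, 1], [0, 0, 1, 1]]  # ink on the right half
    db = {"form1_word3": column_profile(left), "form2_word7": column_profile(right)}
    print(rank(column_profile(left), db))  # ['form1_word3', 'form2_word7']
    log = QueryLog()
    log.record("Smith", column_profile(right))
    print(log.search("Smith", db))  # ['form2_word7', 'form1_word3']
```

Scoring against the closest logged exemplar means each additional user query can only add a new way for a matching word to rank highly. This is one simple way to let mined queries improve future results. A real system at the scale of millions of forms would also need an index rather than a linear sort over the whole database.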