• DocumentCode
    2530009
  • Title

    Document images analysis solutions for digital libraries

  • Author

    Bourgeois, F. Le ; Trinh, E. ; Allier, B. ; Eglin, V. ; Emptoz, H.

  • Author_Institution
    LIRIS, CNRS, Villeurbanne, France
  • fYear
    2004
  • fDate
    2004
  • Firstpage
    2
  • Lastpage
    24
  • Abstract
    Today the development of digital libraries is reaching technological limits due to the difficulty of automatically processing a growing mass of digitized document images from different origins. The main problem is the high cost of the digitization and retro-conversion processes, which include image capture and indexation, metadata extraction, image storage, conversion into reusable electronic form, publication on the Internet, and reduction of image weights for faster access. To reduce the cost of digitization and retro-conversion, we need to break technological bottlenecks, for example by developing "intelligent" digitizers that reduce manual intervention and produce the best quality images. Retro-conversion needs efficient software that analyzes image contents and automatically extracts all the information necessary for image indexing. Other technological bottlenecks must also be considered, such as the need for an open file format that can describe digitized documents as heterogeneous media. This article is not a state-of-the-art survey of the domain; it describes some cases that we have studied in our laboratory over the past years.
  • Keywords
    digital libraries; document image processing; indexing; information retrieval; Internet; digital library; digitized document; digitized image; document images analysis solution; heterogeneous media; image capture; image indexation; image quality; image storage; image weight; information extraction; metadata extraction; open file format; retro-conversion process; reusable electronic form conversion; Costs; Data mining; Image analysis; Image converters; Image storage; Information analysis; Internet; Manuals; Software libraries; Text analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Image Analysis for Libraries, 2004. Proceedings. First International Workshop on
  • Print_ISBN
    0-7695-2088-X
  • Type

    conf

  • DOI
    10.1109/DIAL.2004.1263233
  • Filename
    1263233