• DocumentCode
    2957489
  • Title
    Active clustering of document fragments using information derived from both images and catalogs
  • Author
    Wolf, Lior ; Litwak, Lior ; Dershowitz, Nachum ; Shweka, Roni ; Choueka, Yaacov
  • Author_Institution
    Blavatnik Sch. of Comput. Sci., Tel Aviv Univ., Tel Aviv, Israel
  • fYear
    2011
  • fDate
    6-13 Nov. 2011
  • Firstpage
    1661
  • Lastpage
    1667
  • Abstract
    Many significant historical corpora contain leaves that are mixed up and no longer bound in their original state as multi-page documents. The reconstruction of old manuscripts from a mix of disjoint leaves can therefore be of paramount importance to historians and literary scholars. Previously, it was shown that visual similarity provides meaningful pair-wise similarities between handwritten leaves. Here, we go a step further and suggest a semiautomatic clustering tool that helps reconstruct the original documents. The proposed solution is based on a graphical model that makes inferences based on catalog information provided for each leaf as well as on the pairwise similarities of handwriting. Several novel active clustering techniques are explored, and the solution is applied to a significant part of the Cairo Genizah, where the problem of joining leaves remains unsolved even after a century of extensive study by hundreds of scholars.
  • Keywords
    document image processing; history; pattern clustering; active clustering; catalog information; document fragments; graphical model; handwriting pairwise similarities; historical corpora; image information; multipage documents; original document clustering; semiautomatic clustering; Catalogs; Complexity theory; Computational modeling; Data models; Graphical models; Humans; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision (ICCV), 2011 IEEE International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1550-5499
  • Print_ISBN
    978-1-4577-1101-5
  • Type
    conf
  • DOI
    10.1109/ICCV.2011.6126428
  • Filename
    6126428