• DocumentCode
    3103709
  • Title
    A Methodology for File Relationship Discovery
  • Author
    Ondrejcek, Michal ; Kastner, Jason ; Kooper, Rob ; Bajcsy, Peter
  • Author_Institution
    Nat. Center for Supercomput. Applic., Univ. of Illinois, Urbana, IL, USA
  • fYear
    2009
  • fDate
    9-11 Dec. 2009
  • Firstpage
    193
  • Lastpage
    200
  • Abstract
    This paper addresses the problem of discovering temporal and contextual relationships across document, data, and software categories of electronic records. We designed a methodology to discover unknown relationships by conducting file system and file content analyses. The work also investigates automation of metadata extraction from engineering drawings and storage requirements for metadata extraction. The methodology has been applied to extracting information from a test collection of electronic records about a Navy ship (TWR 841) archived by the US National Archives and Records Administration (NARA). This test collection represents a problem of unknown relationships among files that include 784 2D image drawings and 22 CAD models.
  • Keywords
    data handling; records management; 2D image drawings; CAD model; contextual relationship; electronic records; engineering drawings; file content analysis; file relationship discovery; file system; metadata extraction; software category; Application software; Character recognition; Data mining; Design methodology; Electronic equipment testing; Engineering drawings; File systems; Marine vehicles; Optical character recognition software; Storage automation; Data conversion; Data processing; Optical character recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    e-Science, 2009. e-Science '09. Fifth IEEE International Conference on
  • Conference_Location
    Oxford
  • Print_ISBN
    978-0-7695-3877-8
  • Type
    conf
  • DOI
    10.1109/e-Science.2009.35
  • Filename
    5380868