• DocumentCode
    1827337
  • Title

    Automatic extraction of table metadata from digital documents

  • Author

    Liu, Ying ; Mitra, Prasenjit ; Giles, C. Lee ; Bai, Kun

  • Author_Institution
    Coll. of Inf. Sci. & Technol., Pennsylvania State Univ., University Park, PA
  • fYear
    2006
  • fDate
    38869
  • Firstpage
    339
  • Lastpage
    340
  • Abstract
    Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and highlight a collection of results obtained from experiments and scientific analysis. In digital libraries, extracting this data automatically and understanding the structure and content of tables are very important to many applications. Automatic identification, extraction, and search for the contents of tables can be made more precise with the help of metadata. In this paper, we propose a set of medium-independent table metadata to facilitate table indexing, searching, and exchanging. To extract the contents of tables and their metadata, an automatic table metadata extraction algorithm is designed and tested on PDF documents.
  • Keywords
    digital libraries; document handling; information retrieval; meta data; PDF documents; automatic identification extraction; automatic table metadata extraction algorithm; digital documents; digital libraries; medium-independent table metadata; table exchanging; table indexing; table searching; Algorithm design and analysis; Automatic testing; Data mining; Databases; Documentation; Educational institutions; Indexing; Information retrieval; Reverse engineering; Software libraries; exchanging; metadata extraction; searching; table detection; table structure recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Libraries, 2006. JCDL '06. Proceedings of the 6th ACM/IEEE-CS Joint Conference on
  • Conference_Location
    Chapel Hill, NC
  • Print_ISBN
    1-59593-354-9
  • Type

    conf

  • DOI
    10.1145/1141753.1141835
  • Filename
    4119155