• DocumentCode
    1038687
  • Title
    Document processing for automatic knowledge acquisition
  • Author
    Tang, Yuan Yan ; Yan, Chang De ; Suen, Ching Y.
  • Author_Institution
    Centre for Pattern Recognition and Machine Intelligence, Concordia Univ., Montreal, Que., Canada
  • Volume
    6
  • Issue
    1
  • fYear
    1994
  • fDate
    2/1/1994 12:00:00 AM
  • Firstpage
    3
  • Lastpage
    21
  • Abstract
    The knowledge acquisition bottleneck has become the major impediment to the development and application of effective information systems. To remove this bottleneck, new document processing techniques must be introduced to automatically acquire knowledge from various types of documents. By surveying the techniques and problems involved, this paper aims to serve as a catalyst for research in automatic knowledge acquisition through document processing. In this study, a document is considered to have two structures: a geometric structure and a logical structure. These play a key role in knowledge acquisition, which can be viewed as the process of acquiring these two structures. Extracting the geometric structure from a document is referred to as document analysis; mapping the geometric structure into the logical structure is regarded as document understanding. Both areas are described in this paper, and the basic concept of document structure and its measurement based on entropy analysis is introduced. Logical structure and geometric models are proposed. Both top-down and bottom-up approaches and their entropy analyses are presented. Different techniques are discussed with practical examples. Mapping methods, such as tree transformation, document formatting knowledge and document format description language, are described.
  • Keywords
    deductive databases; document handling; knowledge acquisition; visual databases; automatic knowledge acquisition; bottom-up approaches; document analysis; document format description language; document formatting knowledge; document processing; document understanding; entropy analysis; geometric models; geometric structure; information systems; knowledge acquisition bottleneck; logical structure; mapping methods; top-down approaches; tree transformation; Area measurement; Artificial intelligence; Data engineering; Entropy; Impedance; Information systems; Knowledge acquisition; Manuals; Solid modeling; Text analysis
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/69.273022
  • Filename
    273022