  • DocumentCode
    1961692
  • Title
    A tool for classifying office documents
  • Author
    Hao, Xiaolong; Wang, Jason T. L.; Bieber, Michael P.; Ng, Peter A.
  • Author_Institution
    Dept. of Comput. & Inf. Sci., New Jersey Inst. of Technol., Newark, NJ, USA
  • fYear
    1993
  • fDate
    8-11 Nov 1993
  • Firstpage
    427
  • Lastpage
    434
  • Abstract
    The authors present the design of a tool for classifying office documents. They represent a document's layout structure using an ordered labeled tree, called the layout structure tree (L-S-tree), based on a nested segmentation procedure. The tool uses a sample-based approach for learning, where concepts are learned by retaining samples and new documents are classified by matching their L-S-trees with samples. The matching process involves both computing the edit distance between two trees, using a previously developed pattern matching toolkit, and calculating the degree of conceptual closeness between the documents and samples. The experimental results show that the tool is capable of classifying various types of office documents, even with very few samples in the sample base.
  • Keywords
    deductive databases; document handling; learning (artificial intelligence); office automation; pattern classification; pattern matching; tree data structures; L-S-tree; conceptual closeness; edit distance; layout structure tree; learning; nested segmentation procedure; office document classification; ordered labeled tree; pattern matching toolkit; sample-based approach; Classification tree analysis; Facsimile; Image converters; Information science; Pattern matching; Surges; Testing; Text recognition; Tree data structures
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence, 1993. TAI '93. Proceedings., Fifth International Conference on
  • Conference_Location
    Boston, MA
  • ISSN
    1063-6730
  • Print_ISBN
    0-8186-4200-9
  • Type
    conf
  • DOI
    10.1109/TAI.1993.633991
  • Filename
    633991
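
Illustrative sketch (not taken from the paper): the abstract describes classifying a new document by matching its layout tree against retained samples using a tree edit distance. The Python below is a minimal sketch of that general idea only, assuming unit costs for insert, delete, and relabel, and a plain nearest-sample rule. It does not reproduce the authors' pattern matching toolkit or their "conceptual closeness" measure, whose definitions are not given in this record. The names `tree`, `tree_edit_distance`, `classify`, and the example layouts are hypothetical.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# Tuples are hashable, so forests can be used as memoization keys.
def tree(label, *children):
    return (label, tuple(children))

def forest_size(forest):
    """Number of nodes in an ordered forest (a tuple of trees)."""
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests.

    Standard recursion on the rightmost root of each forest:
    delete it, insert it, or match the two roots (relabel if needed).
    """
    if not f and not g:
        return 0
    if not f:
        return forest_size(g)   # insert every node of g
    if not g:
        return forest_size(f)   # delete every node of f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + v_children, g) + 1
    insert = forest_distance(f, g[:-1] + w_children) + 1
    match = (forest_distance(f[:-1], g[:-1])
             + forest_distance(v_children, w_children)
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

def classify(doc_tree, sample_base):
    """Return (class_name, distance) of the nearest sample.

    sample_base is a list of (class_name, tree) pairs. Assumption:
    plain nearest-neighbour matching, with no closeness weighting.
    """
    scored = [(tree_edit_distance(doc_tree, t), name) for name, t in sample_base]
    distance, name = min(scored)
    return name, distance

# Hypothetical layout trees, for illustration only.
letter = tree("page", tree("header"),
              tree("body", tree("para"), tree("para")),
              tree("signature"))
memo = tree("page",
            tree("header", tree("to"), tree("from"), tree("subject")),
            tree("body", tree("para")))
new_doc = tree("page", tree("header"),
               tree("body", tree("para")),
               tree("signature"))

label, dist = classify(new_doc, [("letter", letter), ("memo", memo)])
print(label, dist)
```

With a single sample per class, as here, the rule reduces to picking the class whose sample tree needs the fewest edits to match the new document's tree.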