• DocumentCode
    1808575
  • Title
    AHA: Asset Harvester Assistant
  • Author
    Mukherjee, Debdoot ; Mani, Senthil ; Sinha, Vibha Singhal ; Ananthanarayanan, Rema ; Srivastava, Biplav ; Dhoolia, Pankaj ; Chowdhury, Prahlad
  • Author_Institution
    IBM Res. India, New Delhi, India
  • fYear
    2010
  • fDate
    5-10 July 2010
  • Firstpage
    425
  • Lastpage
    432
  • Abstract
    Information assets in service enterprises are typically available as unstructured documents. There is an increasing need for unraveling information from these documents into a structured and semantic format. Structured data can be more effectively queried, which increases information reuse from asset repositories. This paper addresses the problem of extracting XML models, which follow a given target schema, from enterprise documents. We discuss why existing approaches for information extraction do not suffice for the enterprise documents created during service delivery. To address this limitation, we present the Asset Harvester Assistant (AHA), a tool that automatically extracts structured models from MS-Word documents, and supports manual refinement of the extracted models within an interactive environment. We present the results of empirical studies conducted using business-process documents from real service-delivery engagements. Our results indicate that the AHA approach can be effective in extracting accurate models from unstructured documents and improving user productivity.
  • Keywords
    XML; data structures; document handling; ontologies (artificial intelligence); AHA; XML models; asset harvester assistant; data structures; enterprise documents; information assets; information extraction; semantic format; service enterprises; Business; Data mining; Ontologies; Pediatrics; Semantics; Web pages; XML; documents; enterprise; harvesting; information extraction; services;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Services Computing (SCC), 2010 IEEE International Conference on
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4244-8147-7
  • Electronic_ISBN
    978-0-7695-4126-6
  • Type
    conf
  • DOI
    10.1109/SCC.2010.55
  • Filename
    5557199