DocumentCode
1808575
Title
AHA: Asset Harvester Assistant
Author
Mukherjee, Debdoot ; Mani, Senthil ; Sinha, Vibha Singhal ; Ananthanarayanan, Rema ; Srivastava, Biplav ; Dhoolia, Pankaj ; Chowdhury, Prahlad
Author_Institution
IBM Research India, New Delhi, India
fYear
2010
fDate
5-10 July 2010
Firstpage
425
Lastpage
432
Abstract
Information assets in service enterprises are typically available as unstructured documents. There is an increasing need for unraveling information from these documents into a structured and semantic format. Structured data can be more effectively queried, which increases information reuse from asset repositories. This paper addresses the problem of extracting XML models, which follow a given target schema, from enterprise documents. We discuss why existing approaches for information extraction do not suffice for the enterprise documents created during service delivery. To address this limitation, we present the Asset Harvester Assistant (AHA), a tool that automatically extracts structured models from MS-Word documents, and supports manual refinement of the extracted models within an interactive environment. We present the results of empirical studies conducted using business-process documents from real service-delivery engagements. Our results indicate that the AHA approach can be effective in extracting accurate models from unstructured documents and improving user productivity.
Keywords
XML; data structures; document handling; ontologies (artificial intelligence); AHA; XML models; asset harvester assistant; enterprise documents; information assets; information extraction; semantic format; service enterprises; Business; Data mining; Ontologies; Pediatrics; Semantics; Web pages; documents; enterprise; harvesting; services
fLanguage
English
Publisher
IEEE
Conference_Titel
Services Computing (SCC), 2010 IEEE International Conference on
Conference_Location
Miami, FL
Print_ISBN
978-1-4244-8147-7
Electronic_ISBN
978-0-7695-4126-6
Type
conf
DOI
10.1109/SCC.2010.55
Filename
5557199