• DocumentCode
    2257515
  • Title

    Integration of Data Warehouse and Unstructured Business Documents

  • Author

    Alqarni, Ahmad Abdullah ; Pardede, Eric

  • Author_Institution
    Dept. of Comput. Sci. & Comput. Eng., La Trobe Univ., Melbourne, VIC, Australia
  • fYear
    2012
  • fDate
    26-28 Sept. 2012
  • Firstpage
    32
  • Lastpage
    37
  • Abstract
    The profusion of unstructured data has forced organizations to manage and take advantage of such data, especially in the decision-making process. Integrating or mapping unstructured data to a data warehouse is becoming important for bridging this gap and realizing the full potential of these data. In this paper, we propose a multi-layer schema for mapping structured data stored in a data warehouse to unstructured data in business-related documents. The multi-layer schema facilitates the mapping between the two kinds of data. Linguistically correlated data is identified using WordNet to enable integration between both data sources. We also propose a generic XML schema for business-related unstructured documents to assist the mapping. The use of WordNet to identify matches is promising in the absence of schema instances and without the need for domain-specific knowledge.
  • Keywords
    XML; data integration; data warehouses; decision making; WordNet; business-related unstructured documents; data sources; data warehouse integration; decision making process; generic XML schema; linguistic correlated data; multilayer schema; unstructured data forced organizations; unstructured data mapping; Data mining; Data models; Data warehouses; Organizations; Semantics; XML schema matching; data warehouse; schema mapping; unstructured document;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Network-Based Information Systems (NBiS), 2012 15th International Conference on
  • Conference_Location
    Melbourne, VIC
  • Print_ISBN
    978-1-4673-2331-4
  • Type

    conf

  • DOI
    10.1109/NBiS.2012.59
  • Filename
    6354804