Title :
A global and comprehensive approach for XML data warehouse design
Author :
Ouaret, Zoubir ; Boussaid, Omar ; Chalal, Rachid
Author_Institution :
High Nat. Sch. Of Comput. Sci., ESI, Algiers, Algeria
Abstract :
The increasing amount of interesting data stored in the XML format is a major challenge for the BI community. It is therefore desirable to extract, store and integrate these large sources of information into special-purpose systems called “data warehouses” for further analysis and decision-making. However, compared with the well-structured relational databases of a company, XML data presents a complex hierarchical structure, which renders existing traditional data warehouse approaches and techniques inappropriate. In this paper, we propose a semi-automatic approach for XML data warehouse design starting from XML schemas as data sources. The first step consists in automatically generating a UML class diagram from a W3C XML Schema (XSD). However, the resulting diagram can be very large and hard to understand. To overcome this, we use a set of rules based on basic techniques for object-oriented design quality to develop a simplification algorithm that efficiently generates high-quality diagrams with a limited number of classes. Then, we propose a multidimensional (MD) element extraction algorithm to automatically identify facts, measures and their corresponding dimensions. We also present a new metric for ranking the obtained MD schemas according to their relevance. The final step consists in automatically generating the star XML schema that corresponds to the XML data warehouse schema. Finally, we have implemented our approach in Java and evaluated the tool on several XML schemas.
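The abstract does not spell out the MD element extraction rules, so the following is only a minimal sketch of one common heuristic, not the authors' algorithm: a class with numeric attributes and many-to-one associations is treated as a fact candidate, its numeric attributes as measures, and the associated classes as dimensions. All names (`MdCandidateSketch`, `UmlClass`, `FactCandidate`, `findFactCandidates`) and the list of numeric XSD types are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names or rules come from the paper.
class MdCandidateSketch {

    // A simplified UML class: attribute name -> XSD type, plus the names of
    // classes it points to through many-to-one associations.
    static class UmlClass {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<String> manyToOneTargets = new ArrayList<>();

        UmlClass(String name) { this.name = name; }

        UmlClass attr(String attrName, String xsdType) {
            attributes.put(attrName, xsdType);
            return this;
        }

        UmlClass refersTo(String target) {
            manyToOneTargets.add(target);
            return this;
        }
    }

    // One candidate star: a fact with its measures and dimensions.
    static class FactCandidate {
        final String fact;
        final List<String> measures;
        final List<String> dimensions;

        FactCandidate(String fact, List<String> measures, List<String> dimensions) {
            this.fact = fact;
            this.measures = measures;
            this.dimensions = dimensions;
        }
    }

    // Assumed set of XSD types that count as numeric (and so as measures).
    static final List<String> NUMERIC_XSD_TYPES = Arrays.asList(
            "xs:int", "xs:integer", "xs:long", "xs:decimal", "xs:float", "xs:double");

    // Keeps every class that has at least one numeric attribute and at least
    // one many-to-one association; other classes are not fact candidates.
    static List<FactCandidate> findFactCandidates(List<UmlClass> model) {
        List<FactCandidate> result = new ArrayList<>();
        for (UmlClass c : model) {
            List<String> measures = new ArrayList<>();
            for (Map.Entry<String, String> a : c.attributes.entrySet()) {
                if (NUMERIC_XSD_TYPES.contains(a.getValue())) {
                    measures.add(a.getKey());
                }
            }
            if (!measures.isEmpty() && !c.manyToOneTargets.isEmpty()) {
                result.add(new FactCandidate(
                        c.name, measures, new ArrayList<>(c.manyToOneTargets)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<UmlClass> model = Arrays.asList(
                new UmlClass("Sale")
                        .attr("quantity", "xs:int")
                        .attr("amount", "xs:decimal")
                        .refersTo("Product")
                        .refersTo("Customer"),
                new UmlClass("Product").attr("label", "xs:string"),
                new UmlClass("Customer").attr("name", "xs:string"));

        for (FactCandidate f : findFactCandidates(model)) {
            System.out.println(f.fact + " measures=" + f.measures
                    + " dimensions=" + f.dimensions);
        }
    }
}
```

In this toy model only `Sale` qualifies as a fact candidate. A real implementation would also need the XSD-to-UML translation, the simplification rules and the ranking metric, none of which are specified in the abstract.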
Keywords :
Java; Unified Modeling Language; XML; data warehouses; decision making; object-oriented programming; relational databases; BI community; JAVA; MD element extraction algorithm; MD schema; UML class diagram; W3C XML Schema; XML data warehouse design; XML data warehouse schema; XML format; XSD; data source; decision-making; hierarchical structure; information special purpose system; multidimensional element extraction algorithm; object oriented design quality; relational database; star XML schema; Buildings; Complexity theory; Data models; Data warehouses; Unified modeling language; Warehousing; XML; XML data warehouse; multiple XML data sources; star-join schema;
Conference_Title :
Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Conference on
DOI :
10.1109/AICCSA.2014.7073251