Title :
Parsimonious Explanations of Change in Hierarchical Data
Author :
Barman, Dhiman ; Korn, Flip ; Srivastava, Divesh ; Gunopulos, Dimitrios ; Young, Neal ; Agarwal, Deepak
Author_Institution :
California Univ., Riverside, CA
Abstract :
Dimension attributes in data warehouses are typically hierarchical, and a variety of OLAP applications (such as point-of-sales analysis and decision support) call for summarizing the measure attributes in fact tables along the hierarchies of these attributes. For example, the total sales at different stores can be summarized hierarchically by geographic location (e.g., state/city/zip_code/store), by time (e.g., year/month/day/hour), or by product category (e.g., clothing/outerwear/jackets/brand). Existing OLAP tools help to summarize and navigate the data at different levels of aggregation (e.g., jackets sold in each state during December 2006) via drill-down and roll-up operators. OLAP tools are also used to characterize changes in these hierarchical summaries over time (e.g., the sales in December 2006 compared to sales in December 2005 over different locations) to detect anomalies and characterize trends. When the number of changes identified is large (e.g., the total sales at many locations differed significantly from their expectations), one seeks explanations. In this paper, we are interested in parsimonious explanations of changes in measure attributes aggregated along an associated dimension attribute hierarchy. We propose a natural model of explanation that makes effective use of the dimension hierarchy and describes changes at the leaf nodes of the hierarchy (e.g., individual stores in the location hierarchy) as a composition of "node weights" along each node's root-to-leaf path in the dimension hierarchy; each node weight constitutes an explanatory term. For example, sales in California stores were three times expected sales; sales in San Jose stores were higher by a factor of two (six times expected sales), whereas sales in Los Angeles stores were lower than the statewide increase by a factor of 1.5 (two times expected sales).
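The multiplicative composition of node weights described above can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the authors' algorithm: the tiny hierarchy, the hypothetical root node `USA` with weight 1, and the helper names `path_to_root`, `explained_ratio`, and `num_terms` are all illustrative. The weight values come from the California example, and a node's actual-to-expected ratio is taken to be the product of the weights on its root-to-leaf path.

```python
import math

# Hypothetical location hierarchy: child -> parent (None marks the root).
PARENT = {
    "USA": None,
    "California": "USA",
    "San Jose": "California",
    "Los Angeles": "California",
}

# Hypothetical multiplicative node weights taken from the abstract's example.
# A weight of 1.0 means the node contributes no explanatory term.
WEIGHT = {
    "USA": 1.0,
    "California": 3.0,        # statewide: 3x expected sales
    "San Jose": 2.0,          # a further factor of 2 above the state
    "Los Angeles": 1 / 1.5,   # a factor of 1.5 below the state
}


def path_to_root(node):
    """Return the nodes from `node` up to and including the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def explained_ratio(node):
    """Actual/expected ratio implied by multiplying weights along the root-to-node path."""
    return math.prod(WEIGHT[n] for n in path_to_root(node))


def num_terms():
    """Illustrative explanation size: the number of nodes with a non-unit weight."""
    return sum(1 for w in WEIGHT.values() if w != 1.0)


for leaf in ("San Jose", "Los Angeles"):
    print(leaf, round(explained_ratio(leaf), 6))
```

With these weights, San Jose's ratio is 3 × 2 = 6 and Los Angeles's is 3 ÷ 1.5 = 2, matching the example. Counting non-unit weights as explanatory terms (again an assumption made only for illustration) gives an explanation with three terms.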
Keywords :
data mining; data warehouses; OLAP application; OLAP tools; data warehouse; dimension attribute hierarchy; hierarchical data; parsimonious explanation; Aggregates; Cities and towns; Clothing; Data warehouses; Marketing and sales; Navigation; Time measurement;
Conference_Title :
Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
Conference_Location :
Istanbul
Print_ISBN :
1-4244-0802-4
Electronic_ISBN :
1-4244-0803-2
DOI :
10.1109/ICDE.2007.368991