Author :
Choi, Wonik ; Liu, Ling ; Yu, Boseon
Abstract :
Multi-criteria decision making is one of the most critical and yet most challenging components in modern enterprise business intelligence. It is well known that complex business decisions are often made based on multi-dimensional criteria. The competitiveness of optimal business decision making typically resorts to finding a good trade-off among many different and possibly contradicting criteria, e.g., maximum profit, minimum price, minimum resource consumption. A skyline query operator is by design to find the set of interesting data points (objects) over a large dimensional data collection, satisfying a set of possibly contradicting conditions. In this paper, we provide an in-depth coverage of skyline computation models, algorithms and optimization techniques for improving both efficiency and quality of multi-criteria decision making. By reviewing and revising the state of art research in multi-criteria decision making using skyline operations, we describe the essential concepts, the alternative models, and the suite of techniques for providing scalable and elastic skyline computation in massively distributed computing environments. The paper consists of four parts. First, we provide an overview of skyline query operators in terms of concepts, basic processing algorithms and representative application scenarios. Second, we review the state of art literature in skyline query processing research and development, outline the most representative classes of skyline query processing and optimization techniques and discuss the pros and cons of existing approaches. Third, we provide a comprehensive analysis on the inherent limitations of some existing skyline models and algorithms and discuss why scaling skyline query processing over large high dimensional datasets continues to pose daunting challenges. Finally, we present optimization techniques for designing parallel skyline query processing algorithms and how to utilize GPUs to support and scale parallel skyline compu- ations over high dimensional large datasets. We also introduce a novel dominance test technique, called GNL (GPU-based Nested Loop), which can drastically reduce the cost of dominance tests by leveraging GPUs, and outperform CPU-based dominance tests.
Keywords :
competitive intelligence; data analysis; decision making; graphics processing units; parallel algorithms; query processing; CPU-based dominance tests; GNL; GPU-based nested loop; complex business decisions; data analytics operations; data collection; data points; distributed computing environments; dominance test technique; enterprise business intelligence; maximum profit; minimum price; minimum resource consumption; multicriteria decision making; multidimensional criteria; optimal business decision making; optimization techniques; parallel skyline query processing algorithms; skyline computation models; skyline query operator; Algorithm design and analysis; Classification algorithms; Computational modeling; Decision making; Indexes; Query processing;