Title :
Efficient Visualization of Large-Scale Data Tables through Reordering and Entropy Minimization
Author :
Djuric, Nemanja ; Vucetic, Slobodan
Author_Institution :
Dept. of Comput. & Inf. Sci., Temple Univ., Philadelphia, PA, USA
Abstract :
Visualization of data tables with n examples and m columns using heat maps provides a holistic view of the original data. As there are n! ways to order rows and m! ways to order columns, and data tables are typically ordered without regard to visual inspection, heat maps of the original data tables often appear as noisy images. However, if the rows and columns of a data table are ordered such that similar rows and similar columns are grouped together, a heat map may provide deep insight into the underlying data distribution. We propose an information-theoretic approach to producing a well-ordered data table. In particular, we search for an ordering that minimizes the entropy of the residuals of predictive coding applied to the ordered data table. This formalization leads to a novel ordering procedure, EM-ordering, that can be applied separately to rows and columns. To order rows, EM-ordering repeats two steps until convergence: (1) rescaling columns and (2) solving a Traveling Salesman Problem (TSP) in which rows are treated as cities. To allow fast ordering of large data tables, we propose an efficient TSP heuristic with a modest O(n log(n)) time complexity. Compared with existing state-of-the-art reordering approaches, we show that the method often produces heat maps of higher visual quality while being significantly more scalable. Moreover, analysis of real-world traffic and financial data sets with the proposed method readily yielded deeper insights about the data, further confirming that EM-ordering can be a valuable tool for visual exploration of large-scale data sets.
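The two-step row-ordering loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes predictive-coding residuals are differences between consecutive rows and follow a per-column Laplacian distribution, so step (1) estimates each column's scale as its mean absolute residual and step (2) seeks a row path minimizing the sum of scale-normalized L1 distances. The TSP step uses a plain greedy nearest-neighbor heuristic (O(n^2 m) per pass), swapped in for the paper's O(n log(n)) heuristic, which is not reproduced here. All function names (column_scales, nearest_neighbor_tour, path_cost, em_order_rows) are hypothetical.

```python
def column_scales(rows, order, eps=1e-9):
    """Step (1), assumed form: per-column mean absolute residual between
    consecutive rows under the current ordering (a Laplacian scale estimate)."""
    m, n = len(rows[0]), len(order)
    scales = []
    for j in range(m):
        total = sum(abs(rows[order[i]][j] - rows[order[i - 1]][j]) for i in range(1, n))
        scales.append(total / max(n - 1, 1) + eps)  # eps avoids division by zero
    return scales


def scaled_l1(a, b, scales):
    """Distance between two rows after rescaling each column by its scale."""
    return sum(abs(x - y) / s for x, y, s in zip(a, b, scales))


def path_cost(rows, order, scales):
    """Total rescaled residual cost of visiting rows in the given order."""
    return sum(scaled_l1(rows[order[i - 1]], rows[order[i]], scales)
               for i in range(1, len(order)))


def nearest_neighbor_tour(rows, scales, start=0):
    """Step (2), approximated: greedy nearest-neighbor path over rows as cities.
    Ties are broken by the smaller row index."""
    unvisited = set(range(len(rows))) - {start}
    order = [start]
    while unvisited:
        last = rows[order[-1]]
        nxt = min(unvisited, key=lambda k: (scaled_l1(last, rows[k], scales), k))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def em_order_rows(rows, max_iter=20):
    """Alternate rescaling and path search until the ordering stops changing
    or max_iter passes have run (the greedy step does not guarantee a monotone cost)."""
    order = list(range(len(rows)))
    for _ in range(max_iter):
        scales = column_scales(rows, order)
        new_order = nearest_neighbor_tour(rows, scales)
        if new_order == order:
            break
        order = new_order
    return order
```

For example, em_order_rows([[0, 0], [10, 10], [1, 1], [11, 11], [2, 2]]) groups the three small rows ahead of the two large ones. Columns could be ordered in the same way by passing the transposed table, e.g. list(map(list, zip(*rows))), consistent with the abstract's note that the procedure is applied separately to rows and columns.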
Keywords :
computational complexity; data visualisation; entropy; financial data processing; traffic engineering computing; travelling salesman problems; EM-ordering; TSP heuristic; data distribution; entropy minimization; financial data set analysis; heat map; information-theoretic approach; large-scale data set visual exploration; large-scale data table visualization; ordered data table; predictive coding residual; real-world traffic analysis; reordering; rescaling columns; time complexity; traveling salesman problem; Cities and towns; Clustering algorithms; Data visualization; Entropy; Heating; Minimization; Principal component analysis; data reordering; data seriation; data visualization; heatmap; large-scale data; traveling salesman problem;
Conference_Title :
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
DOI :
10.1109/ICDM.2013.63