Author :
Hafen, Ryan ; Gosink, Luke ; McDermott, Jason ; Rodland, Karin ; Kleese van Dam, Kerstin ; Cleveland, W.S.
Abstract :
Trelliscope emanates from the Trellis Display framework for visualization and the Divide and Recombine (D&R) approach to analyzing large complex data. In Trellis, the data are broken up into subsets, a visualization method is applied to each subset, and the resulting display is an array of panels, one per subset. This is a powerful framework for visualization of data, both small and large. In D&R, the data are broken up into subsets, and any analytic method from statistics and machine learning is applied to each subset independently. Then the outputs are recombined. This provides not only a powerful framework for analysis, but also feasible and practical computations using distributed computational facilities. It enables deep analysis of the data: study of both data summaries and the detailed data at their finest granularity. This is critical to a full understanding of the data. It also enables the analyst to program in an interactive high-level language for data analysis such as R, which allows the analyst to focus more on the data and less on code. In this paper we introduce Trelliscope, a system that scales Trellis to large complex data. It provides a way to create displays with a very large number of panels and an interactive viewer that allows the analyst to sort, filter, and sample the panels in a meaningful way. We discuss the underlying principles, design, and scalable architecture of Trelliscope, and illustrate its use on three analysis projects in proteomics, high intensity physics, and power systems engineering.
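The divide, apply, and recombine steps described in the abstract, together with sorting and filtering of per-subset panels, can be sketched in a few lines. This is a minimal illustration in plain Python, not the Trelliscope or D&R implementation: the helper names (`divide`, `apply_each`, `recombine`), the toy records, the choice of the mean as the analytic method, and the filter threshold are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def divide(records, key):
    """Divide: split records into subsets that share a value of `key`."""
    subsets = defaultdict(list)
    for rec in records:
        subsets[rec[key]].append(rec)
    return dict(subsets)


def apply_each(subsets, method):
    """Apply an analytic method to every subset independently."""
    return {name: method(rows) for name, rows in subsets.items()}


def recombine(outputs):
    """Recombine: here, simply average the per-subset outputs."""
    return mean(outputs.values())


# Hypothetical toy data: one measurement per record, grouped by site.
records = [
    {"site": "A", "value": 1.0},
    {"site": "A", "value": 3.0},
    {"site": "B", "value": 10.0},
    {"site": "B", "value": 14.0},
    {"site": "C", "value": 5.0},
]

subsets = divide(records, "site")
subset_means = apply_each(subsets, lambda rows: mean(r["value"] for r in rows))
overall = recombine(subset_means)

# Panel selection: each subset would correspond to one panel. Rank the
# panels by their per-subset summary, then keep those above a threshold.
ranked = sorted(subset_means, key=subset_means.get, reverse=True)
shown = [name for name in ranked if subset_means[name] >= 5.0]
```

Because each subset is processed independently in `apply_each`, that step could in principle be distributed across machines, which is the property the abstract attributes to D&R.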
Keywords :
bioinformatics; data analysis; data visualisation; learning (artificial intelligence); physics computing; power engineering computing; proteomics; sorting; statistical analysis; Divide and Recombine approach; Trellis Display framework; Trelliscope; data summaries; data visualization; detailed visualization; high intensity physics; interactive high-level language; interactive viewer; large complex data deep analysis; machine learning; panel filtering; panel sampling; panel sorting; power systems engineering; statistics; visualization method