DocumentCode :
2512573
Title :
Parallel clustering for visualizing large scientific line data
Author :
Wei, Jishang ; Yu, Hongfeng ; Chen, Jacqueline H. ; Ma, Kwan-Liu
Author_Institution :
Univ. of California Davis, Davis, CA, USA
fYear :
2011
fDate :
23-24 Oct. 2011
Firstpage :
47
Lastpage :
55
Abstract :
Scientists often need to extract, visualize and analyze lines from vast amounts of data to understand dynamic structures and interactions. The effectiveness of such a visual validation and analysis process mainly relies on a good strategy to categorize and visualize the lines. However, the sheer size of line data produced by state-of-the-art scientific simulations poses great challenges to preparing the data for visualization. In this paper, we present a parallelization design of regression model-based clustering to categorize large line data derived from detailed scientific simulations by leveraging the power of heterogeneous computers. This parallel clustering method employs the Expectation Maximization algorithm to iteratively approximate the optimal data partitioning. First, we use a sorted-balance algorithm to partition and distribute the lines with various lengths among multiple compute nodes. During the following iterative clustering process, regression model parameters are recovered based on the local lines on each individual node, with only a few inter-node message exchanges involved. Meanwhile, the workload of regression model computing is well balanced across the nodes. The experimental results demonstrate that our approach can effectively categorize large line data in a scalable manner to concisely convey dynamic structures and interactions, leading to a visualization that captures salient features and suppresses visual clutter to facilitate scientific exploration of large line data.
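The abstract names a sorted-balance algorithm for distributing lines of varying length across compute nodes but gives no details. The following is a minimal sketch of the textbook sorted-balance heuristic (sort jobs by decreasing size, then give each to the least-loaded machine), not the authors' implementation. The function name `sorted_balance`, the use of line length as the workload measure, and the sample lengths are all assumptions made for illustration.

```python
import heapq


def sorted_balance(line_lengths, num_nodes):
    """Greedy sorted-balance partitioning (illustrative sketch only).

    line_lengths: workload estimate per line (assumed here to be its length).
    num_nodes:    number of compute nodes to distribute the lines across.
    Returns (assignment, loads): line indices per node and total load per node.
    """
    # Min-heap of (current_load, node_id) so the least-loaded node pops first.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_nodes)]

    # Visit lines from longest to shortest.
    order = sorted(range(len(line_lengths)),
                   key=lambda i: line_lengths[i], reverse=True)
    for i in order:
        load, node = heapq.heappop(heap)
        assignment[node].append(i)
        heapq.heappush(heap, (load + line_lengths[i], node))

    loads = [sum(line_lengths[i] for i in part) for part in assignment]
    return assignment, loads


if __name__ == "__main__":
    lengths = [7, 3, 5, 2, 8, 4]  # hypothetical line lengths
    parts, loads = sorted_balance(lengths, num_nodes=2)
    print(parts, loads)
```

Placing the longest lines first leaves the short ones to even out the remaining imbalance, which is why this heuristic suits line data with widely varying lengths.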
Keywords :
approximation theory; data visualisation; expectation-maximisation algorithm; feature extraction; message passing; parallel processing; pattern classification; pattern clustering; regression analysis; sorting; dynamic interactions; dynamic structures; expectation maximization algorithm; internode message exchanges; iterative approximation; iterative clustering process; large line data categorization; large scientific line data visualization; line extraction; optimal data partitioning; parallel clustering method; parallelization design; regression model parameter recovery; regression model-based clustering; salient feature capturing; scientific simulation; sorted-balance algorithm; visual analysis process; visual validation; Clustering algorithms; Computational modeling; Data models; Data visualization; Graphics processing unit; Mathematical model; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Large Data Analysis and Visualization (LDAV), 2011 IEEE Symposium on
Conference_Location :
Providence, RI
Print_ISBN :
978-1-4673-0156-5
Type :
conf
DOI :
10.1109/LDAV.2011.6092316
Filename :
6092316