Author :
Deveci, Mehmet ; Kaya, Kamer ; Catalyurek, Umit V.
Author_Institution :
Dept. of Biomed. Inf., Ohio State Univ., Columbus, OH, USA
Abstract :
The data one must process to solve today's problems is large scale, and so are the graphs and hypergraphs used to model it. Today, we have Big Data, big graphs, and big matrices, and in the future they are expected to be bigger and more complex. Many of today's algorithms will be, and some already are, too expensive to run on large datasets. In this work, we analyze a set of efficient techniques to make "big data", modeled as a hypergraph, smaller so that it can be processed in much less time. As an application use case, we take the hypergraph partitioning problem, which has been used successfully in many practical applications for various purposes, including parallelization of complex and irregular applications, sparse matrix ordering, clustering, community detection, query optimization, and improving cache locality in shared-memory systems. We conduct several experiments showing that our techniques greatly reduce the cost of the partitioning process while preserving partitioning quality. Although we measure their performance only from the partitioning point of view, we believe the proposed techniques will also benefit other applications that use hypergraphs.
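The keywords name identical nets and Jaccard similarity among the sparsification ideas. As a rough illustration only (the function names and the pin-set representation below are hypothetical, not the authors' implementation), nets sharing the exact same pin set can be collapsed into a single weighted net, and the Jaccard index can score how close two nets are to identical:

    # Minimal sketch of two sparsification ideas from the keywords:
    # identical-net elimination via hashing, and Jaccard similarity
    # between pin sets. All names here are illustrative assumptions.
    from collections import defaultdict

    def jaccard(a: set, b: set) -> float:
        """Jaccard similarity |a & b| / |a | b| between two pin sets."""
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)

    def merge_identical_nets(nets):
        """Collapse nets with identical pin sets into one weighted net,
        shrinking the hypergraph without changing any partition's cut."""
        buckets = defaultdict(int)
        for net in nets:
            buckets[frozenset(net)] += 1
        return [(set(pins), count) for pins, count in buckets.items()]

    # Example: three nets, two of which are identical.
    nets = [{0, 1, 2}, {0, 1, 2}, {1, 3}]
    print(merge_identical_nets(nets))  # [({0, 1, 2}, 2), ({1, 3}, 1)]
    print(jaccard({0, 1, 2}, {1, 3})) # 0.25

Giving the merged net a multiplicity (weight) preserves cut costs exactly, which is consistent with the abstract's claim that partitioning quality is preserved; near-identical nets (high Jaccard similarity) would need a lossier merging rule.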
Keywords :
Big Data; data analysis; graph theory; matrix algebra; Big Data analysis; big graphs; big matrices; cache locality; community detection; hypergraph partitioning problem; hypergraph sparsification; query optimization; shared-memory systems; sparse matrix ordering; Arrays; Complexity theory; Computational modeling; Data models; Measurement; Pins; Hypergraph sparsification; Jaccard similarity; hypergraph partitioning; identical nets; identical vertices; multi-level approach