DocumentCode :
1451791
Title :
Managing statistical behavior of large data sets in shared-nothing architectures
Author :
Rigoutsos, Isidore ; Delis, Alex
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
9
Issue :
11
fYear :
1998
fDate :
11/1/1998 12:00:00 AM
Firstpage :
1073
Lastpage :
1087
Abstract :
Increasingly larger data sets are being stored in networked architectures. Many of the available data structures are not easily amenable to parallel realizations. Hashing schemes show promise in that respect for the simple reason that the underlying data structure can be decomposed and spread among the set of cooperating nodes with minimal communication and maintenance requirements. In all cases, storage utilization and load balancing are issues that need to be addressed. One can identify two basic approaches to tackle the problem. One way is to address it as part of the design of the data structure that is used to store and retrieve the data. The other is to maintain the data structure intact but address the problem separately. The method that we present here falls in the latter category and is applicable whenever a hash table is the preferred data structure. Intrinsically attached to the used hash table is a hashing function that allows one to partition a possibly unbounded set of data items into a finite set of groups; the hashing function provides the partitioning by assigning each data item to one of the groups. In general, the hashing function cannot guarantee that the various groups will have the same cardinality on average, for all possible data item distributions. In this paper, we propose a two-stage methodology that uses the knowledge of the hashing function to reorganize the group assignments so that the resulting groups have similar expected cardinalities. The method is generally applicable and independent of the used hashing function. We show the power of the methodology using both synthetic and real-world databases. The derived quasi-uniform storage occupancy and associated load-balancing gains are significant
Keywords :
computer architecture; cryptography; data structures; file organisation; resource allocation; data structures; hashing schemes; large data sets; load balancing; load balancing gains; networked architectures; quasi-uniform storage occupancy; real-world databases; shared-nothing architectures; statistical behavior; storage utilization; Data structures; Databases; Educational institutions; Information retrieval; Intelligent networks; Load management; Quantization; Workstations;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
ieee
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/71.735955
Filename :
735955
Link To Document :
بازگشت