Regional Information Center for Science and Technology (RICEST) - Managing statistical behavior of large data sets in shared-nothing architectures

DocumentCode :

1451791

Title :

Managing statistical behavior of large data sets in shared-nothing architectures

Author :

Rigoutsos, Isidore ; Delis, Alex

Author_Institution :

IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA

Volume :

Issue :

fYear :

1998

fDate :

11/1/1998

Firstpage :

1073

Lastpage :

1087

Abstract :

Increasingly large data sets are being stored in networked architectures. Many of the available data structures are not easily amenable to parallel realizations. Hashing schemes show promise in that respect because the underlying data structure can be decomposed and spread among the set of cooperating nodes with minimal communication and maintenance requirements. In all cases, storage utilization and load balancing are issues that need to be addressed. There are two basic approaches to the problem. One is to address it as part of the design of the data structure that is used to store and retrieve the data. The other is to keep the data structure intact and address the problem separately. The method that we present here falls in the latter category and is applicable whenever a hash table is the preferred data structure. Intrinsically attached to the hash table is a hashing function that partitions a possibly unbounded set of data items into a finite set of groups by assigning each data item to one of the groups. In general, the hashing function cannot guarantee that the various groups will have the same cardinality on average for all possible data item distributions. In this paper, we propose a two-stage methodology that uses knowledge of the hashing function to reorganize the group assignments so that the resulting groups have similar expected cardinalities. The method is generally applicable and independent of the hashing function used. We show the power of the methodology using both synthetic and real-world databases. The derived quasi-uniform storage occupancy and associated load-balancing gains are significant.
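The abstract does not spell out the two stages, so the following Python sketch is only an illustration of the general idea of reorganizing group assignments, not the authors' method. It assumes a hypothetical scheme in which (1) the occupancy of each hash group is estimated from a sample of keys, and (2) the groups are reassigned to nodes greedily, heaviest group first, onto the currently least-loaded node. The function names, the use of CRC32 as the hashing function, and the greedy heuristic are all assumptions made for the example.

```python
from collections import Counter
import heapq
import zlib


def bucket_of(key: str, n_buckets: int) -> int:
    """Hypothetical hashing function: CRC32 of the key, reduced modulo n_buckets."""
    return zlib.crc32(key.encode("utf-8")) % n_buckets


def estimate_bucket_counts(sample, n_buckets):
    """Stage 1 (assumed): count how many sampled keys fall into each hash group."""
    counts = Counter(bucket_of(k, n_buckets) for k in sample)
    return [counts.get(b, 0) for b in range(n_buckets)]


def rebalance(bucket_counts, n_nodes):
    """Stage 2 (assumed): greedy reassignment of hash groups to nodes.

    Groups are visited from heaviest to lightest, and each one goes to the
    node with the smallest estimated load so far.
    """
    heap = [(0, node) for node in range(n_nodes)]  # (estimated load, node id)
    heapq.heapify(heap)
    assignment = {}
    order = sorted(range(len(bucket_counts)), key=lambda b: -bucket_counts[b])
    for b in order:
        load, node = heapq.heappop(heap)
        assignment[b] = node
        heapq.heappush(heap, (load + bucket_counts[b], node))
    return assignment


def node_loads(bucket_counts, assignment, n_nodes):
    """Sum the estimated group occupancies assigned to each node."""
    loads = [0] * n_nodes
    for b, node in assignment.items():
        loads[node] += bucket_counts[b]
    return loads
```

In this sketch the hashing function itself is left untouched; only the mapping from hash groups to nodes changes, which mirrors the abstract's choice to keep the data structure intact and address load balancing separately. Using more hash groups than nodes gives the reassignment step finer-grained pieces to balance.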

Keywords :

computer architecture; cryptography; data structures; file organisation; resource allocation; hashing schemes; large data sets; load balancing; load balancing gains; networked architectures; quasi-uniform storage occupancy; real-world databases; shared-nothing architectures; statistical behavior; storage utilization; Databases; Educational institutions; Information retrieval; Intelligent networks; Load management; Quantization; Workstations

fLanguage :

English

Journal_Title :

Parallel and Distributed Systems, IEEE Transactions on

Publisher :

IEEE

ISSN :

1045-9219

Type :

jour

DOI :

10.1109/71.735955

Filename :

735955

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1451791