Abstract:
Some datasets and computing environments are inherently distributed. For example, image data may be gathered and stored at different locations. Although data parallelism is a well-known computational model, few programming systems are both easy to use (for simple applications) and able to work across administrative domains. We have designed and implemented a simple programming system, called Trellis-SDP, that facilitates the rapid development of data-intensive applications. Trellis-SDP is layered on top of the Trellis infrastructure, a software system for creating overlay metacomputers: user-level aggregations of computer systems. Trellis-SDP provides a master-worker programming framework in which the worker components can run self-contained binary applications, whether new or existing. We describe two interface functions, trellis_scan() and trellis_gather(), and show how easy it is to get reasonable performance with simple data-parallel applications, such as Content-Based Image Retrieval (CBIR) and Parallel Sorting by Regular Sampling (PSRS).