Title : 
Large-scale data collection: a coordinated approach
         
        
            Author : 
Cheng, William C. ; Chou, Cheng-Fu ; Golubchik, Leana ; Khuller, Samir ; Wan, Yung-Chun
         
        
            Author_Institution : 
Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA, USA
         
        
        
        
        
            Abstract : 
In this paper, we consider the problem of collecting a large amount of data from several different hosts to a single destination in a wide-area network. Often, due to congestion conditions, the paths chosen by the network may have poor throughput. By choosing an alternate route at the application level, we may be able to obtain a substantially faster completion time. This data collection problem is nontrivial because the issue is not only to avoid congested links, but also to devise a coordinated transfer schedule that makes the maximum possible use of available network resources. We present an approach for computing coordinated data collection schedules, which can result in significant performance improvements. We make no assumptions about knowledge of the network topology or the capacity available on individual links; that is, we use only end-to-end information. Finally, we study the shortcomings of this approach in terms of the gap between the theoretical formulation and the resulting data transfers in wide-area networks. In general, our approach can be used to solve arbitrary data movement problems over the Internet. We use the Bistro platform to illustrate one application of our techniques.
         
        
            Keywords : 
Internet; graph theory; network topology; telecommunication network routing; Bistro platform; application level; arbitrary data movement problems; completion time; congestion conditions; coordinated data collection schedules; coordinated transfer schedule; data transfers; end-to-end information; hosts; large-scale data collection; network links capacity; network paths; network resources; simulations; system design; theoretical formulation; wide-area network; Computer science; Data engineering; Educational institutions; Intersymbol interference; Job shop scheduling; Large-scale systems; Processor scheduling; Throughput
         
        
        
        
            Conference_Title : 
INFOCOM 2003. Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies
         
        
        
            Print_ISBN : 
0-7803-7752-4
         
        
        
            DOI : 
10.1109/INFCOM.2003.1208674