Author_Institution :
Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Shatin, China
Abstract :
Parallel video servers have been proposed for building large-scale video-on-demand (VoD) systems from multiple low-cost servers. However, as more servers are added to scale up capacity, system-level reliability decreases, because the failure of any one server will cripple the entire system. To tackle this reliability problem, this paper proposes and analyzes architectures to support server-level fault tolerance in parallel video servers. Based on the concurrent push architecture proposed earlier, this paper tackles three problems pertaining to fault tolerance, namely redundancy management, redundant data transmission protocol, and real-time fault masking. First, redundant data based on erasure codes are added to the video data stored in the servers and are then delivered to the clients to support fault tolerance. Despite the success of distributed redundancy striping schemes such as RAID-5 in disk array implementations, we discover that similar schemes extended to the server context do not scale well. Instead, we propose a redundant server scheme that is scalable and has a lower total server buffer requirement. Second, two protocols are proposed to manage the transmission of redundant data to the clients, namely forward erasure correction, which always transmits redundant data, and on-demand correction, which transmits redundant data only after a server failure is detected. Third, to enable ongoing video sessions to maintain nonstop video playback during failure, we propose using fault masking at the client to recompute lost video data in real time. In particular, we derive the amount of client buffer required so that nonstop, continuous video playback can be maintained despite server failures.
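The client-side fault masking described in the abstract can be illustrated with a minimal sketch. The abstract only says that the redundant data are based on erasure codes; the sketch assumes the simplest such code, a single XOR parity unit held by one redundant server, which tolerates one server failure per parity group. The function names (`xor_blocks`, `make_parity_group`, `mask_failure`) and the unit sizes are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity_group(data_units):
    """Build one parity group: the data units plus one XOR parity unit.

    Assumption: the last unit is stored on a dedicated redundant server.
    """
    return list(data_units) + [xor_blocks(data_units)]


def mask_failure(received):
    """Recompute at most one missing unit (marked None) from the survivors."""
    missing = [i for i, unit in enumerate(received) if unit is None]
    if not missing:
        return list(received)
    if len(missing) > 1:
        raise ValueError("single XOR parity can mask only one failed server")
    survivors = [unit for unit in received if unit is not None]
    repaired = list(received)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


# Three data servers plus one redundant server (hypothetical sizes).
group = make_parity_group([b"abcd", b"efgh", b"ijkl"])
received = list(group)
received[1] = None  # the second server has failed
recovered = mask_failure(received)
```

Under forward erasure correction the parity unit would always be among the received units, so the client could repair a loss immediately. Under on-demand correction the parity unit would be requested only after a failure is detected, which is why the client buffer must cover the detection and retransmission delay.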
Keywords :
fault tolerant computing; forward error correction; multiprocessing systems; parallel architectures; redundancy; transport protocols; video on demand; video servers; FEC protocol; ODC protocol; RAID-5; VoD systems; concurrent push architecture; concurrent-push-based parallel video servers; disk array; distributed redundancy striping schemes; erasure codes; fault masking; forward erasure correction; large-scale video-on-demand systems; low-cost servers; nonstop video playback; on-demand correction; real-time fault masking; redundancy management; redundant data transmission management; redundant data transmission protocol; redundant server; server buffer requirement; server-level fault tolerance; system-level reliability; video sessions; Buildings; Costs; Councils; Data communication; Fault tolerance; Large-scale systems; Network servers; Protocols; Redundancy; Video on demand;