Title :
Architecture and algorithm for high performance fault-tolerant replication of sensitive military and homeland security C3I database messages
Author :
Guturu, Parthasarathy
Author_Institution :
Electr. Eng. Dept., North Texas Univ., Denton, TX
Abstract :
Replicated databases are used by military and homeland security command, control, communications and intelligence (C3I) systems for fault-tolerance and fast retrieval of crucial decision-aiding information irrespective of the position of the military units. In this paper, we propose a replicated database nodal architecture in which nodes are organized into multiple clusters connected by long-haul links. This kind of multi-site architecture permits access to crucial information even when a site is completely destroyed. In this architecture, nodes within a cluster communicate through either LANs or multi-hop wireless routing. Database updates can originate at any node in any cluster, but must be replicated to all the nodes in the system. In the proposed replication algorithm, optimal use of the bandwidth of long-haul links is achieved by an arrangement in which each node in a cluster replicates its updates to a single designated node in each of the other clusters, and those designated nodes take responsibility for replicating the messages to the other nodes in their respective clusters. Fault-tolerance is addressed by assigning surrogates for each node in a cluster so that the surrogates take over replication of their primary nodes as soon as they sense inactivity in them. For high performance, this algorithm avoids a reliable transport mechanism such as TCP and the synchronous replication messaging required by 2-phase or 3-phase commit algorithms, but still achieves sequence-preserving lossless message communication through asynchronous message flows and application-level control of the messages received from an unreliable UDP channel. Composite queues with in-memory and persistent segments are used for storage of the replication messages so that they can be supplied to the receiving nodes, upon their recovery, with low latency after small downtimes and without any message loss after reasonably long downtimes. The algorithm also has features such as throttling to avoid message loss on low-capacity long-haul links and batch acknowledgements to reduce control traffic.
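The abstract does not give the protocol details, so the following is only an illustrative sketch of how application-level control can restore sequence-preserving, lossless delivery over an unreliable channel, with batch acknowledgements to limit control traffic. The class name `SequencedReceiver`, its fields, and the `ack_batch` parameter are hypothetical and are not taken from the paper; the network itself is simulated by direct method calls rather than real UDP sockets.

```python
from dataclasses import dataclass, field


@dataclass
class SequencedReceiver:
    """In-order delivery over a lossy, reordering channel (hypothetical sketch)."""
    ack_batch: int = 4        # assumed: send one ack per 4 delivered messages
    next_seq: int = 1         # next sequence number expected from the sender
    pending: dict = field(default_factory=dict)    # out-of-order messages held back
    delivered: list = field(default_factory=list)  # payloads released in order
    acks: list = field(default_factory=list)       # cumulative acks that would be sent
    _since_ack: int = 0       # messages delivered since the last ack

    def on_datagram(self, seq, payload):
        """Handle one datagram; return the sequence numbers still missing."""
        if seq < self.next_seq or seq in self.pending:
            return self.missing()          # duplicate: ignore it
        self.pending[seq] = payload
        # Release every message that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
            self._since_ack += 1
            if self._since_ack == self.ack_batch:
                # One cumulative ack covers the whole batch.
                self.acks.append(self.next_seq - 1)
                self._since_ack = 0
        return self.missing()

    def missing(self):
        """Gaps below the highest buffered sequence number (candidates for resend)."""
        if not self.pending:
            return []
        top = max(self.pending)
        return [s for s in range(self.next_seq, top) if s not in self.pending]
```

In this sketch a sender would keep each message queued until a cumulative acknowledgement covers it, and would resend any sequence numbers reported as missing; that retention step is where a composite in-memory and persistent queue of the kind described above would sit.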
Keywords :
command and control systems; fault tolerance; information retrieval; local area networks; military communication; replicated databases; telecommunication network routing; telecommunication security; transport protocols; C3I system; LAN; TCP; application level control; command-communication control; decision-aiding information retrieval; fault-tolerance algorithm; intelligence system; local area network; lossless message communication; military-homeland security; multihop wireless routing; replicated database nodal architecture; transmission control protocol; Algorithm design and analysis; Clustering algorithms; Communication system control; Control systems; Deductive databases; Fault tolerance; Fault tolerant systems; Intelligent control; Military communication; Terrorism;
Conference_Title :
Military Communications Conference, 2005. MILCOM 2005. IEEE
Conference_Location :
Atlantic City, NJ
Print_ISBN :
0-7803-9393-7
DOI :
10.1109/MILCOM.2005.1605994