Author_Institution : 
LADyR, Univ. Rey Juan Carlos, Móstoles, Spain
         
        
            Abstract : 
Considering an asynchronous system made up of n processes and where up to t of them can crash, finding the weakest assumption that such a system has to satisfy for a common leader to be eventually elected is one of the holy grail quests of fault-tolerant asynchronous computing. This paper is a step in that direction. It has two contributions. Considering a simple and general asynchronous system model where processes generate asynchronous pulses during which they send and receive messages, it first introduces an additional assumption that allows an eventual leader to be elected in all the runs that satisfy that assumption. That assumption is captured by the notion of asynchronous intermittent rotating t-star. An x-star is made up of one process p (the center of the star) plus a sequence of sets of x processes (the successive points of the star), which satisfies some properties. Intuitively, the intermittent rotating t-star assumption means that there exist a process p, a subset of pulse numbers pn, and associated sets of processes Q(pn) such that each process of Q(pn) receives from p a message sent in pulse pn either in a timely manner or among the first (n-t) messages tagged pn it ever receives. The t-star is called rotating because the set Q(pn) is allowed to change with pn; it is intermittent because it can disappear during finite periods; it is asynchronous because the points of a star are not required to be simultaneously at the same pulse. (This assumption combines and generalizes several synchrony and time-free assumptions that have been previously proposed to elect an eventual leader, e.g., eventual t-source, eventual t-moving source, and message pattern assumption.) The second contribution of the paper is an algorithm that eventually elects a common leader in the systems that satisfy the asynchronous intermittent rotating t-star assumption. This algorithm enjoys, among others, two noteworthy properties. First, from a design point of view, it is simple.
Second, from a cost point of view, only the pulse numbers increase without bound. This means that, even in infinite executions, whether links are timely or not (and whether the corresponding sender has crashed or not), all the other local variables (including the timers) and message fields have a finite domain.
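
The intuitive star condition stated in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm or its formal definition: the `Reception` record, the fixed bound `delta` used to model "timely", and the function names are all assumptions made for illustration. For one pulse number, the sketch computes the set of processes that receive the center's message either within `delta` or among the first (n-t) messages tagged with that pulse, and checks that this set has at least t members.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reception:
    """One received message (hypothetical record, not from the paper)."""
    sender: int    # identity of the sending process
    pulse: int     # pulse number the message is tagged with
    delay: float   # transfer delay, in arbitrary time units


def is_point(receptions, center, pulse, n, t, delta):
    """True if the receiver qualifies as a star point for this pulse.

    `receptions` lists the receiver's messages in reception order.
    Assumption: "timely" is modelled as delay <= delta.
    """
    tagged = [r for r in receptions if r.pulse == pulse]
    for rank, r in enumerate(tagged, start=1):
        if r.sender == center:
            # Timely, or among the first (n - t) messages tagged `pulse`.
            return r.delay <= delta or rank <= n - t
    return False  # the center's message for this pulse never arrived


def star_points(trace, center, pulse, n, t, delta):
    """Candidate set Q(pulse): every non-center process that qualifies."""
    return {q for q, recs in trace.items()
            if q != center and is_point(recs, center, pulse, n, t, delta)}


def has_t_star(trace, center, pulse, n, t, delta):
    """True if at least t processes qualify as points for this pulse."""
    return len(star_points(trace, center, pulse, n, t, delta)) >= t


# Toy trace for n = 4 processes, center = process 0, pulse 5.
TRACE = {
    # Slow link, but the center's message is 2nd of the first n - t = 3.
    1: [Reception(2, 5, 9.0), Reception(0, 5, 9.0), Reception(3, 5, 9.0)],
    # The center's message arrives 4th, but within delta: timely.
    2: [Reception(1, 5, 2.0), Reception(3, 5, 2.0),
        Reception(2, 5, 0.1), Reception(0, 5, 0.5)],
    # The center's message arrives 4th and late: not a point.
    3: [Reception(1, 5, 2.0), Reception(2, 5, 2.0),
        Reception(3, 5, 0.1), Reception(0, 5, 7.0)],
}

print(star_points(TRACE, center=0, pulse=5, n=4, t=1, delta=1.0))
print(has_t_star(TRACE, center=0, pulse=5, n=4, t=1, delta=1.0))
```

Because the check is made independently for each pulse number, the qualifying set may differ from one pulse to the next, which corresponds to the "rotating" aspect. The "intermittent" and "asynchronous" aspects (pulses at which no star exists, and points that are at different pulses at the same time) are not modelled here.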
         
        
            Keywords : 
distributed algorithms; fault tolerant computing; middleware; system recovery; asynchronous intermittent rotating t-star; asynchronous pulses; asynchronous system; distributed algorithm; fault tolerant asynchronous computing; message tagging; assumption coverage; eventual leader; eventual t-source; failure detector; fault tolerance; message pattern; moving source; omega; partial synchrony; process crash; system model; timely link