Author Institution:
Computer Science Division, EECS, University of California, Berkeley, California 94720
Abstract:
This paper compares the three simplest SRT division methods by using each of them to design a divider that produces four quotient bits per cycle (radix 16). The three methods are distinguished by the number of quotient bits found per stage of quotient selection logic and by the digit set used: (a) one bit per stage (radix 2) with quotient digits chosen from the set {−1, 0, 1}; (b) two bits per stage (radix 4) with quotient digits {−2, −1, 0, 1, 2}; or (c) two bits per stage (radix 4) with quotient digits {−3, −2, −1, 0, 1, 2, 3}. For each method, we compare several ways to overlap multiple stages of quotient selection logic, and we consider both irredundant and redundant (carry-save) representations for the remainder. The cost and performance of each alternative are evaluated in terms of a specific ECL gate array technology. We find that a divider built from radix 4 stages is 15% faster than one built from radix 2 stages, for about the same amount of hardware. Between the two radix 4 alternatives, method (c) offers 5% more speed than method (b) at the cost of 20% more hardware. A radix 16 divider using method (b) has been built for the S-1 Mark IIB computer under development at Lawrence Livermore Laboratory. This divider consists of eight ECL gate arrays and has a cycle time of 12.5 nanoseconds. It performs IEEE single and double precision floating-point division in 150 and 225 nanoseconds, respectively, the shortest times reported for any general-purpose computer.
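As an illustration of method (a) only, the following is a minimal sketch of the textbook radix 2 SRT recurrence with quotient digits {−1, 0, 1}. It is not the paper's hardware design: it uses exact rational arithmetic instead of a carry-save remainder, it does not overlap stages to reach radix 16, and the selection thresholds of ±1/2 are the standard textbook choice, assumed here. The function name `srt_radix2_divide` and its parameters are hypothetical.

```python
from fractions import Fraction


def srt_radix2_divide(x, d, n):
    """Sketch of radix 2 SRT division with digit set {-1, 0, 1}.

    Assumes a normalized divisor 1/2 <= d < 1 and |x| <= d.
    Returns (digits, quotient, w) where x == d * quotient + w / 2**n.
    """
    x, d = Fraction(x), Fraction(d)
    half = Fraction(1, 2)
    if not (half <= d < 1):
        raise ValueError("divisor must be normalized to [1/2, 1)")
    if abs(x) > d:
        raise ValueError("dividend must satisfy |x| <= d")

    w = x          # partial remainder, kept within [-d, d]
    digits = []
    for _ in range(n):
        shifted = 2 * w
        # Quotient selection compares the shifted remainder with +/-1/2,
        # so it does not need the exact value of the divisor.
        if shifted >= half:
            q = 1
        elif shifted < -half:
            q = -1
        else:
            q = 0
        w = shifted - q * d
        digits.append(q)

    # Convert the signed-digit quotient to an ordinary rational number.
    quotient = sum(Fraction(q, 2 ** (j + 1)) for j, q in enumerate(digits))
    return digits, quotient, w


if __name__ == "__main__":
    digits, quotient, w = srt_radix2_divide(Fraction(1, 3), Fraction(2, 3), 8)
    print(digits, quotient, w)
```

Because a digit of 0 is allowed whenever the shifted remainder lies between −1/2 and 1/2, the selection ranges overlap, which is what lets real implementations choose each digit from only a few leading bits of a redundant remainder.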