In this paper, a technique is proposed to decompose a two-dimensional (2-D) cyclic convolution of two

arrays, where

with

, into many identical and independent 2-D cyclic convolutions of smaller size. Using this technique and the fact that fast polynomial transform (FFT) exists when

for

, a pipelined tree machine architecture composed of modular FPT, FFT, and Chinese Remainder Theorem (CRT) computational units is then developed to efficiently compute a 2-D cyclic convolution. Finally, the extension of this tree machine architecture to efficiently compute a multidimensional cyclic convolution is discussed in this paper.