• DocumentCode
    2964076
  • Title

    VLIW DSP vs. superscalar implementation of a baseline H.263 video encoder

  • Author

    Banerjee, Serene ; Sheikh, Hamid R. ; John, Lizy K. ; Evans, Brian L. ; Bovik, Alan C.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Texas Univ., Austin, TX, USA
  • Volume
    2
  • fYear
    2000
  • fDate
    Oct. 29 2000-Nov. 1 2000
  • Firstpage
    1665
  • Abstract
    A Very Long Instruction Word (VLIW) processor and a superscalar processor can execute multiple instructions simultaneously. A VLIW processor depends on the compiler and programmer to find the parallelism in the instructions, whereas a superscalar processor determines the parallelism at runtime. This paper compares TI TMS320C6700 VLIW digital signal processor (DSP) and SimpleScalar superscalar implementations of a baseline H.263 video encoder in C. With level two C compiler optimization, a one-way issue superscalar processor is 7.5 times faster than the VLIW DSP for the same processor clock speed. The superscalar speedup from one-way to four-way issue is 2.88:1, and from four-way to 256-way issue is 2.43:1. To reduce the execution time on the C6700, we write assembly routines for sum-of-absolute-difference, interpolation, and reconstruction, and place frequently used code and data into on-chip memory. We use TI's discrete cosine transform assembly routines. The hand-optimized VLIW DSP implementation is 61× faster than the C version compiled with level two optimization. Most of the improvement was due to the efficient placement of data and programs in memory. The hand-optimized VLIW implementation is 14% faster than a 256-way superscalar implementation without hand optimizations.
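    The sum-of-absolute-difference (SAD) kernel named in the abstract is the inner loop of block-matching motion estimation. The following is a minimal plain-C sketch of such a kernel, not the authors' code or their C6700 assembly routine. The function name `sad_16x16`, the 16x16 block size, and the `stride` parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the paper): sum of absolute differences
 * between a 16x16 block of the current frame and a candidate block of the
 * reference frame. `stride` is the number of bytes between the starts of
 * consecutive rows in both buffers. */
static uint32_t sad_16x16(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    assert(cur != NULL && ref != NULL);

    uint32_t sum = 0;
    for (size_t y = 0; y < 16; ++y) {
        for (size_t x = 0; x < 16; ++x) {
            /* Widen to int before subtracting so the difference can be negative. */
            int diff = (int)cur[y * stride + x] - (int)ref[y * stride + x];
            sum += (uint32_t)abs(diff);
        }
    }
    return sum;
}
```

    A motion search would call a routine like this once per candidate displacement and keep the displacement with the smallest result, which is one reason a kernel of this kind is a plausible target for hand-written assembly.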
  • Keywords
    digital signal processing chips; interpolation; multiprocessing systems; program compilers; video coding; C compiler optimization; SimpleScalar superscalar; TI TMS320C6700 VLIW digital signal processor; VLIW DSP; assembly routines; baseline H.263 video encoder; compiler; discrete cosine transform assembly routines; interpolation; sum-of-absolute-difference; superscalar implementation; superscalar processor; Assembly; Clocks; Digital signal processing; Digital signal processors; Interpolation; Optimizing compilers; Program processors; Programming profession; Runtime; VLIW;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signals, Systems and Computers, 2000. Conference Record of the Thirty-Fourth Asilomar Conference on
  • Conference_Location
    Pacific Grove, CA, USA
  • ISSN
    1058-6393
  • Print_ISBN
    0-7803-6514-3
  • Type

    conf

  • DOI
    10.1109/ACSSC.2000.911272
  • Filename
    911272