VLIW FPGA Project

PIs: Raymond R. Hoare, Alex K. Jones

Many applications, particularly in the multimedia, networking, and security domains, follow the rule of thumb that 90% or more of the execution time is spent in 10% or less of the code. That code typically consists of highly parallel loop structures called computational kernels. Many researchers have investigated ways to improve application performance by reducing the execution time of these kernels. Unfortunately, even if the kernels are reduced to negligible computation time, the overall speed-up is limited to approximately 10X, because the remaining code still accounts for about 10% of the original execution time. If that remaining 90% of the code can also be sped up by 2X to 5X, the overall application can be improved by 20X to 50X.
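The 10X limit and the 20X to 50X figures follow from Amdahl's law. The sketch below models the program as two parts, the kernels (90% of the original execution time) and the remaining code (10%), each with its own speed-up factor. The function name and parameters are illustrative and not taken from the project; a kernel speed-up of infinity stands for "negligible computation time".

```python
import math


def amdahl_speedup(kernel_fraction: float, kernel_speedup: float, rest_speedup: float) -> float:
    """Overall speed-up when the kernels and the remaining code are accelerated separately.

    kernel_fraction: share of the original execution time spent in the kernels (0..1).
    kernel_speedup:  factor by which the kernels are accelerated (math.inf = negligible time).
    rest_speedup:    factor by which the remaining, non-kernel code is accelerated.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    rest_fraction = 1.0 - kernel_fraction
    # New execution time, normalized so that the original time is 1.0.
    # A finite value divided by math.inf is 0.0, which models negligible kernel time.
    new_time = kernel_fraction / kernel_speedup + rest_fraction / rest_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # Kernels take 90% of the time and are made negligible; vary the rest of the code.
    for rest in (1.0, 2.0, 5.0):
        print(f"rest sped up {rest:.0f}X -> overall {amdahl_speedup(0.9, math.inf, rest):.1f}X")
```

Running the script prints overall speed-ups of 10.0X, 20.0X, and 50.0X, which matches the ceiling and the target range stated above.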

The VLIW FPGA project combines three techniques in a single device built on FPGAs or Structured ASICs: custom hardware synthesis on reconfigurable fabrics, VLIW (Very Long Instruction Word) processing, and customizable instructions. By using a smart compiler to map each portion of the algorithm to the processing technique best suited to it, a speed-up of 20X to 50X can be achieved.
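To make the mapping idea concrete, here is a minimal sketch of a per-region assignment step. It assumes the compiler already has a profile of each code region (its share of execution time) and an estimated speed-up for each target. The target names, the region names, and every number below are hypothetical, and the greedy "pick the fastest target" rule is a simple stand-in chosen for illustration, not the project's actual compiler algorithm.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    time_fraction: float         # share of the original execution time (hypothetical profile data)
    speedups: dict[str, float]   # estimated speed-up on each target (hypothetical estimates)


def map_regions(regions: list[Region]) -> dict[str, str]:
    """Greedy mapping: assign each region to the target with the largest estimated speed-up."""
    return {r.name: max(r.speedups, key=r.speedups.get) for r in regions}


def overall_speedup(regions: list[Region], mapping: dict[str, str]) -> float:
    """Amdahl-style overall speed-up for a given region-to-target mapping."""
    new_time = sum(r.time_fraction / r.speedups[mapping[r.name]] for r in regions)
    return 1.0 / new_time


if __name__ == "__main__":
    # Hypothetical program: one parallel kernel and the remaining control-heavy code.
    regions = [
        Region("fir_kernel", 0.9,
               {"vliw": 4.0, "custom_instruction": 8.0, "hardware_function": 100.0}),
        Region("control_code", 0.1,
               {"vliw": 3.0, "custom_instruction": 1.5, "hardware_function": 1.0}),
    ]
    mapping = map_regions(regions)
    print(mapping)
    print(f"overall speed-up: {overall_speedup(regions, mapping):.1f}X")
```

With these made-up estimates the kernel lands on a hardware function and the remaining code on the VLIW datapath, giving an overall speed-up of roughly 23.6X. A real flow would also have to weigh FPGA area and power, which this sketch ignores.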

This project was supported by the Swanson Center for Micro and Nano Systems.

Related Publications

• A. K. Jones, R. Hoare, D. Kusic, J. Fazekas, G. Mehta, and J. Foster, "A VLIW Processor with Hardware Functions: Increasing Performance While Reducing Power," IEEE Transactions on Circuits and Systems II, Vol. 53, No. 11, November 2006, pp. 1250-1254.
• R. Hoare, A. K. Jones, D. Kusic, J. Fazekas, J. Foster, S. Tung, and M. McCloud, "Rapid VLIW Processor Customization for Signal Processing Applications Using Combinational Hardware Functions," EURASIP Journal on Applied Signal Processing (JASP), Vol. 2006, Article ID 46473, 2006, pp. 1-23.
• A. K. Jones, R. R. Hoare, J. St. Onge, J. Lucas, S. Shao, and R. Melhem, "Linking Compilation and Visualization for Massively Parallel Programs," in Proc. of the IPDPS/APDCM Workshop, 2007.
• D. Kusic, R. Hoare, A. K. Jones, J. Fazekas, and J. Foster, "Extracting Speedup from C-code with Poor Instruction-level Parallelism," in Proc. of the IPDPS Workshop on Massively Parallel Processing (WMPP), 2005, pp. 264-9 to 264-18.
• R. Hoare, A. K. Jones, D. Kusic, J. Fazekas, G. Mehta, and J. Foster, "A VLIW Processor with Hardware Functions: Increasing Performance While Reducing Power," in Proc. of HPEC, September 2005, pp. 5-6.
• A. K. Jones, R. Hoare, I. S. Kourtev, J. Fazekas, D. Kusic, J. Foster, S. Boddie, and A. Muaydh, "A 64-way VLIW/SIMD FPGA Architecture and Design Flow," in Proc. of the IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Tel Aviv, Israel, December 2004.
• R. Hoare, S. Tung, and K. Werger, "A 64-Way SIMD Processing Architecture on an FPGA," in Proc. of the IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS), 2003, pp. 345-350.