Residential College | false |
Status | Published |
A 97.8 GOPS/W FPGA-Based Residual-Block-Aware CNN Accelerator Featuring Multi-Clock PW2 Pipeline and Adaptive-Resolution Quantization | |
Li, Jixuan¹; Li, Ke¹; Un, Ka Fai¹ | |
2024-11 | |
Source Publication | IEEE Transactions on Circuits and Systems I: Regular Papers |
ISSN | 1549-8328 |
Abstract | Improving the energy efficiency of residual blocks is crucial for an energy-efficient deep neural network accelerator. This paper presents a multi-clock pointwise-pointwise (MCPW) technique that processes adjacent PW convolution layers across residual blocks, reducing DRAM access for the intermediate feature maps by up to 75.0% while maintaining >88.1% processing element (PE) utilization. Moreover, we introduce a dual-precision packing (DPP) DSP array that computes multiple 4/8-bit multiplications in a shared DSP and, using low-precision residual distillation (RD) with adaptive-resolution quantization, improves the accuracy by 1.5% on ImageNet. The DPP DSP and adaptive-resolution RD boost the DSP efficiency by up to 4.0×, reduce DRAM access by 50.0%, and improve the throughput by >2.7×. We also propose a dynamic accumulator/multiplier (A/M) DSP reconfiguration scheme that dynamically adjusts the level of parallelism along the input/output channel dimensions; it increases the PE utilization of the depthwise (DW) convolution layers by 1.8× with 33% less hardware resource overhead. Implemented on the Xilinx VC709, the proposed accelerator achieves >93.0% PE utilization, a >2.9× DSP efficiency gain, and a 4.9× throughput improvement on the benchmarked networks, with an energy efficiency of 97.8 GOPS/W and a normalized throughput of 1.18 GOPS/DSP. |
Keyword | Convolutional Neural Network (CNN); Digital Signal Processing (DSP); Field-Programmable Gate Array (FPGA); Processing Element (PE) Utilization; Residual Block |
DOI | 10.1109/TCSI.2024.3505299 |
Indexed By | SCIE |
Language | English |
WOS Research Area | Engineering |
WOS Subject | Engineering, Electrical & Electronic |
WOS ID | WOS:001367632400001 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141 |
Scopus ID | 2-s2.0-85210960901 |
Document Type | Journal article |
Collection | Institute of Microelectronics; Department of Electrical and Computer Engineering |
Corresponding Author | Un, Ka Fai |
Affiliation | 1. University of Macau, State-Key Laboratory of Analog and Mixed-Signal VLSI, Institute of Microelectronics and the Faculty of Science and Technology, ECE Department, Macau, Macao; 2. Universidade de Lisboa, Instituto Superior Técnico, Lisbon, 1049-001, Portugal |
First Author Affiliation | Faculty of Science and Technology |
Corresponding Author Affiliation | Faculty of Science and Technology |
Recommended Citation GB/T 7714 | Li, Jixuan,Li, Ke,Un, Ka Fai,et al. A 97.8 GOPS/W FPGA-Based Residual-Block-Aware CNN Accelerator Featuring Multi-Clock PW2 Pipeline and Adaptive-Resolution Quantization[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2024. |
APA | Li, J., Li, K., Un, K. F., Yu, W. H., Martins, R. P., & Mak, P. I. (2024). A 97.8 GOPS/W FPGA-Based Residual-Block-Aware CNN Accelerator Featuring Multi-Clock PW2 Pipeline and Adaptive-Resolution Quantization. IEEE Transactions on Circuits and Systems I: Regular Papers. https://doi.org/10.1109/TCSI.2024.3505299 |
MLA | Li, Jixuan, et al. "A 97.8 GOPS/W FPGA-Based Residual-Block-Aware CNN Accelerator Featuring Multi-Clock PW2 Pipeline and Adaptive-Resolution Quantization." IEEE Transactions on Circuits and Systems I: Regular Papers, 2024. |
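The abstract's dual-precision packing (DPP) idea, computing several 4/8-bit multiplications in one shared DSP, rests on operand packing: low-precision weights that share an activation are placed in separate bit fields of one wide operand, so a single multiplication yields every product in its own field. The Python sketch below illustrates only that generic trick under stated assumptions; it is not the authors' DPP circuit. The 25×18 port widths (`A_BITS`, `B_BITS`) are an assumption modeled on the DSP48E1 multiplier of the Virtex-7 family, and `pack`, `unpack`, `packed_multiply`, and the chosen field widths are hypothetical names and parameters used for illustration.

```python
"""Sketch: several low-precision products from one wide signed multiply.

Assumptions (not taken from the paper): a DSP48E1-style 25x18
two's-complement multiplier, with the packed weights on the 25-bit port
and one shared activation on the 18-bit port.
"""
from __future__ import annotations

A_BITS = 25  # assumed width of the packed-weight multiplier port
B_BITS = 18  # assumed width of the activation multiplier port


def fits_signed(value: int, bits: int) -> bool:
    """Return True if value fits in a `bits`-wide two's-complement word."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def pack(weights: list[int], field_bits: int) -> int:
    """Place weight i at bit offset i * field_bits (sign extension is implicit)."""
    packed = sum(w << (i * field_bits) for i, w in enumerate(weights))
    if not fits_signed(packed, A_BITS):
        raise ValueError("packed operand overflows the assumed 25-bit port")
    return packed


def unpack(product: int, count: int, field_bits: int) -> list[int]:
    """Split a packed product into `count` signed fields, lowest field first."""
    mask = (1 << field_bits) - 1
    half = 1 << (field_bits - 1)
    fields = []
    for _ in range(count):
        low = product & mask      # raw bits of the lowest field
        if low >= half:           # reinterpret them as two's complement
            low -= 1 << field_bits
        fields.append(low)
        # Removing the field leaves an exact multiple of 2**field_bits,
        # so the arithmetic shift drops it without rounding.
        product = (product - low) >> field_bits
    return fields


def packed_multiply(weights: list[int], activation: int, field_bits: int) -> list[int]:
    """Return [w * activation for w in weights] using one multiplication.

    Each product must fit in a signed `field_bits`-wide field.
    """
    if not fits_signed(activation, B_BITS):
        raise ValueError("activation overflows the assumed 18-bit port")
    product = pack(weights, field_bits) * activation  # the single shared multiply
    return unpack(product, len(weights), field_bits)


if __name__ == "__main__":
    # Two signed 8-bit weights; an 8x8 signed product fits in 16 bits.
    print(packed_multiply([-128, 127], -128, field_bits=16))  # [16384, -16256]
    # Three signed 4-bit weights; a 4x4 signed product fits in 8 bits.
    print(packed_multiply([-8, 7, 3], 7, field_bits=8))  # [-56, 49, 21]
```

Each field must be wide enough to hold its signed product (16 bits for 8×8 and 8 bits for 4×4 in this sketch). Accumulating many packed products inside a DSP would additionally need guard bits between fields; the sketch performs a single multiplication and omits both accumulation and any DSP-specific correction logic.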
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.