Status | Published |
Title | A 50.4 GOPs/W FPGA-Based MobileNetV2 Accelerator using the Double-Layer MAC and DSP Efficiency Enhancement |
Author(s) | Li, Jixuan1; Chen, Jiabao1; Un, Ka Fai1; Yu, Wei Han1; Mak, Pui In1; Martins, Rui P.1,2 |
Date Issued | 2021-07-30 |
Conference Name | 2021 IEEE Asian Solid-State Circuits Conference (A-SSCC) |
Source Publication | Proceedings - A-SSCC 2021: IEEE Asian Solid-State Circuits Conference |
Conference Date | 07-10 November 2021 |
Conference Place | Busan |
Country | SOUTH KOREA |
Publication Place | IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA |
Publisher | IEEE |
Abstract | Convolutional neural network (CNN) models based on depthwise separable convolution, e.g. MobileNetV2 [1] and Xception, exhibit over a 40× (64×) reduction in the number of parameters (operations) compared to VGG16 for ImageNet inference, while maintaining a top-1 accuracy of 72%. With 8-bit quantization, the memory required to store the model can be compressed by a further 4×. This degree of model-size compression enables real-time, complex machine-learning tasks on a low-power FPGA suited to Internet-of-Things edge computation. Previous work [2] improved computational energy efficiency by exploiting model sparsity, but its effectiveness drops in already-compressed modern CNN models, so further advancing the CNN accelerator's energy efficiency with new techniques is desirable. [3] proposes a scalable adder tree for energy-efficient depthwise separable convolution, and [4] a frame-rate enhancement technique; neither handles the extensive memory access during separable convolution that dominates the power consumption of CNN accelerators. Herein we propose a double-layer multiply-accumulate (MAC) scheme that evaluates two layers within the bottleneck layer in a pipelined manner, significantly reducing memory access to the feature maps. On top of that, we introduce a double-operation digital signal processor (DSP) that enhances the accelerator's throughput by using a high-precision DSP to compute two fixed-point operations in one clock cycle. |
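Note: The double-operation DSP described in the abstract relies on packing two narrow fixed-point products into one wide multiplication. The sketch below illustrates that packing arithmetic in Python under the assumption of signed 8-bit operands and an 18-bit guard offset; it is not taken from the paper, and the function name and parameters are hypothetical.

```python
# A minimal sketch (not the authors' RTL) of the double-operation idea:
# two INT8 multiplications that share one activation are packed into a
# single wide multiplication, mimicking how one high-precision DSP slice
# can produce two fixed-point products per clock cycle.

def packed_dual_mult(a, w1, w2, guard_bits=18):
    """Return (a*w1, a*w2) computed with one wide multiplication.

    a, w1, w2 are signed 8-bit integers (-128..127); guard_bits must be
    wider than a single 16-bit product so the partial products don't overlap.
    """
    packed = (w1 << guard_bits) + w2      # one multiplier input carries both weights
    product = packed * a                  # the single wide multiply

    # Lower field holds a*w2 modulo 2^guard_bits; reinterpret it as signed.
    mask = (1 << guard_bits) - 1
    low = product & mask
    if low >= 1 << (guard_bits - 1):
        low -= 1 << guard_bits

    # Upper field holds a*w1 minus a borrow whenever the lower product is negative.
    high = product >> guard_bits
    if low < 0:
        high += 1
    return high, low


if __name__ == "__main__":
    import random
    for _ in range(10000):
        a, w1, w2 = (random.randint(-128, 127) for _ in range(3))
        assert packed_dual_mult(a, w1, w2) == (a * w1, a * w2)
    print("packed dual multiply matches the reference INT8 products")
```

The guard offset mirrors the width of a wide FPGA DSP multiplier input; the sign-borrow correction on the upper field is what allows both signed products to be recovered exactly from a single result.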
Keyword | Computation Efficiency; Convolutional Neural Network (CNN); FPGA; Object Recognition; Reconfigurability |
DOI | 10.1109/A-SSCC53895.2021.9634838 |
Indexed By | CPCI-S |
Language | English |
WOS Research Area | Engineering |
WOS Subject | Engineering, Electrical & Electronic |
WOS ID | WOS:000768220800102 |
Scopus ID | 2-s2.0-85124012436 |
Document Type | Conference paper |
Collection | INSTITUTE OF MICROELECTRONICS; Faculty of Science and Technology; DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING |
Affiliation | 1.University of Macau, Macao 2.Instituto Superior Tecnico/University of Lisboa, Lisbon, Portugal |
First Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Li, Jixuan, Chen, Jiabao, Un, Ka Fai, et al. A 50.4 GOPs/W FPGA-Based MobileNetV2 Accelerator using the Double-Layer MAC and DSP Efficiency Enhancement[C]. New York, NY, USA: IEEE, 2021. |
APA | Li, Jixuan, Chen, Jiabao, Un, Ka Fai, Yu, Wei Han, Mak, Pui In, & Martins, Rui P. (2021). A 50.4 GOPs/W FPGA-Based MobileNetV2 Accelerator using the Double-Layer MAC and DSP Efficiency Enhancement. Proceedings - A-SSCC 2021: IEEE Asian Solid-State Circuits Conference. |