Status: Published
A 50.4 GOPs/W FPGA-Based MobileNetV2 Accelerator using the Double-Layer MAC and DSP Efficiency Enhancement
Li, Jixuan1; Chen, Jiabao1; Un, Ka Fai1; Yu, Wei Han1; Mak, Pui In1; Martins, Rui P.1,2
2021-07-30
Conference Name: 2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)
Source Publication: Proceedings - A-SSCC 2021: IEEE Asian Solid-State Circuits Conference
Conference Date: 7-10 November 2021
Conference Place: Busan
Country: South Korea
Publication Place: 345 E 47th St, New York, NY 10017, USA
Publisher: IEEE
Abstract

Convolutional neural network (CNN) models based on depthwise separable convolution, e.g. MobileNetV2 [1] and Xception, exhibit over a 40× (64×) reduction in the number of parameters (operations) compared to VGG16 for ImageNet inference, while maintaining a TOP-1 accuracy of 72%. With 8-bit quantization, the memory required to store the model can be compressed by a further 4×. This substantial model-size compression makes real-time complex machine learning tasks feasible on a low-power FPGA suited to Internet-of-Things edge computation. Previous effort [2] improved computational energy efficiency by exploiting model sparsity, but its effectiveness drops for already-compressed modern CNN models. Further advancing the CNN accelerator's energy efficiency therefore calls for new techniques. [3] proposed a scalable adder tree for energy-efficient depthwise separable convolution computation, and [4] a frame-rate enhancement technique; neither addresses the extensive memory access during separable convolution that dominates the power consumption of CNN accelerators. Herein we propose a double-layer multiply-accumulate (MAC) scheme that evaluates two layers within the bottleneck layer in a pipelined manner, yielding a significant reduction in memory accesses to the feature maps. On top of that, we also innovate a double-operation digital signal processor (DSP) that enhances the accelerator's throughput by using a high-precision DSP to compute two fixed-point operations in one clock cycle.
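The parameter reduction from depthwise separable convolution quoted above can be illustrated with a short counting sketch. The layer sizes below are hypothetical, chosen only to show where the roughly k×k saving comes from; they are not the paper's figures.

```python
# Illustrative parameter count: a standard 3x3 convolution versus a
# depthwise-separable one (3x3 depthwise + 1x1 pointwise), the building
# block of MobileNetV2. Channel counts here are assumptions for the demo.

def standard_conv_params(c_in, c_out, k=3):
    # Each of the c_out filters spans all c_in input channels.
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k=3):
    # One k x k spatial filter per input channel (depthwise),
    # followed by a 1x1 pointwise convolution to mix channels.
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

c_in, c_out = 256, 256
std = standard_conv_params(c_in, c_out)   # 589824
sep = separable_conv_params(c_in, c_out)  # 67840
print(std, sep, round(std / sep, 1))      # reduction approaches k*k = 9
```

For large channel counts the pointwise term dominates, so the per-layer saving approaches k², and stacking many such layers compounds into the model-wide reductions reported for MobileNetV2-class networks.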

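The double-operation DSP idea — one high-precision multiplier evaluating two fixed-point products per clock cycle — can be sketched in software by packing two operands with guard bits into one wide multiplication. This is a minimal unsigned 8-bit model of the general packing technique; the guard-bit width and the paper's exact sign handling are assumptions, not reproduced from the accelerator.

```python
# Minimal sketch of a "double-operation" multiply: two independent 8-bit
# products computed with a single wide multiplication, by packing one
# operand into the high bits with guard bits between the partial products.

SHIFT = 18  # guard-bit separation between the two products (assumed)

def packed_mul(a, b, c):
    """Return (a*c, b*c) using one wide multiplication.

    Requires unsigned 8-bit inputs so each partial product
    (at most 255*255 = 65025) fits below the 2**SHIFT boundary.
    """
    assert 0 <= a < 256 and 0 <= b < 256 and 0 <= c < 256
    packed = (a << SHIFT) + b            # pre-adder style operand packing
    product = packed * c                 # the single wide multiply
    low = product & ((1 << SHIFT) - 1)   # b*c sits in the low bits
    high = product >> SHIFT              # a*c sits in the high bits
    return high, low

print(packed_mul(100, 200, 50))  # (5000, 10000)
```

The extra guard bits leave headroom so that accumulating several such packed products does not let the low partial product carry into the high one — the same reason a hardware implementation would choose a separation wider than the 16 bits strictly needed for one product.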
Keywords: Computation Efficiency; Convolutional Neural Network (CNN); FPGA; Object Recognition; Reconfigurability
DOI: 10.1109/A-SSCC53895.2021.9634838
URL: View the original
Indexed By: CPCI-S
Language: English
WOS Research Area: Engineering
WOS Subject: Engineering, Electrical & Electronic
WOS ID: WOS:000768220800102
Scopus ID: 2-s2.0-85124012436
Document Type: Conference paper
Collection: INSTITUTE OF MICROELECTRONICS; Faculty of Science and Technology; DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
Affiliation: 1. University of Macau, Macao; 2. Instituto Superior Tecnico/University of Lisboa, Lisbon, Portugal
First Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Li, Jixuan, Chen, Jiabao, Un, Ka Fai, et al. A 50.4 GOPs/W FPGA-Based MobileNetV2 Accelerator using the Double-Layer MAC and DSP Efficiency Enhancement[C]. New York, NY, USA: IEEE, 2021.
APA: Li, Jixuan, Chen, Jiabao, Un, Ka Fai, Yu, Wei Han, Mak, Pui In, & Martins, Rui P. (2021). A 50.4 GOPs/W FPGA-Based MobileNetV2 Accelerator using the Double-Layer MAC and DSP Efficiency Enhancement. Proceedings - A-SSCC 2021: IEEE Asian Solid-State Circuits Conference.
Files in This Item:
There are no files associated with this item.
Google Scholar
Similar articles in Google Scholar
[Li, Jixuan]'s Articles
[Chen, Jiabao]'s Articles
[Un, Ka Fai]'s Articles
Baidu academic
Similar articles in Baidu academic
[Li, Jixuan]'s Articles
[Chen, Jiabao]'s Articles
[Un, Ka Fai]'s Articles
Bing Scholar
Similar articles in Bing Scholar
[Li, Jixuan]'s Articles
[Chen, Jiabao]'s Articles
[Un, Ka Fai]'s Articles
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.