Status | Published |
Title | A 50.4 GOPs/W FPGA-Based MobileNetV2 Accelerator using the Double-Layer MAC and DSP Efficiency Enhancement |
Author(s) | Li, Jixuan1; Chen, Jiabao1; Un, Ka Fai1; Yu, Wei Han1; Mak, Pui In1; Martins, Rui P.1,2 |
Date Issued | 2021-07-30 |
Conference Name | 2021 IEEE Asian Solid-State Circuits Conference (A-SSCC) |
Source Publication | Proceedings - A-SSCC 2021: IEEE Asian Solid-State Circuits Conference |
Conference Date | 07-10 November 2021 |
Conference Place | Busan |
Country | SOUTH KOREA |
Publication Place | IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA |
Publisher | IEEE |
Abstract | Convolutional neural network (CNN) models based on depthwise separable convolution, e.g. MobileNetV2 [1] and Xception, exhibit over a 40× (64×) reduction in the number of parameters (operations) compared to VGG16 for ImageNet inference, while maintaining a top-1 accuracy of 72%. With 8-bit quantization, the memory required to store the model can be compressed by a further 4×. This degree of model-size compression enables real-time, complex machine-learning tasks on a low-power FPGA suited to Internet-of-Things edge computation. Previous work [2] improved computational energy efficiency by exploiting model sparsity, but its effectiveness drops in already-compressed modern CNN models, so further advancing the CNN accelerator's energy efficiency with new techniques is desirable. [3] proposes a scalable adder tree for energy-efficient depthwise separable convolution, and [4] a frame-rate enhancement technique; neither handles the extensive memory access during separable convolution that dominates the power consumption of CNN accelerators. Herein we propose a double-layer multiply-accumulate (MAC) scheme that evaluates two layers within the bottleneck layer in a pipelined manner, significantly reducing memory access to the feature maps. On top of that, we introduce a double-operation digital signal processor (DSP) that enhances the accelerator's throughput by using a high-precision DSP to compute two fixed-point operations in one clock cycle. |
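Note: The double-operation DSP described in the abstract relies on packing two narrow fixed-point products into one wide multiplication. The sketch below illustrates that packing arithmetic in Python under the assumption of signed 8-bit operands and an 18-bit guard offset; it is not taken from the paper, and the function name and parameters are hypothetical.

```python
# A minimal sketch (not the authors' RTL) of the double-operation idea:
# two INT8 multiplications that share one activation are packed into a
# single wide multiplication, mimicking how one high-precision DSP slice
# can produce two fixed-point products per clock cycle.

def packed_dual_mult(a, w1, w2, guard_bits=18):
    """Return (a*w1, a*w2) computed with one wide multiplication.

    a, w1, w2 are signed 8-bit integers (-128..127); guard_bits must be
    wider than a single 16-bit product so the partial products don't overlap.
    """
    packed = (w1 << guard_bits) + w2      # one multiplier input carries both weights
    product = packed * a                  # the single wide multiply

    # Lower field holds a*w2 modulo 2^guard_bits; reinterpret it as signed.
    mask = (1 << guard_bits) - 1
    low = product & mask
    if low >= 1 << (guard_bits - 1):
        low -= 1 << guard_bits

    # Upper field holds a*w1 minus a borrow whenever the lower product is negative.
    high = product >> guard_bits
    if low < 0:
        high += 1
    return high, low


if __name__ == "__main__":
    import random
    for _ in range(10000):
        a, w1, w2 = (random.randint(-128, 127) for _ in range(3))
        assert packed_dual_mult(a, w1, w2) == (a * w1, a * w2)
    print("packed dual multiply matches the reference INT8 products")
```

The guard offset mirrors the width of a wide FPGA DSP multiplier input; the sign-borrow correction on the upper field is what allows both signed products to be recovered exactly from a single result.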
Keyword | Computation Efficiency; Convolutional Neural Network (CNN); FPGA; Object Recognition; Reconfigurability |
DOI | 10.1109/A-SSCC53895.2021.9634838 |
Indexed By | CPCI-S |
Language | English |
WOS Research Area | Engineering |
WOS Subject | Engineering, Electrical & Electronic |
WOS ID | WOS:000768220800102 |
Scopus ID | 2-s2.0-85124012436 |
Document Type | Conference paper |
Collection | INSTITUTE OF MICROELECTRONICS; Faculty of Science and Technology; DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING |
Affiliation | 1.University of Macau, Macao 2.Instituto Superior Tecnico/University of Lisboa, Lisbon, Portugal |
First Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Li, Jixuan, Chen, Jiabao, Un, Ka Fai, et al. A 50.4 GOPs/W FPGA-Based MobileNetV2 Accelerator using the Double-Layer MAC and DSP Efficiency Enhancement[C]. New York, NY, USA: IEEE, 2021. |
APA | Li, Jixuan, Chen, Jiabao, Un, Ka Fai, Yu, Wei Han, Mak, Pui In, & Martins, Rui P. (2021). A 50.4 GOPs/W FPGA-Based MobileNetV2 Accelerator using the Double-Layer MAC and DSP Efficiency Enhancement. Proceedings - A-SSCC 2021: IEEE Asian Solid-State Circuits Conference. |