Abstract
Convolutional neural networks (CNNs) play an important role in many computer vision applications such as object classification and recognition. To achieve a high recognition rate, these networks are usually implemented on high-performance computing platforms with high processing speed and large memory. This is a major obstacle to deploying the models on devices with limited hardware resources, such as embedded computers. Convolution layers require a large number of multiply-accumulate operations to extract useful features from input images. Furthermore, floating-point multiplication has long latency and incurs substantial hardware overhead. In this paper, we analyze and identify the factors that limit the performance of CNNs. We then present a method for implementing convolutional networks on hardware with limited resources, and evaluate it in detail in terms of power consumption, execution time, and recognition rate. Experimental results on both an FPGA hardware platform and an ARM Cortex-A embedded processor indicate that CNNs using the XNOR-popcount approach can be optimized to achieve a 1000-fold increase in computational performance and an approximately 24-fold reduction in power consumption compared to the traditional implementation of CNNs on common embedded computer systems.
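The abstract does not spell out how the XNOR-popcount approach replaces multiply-accumulate operations, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes weights and activations are binarized to +1/-1 and packed into integer bit words (bit set for +1); the function names `pack_signs` and `xnor_popcount_dot` are hypothetical. Under that assumption, a dot product of length n equals 2 × (number of matching bits) − n, which needs only an XNOR and a bit count.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int; bit i is 1 when values[i] is +1."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n {+1, -1} vectors given as packed bit words."""
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    agree = ~(a_bits ^ b_bits) & mask    # XNOR: 1 where the two signs match
    matches = bin(agree).count("1")      # popcount
    # Each match contributes +1 and each mismatch -1 to the dot product.
    return 2 * matches - n


# Illustrative check against an ordinary multiply-accumulate loop.
a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = xnor_popcount_dot(pack_signs(a), pack_signs(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
assert result == reference
```

On hardware, the XNOR and popcount steps map to simple logic or single instructions, which is the usual reason this formulation is cheaper than floating-point multiplication.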
Issue: Vol 5 No 1 (2022)
Page No.: 1332-1341
Published: Mar 31, 2022
Section: Research article
DOI: https://doi.org/10.32508/stdjet.v4i4.906