Parallel Accelerator Design for Convolutional Neural Networks Based on FPGA - AET - Application of Electronic Technique

Application of Electronic Technique, 2021, Issue 2

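Wang Ting, Chen Binyue, Zhang Fuhai

College of Electronic Information and Optical Engineering, Nankai University, Tianjin 300350, China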

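Abstract: In recent years, convolutional neural networks (CNNs) have played an increasingly important role in many fields, but power consumption and speed remain the main factors limiting their application. To overcome these limitations, this paper designs a parallel CNN accelerator on an FPGA platform, using the Ultra96-V2 as the experimental development board. The CNN computing IP core is designed and implemented with a high-level synthesis (HLS) tool, and the complete FPGA-based CNN accelerator system is built with the Vivado development tools. In comparative experiments against GPU and CPU implementations, including recognition rate, the optimized FPGA-based CNN processes an image in much less time than the CPU and reduces power consumption by a factor of more than 30 compared with the GPU. These results demonstrate the performance and power advantages of the FPGA accelerator design and verify the effectiveness of the method.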

Keywords: parallel computing; convolutional neural network; accelerator; pipeline

CLC number: TN402
Document code: A
DOI: 10.16157/j.issn.0258-7998.200858
Citation (Chinese): 王婷，陈斌岳，张福海. 基于FPGA的卷积神经网络并行加速器设计[J]. 电子技术应用，2021，47(2)：81-84.
Citation (English): Wang Ting, Chen Binyue, Zhang Fuhai. Parallel accelerator design for convolutional neural networks based on FPGA[J]. Application of Electronic Technique, 2021, 47(2): 81-84.

0 Introduction

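With the rapid development of artificial intelligence, convolutional neural networks (CNNs) have attracted increasing attention. Because of their high adaptability and excellent recognition capability, they are widely used in classification and recognition, object detection, object tracking, and other fields [1]. CNNs are far more computationally complex than traditional algorithms, and general-purpose CPUs can no longer meet their computing demands. The main solution at present is to run CNN computation on GPUs. Although GPUs have a natural advantage in parallel computing, they have major drawbacks in cost and power consumption. Implementations of CNN inference occupy a large footprint and consume considerable energy [2], so they cannot meet the CNN computing requirements of terminal (end-device) systems. FPGAs offer powerful parallel processing, flexible configurability, and ultra-low power consumption, which make them an ideal platform for implementing CNNs. Their reconfigurability also suits neural network architectures that keep changing. As a result, many researchers have studied methods for accelerating CNNs on FPGAs [3]. Drawing on the structure of MobileNet, the lightweight network proposed by Google [4], this paper designs a high-speed CNN system on an FPGA using parallel processing and a pipelined architecture, and compares it with CPU and GPU implementations.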
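MobileNet is lightweight mainly because of its depthwise separable convolutions. Each standard convolution is split into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution. The sketch below is not taken from this paper's accelerator. It only applies the multiply-accumulate (MAC) cost formulas from the MobileNet paper [4] to one hypothetical layer. That layer has a 3x3 kernel, 32 input channels, 64 output channels, and a 112x112 output feature map (stride 1, padded so sizes match). The symbols D_K (kernel size), M (input channels), N (output channels), and D_F (output map size) follow the MobileNet paper's notation, and the function names are illustrative only.

```python
# Sketch: MAC counts of a standard convolution versus a MobileNet-style
# depthwise separable convolution, using the MobileNet paper's cost formulas.
# The layer sizes are hypothetical examples, not this accelerator's configuration.


def standard_conv_macs(dk: int, m: int, n: int, df: int) -> int:
    """Standard dk x dk convolution: M input channels, N output channels, df x df output map."""
    return dk * dk * m * n * df * df


def depthwise_separable_macs(dk: int, m: int, n: int, df: int) -> int:
    """Depthwise dk x dk convolution (one filter per input channel) plus a 1x1 pointwise convolution."""
    depthwise = dk * dk * m * df * df  # each of the M channels is filtered independently
    pointwise = m * n * df * df        # 1x1 convolution mixes M channels into N outputs
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 32 -> 64 channels, 112 x 112 output feature map
    dk, m, n, df = 3, 32, 64, 112
    std = standard_conv_macs(dk, m, n, df)
    sep = depthwise_separable_macs(dk, m, n, df)
    ratio = sep / std
    print(f"standard: {std:,} MACs, separable: {sep:,} MACs")
    # The ratio should equal 1/N + 1/DK^2, as derived in the MobileNet paper
    print(f"ratio: {ratio:.4f} (formula: {1 / n + 1 / dk**2:.4f})")
```

The cost ratio is 1/N + 1/D_K^2. For a 3x3 kernel and a large N, a depthwise separable layer therefore needs roughly 8 to 9 times fewer MACs than a standard one. This saving matters on an FPGA, where on-chip arithmetic resources and the power budget are limited.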

The full text of this article can be downloaded at: http://www.chinaaet.com/resource/share/2000003393

Originality statement: This content is original to the AET website; reproduction without authorization is prohibited.
