Performance Optimization of Sparse Deep Neural Networks Based on GPU

Application of Electronic Technique

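Shi Yucheng, Huang Jianqiang, Bian Haodong, Wu Li, Jia Jinfang, Wang Xiaoying

Department of Computer Technology and Application, Qinghai University, Xining 810016, Qinghai, China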

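Abstract: As neural networks grow deeper, sparse deep neural networks offer advantages in both computation and storage, but their performance still leaves room for optimization. This paper proposes a GPU-based performance optimization method for sparse deep neural networks. The method reorders the computation to increase data reuse and, drawing on the GPU architecture and the CUDA programming model, further improves performance through prefetching and related techniques. On the official GraphChallenge data sets, it achieves a speedup of up to 2.5x over the corresponding cuSPARSE library functions.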

Keywords: deep neural network; sparsification; heterogeneous platform; sparse matrix-matrix multiplication

Citation: Shi Yucheng, Huang Jianqiang, Bian Haodong, et al. Performance optimization of sparse deep neural network based on GPU[J]. Application of Electronic Technique, 2023, 49(12): 14-19.


0 Introduction

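As research into the principles of neural networks has deepened and computing power has steadily increased, more and more deep neural networks have emerged. In natural language processing [1], for example, Google proposed the Transformer [2] model. Its solution to the vanishing-gradient problem and its support for parallel training, among other advantages, have fueled the rise of large models, and ChatGPT [3] was trained on this foundation. However, very large deep neural networks pose a greater challenge to the timeliness of model applications. Because of the "memory wall" [4] and the "power wall" [5], sparse deep neural networks [6-7] have come into research focus, and combining GPU devices with sparse deep neural networks takes training speed to a new level.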

The full text of this article can be downloaded at: https://www.chinaaet.com/resource/share/2000005799
