Deep learning microphone array speech enhancement for multiple speaker separation

Application of Electronic Technique (电子技术应用), 2022, Issue 5

Zhang Jiayang (1, 2), Tong Feng (1, 2, 3), Chen Dongsheng (1, 2, 3), Huang Huixiang (1, 2)

1. Key Laboratory of Underwater Acoustic Communication and Marine Information Technology, Ministry of Education, Xiamen University, Xiamen 361005, Fujian, China; 2. College of Ocean and Earth Sciences, Xiamen University, Xiamen 361005, Fujian, China; 3. Shenzhen Research Institute of Xiamen University, Shenzhen 518000, Guangdong, China

Abstract: As human-computer voice interaction scenarios have multiplied in recent years, using microphone array speech enhancement to improve speech quality has become a research hotspot. Unlike ambient noise, the interfering speaker's speech in a multiple speaker separation scenario is, like the target speaker's, a speech signal with similar time-frequency characteristics, which poses a greater challenge to traditional microphone array speech enhancement. For the multiple speaker separation scenario, a spatial response cost function of the microphone array is constructed and optimized with a deep learning network. The desired spatial transfer characteristics of the microphone array are designed through deep learning model training, so that better beam directivity improves the separation. Simulation and experimental results show that the method effectively improves multiple speaker separation performance.

Key words: deep learning; microphone array; beamforming; LSTM

CLC number: TN912.3
Document code: A
DOI: 10.16157/j.issn.0258-7998.212404
Citation: Zhang Jiayang, Tong Feng, Chen Dongsheng, et al. Deep learning microphone array speech enhancement for multiple speaker separation[J]. Application of Electronic Technique, 2022, 48(5): 31-36.

0 Introduction

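As spoken interaction between humans and machines becomes more frequent, it is increasingly important to account for how factors that degrade speech signal quality, such as noise, reverberation, and interference from other speakers, affect speech recognition. Speech enhancement^[1] can effectively extract clean speech from a corrupted signal, and a microphone array captures more speech information and more spatio-temporal features than a single microphone. Microphone array speech enhancement is therefore widely used in smart homes, in-vehicle systems, and audio/video conferencing.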

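A microphone array filters the signal spatially: it can enhance the signal arriving from the desired direction and suppress directional noise, which achieves speech enhancement. Traditional microphone array speech enhancement algorithms include fixed-beam methods such as filter-and-sum beamforming (FSB)^[2], which filters the multichannel signals with filter coefficients of a given length and sums the results. FSB achieves a frequency-independent spatial response and offers low complexity and easy hardware implementation, but it performs poorly against directional noise.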
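The filter-and-sum structure described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names (`fir_filter`, `filter_and_sum`, `delay_taps`) are hypothetical, and the filters used in the example are simple integer-delay taps with equal gains (the delay-and-sum special case), whereas an actual FSB design would choose the coefficients to obtain a frequency-independent spatial response.

```python
def fir_filter(signal, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(channels, filters):
    """Filter each microphone channel with its own FIR taps, then sum."""
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    output = [0.0] * length
    for signal, taps in zip(channels, filters):
        for n, y in enumerate(fir_filter(signal, taps)):
            output[n] += y
    return output


def delay_taps(delay, num_taps, gain):
    """Assumed toy filter: an integer-sample delay scaled by `gain`."""
    taps = [0.0] * num_taps
    taps[delay] = gain
    return taps


# Toy scene (an assumption for illustration): the target reaches
# microphone 1 one sample later than microphone 0.
mic0 = [1.0, 2.0, 3.0, 0.0]
mic1 = [0.0, 1.0, 2.0, 3.0]

# Delay microphone 0 by one sample so both channels line up, then average.
filters = [delay_taps(1, 2, 0.5), delay_taps(0, 2, 0.5)]
enhanced = filter_and_sum([mic0, mic1], filters)
print(enhanced)  # [0.0, 1.0, 2.0, 3.0]
```

In this toy case the two aligned channels add coherently, so the target is recovered with a one-sample delay. A signal arriving from another direction would have a different inter-microphone delay and would not add coherently.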

For the full text of this paper, please download: http://www.chinaaet.com/resource/share/2000004272


Original content statement: This content is original to the AET website; reproduction without authorization is prohibited.
