Research on speech enhancement method based on generating noise using GAN - AET - Application of Electronic Technique

Application of Electronic Technique, 2020, No. 11

Xia Ding, Xu Wentao

School of Science, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China

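Abstract: In speech enhancement, deep neural networks are trained in a supervised manner on large amounts of speech corrupted by different types of noise, which improves their enhancement ability. However, collecting many different types of noise is costly, and it is difficult to cover all noise types, which limits the generalization ability of the model. To address this problem, a noise data augmentation method based on generative adversarial networks (GAN) is proposed. The method learns from real noise data and synthesizes virtual noise according to the learned data features, thereby expanding both the amount and the variety of noise data in the training set. Experiments show that the proposed noise synthesis method effectively broadens the sources of noise in the training set, improves the generalization ability of the model, and increases the signal-to-noise ratio and intelligibility of speech after denoising.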

Key words: speech enhancement; generative adversarial network; data augmentation

CLC number: TN912.3
Document code: A
DOI: 10.16157/j.issn.0258-7998.200327
Citation: Xia Ding, Xu Wentao. Research on speech enhancement method based on generating noise using GAN[J]. Application of Electronic Technique, 2020, 46(11): 56-59, 64.

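This excerpt does not give the training objective of the noise GAN. As background, the original GAN formulation (Goodfellow et al., 2014) sets up a minimax game between two networks. A generator $G$ maps a random latent vector $z \sim p_z$ to a synthetic sample. A discriminator $D$ outputs the probability that its input is real. The two are trained jointly as shown below.

Here the data distribution is written $p_{\text{noise}}$ to stand for the real noise recordings, or for features derived from them. That notation is an assumption. The paper may use a different loss, network structure or noise representation, so read this as a generic sketch rather than the authors' formulation.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{noise}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

In the idealized analysis, the global optimum of this game is reached when the generator's distribution equals $p_{\text{noise}}$. After training, fresh draws $G(z)$ would play the role of the virtual noise that the abstract describes adding to the training set.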
0 Introduction

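In speech signal processing, background noise and environmental interference seriously degrade the reliability of the processing. Speech enhancement is therefore needed to remove the noise from the signal and improve the quality of noisy speech. For this reason, speech enhancement plays a very important role in speech recognition, hearing aids and voice communication.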

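Traditional speech enhancement methods include spectral subtraction [1], Wiener filtering [2-3] and the later statistical-model-based methods [4]. These methods model the noise from its known statistical properties to obtain the noise power spectrum. They then denoise the noisy speech to estimate the clean speech signal. Their accuracy depends heavily on hand-crafted feature engineering and on the type of data, and they adapt poorly to unknown noise [5].

With the development of artificial intelligence, deep neural networks have been applied to speech enhancement [6]. Using the feature learning of deep neural networks, noisy speech can be mapped to clean speech, which removes the noise. The most direct way to improve the generalization ability of DNN-based speech enhancement is data augmentation, which includes increasing the diversity of the data and enlarging the dataset. Experiments have shown that training a deep neural network on more types of noise significantly improves the signal-to-noise ratio of the enhanced speech [7-8]. However, real noise data are difficult and costly to obtain, which limits how widely the network's denoising ability applies. To address this problem, this paper designs a GAN-based training-set augmentation method. It generates virtual noise to expand the types and amount of noise data in the training set, thereby improving the generalization ability of the model.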
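The data-augmentation route described above ultimately produces supervised (noisy, clean) training pairs, so a larger noise pool translates directly into more varied pairs. Below is a minimal NumPy sketch of that pairing step. It assumes additive noise mixed in at a randomly chosen SNR, which is a common recipe. This excerpt does not state the authors' mixing procedure, SNR levels or sampling rate. The names `mix_at_snr` and `build_training_pairs`, the SNR list and the 16 kHz rate are hypothetical. The `synthesized_noise` entry in the usage block is only a random-walk placeholder standing in for GAN output.

```python
import numpy as np


def snr_db(clean: np.ndarray, noise: np.ndarray) -> float:
    """SNR in dB between a clean signal and an additive noise component."""
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, target_snr_db: float,
               rng: np.random.Generator) -> "tuple[np.ndarray, np.ndarray]":
    """Add `noise` to `clean`, rescaled so that the mixture has the requested SNR.

    Returns (noisy, scaled_noise). Noise clips shorter than `clean` are tiled,
    then a random segment of the same length as `clean` is cropped out.
    """
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    # Random crop so that different pairs see different parts of a noise clip.
    start = int(rng.integers(0, len(noise) - len(clean) + 1))
    segment = noise[start:start + len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(segment ** 2)
    # Gain g chosen so that p_clean / (g**2 * p_noise) == 10**(target_snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled = gain * segment
    return clean + scaled, scaled


def build_training_pairs(clean_utts, noise_pool, snr_choices_db, rng):
    """Pair every clean utterance with a random noise clip at a random SNR.

    Hypothetical helper: `noise_pool` may combine recorded noise with
    synthesized noise; a network is then trained to map noisy -> clean.
    """
    pairs = []
    for clean in clean_utts:
        noise = noise_pool[int(rng.integers(len(noise_pool)))]
        snr = float(rng.choice(snr_choices_db))
        noisy, _ = mix_at_snr(clean, noise, snr, rng)
        pairs.append((noisy, clean))
    return pairs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000  # assumed sampling rate
    t = np.arange(fs) / fs
    clean_utts = [0.5 * np.sin(2 * np.pi * 220 * t)]  # stand-in for real speech
    recorded_noise = [rng.standard_normal(fs // 2)]   # stand-in for recorded noise
    # Placeholder only: a random walk, NOT the output of a trained GAN.
    synthesized_noise = [0.01 * np.cumsum(rng.standard_normal(fs))]
    pairs = build_training_pairs(clean_utts, recorded_noise + synthesized_noise,
                                 [0.0, 5.0, 10.0], rng)
    noisy, clean = pairs[0]
    print(len(pairs), round(snr_db(clean, noisy - clean), 1))
```

Scaling each noise clip to a target SNR, rather than adding it at its raw level, keeps real and synthesized noise on a comparable footing. Only the noise's spectral and temporal character then differs between the two sources.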

For the full text, please download: http://www.chinaaet.com/resource/share/2000003050
