An improved deep reinforcement learning algorithm for cruise missile penetration path planning

Application of Electronic Technique, 2021, Issue 8

Ma Zijie, Gao Jie, Wu Peiyu, Xie Yongjun

School of Electronics and Information Engineering, Beihang University, Beijing 100191, China

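Abstract: To solve the penetration path-planning problem of a cruise missile facing the dynamic threat of an airborne early-warning (AEW) aircraft radar, an improved deep reinforcement learning method for intelligent path planning is proposed. For the penetration mission of a cruise missile under early-warning threat, a typical combat scenario is constructed and a formula for predicting the detection probability of the AEW aircraft radar is given. On this basis, a reward function that incorporates the dynamic early-warning threat is designed, and the Deep Deterministic Policy Gradient (DDPG) algorithm is used to study intelligent cruise missile penetration. Because the exploration noise in the conventional DDPG algorithm is temporally uncorrelated, which limits its exploration ability, Ornstein-Uhlenbeck noise is introduced to improve the training efficiency of the algorithm. Simulation results show that the improved DDPG algorithm converges in a shorter training time.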
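
The key modification described in the abstract is replacing temporally uncorrelated exploration noise with Ornstein-Uhlenbeck (OU) noise. The following Python/NumPy sketch is not the authors' code: it implements a standard Euler-Maruyama discretisation of an OU process and compares the lag-1 autocorrelation of its samples with that of independent Gaussian noise. The class and function names (OrnsteinUhlenbeckNoise, lag1_autocorr, explore), the parameter values (theta = 0.15, sigma = 0.2, dt = 0.01) and the [-1, 1] action bounds are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


class OrnsteinUhlenbeckNoise:
    """Euler-Maruyama discretisation of an Ornstein-Uhlenbeck process:
    x_{t+1} = x_t + theta * (mu - x_t) * dt + sigma * sqrt(dt) * N(0, 1).
    """

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        # Illustrative defaults (assumed), not parameters taken from the paper.
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Restart the process at its mean, e.g. at the start of each episode.
        self.x = self.mu.copy()

    def sample(self):
        # Mean-reverting drift toward mu plus a scaled Gaussian increment,
        # so each sample depends on the previous one (temporal correlation).
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.x.shape)
        self.x = self.x + drift + diffusion
        return self.x.copy()


def lag1_autocorr(series):
    """Sample correlation between consecutive values of a 1-D series."""
    series = np.asarray(series, dtype=float)
    return float(np.corrcoef(series[:-1], series[1:])[0, 1])


def explore(actor_action, noise, low=-1.0, high=1.0):
    """Perturb a deterministic actor output and clip it to the (assumed) action bounds."""
    return np.clip(actor_action + noise, low, high)


if __name__ == "__main__":
    steps = 5000
    ou = OrnsteinUhlenbeckNoise(size=1, seed=0)
    ou_series = np.array([ou.sample()[0] for _ in range(steps)])
    gauss_series = np.random.default_rng(1).normal(0.0, 0.2, steps)
    print("OU lag-1 autocorrelation:      ", round(lag1_autocorr(ou_series), 3))
    print("Gaussian lag-1 autocorrelation:", round(lag1_autocorr(gauss_series), 3))

    # Hypothetical use inside a DDPG rollout: a placeholder actor output
    # (e.g. a normalised turn command) is perturbed with OU noise.
    ou.reset()
    action = explore(np.array([0.3]), ou.sample())
    print("Perturbed action:", action)
```

Running the script should show a lag-1 autocorrelation close to 1 for the OU samples and close to 0 for the Gaussian samples. Because each OU sample drifts from the previous one rather than being redrawn independently, consecutive perturbations of the actor's output tend to point in the same direction, giving smoother and more persistent exploratory manoeuvres; independent Gaussian perturbations largely cancel out from one control step to the next.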

Key words: cruise missile; DDPG algorithm; penetration strategy; deep reinforcement learning

CLC number: TN959.1; TP181
Document code: A
DOI: 10.16157/j.issn.0258-7998.211934
Citation: Ma Zijie, Gao Jie, Wu Peiyu, et al. An improved deep reinforcement learning algorithm for cruise missile penetration path planning[J]. Application of Electronic Technique, 2021, 47(8): 11-14, 19.

0 Introduction

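Cruise missiles are tactical strike weapons that can be launched from mobile platforms and offer high accuracy, good concealment and strong maneuverability. In recent years, however, networked, information-based missile defense systems that integrate sea-, land- and air-based defensive weapons have greatly improved their situational awareness and area-denial capability. This threatens the battlefield survivability of cruise missiles, and the ability to evade dynamic threats has become key to whether a cruise missile can successfully strike its target[1-3]. Conventional cruise missile path-planning methods model the radar threat as a static radar detection region, which is difficult to adapt to dynamic battlefield environments that demand real-time decisions; moreover, these methods cannot explore penetration strategies beyond prior knowledge. Intelligent path-planning algorithms capable of handling dynamic confrontation are therefore needed.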

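Deep reinforcement learning is a new research focus in artificial intelligence[4-6]. As research has deepened, it has begun to be applied to weapon systems, including intelligent penetration. Reference [7] used deep reinforcement learning to derive a new air-to-air missile guidance law that improves the ability to hit the target. Reference [8] addressed an engagement involving a target, an attacking missile and an interceptor missile, and studied whether to launch the interceptor, its optimal launch time, and its optimal guidance law after launch. Reference [9] used a deep value network algorithm to study UAV path planning under static early-warning threats and reduced the path-planning time. Reference [10] modeled the radar threat as a static radar detection region and studied dynamic penetration control decisions for loitering munitions in a two-dimensional plane, improving their autonomous penetration capability.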

The full text of this article is available for download at: http://www.chinaaet.com/resource/share/2000003690.
