
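Unsupervised single-target tracking algorithm based on an attention mechanism

Information Technology and Network Security, Issue 6

Lin Zhixiong, Wu Lijun, Chen Zhicong

(College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108)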

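Abstract: To improve target tracking accuracy, an unsupervised single-target tracking algorithm based on an attention mechanism is designed. The algorithm uses DCFNet as its base network and achieves unsupervised tracking through forward tracking and backward verification. To incorporate context information, a feature fusion method is introduced: the features extracted by each layer of DCFNet are brought to a common resolution by bilinear pooling so that they can be fused. To capture the relationships between feature channels, the SENet channel attention module is introduced. A reverse frame-by-frame verification method is also designed, which goes on to predict the first frame after reverse verification of the intermediate frame, thereby reducing the error of the predicted position. Tests on the public OTB-2015 dataset show that the algorithm reaches an AUC score of 60.6% at a speed of 61 FPS. Compared with the unsupervised single-target tracking algorithm UDT, the designed algorithm achieves better tracking performance.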

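Keywords: target tracking; unsupervised learning; feature fusion; attention mechanism

CLC number: TP391
Document code: A
DOI: 10.19358/j.issn.2096-5133.2022.06.009
Citation: Lin Zhixiong, Wu Lijun, Chen Zhicong. Unsupervised single target tracking algorithm based on attention mechanism[J]. Information Technology and Network Security, 2022, 41(6): 50-56.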

Unsupervised single target tracking algorithm based on attention mechanism

Lin Zhixiong, Wu Lijun, Chen Zhicong

(College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China)

Abstract: To improve target tracking accuracy, this paper designs an unsupervised single-target tracking algorithm based on an attention mechanism. The algorithm uses DCFNet as its base network and achieves unsupervised tracking through forward tracking and backward verification. To incorporate context information, a feature fusion method is introduced, in which the features extracted by each layer of DCFNet are adjusted in resolution by bilinear pooling before fusion. To capture the relationships between feature channels, the SENet channel attention module is introduced. A reverse frame-by-frame verification method is also designed, in which the first frame is predicted on the basis of the reverse verification of the intermediate frame, thereby reducing the error of the predicted position. Test results on the public OTB-2015 dataset show that the algorithm reaches an AUC score of 60.6% at a speed of 61 FPS. Compared with the unsupervised single-target tracking algorithm UDT, the designed algorithm achieves better tracking performance.

Key words: target tracking; unsupervised learning; feature fusion; attention mechanism

0 Introduction

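Target tracking is widely used in fields such as video surveillance and autonomous driving. Given the target's position in the first frame of a video, the task is to determine the target's position in the subsequent frames. Accurately and efficiently detecting and localizing the target remains difficult in scenes with occlusion, deformation, or background clutter.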

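Because deep networks strengthen feature representation, they are widely used in visual target tracking. TAO et al. proposed the SINT network[1], the first to extract features with a Siamese network; it performs tracking by matching the appearance of the initial target against candidate image locations. BERTINETTO et al. proposed the SiamFC (Siamese Fully Convolutional) network[2], which uses an offline-trained fully convolutional Siamese network as the base network of the tracking system and greatly improves tracking performance. LI et al.[3] proposed the SiamRPN network, which adds a region proposal network (RPN) module[4] to SiamFC so that the tracker can regress the target's position and shape, further improving both performance and speed. Until then, Siamese trackers had mostly used fairly shallow networks, largely because padding in deep networks breaks translation invariance and degrades tracking performance. LI et al.[5] introduced a spatially balanced sampling strategy during training to alleviate the position bias the network otherwise learns, which allowed ResNet[6] to serve as the backbone of SiamRPN, so that tracking performance is no longer limited by network capacity.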
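The appearance matching used by Siamese trackers such as SiamFC can be illustrated with a dense cross-correlation between template features and search-region features. The following is only a minimal NumPy sketch under stated assumptions: the arrays stand in for CNN embeddings, and the function names `cross_correlation` and `locate` are hypothetical, not taken from any of the cited trackers.

```python
import numpy as np

def cross_correlation(search, template):
    """Slide the template over the search features and score every offset.

    search:   array of shape (C, H, W), stand-in for search-region features
    template: array of shape (C, h, w), stand-in for target features
    returns:  response map of shape (H - h + 1, W - w + 1)
    """
    _, H, W = search.shape
    _, h, w = template.shape
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(response.shape[0]):
        for j in range(response.shape[1]):
            # Inner product between the template and the window at (i, j)
            response[i, j] = np.sum(search[:, i:i + h, j:j + w] * template)
    return response

def locate(search, template):
    """Return the (row, col) offset with the highest matching score."""
    response = cross_correlation(search, template)
    return np.unravel_index(np.argmax(response), response.shape)
```

In a real tracker the two inputs would come from the shared (Siamese) feature extractor, and the correlation would run as a convolution on the GPU rather than as Python loops.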

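All of the single-target tracking models above rely on supervised learning, which requires large labeled datasets, and manual labeling is both expensive and time-consuming. Because large amounts of unlabeled video are available on the Internet, unsupervised target tracking algorithms have greater practical value. WANG et al.[7] proposed the UDT (Unsupervised Deep Tracking) model, which computes a consistency loss between the results of forward tracking and backward prediction so that the model can be optimized without labels. However, if the tracker predicts a wrong position during forward tracking, the backward pass may still bring it back to the correct position; the erroneous forward prediction then goes unpenalized, which lowers tracking performance. To address this, WANG et al. further proposed the UDT+ model[8], which penalizes erroneous forward predictions through multi-frame validation and improves the accuracy of position prediction.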
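The forward-backward consistency idea can be sketched with a closed-form correlation filter in the Fourier domain. This is a simplified illustration, not the UDT or DCFNet implementation: it assumes a single pair of frames, uses raw multi-channel arrays in place of learned CNN features, and omits cropping and windowing. The names `gaussian_label`, `dcf_response`, and `consistency_loss`, and the values of `sigma` and `lam`, are assumptions chosen for the example.

```python
import numpy as np

def gaussian_label(height, width, sigma=2.0):
    """Gaussian response map peaked at the patch centre (the initial label)."""
    ys, xs = np.mgrid[:height, :width]
    cy, cx = height // 2, width // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def dcf_response(template, label, search, lam=1e-4):
    """Fit a correlation filter on (template, label) and apply it to search.

    template, search: arrays of shape (C, H, W); label: array of shape (H, W).
    """
    X = np.fft.fft2(template)                         # per-channel 2-D FFT
    Z = np.fft.fft2(search)
    Y = np.fft.fft2(label)
    energy = np.sum(np.abs(X) ** 2, axis=0) + lam     # regularised denominator
    W = X * np.conj(Y) / energy                       # ridge-regression filter
    return np.real(np.fft.ifft2(np.sum(np.conj(W) * Z, axis=0)))

def consistency_loss(frame1, frame2, lam=1e-4):
    """Track frame1 -> frame2, then back to frame1, and compare with the label."""
    label = gaussian_label(*frame1.shape[1:])
    forward = dcf_response(frame1, label, frame2, lam)      # forward tracking
    backward = dcf_response(frame2, forward, frame1, lam)   # pseudo-label pass
    return np.mean((backward - label) ** 2)
```

When the second frame is just a circular shift of the first, the backward response returns almost exactly to the initial label and the loss is close to zero; for unrelated content the loss is clearly larger. Note that this two-frame loss only compares the response returned to the first frame with the initial label; it never checks the forward prediction on its own.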

The full text of this article can be downloaded from: http://www.chinaaet.com/resource/share/2000004535

