
Fine-grained image classification based on dual attentions and multi-region detection

Application of Electronic Technique, 2022, Issue 8

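Pan Xinchen, Yang Xiaojian, Qin Ling

College of Computer Science and Technology, Nanjing University of Technology, Nanjing 211816, Jiangsu, China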

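Abstract: Effectively detecting discriminative local regions and extracting fine-grained image features more accurately both help improve fine-grained image classification. To this end, a fine-grained image classification method that combines a dual attention mechanism with multi-region detection is proposed. Multi-region detection learns to locate discriminative image regions from class labels; a feature extraction network then extracts the features of these discriminative local regions and fuses them with the global features. Likewise, a more precise feature extraction network can extract finer-grained image features. By combining the dual attention mechanism with multi-region detection, the proposed method achieves accuracies of 88.3%, 94.5%, and 92.3% on three public fine-grained image datasets, CUB-200-2011, StanfordCars, and FGVC Aircraft, respectively.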

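Keywords: fine-grained image classification; attention mechanism; region detection; convolutional neural network; feature extraction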

CLC number: TP301.6
Document code: A
DOI: 10.16157/j.issn.0258-7998.211980
Chinese citation format: 潘新辰，杨小健，秦岭. 基于双注意力和多区域检测的细粒度图像分类[J]. 电子技术应用，2022，48(8)：117-122.
English citation format: Pan Xinchen, Yang Xiaojian, Qin Ling. Fine-grained image classification based on dual attentions and multi-region detection[J]. Application of Electronic Technique, 2022, 48(8): 117-122.

Fine-grained image classification based on dual attentions and multi-region detection

Pan Xinchen, Yang Xiaojian, Qin Ling

College of Computer Science and Technology, Nanjing University of Technology, Nanjing 211816, China

Abstract: Effectively detecting discriminative local regions and extracting the fine-grained features of images more accurately help improve fine-grained image classification. For this reason, a fine-grained image classification method combining a dual attention mechanism and multi-region detection is proposed. Multi-region detection aims to locate discriminative image regions by learning from class labels; the features of these discriminative local regions are then extracted by a feature extraction network and fused with the global features. Similarly, a more precise feature extraction network can extract the fine-grained features of an image. By combining the dual attention mechanism and multi-region detection, the proposed method achieves 88.3%, 94.5%, and 92.3% accuracy on three public fine-grained image datasets, CUB-200-2011, StanfordCars, and FGVC Aircraft, respectively.

Key words: fine-grained image classification; attention mechanism; region detection; convolutional neural network; feature extraction; feature group

0 Introduction

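Deep learning is now widely used in image classification. Fine-grained image classification aims to distinguish between subcategories of the same kind of object. Compared with conventional image classification, it is difficult for two reasons: (1) different categories are highly similar, which makes it hard to find discriminative regions and extract detailed features; (2) images within the same category also differ to some extent because of variations in viewpoint, illumination, background, and occlusion. How to locate discriminative local regions and how to extract fine-grained features more precisely have therefore become the main research directions for fine-grained image classification methods.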

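To detect discriminative local regions, some methods^[1-2] rely on manual annotation of the discriminative regions in fine-grained images and then train the network to locate those regions, which improves the classification accuracy of the model. However, annotating the images takes a great deal of time and labor, so the cost is too high. Other methods^[3-4] use class labels to learn discriminative local regions in a weakly supervised manner. Although these methods cannot match the results of supervised learning with manually annotated labels, they incur almost no additional cost.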

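Attention mechanisms, an important means of improving a network's feature extraction ability^[5], fall mainly into channel attention and spatial attention. Channel attention learns the weight relationships among channels, while spatial attention learns the dependencies among pixels. Using the two mechanisms together allows image features to be extracted at a finer granularity, which in turn leads to better classification.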
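The two kinds of attention can be illustrated with a minimal, dependency-free sketch. It is not the network proposed in this paper, whose modules are not described in this excerpt. The function names (`channel_attention`, `spatial_attention`, `dual_attention`), the parameter-free sigmoid gates, and the channel-then-spatial order are all assumptions made for illustration; real attention modules learn their weights through trainable layers.

```python
import math
from typing import List

# Feature map indexed as [channel][row][col]; a plain-list stand-in for a CNN tensor.
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    """Squash a real value into (0, 1) so it can act as a soft gate."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat: FeatureMap) -> List[float]:
    """Return one gate per channel: sigmoid of the channel's global average."""
    gates = []
    for channel in feat:
        count = sum(len(row) for row in channel)
        gates.append(sigmoid(sum(map(sum, channel)) / count))
    return gates


def spatial_attention(feat: FeatureMap) -> List[List[float]]:
    """Return one gate per pixel: sigmoid of the mean across channels."""
    channels = len(feat)
    height, width = len(feat[0]), len(feat[0][0])
    return [
        [sigmoid(sum(feat[c][i][j] for c in range(channels)) / channels)
         for j in range(width)]
        for i in range(height)
    ]


def dual_attention(feat: FeatureMap) -> FeatureMap:
    """Apply channel gating, then spatial gating (the order is an assumption)."""
    ch_gates = channel_attention(feat)
    scaled = [[[v * g for v in row] for row in channel]
              for channel, g in zip(feat, ch_gates)]
    sp_gates = spatial_attention(scaled)
    return [[[v * sp_gates[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(channel)]
            for channel in scaled]
```

Calling `dual_attention(feat)` on a nested list of shape channels × height × width returns a reweighted map of the same shape. Channels with a larger average response and pixels with a larger cross-channel response are attenuated less than the others.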

For the full text of this article, please download: http://www.chinaaet.com/resource/share/2000004663

