
Research on semantic segmentation of mine sub-illumination images based on self-attention mechanism

  • Abstract: Image semantic segmentation is introduced to segment target objects in the sub-illumination (low-light) environment of mines. Images are divided into two classes: original clear images and sub-illumination images. Images captured under sub-illumination conditions are detail-enhanced with a deep-learning-based image enhancement method and replaced by their enhanced versions. The dataset is then expanded with a homography transformation algorithm, yielding a standard dataset for semantic segmentation of mine roadway images. A lightweight encoder-decoder network based on the self-attention mechanism is proposed, with the DeepLab V3+ encoder-decoder network as its base network. In the encoder, deep and shallow semantic features of the mine image are extracted; the deep features are activated by a lightweight self-attention module, while the shallow features are passed directly to the decoder. In the decoder, the deep and shallow features are concatenated, the original image size is restored, and the segmentation result is output. Comparative image-prediction experiments against traditional algorithms show the following. In terms of network complexity, for a 3-channel 512×512-pixel image, the theoretical computational cost of the network is only 48.80 G FLOPs and the parameter count is only 11.90 M. In terms of segmentation accuracy, the mean intersection over union is 76.50% and the mean pixel accuracy is 87.75%, ahead of other mainstream networks. In terms of speed, inference takes 0.032 s per image, which meets the requirements of a lightweight network.
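The two accuracy figures quoted in the abstract are usually reported as mean intersection over union (mIoU) and mean pixel accuracy (MPA). The abstract does not give the exact formulas used in the paper, so the sketch below assumes the common class-averaged definitions computed from a confusion matrix. The function names and the toy labels are hypothetical and serve only as illustration.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average of per-class IoU = TP / (ground truth + predicted - TP).

    Classes that appear in neither the labels nor the predictions are skipped
    (an assumption; other conventions exist).
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        gt = sum(cm[c])                         # pixels labelled as class c
        pred = sum(cm[r][c] for r in range(n))  # pixels predicted as class c
        union = gt + pred - tp
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious)


def mean_pixel_accuracy(cm: List[List[int]]) -> float:
    """Average of per-class accuracy = TP / ground-truth pixels of that class."""
    accs = [cm[c][c] / sum(cm[c]) for c in range(len(cm)) if sum(cm[c]) > 0]
    return sum(accs) / len(accs)


# Toy example: a flattened 2x2 label map with two classes.
labels = [0, 0, 1, 1]
preds = [0, 1, 1, 1]
cm = confusion_matrix(labels, preds, num_classes=2)
print(mean_iou(cm), mean_pixel_accuracy(cm))
```

In practice the confusion matrix would be accumulated over every pixel of every test image before the two averages are taken, rather than computed per image.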

     
