Visible–infrared person re-identification based on key-point feature extraction and optimization☆
Introduction
Person re-identification (Re-ID) [1], [2], [3] aims to find designated persons across disjoint cameras, and has been widely applied in urban management, public security, the construction of smart cities, and other fields. Many recent studies on Re-ID are limited to visible person images captured by single-modality cameras, and rely mainly on person appearance features that are distinctive under visible-light conditions. At night or in dim conditions, however, a camera cannot capture visible person images clearly. To keep working under insufficient light, most current surveillance cameras automatically switch from the visible-light (RGB) modality to the near-infrared (IR) modality and capture infrared person images. RGB images are captured under visible light and therefore contain three channels of color information, whereas infrared images captured in the near-infrared modality contain only a single channel of invisible-spectrum information. Since color information cannot be exploited in this case, matching persons across the visible and infrared modalities is difficult. Some studies focus on the Euclidean distance between images of the two modalities and realize feature alignment by directly minimizing that distance, similar to techniques used in traditional target detection [4], [5], [6]. However, owing to the large visual gap between the two modalities, such methods rarely achieve ideal performance. Inspired by traditional single-modality person re-identification, some methods [7], [8] use neural networks to learn global features shared by the two modalities, but intra-modality differences may still lead to image mismatches. Part-based methods [9], [10], [11] seek effective local features in images with large modality gaps; however, when two images differ greatly, the local features usually contain more invalid information and can easily cause mismatches at the matching stage.
Inspired by generative adversarial networks (GANs), some methods [12], [13] use GANs to generate images and translate images from one modality into the other. However, because Re-ID is a demanding recognition task in which the identity labels of the training and test sets may not overlap, a GAN trained on the training identities may fail to generate satisfactory images at test time. Other methods [14], [15] use local and global features simultaneously to find fine-grained and coarse-grained discriminative features. How to exploit features of different scales to learn modality-invariant discriminative features remains a problem worth studying. In addition, the color information differs greatly between images of the two modalities, and interference factors such as occlusion and shadow in the environment make it difficult for a network model to extract effective features. All of these problems hinder the extraction of discriminative features from images of different modalities. We therefore propose a multi-hop attention graph convolution network (MAGC) and a self-attention semantic perception layer (SSPL) based on high-order person information to address them. Several works [13], [16], [17] have shown that key-point features can be strongly resistant to noise interference. Our main idea is therefore to take person key-point features as the basic unit and construct a person graph feature matrix through a predefined adjacency matrix, so that the relationships between person key-points are preserved. Graph convolution networks [18] have a natural advantage in handling topological relationship features and, when supplemented by a multi-hop attention structure, offer good resistance to noise in VI-ReID. The goal of SSPL is to autonomously locate the more discriminative regions in the person features learned by MAGC while suppressing the adverse effect of irrelevant regions on discrimination.
In summary, we design a network that uses two convolutional neural networks to extract person key-point features, and transforms the initially scattered key-point features into person key-point graph features using a predefined adjacency matrix that preserves the connections between pairs of key-points. The graph features are then passed into MAGC and SSPL to learn high-order person information. In this way, useful identification information can be extracted from the limited information shared by the two modalities. At the same time, key-point-based feature learning effectively alleviates the problem of local features becoming unavailable due to noise or occlusion.
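The graph construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 14-node skeleton and its edge list are assumptions, and the symmetric normalization D^-1/2 (A+I) D^-1/2 is the standard GCN choice rather than one the paper specifies.

```python
import numpy as np

# Hypothetical 14-key-point skeleton; this edge list is an assumption,
# not the paper's exact predefined topology.
NUM_KEYPOINTS = 14
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5), (1, 6),
         (6, 7), (7, 8), (6, 9), (9, 10), (6, 11), (11, 12), (12, 13)]

def build_adjacency(num_nodes, edges):
    """Symmetric adjacency with self-loops, normalized as D^-1/2 (A+I) D^-1/2."""
    A = np.eye(num_nodes)                      # self-loops
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0                # undirected body connections
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A @ D_inv_sqrt

A_hat = build_adjacency(NUM_KEYPOINTS, EDGES)

# Key-point features X (K x C) extracted by the backbone form one graph sample;
# a single graph-convolution layer then propagates information along body
# connections: H = ReLU(A_hat @ X @ W).
rng = np.random.default_rng(0)
X = rng.standard_normal((NUM_KEYPOINTS, 256))  # one feature vector per key-point
W = rng.standard_normal((256, 128))            # learnable projection (random here)
H = np.maximum(A_hat @ X @ W, 0.0)
print(H.shape)
```

Because the adjacency is fixed by the body structure, every person image yields a graph with the same topology, so batches of key-point features can share one `A_hat`.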
Our main contributions are as follows:
(1) We propose a new cross-modality image matching method. The method takes person key-point features as its basic unit and organizes them into graph features according to the actual structure of the human body. Mining person structure information within and across modalities facilitates the learning of VI-ReID features.
(2) We design a multi-hop attention graph convolution network to learn modality-invariant graph features, distributing weights across the network's layers to strengthen resistance to noise interference.
(3) We use a self-attention semantic perception layer to focus on the more informative regions during discrimination while suppressing invalid regions.
(4) We conduct experiments to verify the effectiveness of our proposed method on two VI-ReID datasets and a holistic Re-ID dataset.
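The multi-hop attention idea in contribution (2) can be sketched as one layer that aggregates 1-hop through K-hop neighborhoods of the normalized adjacency and fuses them with attention weights. This is a sketch under stated assumptions: the paper does not give this exact formulation, and `hop_scores` and `W`, which would be learned parameters in the real network, are plain inputs here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_hop_attention_gcn(X, A_hat, W, hop_scores):
    """One multi-hop attention graph-convolution layer (sketch).
    The k-th hop propagates features through A_hat^k; the hops are fused
    with attention weights so that noisy hops can be down-weighted."""
    alphas = softmax(hop_scores)               # attention over hops, sums to 1
    out = np.zeros((X.shape[0], W.shape[1]))
    A_k = np.eye(A_hat.shape[0])
    for alpha in alphas:
        A_k = A_k @ A_hat                      # A_hat^1, A_hat^2, ...
        out += alpha * (A_k @ X @ W)
    return np.maximum(out, 0.0)                # ReLU

# Toy example: 3 key-points in a chain, fused over 3 hops.
A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])                   # adjacency with self-loops
d = A.sum(1)
A_hat = A / np.sqrt(np.outer(d, d))            # D^-1/2 A D^-1/2
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 8))
W = rng.standard_normal((8, 4))
H = multi_hop_attention_gcn(X, A_hat, W, hop_scores=np.array([0.5, 0.3, 0.2]))
print(H.shape)
```

Mixing hops this way lets distant but related key-points (e.g. a hand and the torso) contribute to each node's representation without stacking many layers.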
Section snippets
Single-modality Person Re-ID
The purpose of person Re-ID is to find persons with the same identity in images taken by different visible-light cameras. Deep learning methods are currently widely studied [19], [20], [21]. Sun et al. [19] constructed a new baseline network with a partial pooling layer for refinement, so as to redistribute extreme local feature information. He et al. [20] proposed a reconstruction algorithm that learns person features in combination with a CNN and replaced the original pixel-level feature …
Proposed approach
We propose a two-module graph convolution model based on high-order person information. One module is a semantic information extraction module for feature extraction and integration; the other is a semantic information relationship-building module for feature optimization learning. The model connects key-points according to the actual positions of the human-body key-points and integrates them into a graph. The feature information in graph form is then passed into the …
Experimental settings
We verify our method on two VI-ReID datasets (RegDB [38] and SYSU-MM01 [27]) and one RGB Re-ID dataset (Market-1501 [39]), and we use mean average precision (mAP) and Cumulative Matching Characteristic (CMC) curves as our evaluation criteria to evaluate the model performance.
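The mAP and CMC criteria mentioned above can be computed from a query-by-gallery distance matrix as follows. This is a minimal sketch: it assumes every query has at least one gallery match and omits protocol details such as the same-camera gallery filtering used in the official SYSU-MM01 evaluation.

```python
import numpy as np

def evaluate_rank(dist, q_ids, g_ids):
    """CMC curve and mAP from a (num_query x num_gallery) distance matrix.
    Sketch only: assumes each query identity appears in the gallery."""
    num_q, num_g = dist.shape
    cmc = np.zeros(num_g)
    aps = []
    for i in range(num_q):
        order = np.argsort(dist[i])                     # gallery sorted by distance
        matches = (g_ids[order] == q_ids[i]).astype(float)
        cmc[np.argmax(matches):] += 1                   # from first correct rank on
        hits = np.cumsum(matches)
        precision = hits / np.arange(1, num_g + 1)      # precision at each rank
        aps.append((precision * matches).sum() / matches.sum())
    return cmc / num_q, float(np.mean(aps))

# Two queries, each closest to its correct gallery sample -> perfect scores.
dist = np.array([[0.1, 0.9, 0.8],
                 [0.7, 0.2, 0.9]])
q_ids = np.array([0, 1])
g_ids = np.array([0, 1, 2])
cmc, mAP = evaluate_rank(dist, q_ids, g_ids)
print(cmc[0], mAP)   # 1.0 1.0
```

CMC at rank k is the fraction of queries whose first correct match appears within the top-k gallery results; mAP averages the per-query average precision over the whole ranking.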
RegDB [38] is a VI-ReID dataset captured by a visible camera and a far-infrared camera from the front, back, and sides of 412 different persons. The dataset contains 8240 images in total, with each person having 10 RGB …
Conclusions
In this paper, we propose a multi-hop attention graph convolution network (MAGC) and a self-attention semantic perception layer (SSPL) to learn more discriminative features from images. By adding a multi-hop attention mechanism to a conventional graph convolution network, we alleviate the fitting problems that a deep stack of layers causes in conventional networks. At the same time, a self-attention semantic perception layer is added after the convolution network to make the model …
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The work is partially supported by the National Natural Science Foundation of China (Nos. U1836216, 62176144, 62076153), the Major Fundamental Research Project of Shandong, China (No. ZR2019ZD03), and the Taishan Scholar Project of Shandong Province, China (No. ts20190924).
References (47)
- et al., Person re-identification based on multi-scale feature learning, Knowl.-Based Syst. (2021)
- et al., Cross-modality paired-images generation and augmentation for RGB-infrared person re-identification, Neural Netw. (2020)
- et al., A three-stage learning approach to cross-domain person re-identification, Appl. Soft Comput. (2021)
- et al., Person re-identification by enhanced local maximal occurrence representation and generalized similarity metric learning, Neurocomputing (2018)
- et al., Deep learning for person re-identification: A survey and outlook (2020)
- et al., A survey of open-world person re-identification, IEEE Trans. Circuits Syst. Video Technol. (2020)
- et al., Unsupervised person re-identification by deep asymmetric metric embedding, IEEE Trans. Pattern Anal. Mach. Intell. (2020)
- et al., Person reidentification via ranking aggregation of similarity pulling and dissimilarity pushing, IEEE Trans. Multim. (2016)
- Mang Ye, Xiangyuan Lan, Jiawei Li, Pong C. Yuen, Hierarchical Discriminative Learning for Visible Thermal Person…
- et al., A systematic evaluation and benchmark for person re-identification: Features, metrics, and datasets, IEEE Trans. Pattern Anal. Mach. Intell. (2019)
- Person re-identification by saliency learning, IEEE Trans. Pattern Anal. Mach. Intell.
- Group re-identification with multi-grained matching and integration
- DCR: A unified framework for holistic/partial person ReID, IEEE Trans. Multim.
☆ This paper has been recommended for acceptance by Zicheng Liu.