Learning to Recognize Chest-Xray Images Faster and More Efficiently Based on Multi-Kernel Depthwise Convolution

The development of convolutional neural networks (CNNs) has advanced computer-aided diagnostic systems. Details in medical images, such as texture and tissue structure, are crucial features for diagnosis. Recent research on chest X-ray diagnosis therefore combines large input images with deep convolutional neural networks to boost performance. Meanwhile, because thoracic diseases vary in size, many researchers have introduced additional modules to capture multi-scale image features in CNNs. However, these efforts rarely consider the computational cost of the large inputs and of the additional modules. This paper aims to diagnose diseases on chest X-ray images automatically, quickly, and effectively. We propose the multi-kernel depthwise convolution (MD-Conv), which contains depthwise convolution kernels of different sizes in one depthwise convolution layer. MD-Conv is computationally efficient and has few parameters. Because its multi-size kernels let it learn multi-scale features, it is well suited to medical image diagnosis tasks in which abnormalities vary in size. In addition, MD-Conv adopts larger depthwise convolution kernels to obtain a larger receptive field efficiently, which ensures a sufficient receptive field for high-resolution inputs. MD-Conv can easily replace the normal depthwise convolution layer in modern lightweight networks. We conduct experiments on the Chest X-ray 14 dataset, the largest available chest X-ray dataset, and obtain competitive results. We also evaluate MD-Conv on a newly released dataset for pediatric pneumonia diagnosis, where we obtain an AUC of 98.3% for recognizing pneumonia versus normal, better than the original paper (96.8%). Finally, we compare the FLOPs and parameter counts of different models to show their efficiency for chest X-ray recognition.


I. INTRODUCTION
The development of convolutional neural networks (CNNs) has brought dramatic breakthroughs in a series of computer vision tasks and has also advanced computer-aided diagnosis systems. The volume of medical images in hospitals has grown exponentially, and disease screening is a time-consuming task for radiologists. A computer-aided diagnosis system can perform preliminary screening and reduce the burden on radiologists.
Chest X-ray is one of the most accessible radiology examinations in the world. Research on chest X-rays includes thoracic disease identification and localization [34], lung region segmentation [19], and disease report generation [35]. In all these studies, details in the medical images, such as the textures and structure of lung tissue, are crucial features for diagnosis. For this reason, the Chest X-ray 14 dataset keeps 1024 × 1024 bitmap images [34] to preserve details, exceeding the 512 × 512 images in the OpenI dataset [3]. Similarly, most global image-based CNN methods adopt large inputs of 512 × 512 or even 1024 × 1024 [2], [18], [37], while some local image-based CNN methods use 224 × 224 inputs [7], which are large enough to capture the details of local image regions. On the other hand, the pathologies in chest X-ray images vary widely in shape and size. We conduct a statistical analysis of the sizes of eight common thoracic pathologies based on the 984 bounding boxes provided by the Chest X-ray 14 dataset. As shown in FIGURE 1, the sizes of the eight common thoracic pathologies span a wide range, and even different instances of one pathology, such as Infiltrate, have different sizes. This requires learning multi-scale convolutional features in the CNN. Some recent work attempts to solve this problem by fusing information from multiple resolutions [39], [41].
The associate editor coordinating the review of this manuscript and approving it for publication was Yongqiang Zhao.
VOLUME 8, 2020
Because of the enlarged inputs mentioned above, deeper networks are usually adopted to ensure that the network receptive field is large enough. Many works choose ResNet-50 [13] and DenseNet-121 [5] to extract CNN features [18], [22]. Although this improves performance, combining large inputs with deep networks brings large computational costs and parameter counts, increasing the time needed for network training and optimization. For example, doubling the input side length quadruples the number of pixels and therefore roughly quadruples the training time. Multi-resolution feature fusion also costs computation time and storage space. This is not conducive to future deployment on mobile and embedded systems.
In this paper, we focus on enlarging the network receptive field and learning multi-scale features efficiently. We first leverage lightweight networks. There are many excellent lightweight networks, such as MobileNet [10], ShuffleNet [40], MobileNet-v2 [23], ShuffleNet-v2 [21], and MobileNet-v3 [1]. These networks take the structures of VGG [28] and ResNet [14] as reference and replace normal convolution with depthwise convolution to reduce parameters and FLOPs while maintaining accuracy. By carefully balancing computation and accuracy during the design process, these networks achieve good performance on the ImageNet dataset. In addition, we propose the multi-kernel depthwise convolution (MD-Conv), which captures multi-scale features in one convolution layer without introducing extra layers or blocks. Larger 5 × 5 depthwise convolution kernels are also adopted in MD-Conv to obtain a larger receptive field efficiently. MD-Conv is well suited to medical images with abnormalities of various sizes.
We replace the normal depthwise convolution with the proposed MD-Conv in popular lightweight networks and evaluate the modified models on two public datasets: Chest X-ray14, the largest available chest X-ray dataset, and Chest X-ray2017, a recently released chest X-ray dataset for pediatric pneumonia diagnosis. We achieve competitive results on both datasets. The modified network with MD-Conv can recognize chest X-rays quickly and effectively.
The contributions of this paper are as follows.
(1) Unlike recent methods that add complex networks and additional blocks to improve performance, we adopt a lightweight network to recognize chest X-rays quickly. It requires few model parameters and is suitable for deployment on embedded systems.
(2) We study the problem of multi-scale feature learning. Motivated by the varied sizes of thoracic diseases and the enlarged inputs, we propose MD-Conv, which helps the network learn multi-scale features of different thoracic diseases and improves network performance.
(3) The modified MobileNet-v2 with MD-Conv achieves competitive results on the Chest X-ray 2017 dataset and the Chest X-ray 14 dataset.

II. RELATED WORK
A. DEEP LEARNING FOR CHEST X-RAY DIAGNOSIS
Wang et al. [34] released the largest chest X-ray dataset and used different models to recognize and locate thoracic diseases. Since then, a series of studies have built on this large dataset, covering image classification, weakly supervised localization, and medical report generation. In [34], four classic CNN models, AlexNet [16], GoogleNet [31], VGG16 [28], and ResNet-50 [13], are compared in the proposed DCNN framework for disease localization. In [22], CheXNet is proposed and demonstrates that DenseNet [5] performs much better on chest X-ray image recognition. Cascaded ConvNets [17], a global-local fusion method [7], and multi-scale feature methods were then proposed to improve recognition performance, and all these works use ResNet or DenseNet as the base network for feature extraction. Among them, ResNet50 and DenseNet121 are the most widely used models. Reference [17] also uses DenseNet161, and [38] reduces the number of Conv-Blocks within a DenseBlock to four to obtain a lighter model.
In [15], the authors use a CNN to diagnose pediatric pneumonia on chest X-ray images. Using transfer learning, the method reaches an AUC of 96.8% for distinguishing pneumonia from normal on their chest X-ray dataset. It adopts the efficient Inception-v3 [32] as the base network.

B. MULTI-SCALE METHODS IN COMPUTER VISION TASKS
Although CNNs are fairly robust when recognizing objects of different sizes, how to obtain a multi-scale feature representation remains an important issue in many computer vision tasks.
Traditional approaches use image pyramids to obtain more accurate results, for example the multi-scale testing in [13]. Most state-of-the-art methods instead use features from different layers to obtain an inherently multi-scale representation within the network. FPN [33] uses upsampling and lateral connections to generate a feature pyramid, and SSD [36] reuses the multi-scale feature maps from different layers. FCN [20] is one of the earliest methods to fuse multi-scale representations in semantic segmentation. Recently, HRNet [30] for human pose estimation performs repeated multi-scale fusion to achieve state-of-the-art results.
In medical deep learning, many multi-scale CNNs have likewise been proposed to learn multi-scale features of abnormalities. In [24], [39], additional blocks are added to fuse the multi-scale features from different layers.

C. EFFICIENT NETWORK DESIGN
With the development of convolutional neural networks, researchers have become interested in efficient model design. GoogleNet [31] is one of the earliest networks designed for computational efficiency. Since depthwise convolution was proposed [27], it has been used in modern lightweight networks to replace normal convolution for its efficiency and effectiveness. There are five popular lightweight networks: MobileNet [10], ShuffleNet [40], MobileNet-v2 [23], ShuffleNet-v2 [21], and MobileNet-v3 [1]. Among them, MobileNet modifies the VGG structure with depthwise convolution, while the others adopt ResNet-like structures. The lightweight network CondenseNet [11] is built on the extensive and explicit feature reuse of DenseNet.
The recent work HetConv [29] is similar to ours. However, it focuses on efficient convolution computation and designs heterogeneous convolutional filters from 1 × 1 and 3 × 3 convolutions, whereas our work focuses on the multi-scale feature learning ability and the large receptive field of large convolution kernels. Therefore, 3 × 3 and 5 × 5 depthwise convolutions are adopted in the proposed MD-Conv, and 1 × 1 kernels are not recommended. The separated channels of depthwise convolution make it well suited to implementing a multi-kernel depthwise convolution.

III. METHOD
We aim to build a CNN that recognizes chest X-ray images quickly and efficiently. Instead of the widely used ResNet50 and DenseNet121, we adopt the lightweight network MobileNet-v2. The framework for chest X-ray image recognition is shown in FIGURE 2. To learn multi-scale features, we propose MD-Conv to replace the depthwise convolution in MobileNet-v2.

A. THE MULTI-KERNEL DEPTHWISE CONVOLUTION
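A standard convolution layer takes an input feature map $I \in \mathbb{R}^{H \times W \times M}$ and outputs a feature map $O \in \mathbb{R}^{H \times W \times N}$. Here we assume that the spatial width ($W$) and height ($H$) of the feature map are constant, $M$ is the number of input channels, and $N$ is the number of output channels. In a depthwise convolution layer (DW), each filter connects to only one input channel. A pointwise convolution layer (PW) is then added after DW to compute a linear combination of the DW outputs. The feature map operations of the standard convolution layer and of DW + PW are shown in FIGURE 3(a) and (b), respectively. For a standard convolution layer with kernel size $K \times K$, the computational cost and the number of parameters are:
$$\mathrm{Cost}_{\mathrm{conv}} = K^2 \times M \times N \times H \times W, \qquad \mathrm{Params}_{\mathrm{conv}} = K^2 \times M \times N. \tag{1}$$
For a $K \times K$ depthwise convolution paired with a $1 \times 1$ pointwise convolution, the computational cost and the number of parameters are:
$$\mathrm{Cost}_{\mathrm{DW+PW}} = K^2 \times M \times H \times W + M \times N \times H \times W, \qquad \mathrm{Params}_{\mathrm{DW+PW}} = K^2 \times M + M \times N. \tag{2}$$
Dividing the DW + PW cost by the standard convolution cost gives $\frac{1}{N} + \frac{1}{K^2}$. Because the number of output channels $N$ is usually much larger than $K^2$, the computational cost and parameters of DW + PW are approximately $\frac{1}{K^2}$ times those of the normal convolution. We introduce the multi-kernel depthwise convolution (MD-Conv), which contains both 3 × 3 and 5 × 5 depthwise convolutions (DWConv) in one layer. The feature map computation of MD-Conv + PW is shown in FIGURE 3(c). Each depthwise convolution kernel in the MD-Conv layer corresponds to only one input channel, so MD-Conv is easy to implement with a channel split operation. We slice the input feature along the channel dimension as $I = [i_{3\times3}, i_{5\times5}]$, where $i_{3\times3}$ is the input to the 3 × 3 DWConv and $i_{5\times5}$ is the input to the 5 × 5 DWConv. As shown in FIGURE 3(c), the yellow feature maps correspond to $i_{3\times3}$ and the green feature maps correspond to $i_{5\times5}$. The output of the DWConv is therefore
$$O_{\mathrm{DW}} = \left[\mathrm{DW}_{3\times3}(i_{3\times3}),\ \mathrm{DW}_{5\times5}(i_{5\times5})\right], \tag{3}$$
where $[\cdot,\cdot]$ denotes concatenation along the channel dimension. The PW then follows to fuse the separate channels of MD-Conv.
For an MD-Conv with an input feature map of size $H \times W \times M$ and an output feature map of the same size, the computational cost and the number of parameters are:
$$\mathrm{Cost}_{\mathrm{MD}} = (9 \times i_{3\times3} + 25 \times i_{5\times5}) \times H \times W, \qquad \mathrm{Params}_{\mathrm{MD}} = 9 \times i_{3\times3} + 25 \times i_{5\times5}, \tag{4}$$
where, with a slight abuse of notation, $i_{3\times3}$ and $i_{5\times5}$ also denote the numbers of channels assigned to the 3 × 3 and 5 × 5 kernels, so that $i_{3\times3} + i_{5\times5} = M$. For a 3 × 3 DWConv with the same inputs and outputs, the computational cost is $9 \times (i_{3\times3} + i_{5\times5}) \times H \times W$ and the number of parameters is $9 \times (i_{3\times3} + i_{5\times5})$. MD-Conv therefore adds only a small number of parameters compared with the 3 × 3 DWConv.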
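The cost and parameter counts discussed above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names and the example layer sizes are hypothetical, bias terms are ignored, and "cost" is counted as multiply-accumulate operations, consistent with the K × K × M × N × H × W form of the standard convolution cost.

```python
def standard_conv_cost(k, m, n, h, w):
    """Return (cost, params) of a K x K standard convolution, M -> N channels."""
    params = k * k * m * n
    return params * h * w, params


def dw_pw_cost(k, m, n, h, w):
    """Return (cost, params) of a K x K depthwise conv plus a 1 x 1 pointwise conv."""
    params = k * k * m + m * n
    return params * h * w, params


def md_conv_cost(c3, c5, h, w):
    """Return (cost, params) of the depthwise part of MD-Conv.

    c3 channels use 3 x 3 kernels and c5 channels use 5 x 5 kernels.
    """
    params = 9 * c3 + 25 * c5
    return params * h * w, params


if __name__ == "__main__":
    k, m, n, h, w = 3, 64, 128, 56, 56  # hypothetical layer sizes
    conv_cost, _ = standard_conv_cost(k, m, n, h, w)
    sep_cost, _ = dw_pw_cost(k, m, n, h, w)
    # The measured ratio matches 1/N + 1/K^2.
    print(sep_cost / conv_cost, 1 / n + 1 / k ** 2)
    # A 1:1 MD-Conv versus a plain 3 x 3 depthwise convolution on M channels.
    print(md_conv_cost(m // 2, m // 2, h, w)[1], md_conv_cost(m, 0, h, w)[1])
```

With a 1:1 split, the depthwise parameter count grows from 9M to 17M, which stays small next to the M × N parameters of the pointwise convolution that follows.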

1) THE EFFICIENT WAY TO IMPROVE RECEPTIVE FIELD
The receptive field plays an important role in CNNs. For visual tasks with low-resolution inputs, the receptive field of standard 3 × 3 filters is sufficient, whereas high-resolution medical images need a larger receptive field to retain more details. The network receptive field is computed as:
$$RF_n = RF_{n-1} + (k_n - 1) \times \prod_{i=1}^{n-1} s_i, \qquad RF_0 = 1, \tag{5}$$
where $n$ indexes the network layer, $k_n$ is the kernel size of layer $n$, and $s_n$ is the stride of layer $n$.
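As an illustration, the receptive field of a stack of layers can be computed from their kernel sizes and strides. This sketch assumes the standard recurrence, in which layer n adds (k_n − 1) times the product of the strides of all earlier layers; the function name and the example layer stacks are hypothetical, and dilation is not modelled.

```python
def receptive_field(layers):
    """Receptive field of stacked layers.

    layers: iterable of (kernel_size, stride) pairs, first layer first.
    """
    rf = 1    # a single input pixel before any layer
    jump = 1  # product of the strides of all layers seen so far
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    print(receptive_field([(3, 1), (3, 1)]))  # two stacked 3 x 3 layers
    print(receptive_field([(5, 1)]))          # one 5 x 5 layer
    print(receptive_field([(3, 2), (5, 1)]))  # a 5 x 5 layer after a stride-2 layer
```

The third case shows why a larger kernel placed after a stride-2 layer is effective: each extra kernel tap there spans two input pixels.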
A simple way to increase the receptive field is to increase the depth of the network, as in DenseNet121. Two stacked 3 × 3 convolutions have a receptive field of 5 × 5, the same as one 5 × 5 convolution, with less computation and fewer parameters. In [32], two stacked 3 × 3 convolutions are used to replace a 5 × 5 convolution for efficiency. For lightweight networks, an efficient way to increase the receptive field is to adopt depthwise convolution with a larger kernel. A 5 × 5 depthwise convolution has the same receptive field as a 5 × 5 convolution and requires much less computation and far fewer parameters. Two stacked 3 × 3 depthwise convolutions are not adopted because they provide no cross talk between channels [40].
Additionally, two stacked 3 × 3 convolutions have a computation cost of 18 × M × N × H × W , while the 5 × 5 depthwise convolution has a computation cost of 25 × M × H × W . The 5 × 5 depthwise convolution has much smaller computation cost while maintaining the same receptive field.
Therefore, the 5 × 5 depthwise convolution part of MD-Conv obtains a large receptive field more efficiently.
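To make the gap concrete, the two cost expressions above can be evaluated for a sample layer. This is a rough sketch that takes the 18 × M × N × H × W and 25 × M × H × W counts from the text as given; the function names and the channel and feature map sizes are hypothetical.

```python
def stacked_3x3_conv_cost(m, n, h, w):
    """Cost of two stacked 3 x 3 standard convolutions, as counted in the text."""
    return 18 * m * n * h * w


def depthwise_5x5_cost(m, h, w):
    """Cost of one 5 x 5 depthwise convolution on M channels."""
    return 25 * m * h * w


if __name__ == "__main__":
    m = n = 96   # hypothetical channel counts
    h = w = 28   # hypothetical feature map size
    ratio = stacked_3x3_conv_cost(m, n, h, w) / depthwise_5x5_cost(m, h, w)
    # The ratio simplifies to 18 * N / 25, so it grows linearly with N.
    print(ratio, 18 * n / 25)
```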

2) COMPARED WITH THE MULTI-SCALE FEATURE FUSION METHOD
Image-level classification generally requires coarse-scale features with high-level semantic information and context, while detection and segmentation need fine-scale features to capture detailed appearance information. Therefore, in [39], feature fusion at different resolutions is used. In [41], a multi-resolution CNN is adopted to recognize nodules of different sizes. These methods all require additional convolution blocks and downsampling or upsampling to integrate information from multiple scales.
MD-Conv is proposed to replace the normal depthwise convolution. In one MD-Conv block, as shown in FIGURE 3(e), MD-Conv extracts multi-scale features with depthwise convolution kernels of multiple sizes, and a 1 × 1 convolution then fuses the information from the different scales. This introduces no additional convolution layers or operations and adds only the few extra parameters of the 5 × 5 depthwise convolution. Moreover, MD-Conv can be easily integrated into modern networks.
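The channel split, the two depthwise convolutions, and the pointwise fusion described above can be sketched in PyTorch as follows. This is an illustrative reading of the description rather than the authors' implementation: the class names `MDConv` and `MDConvBlock`, the `ratio_3x3` argument, and the omission of normalization and activation layers are all assumptions.

```python
import torch
import torch.nn as nn


class MDConv(nn.Module):
    """Depthwise convolution with 3 x 3 kernels on one channel group and 5 x 5 on the other."""

    def __init__(self, channels, stride=1, ratio_3x3=0.5):
        super().__init__()
        self.c3 = int(round(channels * ratio_3x3))  # channels given to 3 x 3 kernels
        self.c5 = channels - self.c3                # remaining channels use 5 x 5 kernels
        if self.c3 < 1 or self.c5 < 1:
            raise ValueError("both kernel sizes need at least one channel")
        # groups == number of channels makes each convolution depthwise.
        self.dw3 = nn.Conv2d(self.c3, self.c3, kernel_size=3, stride=stride,
                             padding=1, groups=self.c3, bias=False)
        self.dw5 = nn.Conv2d(self.c5, self.c5, kernel_size=5, stride=stride,
                             padding=2, groups=self.c5, bias=False)

    def forward(self, x):
        # Split along the channel dimension, convolve each group, then concatenate.
        x3, x5 = torch.split(x, [self.c3, self.c5], dim=1)
        return torch.cat([self.dw3(x3), self.dw5(x5)], dim=1)


class MDConvBlock(nn.Module):
    """MD-Conv followed by a 1 x 1 pointwise convolution that fuses the two scales."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.md = MDConv(in_channels, stride=stride)
        self.pw = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pw(self.md(x))


if __name__ == "__main__":
    block = MDConvBlock(32, 64, stride=2)
    out = block(torch.randn(2, 32, 56, 56))
    print(out.shape)
```

The paddings of 1 and 2 keep the two branches at the same spatial size for any stride, so their outputs can be concatenated directly.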

B. HOW AND WHERE TO USE MD-CONV
In this subsection, we discuss how and where to use MD-Conv, that is, the ratio of the different kernels ($i_{3\times3}$, $i_{5\times5}$) in one MD-Conv layer and where MD-Conv should be used in a network.
1) HOW TO USE MD-CONV
As mentioned above, MD-Conv consists of 3 × 3 and 5 × 5 depthwise convolutions. The parameters $i_{3\times3}$ and $i_{5\times5}$ control the number of kernels of each type in one MD-Conv layer. To find the best ratio $i_{3\times3}/i_{5\times5}$, we run experiments with different ratios. In these experiments, we replace all the normal depthwise convolution layers in MobileNet-v2 with MD-Conv layers and evaluate on the Chest X-ray 14 dataset. The results are shown in FIGURE 4. Based on these results, we choose an $i_{3\times3}/i_{5\times5}$ ratio of 1:1 to achieve the best trade-off between cost and accuracy.
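A ratio such as 1:1 has to be turned into integer channel counts for each layer. The helper below is one hypothetical way to do this; the function name, the rounding rule, and the requirement that both kernel sizes keep at least one channel are assumptions, not details given in the paper.

```python
def split_channels(channels, ratio_3x3=1, ratio_5x5=1):
    """Return (c3, c5): channels assigned to 3 x 3 and 5 x 5 depthwise kernels."""
    if channels < 2:
        raise ValueError("need at least two channels to use both kernel sizes")
    c3 = round(channels * ratio_3x3 / (ratio_3x3 + ratio_5x5))
    c3 = min(max(c3, 1), channels - 1)  # keep both kernel sizes present
    return c3, channels - c3


def md_conv_params(channels, ratio_3x3=1, ratio_5x5=1):
    """Depthwise parameter count of an MD-Conv layer for a given ratio."""
    c3, c5 = split_channels(channels, ratio_3x3, ratio_5x5)
    return 9 * c3 + 25 * c5


if __name__ == "__main__":
    for ratio in [(3, 1), (1, 1), (1, 3)]:
        print(ratio, split_channels(96, *ratio), md_conv_params(96, *ratio))
```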

2) WHERE TO USE MD-CONV
MD-Conv is proposed to obtain multi-scale features efficiently and to enlarge the receptive field. We find that it is not necessary to use MD-Conv in all layers. Placing MD-Conv in the right location improves performance while saving FLOPs and parameters. We conduct experiments to explore in which layers MD-Conv should be used. As shown in TABLE 4, we achieve the best AUC score by replacing all the normal depthwise convolution layers in layer2 with MD-Conv. The final architecture of the modified MobileNet-v2 is shown in TABLE 1.

IV. EXPERIMENT
A. CHEST X-RAY DATASETS
Chest X-ray is the most common and efficient technique for screening and diagnosing lung-related diseases such as pneumonia, cardiomegaly, and lung nodules. Several chest X-ray datasets have been released for study. Early datasets, such as [12], [26], contain only hundreds of chest X-ray images, which are too few for deep learning. OpenI [3] is a publicly available dataset collected by Indiana University; it contains 3,955 radiology reports and the corresponding 7,470 chest X-ray images. The original image size is 512 × 512.
Chest X-ray14 is the largest chest X-ray dataset, provided by Wang et al. [34]. It provides 112,120 frontal-view chest X-ray images of 30,805 unique patients. Each image is annotated with one or more labels from 14 common thoracic diseases. The chest X-ray images were originally extracted from DICOM files and resized to 1024 × 1024, which preserves more details. The large number of images in Chest X-ray14 makes it suitable for many deep learning methods.
Recently, [15] released a chest X-ray dataset for pneumonia classification. It contains 5,856 labeled chest X-ray images from children and also distinguishes between bacterial and viral pneumonia. The image sizes vary. According to our statistics, about 90% of the images have an aspect ratio between 1.0 and 1.5, and 94.8% of the images are larger than 512 × 512.
In this section, we first perform a series of comparative experiments on the Chest X-ray14 dataset. For convenience, we run most experiments from scratch, without ImageNet [4] pre-training. Finally, we compare our method with others on both the Chest X-ray14 dataset and the Chest X-ray2017 dataset. For both datasets, we follow the official patient-wise split.

B. IMPLEMENTATION DETAILS
We use lightweight network architectures and change the output size of the last fully connected (fc) layer to 14 for the Chest X-ray14 dataset and to 2 for the Chest X-ray2017 dataset. For the multi-label classification problem on the Chest X-ray14 dataset, a binary cross-entropy (BCE) loss is adopted.
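For reference, the BCE loss for one multi-label sample can be written out in plain Python. This is a sketch of the standard definition; whether the sigmoid is applied inside the loss or by the network, and the averaging over the 14 labels, are assumptions that the text does not specify.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_loss(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over the labels of one image.

    logits: raw network outputs, one per disease label.
    targets: 0/1 ground-truth labels of the same length.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


if __name__ == "__main__":
    logits = [2.0, -1.0] + [-3.0] * 12   # hypothetical outputs for 14 labels
    targets = [1, 0] + [0] * 12          # the image carries only the first label
    print(bce_loss(logits, targets))
```

Each label is treated as an independent binary problem, which is what allows one image to carry several disease labels at once.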
We implement all experiments in the PyTorch framework. The Adam optimizer is adopted with an initial learning rate of 10e-5, together with a cosine learning rate schedule [11]. We train for 100 epochs with a batch size of 64 on 4 GPUs. The weight decay is set to 10e-5, and early stopping is used. During training, we perform data augmentation by resizing the original images to 512 × 512 and randomly cropping to 448 × 448 (for 224 × 224 inputs, the images are resized to 256 × 256), randomly rotating by −10 to 10 degrees, and applying random horizontal flips with probability 0.5. The network weights are initialized using MSRA initialization [9].
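The cosine learning rate schedule mentioned above can be sketched as follows. This assumes the common cosine annealing form without restarts or warm-up, which the text does not confirm; the function name and the `min_lr` argument are hypothetical, and the base learning rate is passed in explicitly rather than fixed.

```python
import math


def cosine_lr(epoch, total_epochs, base_lr, min_lr=0.0):
    """Learning rate at a 0-based epoch under cosine annealing without restarts."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    base_lr = 1e-4  # example value only; substitute the intended initial rate
    for epoch in (0, 25, 50, 75, 99):
        print(epoch, cosine_lr(epoch, 100, base_lr))
```

The rate starts at the base value, passes through half of it at the midpoint of training, and approaches `min_lr` in the final epochs.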

C. CHEST X-RAY 14 DATASET
1) CLASSIFICATION PERFORMANCE UNDER INPUTS IN DIFFERENT SCALES
We first run experiments to understand how kernel size and input scale affect performance on different thoracic pathologies. As shown in FIGURE 5: (a) For most pathologies, larger input images improve performance. The green ones have a higher AUC than the red ones, except for Pneumonia and Consolidation, which get similar scores. (b) For Atelectasis, Mass, Nodule, and Fibrosis, enlarging the input brings a large improvement, and the 3 × 3 kernel performs best. For Mass, Nodule, and Fibrosis on 224 × 224 inputs, the 3 × 3 kernel also performs much better than the 5 × 5 kernel. This is because small pathologies, as shown in FIGURE 1, are easier to recognize on large images, and the 3 × 3 kernel is more suitable for extracting their features. (c) For diseases such as Cardiomegaly, Pneumothorax, Emphysema, and Pleural_Thickening, the 5 × 5 kernel on 448 × 448 inputs achieves the best AUC. As shown in FIGURE 1, most Cardiomegaly and Pneumothorax instances are larger than 200 × 200. For these diseases, a larger kernel is therefore more suitable for feature extraction on 448 × 448 inputs.
We conclude that, for thoracic pathologies of different sizes, multi-scale features should be extracted for better recognition, and MD-Conv is well suited to this problem. We run experiments on the Chest X-ray14 dataset with MobileNet-v2, and TABLE 2 shows the performance of MD-Conv at different input sizes. MD-Conv performs best on both 224 × 224 and 448 × 448 inputs. The 7 × 7 kernel performs worst because it is too large for both input sizes, so it is not adopted in MD-Conv.

2) PERFORMANCE OF FIVE LIGHTWEIGHT NETWORKS
The five lightweight networks MobileNet [10], ShuffleNet [40], MobileNet-v2 [23], ShuffleNet-v2 [21], and MobileNet-v3 [1] perform differently on the ImageNet dataset, which suggests that their performance on the chest X-ray recognition task will also differ. To determine which network to use, we run experiments with these networks on the Chest X-ray14 dataset; the results are shown in TABLE 3. Based on these results, we choose MobileNet-v2 as the baseline. One reason is that the depthwise convolution layers in MobileNet-v2 operate on expanded channels, which provide enough channels for each depthwise convolution group with a different kernel size in MD-Conv, something ShuffleNet-v1 does not have. In addition, MobileNet-v2 has the best accuracy in chest X-ray recognition, as shown in TABLE 3.

3) ABLATION STUDY ON WHERE AND WHETHER TO USE MD-CONV
To explore where and whether we should use MD-Conv in MobileNet-v2, we conduct ablation experiments. In MobileNet-v2, the stride of layer1, layer2, layer3, and layer5 is 2, and using MD-Conv in these layers multiplies the increase in receptive field. Therefore, we use MD-Conv only in these layers, and the results are shown in TABLE 4. Using MD-Conv in all layers achieves a low AUC score with the largest FLOPs, while using MD-Conv in layer5 achieves the highest AUC with the fewest FLOPs. Therefore, we use MD-Conv only in layer5, as shown in TABLE 1. We conjecture that MobileNet-v2 with MD-Conv in layer5 obtains a suitable receptive field for an input size of 448 × 448, so using MD-Conv in all layers is not necessary. To further explore the effect of MD-Conv, we also conduct ablation experiments with MobileNet-v3, in which MD-Conv is used in all layers; the results are shown in TABLE 5. As can be seen, MD-Conv improves the performance of both networks.

4) CLASSIFICATION PERFORMANCE COMPARED WITH THE STATE-OF-THE-ART APPROACH
We compare our results with ResNet50 [34], DenseNet + LSTM [38], and DenseNet121 [8] based on the official dataset split. The results are shown in TABLE 6. Our method performs better than [34], which adopts only ResNet-50. In contrast, [38] introduces an LSTM into the network, and [8] adopts DenseNet121 and uses additional data from the PLCO dataset [6]. Although our results are not the best, we provide a baseline for later research on lightweight networks for chest X-ray image recognition and show the possibility of identifying diseases on embedded and mobile devices. Moreover, we provide only a basic network for chest X-ray recognition, which achieves competitive results with lower computational cost and fewer parameters, and MD-Conv can be easily used in any other network with depthwise convolution.
The computational costs of ResNet50, DenseNet121, and our modified MobileNet-v2 are compared in TABLE 7. As the table shows, our modified model is much lighter in FLOPs and parameters, so we believe there is room for further improvement on top of our basic network. Even with such a small computational cost and parameter count, we still outperform the results of [34].

D. CHEST X-RAY 2017 DATASET
To verify the generalization ability of our model, we also run experiments on the Chest X-ray2017 dataset released by [15]. We use MobileNet-v2 with MD-Conv to recognize pneumonia versus normal and bacterial versus viral pneumonia. The results are shown in TABLE 8.
For chest X-ray recognition of pneumonia versus normal, we achieve an accuracy of 93.4%, outperforming [15] by 0.6%, and our sensitivity outperforms [15] by 4.2%. Although the specificity is 2.3% lower, we achieve an AUC of 98.3%, which is higher than the 96.8% in [15]. The ROC curve is shown in FIGURE 6. In addition, our modified MobileNet-v2 with MD-Conv is lighter than the Inception-v3 adopted in [15].
Comparison results of different methods. We list fourteen abnormalities and their AUCs, and we also list the method adopted in every paper.
[15]. pne represents pneumonia and bac represents bacterial.
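The metrics reported above for the binary pneumonia-versus-normal task can be computed as in the following sketch. It uses the standard definitions of accuracy, sensitivity, and specificity, and computes AUC as the probability that a positive sample is scored above a negative one; the function names and the toy labels and scores are hypothetical and are not data from the paper.

```python
def confusion_metrics(labels, preds):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    accuracy = (tp + tn) / len(labels)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]          # toy ground truth: 1 = pneumonia, 0 = normal
    scores = [0.9, 0.4, 0.5, 0.1]  # toy predicted probabilities
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print(confusion_metrics(labels, preds), auc(labels, scores))
```

AUC does not depend on the decision threshold, which is why a model can have lower specificity at one operating point and still a higher AUC.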

E. QUALITATIVE ANALYSIS
We obtain the discriminative regions for each disease with Grad-CAM to give a visual explanation of how the CNN recognizes chest X-ray images. As shown in FIGURE 7, different diseases have different sizes. Mass and Nodule are small and roughly as wide as they are long, as indicated by the size distribution in FIGURE 1 and the visualization results in FIGURE 7, and their location heatmaps concentrate on an approximately circular area. For Pneumonia, the heatmaps cover most of the chest region, and the area with the most attention lies in the blue ground-truth box. For Cardiomegaly, the ground-truth boxes include the whole heart region, while the heatmaps attend to the enlarged heart margin, which is also effective for recognizing Cardiomegaly. Thus the model can locate the disease region regardless of the shape and size of the disease.
FIGURE 7. Examples of network attention visualization on test images from the Chest X-ray14 dataset. The attention maps are generated by Grad-CAM [25]; the red regions indicate where disease probably appears. The blue boxes are the ground-truth bounding boxes provided in the dataset.

V. CONCLUSION
In this paper, we explore the varied sizes of thoracic pathologies and multi-scale feature learning for thoracic pathology recognition. We propose MD-Conv, which contains both 3 × 3 and 5 × 5 depthwise convolutions, to recognize chest X-ray images quickly and efficiently. We conduct experiments on both the Chest X-ray14 dataset and the Chest X-ray2017 dataset, and our modified MobileNet-v2 with MD-Conv outperforms the basic network adopted in other methods. MobileNet-v3 with MD-Conv performs better and has much lower computational cost and far fewer parameters. We believe that the results can be improved further, and may even exceed other existing methods, with careful tuning of the training hyperparameters. More work on efficient models remains for the future.
QING SONG received the Ph.D. degree from Tianjin University, Tianjin, China, in 2006. She is currently a Scientific Researcher with the Beijing University of Posts and Telecommunications (BUPT), where she is also involved in computer vision technology study. She is also the Founder of the Pattern Recognition and Intelligent Vision Laboratory (PRIV) and led the PRIV Team to the Championship of COCO2018-DensePose Challenge. She is also in charge of many national, provincial and ministerial projects, and enterprise cooperation projects. She has published more than 70 academic articles in international journals and conferences.