Article

DWSC-YOLO: A Lightweight Ship Detector of SAR Images Based on Deep Learning

School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2022, 10(11), 1699; https://doi.org/10.3390/jmse10111699
Submission received: 4 October 2022 / Revised: 4 November 2022 / Accepted: 6 November 2022 / Published: 9 November 2022
(This article belongs to the Section Ocean Engineering)

Abstract

In the field of ship detection, most research on lightweight models comes at the expense of accuracy. This study addresses this challenge through a deep learning approach and proposes a model, DWSC-YOLO, inspired by YOLOv5 and MobileNetV3. The model employs a lightweight framework as the backbone network, and the choice of activation function and attention mechanism is investigated. Furthermore, to improve the accuracy of the convolutional neural network and reduce loss, heterogeneous convolutions are added to the network. Three independent experiments were carried out using the proposed model. The experimental results show that the model can achieve excellent detection results at a small computational cost. The mAP of the model is 99.5%, the same as YOLOv5, but the volume is 2.37 M, which is 79.8% less.

1. Introduction

Remote sensing data are inherently affected by errors related to the sensor, atmospheric effects, and scene properties, and often suffer from massive information loss [1]. Synthetic Aperture Radar (SAR) is a kind of high-resolution coherent imaging radar that can effectively improve image resolution, with the advantage of all-weather, day-and-night operation. Accordingly, SAR is widely applied in the marine field, for tasks such as ship surveillance [2], marine environmental protection [3], shipwreck rescue [4], and sea ice classification [5]. Currently, a major focus of SAR in the marine field is ship detection in images [6], and the growing availability of SAR imagery makes it possible to use deep learning methods for SAR ship detection.
The detection of ships in SAR imagery usually faces challenges such as complex backgrounds and small objects. In SAR ship detection, the number of SAR samples seriously affects the performance of the algorithms. Accordingly, research on ship detection has mainly focused on public datasets. Some researchers proposed a marine vessel re-identification framework and conducted extensive experiments on the VesselID-539 dataset [7]. To facilitate the development of object detectors in SAR images, Wang et al. constructed a SAR dataset, SSDD [8]. Recently, there have been significant breakthroughs in ship detection. Generally, ship detection methods can be divided into traditional feature extraction methods and modern deep learning methods.
Traditional feature extraction methods require manual feature extraction. The Constant False Alarm Rate (CFAR) detector [9] is a classic algorithm that adaptively calculates the detection threshold based on parameter estimation. However, its detection speed is low because the background clutter distribution needs to be estimated. Another common method is template-based detection [10]. By considering both the target characteristics and the background characteristics, this method can achieve excellent detection performance; however, it depends largely on expert experience, and its generalization ability is correspondingly weak. In summary, these traditional feature extraction methods are either computationally complex or generalize poorly, and their detection speed is too slow to meet the real-time requirements of ship detection.
Deep learning, an effective, time- and cost-efficient branch of machine learning, has become a hot research topic [11,12]. Many computer vision methods based on deep learning have been proposed, greatly improving the state of the art in object detection and many other fields, including SAR ship detection. Guo et al. proposed a stable single-stage detector called CenterNet++, which effectively addresses the difficulties of ship detection, such as high false negative rates and ships being easily confused with other objects of similar appearance [13]. Wang et al. proposed an improved YOLOv3 algorithm to realize an end-to-end ship target detection system [14]. Zhang et al. established a brand new lightweight network architecture for high-speed SAR detection based on the depthwise separable convolutional neural network (DS-CNN) [15]. The above methods achieve superior performance in SAR ship detection. However, in practical application scenarios, ship detection algorithms are usually deployed on satellites with limited memory and computing resources. Therefore, how to obtain excellent detection performance with a lightweight model remains an open question.
Much research has been devoted to developing lightweight SAR ship detectors. Xu et al. proposed a lightweight onboard SAR ship detector [16], using a lightweight cross-stage partial (L-CSP) module and network pruning to obtain a more compact detector, but at reduced accuracy. Xu et al. designed a novel on-board SAR ship detection model that compresses the network by introducing low-cost linear operations [17]; models trained with this method have lower mAP and larger volumes. Xu et al. developed an on-board ship detection scheme based on the traditional constant false alarm rate (CFAR) method and lightweight deep learning [18]. This scheme can be used for near-real-time image processing and data transmission on SAR satellites, but detection accuracy is sacrificed. Zhang et al. proposed a lightweight SAR ship detector using a feature fusion module, a feature enhancement module, and a scale-shared feature pyramid module to compensate for the raw detector’s accuracy loss; their model is lightweight but detection is slower. The above methods have made outstanding contributions to the field of SAR ship detection. Nevertheless, few studies excel in both accuracy and model volume.
We propose a lightweight ship detection model inspired by YOLOv5 and MobileNetV3. On the one hand, YOLOv5 offers rapid recognition speed and a relatively small amount of computation; furthermore, YOLOv5 uses the PyTorch framework, which is easier to deploy and put into production. On the other hand, MobileNetV3 is a lightweight convolutional neural network model that can simplify the model to a certain extent. The proposed model balances model volume and detection accuracy well. To the best of our knowledge, this work accounts for the loss of accuracy while reducing the model volume and is the first to introduce Heterogeneous Convolution (HetConv) into a SAR ship detector.
This paper is organized as follows. In Section 2, the proposed improved model is presented in detail. In Section 3, the experimental environment and the dataset are first given, and then we describe the experimental design and make a comparison with some existing schemes. In Section 4, some conclusions are drawn.

2. Methodology

YOLOv5 performs well in object detection, with the advantages of fast running speed and easy deployment; MobileNetV3 is a lightweight convolutional neural network suitable for scenarios with limited storage space and power consumption. Inspired by both, we propose a novel object detection network model, shown in Figure 1. First, based on Neural Architecture Search (NAS), depthwise separable convolution is introduced into the backbone network for feature extraction, and Efficient Channel Attention (ECA) is added; as a result, the volume of the model is greatly reduced, and the performance of the convolutional neural network is improved. Then, Mish is used as the activation function in the convolutional layers, which effectively improves the stability of the algorithm and enhances generalization. Finally, heterogeneous convolution (HetConv) is introduced in the head module, which greatly reduces the computational cost while maintaining accuracy.
The proposed model achieves excellent detection results at lower computational cost and higher detection speed, making it suitable for deployment on resource-limited radar equipment. A detailed description of the proposed framework is given below.

2.1. The Main Network Structure

2.1.1. The Network Structure of the Backbone

At present, deep learning research has produced many well-known deep neural networks, and convolutional neural networks (CNNs) are one of them. A general CNN is structured as a series of stages. The first few stages are composed of two types of layers: convolutional layers and pooling layers [17]. The convolutional layer extracts features from the previous layer and organizes them into feature maps. The pooling layer can increase the network’s receptive field and enhance the robustness of the model. Our model uses depthwise separable convolutions [19] and global average pooling in the backbone, which reduces the number of parameters and the risk of overfitting. A depthwise separable convolution is shown in Figure 2.
First, a depthwise convolution is performed. Assuming a 5 × 5 × 3 convolution kernel, it is split into three 5 × 5 × 1 kernels, each acting separately on one channel of the input to extract features. Then, a pointwise convolution follows: a 1 × 1 convolution operates across the channel-wise outputs and restores the desired channel dimension, ultimately ending with an output of 8 × 8 × 1. The computational cost of the depthwise separable convolution is:
$C_{DS} = D_k \times D_k \times M \times D_W \times D_H + M \times N \times D_W \times D_H$
To illustrate that this model has a low computational cost, we compare the computational effort of this convolution with that of standard convolution and define a compression rate $\varepsilon$ to describe it:
$\varepsilon = \dfrac{D_k \times D_k \times M \times D_W \times D_H + M \times N \times D_W \times D_H}{D_k \times D_k \times M \times N \times D_W \times D_H} = \dfrac{1}{N} + \dfrac{1}{D_k^2}$
where $\varepsilon$ is the compression rate; $D_W \times D_H$ is the spatial size of the output feature map; $D_k$ is the size of the convolution kernel acting on the feature map; $M$ is the number of channels of the input image; $N$ is the number of channels of the output image.
From the above calculation, it can be seen that the depthwise separable convolution is much less computationally intensive than the standard convolution. In our model, additional deep convolutional layers are not used in the final stage, to improve speed and accuracy. Instead, a convolutional layer suitable for small feature maps is placed after the global average pooling layer to balance detection speed and accuracy.
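To make the structure concrete, the following is a minimal PyTorch sketch of a depthwise separable convolution block, in which a depthwise stage (one kernel per input channel) is followed by a pointwise 1 × 1 stage; the batch normalization placement, Mish activation, and channel sizes are illustrative assumptions rather than the exact layers of DWSC-YOLO.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by pointwise (1 x 1) convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise: groups=in_ch gives one K x K kernel per input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                                   padding=kernel_size // 2,
                                   groups=in_ch, bias=False)
        # Pointwise: 1 x 1 convolution mixes channels and sets the output depth
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.Mish()  # activation choice discussed in Section 2.3

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Compression-rate check from the formula above: for D_k = 3 and N = 256,
# eps = 1/N + 1/D_k^2 = 1/256 + 1/9 ~= 0.115, i.e., roughly 8.7x fewer operations.
x = torch.randn(1, 32, 64, 64)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 64, 64])
```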

2.1.2. The Structure of the Convolution Filter

Generally, increasing the depth and width of a network is a straightforward way to improve its performance, but as a convolutional neural network deepens, its complexity increases. Accordingly, our model uses an efficient convolutional filter, the heterogeneous filter, which can be plugged directly into existing standard architectures. Using this filter in the network overcomes the limitations of existing approaches based on model compression and efficient architecture search [20], improving performance without increasing network complexity. The structures of the standard convolution filter and the heterogeneous convolution filter are shown in Figure 3.
In a standard convolutional layer, the total computational cost at layer $L$ can be given as:
$F_L^{SC} = D_o \times D_o \times M \times N \times K \times K$
The computational cost of the $K \times K$ kernels in the heterogeneous convolution filters with part $P$ at layer $L$ is given as:
$F_L^{K} = (D_o \times D_o \times M \times N \times K \times K)/P$
The computational cost of the remaining 1 × 1 kernels can be given as:
$F_L^{1} = (D_o \times D_o \times N) \times (M - M/P)$
The total reduction in the computation as compared to standard convolution can be given as:
$R = \dfrac{F_L^{K} + F_L^{1}}{F_L^{SC}} = \dfrac{1}{P} + \dfrac{1 - 1/P}{K^2}$
where $D_o \times D_o$ is the spatial size (width × height) of the square output feature map; $M$ is the input depth (number of input channels); $N$ is the output depth (number of output channels); $K$ is the kernel size; $P$ is the part, which controls the proportion of $K \times K$ kernels in a convolutional filter.
Heterogeneous convolution filters contain convolution kernels of different sizes; they add no extra latency and reduce the computation and number of parameters compared with standard convolution filters while maintaining the same accuracy as the original model.
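A heterogeneous convolution layer is often realized in code as the sum of a grouped K × K convolution and a pointwise convolution, whose combined cost matches the reduction formula above. The sketch below is that common approximation in PyTorch, under the assumptions that it is not the authors’ released implementation and that the channel counts are divisible by P.

```python
import torch
import torch.nn as nn

class HetConv(nn.Module):
    """Approximate HetConv: grouped K x K kernels plus 1 x 1 kernels.

    With groups=p, the K x K branch costs about D_o^2*M*N*K^2/P and the
    1 x 1 branch about D_o^2*M*N, matching R = 1/P + (1 - 1/P)/K^2 above.
    """
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, p=4):
        super().__init__()
        assert in_ch % p == 0 and out_ch % p == 0, "channels must divide P"
        self.gwc = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                             padding=kernel_size // 2, groups=p, bias=False)
        self.pwc = nn.Conv2d(in_ch, out_ch, 1, stride, bias=False)

    def forward(self, x):
        return self.gwc(x) + self.pwc(x)

# For K = 3 and P = 4: R = 1/4 + (3/4)/9 ~= 0.33, about 3x fewer FLOPs.
print(HetConv(64, 128)(torch.randn(1, 64, 32, 32)).shape)
```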

2.2. Attention Mechanisms

The attention mechanism is inspired by human biological systems. Its purpose is to focus on the distinctive parts of a large amount of information [21]. The attention mechanism not only greatly increases the efficiency and accuracy of perceptual information processing, but also provides interpretability for the model generation process. Channel attention automatically learns the importance of each feature channel and is used to strengthen essential features and suppress unimportant ones. Two channel attention mechanisms are considered in our research: Squeeze-and-Excitation Attention and Efficient Channel Attention.
The Squeeze-and-Excitation block can adaptively recalibrate channel-wise feature responses by explicitly modeling interdependencies between channels, and it generalizes extremely effectively across different datasets [22]. The squeeze part aggregates the feature map of each channel into a channel descriptor. The excitation part then learns the feature weight of each channel c from $z_c$. Formally, the c-th element of the statistic $z$ is calculated by:
$z_c = F_{sq}(u_c) = \dfrac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$
where $z \in \mathbb{R}^{C}$ and $u_c \in \mathbb{R}^{H \times W}$; $H \times W$ are the (reduced) spatial dimensions; $u_c = v_c \ast \chi = \sum_{s=1}^{C} v_c^s \ast \chi^s$; $\ast$ denotes convolution; $\chi \in \mathbb{R}^{H \times W \times C}$ is the input; $V = [v_1, v_2, \ldots, v_C]$ denotes the learned set of filter kernels; and $v_c^s$ is a 2D spatial kernel, which represents a single channel of $v_c$ that acts on the corresponding channel of $\chi$.
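The squeeze and excitation steps can be sketched in a few lines of PyTorch. The bottleneck-and-gating structure follows the original SE design [22], while the specific reduction ratio is an illustrative assumption.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global pooling, bottleneck MLP, channel gating."""
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        z = x.mean(dim=(2, 3))            # squeeze: the equation for z_c above
        s = self.fc(z).view(b, c, 1, 1)   # excitation: learned channel weights
        return x * s                      # recalibrate the feature maps
```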
The Efficient Channel Attention module is a lightweight module that generates channel attention. It can be implemented efficiently by a fast 1D convolution of size $k$, which avoids dimensionality reduction while ensuring model efficiency and accuracy. The channel dimension $C$ is related to $k$ by:
$C = \phi(k) = 2^{(\gamma k - b)}$
The size of the convolution kernel k is:
$k = \psi(C) = \left| \dfrac{\log_2(C)}{\gamma} + \dfrac{b}{\gamma} \right|_{odd}$
where $C$ is the dimension of the channel; $\phi(k)$ is the mapping from kernel size to channel dimension; $k$ is the kernel size; and $|t|_{odd}$ indicates the nearest odd number to $t$.
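The ECA module can be sketched as follows, deriving the 1D kernel size k from the channel dimension via the mapping above; γ = 2 and b = 1 are the defaults reported for ECA and are used here as assumptions.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: fast 1D convolution across channels."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # k = |log2(C)/gamma + b/gamma|_odd, as in the equation above
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        y = x.mean(dim=(2, 3), keepdim=True)            # (B, C, 1, 1) descriptor
        y = self.conv(y.squeeze(-1).transpose(1, 2))    # 1D conv over channel axis
        y = torch.sigmoid(y.transpose(1, 2).unsqueeze(-1))
        return x * y                                    # no dimensionality reduction
```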

2.3. Activation Functions

For a network with multiple hidden layers, the choice of activation function is important; a good choice can substantially alleviate the vanishing gradient problem and improve the self-learning ability of the neural network. Many activation functions have been proposed, but only a few are widely used. In our model, Mish and Swish are considered.
Swish is a smooth, non-monotonic activation function that is bounded below. It can be defined as $f(x) = x \delta(\beta x)$, where $\delta(z) = (1 + \exp(-z))^{-1}$ is the sigmoid function and $\beta$ is either a constant or a trainable parameter. Mish is also a smooth, non-monotonic function and plays an important role in explaining the improvement in results, as it facilitates efficient optimization and generalization. The Mish activation function can be defined as $f(x) = x \tanh(\varsigma(x))$, where $\varsigma(x) = \ln(1 + e^x)$ is the softplus function. The graphs of Swish and Mish are shown in Figure 4.
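Both functions are one-liners in PyTorch; the sketch below simply mirrors the definitions above (PyTorch also ships nn.SiLU, i.e., Swish with β = 1, and nn.Mish as built-ins).

```python
import torch
import torch.nn.functional as F

def swish(x, beta=1.0):
    # f(x) = x * sigmoid(beta * x); beta may be fixed or trainable
    return x * torch.sigmoid(beta * x)

def mish(x):
    # f(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)
    return x * torch.tanh(F.softplus(x))
```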

2.4. Loss Function

The loss function used in the training of YOLOv5 includes the Binary Cross Entropy loss (BCE loss) and the Complete-IoU loss (CIoU loss). The CIoU loss measures the loss of the bounding box and is defined as:
$L_{CIoU} = 1 - IoU + \dfrac{\rho^2(b, b^{gt})}{c^2} + \alpha v$
where $b$ and $b^{gt}$ are the center points of the predicted box and the ground-truth box, respectively; $\rho$ is the Euclidean distance; $c$ is the diagonal length of the smallest enclosing box covering the two boxes; $\alpha$ is a trade-off parameter; and $v$ measures the consistency of the aspect ratios of the ground-truth box and the predicted box. The BCE loss is:
$Loss = \{l_1, l_2, \ldots, l_N\}, \quad l_n = -[y_n \log(x_n) + (1 - y_n) \log(1 - x_n)]$
where $N$ is the total number of samples; $y_n$ is the category of the $n$-th sample; and $x_n$ is the predicted value of the $n$-th sample.
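As a reference for the definition above, here is a minimal sketch of the CIoU loss for boxes in (x1, y1, x2, y2) format. It follows the published CIoU formulation rather than the exact YOLOv5 source, and the epsilon terms are numerical-stability assumptions; the BCE term can be computed with torch.nn.BCEWithLogitsLoss.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss: 1 - IoU + rho^2/c^2 + alpha*v, per the equation above."""
    # Intersection and union
    iw = (torch.min(pred[..., 2], target[..., 2]) -
          torch.max(pred[..., 0], target[..., 0])).clamp(min=0)
    ih = (torch.min(pred[..., 3], target[..., 3]) -
          torch.max(pred[..., 1], target[..., 1])).clamp(min=0)
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    inter = iw * ih
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # Squared center distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
            (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency v and trade-off parameter alpha
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) -
                              torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```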

2.5. The Algorithm Implementation Details

Our proposed model, inspired by the YOLOv5 and MobileNetV3 networks, is used for SAR ship image detection. The training process of our model is summarized as follows.
  • Input: We feed the processed SAR ship images as input samples to the network. The main preprocessing steps are data augmentation, anchor box calculation with the K-means clustering algorithm (a sketch is given after this list), and adaptive image scaling.
  • Feature extraction: In our model, depthwise separable convolution is used in the backbone network. Data are fed through multiple convolutional layers so that local features can be efficiently extracted from the input SAR ship images.
  • Backpropagation: In a multi-layer neural network, the error signals for the neurons in one layer are computed from the error signals in the layer above [23]. This process, called backpropagation, updates the input weights of neurons to reduce the final output error. During training, an early stopping mechanism is used to prevent overfitting and enhance generalization.
  • Output: After a series of feature extraction and computation operations, we obtain the weights of the final model, which are then used for SAR ship detection.
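The anchor box step mentioned in the Input stage can be sketched as a small K-means routine over the labeled box sizes. Using IoU-based assignment and a mean update are common choices and are assumptions here, since the exact clustering settings are not specified above.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Cluster (width, height) pairs of ground-truth boxes into k anchors."""
    rng = np.random.default_rng(seed)
    wh = wh.astype(np.float64)
    anchors = wh[rng.choice(len(wh), k, replace=False)].copy()
    for _ in range(iters):
        # IoU between each box and each anchor, treating boxes as co-centered
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = wh[:, None].prod(-1) + anchors[None, :].prod(-1) - inter
        assign = (inter / union).argmax(1)            # nearest anchor by IoU
        for j in range(k):
            if (assign == j).any():
                anchors[j] = wh[assign == j].mean(0)  # mean update (assumption)
    return anchors[np.argsort(anchors.prod(1))]       # sorted by area
```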

3. Experiment and Analysis of Results

3.1. Experiment Condition

All experiments were performed on a 64-bit Windows 10 computer with an Intel(R) Xeon(R) E5-2670 CPU @ 2.60 GHz, 32 GB of memory, and an NVIDIA GeForce RTX 2080Ti GPU with 12 GB of memory. The program was developed in Python on the PyCharm 2022 platform, using the PyTorch deep learning framework.

3.2. Datasets

The efficiency of the improved method was demonstrated on the SSDD SAR ship dataset [8] in comparison with state-of-the-art methods, especially YOLOv5.
The SSDD dataset contains ships in various environments and is widely used for intelligent SAR image interpretation. The experimental dataset contains 1160 images and 2456 ships, an average of 2.12 ships per image.

3.3. Evaluation Metrics

We used mean Average Precision (mAP), Precision, and Recall as evaluation metrics. They are defined as:
(1) mAP
$mAP = \dfrac{1}{Q} \sum_{q=1}^{Q} \sum_{k=1}^{N} p(k) \times \Delta r(k)$
(2) Precision
$Precision = \dfrac{TP}{TP + FP}$
(3) Recall
$Recall = \dfrac{TP}{TP + FN}$
where $N$ is the total number of images; $Q$ is the number of categories; $p(k)$ is the Precision when $k$ images are identified; $\Delta r(k)$ is the change in Recall as the number of identified images goes from $k-1$ to $k$; $TP$ counts cases where the prediction and the label are both ships; $FP$ counts cases where the prediction is a ship but the label is background; and $FN$ counts cases where the prediction is background but the label is a ship.
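For a single-class task such as SSDD, these metrics reduce to simple counts plus an area under the precision-recall curve. The sketch below illustrates the definitions; the sample counts are chosen arbitrarily for demonstration and are not experimental results.

```python
import numpy as np

def average_precision(recall, precision):
    """AP as the sum of p(k) * delta_r(k), i.e., area under the P-R curve."""
    r = np.concatenate(([0.0], recall))
    p = np.concatenate(([1.0], precision))
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

# Illustrative counts (assumed, not measured): TP = 95, FP = 5, FN = 2
tp, fp, fn = 95, 5, 2
print("Precision:", tp / (tp + fp))   # 0.95
print("Recall:   ", tp / (tp + fn))   # ~0.979
```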

3.4. Results and Discussion

To evaluate the effectiveness of the proposed model, three independent experiments were conducted: establishing a backbone network with different attention mechanisms, the choice of activation function, and the choice of convolution filter.

3.4.1. Network Model Establishment

In preliminary experiments, we used depthwise separable convolution (DWSC) as the main convolutional module of the backbone network. Then, different attention mechanisms, Squeeze-and-Excitation (SE) and Efficient Channel Attention (ECA), were added to the network. To verify the effectiveness of the lightweight backbone network, comparative experiments were conducted on YOLOv5s, DWSC-SE, and DWSC-ECA.
Figure 5 compares the volumes of the three models. YOLOv5s was used as the baseline against the models we propose. As Figure 5 shows, once depthwise separable convolution is used in the backbone network, the volume of the model is significantly reduced; adding different attention mechanisms also has a slight effect on model size. The reduction rates compared with YOLOv5s are 76.8% and 82.7%, respectively. DWSC-ECA is lighter than DWSC-SE, with model volume reduced by a further 25.5%. Based on the above analysis, we use DWSC-ECA as the benchmark for studying the influence of the activation function and the convolution filter.
The parameter settings for the model are shown in Table 1.

3.4.2. Performance Analysis of Activation Functions

To verify the generalization ability of the model with different activation functions, comparative experiments were conducted using the Mish and Swish activation functions. The evaluation metrics used were mAP_0.5 and mAP_0.5:0.95. From Figure 6, we can see that Mish performs better in terms of stability: Figure 6a shows that between 100 and 200 epochs, with the Mish activation function, the curve gradually plateaus without sharp fluctuations, and Figure 6b shows that using Mish yields a slight improvement.
Figure 7 shows a zoomed-in display of mAP_0.5 for 100–200 epochs, allowing the results of the two activation functions to be compared visually. To further illustrate the performance of the models with different activation functions, the two methods were used to detect ships in the SSDD test set. In the experiment, the confidence threshold was set to 0.5. The results indicate that the detection rate of the Mish-based method is 95–98%, while that of the Swish-based method is only about 90%. It can be concluded that the method using Swish has a higher missed detection rate, especially for complex backgrounds and small objects.
As described above, the model trained with the Mish activation function performs better. The Mish function was therefore used as the benchmark in our subsequent research, which is significant for SAR ship detection.

3.4.3. Performance Analysis of Convolution Filter

In this section, to verify the performance of the proposed network using heterogeneous convolution, we compared the model using HetConv with the model using standard convolution. The comparison results are shown in Figure 8.
Figure 8 shows that the model using heterogeneous convolution outperforms the model using standard convolution in both box loss and object loss. As Figure 8a shows, both curves drop quickly during the first 50 epochs; after 50 epochs, a closer look reveals that the two curves follow a roughly similar trend until the end. Figure 8b shows that the two object loss curves descend with significant fluctuations.
Meanwhile, to quantitatively analyze the detection efficiency and performance of the algorithm, YOLOv5, Lite_YOLOv5 [16], YOLO-CASS [24], L-YOLO [17], and DWSC-YOLO were selected as comparison algorithms. The corresponding experimental results are shown in Table 2, from which it can be seen that our proposed method, DWSC-YOLO, achieves a better balance between detection accuracy and light weight. Compared with L-YOLO, the Precision of the proposed method is improved by 17.95%, the mAP is improved by 36.28%, and the Recall is substantially improved by 56.35%. Compared with YOLO-CASS, Precision, Recall, and mAP are all slightly improved. Although the Precision, Recall, and mAP of the method are the same as YOLOv5, the model volume is reduced by 79.8%, giving it great potential to be transplanted to SAR satellites with limited computing resources.
In summary, all the experiments show that the proposed model achieves advanced SAR ship detection performance. Combining lightweight networks with traditional object detection algorithms can achieve a good balance between model volume and robustness. The network contributes in three aspects: a lightweight backbone network with different attention mechanisms, the activation function, and the convolution filter. All experiments were conducted independently on the SSDD dataset to verify the generalization ability of the proposed model. Compared with other lightweight models, the mAP, Precision, and Recall are all improved, even though the model is not the smallest.

4. Conclusions and Future Work

In this paper, we investigated SAR ship detection inspired by YOLOv5 and MobileNetV3 and proposed a new detector, DWSC-YOLO. Depthwise separable convolution was introduced into the backbone network for feature extraction, and Efficient Channel Attention (ECA) was added. Then, Mish was selected as the activation function of the convolutional layers. Finally, heterogeneous convolutions were introduced into the head module to build a suitable SAR ship detection model. Corresponding comparison experiments were conducted to verify the performance of our model for SAR ship detection, and reliable experimental results were obtained in several respects. Overall, the proposed model achieves a good balance between detection accuracy and light weight and can effectively handle the multi-scale characteristics of SAR ship images. This work provides a novel idea for SAR ship detection, which will play an important role in the real-time monitoring and control of ships.
In the future, we hope to expand our network in two major directions. One is to develop alternative algorithms to simplify the network, resulting in an even lighter model. The other is to increase the depth of the CNN and improve the network’s generalization ability, thereby providing more beneficial assistance for ship monitoring.

Author Contributions

Conceptualization, S.J. and X.Z.; Methodology, X.Z.; Software, S.J.; Validation, S.J. and X.Z.; Formal Analysis, X.Z.; Investigation, S.J.; Resources, S.J.; Data Curation, X.Z.; Writing—Original Draft Preparation, X.Z.; Writing—Review and Editing, S.J.; Visualization, X.Z.; Supervision, S.J.; Project Administration, S.J.; Funding acquisition, S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Talent Introduction Fund of Anhui University of Science and Technology (grant number 2021yjrc34), the Natural Science Research Project of Colleges and Universities in Anhui Province (grant number KJ2020A0301), the ‘Six Outstanding, One Top-Notch’ Outstanding Talent Training Innovation Project of Anhui Province (grant number 2020zyrc056), and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (grant number 20KJA520009).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, J.; Li, C.; Xu, W.; Feng, H.; Zhao, F.; Long, H.; Meng, Y.; Chen, W.; Yang, H.; Yang, G. Fusion of optical and SAR images based on deep learning to reconstruct vegetation NDVI time series in cloud-prone regions. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102818.
  2. Zhao, Z.; Ji, K.; Xing, X.; Zou, H.; Zhou, S. Ship Surveillance by Integration of Space-borne SAR and AIS—Review of Current Research. J. Navig. 2014, 67, 177–189.
  3. Solberg, A.; Storvik, G.; Solberg, R.; Volden, E. Automatic detection of oil spills in ERS SAR images. IEEE Trans. Geosci. Remote Sens. 1999, 37, 1916–1924.
  4. Carmassi, C.; Porta, I.; Bertelloni, C.A.; Impagnatiello, P.; Capone, C.; Doria, A.; Corsi, M.; Dell’Osso, L. PTSD and post-traumatic stress spectrum in the Italian Navy Operational Divers Group and corps of Coast Guard Divers employed in search and rescue activities in the Mediterranean refugees emergences and Costa Concordia shipwreck. J. Psychiatr. Res. 2020, 129, 141–146.
  5. Zakhvatkina, N.; Smirnov, V.; Bychkova, I. Satellite SAR Data-based Sea Ice Classification: An Overview. Geosciences 2019, 9, 152.
  6. Tang, G.; Zhuge, Y.; Claramunt, C.; Men, S. N-YOLO: A SAR Ship Detection Using Noise-Classifying and Complete-Target Extraction. Remote Sens. 2021, 13, 871.
  7. Qiao, D.; Liu, G.; Dong, F.; Jiang, S.-X.; Dai, L. Marine Vessel Re-Identification: A Large-Scale Dataset and Global-and-Local Fusion-Based Discriminative Feature Learning. IEEE Access 2020, 8, 27744–27756.
  8. Wang, Y.; Wang, C.; Zhang, H.; Dong, Y.; Wei, S. A SAR Dataset of Ship Detection for Deep Learning under Complex Backgrounds. Remote Sens. 2019, 11, 765.
  9. El-Darymli, K.; McGuire, P.; Power, D.; Moloney, C. Target detection in synthetic aperture radar imagery: A state-of-the-art survey. J. Appl. Remote Sens. 2013, 7, 071598.
  10. Zhu, J.; Qiu, X.; Pan, Z.; Zhang, Y.; Lei, B. Projection Shape Template-Based Ship Target Recognition in TerraSAR-X Images. IEEE Geosci. Remote Sens. Lett. 2017, 14, 222–226.
  11. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  12. Dargan, S.; Kumar, M.; Ayyagari, M.R.; Kumar, G. A Survey of Deep Learning and Its Applications: A New Paradigm to Machine Learning. Arch. Comput. Methods Eng. 2020, 27, 1071–1092.
  13. Guo, H.; Yang, X.; Wang, N.; Gao, X. A CenterNet++ model for ship detection in SAR images. Pattern Recognit. 2021, 112, 107787.
  14. Wang, Y.; Ning, X.; Leng, B.; Fu, H. Ship Detection Based on Deep Learning. In Proceedings of the 2019 IEEE International Conference on Mechatronics and Automation (ICMA), Tianjin, China, 4–7 August 2019; pp. 275–279.
  15. Zhang, T.; Shi, J.; Wei, S. Depthwise Separable Convolution Neural Network for High-Speed SAR Ship Detection. Remote Sens. 2019, 11, 2483.
  16. Xu, X.; Zhang, X.; Zhang, T. Lite-YOLOv5: A Lightweight Deep Learning Detector for On-Board Ship Detection in Large-Scene Sentinel-1 SAR Images. Remote Sens. 2022, 14, 1018.
  17. Xu, X.; Zhang, X.; Zhang, T.; Shi, J.; Wei, S.; Li, J. On-Board Ship Detection in SAR Images Based on L-YOLO. In Proceedings of the 2022 IEEE Radar Conference (RadarConf22), New York, NY, USA, 21–25 March 2022; pp. 1–5.
  18. Xu, P.; Li, Q.; Zhang, B.; Wu, F.; Zhao, K.; Du, X.; Yang, C.; Zhong, R. On-Board Real-Time Ship Detection in HISEA-1 SAR Images Based on CFAR and Lightweight Deep Learning. Remote Sens. 2021, 13, 1995.
  19. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
  20. Singh, P.; Verma, V.K.; Rai, P.; Namboodiri, V.P. HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 4830–4839.
  21. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62.
  22. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
  23. Lillicrap, T.P.; Santoro, A.; Marris, L.; Akerman, C.J.; Hinton, G. Backpropagation and the brain. Nat. Rev. Neurosci. 2020, 21, 335–346.
  24. Xie, F.; Lin, B.; Liu, Y. Research on the Coordinate Attention Mechanism Fuse in a YOLOv5 Deep Learning Detector for the SAR Ship Detection Task. Sensors 2022, 22, 3370.
Figure 1. The overall network framework.
Figure 2. A depthwise separable convolution.
Figure 3. The structure of the convolution filter for (a) standard convolution; (b) heterogeneous convolution.
Figure 4. Mish and Swish activation functions.
Figure 5. Model volume comparison.
Figure 6. Comparison of the Mish and Swish activation functions on the SSDD dataset for (a) mAP_0.5; (b) mAP_0.5:0.95.
Figure 7. Comparison of the Mish and Swish activation functions on the SSDD dataset with a zoomed-in display.
Figure 8. Comparison of standard convolution and heterogeneous convolution for (a) box loss; (b) object loss.
Table 1. Improved network parameter settings.

Neural Network Layer    Parameters
Conv3BN                 [16, 2]
InvertedResidual        [16, 16, 3, 2, 1, 0]
InvertedResidual        [24, 72, 3, 2, 0, 0]
InvertedResidual        [24, 88, 3, 1, 0, 0]
InvertedResidual        [40, 96, 5, 2, 1, 1]
InvertedResidual        [40, 240, 5, 1, 1, 1]
InvertedResidual        [40, 240, 5, 1, 1, 1]
InvertedResidual        [48, 120, 5, 1, 1, 1]
InvertedResidual        [48, 144, 5, 1, 1, 1]
InvertedResidual        [96, 288, 5, 2, 1, 1]
InvertedResidual        [96, 576, 5, 1, 1, 1]
InvertedResidual        [96, 576, 5, 1, 1, 1]
HetConv                 [256, 1]
Upsample                [None, 2, ‘nearest’]
Concat                  [1]
C3                      [256, False]
HetConv                 [128, 1]
Upsample                [None, 2, ‘nearest’]
Concat                  [1]
C3                      [128, False]
Conv                    [128, 3, 2]
Concat                  [1]
C3                      [256, False]
Conv                    [256, 3, 2]
Concat                  [1]
C3                      [512, False]
Detect                  [nc, anchors]
Table 2. The performance of different methods.

Method              P (%)     R (%)     mAP (%)   Model Volume (M)
YOLOv5              100.00    100.00    99.50     13.70
Lite_YOLOv5 [16]    -         -         -         2.00
YOLO-CASS [24]      95.50     93.40     97.60     1.72
L-YOLO [17]         84.78     63.96     73.01     7.40
DWSC-YOLO (Ours)    100.00    100.00    99.50     2.37