Plankton Detection with Adversarial Learning and a Densely Connected Deep Learning Model for Class Imbalanced Distribution

Detecting and classifying plankton in situ to analyze population diversity and abundance is fundamental to understanding the marine planktonic ecosystem. However, the features of plankton are subtle, and the distribution of plankton taxa is extremely imbalanced in the real marine environment. Both factors limit the detection and classification performance of advanced recognition models, especially for rare taxa. In this paper, a novel plankton detection strategy is proposed that combines a cycle-consistent adversarial network with a densely connected YOLOV3 model. The strategy not only addresses the class imbalance of plankton by augmenting the data volume of the rare taxa but also reduces the loss of features in the plankton detection neural network. The mAP of the proposed strategy reached 97.21% and 97.14% on two experimental datasets that differ in the number of rare taxa, outperforming other state-of-the-art models. For the rare taxa in particular, the detection accuracy of each rare taxon improved by about 4.02% on average across the two experimental datasets. Furthermore, the proposed strategy has the potential to be deployed on an autonomous underwater vehicle for mobile observation of the plankton ecosystem.


Introduction
As a main component of the marine ecosystem, plankton plays an important role in both the global marine carbon cycle and the early warning of natural disasters [1,2]. In addition, plankton at high density degrades the performance of detecting sensors such as sonar, since acoustic transmission is impeded. Therefore, a comprehensive understanding of the distribution and abundance of plankton in the marine environment is a key issue for both ecologists and engineers.
For decades, and still today, plankton has mainly been sampled with traditional tools such as filters, pumps and nets, and the collected samples are then examined manually by experts in the laboratory. This approach has numerous shortcomings. On one hand, the samples are easily damaged during sampling and examination, especially fragile gelatinous plankton organisms, which can lead to wrong conclusions. On the other hand, this way of sampling and investigation is labor-intensive and time-consuming. To overcome these shortcomings, in situ plankton recording equipment and a high-accuracy detection strategy are urgently needed.
It was not until the late 1970s that the first computing systems capable of automatically measuring planktonic particles in images were introduced [3][4][5]. Since then, especially in the past 20 years, equipment for recording plankton images in situ and analyzing them in the laboratory has developed rapidly, such as the Video Plankton Recorder [6], FlowCytobot [7], FlowCam [8] and ZooProcess [9]. With the help of this equipment, the volume of plankton image data has accumulated rapidly. At the same time, to achieve in situ plankton detection, a number of studies have focused on using image processing technologies to mine the immense amounts of collected data [9][10][11][12].
Following the rapid development of machine learning and computing hardware, deep neural networks have been widely applied to plankton detection because of their superior feature extraction capability compared with traditional methods [13,14]. Inspired by AlexNet and VGGNet, Dai J et al. proposed an 11-layer convolutional neural network (CNN) named ZooplanktoNet, which achieved 93.7% accuracy on zooplankton detection [13]. Li X et al. and Py O et al. employed a deep residual network (ResNet) and a deep CNN with a multi-size image sensing module for plankton classification, respectively [15,16]. Shi Z employed an improved YOLOV2 (You Only Look Once V2) model to detect zooplankton in holographic image data [17]. Pedraza et al. used CNNs for the first time in automatic diatom classification and compared the performance of two state-of-the-art models, RCNN (Region CNN) and YOLO [18,19]. Kerr T et al. proposed collaborative deep learning models to detect plankton in collected FlowCam image data and to address the problem of class imbalance [20]. Lee et al. incorporated transfer learning by pre-training a CNN with class-normalized data and fine-tuning it with the original data on an open dataset named WHOI-Plankton; the classification accuracy increased, but a significant problem remained in the prediction quality for rare taxa [21,22]. Lumini A et al. worked on fine-tuning and transfer learning of several renowned deep learning models (AlexNet, GoogleNet, VGG, etc.) to design an ensemble of classifiers for plankton; their approach outperformed other models and achieved about 95.3% accuracy on the WHOI-Plankton dataset [23].
Usually, there are two ways to improve the accuracy of plankton detection and classification: one is to enrich the features, and the other is to optimize the detection and classification model to reduce feature loss. Most studies focus on augmenting the volume of the training dataset by rotating the original images, changing the brightness and similar operations [13][14][15][16][17][18][19][20]. Cheng et al. enriched the features of plankton by combining features from both Cartesian and polar coordinate systems, and then employed a CNN and support vector machines (SVMs) to train the classification model and classify the plankton taxa [14]. These data augmentation operations are usually applied to all taxa, which means that the data of all taxa are augmented proportionally; therefore, they do not solve the problems caused by class-imbalanced data. Moreover, the detection and classification model has limited capability to learn the features of the rare taxa if the amount of rare taxa data is relatively small. On the other hand, the structure of the DenseNet model [24], which has the advantage of feature reuse, has been fused into other detection models such as YOLOV3; this reduced the feature loss in the deep neural network and proved effective for retaining subtle features [25,26].
To the best of our knowledge, there are two challenges for plankton detection and classification using a deep neural network. First, plankton taxa are imbalanced in their spatiotemporal distribution in the marine environment, which limits the detection performance since the neural network is prone to overfitting during model training [27][28][29]. Second, a large number of the subtle features of plankton are lost as the features propagate through the neural network because of the convolution and down-sampling operations, which also limits the capability of detection and classification. This paper therefore addresses these two challenges and proposes a novel detection and classification strategy for plankton with an imbalanced distribution. The main contributions are summarized as follows. On one hand, an adversarial neural network named CycleGAN is applied at the pre-processing stage to generate fake image data that augment the data volume of the rare taxa, which improves the capability of the neural network to learn the features of the rare taxa [30]. On the other hand, a densely connected YOLOV3 model is proposed to detect and classify the plankton; dense blocks replace the down-sampling operations of the perception layers, which helps the features of plankton propagate through the neural network during model training [31].
The rest of this paper is organized as follows. In Section 2, the original dataset is reviewed and the data augmentation method based on the CycleGAN model is introduced. Section 2 also presents the basics of the original YOLOV3 model, the proposed densely connected model built on it for plankton detection, and the performance evaluation metrics. The experimental results are discussed in Section 3, and the conclusions are provided in Section 4.

Dataset Description
A large-scale, fine-grained plankton dataset named WHOI-Plankton is used in this work. It has been collected by the Woods Hole Oceanographic Institution with an Imaging FlowCytobot (IFCB) since 2006 [21]. The WHOI-Plankton dataset comprises over 3.4 million expert-labeled images covering 100 taxa. However, a review of the dataset shows that the data distribution across taxa is extremely imbalanced: most of the data is concentrated in six taxa, namely Detritus, Leptocylindrus, Dino30, Cylindrotheca, Rhizosolenia and Chaetoceros, which together account for up to 85% of the whole dataset. This reflects, from one aspect, the fact that plankton taxa are distributed in an imbalanced way in the actual marine environment.
Because the CycleGAN network needs a certain amount of data for training before it can expand the dataset, taxa with too little data would lead to under-fitting and degrade the quality of the generated data. Therefore, the rare taxa are randomly selected among the taxa with data volumes between 100 and 200 in the WHOI-Plankton dataset, and the dominant taxa are randomly selected among the taxa with data volumes greater than 400. Several randomly selected taxa are illustrated in Figure 1.
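
The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the random seed and the toy image counts are hypothetical, treating the 100 to 200 range as inclusive is an assumption, and only the thresholds themselves come from the text.

```python
import random


def split_taxa_by_volume(counts, rare_range=(100, 200), dominant_min=400):
    """Group taxa into rare and dominant candidates by image count."""
    low, high = rare_range
    # Inclusive bounds are an assumption; the text only says "between 100 and 200".
    rare = sorted(t for t, n in counts.items() if low <= n <= high)
    dominant = sorted(t for t, n in counts.items() if n > dominant_min)
    return rare, dominant


def pick_random(candidates, k, seed=0):
    """Randomly choose k taxa; the fixed seed is only there for repeatability."""
    rng = random.Random(seed)
    return rng.sample(candidates, k)


# Toy counts invented for illustration; they are not WHOI-Plankton statistics.
toy_counts = {
    "taxon_a": 150,
    "taxon_b": 5000,
    "taxon_c": 40,
    "taxon_d": 450,
    "taxon_e": 120,
}
rare_candidates, dominant_candidates = split_taxa_by_volume(toy_counts)
chosen_rare = pick_random(rare_candidates, 1)
```

Taxa below the lower threshold (here `taxon_c`) fall into neither group, which mirrors the concern that too few images would under-fit the generator.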

Dataset Augmentation
To improve the ability of the detection model to learn the rare taxa and to avoid overfitting during model training, the data volume of the rare taxa is augmented to roughly the same level as that of the dominant taxa before model training. In this paper, a generative adversarial network named CycleGAN is used to produce fake data from the unpaired original data to augment the data volume of the rare taxa. The principle of CycleGAN is shown in Figure 2. The goal is to learn two mapping functions between the domains $X$ and $Y$, $G: X \to Y$ and $F: Y \to X$. The mapping functions are parameterized by neural networks and trained to fool the adversarial discriminators $D_Y$ and $D_X$, respectively. The two mapping functions are cycle-consistent: an image $x$ from the domain $X$ should be brought back to the original image by the image translation cycle, i.e., $F(G(x)) \approx x$. Thus, the characteristics of the reconstructed fake images are similar to those of the original images. The loss function of CycleGAN is formulated as follows:

$$L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + \lambda L_{cyc}(G, F) \tag{1}$$

where $L_{GAN}(G, D_Y, X, Y)$ and $L_{GAN}(F, D_X, Y, X)$ are the adversarial losses, $L_{cyc}(G, F)$ is the cycle consistency loss, and $\lambda$ is a parameter that controls the relative importance of marginal matching and cycle consistency. The objective of CycleGAN is as follows:

$$G^{*}, F^{*} = \arg\min_{G, F} \max_{D_X, D_Y} L(G, F, D_X, D_Y) \tag{2}$$

The detailed mathematical description of CycleGAN can also be found in the literature [29].
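
To make the structure of the CycleGAN loss concrete, the following pure-Python sketch evaluates the three terms on toy data. It is an illustration under stated assumptions rather than the training code used in this work: images are stand-in lists of floats, the generators and discriminators are hypothetical plain functions, the adversarial term uses the log form from the original CycleGAN formulation, the cycle term uses a mean absolute difference (a scaled L1 norm), and the default weight `lam=10.0` is an assumed value.

```python
import math

EPS = 1e-12  # avoids log(0); an implementation detail, not part of the formula


def adversarial_loss(gen, disc, source, target):
    """E_y[log D(y)] + E_x[log(1 - D(G(x)))] estimated on small samples."""
    real = sum(math.log(disc(y) + EPS) for y in target) / len(target)
    fake = sum(math.log(1.0 - disc(gen(x)) + EPS) for x in source) / len(source)
    return real + fake


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 'images'."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, xs, ys):
    """Penalize F(G(x)) != x and G(F(y)) != y."""
    forward = sum(mean_abs_diff(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(mean_abs_diff(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


def full_objective(G, F, D_X, D_Y, xs, ys, lam=10.0):
    """L = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + lam * L_cyc(G, F)."""
    return (adversarial_loss(G, D_Y, xs, ys)
            + adversarial_loss(F, D_X, ys, xs)
            + lam * cycle_consistency_loss(G, F, xs, ys))


# Toy mappings: G brightens by 0.1 and F darkens by 0.1, so they invert each other.
G = lambda img: [p + 0.1 for p in img]
F = lambda img: [p - 0.1 for p in img]
# Undecided discriminators that always answer 0.5.
D_X = lambda img: 0.5
D_Y = lambda img: 0.5

xs = [[0.2, 0.4], [0.1, 0.3]]
ys = [[0.6, 0.8], [0.5, 0.7]]
cyc = cycle_consistency_loss(G, F, xs, ys)
total = full_objective(G, F, D_X, D_Y, xs, ys)
```

Because the toy `G` and `F` are exact inverses, the cycle term is essentially zero and the total reduces to the two adversarial terms. In real training the generators minimize this value while the discriminators maximize it.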

Basics of the YOLOV3 Model
As a typical one-stage detection model, YOLO was proposed by Redmon et al. in 2016 [32]. The significant advantage of YOLO over region-based two-stage models such as R-CNN is that it greatly reduces the time needed to detect one image [30], which suits target detection in in situ plankton observation. The basic principle of target detection based on the YOLO model is as follows: the input image is divided into $S \times S$ grids. If the center point of an object falls into a grid, that grid is responsible for predicting the object. Each predicted bounding box contains five values: x, y, width, height and prediction confidence. The confidence of the predicted target is defined as follows:

$$\mathrm{Confidence} = p_r(\mathrm{Object}) \times \mathrm{IoU}_{pred}^{truth} \tag{3}$$

where IoU is the overlap ratio between the ground truth bounding box and the predicted bounding box. $p_r(\mathrm{Object}) = 1$ means that the plankton target falls into the grid, and otherwise $p_r(\mathrm{Object}) = 0$. Then the dimension of the predicted tensor is as follows:

$$S \times S \times (B \times 5 + C) \tag{4}$$

where $S \times S$ is the number of grids in the image, $B$ is the number of bounding boxes predicted per grid, and $C$ is the number of taxa of plankton.
YOLOV3, a classic version of the YOLO series, was first proposed in 2018 [31]. The YOLOV3 model uses the Darknet-53 structure as its backbone network and predicts at three different scales, which is one of the innovations compared with the previous versions. At each scale, three bounding boxes are predicted per grid, so the dimension of the tensor becomes:

$$S \times S \times \left(3 \times (4 + 1 + C)\right) \tag{5}$$

The loss function of YOLOV3 is composed of the coordinate prediction error, the IoU error and the classification error:

$$\mathrm{Loss} = Err_{coord} + Err_{IoU} + Err_{cls} \tag{6}$$

The coordinate prediction error is defined as follows:

$$Err_{coord} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \left[ (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right] \tag{7}$$

where $S^2$ is the number of grids in the image and $\lambda_{coord}$ is the weight of $Err_{coord}$. $I_{ij}^{obj} = 1$ means that the target falls into the $j$th bounding box of the grid $i$, and otherwise $I_{ij}^{obj} = 0$. The four values in $(x_i, y_i, w_i, h_i)$ and $(\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i)$ denote the center coordinates, width and height of the bounding box, for the ground-truth value and the predicted value of the plankton target, respectively.
The IoU error is defined as follows:

$$Err_{IoU} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{noobj} (C_i - \hat{C}_i)^2 \tag{8}$$

where $\lambda_{noobj}$ is the weight of $Err_{IoU}$, $I_{ij}^{noobj} = 1 - I_{ij}^{obj}$ marks the bounding boxes that contain no target, and $C_i$ and $\hat{C}_i$ are the true confidence and the predicted confidence of the plankton target, respectively. The classification error is defined as follows:

$$Err_{cls} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2 \tag{9}$$

where $c$ is the class of the detected target, and $p_i(c)$ and $\hat{p}_i(c)$ are the real probability and the predicted probability that the target in the grid $i$ belongs to the class $c$, respectively.
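
The confidence definition and the size of the prediction tensor can be checked with a short Python sketch. It is a hedged illustration: the corner-based box format `(x_min, y_min, x_max, y_max)`, the function names and the example boxes are assumptions made here, and three boxes per grid at each scale follows the standard YOLOV3 design.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence(object_in_grid, truth_box, pred_box):
    """p_r(Object) * IoU, with p_r(Object) equal to 1 or 0."""
    return (1.0 if object_in_grid else 0.0) * iou(truth_box, pred_box)


def yolov3_output_shape(grid_size, num_taxa, boxes_per_grid=3):
    """S x S x (boxes * (4 box values + 1 confidence + C class scores))."""
    return (grid_size, grid_size, boxes_per_grid * (4 + 1 + num_taxa))


# Two 2 x 2 boxes overlapping in a 1 x 1 square: IoU = 1 / (4 + 4 - 1).
example_iou = iou((0, 0, 2, 2), (1, 1, 3, 3))
# Output shapes at the three scales for a six-taxon dataset.
shapes = [yolov3_output_shape(s, 6) for s in (52, 26, 13)]
```

For six taxa each grid carries 3 × (4 + 1 + 6) = 33 values, and for eight taxa it carries 39.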

Densely Connected Structure
Analyzing the distribution and abundance of rare plankton is a significant part of investigating plankton diversity. For this purpose, real-time and accurate identification and classification of plankton are particularly important, and even more so when mobile underwater vehicles are employed.
Even though the YOLOV3 model saves analysis time when detecting plankton targets, the subtle features of plankton are easily lost as the neural network deepens, which reduces the accuracy of plankton identification and classification. DenseNet was proposed in 2017 with the advantages of promoting feature reuse and alleviating vanishing gradients [24], and its structure is shown in Figure 3. In this paper, an improved YOLOV3 model is proposed that introduces the structure of DenseNet by using dense blocks and transition layers to replace the down-sampling layers of YOLOV3. The proposed model therefore helps preserve the integrity of the feature information as it propagates through the deep neural network.
The transition layer, which consists of BN-ReLU-Conv (1 × 1) followed by average pooling, is used to connect adjacent dense blocks. The 13 × 13 down-sampling layer is replaced in the same way. Finally, the sizes of the extracted feature maps are 26 × 26 × 512 and 13 × 13 × 1024, respectively, and the feature extraction network outputs feature maps at three scales for prediction: 52 × 52, 26 × 26 and 13 × 13.
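
The feature reuse behind dense blocks and transition layers can be illustrated with simple shape arithmetic. The sketch below is not the network configuration used in this work: the input width of 64 channels, the growth rate of 32 and the four layers are hypothetical values, and the only facts relied on are the general DenseNet rules that each layer receives the concatenation of all earlier outputs and that the transition layer halves the spatial size by 2 × 2 average pooling.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Return the input width of every layer and the width of the block output.

    Layer l sees the block input plus the growth_rate channels produced by
    each of the l earlier layers, because all outputs are concatenated.
    """
    layer_inputs = [in_channels + growth_rate * i for i in range(num_layers)]
    out_channels = in_channels + growth_rate * num_layers
    return layer_inputs, out_channels


def transition_output(size, channels, compression=1.0):
    """Shape after a transition layer.

    The 1 x 1 convolution may scale the channel count by `compression`
    (1.0 keeps it unchanged), and 2 x 2 average pooling with stride 2
    halves the spatial size.
    """
    return size // 2, int(channels * compression)


# Hypothetical block: 64 input channels, growth rate 32, four layers.
inputs_per_layer, block_out = dense_block_channels(64, 32, 4)
# A 52 x 52 map passing through a transition layer becomes 26 x 26.
next_size, next_channels = transition_output(52, block_out)
```

The growing input widths show why subtle features are retained: later layers still receive the unmodified outputs of earlier ones instead of only a down-sampled summary.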

Performance Evaluation Metrics
A reasonable set of indexes is the basis for evaluating the proposed model. It usually covers detection accuracy and average time cost. Detection accuracy is measured with precision and recall [25,33], which are defined as follows:

$$\mathrm{Precision} = \frac{\mathrm{True\ Positives}}{\mathrm{True\ Positives} + \mathrm{False\ Positives}} \tag{10}$$

$$\mathrm{Recall} = \frac{\mathrm{True\ Positives}}{\mathrm{True\ Positives} + \mathrm{False\ Negatives}} \tag{11}$$

where True Positives is the number of targets correctly identified, False Positives is the number of non-targets identified as targets, and False Negatives is the number of targets that are missed, i.e., identified as non-targets. Therefore, a high precision value means that the detection results contain a high percentage of useful information and a low percentage of false alarms. Meanwhile, the higher the recall value is, the larger the proportion of correctly detected targets is.
The average precision (AP) is the integral over the precision-recall curve, and the mean average precision (mAP) is the average of the AP over all taxa of plankton. These two indexes are defined as follows:

$$AP = \int_{0}^{1} P(R)\, dR \tag{12}$$

$$mAP = \frac{1}{C} \sum_{c=1}^{C} AP_c \tag{13}$$

where $P(R)$ is the precision as a function of the recall $R$, $AP_c$ is the average precision of the taxon $c$, and $C$ is the number of taxa of plankton. Furthermore, the average time cost of plankton detection is another important index for evaluating the proposed model and the comparison models. The lower the average time cost is, the better the real-time performance of the model is and the more useful it is in practical engineering applications.
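
The metrics above translate directly into code. The following is a minimal sketch, assuming the precision-recall curve is available as a list of sampled points and approximating the integral with the trapezoidal rule; detection benchmarks often use interpolated variants instead, so this is a simplification. The function names and the sample curve are hypothetical, while the counts 581 and 19 are the True Positive and False Positive values reported for the YOLOV3-dense model in the experiments.

```python
def precision(true_pos, false_pos):
    """Fraction of reported detections that are real targets."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def recall(true_pos, false_neg):
    """Fraction of real targets that were detected."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def average_precision(recalls, precisions):
    """Area under the precision-recall curve by the trapezoidal rule."""
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


def mean_average_precision(ap_per_taxon):
    """Average of the per-taxon AP values."""
    return sum(ap_per_taxon) / len(ap_per_taxon)


# Precision from the reported YOLOV3-dense counts: 581 / (581 + 19).
dense_precision = precision(581, 19)
# Hypothetical three-point curve used only to exercise the integral.
toy_ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
toy_map = mean_average_precision([0.9, 1.0])
```

The number of False Negatives is not reported in the text, so recall is left as a function here rather than evaluated on the published counts.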

Experiments and Discussions
To verify the plankton detection performance, several well-known and widely used state-of-the-art detection models, namely YOLOV3-tiny, YOLOV3 and Faster RCNN, are selected for comparison with the proposed YOLOV3-dense model. Table 1 lists some parameters of the proposed model and the comparison models. All models in the experiments are run on a computing server under a Linux environment, equipped with an Intel XEON Gold 5217 CPU and NVIDIA RTX TITAN GPU cards. A brief flowchart of the experiments is shown in Figure 6.

Experimental Dataset Production and Components
Both the original data and the data augmented with the CycleGAN model are labeled manually before training the plankton detection model, by drawing bounding boxes with a graphical image annotation tool named LabelImg. The annotations are saved as XML files in PASCAL VOC format.
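
For readers unfamiliar with the PASCAL VOC layout, the sketch below reads the taxon label and bounding box from such an annotation using only the Python standard library. The sample XML, its file name and its coordinates are made up for illustration; the element names (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`) are those of the VOC format.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in PASCAL VOC layout, written by hand for this example.
SAMPLE_XML = """<annotation>
  <filename>example.png</filename>
  <size><width>200</width><height>120</height><depth>1</depth></size>
  <object>
    <name>Prorocentrum</name>
    <bndbox><xmin>12</xmin><ymin>20</ymin><xmax>150</xmax><ymax>98</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) from VOC XML text."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        label = obj.find("name").text
        bnd = obj.find("bndbox")
        coords = tuple(
            int(bnd.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, coords))
    return boxes


boxes = read_voc_boxes(SAMPLE_XML)
```

An annotation file on disk could be handled the same way by passing its contents to `read_voc_boxes`.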
To evaluate how well the proposed plankton detection strategy handles class-imbalanced distribution, one and two taxa, respectively, are randomly selected as rare taxa whose data are augmented with the CycleGAN model. The fake images of the rare taxa produced at different training steps are illustrated in Figure 7. Judged by human inspection, the features of the plankton are well learned after 20,000 training steps, and the weights obtained after that point are used to produce the fake images that are added to the training dataset. The components of the datasets with data augmentation for one and two rare taxa are listed in Tables 2 and 3, respectively.

Table 2. Components of the dataset with data augmentation for one rare taxon.

| Taxonomic Group | Training Dataset (Original) | Training Dataset (Augmentation) | Testing Dataset | Total |
|---|---|---|---|---|
| Cerataulina | 300 | 0 | 100 | 400 |
| Cylindrotheca | 379 | 0 | 100 | 479 |
| Dino30 | 411 | 0 | 100 | 511 |
| Guinardia_delicatula | 450 | 0 | 100 | 550 |
| Guinardia_striata | 300 | 0 | 100 | 400 |
| Prorocentrum | 60 | 390 | 100 | 550 |
| Total | 1900 | 390 | 600 | 2890 |

Table 3. Components of the dataset with data augmentation for two rare taxa.

| Taxonomic Group | Training Dataset (Original) | Training Dataset (Augmentation) | Testing Dataset | Total |
|---|---|---|---|---|
| Cerataulina | 300 | 0 | 100 | 400 |
| Cylindrotheca | 379 | 0 | 100 | 479 |
| Dino30 | 411 | 0 | 100 | 511 |
| Dinobryon | 348 | 0 | 100 | 448 |
| Guinardia_delicatula | 450 | 0 | 100 | 550 |
| Guinardia_striata | 300 | 0 | 100 | 400 |
| Pennate | 58 | 362 | 100 | 520 |
| Prorocentrum | 60 | 390 | 100 | 550 |
| Total | 2306 | 752 | 800 | 3858 |

In Table 2, the taxon "Prorocentrum" is randomly selected as the rare taxon, and the augmented data are produced using different weights of CycleGAN. The 60 original training images of the rare taxon are supplemented with 390 generated images, which brings its training data volume to 450, roughly the same as that of the other taxa. The case in Table 3, with data augmentation for two rare taxa, is similar. To evaluate the performance on more taxa, two other plankton taxa are randomly added to the experiment so that the number of plankton taxa increases to 8, including another rare taxon, "Pennate", which has little data in the original WHOI-Plankton dataset. The training data of the rare taxa and all the testing data are strictly and randomly selected from the original WHOI-Plankton dataset, which is considered as ground truth.
For the dataset in Table 2, the loss curves of the YOLOV3 series models at the training stage are compared with that of the proposed YOLOV3-dense model, as shown in Figure 8. All three YOLOV3 based models converged after tens of thousands of training steps. The proposed YOLOV3-dense model converges faster than the YOLOV3-tiny model and closely follows the original YOLOV3 model.
The final losses of the original YOLOV3 model, the YOLOV3-tiny model and the proposed YOLOV3-dense model are 0.409, 0.514 and 0.405, respectively. The lowest final loss of the YOLOV3-dense model suggests that it makes better use of the image features than the other YOLOV3-based comparison models.

The performance evaluation indexes for the proposed strategy and the comparison models are listed in Table 4, where the proposed strategy is abbreviated as "ours" and values in bold mark the best result for each index. The mAP of the proposed strategy reaches 97.21%, which is higher than that of both the YOLOV3-based models and the Faster RCNN model, indicating that the proposed strategy outperforms the comparison models in plankton detection. Notably, the AP of "Prorocentrum" increases from 91.87% to 96.00% after data augmentation for the rare taxa with the CycleGAN model. The proposed strategy also yields the best true positive and false positive results among the compared models, that is, it detects more plankton correctly while raising the fewest false alarms. Another important finding is that every evaluation index of the YOLOV3-dense model (mAP, true positives and false positives of 96.55%, 581 and 19, respectively) is better than that of the YOLOV3 model (95.92%, 578 and 22, respectively), which confirms that the densely connected structure improves plankton detection by reducing the feature loss during feature transmission in the model.

Table 4. Plankton detection performance of the proposed strategy and comparison models for the dataset in Table 2.

For the dataset in Table 3, the loss curves of the YOLO series models during training are shown in Figure 9 and closely resemble the curves in Figure 8. Even though both the number of taxa and the data volume are increased, the models can still be trained well. The final losses of the original YOLOV3 model, the YOLOV3-tiny model and the proposed YOLOV3-dense model are 0.416, 0.469 and 0.423, respectively.

Table 5 lists the performance evaluation indexes for the dataset in Table 3. Without data augmentation for the rare taxa, the mAP of the YOLOV3-dense model is 95.69%, which is 3.02 percentage points higher than that of the YOLOV3-tiny model (92.67%) and nearly equal to that of the YOLOV3 model (95.53%), an increase of only 0.16 percentage points. After data augmentation for the rare taxa, however, the mAP of the proposed plankton detection strategy increases to 97.14%, the best among the compared models. Similar to the results with data augmentation for one rare taxon, the APs of the two rare taxa increase from 90.89% to 92.82% and from 92.00% to 98.00%, respectively. Likewise, the proposed strategy achieves the best true positive and false positive results among the compared models. Overall, the experimental results for the datasets in Tables 2 and 3 demonstrate that the proposed strategy is suitable for detecting plankton with an imbalanced distribution in the practical ocean environment.

Table 5. Plankton detection performance of the proposed strategy and comparison models for the dataset in Table 3.
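The true positive and false positive counts quoted above for the dataset in Table 2 can be condensed into a single precision value per model. The sketch below is illustrative only: precision is not an index reported in the tables, and it assumes that the quoted counts cover all detections reported by each model on the test set. The function name `precision` and the dictionary layout are choices made for this example.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported detections that are correct: TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no detections reported")
    return true_positives / total


# (true positives, false positives) as quoted in the text for the
# dataset in Table 2. Assumption: these counts cover every detection
# reported by the model on the test set.
counts = {
    "YOLOV3": (578, 22),
    "YOLOV3-dense": (581, 19),
}

precisions = {name: precision(tp, fp) for name, (tp, fp) in counts.items()}

for name, value in precisions.items():
    print(f"{name}: precision = {value:.4f}")
```

Under this assumption the YOLOV3-dense model has the slightly higher precision of the two, consistent with the ordering of the mAP values.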

Real-Time Performance Evaluation
The detection times of the models are listed in Table 6. The average detection time of the proposed YOLOV3-dense model is 36 ms and 51 ms per test image in the two experiments, which is slower than both the YOLOV3-tiny and YOLOV3 models because more features are processed and transmitted in the model. Considering the properties of the data acquisition platform and equipment, this detection speed is sufficient for practical real-time applications. In contrast, the average detection time of Faster RCNN is 893 ms and 814 ms, more than 15 times that of the YOLOV3-dense model. Such a slow detection speed makes Faster RCNN difficult to use for plankton detection on fast-moving mobile platforms.
Table 6. Comparisons of real-time zooplankton detection performance.

Model           Detection Time Consumption (ms)
                Experiment 1    Experiment 2
YOLOV3-tiny     8               11
YOLOV3          25              28
YOLOV3-dense    36              51
Faster RCNN     893             814
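To relate the per-image detection times to a frame rate, the times can be converted to images per second and compared across models. The following is a minimal sketch that uses only the detection times quoted in the text. The helper `frames_per_second` and the variable names are hypothetical, and the conversion assumes that images are processed one at a time with no batching or pipeline overhead.

```python
# Average per-image detection times in milliseconds, as quoted in the
# text, stored as (Experiment 1, Experiment 2).
detection_ms = {
    "YOLOV3-tiny": (8, 11),
    "YOLOV3": (25, 28),
    "YOLOV3-dense": (36, 51),
    "Faster RCNN": (893, 814),
}


def frames_per_second(ms_per_image: float) -> float:
    """Convert a per-image latency in ms to images per second.

    Assumption: sequential processing, one image at a time.
    """
    return 1000.0 / ms_per_image


fps = {
    model: tuple(frames_per_second(t) for t in times)
    for model, times in detection_ms.items()
}

# How many times longer Faster RCNN takes than YOLOV3-dense per image.
slowdown = tuple(
    slow / fast
    for slow, fast in zip(detection_ms["Faster RCNN"], detection_ms["YOLOV3-dense"])
)

for model, (exp1, exp2) in fps.items():
    print(f"{model}: {exp1:.1f} fps (Experiment 1), {exp2:.1f} fps (Experiment 2)")
print(f"Faster RCNN / YOLOV3-dense time ratio: {slowdown[0]:.1f}x, {slowdown[1]:.1f}x")
```

Under these assumptions the YOLOV3-dense model processes roughly 28 and 20 images per second in the two experiments, and the Faster RCNN detection time is more than 15 times longer in both.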

Conclusions
The main goal of this study is to improve in situ plankton detection under the class-imbalanced distribution found in the real marine environment. The CycleGAN model was employed to generate synthetic images through adversarial learning and thereby augment the training data for the rare plankton taxa, so that the subsequent detection model learns the features of each plankton taxon in a balanced way. Moreover, an improved plankton detection model was designed by fusing DenseNet into the YOLOV3 model, which reduces the feature loss during transmission in the model.
The experimental results on two datasets that differ in the number of rare taxa showed that, after data augmentation, the AP of the rare taxa increases by about 4.02 percentage points on average (4.13 for Prorocentrum in Experiment 1; 1.93 and 6.00 for Pennate and Pennate, respectively, in Experiment 2) and the mAP increases by 0.66 and 1.45 percentage points in Experiments 1 and 2, respectively. In addition, the mAP of the proposed model (97.21% in Experiment 1; 97.14% in Experiment 2) exceeds that of the YOLOV3-tiny, YOLOV3 and Faster RCNN models (94.23%, 95.92% and 94.54% in Experiment 1; 92.67%, 95.53% and 95.04% in Experiment 2). Its detection time (36 ms in Experiment 1; 51 ms in Experiment 2) is longer than that of the YOLOV3-tiny (8 ms in Experiment 1; 11 ms in Experiment 2) and YOLOV3 (25 ms in Experiment 1; 28 ms in Experiment 2) models but remains in the tens of milliseconds, and it is much shorter than that of the Faster RCNN model (893 ms in Experiment 1; 814 ms in Experiment 2). Hence, the proposed plankton detection strategy achieves higher accuracy than the compared state-of-the-art detection models under an imbalanced species distribution while retaining real-time detection speed.
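The averages quoted above follow directly from the per-taxon and per-experiment values reported in the results. The worked arithmetic below uses only those reported numbers. The symbols $\overline{\Delta\mathrm{AP}}$ and $\Delta\mathrm{mAP}_k$ are introduced here for illustration, and the baseline mAP values (96.55% and 95.69%) are those of the YOLOV3-dense model without data augmentation.

```latex
\[
\overline{\Delta\mathrm{AP}}
  = \frac{4.13 + 1.93 + 6.00}{3}
  = \frac{12.06}{3}
  = 4.02
\]
\[
\Delta\mathrm{mAP}_1 = 97.21 - 96.55 = 0.66,
\qquad
\Delta\mathrm{mAP}_2 = 97.14 - 95.69 = 1.45
\]
```

All quantities are in percentage points.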
Currently, the proposed model is deployed on the Jetson Nano deep learning development board, a small integrated hardware platform equipped with a Linux system and a GPU. Its low energy consumption is helpful for large-scale plankton observation with an autonomous underwater vehicle. In ongoing and future work, the proposed in situ plankton detection will be implemented on an autonomous underwater vehicle to verify its feasibility in the real marine environment. It is notable that a higher navigation speed of the autonomous underwater vehicle degrades the image quality of the imaging sensor, which may limit the performance of plankton detection and classification. However, the autonomous underwater vehicle is difficult to control at very low navigation speeds in the complex marine environment. Therefore, the plankton sampling strategy and the detection model will be further optimized.