Improved cGAN based linear lesion segmentation in high myopia ICGA images

Abstract: The increasing prevalence of myopia has attracted global attention in recent years. Linear lesions, including lacquer cracks and myopic stretch lines, are the main signs in highly myopic retinas and can be revealed by indocyanine green angiography (ICGA). Automatic linear lesion segmentation in ICGA images can help doctors diagnose and analyze high myopia quantitatively. To achieve accurate segmentation of linear lesions, an improved conditional generative adversarial network (cGAN) based method is proposed. A new partial densely connected network is adopted as the generator of the cGAN to encourage the reuse of features and reduce training time. Dice loss and weighted binary cross-entropy loss are added to address the data imbalance problem. Experiments on our data set indicate that the proposed network achieves better performance than competing networks.

As shown in Fig. 1, linear lesions appear as hypofluorescent structures in late-phase ICGA images (indicated by red arrows). Although ICGA images present linear lesions more clearly than other imaging modalities, linear lesion segmentation remains challenging for two reasons: (1) the shape of linear lesions is quite complex, including linear, stellate, branching and crisscrossing structures, with no fixed pattern; (2) linear lesions share similar characteristics with retinal vessels in both spatial structure and gray level.

Deep convolutional neural networks (DCNNs) have proven effective for image processing tasks including image classification [13,14], image segmentation [15] and object detection [16]. They have achieved success in retinal vessel segmentation, a task somewhat similar to linear lesion segmentation. Liskowski et al. [17] proposed a deep neural network model to detect retinal vessels in fundus images; the approach outperformed previous vessel segmentation methods in classification accuracy and area under the ROC curve. Later, Wu et al. [18] proposed a DCNN architecture under a probabilistic tracking framework to extract the retinal vessel tree. Fu et al. [19] formulated vessel segmentation as a boundary detection problem using a fully connected CNN model. However, there are few studies on linear lesion segmentation. To the best of our knowledge, we are the first to apply DCNNs to automatically segment linear lesions. In our previous work [20], a conditional generative adversarial network (cGAN) based method was proposed and achieved reasonable performance. In this paper, a new partial densely connected network is proposed to further improve the segmentation performance, and Dice loss and weighted binary cross-entropy loss are added to overcome the data imbalance problem.
The contributions of our work are as follows:
- We are the first to introduce and improve cGANs for the task of linear lesion segmentation. The result is the best among the compared methods.
- A new partial densely connected network is proposed as the generator of the cGAN to encourage the reuse of features.
- Dice loss and weighted binary cross-entropy loss are added to the loss function to deal with the data imbalance problem.
- The problem is formulated as a three-class segmentation task, so that the network can be trained to learn the differences between linear lesions and retinal vessels.

The rest of the paper is organized as follows. In Section 2, the proposed method is described in detail. In Section 3, the experimental results are given and compared with other methods. In Section 4, conclusions and discussions are presented.

Conditional generative adversarial networks
GANs and their variations [21,22] have been widely studied in the last four years and have achieved success in many image processing applications, such as inpainting [23], future state prediction [24], image manipulation [25,26] and style transfer [27]. Just as GANs learn a generative model of data, cGANs learn a conditional generative model, where the output image is conditioned on an input image. This makes cGANs suitable for image-to-image translation tasks, especially image segmentation. Based on the conditional information, cGANs can generate images of high quality.

Figure 2 shows the flowchart of the proposed method. In the training stage, the input image and the ground truth are combined in pairs and fed to the cGAN to train both the generator and the discriminator. ICGA images in the data set are annotated with three class labels: background, linear lesions and retinal vessels. Since the original two-class segmentation of linear lesions often mislabels retinal vessels as linear lesions, the three-class formulation trains the network to learn the differences between linear lesions and retinal vessels and increases segmentation accuracy. In the test stage, the generator produces the three-class segmentation results from the input images. Finally, retinal vessels and background are combined as the background in the binary segmentation results.

cGANs consist of a generator and a discriminator. During training, the generator captures the data distribution while the discriminator estimates the probability that an image comes from the training data rather than the generator. The discriminator learns to detect whether the output image is real or fake, while the generator is trained simultaneously to fool it.
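The final merging step described above (folding retinal vessels back into the background to obtain a binary lesion mask) can be sketched as follows; the class indices are assumptions chosen for illustration, not taken from the paper:

```python
import numpy as np

# Assumed class indices (illustrative only):
# 0 = background, 1 = linear lesion, 2 = retinal vessel
BACKGROUND, LESION, VESSEL = 0, 1, 2

def three_class_to_binary(label_map):
    """Collapse a three-class prediction into a binary lesion mask.

    Retinal vessel pixels are merged into the background, so only
    linear lesions remain foreground in the final result.
    """
    return (label_map == LESION).astype(np.uint8)

pred = np.array([[0, 1, 2],
                 [2, 1, 0]])
binary = three_class_to_binary(pred)
# Vessel pixels (class 2) become background (0) in the binary mask.
```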
cGANs learn a mapping from the input image $x$ and a random noise vector $z$ to the output image $y$. The loss function of cGANs can be expressed as follows:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))].$$

During iterations, the generator is trained to minimize $\log(1 - D(x, G(x, z)))$ while the discriminator is trained to maximize $\log D(x, y)$, following the min-max optimization rule:

$$G^{*} = \arg\min_{G}\max_{D}\,\mathcal{L}_{cGAN}(G, D).$$
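As a minimal numerical sketch of the two objective terms above, where the scalar scores stand in for the discriminator outputs $D(x, y)$ and $D(x, G(x, z))$:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Discriminator maximizes log D(x, y) + log(1 - D(x, G(x, z))).
    Returned negated so it can be minimized by gradient descent."""
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Generator minimizes log(1 - D(x, G(x, z))): its loss drops as
    it fools the discriminator into scoring fakes closer to 1."""
    return np.log(1.0 - d_fake)
```

A confident discriminator (real scored 0.9, fake scored 0.1) yields a small discriminator loss, while a generator that pushes the fake score up lowers its own loss, which is the min-max tension driving training.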

Partial dense connections in generator
The encoder-decoder model [28,29] has been shown to be one of the most efficient network architectures for image segmentation tasks. U-Net [30], a typical encoder-decoder network, is the most common architecture adopted in the generator. The encoder gradually reduces the spatial dimension of the feature maps and captures long-range information, while the decoder recovers object details and spatial dimension. Skip connections are added from encoder features to the corresponding decoder activations to help decoder layers assemble a more precise output based on encoder features. However, the original U-Net performed poorly on linear lesion segmentation in our previous work [20], because linear lesions and other structures such as retinal vessels are too similar for the network to distinguish. In the proposed method, partial dense connections are introduced into the U-Net structure of the generator.
Dense connections were first proposed in densely connected convolutional networks (DenseNets) [31], an improvement over ResNets [13]. DenseNets obtain significant improvements over the state of the art on most data sets. According to [31,32], DenseNets can drastically alleviate the vanishing-gradient problem because features are reused via short paths from early layers to later layers: each layer has access to the feature maps of all preceding layers. Building on DenseNet, TiramisuNet [33] extends the architecture to fully convolutional networks for semantic segmentation while mitigating the feature map explosion; it performs semantic segmentation efficiently and achieves state-of-the-art results on urban scene benchmark data sets.
Experiments have shown that although the final classification layer uses weights across the entire dense block, there is a concentration towards the final feature maps, suggesting that adjacent layers contribute most to the final feature maps [31]. Thus, partial dense connections are proposed and applied to the generator of the cGAN in our method. Compared with the original dense connections, long-range connections are removed to reduce training time and increase computational efficiency, while short-range connections are kept to encourage the reuse of features. With partial dense connections, the network can learn the differences between object and background in a relatively short time. The proposed partial dense connections are illustrated in Fig. 3, where each layer makes use of the feature maps produced by the previous two layers. The output $x_i$ of the $i$-th layer is denoted as follows:

$$x_i = H_i([x_{i-1}, x_{i-2}]),$$

where $H_i(\cdot)$ represents the non-linear transformation in the $i$-th layer, including batch normalization, rectified linear units, pooling or convolution, and $[\cdot]$ denotes concatenation. As layers have different feature resolutions, we down-sample feature maps with higher resolutions or up-sample feature maps with lower resolutions before the partial dense connections.
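The connectivity pattern (each layer consuming only the outputs of the previous two layers) can be sketched as follows; the `conv_layer` stand-in, a fixed random linear map with ReLU, is purely illustrative and replaces the real transformation H_i:

```python
import numpy as np

def conv_layer(x, seed):
    """Illustrative stand-in for H_i: a fixed random linear map
    followed by ReLU (not a real convolution)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[-1], 4))
    return np.maximum(x @ w, 0.0)

def partial_dense_forward(x0, num_layers=4):
    """Forward pass with PARTIAL dense connections: layer i receives
    the concatenation of the previous two outputs only, rather than
    all preceding feature maps as in a full DenseNet block."""
    outputs = [x0]
    for i in range(1, num_layers + 1):
        inputs = np.concatenate(outputs[-2:], axis=-1)  # [x_{i-1}, x_{i-2}]
        outputs.append(conv_layer(inputs, seed=i))
    return outputs[-1]
```

Because only two feature maps are concatenated per layer, the input width of each layer stays constant instead of growing linearly with depth, which is the source of the computational savings described above.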
Finally, the encoder-decoder architecture with partial dense connections is used in the generator, as shown in Fig. 4. The skip connections share information between encoder and decoder to make the output more reasonable. Partial dense connections encourage the reuse of feature maps so that the network can distinguish the features of linear lesions from those of retinal vessels. Meanwhile, partial densely connected networks require less computation than fully densely connected networks and can finish training in a much shorter time.

PatchGAN in discriminator
PatchGAN [22] is employed in the discriminator, as shown in Fig. 5. The traditional discriminator in cGANs for image segmentation outputs a single number between 0 and 1 representing the probability that the output image is real or fake. In contrast, PatchGAN tries to classify whether each N × N patch in the output image is real or fake. We run this discriminator convolutionally across the whole output image and average all responses to obtain the final decision for the image. As shown in Fig. 5, each pixel in the final layer reflects the probability for the corresponding 70 × 70 patch in the input image. With PatchGAN, the patch size can be much smaller than the full image and the discriminator has fewer parameters than the original one. Therefore, it can be applied to images of arbitrary size with higher computational efficiency.
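The patch-averaging step above can be sketched as:

```python
import numpy as np

def patchgan_decision(patch_scores):
    """A PatchGAN discriminator outputs one real/fake score per
    N x N receptive-field patch (a 2-D map of scores). The decision
    for the whole image is the average over all patch responses."""
    return float(np.mean(patch_scores))

# An 8 x 8 map of patch scores, as produced convolutionally over the image.
scores = np.full((8, 8), 0.75)
decision = patchgan_decision(scores)  # 0.75
```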

Improved loss function
Previous approaches to improving cGANs [22,23] have found it beneficial to mix the cGAN loss with a traditional loss, such as the L1 loss. With the L1 loss added, the discriminator's task remains unchanged, but the generator is tasked to produce not only indistinguishable images but also images closer to the ground truth. We also adopt the following L1 loss in the proposed method:

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\,\| y - G(x, z) \|_{1}\,].$$

To improve the segmentation performance of the proposed networks, the Dice loss [34] and the weighted binary cross-entropy loss are also added to the final loss function.
In ICGA images, linear lesions usually occupy a relatively small part of the whole image. The imbalance between the pixel counts of background and object often causes the training process to get trapped in a local minimum of the final loss function, and the network then produces predictions biased toward the background. To solve the data imbalance problem, the Dice loss function is added as follows:

$$\mathcal{L}_{Dice} = \sum_{i} w_i \left( 1 - \frac{2 \sum_{p} y_{i,p}\, G_i(x, z)_{p}}{\sum_{p} y_{i,p} + \sum_{p} G_i(x, z)_{p}} \right),$$

where $y_i$ and $G_i(x, z)$ respectively represent the $i$-th channel of the ground truth and the prediction, the inner sums run over all pixels $p$, and $w_i$ denotes the weight of the Dice loss from the $i$-th channel. Different weights are allocated to the Dice loss of different channels so that the network can achieve better linear lesion segmentation.
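A minimal NumPy sketch of the weighted multi-channel Dice loss above; the smoothing constant `eps` is an implementation detail we add to avoid division by zero on empty channels, not taken from the paper:

```python
import numpy as np

def weighted_dice_loss(pred, target, weights, eps=1e-7):
    """Weighted multi-channel Dice loss.

    pred, target: arrays of shape (channels, H, W) with values in [0, 1].
    weights: one weight per channel, letting the rare lesion channel
    count more than background or vessels.
    """
    loss = 0.0
    for i, w in enumerate(weights):
        inter = np.sum(pred[i] * target[i])
        denom = np.sum(pred[i]) + np.sum(target[i])
        # eps in numerator and denominator keeps empty channels at zero loss
        loss += w * (1.0 - (2.0 * inter + eps) / (denom + eps))
    return loss
```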
Although the Dice loss can drastically reduce the data imbalance problem, it still has limits on the predictions of single pixels. In this paper, the weighted binary cross-entropy loss is therefore also added to the final loss function. The Dice loss cares about the intersection area of predictions and ground truth, while the weighted binary cross-entropy treats segmentation as pixel-wise classification and tries to increase the per-class classification accuracy. It can not only highlight the area of linear lesions effectively to enhance the structural information but also balance the gradients of areas in different classes during training. The weighted binary cross-entropy loss is defined as follows:

$$\mathcal{L}_{wbce} = -\sum_{i} \left[ w_i^{+}\, y_i \log G_i(x, z) + w_i^{-}\, (1 - y_i) \log(1 - G_i(x, z)) \right].$$

The total weighted binary cross-entropy is the sum of the weighted binary cross-entropy calculated for each class. $G_i(x, z)$ and $y_i$ represent the prediction and the ground truth of the $i$-th class, respectively, and $w_i^{+}$ and $w_i^{-}$ denote the ratios of object and background pixels. The final improved loss function is:

$$\mathcal{L} = \mathcal{L}_{cGAN}(G, D) + \lambda_1 \mathcal{L}_{L1}(G) + \lambda_2 \mathcal{L}_{Dice} + \lambda_3 \mathcal{L}_{wbce},$$

where the coefficients $\lambda_1$, $\lambda_2$ and $\lambda_3$ balance the contributions of the individual terms.
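The weighted binary cross-entropy for one class channel can be sketched as follows; the clipping and `eps` constant are our additions for numerical stability, not taken from the paper:

```python
import numpy as np

def weighted_bce_loss(pred, target, w_pos, w_neg, eps=1e-7):
    """Weighted binary cross-entropy for a single class channel.

    Positive (object) and negative (background) pixels get separate
    weights, so the sparse linear-lesion class is not overwhelmed by
    the abundant background pixels.
    """
    pred = np.clip(pred, eps, 1.0 - eps)  # keep log() finite
    pos = -w_pos * target * np.log(pred)
    neg = -w_neg * (1.0 - target) * np.log(1.0 - pred)
    return float(np.mean(pos + neg))
```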

Data set
The medical records and ICGA database of Shanghai General Hospital from April 2017 to August 2017 were searched and reviewed. In total, 76 eyes with linear lesions from 38 subjects were included and imaged (indocyanine green as the fluorescent dye; Heidelberg Retina Angiograph 2, Heidelberg Engineering, Heidelberg, Germany; 768 × 768 pixels). The collection and analysis of image data were approved by the Institutional Review Board of Shanghai General Hospital and adhered to the tenets of the Declaration of Helsinki. Informed consent was obtained from each subject for all imaging procedures. Previous studies [3] show that lacquer cracks are hypofluorescent in the late ICGA phase, i.e., 15 minutes after ICG dye injection. In our experiments, images were acquired 30 minutes after injection to ensure the linear lesions were clear. Due to the small number of subjects, 2 images from each eye are used in the data set; these 2 images differ slightly in position and intensity because of the different imaging times. Therefore, each subject contributes 4 images and a total of 152 ICGA images are included in the data set. We randomly split the data set into 4 parts, containing images from 10, 10, 9 and 9 subjects, for four-fold cross validation. As shown in Fig. 6, each image in the data set is annotated pixel-wise with three class labels: background, linear lesions and retinal vessels.
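The subject-level split described above (all four images from one subject stay in the same fold) can be sketched as follows; the function name and fixed seed are illustrative assumptions:

```python
import random

def subject_level_folds(subject_ids, fold_sizes=(10, 10, 9, 9), seed=0):
    """Split SUBJECTS (not individual images) into four folds, so that
    images from one subject never appear in both training and test
    sets of a cross-validation round."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(ids[start:start + size])
        start += size
    return folds

folds = subject_level_folds(range(38))  # four folds of 10, 10, 9, 9 subjects
```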

Evaluation metrics
As each evaluation metric has its own bias toward specific properties of the segmentation, multiple metrics should be used for an overall evaluation. To make the results clear and quantitative, we adopt the metrics in Table 1. Since linear lesions are our final segmentation target, retinal vessels and background in the three-class segmentation results are combined as the background in the final binary images. Intersection over union (IoU), also known as the Jaccard index, is the main metric; it measures the overlap between the ground truth and the segmentation results [15]. The Dice similarity coefficient (DSC) can also be used to compare the similarity between the ground truth and the results [35,36]. Accuracy is another common metric, representing the ratio of properly segmented pixels to the total pixel number [37]. Furthermore, because automatic linear lesion segmentation is intended to assist doctors in the diagnosis and analysis of high myopia, sensitivity and specificity are also included [38].
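All five metrics can be computed from the binary confusion counts; the following sketch uses the standard definitions of the metrics named above:

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Binary segmentation metrics from confusion counts.
    pred and truth are 0/1 arrays of the same shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)    # lesion pixels found
    tn = np.sum(~pred & ~truth)  # background pixels kept
    fp = np.sum(pred & ~truth)   # false lesion pixels
    fn = np.sum(~pred & truth)   # missed lesion pixels
    return {
        "IoU": tp / (tp + fp + fn),
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
    }
```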

Comparison of model variations
In this section, we investigate the effect of the partial dense connections and of the improved loss function, including the Dice loss and the weighted binary cross-entropy loss. As shown in Fig. 7 and Table 2, the original cGAN method cannot learn the differences between linear lesions and retinal vessels, while partial dense connections and the improved loss function drastically improve the segmentation performance. Compared with the traditional U-Net generator, the generator with partial dense connections encourages feature reuse during training: it more easily captures the main features of linear lesions and learns the differences between linear lesions and retinal vessels. Partial dense connections also retain the full structure of linear lesions and preserve more details. Additionally, the Dice loss and the weighted binary cross-entropy loss alleviate the data imbalance problem to a great extent, since linear lesions are slim and occupy only a small part of the image; the improved loss function effectively avoids the bias toward the background.

Comparison to other deep learning networks
To evaluate the performance of our method objectively, it is compared with several popular deep learning networks. As shown in Fig. 8 and Table 3, the proposed method obtains the best performance on all evaluation metrics. Compared to the other deep learning networks, it is clear that the adversarial mechanism of the cGAN brings remarkable performance in linear lesion segmentation. U-Net with partial dense connections is also included in the comparison. It performs better than the original U-Net, PSPNet and TiramisuNet, which not only indicates that partial dense connections improve the generator, but also shows that the proposed method handles linear lesion segmentation well even without the cGAN mechanism. We also infer that the Dice loss and the weighted binary cross-entropy loss play an important role in achieving the good results, because networks such as PSPNet and TiramisuNet, designed for natural object segmentation, use only the cross-entropy loss. To make networks suitable for medical image segmentation, loss functions should be tailored to the object, since each part of the loss function drives the prediction with a different bias. Exploring appropriate loss functions is very important for achieving good segmentations in ICGA images.

Conclusions and discussions
With the increasing prevalence of myopia, high myopia has become a major threat to vision. Since the development of linear lesions reflects the severity of high myopia, achieving automatic linear lesion segmentation is important and meaningful. This paper has proposed an improved cGAN framework to segment linear lesions in ICGA images. On one hand, partial dense connections are added in the generator to emphasize feature reuse and to allow the network to better learn the differences between object and background. On the other hand, the final loss function is improved with the Dice loss and the weighted binary cross-entropy loss; both help avoid the drastic reductions in accuracy caused by the data imbalance problem, and the weighted binary cross-entropy loss additionally helps classify pixels on object edges more precisely. The proposed network, improved with partial dense connections and additional loss terms, can effectively solve the linear lesion segmentation problem. Compared with other popular deep learning networks for image segmentation, our method achieves better results.

Considering the low image quality, it is difficult to capture the features of linear lesions in ICGA images from image intensity alone, and even the ground truth may not be 100% correct. Most expert diagnoses are based on extensive experience, which is hard for the networks to learn. This may explain the low IoU ratios of all methods in our comparison.
In future work, the segmentation performance can be improved in two ways. First, the data set we used is quite small, containing only 152 images; we will enlarge the ICGA data set to improve the network's generalization, and data augmentation can be an efficient way to increase the size of the data set and reduce over-fitting. Second, the data imbalance between object and background still affects segmentation accuracy, even with the Dice loss and weighted binary cross-entropy loss added. Linear lesions are so small that it is difficult for the network to learn their overall structure and shape, and small errors in the results may lead to a drastic reduction in the IoU ratio. To overcome this problem, we will cut the input images into small patches and keep only the patches containing the object in the training stage, so that the network can fully learn the features of linear lesions.