Keywords
Deep learning, image annotation, semantic and instance segmentations, conditional GANs, nuclei segmentation
This article is included in the Artificial Intelligence and Machine Learning gateway.
This article is included in the NEUBIAS - the Bioimage Analysts Network gateway.
This article is included in the Bioinformatics gateway.
Over the last decade, deep learning approaches have outperformed all existing methods for image segmentation1–4. Semantic segmentation, the estimation of a label at each pixel, and instance segmentation, the identification of individual objects, have been successfully applied to spatially characterize biological entities in microscopic images5–8. However, these powerful approaches rely on large annotated datasets. While more and more datasets are becoming publicly available9,10, annotated data covering every combination of modalities, tissues and biological objects are far from complete. Therefore, procedures to efficiently build training datasets are required to use the full potential of deep learning-based segmentation at the scale of a single biological lab.
In this paper, we propose a strategy to minimize the time spent manually annotating images and investigate several approaches to maximize accuracy when using only one annotated image. We apply this strategy to segment nuclei stained with DAPI in widefield images of human colorectal adenomas (i.e. precancerous polyps) as follows. First, we take advantage of existing training datasets11,12 and massive data augmentation to obtain a preliminary segmentation. We then use an open source annotation software12 to manually correct this segmentation and thereby define the training dataset. Next, we simulate synthetic images using a conditional generative adversarial network (GAN)13 to increase the size of the training dataset. Finally, we combine U-Net14,15, a semantic segmentation approach, and Mask R-CNN16, an instance segmentation approach, to improve the nuclear segmentation accuracy.
In this study, we used the Medical University of South Carolina (MUSC) pathology laboratory information system CoPath (Cerner Corporation, Kansas City, MO) to identify a convenience sample of colorectal adenomas excised from patients who underwent a sigmoidoscopy or colonoscopy with polypectomy between October 2012 and May 2016. For each patient, we obtained a formalin-fixed, paraffin-embedded (FFPE) tissue block and prepared one H&E section and five 5-micron sections for immunofluorescence (IF) on FFPE tissue. Prior to the start of the IF procedures, all antibodies were optimized and reviewed by the study immunologist, the pathologist, the epidemiologist, and laboratory personnel to ensure agreement and proper staining. The MUSC Institutional Review Board approved the research study (IRB # PRO-00007139).
DAPI was used for nuclear counterstaining. Stained slides were mounted with ProLong™ Gold Antifade Reagent (Cat. # P36934, ThermoFisher) and imaged using the Akoya Vectra® Polaris™ Automated Imaging system (Akoya Biosciences, Marlborough, MA). Whole slide scans were acquired at 20X magnification and regions of interest were chosen randomly.
U-Net, Mask R-CNN and pix2pix were coded in Python and used the Python libraries numpy17, tensorflow18, keras19, scipy20 and scikit-image21.
The training dataset consisted of three 1868 x 1400 images manually annotated with Annotater12. Only one image was used to train U-Net and Mask R-CNN as well as pix2pix (conditional GAN) for most of the study. The two other images were added to the training dataset in the last section to be compared with the combination of results obtained with U-Net and Mask R-CNN (see Figure 3).
The annotated 1868 x 1400 image was divided into six 622 x 700 images for training: five of these images were included in the training dataset while the last one defined the validation dataset. As U-Net is a semantic segmentation approach, three classes were defined to allow nuclei to be separated, as proposed in reference 22: inner nuclei, nuclei contours and background. To facilitate nuclei separation, the nuclei contours in the training dataset were dilated22. To limit over-fitting, the imaging field for images in the training dataset was set to 256 x 256 by randomly cropping the 622 x 700 input images. These cropped images were then normalized to obtain intensity values between 0 and 1. The RMSprop (root mean square propagation) optimizer was used to estimate the parameters of the deep convolutional neural network by minimizing a weighted cross entropy loss, which handles class imbalance, for 100 epochs without data augmentation and 25 epochs with data augmentation. The weight associated with each class was defined as the inverse of its proportion in the training dataset. Data augmentation, which increased the training dataset by a factor of 100, was applied after normalization with the imgaug python library23 and included flipping, rotation, pixel dropout, blurring, noise addition and contrast modifications. In Figure 2 and Figure 3, augmented simulated images were obtained by applying the same modifications with the imgaug python library to images simulated with pix2pix. When combining the annotated image for this study with simulated images and/or existing datasets, the number of augmented images was chosen so that the different data sources were balanced.
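The cropping, normalization and class weighting described above can be sketched as follows. This is a minimal illustration with NumPy only, not the authors' training code: the function names, the class order (0 = background, 1 = nuclei contours, 2 = inner nuclei) and the random test data are assumptions made for the example.

```python
import numpy as np

def random_crop(image, labels, size=256, rng=None):
    """Cut the same random size x size window out of an image and its label map."""
    rng = np.random.default_rng() if rng is None else rng
    top = rng.integers(0, image.shape[0] - size + 1)
    left = rng.integers(0, image.shape[1] - size + 1)
    window = (slice(top, top + size), slice(left, left + size))
    return image[window], labels[window]

def normalize(image):
    """Rescale intensities linearly so that they lie between 0 and 1."""
    image = image.astype(np.float64)
    span = image.max() - image.min()
    return (image - image.min()) / span if span > 0 else np.zeros_like(image)

def inverse_proportion_weights(labels, n_classes=3):
    """Weight each class by the inverse of its pixel proportion."""
    counts = np.bincount(labels.ravel(), minlength=n_classes).astype(np.float64)
    proportions = counts / counts.sum()
    return 1.0 / np.maximum(proportions, 1e-12)

# Hypothetical 622 x 700 tile (stored as 700 rows x 622 columns) with random content.
rng = np.random.default_rng(0)
image = rng.integers(0, 4096, size=(700, 622))
labels = rng.choice(3, size=(700, 622), p=[0.7, 0.1, 0.2])

crop, crop_labels = random_crop(image, labels, rng=rng)
crop = normalize(crop)
weights = inverse_proportion_weights(labels)
```

With these weights, the rarest class (here the contours) contributes the most per pixel to a weighted cross entropy loss, which is the intent of the inverse-proportion rule.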
An ImageJ macro24,25 was used to convert the three classes obtained with U-Net to individual nuclei. More specifically, individual nuclei were identified by subtracting the nuclei contours component from the inner nuclei component and thresholding the result at 0.35. A 3D Voronoi tessellation26 was then applied to assign each pixel to a nucleus. The object component was defined as all pixels whose background component was lower than 0.95. This object component was then multiplied by the Voronoi tessellation to obtain individual nuclei. The Voronoi tessellation leaves a 1-pixel wide area between nuclei that is not assigned to any nucleus. To address this problem, the location of these pixels was obtained by subtracting the binary thresholding of the individual nuclei from the object component. The individual nuclei were then dilated27 and multiplied by this subtraction, and the result was added to the individual nuclei. Finally, nuclei with fewer than 35 pixels were removed.
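The post-processing steps above can be approximated in Python as shown below. This is a sketch under stated assumptions, not a port of the ImageJ macro: the Voronoi tessellation is replaced by a nearest-seed assignment computed with `scipy.ndimage.distance_transform_edt`, which leaves no unassigned 1-pixel border, so the dilation step is not needed here. The function name and the synthetic probability maps are hypothetical.

```python
import numpy as np
from scipy import ndimage as ndi

def classes_to_nuclei(inner, contour, background,
                      seed_threshold=0.35, background_threshold=0.95, min_area=35):
    """Turn three per-pixel class maps into a label image of individual nuclei."""
    # Seeds: connected components where inner minus contour exceeds the threshold.
    seeds, n_seeds = ndi.label((inner - contour) > seed_threshold)
    if n_seeds == 0:
        return np.zeros(inner.shape, dtype=np.int32)
    # Nearest-seed partition: each pixel takes the label of its closest seed pixel.
    _, nearest = ndi.distance_transform_edt(seeds == 0, return_indices=True)
    partition = seeds[tuple(nearest)]
    # Object component: pixels that are unlikely to be background.
    objects = background < background_threshold
    nuclei = np.where(objects, partition, 0)
    # Discard nuclei with fewer than min_area pixels.
    areas = np.bincount(nuclei.ravel(), minlength=n_seeds + 1)
    small = areas < min_area
    small[0] = False
    nuclei[small[nuclei]] = 0
    return nuclei

# Synthetic example: two touching discs, each with an inner core and a contour ring.
yy, xx = np.mgrid[:64, :64]
d1 = np.hypot(yy - 32, xx - 22)
d2 = np.hypot(yy - 32, xx - 42)
inner = ((d1 < 7) | (d2 < 7)).astype(float)
objects = (d1 < 10) | (d2 < 10)
contour = (objects & (inner == 0)).astype(float)
background = 1.0 - objects.astype(float)

nuclei = classes_to_nuclei(inner, contour, background)
```

In this example the two discs receive different labels even though their contours touch, because the seeds are derived from the inner component only.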
The annotated 1868 x 1400 image was divided into thirty-five 266 x 280 images for training: thirty of these images were included in the training dataset while the last five images defined the validation dataset. Version 2.1 of Mask R-CNN16 was used in this study. The backbone network was the ResNet-101 deep convolutional neural network28. We used the code from reference 5 to define the single class in this study, i.e. the nuclei. Data augmentation, which increased the training dataset by a factor of 100, was applied before normalization with the imgaug python library23 and included resizing, cropping, flipping, rotation, shearing, pixel dropout, blurring, sharpness and brightness modifications, noise addition and contrast modifications. Transfer learning with fine-tuning from a network trained on the COCO dataset29 was also applied. In the first epoch, only the region proposal network, the classifier and mask heads were trained. The whole network was then trained for the next three epochs. In Figure 2 and Figure 3, augmented simulated images were obtained by applying the same modifications with the imgaug python library to images simulated with pix2pix. When combining the annotated image for this study with simulated images and/or existing datasets, the number of augmented images was chosen so that the different data sources were balanced. Because resizing and cropping were applied for data augmentation, the maximum image size used by Mask R-CNN had to be larger than 256 and was set to 512. This parameter was set to 1024 when other existing datasets were included for training, as the magnification of these images is higher.
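The division of the annotated image into thirty-five tiles follows from simple arithmetic: 1868 // 7 = 266 and 1400 // 5 = 280. A minimal sketch, assuming the image is stored as an array of 1400 rows by 1868 columns and that leftover border pixels are dropped (the helper name `split_into_tiles` is hypothetical):

```python
import numpy as np

def split_into_tiles(image, n_rows, n_cols):
    """Split an image into n_rows x n_cols equally sized tiles, dropping any remainder."""
    tile_h = image.shape[0] // n_rows
    tile_w = image.shape[1] // n_cols
    return [image[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
            for r in range(n_rows) for c in range(n_cols)]

# Placeholder array standing in for the annotated 1868 x 1400 image.
image = np.zeros((1400, 1868), dtype=np.uint16)
tiles = split_into_tiles(image, n_rows=5, n_cols=7)

# The last five tiles define the validation dataset, the other thirty the training dataset.
train_tiles, validation_tiles = tiles[:-5], tiles[-5:]
```

The same helper with `n_rows=2, n_cols=3` gives the six 622 x 700 tiles used for U-Net.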
One 1868 × 1400 and one 934 × 1400 manually annotated images were used for evaluation. As proposed in reference 11, we used the F1 score with respect to the Intersection over Union (IoU) to evaluate the different nuclei segmentation approaches. More formally, let OGT = {OGT(e)}e=1,...,n be the set of n ground truth nuclei and OE = {OE(e)}e=1,...,m be the set of m estimated nuclei. The IoU between the ground truth nucleus OGT(e1) and the estimated nucleus OE(e2) is defined as:
$$\mathrm{IoU}(O_{GT}(e_1), O_E(e_2)) = \frac{|O_{GT}(e_1) \cap O_E(e_2)|}{|O_{GT}(e_1) \cup O_E(e_2)|}$$
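An IoU(OGT(e1), OE(e2)) equal to 0 implies that OGT(e1) and OE(e2) do not share any pixel, while an IoU(OGT(e1), OE(e2)) equal to 1 means that OGT(e1) and OE(e2) are identical. To ensure that one ground truth nucleus is not associated with multiple estimated nuclei and conversely, we use the following definition, denoted IoU*:
$$\mathrm{IoU}^*(O_{GT}(e_1), O_E(e_2)) = \begin{cases} \mathrm{IoU}(O_{GT}(e_1), O_E(e_2)) & \text{if } e_2 = \arg\max_{e} \mathrm{IoU}(O_{GT}(e_1), O_E(e)) \text{ and } e_1 = \arg\max_{e} \mathrm{IoU}(O_{GT}(e), O_E(e_2)), \\ 0 & \text{otherwise.} \end{cases}$$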
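The F1 score for a given IoU* threshold t > 0 can be defined as:
$$F1(t) = \frac{2\,P(t)\,R(t)}{P(t) + R(t)}$$
where
$$P(t) = \frac{1}{m} \sum_{e_1=1}^{n} \sum_{e_2=1}^{m} \mathbb{1}\left[\mathrm{IoU}^*(O_{GT}(e_1), O_E(e_2)) \geq t\right]$$
and
$$R(t) = \frac{1}{n} \sum_{e_1=1}^{n} \sum_{e_2=1}^{m} \mathbb{1}\left[\mathrm{IoU}^*(O_{GT}(e_1), O_E(e_2)) \geq t\right]$$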
With a threshold t = 0.05, this metric gives the accuracy of a method to identify the correct number of nuclei, while with thresholds in the range 0.05 − 0.9, it evaluates the localization accuracy of the identified nuclear contours.
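To make the metric concrete, here is a minimal NumPy sketch that computes an F1 score from two label images at a given IoU threshold. It rests on assumptions that go beyond the text: one-to-one matching is enforced by keeping only pairs that are each other's best match, a pair counts as a true positive when its IoU is greater than or equal to t, and the function names are hypothetical. It uses the identity that the harmonic mean of precision TP/m and recall TP/n equals 2TP/(n + m).

```python
import numpy as np

def iou_matrix(gt, est):
    """IoU between every ground truth label (rows) and every estimated label (columns)."""
    gt_ids = np.unique(gt[gt > 0])
    est_ids = np.unique(est[est > 0])
    iou = np.zeros((len(gt_ids), len(est_ids)))
    for i, g in enumerate(gt_ids):
        g_mask = gt == g
        for j, e in enumerate(est_ids):
            e_mask = est == e
            intersection = np.logical_and(g_mask, e_mask).sum()
            if intersection:
                iou[i, j] = intersection / np.logical_or(g_mask, e_mask).sum()
    return iou

def one_to_one(iou):
    """Keep an IoU value only where the two objects are each other's best match."""
    matched = np.zeros_like(iou)
    if iou.size == 0:
        return matched
    best_est = iou.argmax(axis=1)
    best_gt = iou.argmax(axis=0)
    for i, j in enumerate(best_est):
        if best_gt[j] == i:
            matched[i, j] = iou[i, j]
    return matched

def f1_score(gt, est, t):
    """F1 score at IoU threshold t > 0 for two label images."""
    iou_star = one_to_one(iou_matrix(gt, est))
    n, m = iou_star.shape
    true_positives = np.count_nonzero(iou_star >= t)
    return 2 * true_positives / (n + m) if n + m else 0.0

# Two ground truth squares; the estimate finds one of them, shifted by one pixel.
gt = np.zeros((16, 16), dtype=np.int32)
gt[2:6, 2:6] = 1
gt[10:14, 10:14] = 2
est = np.zeros((16, 16), dtype=np.int32)
est[2:6, 3:7] = 1   # IoU with ground truth nucleus 1 is 12 / 20 = 0.6

f1_low = f1_score(gt, est, 0.5)
f1_high = f1_score(gt, est, 0.7)
```

Evaluating `f1_score` over a range of thresholds produces the kind of F1 versus IoU curve used in the figures: the value at a low threshold reflects how many nuclei are detected, and the decay at higher thresholds reflects how well their contours are localized.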
The annotated 1868 × 1400 image was divided into thirty-five 256 × 256 images for training. As defined in reference 13, U-Net14 was used for the generator and a convolutional PatchGAN classifier was used for the discriminator. Once the network was trained, nuclei masks had to be generated to simulate images. Distributions for the number of nuclei per image and the size of nuclei were estimated from the training dataset. The number of nuclei per image was modeled as a Gaussian distribution while the size of nuclei was modeled by a Gumbel distribution to reflect the heavy-tailed distribution observed in the training dataset. Nuclei masks were then defined as ellipses randomly generated from these distributions, with random orientation and a ratio between the two axes drawn from a Gaussian distribution of average s/π and standard deviation of 0.2s/π, where s is the area of the ellipse. One thousand 256 × 256 nuclei images were simulated by using the generated ellipses as nuclei masks.
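The mask generation step can be illustrated with the sketch below, which draws the number of nuclei from a Gaussian distribution and their areas from a Gumbel distribution, then rasterizes randomly oriented ellipses. All numerical parameters are placeholders rather than values fitted to the training dataset, the axis ratio is drawn from a simple Gaussian around a fixed mean (a simplification of the parametrization described above), and the function names are hypothetical.

```python
import numpy as np

def ellipse_mask(shape, center, area, axis_ratio, angle):
    """Boolean mask of a rotated ellipse with a given area and major/minor axis ratio."""
    # area = pi * a * b and axis_ratio = a / b give the two semi-axes.
    b = np.sqrt(area / (np.pi * axis_ratio))
    a = axis_ratio * b
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    dy, dx = yy - center[0], xx - center[1]
    u = dx * np.cos(angle) + dy * np.sin(angle)
    v = -dx * np.sin(angle) + dy * np.cos(angle)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

def simulate_nuclei_mask(shape=(256, 256), count_mean=60.0, count_std=10.0,
                         area_loc=150.0, area_scale=40.0,
                         ratio_mean=1.3, ratio_std=0.2, rng=None):
    """Label image of random ellipses; every default value is a placeholder."""
    rng = np.random.default_rng() if rng is None else rng
    n_nuclei = max(1, int(round(rng.normal(count_mean, count_std))))
    labels = np.zeros(shape, dtype=np.int32)
    for label in range(1, n_nuclei + 1):
        area = max(35.0, rng.gumbel(area_loc, area_scale))   # heavy right tail
        ratio = max(1.0, rng.normal(ratio_mean, ratio_std))
        center = (rng.uniform(0, shape[0]), rng.uniform(0, shape[1]))
        angle = rng.uniform(0, np.pi)
        mask = ellipse_mask(shape, center, area, ratio, angle)
        labels[mask & (labels == 0)] = label                 # earlier nuclei are kept on overlap
    return labels

labels = simulate_nuclei_mask(rng=np.random.default_rng(0))
test_ellipse = ellipse_mask((101, 101), (50, 50), area=1000.0, axis_ratio=2.0, angle=0.0)
```

A label image produced this way would then be passed to the trained generator to obtain a synthetic fluorescence image whose ground truth segmentation is known by construction.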
The combination of results obtained with instance and semantic segmentations was initialized with the nuclei segmented with Mask R-CNN. To prevent hallucinations, nuclei identified with Mask R-CNN for which less than 20% of the area overlapped with nuclei obtained with U-Net were discarded. Then, nuclei identified with U-Net for which less than 33% of the area overlapped with nuclei obtained with Mask R-CNN were added as new nuclei to the final segmentation. Finally, nuclei with an area smaller than 35 pixels were discarded.
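The combination rule can be expressed compactly on two label images. The sketch below is an illustration under assumptions rather than the authors' implementation: overlap is measured as the fraction of a nucleus covered by any nucleus of the other method, U-Net nuclei are compared against the Mask R-CNN nuclei that survived the first filter, and the function names are hypothetical.

```python
import numpy as np

def overlap_fraction(mask, other_labels):
    """Fraction of the pixels in mask covered by any object in other_labels."""
    return np.count_nonzero(other_labels[mask]) / np.count_nonzero(mask)

def combine(mask_rcnn, unet, keep_threshold=0.20, add_threshold=0.33, min_area=35):
    """Merge instance (Mask R-CNN) and semantic (U-Net) label images."""
    combined = np.zeros_like(mask_rcnn)
    next_label = 1
    # 1. Keep Mask R-CNN nuclei that are sufficiently supported by U-Net.
    for lab in np.unique(mask_rcnn[mask_rcnn > 0]):
        mask = mask_rcnn == lab
        if overlap_fraction(mask, unet) >= keep_threshold:
            combined[mask] = next_label
            next_label += 1
    kept = combined.copy()
    # 2. Add U-Net nuclei that are mostly not covered by the kept Mask R-CNN nuclei.
    for lab in np.unique(unet[unet > 0]):
        mask = unet == lab
        if overlap_fraction(mask, kept) < add_threshold:
            combined[mask & (combined == 0)] = next_label
            next_label += 1
    # 3. Discard nuclei smaller than min_area pixels.
    areas = np.bincount(combined.ravel(), minlength=next_label)
    small = areas < min_area
    small[0] = False
    combined[small[combined]] = 0
    return combined

# Toy example: one nucleus found by both methods, one hallucination, one missed nucleus.
mask_rcnn = np.zeros((40, 40), dtype=np.int32)
mask_rcnn[2:12, 2:12] = 1       # also found by U-Net
mask_rcnn[25:35, 25:35] = 2     # no U-Net support: discarded
unet = np.zeros((40, 40), dtype=np.int32)
unet[2:12, 3:13] = 1            # overlaps Mask R-CNN nucleus 1: not added again
unet[2:12, 20:30] = 2           # missed by Mask R-CNN: added

combined = combine(mask_rcnn, unet)
```

The asymmetry between the two thresholds reflects the roles of the two methods: Mask R-CNN contours are trusted when U-Net confirms them, and U-Net only contributes nuclei that Mask R-CNN missed.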
A training dataset is required to train a deep learning method for object segmentation. Consequently, users most often start by manually annotating objects of interest with existing annotation tools30,31. As shown in Figure 1a, this task is particularly challenging in our case due to the wide range of morphologies and high density of nuclei in polyps. We use the ImageJ plugin Annotater12 to efficiently annotate nuclei, a task that takes approximately 30 hours. To avoid a fully manual annotation and save time, it is possible to use the same plugin to correct a nuclei segmentation obtained with an existing method. The watershed method32, probably the most widely used method for nuclei segmentation in fluorescence microscopy images, correctly identifies a high number of nuclei (high F1 score for a low IoU threshold in Figure 1 b-c). Unfortunately, under- and over-segmentation, a well-known limitation of this approach, leads to poor segmentation localization (rapidly decreasing F1 score with increasing IoU thresholds in Figure 1 b-c). Alternatively, deep learning approaches can be trained with existing training datasets. We propose to use a dataset from a high throughput chemical screen on U2OS cells (CC) (image set BBBC039v1 available from the Broad Bioimage Benchmark Collection9) and a widefield mouse intestinal epithelium dataset (MIE)12. While U-Net performs poorly with these datasets (Figure 1b), Mask R-CNN identifies more nuclei and mostly leads to much higher localization precision than the watershed approach (slowly decreasing F1 score with increasing IoU thresholds in Figure 1c). Correcting this segmentation with Annotater takes about 15–20 hours, which is clearly faster than an annotation from scratch. For both U-Net and Mask R-CNN, a massive data augmentation (100 times) clearly improves the performance.
When only considering the annotated image in Figure 1a in the training dataset, U-Net leads to higher segmentation accuracy than Mask R-CNN (Figure 2a-b). To increase the training dataset, we use the same annotated image to train a conditional Generative Adversarial Network (GAN)13 and simulate images showing nuclei from masks defined as random ellipses generated with the distributions of nuclei size and nuclei number observed in the training dataset (see Figure 2c and Methods). Using only simulated images leads to a lower accuracy for both deep learning approaches, even though applying mathematical operations to these synthetic images (augmented simulated training dataset, see Methods) improves the segmentation accuracy. However, pooling together augmented simulated images and the annotated image from Figure 1a slightly improves U-Net performance and distinctly increases the number of accurately identified nuclei with Mask R-CNN while decreasing the segmentation localization precision. Finally, adding existing datasets clearly leads to the best results for Mask R-CNN while degrading the accuracy for U-Net, which is consistent with the inability of this approach to generalize nuclear segmentation to different data, as shown in Figure 1b.
Nuclei segmented with Mask R-CNN show a higher localization precision than those obtained with U-Net as shown in Figure 1a-b. However, nuclei that are harder to delineate are missed with Mask R-CNN while U-Net accurately identifies pixels that belong to nuclei, even though the separation between individual nuclei might not be precise. In order to get the best of both worlds, we propose to combine the results obtained with U-Net trained with one annotated image with data augmentation and augmented simulated images, and the results obtained with Mask R-CNN trained with one annotated image with data augmentation, augmented simulated images and existing datasets with data augmentation (see Methods). As shown in Figure 3, these results demonstrate a higher F1 score for any IoU threshold than obtained with U-Net or Mask R-CNN trained with 3 times more annotated images. The corresponding segmented nuclei are shown in Figure 4.
This study demonstrates how to take advantage of existing training datasets, efficient annotation tools, massive data augmentation, conditional GANs and the combination of results obtained with both semantic and instance segmentations to minimize the amount of manually annotated data. When facing a new object segmentation problem, it is beneficial to find existing training datasets, even if modalities and/or tissues differ, to train an instance segmentation-based deep learning method. The segmentation obtained with this approach is then used to initialize a training dataset. Training a conditional GAN to increase the size of the training dataset improves the performance for both semantic and instance segmentations. Additionally, adding existing training datasets further increases the segmentation accuracy for instance segmentation. Finally, combining semantic and instance segmentation results leads to the best result for the initial training dataset. If the final accuracy is not satisfactory, more images should be processed and the combination of semantic and instance segmentations manually corrected to increase the size of the training dataset, repeating this operation until an accuracy threshold is met.
The five annotated images are available at https://github.com/tpecot/DeepLearningBasedSegmentationForBiologists/tree/main/Data/AnnotatedNuclei. This project contains the following data:
Polyp12_[10837,39273]_component_data.tiff: image used for training U-Net and Mask R-CNN in all figures and for training pix2pix in Figure 2
Polyp40_[13694,34105]_component_data.tiff and Polyp42_[12011,37598]_component_data.tiff: two images used for training U-Net and Mask R-CNN in Figure 3
Polyp12_[12699,39273]_component_data.tiff and Polyp42_[12942,36900]_component_data.tiff: two images used for evaluation in all figures
The images generated with pix2pix and used for training U-Net and Mask R-CNN in Figure 2–Figure 3 are available at https://github.com/tpecot/NucleiSimulationWithConditionalGAN/tree/main/datasets/Nuclei_polyps_1image.
The code with the parameters used to train and process all experiments presented in this manuscript with U-Net and Mask R-CNN is available at https://github.com/tpecot/DeepLearningBasedSegmentationForBiologists/tree/main/Codes.
Archived code as at time of publication: https://doi.org/10.5281/zenodo.4608795 [33]
License: GPL3
The code with the parameters used to train and generate images with pix2pix is available at https://github.com/tpecot/NucleiSimulationWithConditionalGAN.
Archived code as at time of publication: https://doi.org/10.5281/zenodo.4608793 [34]
License: GPL3
The ImageJ macro used to convert the output classes obtained with U-Net to individual nuclei is available at https://github.com/tpecot/DeepLearningBasedSegmentationForBiologists/tree/main/Codes/ImageJMacros.
Archived macro as at time of publication: https://doi.org/10.5281/zenodo.4608795 [33]
License: GPL3
This publication was supported by COST Action NEUBIAS (CA15124), funded by COST (European Cooperation in Science and Technology). We acknowledge the Translational Science Lab at the Medical University of South Carolina for help and advice with microscopy.
Is the rationale for developing the new method (or application) clearly explained?
Yes
Is the description of the method technically sound?
Yes
Are sufficient details provided to allow replication of the method development and its use by others?
Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Partly
References
1. Ouyang W, Le T, Xu H, Lundberg E: Interactive biomedical segmentation tool powered by deep learning and ImJoy. F1000Research. 2021; 10.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: I am a quantitative imaging specialist, focused on fluorescence microscopy, super-resolution, and quantitative analysis method development, including deep learning.
Is the rationale for developing the new method (or application) clearly explained?
Yes
Is the description of the method technically sound?
Yes
Are sufficient details provided to allow replication of the method development and its use by others?
Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Deep Learning, Computer Vision, Image and Video Processing
Version | Invited Reviewer 1 | Invited Reviewer 2 |
---|---|---|
Version 2 (revision), 17 Jan 22 | read | read |
Version 1, 30 Mar 21 | read | read |