Automated detection of the HER2 gene amplification status in Fluorescence in situ hybridization images for the diagnostics of cancer tissues

The human epidermal growth factor receptor 2 (HER2) gene amplification status is a crucial marker for evaluating clinical therapies of breast or gastric cancer. We propose a deep learning-based pipeline for the detection, localization and classification of interphase nuclei depending on their HER2 gene amplification state in Fluorescence in situ hybridization (FISH) images. Our pipeline combines two RetinaNet-based object localization networks which are trained (1) to detect and classify interphase nuclei into the distinct classes normal, low-grade and high-grade and (2) to detect and classify FISH signals into the distinct classes HER2 or centromere of chromosome 17 (CEN17). By independently classifying each nucleus twice, the two-step pipeline provides both robustness and interpretability for the automated detection of the HER2 amplification status. The accuracy of our deep learning-based pipeline is on par with that of three pathologists, and a set of 57 validation images containing several hundred nuclei is accurately classified. The automatic pipeline is a first step towards assisting pathologists in evaluating the HER2 status of tumors using FISH images, analyzing FISH images in retrospective studies, and optimizing the documentation of each tumor sample by automatically annotating and reporting the HER2 gene amplification specificities.

Methods: Here, we apply a deep learning-based pipeline for the detection, localization and classification of interphase nuclei depending on their HER2 gene amplification state in Fluorescence in situ hybridization (FISH) images. Our pipeline combines two CNNs of the RetinaNet architecture, which are trained on (1) the detection and classification of interphase nuclei into normal, low-grade and high-grade and on (2) the detection and classification of FISH signals into HER2 and centromere of chromosome 17 (CEN17). In the first step (RetinaNet-1), nuclei are localized image-wide and a first classification is applied.
The nuclei classification conducted via RetinaNet-1 is controlled and supplemented by HER2/CEN17 FISH signal ratios for the same nucleus by RetinaNet-2. Finally, an image-wide decision on the HER2 gene amplification stage is performed.
Results: We demonstrate that the accuracy of this deep learning-based pipeline is on par with that of a pathologist. The pipeline accurately classifies FISH images, as demonstrated on a set of 57 validation images containing hundreds of nuclei. Consequently, high quality FISH images can now be analyzed at once regarding their image-wide HER2 gene amplification status in our lab.
Conclusions: The automatic pipeline is a first step towards assisting the pathologist in evaluating the HER2 status of tumors using FISH images, analyzing FISH images in retrospective studies, and optimizing the documentation of each tumor sample by automatically annotating and reporting the HER2 gene amplification specificities.

Background
The human epidermal growth factor receptor 2 (HER2) gene, also designated ERBB2 gene for the v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, encodes a member of the epidermal growth factor receptor family of receptor tyrosine kinases. Amplification of the HER2 gene is the primary mechanism of HER2 overexpression in tumors 1. HER2 amplification occurs before HER2 protein overexpression and, consequently, monitoring of the tumor HER2 gene amplification status has become routine in breast cancer surveillance 2-4. A positive HER2 status, found in around 25% of breast cancers, is associated with poorer prognosis, more aggressive disease, and an increased risk of disease recurrence 2,5-7. Application of HER2-directed therapies such as treatment with anti-HER2 antibodies, e.g. trastuzumab, depends on the detection of the HER2 gene amplification and increases overall survival of individuals suffering from HER2 positive breast cancer 2,6-10. In addition to breast cancer, HER2 status testing is also applied in gastric cancers, as trastuzumab is similarly effective in prolonging survival in HER2 positive carcinoma of the stomach and of the gastroesophageal junction 2,11. HER2 testing is commonly carried out by immunohistochemistry (IHC), chromogenic in situ hybridization (CISH), silver-enhanced in situ hybridization (SISH) or Fluorescence in situ hybridization (FISH). In interphase nuclei of investigated tumor material, HER2 gene amplification testing is preferentially conducted via FISH 12. In FISH analysis, a HER2 positive state is defined by a HER2/CEN17 ratio of more than 2.2, where CEN17 is a probe for the centromere of chromosome 17, on which the HER2 gene resides. Negative HER2 FISH amplification is defined as a HER2/CEN17 ratio of less than 1.8 12.
Without an internal control probe such as CEN17, HER2 positive FISH is defined when more than six HER2 gene copies are detectable per interphase nucleus, while the equivocal range is defined as an average copy number of four to six HER2 genes per nucleus. Normal nuclei harbor two or fewer HER2 genes 13.
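The guideline thresholds above can be written as a small decision function. This is only an illustrative sketch: the function name `her2_fish_status` and its arguments are hypothetical, the ratio band from 1.8 to 2.2 is labelled equivocal by assumption (the text defines only the positive and negative cut-offs), and mean copy numbers below four are treated as negative by assumption.

```python
def her2_fish_status(mean_her2, mean_cen17=None):
    """Classify HER2 FISH status from mean signal counts per nucleus.

    mean_her2  -- average number of HER2 signals per nucleus
    mean_cen17 -- average number of CEN17 signals per nucleus, or None
                  when no internal control probe is available
    """
    if mean_cen17 is not None:
        if mean_cen17 <= 0:
            raise ValueError("mean_cen17 must be positive")
        ratio = mean_her2 / mean_cen17
        if ratio > 2.2:
            return "positive"
        if ratio < 1.8:
            return "negative"
        return "equivocal"  # assumption: band between the two cut-offs

    # No control probe: decide on the HER2 copy number alone.
    if mean_her2 > 6:
        return "positive"
    if mean_her2 >= 4:
        return "equivocal"
    return "negative"  # assumption: fewer than four copies is negative


print(her2_fish_status(5.0, 2.0))  # ratio 2.5
print(her2_fish_status(5.0))       # copy number only
```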
In clinical practice, the analysis is performed by the pathologist by observing the FISH slide under the fluorescence microscope. The decision making relies on the individual expert knowledge of the pathologist and depends on the standardization of the methodology, lab-dependent routines and, finally, the quality of the FISH images (background signals, artifacts, tissue quality, and fluorescence microscope-dependent parameters). Pathologists analyze the HER2 gene amplification status of a tumor sample by evaluation in comparison to control samples. Testing criteria define a HER2 positive status when, within an area of tumor that amounts to > 10% of contiguous and homogeneous tumor nuclei, there is evidence of HER2 gene amplification based on counting at least 20 nuclei within this area 14. By counting and classifying at least 20 interphase nuclei from different areas of the FISH slides, a diagnostic decision is possible regarding a positive or negative state of HER2 gene amplification and its HER2 grade (low or high). The diagnosis relies on the ratios of HER2 to CEN17 signals per nucleus, on which the subsequent classification of the corresponding tumor sample is based.
While automation methods for extracting features from microscopic images, such as spot detection and counting 15,16, already exist, recent years have seen a notable increase in deep learning applications for classification tasks on pathological microscopic images across a wide field of applications 17. Image classification tasks are commonly addressed with Convolutional Neural Networks (CNNs), which rely on convolutional and non-linear transformations of the input data for a high-level abstraction classification 18. Deep learning approaches such as CNNs have already been applied to pathology image classification, tumor classification, imaging mass spectrometry data 16, the identification of metastatic cancer areas 17 and the annotation of pathological images 19. Recently, CNNs were applied for signal detection and counting in nuclei from FISH images (SpotLearn) 20 and for the segmentation of chromosomes in multicolor FISH images 21. SpotLearn includes two supervised machine learning-based analysis workflows for the high-throughput segmentation and classification of large and diverse sets of FISH signals. FISH signals are detected with high accuracy in three separate fluorescence microscopy channels 20. We aimed to develop a pipeline based on CNNs that works in only one channel, because in our certified routine diagnostic workflow FISH signals are captured using a graded filter that records the different HER2 gene and CEN17 signals in one step, which cannot be differentiated by SpotLearn. Furthermore, we targeted the localization of nuclei and FISH signals using fast one-stage detectors rather than a segmentation workflow as applied in SpotLearn.
Therefore, we generated a lab-specific, CNN-based pipeline for the automatic classification of FISH images comprising many interphase nuclei into normal, low-grade and high-grade, to be used specifically at our institute. The pipeline consists of two trained RetinaNet 22 steps for an image-wide classification of the HER2 gene amplification status. While the first RetinaNet (RetinaNet-1) detects and pre-classifies the nuclei, in the second RetinaNet step (RetinaNet-2) the HER2 and CEN17 signals are counted for each nucleus, providing detailed information on each pre-classified nucleus.
RetinaNet is a state-of-the-art, real time object detection and classification network with the aim of fast, accurate recognition of a wide variety of objects 22 . It relies on a Feature Pyramid Network 14 backbone on top of a feed-forward ResNet 23 architecture. To this backbone, RetinaNet attaches two subnetworks: one for classifying anchor boxes and one for regressing from anchor boxes to ground-truth object boxes 22 . Together with this architecture the new loss function, focal loss, is used that acts as a more effective alternative to previous approaches for dealing with class imbalance. RetinaNet is potentially more efficient than other state-of-the-art one-stage detectors because of the focal loss of RetinaNet, which applies a modulating term to the cross entropy loss in order to focus learning on hard negative examples, achieving state-of-the-art accuracy and speed of two-stage detectors 24 . To use these advantages for the detection on pathological samples, we applied RetinaNet in our deep learning-based system targeting the automation of FISH image evaluation regarding the HER2 grade detection at high accuracy and compared the performance of our system with the pathological assessment.
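For reference, the focal loss of the RetinaNet paper 22 for a single binary prediction is FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the predicted probability of the true class. A minimal sketch follows; the function name is hypothetical, and the defaults gamma = 2 and alpha = 0.25 are the values recommended in that paper. Whether the same values were used in the present work is not stated and is only assumed here.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    p -- predicted probability of the positive class, 0 < p < 1
    y -- ground-truth label, 1 (object) or 0 (background)
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma is the modulating term: it shrinks the loss of
    # well-classified examples so that hard examples dominate training.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confident and correct
hard = focal_loss(0.1, 1)  # confident and wrong
print(easy < hard)
```

With gamma = 0 the modulating term disappears and the expression reduces to an alpha-weighted cross entropy.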

Material and Methods
Preparation of slides, Fluorescence in situ hybridization (FISH) and image capturing
Formalin-fixed paraffin-embedded (FFPE) cancer tissue is delivered from clinical institutions from all over Germany (up to 20 patients per week). FFPE tissue is cut into thin sections (2 µm), placed on a slide and dehydrated using first a xylene washing step followed by a series of ethanol steps (100%, 96%, 70%). After drying at room temperature, slides are incubated with sodium thiocyanate followed by a wash step using distilled water.
Subsequently, slides are incubated with pepsin and hydrochloric acid, washed using distilled water and dried at room temperature. Probes (PathVysion HER-2 DNA Probe Kit II, Abbott Inc.) are hybridized at 37°C in a wet chamber overnight. Slides are washed in 2x saline-sodium citrate (SSC) buffer and DAPI counterstaining is conducted. Images are taken with a fluorescence microscope (Axioskop 2, Zeiss Inc.) using a graded filter (Filter Set 23 (488023-0000-000), emission: 515-530 nm + 580-630 nm, Zeiss Inc.), recording HER2 gene signals, CEN17 signals and a small subset of DAPI signals at once.
An overview of the training data set is given in Table 1 for the RetinaNet-1 and in Table 2 for the RetinaNet-2. The nuclei images used as training data for the RetinaNet-2 were randomly chosen from the FISH images used for training the RetinaNet-1.
The RetinaNet-1 detects and classifies nuclei in a FISH image. A potential nucleus (or artifact) is marked via a bounding box and is additionally extracted and stored as an individual image file. A report text file containing the number of detected nuclei and their classification as well as the number of uncertain cases and artifacts is generated. The number of normal, low-grade and high-grade nuclei per FISH image is used for the calculation of two ratios (ratio-1 and ratio-2): ratio-1 is the number of low-grade nuclei divided by the number of all detected nuclei and ratio-2 is the number of high-grade nuclei divided by the number of all detected nuclei. A FISH image is defined to be low-grade when ratio-1 is at least 0.2, while a FISH image is classified as high-grade when ratio-2 is at least 0.4. These thresholds can be modified by the pathologist according to individual specificities and criteria.
On top of the RetinaNet-1, the RetinaNet-2 detects and classifies the FISH signals in a single nucleus. Detected FISH signals are classified into HER2 signals, HER2 clusters (representing many single HER2 signals that cannot be differentiated) and CEN17 signals.
All signals are counted per class and for each nucleus the HER2/CEN17 ratio is calculated. As soon as a HER2 cluster is detected, the HER2/CEN17 ratio is automatically set to 10. If no HER2 signals are detected, the HER2/CEN17 ratio is automatically set to 1. If no CEN17 signal is detected, the nucleus is classified as an artifact. For each FISH image, the average HER2/CEN17 ratio is calculated on the basis of the HER2/CEN17 ratios of all detected nuclei in this FISH image. An average HER2/CEN17 ratio greater than 1.5 and lower than 6.0 indicates a low-grade status of the FISH image. A value greater than 6.0 indicates a high-grade status of the FISH image. The RetinaNet-2 works automatically on top of the RetinaNet-1 and reports its detections in a second report text file. Each annotated nucleus is automatically stored as an image file.
The training process was the same for both RetinaNets. We used a focal loss function 22 for classification and a smooth L1 loss function for bounding box regression, together with the Adam optimizer 25 with a fixed learning rate of 10⁻⁴. A batch size of 1 was used due to GPU memory limitations. Each network was trained for 50 epochs on a single NVIDIA GPU (GeForce GTX 1080Ti), which took approximately 48 hours.
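The per-nucleus counting rules of RetinaNet-2 described in this section can be sketched as a small function. This is a hedged illustration, not the authors' code: the name `nucleus_ratio` and its arguments are hypothetical, and the order in which the special cases are checked (missing CEN17 first, then clusters) is an assumption, since the text does not state which rule takes precedence.

```python
def nucleus_ratio(n_her2, n_cen17, n_clusters=0):
    """Return the HER2/CEN17 ratio of one nucleus, or None for an artifact.

    n_her2     -- number of single HER2 signals detected
    n_cen17    -- number of CEN17 signals detected
    n_clusters -- number of HER2 clusters detected
    """
    if n_cen17 == 0:
        return None   # no CEN17 signal: nucleus is treated as an artifact
    if n_clusters > 0:
        return 10.0   # a cluster holds an unknown, high number of signals
    if n_her2 == 0:
        return 1.0    # no HER2 signal detected: ratio fixed to 1
    return n_her2 / n_cen17


print(nucleus_ratio(4, 2))  # plain ratio of counted signals
```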

Results
The certified FISH protocol is used routinely in the daily diagnostics for breast and stomach cancer patients and implements a standard procedure which has been in use for 16 years. To enable an automated, in-house detection service, we trained our pipeline on breast cancer FISH image samples originating from this routine diagnostics, so that the pipeline is applicable at our Institute. FISH microscope images were captured using a graded filter recording HER2 and CEN17 signals in one step. The pipeline was trained to classify the HER2 gene amplification status of routine FISH images from breast cancer samples into normal, low-grade and high-grade stage. It relies on two RetinaNets 22, each trained on an individual task: The RetinaNet-1 with ResNet-50 23 backend was trained on up to 300 FISH images containing thousands of nuclei for nucleus detection and classification into high-grade, low-grade and normal nuclei (as well as artifacts and uncertain cases). The RetinaNet-2 with ResNet-50 backend was trained on up to 300 single nuclei images containing thousands of FISH signals for the detection, classification and counting of FISH signals in each nucleus (HER2 single signals, HER2 clusters and CEN17 single signals). On the basis of the predictions of the RetinaNet-1 and the RetinaNet-2 on all nuclei of a FISH image, a final decision is possible on the image-wide HER2 gene amplification status of the FISH image. This decision-making process is comparable to pathological assessment: in a first step, nuclei are localized and classified image-wide and, in a second step, the classification is confirmed on the basis of the HER2/CEN17 ratio of each nucleus. The two major steps of our pipeline are explained in detail in the following sections. Training and prediction are illustrated in Figure 1.

Figure 1. Overview of the detection pipeline of the HER2 gene amplification stage in FISH images from breast cancer samples.
Acquisition and manual labelling of the training data sets
FISH slides were prepared from FFPE tumor samples as described in the methods section. Hybridization with probes against the HER2 gene and against the centromere of chromosome 17 was performed using the PathVysion HER-2 DNA Probe Kit II (Abbott Inc.). Images were taken with the fluorescence microscope Axioskop 2 (Zeiss Inc.) using a graded filter (Filter Set 23 (488023-0000-000), emission: 515-530 nm + 580-630 nm, Zeiss Inc.), recording HER2 gene signals, CEN17 signals and a small subset of DAPI signals at once at a magnification of 100x. Images were captured using the Image-Pro MC 6.0 software and saved in JPEG file format with a size of 1200 x 1600 pixels. Our pipeline is optimized for FISH images generated in this way. We used up to 300 routine FISH images of high quality (minor or no background noise, minimal number of artifacts, no overlapping nuclei) from breast cancer samples (see the detailed characterization of the images in the material and methods section). These FISH images represent a random selection of all high quality FISH images routinely stored for training and documentation purposes from the routine diagnostics of all breast cancer tumor samples processed during the last three years at our institute. The manual annotation was conducted by a pathologist by labelling (with bounding boxes) high-grade, low-grade and normal nuclei as well as artifacts and uncertain nuclei for validation and test FISH images. HER2 and CEN17 FISH signals as well as clusters of HER2 signals were manually labelled (with bounding boxes) by a pathologist in up to 300 nuclei randomly chosen from all nuclei occurring within the ~300 previously mentioned FISH images.

RetinaNet-1: Detection and classification of interphase nuclei in FISH images
Training was performed on the manually labelled FISH images (n = 299) containing in total thousands of high-grade, low-grade and normal nuclei, as well as uncertain cases and artifacts (Tab. 1). The data was augmented using rotations, translations, shearing, scaling and horizontal and vertical flips. We used a focal loss function 22 for classification, a smooth L1 loss function for bounding box regression, and the Adam optimizer 25 with a fixed learning rate of 10⁻⁴. A batch size of 1 was used due to GPU memory limitations. The network was trained for 50 epochs on a single NVIDIA GPU (GeForce GTX 1080Ti), which took approximately 48 hours.
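When an image is flipped for augmentation, its annotated bounding boxes have to be transformed with it. The sketch below shows only that bookkeeping for horizontal and vertical flips, assuming boxes in (x1, y1, x2, y2) pixel coordinates. `flip_box` is a hypothetical helper and not part of the authors' code, and the width of 1600 and height of 1200 in the usage line are an assumption about the orientation of the 1200 x 1600 pixel images.

```python
def flip_box(box, width, height, horizontal=False, vertical=False):
    """Mirror an (x1, y1, x2, y2) bounding box together with its image."""
    x1, y1, x2, y2 = box
    if horizontal:
        # Mirror about the vertical axis; swap so that x1 <= x2 still holds.
        x1, x2 = width - x2, width - x1
    if vertical:
        # Mirror about the horizontal axis; swap so that y1 <= y2 still holds.
        y1, y2 = height - y2, height - y1
    return (x1, y1, x2, y2)


# Hypothetical nucleus box in an image assumed to be 1600 wide, 1200 high.
print(flip_box((100, 50, 300, 250), 1600, 1200, horizontal=True))
```

Applying the same flip twice returns the original box, which is a convenient sanity check for any augmentation code of this kind.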
The trained RetinaNet-1 automatically detects, localizes (via bounding boxes) and counts the normal, low-grade and high-grade nuclei as well as unidentifiable objects (uncertain cases and artifacts) image-wide. Each detected nucleus was stored as an individual image file using the detected bounding box and used in the RetinaNet-2 for FISH signal detection (see section below) as well as for potential manual reevaluation and documentation purposes. Per FISH image, two ratios were calculated which allow a conclusion about whether the image represents a positive (low-grade or high-grade) or normal state. The low-grade ratio (= ratio-1) indicates whether the FISH image is classified as HER2 low-grade and the high-grade ratio (= ratio-2) indicates whether the FISH image is classified as high-grade. These ratios were calculated as the number of low-grade nuclei or high-grade nuclei, respectively, divided by the sum of all detected and classified nuclei. As thresholds we used ratio-1 greater than or equal to 0.2 for a FISH image to be classified as low-grade and ratio-2 greater than or equal to 0.4 for a FISH image to be classified as high-grade. However, these thresholds are manually customizable according to the pathologist's definitions of specific ratios of the classified nuclei. Finally, the absolute occurrence of each class as well as ratio-1 and ratio-2 were recorded in a report text file. Two exemplary FISH images (low-grade and high-grade) complemented with the visualization of the RetinaNet-1 object detection and classification results are shown in Figure 2.
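The image-level decision from ratio-1 and ratio-2 can be sketched as follows. The function name is hypothetical, and two points are assumptions because the text leaves them open: only normal, low-grade and high-grade nuclei enter the denominator (uncertain cases and artifacts are excluded), and the high-grade test is applied before the low-grade test when both thresholds are met.

```python
def classify_image_by_nuclei(n_normal, n_low, n_high,
                             low_thr=0.2, high_thr=0.4):
    """Return (status, ratio_1, ratio_2) for one FISH image.

    The thresholds default to the values used in the text and can be
    adjusted to the pathologist's own criteria.
    """
    total = n_normal + n_low + n_high
    if total == 0:
        return ("unclassifiable", 0.0, 0.0)
    ratio_1 = n_low / total    # share of low-grade nuclei
    ratio_2 = n_high / total   # share of high-grade nuclei
    if ratio_2 >= high_thr:    # assumption: high-grade takes precedence
        status = "high-grade"
    elif ratio_1 >= low_thr:
        status = "low-grade"
    else:
        status = "normal"
    return (status, ratio_1, ratio_2)


# Hypothetical counts: 10 normal, 5 low-grade and 5 high-grade nuclei.
print(classify_image_by_nuclei(10, 5, 5))
```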

Figure 2. Application of RetinaNet pipeline on two Fluorescence in situ hybridization (FISH) images for interphase nuclei detection and classification. (A) A high-grade stage was detected due to numerous high-grade nuclei. Only one nucleus was low-grade because it comprises four HER2 gene signals. One nucleus that was not detected is marked with a red arrow. (B) A low-grade stage was detected due to five low-grade nuclei. Many nuclei were only classified as uncertain due to missing information on HER2 signals.
To validate the applicability and reliability of the first RetinaNet approach in routine diagnostics, 57 high quality test FISH images, containing 1,175 nuclei, were subjected to image-wide nuclei detection and classification and compared to the annotation by a pathologist, considered as ground truth. The numbers of normal, high-grade, low-grade and unidentifiable nuclei (including artifacts and uncertain cases) were independently determined by the pathologist and by the RetinaNet-1 for each of the 57 FISH images. Table 3 shows the results of the classification of the 1,175 nuclei as a confusion matrix. The classification performance is summarized using Cohen's kappa κ, a statistic measuring the degree of agreement between the predicted and the ground truth classification compared to a classification by chance. This results in κ=0.64, representing substantial agreement 26 over the whole validation set of nuclei (n=1,175). However, the differentiation of normal from low-grade nuclei appears difficult, as shown by the prediction accuracy (acc) and reliability (rel) in Table 3, whereas high-grade nuclei were classified with high accuracy (acc=0.82) and reliability (rel=0.92). In addition, the accuracy of detection and classification of nuclei differed per image, ranging from poor accuracy (acc<0.5, 5 images) to near perfect classification (acc>0.85, 10 images) (Suppl. Table 1), with a mean accuracy of 0.73.
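Cohen's kappa can be computed directly from a confusion matrix as κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the row and column totals. A minimal sketch follows; the function name is hypothetical and the 2 x 2 matrix in the usage line is toy data, not the values of Table 3.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (list of rows).

    Rows are assumed to hold the ground-truth classes and columns the
    predicted classes; kappa itself is symmetric in that choice.
    """
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of items on the diagonal.
    p_o = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: product of the marginal shares, summed over classes.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_o - p_e) / (1.0 - p_e)


toy = [[20, 5],
       [10, 15]]  # hypothetical counts for two classes
print(round(cohens_kappa(toy), 2))  # 0.4
```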
Nuclei in FISH images from our routine diagnostics might be of reduced quality compared to up-to-date fluorescence images, as they have to be prepared under time constraints and following a standardized procedure. High background noise, an increased number of artifacts, large differences in the number and shape of nuclei and overlapping nuclei together influence the image quality of the captured nuclei. In addition, the quality depends on the input tumor material and the tissue type available for analysis. To test the robustness of RetinaNet-1, we manually subdivided the nuclei from our investigated FISH images into the two groups "high quality" and "low quality" nuclei. Nuclei in the "high quality" group are characterized by clearly differentiable HER2 and CEN17 signals and by a uniform and regular nucleus shape without overlap with further nuclei. In contrast, nuclei in the "low quality" group showed blurring of FISH signals, overlap with further nuclei, very weak FISH signals or signal artifacts, which made it difficult to adequately detect the signals. As shown in Table 3, Cohen's kappa is reduced from substantial agreement (κ=0.64) for high quality nuclei to moderate agreement (κ=0.54) in the case of low quality nuclei. In particular, the accuracy of classification for high-grade nuclei is reduced between the high and low quality nuclei (acc=0.93 vs. acc=0.55).

[Table 3. Confusion matrix of RetinaNet-1 versus pathologist classification, all nuclei (n=1,175); table body not recovered.]
[...] classification performed via the RetinaNet-1. Each nucleus detected via the RetinaNet-1 is automatically fed into the RetinaNet-2, where FISH signals are classified and counted, documented in an image-wide report text file and visualized in an additional nucleus-specific image file. RetinaNet-2 predicts a bounding box for and classifies each individual HER2 signal, HER2 cluster and CEN17 signal. Afterwards the boxes are counted and the ratio of HER2/CEN17 signals is calculated per nucleus. If no HER2 signals were detected, the HER2/CEN17 ratio was automatically set to 1. If no CEN17 signal was detected, the nucleus was classified as uncertain. When a HER2 cluster was detected, the HER2/CEN17 ratio was set to 10, as a HER2 cluster may contain a high but unknown number of HER2 signals. The average, image-wide HER2/CEN17 ratio was calculated on the basis of all detected nuclei harboring CEN17 and HER2 signals. This quantity was used to decide the image-wide HER2 gene amplification status of the corresponding FISH image.
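The image-wide decision from the averaged HER2/CEN17 ratio can be sketched with the thresholds given in the methods section (greater than 1.5 and lower than 6.0 for low-grade, greater than 6.0 for high-grade). The function name is hypothetical; nuclei without a usable ratio are passed as None and skipped, and an average of exactly 6.0, which the text leaves undefined, is treated as low-grade by assumption.

```python
def classify_image_by_ratio(ratios, low_thr=1.5, high_thr=6.0):
    """Return (status, mean_ratio) from per-nucleus HER2/CEN17 ratios.

    ratios -- one value per nucleus; None marks a nucleus that could not
              be evaluated (for example no CEN17 signal) and is ignored
    """
    valid = [r for r in ratios if r is not None]
    if not valid:
        return (None, None)
    mean_ratio = sum(valid) / len(valid)
    if mean_ratio > high_thr:
        status = "high-grade"
    elif mean_ratio > low_thr:
        status = "low-grade"   # assumption: exactly 6.0 also lands here
    else:
        status = "normal"
    return (status, mean_ratio)


# Hypothetical image: two cluster nuclei (ratio 10), one low ratio, one skipped.
print(classify_image_by_ratio([10.0, 10.0, 2.0, None])[0])
```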
To measure the performance of RetinaNet-2 for nucleus classification, 50 randomly selected nuclei were analyzed by the RetinaNet-2 and compared to the manual annotation by the pathologist. In six cases the classification differed (Tab. 4). In three of the six cases, the RetinaNet-2 classified a nucleus as normal while the pathologist detected a low-grade nucleus, which was caused by single HER2 signals missed by the RetinaNet-2. In two of the six cases, the RetinaNet-2 detected a high-grade nucleus while the pathologist classified these nuclei as normal.
The reason was that RetinaNet-2 detected a single HER2 signal as a HER2 cluster because strong blurring of the signal mimicked a HER2 cluster. In one of the six cases, a classification via the RetinaNet-2 was not possible although the same number of HER2 signals and CEN17 signals was found as by the pathologist. However, because only one HER2 signal was identified, the RetinaNet-2 classified the nucleus as "uncertain".
Table 4, note: n.d., HER2 signals were not quantified in the ground truth when at least one HER2 cluster was identified. HER2 clusters were not counted in the ground truth because of the strongly subjective nature of this procedure. The value was set to 1 when at least one HER2 cluster occurred.
To validate the applicability and reliability of the RetinaNet-2 approach in the image-wide classification, 57 test FISH images (the same images that were used for the validation of the RetinaNet-1) were subjected to image-wide nuclei detection and classification and compared to the ground truth annotated by the pathologist. The comparison was also conducted separately for "high quality" and "low quality" nuclei, as previously done for the RetinaNet-1, to test the robustness on nuclei images of lower quality (Tab. 5). Again, we find a substantial agreement between our deep learning system and the human pathologist, but at a higher level of agreement, κ=0.76. In particular, the classification accuracy for normal nuclei has increased compared to RetinaNet-1 (acc=0.72 vs. acc=0.40 for RetinaNet-1, Tab. 3). Whereas for low quality nuclei we find only a moderate agreement (κ=0.55), similar to the performance of RetinaNet-1, for nuclei recorded at high quality the classification performance of RetinaNet-2 is κ=0.85, representing an almost perfect agreement with the pathologist. Nevertheless, a minor number of HER2 double or triple signals in very close vicinity were annotated as HER2 clusters, leading to the wrong overall assumption that a high-grade nucleus occurred.
Table 5. Classification performance of RetinaNet-2 on validation images (n=57).
The accuracy of both RetinaNets was compared regarding the image-wide detection and classification of nuclei in the 57 test FISH images (Suppl. Tab. 1). Similar to RetinaNet-1, we found for RetinaNet-2 that the detection and classification performance differs between images. Interestingly, however, several images where RetinaNet-1 performed poorly were well classified by RetinaNet-2 and vice versa, indicating that the two approaches are complementary and are best used in combination (Suppl. Tab. 1). For most of the images the accuracy of the two networks is equal (Fig. 3A), but a few images show larger differences in their accuracies. Four example images are depicted where (1) the accuracy was 100% for both RetinaNets (Fig. 3, image 35), (2) the accuracy was low for RetinaNet-2 but high for RetinaNet-1 (Fig. 3, image 16), (3) the accuracy was lower for RetinaNet-1 compared to RetinaNet-2 (Fig. 3, image 9) and (4) the accuracy was similarly low for both RetinaNets (Fig. 3, image 47). Nuclei in the FISH images are marked with a red arrow where a different classification was obtained by both RetinaNets (Fig. 3C). In image 35, nuclei are clearly distinguishable and show massive amplification of the HER2 gene, which can be easily and clearly detected by both RetinaNets. Therefore, no differences in the nuclei classification were detected. In image 47, the performance of the two RetinaNets is equally low due to the generally low quality of many nuclei occurring in the image. In addition, the overall number of nuclei in the image is low, so that the influence of the "low quality" nuclei on the image-wide classification is higher. The different classifications by RetinaNet-1 and RetinaNet-2 in images 9 and 34 may be due to weak and blurred FISH signals not seen by RetinaNet-2 and/or the interpretation of closely adjacent HER2 gene signals as a HER2 cluster by RetinaNet-2, leading to a false classification of the corresponding nucleus. More precisely, Figure 4 shows selected and representative examples of three cases with the same and three cases with a different classification by both RetinaNets. The three nuclei in the left column (Fig. 4A-C) were classified identically by both networks and the classification corresponds to that of the pathologist, providing stronger confidence in the correct classification. The three nuclei in the right column, however, were classified differently (Fig. 4D-F). In the first case (Fig. 4D), RetinaNet-2 detected three HER2 signals in close vicinity as a single HER2 cluster, leading to a misclassification as a high-grade nucleus, while the RetinaNet-1 correctly classified this nucleus as low-grade.

In the second case (Fig. 4E), RetinaNet-2 missed the detection of HER2 and CEN17 signals, presumably due to overexposure, and therefore misclassified the nucleus as normal. The RetinaNet-1 correctly classified the nucleus as low-grade. Finally, in Figure 4F, RetinaNet-2 correctly detected all signals but classified the nucleus as normal, in contrast to RetinaNet-1, which classified it as uncertain. However, the pathologist's classification was low-grade.
Automated classification of high quality FISH images into normal, low- and high-grade
Our nuclei detection and classification system relies on the combination of two steps performed by the RetinaNet-1 and the RetinaNet-2, enabling a final decision on the HER2 gene amplification status of the whole FISH image based on HER2 and CEN17 FISH signal counting. The decision relies on ratios calculated in both RetinaNet steps. In the RetinaNet-1, ratio-1 and ratio-2 (each ranging from 0 to 1) are calculated and indicate the relative number of low-grade nuclei (ratio-1) and high-grade nuclei (ratio-2), respectively, compared to the overall number of classifiable nuclei. A low-grade or high-grade stage of the FISH image is indicated by ratio-1 greater than or equal to 0.2 and by ratio-2 greater than or equal to 0.4, respectively. Both thresholds are modifiable with respect to the pathologist's specified criteria. In the RetinaNet-2, an image-wide HER2/CEN17 ratio is calculated as the average of all nucleus-specific HER2/CEN17 ratios of classifiable nuclei. A HER2/CEN17 ratio greater than 1.5 and lower than 6.0 indicates a low-grade status of the FISH image. A value greater than 6.0 indicates a high-grade status of the FISH image. The maximum average value is 10.0, because the highest value a single nucleus can obtain is 10.0: as soon as a HER2 cluster is detected, the value is automatically set to 10.0. The overall image-wide classification of the HER2 gene amplification status is mostly identical between our pipeline (RetinaNet-1 and RetinaNet-2) and the pathologist on the 57 test FISH images (Tab. 6).
In two of the 57 cases a different classification was obtained. In one of the two cases, the RetinaNet-1 classified the FISH image as low-grade while the RetinaNet-2 had a tendency towards a high-grade image. In the second case, the RetinaNet-1 classified the image as low-grade while the RetinaNet-2 classified it as high-grade due to a misclassification of one normal nucleus as a high-grade nucleus.
[...] studies of large amounts of documented FISH images collected over several years at our Institute for re-evaluation. Another application could be the enhancement of the documentation quality of the images. Furthermore, an anonymized and human-independent evaluation of the HER2 gene amplification level is possible. Analyzing one FISH image, including the generation of the annotated image data and the report files, takes less than a second, which is considerably faster than a comparable human evaluation. Therefore, we interpret our pipeline as a first step towards the automation of the HER2 gene amplification detection in FISH images.
Image-wide ratios representing the number of abnormal nuclei relative to all classified nuclei are calculated and serve as a guideline for classifying the HER2 gene amplification status of the tumor sample from which the FISH image originated. Our pipeline is based on two CNNs for localization and classification, called RetinaNet. RetinaNet-1 detects and classifies nuclei in the FISH images and calculates two ratios. Ratio-1 (ranging from 0 to 1) represents how frequently nuclei with a low amplification status of the HER2 gene occurred, and ratio-2 (ranging from 0 to 1) indicates the same for nuclei with a high amplification status of the HER2 gene.

[...] (3) increase the number of pathologists annotating the data. However, because the FISH diagnostics are performed on slides originating from FFPE material, it might be difficult to obtain higher quality images. Training the pipeline on these samples will considerably enhance its performance on future cases of similarly reduced overall quality. Nevertheless, even now our pipeline (trained on high quality FISH images) makes predictions on the HER2 amplification status of the tumor on the basis of these low quality FISH images, demonstrating the general potential of deep learning for this task (Suppl. Fig. 1). It should be noted that, in clinical practice, pathologists do not analyze every nucleus in a FISH image. Instead, a certain number of nuclei (at least 20) are selected, and in this process nuclei that are difficult to analyze, e.g. due to low image quality, are excluded. Additionally, variations in the experimental setup among different pathology labs might result in different shape, structure and nuclei composition of the FISH images (e.g. used antibodies and fluorophores, tissue type, tissue preparation protocol, consideration of DAPI staining, fluorescence microscope type and parameters). Therefore, a customization of our pipeline, e.g. setting different thresholds, and additional training of both networks will be necessary to adapt the detection and classification pipeline to lab-specific conditions and lab-specific tissue types in order to automate the HER2 amplification detection of tumors in other pathology labs.
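How lab-specific thresholds could enter the RetinaNet-1 decision is sketched below in Python, under stated assumptions. The class `Thresholds` and the functions `nuclei_ratios` and `classify_image` are hypothetical names and not part of our pipeline's code. The default values 0.2 and 0.4 are those given for ratio-1 and ratio-2. The text does not state which status applies when both thresholds are met, so the sketch assumes that high-grade takes precedence.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence, Tuple

CLASSES = ("normal", "low-grade", "high-grade")


@dataclass(frozen=True)
class Thresholds:
    """Lab-adjustable decision thresholds; defaults are the values from the text."""
    ratio1_low_grade: float = 0.2   # ratio-1 >= this value: low-grade
    ratio2_high_grade: float = 0.4  # ratio-2 >  this value: high-grade


def nuclei_ratios(labels: Sequence[str]) -> Tuple[float, float]:
    """Return (ratio-1, ratio-2) from the per-nucleus class labels of one image."""
    counts = Counter(labels)
    total = sum(counts[c] for c in CLASSES)
    if total == 0:
        raise ValueError("no classifiable nuclei in this image")
    return counts["low-grade"] / total, counts["high-grade"] / total


def classify_image(labels: Sequence[str],
                   thresholds: Thresholds = Thresholds()) -> str:
    """Classify one FISH image from the nuclei labels and the given thresholds."""
    ratio1, ratio2 = nuclei_ratios(labels)
    # Assumption: high-grade is checked first when both criteria are met.
    if ratio2 > thresholds.ratio2_high_grade:
        return "high-grade"
    if ratio1 >= thresholds.ratio1_low_grade:
        return "low-grade"
    return "normal"
```

With the default thresholds, an image with seven normal, two low-grade and one high-grade nucleus has ratio-1 = 0.2 and ratio-2 = 0.1 and is classified as low-grade. A lab that passes `Thresholds(ratio1_low_grade=0.3)` would obtain a normal classification for the same image.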
Pathologists normally analyze the FISH slides directly under the fluorescence microscope. By shifting the focus in the z-dimension, they can locate HER2 and CEN17 signals of a nucleus that are not detectable in a single 2D plane. A FISH image, however, is only a 2D representation of the 3D space of the FISH slide, so only limited information is available to our pipeline for the nuclei classification, potentially leading to false estimations of the HER2 gene amplification status of the corresponding tumor sample. Therefore, a deep learning application that detects and classifies nuclei on at least a stack of images representing the 3D space of the FISH slide will be considerably superior to the 2D solution used in our study. Our pipeline is in principle able to detect and classify nuclei in videos, which is under investigation in our lab. Future solutions should implement one-stage detectors or similar CNN architectures directly in the fluorescence microscope to classify the nuclei instantly while the pathologist is observing them. A comparable solution was recently developed by Google Inc. for marking tumor areas in Hematoxylin and Eosin stained slides 27.
Alternatively, a fully automated software solution recording all layers and positions of a FISH slide as large data input for the deep learning-based nuclei detection and classification might be used.

Competing Interests No
Authors' contributions FZ, WdB and PH wrote the manuscript. FZ, WdB and PH designed the study. FZ, WdB, MW, RM and TW planned and conducted the bioinformatics data preparation and analysis. SZ and DEA generated the FISH data. KF, SZ, DEA and GB performed the pathological analysis of the data. KF, DEA, IR and GB supervised the project and assisted in the writing of the manuscript.
Supplemental Figure 1. Two examples of the application of our pipeline to FISH images of low quality. In (A) a normal stage was detected, which corresponds to the decision of a pathologist. In (B) a high-grade stage was detected, which also corresponds to the pathologist's decision on the FISH image. Numerous nuclei were not detected in either image, indicating the limitations of our pipeline (RetinaNet-1) on FISH images of low quality. Training on a large set of FISH images of low quality would enhance the accuracy in detecting most nuclei.