Multi-categorical classification using deep learning applied to the diagnosis of gastric cancer

Introduction: Pathologists currently face a substantial increase in the workload and complexity of their diagnostic work on different types of cancer. This is due to the increased incidence and detection of neoplasms, associated with diagnostic subspecialization and the advent of personalized medicine. There are numerous treatments available for different types of cancer, and the diagnosis must be delivered quickly and accurately for each case. Deep learning is a tool that is already used in daily life, including image detection, and there is growing interest in its application in Medicine and especially in Pathology, where it has a revolutionary potential. Objective: In this article, we present deep learning, in particular convolutional neural networks, as a potential technique for the analysis of digitized images of histopathological slides. The technique detects identifiable patterns in an automated manner, which introduces the possibility of applying it as an auxiliary tool in the diagnosis of neoplasms, especially gastric cancer, the object of this preliminary study. Method: From a database of digitized images of histopathological slides representative of gastric cancer, we identified three morphological patterns of neoplasia, as well as non-neoplastic tissue patterns, with which we trained a convolutional neural network designed to identify and categorize similar images within these patterns in an automated manner. Results: The results of identification and automatic classification in the defined categories were satisfactory, with areas under the ROC curves above 0.9. Conclusion: The results show the potential application of convolutional neural networks to digitized slides of gastric cancer, in accordance with findings in the international literature.


introduction
The anatomopathological diagnosis of neoplastic lesions is largely carried out by the analysis of hematoxylin and eosin (HE) stained slides (1), evaluated through an optical microscope by a trained pathologist. This method has been used and refined since the second half of the nineteenth century (2). Due to the high prevalence of cancer and its high mortality in the world population, responsible for 9 million deaths in 2016 alone (3), coupled with an increasingly personalized medicine, the pathologist's work has encountered numerous challenges. The main challenge is the increase in workload, as a consequence of both the increased demand due to population growth and the intense subspecialization of the area in response to surgical subspecialization (4). Likewise, advances in knowledge about different types of cancer and the need for greater accuracy and speed in the diagnostic definition also contribute to the increase in workload, since the data provided by the pathological examination are essential for determining the best treatment available (4).
In this context, high-resolution slide scanning systems are already available on a large scale, allowing for excellent quality and resolution of histological whole slide images (WSI). These images are widely used in teaching, research and remote consulting, giving rise to virtual microscopy (5)(6)(7)(8)(9)(10). However, there are still no automated programs for analyzing such images or data that are applicable in clinical routine and would assist the pathologist in streamlining the diagnostic process, which still depends on case-by-case analysis by a trained individual.
The present study addresses a technique that has become popular over the past seven years, deep learning, and applies it to histopathological images in a manner similar to the technique of Litjens et al. (2016) (11). The technique was used to diagnose stomach cancer, a malignant epithelial neoplasm that affects about 990,000 individuals each year worldwide and causes about 738,000 deaths. It is the fourth most common type of cancer regarding global incidence and the second leading cause of cancer mortality worldwide (12). In Brazil, the National Cancer Institute [Instituto Nacional do Câncer (INCA)] estimated 13,540 new cases of stomach cancer among men and 7,750 among women for the 2018-2019 biennium; it is the fourth most common type in men and the sixth among women (13).
Deep learning was applied specifically to gastric cancer. Histologically, the diagnosis of this tumor poses challenges due to its morphological heterogeneity, partly reflected in the diversity of histopathological classification schemes. The World Health Organization (WHO) adopts a strictly descriptive histological subclassification, which recognizes five main types of gastric cancer: tubular, papillary, mucinous, poorly cohesive (including the signet ring cell variant), and mixed (14). Other classification systems used by pathologists include the Lauren system, which distinguishes diffuse, intestinal and undetermined types (15). In this system, the diffuse type is described microscopically as carcinoma composed of poorly cohesive cells with little or no glandular formation, whereas the intestinal type refers to carcinomas with glandular formation with varying degrees of differentiation (14). There is also the classification of Carneiro, which recognizes four categories: glandular, isolated cells, solid and mixed (16). It is also worth mentioning the premalignant lesions, which include neoplastic epithelial proliferations with cellular and architectural atypia, but with no evidence of invasion of the lamina propria (14). A combination of these classifications was used in the study and will be described in the methodological section.
Deep learning is a technology that has been applied in several areas of knowledge (17)(18)(19) and, as previously mentioned, has been shown to be promising as an auxiliary tool in the detection and histological diagnosis of certain types of neoplasms, such as prostate cancer and breast cancer (11). It is a family of algorithms that use large databases to detect and learn to recognize relevant patterns automatically, without relying on laborious manual extraction of quantitative data for each set (18,19). It is divided into two groups: unsupervised and supervised learning. The latter, used in this study, is the one in which, for each input sample, there is a correct answer that is presented to the training algorithm.
A specific modeling technique within the family of deep learning algorithms, named convolutional neural network, was used. Its distinguishing feature is that it contains one or more convolutional layers in its topology, indicated by the letter C in Figure 1. The learning process consists of updating the weights of the links between the nodes of the neural network layers to fit the sample set. A convolutional layer will try to learn the patterns (features) of the samples presented through the dynamic process of updating its weights. The process is iterative: it is repeated several times, and each repetition generates a partial result that is used in the next one. The operator of the algorithm sets the number of epochs after which training is interrupted. Each epoch corresponds to one pass through all available samples; in the following epoch, all samples are revisited during the weights update.
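The notion of an epoch can be illustrated with a minimal sketch. The example below is not the network used in this study: it assumes a single-layer logistic model trained on synthetic two-dimensional data, and the function name `train_logistic`, the learning rate and the number of epochs are hypothetical choices made only for illustration. For simplicity, all samples are used in a single weight update per epoch.

```python
import numpy as np

def train_logistic(x, y, epochs=200, lr=0.5):
    """Fit a one-layer logistic model; each loop iteration is one epoch."""
    w = np.zeros(x.shape[1])   # weights, updated once per epoch
    b = 0.0                    # bias term
    eps = 1e-12                # avoids log(0) in the loss
    losses = []
    for _ in range(epochs):
        # Forward pass over ALL samples (one full pass = one epoch).
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(float(loss))
        # Gradient of the cross-entropy loss, then the weight update.
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b, losses

# Synthetic, linearly separable data (an assumption for the example).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = (x[:, 0] + x[:, 1] > 0).astype(float)

w, b, losses = train_logistic(x, y)
print(f"loss after first epoch: {losses[0]:.3f}, after last epoch: {losses[-1]:.3f}")
```

The partial result of each epoch (the current weights) is the starting point of the next one, which is the iterative behavior described above.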
Our study investigates the application of convolutional neural networks in the identification of gastric cancer in high-resolution scanned images obtained from histopathological slides stained by the HE method. We, therefore, investigate the possibility of associating such a method with the pathologist's work which, in the future, may be useful in meeting the challenges of current practice.
The slides were digitized at a resolution of 0.19 µm/pixel. After digitization, for this initial study, six slides with minimal histological processing artifacts, representative of the different gastric cancer subclassifications, were selected and viewed using the Pannoramic Viewer software, version 1.15.4. From these, 251 representative areas were obtained at 20× magnification (1145 × 707 pixels), corresponding to the different morphological variations of the adenocarcinoma and also to the representative areas of the normal gastric epithelium and other non-epithelial and non-neoplastic tissues, according to the criteria listed below:
1. non-epithelial normal tissues (NT): any area of the slide where there is no normal or neoplastic epithelium, containing broad representativeness of connective, muscle, vascular, and adipose tissues, and lamina propria. Fifty representative areas were selected;
2. normal gastric epithelium (NG): fundic and pyloric gastric epithelium with no metaplastic, dysplastic or neoplastic changes. Fifty representative areas were selected;
3. neoplastic gastric epithelium/tubular gastric adenocarcinoma (TGA): gastric epithelium with moderate dysplasia and gastric cancer with glandular formation. Fifty representative areas were selected;
4. solid-type gastric adenocarcinoma (SGA): solid-type gastric cancer with no gland formation. Forty-one representative areas were selected;
5. diffuse/dyscohesive gastric carcinoma (DGC): gastric cancer with dyscohesive cells and signet ring cells. Sixty representative areas were selected.
These categories were created based on the different classification systems of gastric cancer, in order to evaluate the automated discriminatory power between the presence or absence of neoplastic and non-neoplastic gland formation and the presence or absence of dyscohesive neoplastic or signet ring cells. Regarding the representative areas with neoplastic tissue (designated TGA, SGA, and DGC), a minimum of 70% of the total selected area should contain the defined histological pattern. Figure 2 illustrates one of the 60 representative DGC areas selected with at least 70% of the area containing dyscohesive/diffuse gastric cancer type. This value was arbitrary, and the area categorization method was chosen because it is faster compared to the frame-by-frame separation method by manually labeling the slides. These representative areas gave rise to the samples used in the present study.

methods
After selecting these representative areas, they were divided into two groups: training-set and testing-set. The algorithm was trained with the images of the training-set. The topology of the convolutional neural network used (Figure 1) is similar to the neural network described by Litjens et al. (2016) (11), with some differences: the F3 layer is composed of five nodes, since the modeling in this work considers five classes, and dropout regularization is used between the F2 and F3 layers. This regularization technique prevents the network from learning very specific patterns of the presented data set, which would generate an overfitted model, that is, a model that fits the observed data set very well but proves ineffective in predicting new results. The data input tensor has the size of an n × n pixel sample in three layers corresponding to the channels R (red), G (green), and B (blue), where n is the number of pixels per row and per column. This three-dimensional matrix of n × n × 3 dimensions and its subsequent transformations in the network are called tensors. Layers C are convolutional, while layers M are max-pooling. The max-pooling layers reduce the tensor area, giving the next convolutional layer an opportunity to learn a pattern related to a new scale of the image. A max-pooling layer with a 2 × 2 window keeps the maximum value out of each group of four pixels to create a new tensor with reduced area. This pooling technique makes it possible to build a network topology with more layers, hence the term deep learning. The number of features is a parameter of the convolutional layers and represents the number of different patterns that the network will consider when learning.
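The 2 × 2 max-pooling operation can be sketched as follows. This is an illustration in NumPy, not the implementation used in the study; the function name `max_pool_2x2` is hypothetical, and the sketch assumes a stride of 2 and even height and width.

```python
import numpy as np

def max_pool_2x2(tensor):
    """2 x 2 max-pooling (stride 2) over an (H, W, C) tensor with even H and W."""
    h, w, c = tensor.shape
    # Group pixels into non-overlapping 2 x 2 blocks, one block per output pixel.
    blocks = tensor.reshape(h // 2, 2, w // 2, 2, c)
    # Keep the maximum of the four values in each block, per channel.
    return blocks.max(axis=(1, 3))

# A small 4 x 4 x 3 tensor standing in for an n x n x 3 RGB sample.
sample = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
pooled = max_pool_2x2(sample)
print(sample.shape, "->", pooled.shape)
```

Each pooling step halves the height and width of the tensor, so a 128 × 128 × 3 input would become 64 × 64 × 3 after one such layer under these assumptions.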
The most time-consuming part of neural network modeling is determining the network parameters for which learning is best according to the chosen metrics. A network that is too complex for a given modeling task will present unsatisfactory results. Another important factor for the success of training for modeling tasks such as pattern recognition in images is the number of samples available for learning, as the appropriate number of samples often reaches the millions (19).
At the interface between layers F2 and F3 there is also the application of the softmax function, which normalizes the probabilities of each class. The trained model is able to receive an image of n × n × 3 dimensions and deliver a vector of probabilities with five entries, one for each class. The probability vector is normalized, that is, the sum of the probabilities of occurrence is equal to 1.
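As an illustration of this normalization, the softmax function can be sketched in NumPy. The input values below are hypothetical, and the ordering of the five classes in the vector is an assumption made for the example.

```python
import numpy as np

# Hypothetical ordering of the five classes in the output vector.
CLASSES = ("NT", "NG", "TGA", "SGA", "DGC")

def softmax(logits):
    """Map raw network outputs to probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and does not change the result.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical raw outputs of the last layer for one sample.
logits = np.array([0.2, -1.0, 3.1, 0.5, 1.7])
probs = softmax(logits)
for name, p in zip(CLASSES, probs):
    print(f"{name}: {p:.3f}")
```

The largest raw output receives the largest probability, and the five probabilities add up to 1.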
The metrics used for the evaluation of the algorithm's classification potential were sensitivity and specificity. Sensitivity measures the fraction of the number of samples correctly classified in the chosen class over the total number of samples belonging to this class. Specificity measures the fraction of the number of samples correctly classified as not belonging to the chosen class over the total number of samples that do not belong to this class. Sensitivity relates to the false negative rate according to the equation FN = 1 - S, where FN is the false negative rate and S is the sensitivity. The F1-score is the harmonic mean of sensitivity and precision; its values range from 0 to 1, where 1 is equivalent to perfect precision and sensitivity. Precision measures the fraction of true positives over the total number of positives predicted by the test (true positives and false positives). All the results of the metrics were calculated from the testing-set (set of test samples not used in training).
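These definitions can be made concrete with a short sketch that computes the metrics per class in a one-versus-rest fashion. The labels below are hypothetical and the function name `per_class_metrics` is an assumption for the example; here the F1-score is taken as the harmonic mean of sensitivity and the fraction of true positives among all predicted positives.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """Sensitivity, specificity, precision and F1-score for each class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    metrics = {}
    for k in range(n_classes):
        tp = int(np.sum((y_true == k) & (y_pred == k)))  # true positives
        fn = int(np.sum((y_true == k) & (y_pred != k)))  # false negatives
        fp = int(np.sum((y_true != k) & (y_pred == k)))  # false positives
        tn = int(np.sum((y_true != k) & (y_pred != k)))  # true negatives
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        metrics[k] = {"sensitivity": sens, "specificity": spec,
                      "precision": prec, "f1": f1}
    return metrics

# Hypothetical true and predicted class indices for eight test samples.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]
for k, m in per_class_metrics(y_true, y_pred, 3).items():
    print(k, {name: round(v, 3) for name, v in m.items()})
```

In this toy example, class 1 has sensitivity 1 (no false negatives) but an F1-score of 0.8, because one sample of another class was wrongly assigned to it.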
Finally, three results will be presented. The first is the ideal size of the samples to be generated from the representative areas and used as input to the convolutional neural network. Samples that are too small do not contain structures that are significant for class determination. On the other hand, samples that are too large require a more complex neural network that is more difficult to parameterize, in addition to significantly reducing the number of samples available; as previously mentioned, thousands or even millions of samples are needed. The second result is a set of satisfactory receiver operating characteristic (ROC) curves for the sample size obtained in the first step. These curves are indicative of the discriminative capacity of the algorithm. The third and last result is the classification of the samples using the best parameters found for the convolutional neural network in the previous step.
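The area under the ROC curve (AUC) for one class against all others can be computed as the fraction of positive/negative sample pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below uses this pairwise formulation; the scores and labels are hypothetical, and the function name `auc_one_vs_rest` is an assumption for the example.

```python
import numpy as np

def auc_one_vs_rest(scores, is_positive):
    """AUC as the probability that a positive sample outranks a negative one."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos = scores[is_positive]    # scores of samples in the chosen class
    neg = scores[~is_positive]   # scores of samples in all other classes
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))

# Hypothetical probabilities assigned to one class for six test samples.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(f"AUC = {auc_one_vs_rest(scores, labels):.4f}")
```

An AUC of 1 means that every sample of the class scores higher than every sample outside it, while 0.5 corresponds to random ranking.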

results
The sample size is a relevant parameter in modeling planning. To determine an optimal size for this study, the F1-score was calculated using a variety of different dimensions for the samples, and a neural network was trained in each case. The dimensions (in pixel × pixel) used were: 8 [...]. Figure 3 shows the results of the F1-score for all classes according to the sample's lateral size. The results show that the best dimensions were 128 × 128 and 148 × 148. The 128 × 128 size was chosen because, even though the NG class had a higher F1-score for the 148 × 148 dimension than for 128 × 128, the other classes suffered a penalty. Using a sample size of 128 × 128, the total number of samples was 10,040, with 2,000 for NG, 2,000 for TGA, 1,640 for SGA, 2,400 for DGC and 2,000 for NT.
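The reported sample counts can be checked with a short sketch. The text does not state how the samples were cut from the representative areas; the example assumes non-overlapping 128 × 128 tiles taken from each 1145 × 707 area, with leftover borders discarded, which happens to reproduce the reported totals. The function name `tile_area` is hypothetical.

```python
import numpy as np

def tile_area(area, size):
    """Cut an (H, W, C) image into non-overlapping size x size tiles."""
    h, w = area.shape[:2]
    return [area[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

# A blank stand-in for one representative area (707 rows, 1145 columns, RGB).
area = np.zeros((707, 1145, 3), dtype=np.uint8)
tiles = tile_area(area, 128)

# Number of representative areas per class, as listed in the text.
areas_per_class = {"NT": 50, "NG": 50, "TGA": 50, "SGA": 41, "DGC": 60}
samples_per_class = {k: n * len(tiles) for k, n in areas_per_class.items()}

print("tiles per area:", len(tiles))
print("samples per class:", samples_per_class)
print("total samples:", sum(samples_per_class.values()))
```

Under this assumption each area yields 8 × 5 = 40 tiles, so the 251 areas give 10,040 samples in total.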
The ROC curves, indicative of the algorithm's discriminative capacity, referring to the five classes, are shown in Figure 4. The class with the smallest area under the ROC curve (AUC), 0.9795, was NG, which can be considered an excellent performance value; this conclusion also extends to the other classes.
Finally, the third result to be presented is the classified representative areas. Each sample receives a probability vector with one probability per class (the five probabilities added together must equal 1), and the class with the greatest probability is the one assigned to the sample. It is also interesting to know the degree of certainty of the classified sample. Figure 5 illustrates the results for the five different classes. For each class, the first image (marked with the letter A) shows the unclassified area; the second image (marked with the letter B) shows the degrees of certainty; and the third image (marked with the letter C) shows the classifications generated by the algorithm for the testing-set samples.
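The assignment of a class and a degree of certainty from the probability vector can be sketched as follows. The probability values and the ordering of the classes are hypothetical, and the function name `classify` is an assumption for the example.

```python
import numpy as np

# Hypothetical ordering of the five classes in the probability vector.
CLASSES = ("NT", "NG", "TGA", "SGA", "DGC")

def classify(prob_vectors):
    """Return (class name, certainty) for each row of class probabilities."""
    probs = np.asarray(prob_vectors, dtype=float)
    best = probs.argmax(axis=1)  # index of the most probable class
    return [(CLASSES[k], float(probs[i, k])) for i, k in enumerate(best)]

# Hypothetical network outputs for two 128 x 128 samples.
probs = np.array([
    [0.05, 0.05, 0.10, 0.10, 0.70],
    [0.10, 0.60, 0.20, 0.05, 0.05],
])
for label, certainty in classify(probs):
    print(f"{label} (certainty {certainty:.2f})")
```

The certainty is simply the probability of the winning class, so a sample classified with 0.70 is a more confident call than one classified with 0.60.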

discussion
In this work, the application of convolutional neural networks for the recognition of five classes present in digitized, HE-stained histological images representative of gastric cancer was studied.
The identification results were satisfactory according to the visual inspection of the ROC curves and the AUC values, since all the ROC curves presented AUC above 0.97, which indicates an excellent classifying and discriminative capacity of the algorithm for all five defined classes. Comparing these results with others identified in the literature (11), three non-exclusive hypotheses were raised to explain the high performance of the obtained metrics:
1. specific characteristics of gastric cancer: the object of study may have a particularly easy identification pattern in relation to the normal condition of the tissue, when analyzed at the ideal sample size found of 128 × 128 pixels;
2. particularity of the samples used, which were selected aiming to represent as much as possible all the morphological possibilities of gastric cancer, avoiding areas with many processing artifacts or other possible confounding factors. This implies that the chosen images may have made the modeling very specific for this set of cases;
3. characteristic of the methodology for defining images for sampling. For the definition of the learning samples, images were searched in which at least 70% of the area of interest corresponded to only one of the histological classifications described, avoiding areas with combinations of the different classes. This process differs from marking the classes on a whole digitized slide followed by a post-processing step to define the samples and the probability vector to be used in the training. According to Litjens et al. (2016) (11), these two definitions are not equivalent in relation to modeling. This method of defining the samples can therefore yield promising results when the test samples are extracted with the same method. It does not necessarily mean that this modeling will offer good results for a prediction in a WSI.
A broader generalization of identification requires more data for training with as much diversity as possible. It is known that, during preparation of a simple HE histopathological slide, numerous factors influence the final result of the histological section under analysis, such as the variation in the intensity of the stains used, the processing artifacts and the sectioning, which can produce empty spaces between the tissues, folds and overlaps, in addition to the longitudinal or transverse direction of the cut in relation to the tissue sample. These factors can generate different image patterns that are easily recognized by the human eye and brain. However, they are a challenge for the classifying algorithm if such variations are not previously included in the training. In addition to these technical challenges that generate a diversity of patterns, the morphological nature of gastric cancer is quite heterogeneous, with the possible presence of different patterns in the same lesion, making it a classificatory challenge also for the trained pathologist (16). This limitation extends to the definition and selection of samples and, consequently, to the classification system of the algorithm, predominantly in cases of lesions that, when analyzed in a whole slide, fit into the mixed subtype. Thus, the current classificatory definitions in the selection of samples in this study may be revised in a later study, to be applied not only in representative samples, but in a WSI.
It is reasonable to expect that a larger set of data for training will decrease the performance of the model in relation to the metrics presented; however, the modeling will have greater generalization power regarding the data that can be presented. During the training performed, we also observed the robustness of the models in cases where a sample is presented to the training as belonging to a certain class but the model predicts another class that is in fact correct, as observed in Figure 4. There, we see that in an image with a predominant classification of DGC pattern (Figure 4 - 5A), the algorithm identified an area with a normal gland, correctly classifying it as NG (Figure 4 - 5C). This is because the samples (128 × 128 pixels) are obtained from a larger area of a representative image (1145 × 707 pixels) in which only the predominant class of the larger image is reported as present in the algorithm training.

conclusion
This preliminary study demonstrated that, for a defined sample size of 128 × 128 pixels, the algorithm is able to satisfactorily identify the different relevant structures in the gastric cancer image for classification within the five determined classes. It also shows that the application of convolutional neural networks for the classification of tissues in digitized anatomopathological slides is promising.
A larger set of samples categorized by a group of professionals and modeled as in this work can present a prediction with no individual bias in the selection of representative samples, as well as define stricter criteria for their inclusion in order to generate an even more robust classification. The next step for this study is to use a larger number of samples and change the method of defining them, marking the regions with different classes in the WSI (11) .

acknowledgements
To Professor Evandro Sobroza de Mello, coordinator of the Pathology Service of the Cancer Institute of the State of Sao Paulo (ICESP), for providing the digitized images of the histological slides, enabling this work to be carried out.