Early diagnosis of gastric cancer based on deep learning combined with the spectral-spatial classification method

The development of an objective and rapid method for the early diagnosis of gastric cancer has important clinical value. In this study, the fluorescence hyperspectral imaging technique was used to acquire fluorescence spectral images. Deep learning combined with a spectral-spatial classification method, based on 120 fresh tissue samples with diagnoses confirmed by histopathological examination, was used to automatically identify and extract the "spectral + spatial" features and construct an early diagnosis model of gastric cancer. The model results showed that the overall accuracy for the nonprecancerous lesion, precancerous lesion, and gastric cancer groups was 96.5%, with specificities of 96.0%, 97.3%, and 96.7% and sensitivities of 97.0%, 96.3%, and 96.6%, respectively. Therefore, the proposed method can increase the diagnostic accuracy and is expected to be a new method for the early diagnosis of gastric cancer.


Introduction
Gastric cancer (GC) is a common malignancy that originates from epithelial cells of the gastric mucosa. According to global data, GC is the fourth most common malignancy in the world and the second leading cause of cancer-related deaths [1]. The occurrence and development of GC are complicated and affected by multiple factors such as environment and heredity, and the influence of these factors on the occurrence of GC has not been fully elucidated. The five-year survival rate of advanced GC is still lower than 30% even after comprehensive treatment with surgery, chemotherapy, and radiotherapy [2], whereas the five-year survival rate after treatment of early GC can exceed 90%, which in some cases amounts to a cure [3]. Therefore, early diagnosis of GC is very important.
The occurrence and development of GC is a complex, multistage, multistep process involving multiple mechanisms, with a series of intermediate stages (including the precancerous state). At present, the most widely recognized model of human GC is the one proposed by Correa [4]: "normal gastric mucosa → chronic non-atrophic gastritis → atrophic gastritis → intestinal metaplasia → dysplasia → gastric cancer." Atrophic gastritis (AG) and intestinal metaplasia (IM) are considered to be precancerous lesions that are highly associated with GC [5]. AG and IM have a greater risk of developing into GC if not treated in time. Their early detection and timely treatment have important practical significance for the prevention and treatment of GC.
In view of the wealth of information provided by fluorescence hyperspectral imaging (FHSI) and the biochemical complexity of tumor development, this paper proposes a modeling method for fluorescence hyperspectral images based on deep learning combined with the spectral-spatial classification method. This method uses the entire spectrum of each pixel together with the information from its neighborhood to build the model. It is applied to the early diagnosis of GC to provide an objective, rapid, and accurate diagnostic method.

Experiment equipment and image acquisition
In this experiment, an FHSI system designed and built by the Department of Optoelectronic Engineering at Jinan University was used. The core components included an ultraviolet xenon light source with a central wavelength of 361 nm and its interference filter; a liquid-crystal tunable filter (LCTF, CRI Inc., VariSpec VIS, Connecticut City, United States) used for narrow-band filtering within the 400-720 nm range; a 16-bit complementary metal-oxide semiconductor (CMOS) camera (Hamamatsu Inc., Hamamatsu City, Japan); and a dichroic mirror with a cutoff wavelength of 425 nm (Thorlabs Inc., New Jersey City, United States) that separates the laser light from the signal light. The system also contains a lens set, a video capture card, and a computer. Figure 1 shows the FHSI system.
Spectral cube acquisition: The sample was placed on a slide. The microscope was adjusted at low magnification (5×) until a clear image could be seen. Following that, a 361 nm ultraviolet laser was used for fluorescence excitation, and one image frame was acquired at every 2 nm interval in the 450-680 nm range. In total, 116 fluorescence spectral images were acquired for each patient (experiment temperature: 25 ± 1°C, humidity: 46% relative humidity).
The large network model and the many parameters of a convolutional neural network (CNN) require a large amount of memory and computing resources during the training phase and therefore place high demands on the computing platform.
The main advantage of a graphics processing unit (GPU) over a CPU is its parallel computing capability, which reduces the training time. We therefore built a hardware platform around a suitable GPU to meet the need for high-speed computation. The specifications of the computing platform are as follows: CPU: Intel Core i5-8500, base frequency 3 GHz; GPU: NVIDIA GTX1080Ti, 11-GB GDDR5 memory with 3584 stream processing units; internal memory: 16-GB DDR4 RECC; power supply: Jetsoft 600 W; and motherboard: Gigabyte Z270 HD3.
All data preprocessing was performed using our open-source SIproc software, implemented in C++ and CUDA. Training and testing were performed in Python using open-source software packages. The CNNs were designed and implemented in TensorFlow through the TFlearn interface.

Experimental samples
The fresh gastric mucosal tissues used in this study were obtained from the Departments of Gastroenterology at the People's Liberation Army 74th Group Army Hospital and Zhujiang Hospital of Southern Medical University. There were 120 patients: 62 male and 58 female. The mean age was 55 years. Among these samples, there were 53 cases with nonprecancerous lesions (including normal gastric tissue and nonatrophic gastritis), 35 cases with precancerous lesions (AG accompanied by IM), and 32 cases with GC. Other pathological types were excluded.
During gastroscopic examination, conventional disposable biopsy forceps (tip width of 6 mm) were used. Two adjacent tissue-sample blocks were collected from each apparent lesion. One block was used for histopathological diagnosis, while the other was left untreated, placed inside a cryovial, and immediately stored in a liquid nitrogen tank. The time interval from tissue sample collection to spectral image detection was 1 week. All samples were collected by endoscopists with five or more years of experience, and diagnoses were made by pathologists with eight or more years of experience. Informed consent was obtained from patients for the collection of gastric mucosal tissue samples, and the process conformed to ethical requirements.

ResNet
The residual network (ResNet) was proposed by He et al. [35] in 2015. Compared with the classic CNN architecture, the main feature of ResNet is the introduction of residual connections, which alleviate the performance degradation problem that arises when training very deep networks [36,37]. A large number of ResNet-based works have since been proposed and successfully applied to different fields such as natural language processing [38], speech recognition [39,40], and remote sensing image classification [41], where ResNet has improved the performance of the corresponding models.
In general, a deep residual network consists of a set of residual blocks, each containing several stacked convolutional layers [with a rectified linear unit (ReLU) and a batch normalization layer attached to each convolutional layer]. A residual block with an identity mapping can be expressed as Eq. (1).
where h_l and h_{l+1} are the input and output of the l-th residual block, respectively; ReLU(·) is the rectified linear unit function; F is the residual mapping function; and ω_l denotes the parameters of the residual learning unit. Specifically, when the dimensions (number of channels or size) of F(h_l, ω_l) and h_l are not equal, a linear projection is usually applied to match them, so Eq. (1) can be further converted to Eq. (2).
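Consistent with the symbol definitions above and with the standard residual-learning formulation, Eqs. (1) and (2) can be written as in the following sketch. The projection symbol W_s is an assumption borrowed from the usual ResNet notation; it does not appear in the surrounding text.

```latex
% Eq. (1): residual block with an identity shortcut
\[
h_{l+1} = \mathrm{ReLU}\bigl(h_l + F(h_l, \omega_l)\bigr) \tag{1}
\]

% Eq. (2): residual block with a linear projection on the shortcut
% (W_s is an assumed symbol for the projection that matches dimensions)
\[
h_{l+1} = \mathrm{ReLU}\bigl(W_s\, h_l + F(h_l, \omega_l)\bigr) \tag{2}
\]
```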
The ResNet34 framework built in this paper is shown in Fig. 2. A basic residual block is shown in Fig. 2(B). Figure 2(A) shows a 34-layer plain network, and Fig. 2(B) shows a 34-layer residual network. Compared with the plain network, the residual network has a shortcut that directly implements the identity mapping, similar to a "short circuit." Experiments have shown that this shortcut structure effectively alleviates the vanishing gradient problem.
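The shortcut described above can be illustrated with a minimal pure-Python sketch. It operates on flat lists of numbers rather than convolutional feature maps, and `residual_fn` and `projection` are hypothetical stand-ins for the stacked convolutional layers and the linear projection; it is not the network used in this study.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def residual_block(h, residual_fn, projection=None):
    """Compute ReLU(shortcut(h) + F(h)) for a feature vector h.

    residual_fn plays the role of the residual mapping F.
    projection is only needed when F changes the vector length;
    with projection=None the shortcut is the identity mapping.
    """
    f = residual_fn(h)
    shortcut = h if projection is None else projection(h)
    if len(shortcut) != len(f):
        raise ValueError("shortcut and residual outputs must have equal length")
    return relu([s + r for s, r in zip(shortcut, f)])


# If the residual mapping outputs zeros, the block reduces to ReLU(h):
# the input passes straight through the shortcut.
passthrough = residual_block([1.0, -2.0], lambda h: [0.0] * len(h))
```

Because the block only has to learn the residual F, a block that contributes nothing can fall back to the identity path, which is the intuition behind the "short circuit" analogy.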

Assessment of model parameters
During the construction of the CNN diagnosis model, the accuracy, loss function value, specificity, and sensitivity are important assessment metrics. The closer the accuracy, specificity, and sensitivity are to 1, the better the diagnosis results of the constructed model. Equations (3)-(5) show the formulas for these metrics.
In Eqs. (3)-(5), TPR (true positive rate) represents the sensitivity, TNR (true negative rate, equal to 1 − FPR, where FPR is the false positive rate) represents the specificity, TP represents the number of positive samples from the validation set that were correctly classified by the model, FN represents the number of positive samples from the validation set that were wrongly classified by the model, FP represents the number of negative samples from the validation set that were wrongly classified by the model, and TN represents the number of negative samples from the validation set that were correctly classified by the model.
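With the standard definitions accuracy = (TP + TN)/(TP + TN + FP + FN), sensitivity = TP/(TP + FN), and specificity = TN/(TN + FP), the metrics for a three-class problem can be computed from a confusion matrix as sketched below. Treating each class one-vs-rest is an assumption about how the per-group values were obtained, and the example matrix is illustrative only, not data from this study.

```python
def one_vs_rest_metrics(confusion):
    """Overall accuracy plus per-class sensitivity and specificity.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    per_class = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                          # class k missed
        fp = sum(confusion[i][k] for i in range(n)) - tp     # others called k
        tn = total - tp - fn - fp
        per_class.append({
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        })
    return accuracy, per_class


# Illustrative counts only (rows: true class, columns: predicted class).
example = [
    [90, 5, 5],
    [10, 80, 10],
    [0, 10, 90],
]
accuracy, per_class = one_vs_rest_metrics(example)
```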

Spectral acquisition and normalization
Each sample was imaged by the FHSI system to obtain a spectral cube consisting of 116 frames of 256-level grayscale images. Figure 3(A) shows fluorescence images of gastric mucosal tissue samples (normal, AG, IM, GC) at the 608 nm peak. A binary image is shown in Fig. 3(B). The spectral curve of the gastric tissue was drawn from 100 pixels randomly selected from the spectral cube. The normalized spectra of normal, AG, IM, and GC are shown in Fig. 3(C).
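As a rough sketch of this step, the band grid and a normalized mean spectrum could be computed as follows. The text does not state which normalization was used, so min-max scaling is an assumption here, and all function names are hypothetical.

```python
def band_wavelengths(start_nm=450, stop_nm=680, step_nm=2):
    """Centre wavelengths of the acquired frames (450-680 nm, 2 nm step)."""
    return list(range(start_nm, stop_nm + 1, step_nm))


def mean_spectrum(pixel_spectra):
    """Average several per-pixel spectra band by band."""
    count = len(pixel_spectra)
    return [sum(band_values) / count for band_values in zip(*pixel_spectra)]


def min_max_normalize(spectrum):
    """Scale a spectrum to the range [0, 1] (assumed normalization)."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        return [0.0 for _ in spectrum]
    return [(v - lo) / (hi - lo) for v in spectrum]


wavelengths = band_wavelengths()  # 116 bands, matching the 116 frames
```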

Original spectral analysis
The autofluorescent groups of human tissue are listed in Table 1 [42]. The average spectra of normal, AG, IM, and GC are shown in Fig. 4. The error bars in Fig. 4(A) represent the difference in fluorescence intensity between different pixels. In Fig. 4(B), 496 nm is the characteristic fluorescence peak of the pyridoxal phosphate "Schiff" base, 546 nm is the characteristic fluorescence peak of phospholipid, and 670 nm is the characteristic fluorescence peak of hematoporphyrin. The fluorescence intensity of normal gastric mucosa at 496 nm was weaker than that of the lesions, indicating that the lesions are accompanied by abnormal metabolism of the pyridoxal phosphate "Schiff" base. The fluorescence intensities of normal gastric mucosa and AG at 546 nm and 670 nm were stronger than those of IM and GC, indicating that carcinogenesis in the gastric mucosa is accompanied by abnormal phospholipid metabolism. In addition, the fluorescence intensities of IM and GC were relatively close, indicating that the composition of fluorescent substances in IM tissue is similar to that in GC tissue. This also indicates that IM carries a greater risk of canceration.

Second derivative spectral analysis
In this study, gastric mucosal tissue samples were divided into three groups: nonprecancerous lesion group, precancerous lesion group, and gastric adenocarcinoma group. Figure 5(A) shows the second derivative spectra of the nonprecancerous lesion group, precancerous lesion group, and GC group. The feature at 466 nm is assigned to the hydrophobic subunit of ubiquinone reductase (NADPH). The NADPH intensity in the GC group is significantly lower than that in the nonprecancerous lesion group and the precancerous lesion group, which may be caused by abnormal NADPH metabolism in GC tissues. The feature at 546 nm is assigned to phospholipid, and the phospholipid intensity in the GC group is significantly greater than that in the nonprecancerous lesion group and the precancerous lesion group. Because phospholipids are the main component of the cell membrane, this difference may be caused by abnormal phospholipid metabolism in the cell membranes of the GC group. The error bars in Fig. 5(B) represent the difference in fluorescence intensity between different individuals.
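A second derivative spectrum can be approximated with central finite differences on the 2 nm band grid, as in the sketch below. The text does not say how the derivative was computed (a smoothing method such as Savitzky-Golay filtering is also common), so this is only one plausible choice.

```python
def second_derivative(spectrum, step_nm=2.0):
    """Central-difference second derivative of an evenly sampled spectrum.

    Returns len(spectrum) - 2 values, one for each interior band.
    """
    return [
        (spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]) / step_nm ** 2
        for i in range(1, len(spectrum) - 1)
    ]


# Sanity check on a parabola sampled every 2 nm: the second derivative
# of x**2 is the constant 2.
parabola = [x ** 2 for x in range(0, 10, 2)]
curvature = second_derivative(parabola)
```

Differentiation amplifies noise, which is consistent with the large error bars reported for the second derivative spectra.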

Spatiospectral preprocessing
A spectral cube was obtained for every sample with the spectral imaging system. Each cube was composed of 116 frames of 256-level grayscale images. First, the target region of the image (size: 200 × 200) was extracted. Following that, each block of 10 × 10 pixels across 100 consecutive wave bands was flattened into a one-dimensional vector of length 10 × 10 × 100, which was converted to a 100 × 100 two-dimensional matrix. This step was repeated across the entire image to obtain a series of 400 two-dimensional matrix images containing spatial-spectral information. Figure 6 shows the procedure of spatial-spectral preprocessing. Images of nonprecancerous lesions, precancerous lesions, and GC are shown in Fig. 7. Case and picture information for the nonprecancerous lesion group, precancerous lesion group, and gastric cancer group is shown in Table 2.

Table 2 Number of cases and images in each group.
Image type                      Nonprecancerous   Precancerous   GC         Total
Original fluorescence images    53 × 100          35 × 100       32 × 100   12,000
Spatial-spectral images         53 × 400          35 × 400       32 × 400   48,000
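The block-to-matrix conversion described above can be sketched in pure Python as follows. The flattening order (band first, then row, then column) and the use of non-overlapping blocks are assumptions, since the text does not specify them, and the function name is hypothetical.

```python
import math


def cube_to_patch_matrices(cube, patch=10):
    """Turn a spectral cube into one square 2-D matrix per spatial block.

    cube[b][r][c] is the intensity of band b at pixel (r, c).
    Each patch x patch block is flattened across all bands into a vector
    of length patch * patch * bands, then reshaped to a square matrix.
    """
    bands = len(cube)
    rows, cols = len(cube[0]), len(cube[0][0])
    length = patch * patch * bands
    side = math.isqrt(length)
    if side * side != length:
        raise ValueError("patch * patch * bands must be a perfect square")
    matrices = []
    for r0 in range(0, rows - patch + 1, patch):
        for c0 in range(0, cols - patch + 1, patch):
            vector = [
                cube[b][r][c]
                for b in range(bands)
                for r in range(r0, r0 + patch)
                for c in range(c0, c0 + patch)
            ]
            matrices.append(
                [vector[i * side:(i + 1) * side] for i in range(side)]
            )
    return matrices


# Small stand-in cube: 4 bands of 4 x 4 pixels, 2 x 2 blocks.
toy_cube = [[[b * 100 + r * 10 + c for c in range(4)] for r in range(4)]
            for b in range(4)]
toy_matrices = cube_to_patch_matrices(toy_cube, patch=2)
```

For the sizes given in the text (200 × 200 pixels, 10 × 10 blocks, 100 bands), the same function yields (200 / 10)² = 400 matrices of 100 × 100 per sample, and with this flattening order each matrix row holds one band of one block.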

Sample set division
The sample set division of the spatiospectral images is listed in Table 3. The spatial-spectral preprocessed images were randomly divided in a 3:1:1 ratio into the training set (28,800 images), validation set (9600 images), and test set (9600 images). ResNet34 was designed according to the characteristics of the hyperspectral image data and the classification requirements. The task is a multiclass classification problem covering the nonprecancerous lesion, precancerous lesion, and GC groups. Figure 8 shows the entire framework. Figure 8(A) shows a spatiospectral image; (B) contains one convolutional layer and four residual blocks, where deep feature extraction is carried out on the intermediate convolutional layers and the feature representation is obtained by mean pooling; and (C) shows a softmax multiclass classifier, which outputs the category information for I (nonprecancerous lesion group), II (precancerous lesion group), and III (GC group).
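A random 3:1:1 division can be sketched as below. The fixed seed and the function name are assumptions made for reproducibility of the illustration; the text does not describe how the random selection was implemented.

```python
import random


def split_3_1_1(items, seed=0):
    """Shuffle items and split them 3:1:1 into train, validation, test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 3 // 5
    n_val = len(shuffled) // 5
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 48,000 spatial-spectral images, represented here by their indices.
train_ids, val_ids, test_ids = split_3_1_1(range(48000))
```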

Result of ResNet34 model
The results of 200 iterations of the ResNet34 model are shown in Fig. 9. As shown in Fig. 9(A), the accuracies of the training set and the validation set differ little during the first 18 iterations. Beyond 18 iterations, the accuracy of the training set is significantly higher than that of the validation set. Both accuracies increase steadily over the first 40 iterations and gradually stabilize thereafter. As shown in Fig. 9(B), the loss values of the training set and the validation set differ little during the first 14 iterations. Beyond 14 iterations, the loss of the training set is significantly smaller than that of the validation set. Both losses decrease steadily over the first 40 iterations and gradually stabilize thereafter. These results show that the performance of the ResNet34 model varies greatly with the number of iterations (learning times). The best results are as follows: for the training set, the accuracy is 99.8% and the loss is 0.009; for the validation set, the accuracy is 97.30% and the loss is 0.79.

Hyperparameter optimization and prediction results
To further optimize the model, we tuned the batch size, momentum factor, number of iterations, and learning rate of ResNet34. The search settings were as follows: batch size (10-100, step length: 1); momentum factor (0-0.9, step length: 0.1); learning rate (0.0001, 0.001, 0.01, and 0.1-0.7 with a step length of 0.1); and a maximum of 200 iterations. The four parameters were optimized step by step. The optimization results are shown in Fig. 10. The best results were obtained with a batch size of 10, a momentum factor of 0.2, a learning rate of 0.6, and 126 iterations, giving a validation accuracy of 97.72%.
Optimized parameters (batch size = 10, momentum factor = 0.2, learning rate = 0.6, and number of iterations = 126) were used to construct the ResNet34 model. Figure 10 shows the operation results. The weight parameters in the trained model were used to predict the samples in the test set. Table 4 lists the results of the test set: the overall accuracy was 96.5%, and the specificities of the nonprecancerous lesion, precancerous lesion, and GC groups were 96.0%, 97.3%, and 96.7%, respectively. The sensitivities were 97.0%, 96.3%, and 96.6%, respectively.
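The search ranges given in this section can be enumerated and searched one parameter at a time, as sketched below. Interpreting the step-by-step optimization as a coordinate-wise search is an assumption, the number of iterations is left out because it would normally be read off the training curve, and `toy_score` is a hypothetical stand-in for training the model and measuring validation accuracy.

```python
def build_search_space():
    """Candidate values taken from the ranges stated in the text."""
    return {
        "batch_size": list(range(10, 101)),
        "momentum": [round(0.1 * i, 1) for i in range(10)],
        "learning_rate": [0.0001, 0.001, 0.01]
        + [round(0.1 * i, 1) for i in range(1, 8)],
    }


def sequential_search(space, evaluate):
    """Optimize one parameter at a time, keeping the others fixed."""
    best = {name: values[0] for name, values in space.items()}
    for name, values in space.items():
        best[name] = max(values, key=lambda v: evaluate({**best, name: v}))
    return best


def toy_score(params):
    """Hypothetical objective that peaks at the reported optimum."""
    return -(
        abs(params["batch_size"] - 10) / 100
        + abs(params["momentum"] - 0.2)
        + abs(params["learning_rate"] - 0.6)
    )


space = build_search_space()
best_params = sequential_search(space, toy_score)
```

A coordinate-wise search evaluates 91 + 10 + 10 candidates instead of the 9100 combinations of a full grid, at the cost of possibly missing interactions between parameters.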

Discussion
The error bars shown in Fig. 3(D) represent the spectral differences between different spatial locations of the gastric mucosa of the same individual. The fluorescent constituents are not uniformly distributed in space, which results in large differences in fluorescence intensity between pixels. In addition, the spatial variation in spectral intensity depends on the wavelength range. For example, nonprecancerous lesions show large spatial differences below 600 nm but small spatial differences above 600 nm, and this behavior differs between tissue types. Within the range of the error fluctuations, the fluorescence intensity of normal tissue overlaps with that of the other tissue types (AG, IM, and GC). Therefore, using 496 nm as a spectral fingerprint to separate normal from diseased tissue, or 546 nm and 670 nm as fingerprints to separate GC from non-GC tissue, is likely to produce misjudgments. The error bars shown in Fig. 5(B) represent the differences between individuals in the same category. Figure 5(B) shows that the second derivative spectrum is strongly affected by individual differences; the fluorescence intensities of the tissue types overlap most severely at 550 nm and 580 nm, indicating that individual differences are larger in these two bands. The fluorescence intensities of the tissue types at 466 nm and 546 nm also partially overlap, which may significantly interfere with the identification of nonprecancerous lesions, precancerous lesions, and GC. Therefore, GC and non-GC tissue are easily misjudged when only the 466 nm and 546 nm values of the second derivative spectrum are used.
In summary, it is necessary to combine fluorescence spectroscopy imaging technology with deep learning and the spatial-spectral classification method when modeling diagnoses. This allows more effective information to be extracted from the samples of the nonprecancerous lesion group, precancerous lesion group, and GC group, so that they can be identified reliably and accurately.
Because biological tissues are so complex, it is likely that many biomolecular components are involved in countless biochemical processes that simultaneously affect disease. Deep learning combined with the spectral-spatial classification method can be used for the classification of multiple pathological types and makes full use of "spatial + spectral" information. In this experiment, the overall accuracy for nonprecancerous lesions, precancerous lesions, and GC reached 96.5% with this method, indicating that deep learning combined with the spectral-spatial classification method is effective and reliable for the early diagnosis of GC.
The relatively high accuracy of the deep learning method results from the differences in grayscale image information after spatiospectral preprocessing. The preprocessed images contain both the information of the fluorescent components in different bands and the spatial differences of the fluorescent components between pixels. The deep learning method automatically learns these "spatial + spectral" differences from the grayscale images and obtains a high-level feature representation. Finally, the category of each tissue type is output (nonprecancerous group, precancerous lesion group, or GC group).
Early GC diagnosis has long been an active research topic worldwide. As shown in Table 5, new endoscopic techniques [chromoendoscopy (CE), narrow-band imaging (NBI), blue laser imaging (BLI), and optical coherence tomography (OCT)], new tumor markers (mRNA), and spectral detection [infrared (IR) spectroscopy and Raman spectroscopy (RS)] have made some progress in identifying small, high-risk nonneoplastic gastric lesions (e.g., AG and IM) and GC. However, the accuracy of the endoscopic methods depends mainly on the clinician's experience. The specificity and sensitivity of tumor markers do not yet meet the requirements of clinical application. In infrared spectroscopy, the strong absorption of water interferes with the signals of other substances, so some lesion information in the early stage of GC is easily masked by the water absorption signal. Owing to the weakness of Raman scattering, a large confocal Raman spectrometer is required to meet the measurement requirements in the laboratory. In addition, chemical reagents such as special metal nanomaterials are needed for surface-enhanced Raman spectroscopy analysis. Thus, Raman spectroscopy is difficult to popularize and apply in clinical practice. To the best of our knowledge from the existing literature, this is the first application of FHSI combined with a deep learning algorithm (ResNet34) and the spectral-spatial classification method as a sensitive and specific tool for the early diagnosis of GC. A good classification of nonprecancerous lesions, precancerous lesions, and GC was obtained, which indicates the feasibility of using FHSI technology for the early diagnosis of GC. FHSI technology can simultaneously obtain the biochemical information and the spatial image information of gastric mucosal tissue and has great potential for clinical application.

Conclusions
The fluorescence spectrum characteristics of GC tissues are significantly different from those of normal tissues. This may be caused by abnormal metabolism affecting the concentrations of the pyridoxal phosphate "Schiff" base and hematoporphyrin IX. In addition, the fluorescence intensity of the GC group in the second-derivative spectrum was significantly lower than that of the nonprecancerous and precancerous groups, which may be caused by abnormal NADPH metabolism in the GC tissues. These endogenous fluorescent substances can be used as spectral biomarkers to distinguish nonprecancerous lesions, precancerous lesions, and GC. However, owing to the strong influence of spatial differences in the distribution of tissue components, individual differences, and instrument noise, the fluorescence spectra overlap significantly, and misjudgments are likely. The discrimination accuracy, specificity, and sensitivity of the deep learning model for the nonprecancerous lesion group, precancerous lesion group, and GC group were all above 96%, and the stability of the model was effectively improved by empirically adjusting the hyperparameters.