A Deep Recurrent Neural Network-Based Explainable Prediction Model for Progression from Atrophic Gastritis to Gastric Cancer

Kim, Hyon Hee; Lim, Young Seo; Seo, Seung-In; Lee, Kyung Joo; Kim, Jae Young; Shin, Woon Geon

doi:10.3390/app11136194

¹ Department of Statistics and Information Science, Dongduk Women’s University, Seoul 02748, Korea

² Department of Internal Medicine, Kangdong Sacred Heart Hospital, Hallym University College of Medicine, Seoul 05355, Korea

* Authors to whom correspondence should be addressed.

Appl. Sci. 2021, 11(13), 6194; https://doi.org/10.3390/app11136194

Submission received: 16 May 2021 / Revised: 28 June 2021 / Accepted: 30 June 2021 / Published: 3 July 2021

(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Gastric cancer is the fifth most common cancer type worldwide and one of the most frequently diagnosed cancers in South Korea. In this study, we propose DeepPrevention to identify patients with atrophic gastritis who are at high risk of gastric cancer. DeepPrevention comprises two modules. The prediction module predicts the probability of progression from atrophic gastritis to gastric cancer. The explanation module identifies risk factors for this progression. The data set used in this study consisted of South Korea National Health Insurance Service (NHIS) medical checkup data for atrophic gastritis patients from 2002 to 2013. Our experimental results showed that the most influential predictors of gastric cancer development were sex, smoking duration, and current smoking status. In a group of high-risk patients, the average age at gastric cancer diagnosis was 57 years. Income, BMI, regular exercise, and the number of endoscopic screenings did not differ significantly between groups. At the individual level, we found relatively strong associations between gastric cancer and both smoking duration and smoking status.

Keywords:

gastric cancer prediction; deep recurrent neural network; risk factor detection; medical checkup data; progression to gastric cancer from atrophic gastritis

1. Introduction

Gastric cancer is the fifth most common cancer in the world and one of the most frequently diagnosed cancers in South Korea [1,2]. The national cancer screening program in Korea provides adults over the age of 40 with endoscopic screening once every two years, and the mortality rate is decreasing due to early gastric cancer detection and prompt treatment. However, endoscopic screening has adverse effects and can generate false-positive results that lead to overdiagnosis [3,4]. Therefore, it is important to identify the groups that are at high risk of gastric cancer and to recommend regular endoscopic screening for these patients [5,6].

Known risk factors for gastric cancer include infection with Helicobacter pylori (H. pylori), salty and smoked food intake, smoking, alcohol, family history of gastric cancer, low socioeconomic status, and obesity [2,7,8]. Notably, H. pylori infection was identified as a major cause of gastric cancer [9], and atrophic gastritis is a precursor that progresses to gastric cancer, although in less than 10% of cases [10]. Therefore, it is also necessary to identify the risk factors and predict the cases of atrophic gastritis that are at high risk of developing into gastric cancer.

In this study, we propose a deep recurrent neural network-based prediction model with two aims: to identify risk factors for progression from atrophic gastritis to gastric cancer, and to identify patients with atrophic gastritis who are at high risk of gastric cancer. Deep learning is widely used in health care applications such as medical imaging, screening, biomarker selection, and electronic health record (EHR) analysis [11,12,13]. In particular, predicting patient status from EHRs has attracted much attention [14,15,16]. Machine learning techniques used for this task include support vector machines (SVMs), random forests, and deep learning. Researchers have used EHRs and deep learning to predict pneumonia risk, hospital readmission, clinical events, and the future health of patients [17,18,19].

The proposed predictive model is explainable: for each patient, it can show how a prediction was obtained in terms of the most important features. Deep learning-based predictive models typically achieve high accuracy but are difficult to explain [20,21]. In medical applications, however, both doctors and patients need explanations to understand and trust a model's predictions. Explainability is therefore essential for predictive models in medicine [22].

In this paper, we present DeepPrevention, which consists of a prediction module and an explanation module. The prediction module uses a deep recurrent neural network (RNN) to predict the probability that atrophic gastritis will develop into gastric cancer. The explanation module explains the prediction results in three ways. First, it identifies risk factors for progression from atrophic gastritis to gastric cancer using the Chi-squared test. Second, it detects high-risk patients who are likely to progress to gastric cancer using K-means clustering [23]. Finally, it provides a personal explanation of each prediction using a local surrogate method called LIME [24].

The data set used in this study consisted of South Korea National Health Insurance Service (NHIS) medical checkup data for patients with atrophic gastritis from 2002 to 2013. Observational Health Data Sciences and Informatics converted the NHIS data to the Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM) [25]. A total of 29,557 patients with atrophic gastritis were identified, of whom 771 had progressed to gastric cancer. Gastroscopy is recommended for adults aged 40 and older, so we restricted the patient age range to 40–75 years. This left 18,846 atrophic gastritis patients and 610 gastric cancer patients for our analyses.

For our analysis, we considered the patients' demographic and environmental risk factors. The demographic factors were sex, age, and income. The environmental factors were smoking habits, alcohol consumption, regular exercise, and body mass index. We also recorded each patient's number of endoscopic screenings. The class imbalance was severe, but we neither under- nor over-sampled the data, so that the model would preserve the real incidence of gastric cancer among gastritis patients. Instead, we propose a deep RNN model with eight hidden layers to learn the features of the minority class, that is, gastric cancer patients. With L2 regularization and dropout, the proposed model achieved an area under the receiver operating characteristic curve (AUC) of 0.84.

In our experimental results, the most influential predictors of gastric cancer development were sex, smoking duration, and current smoking status. Most patients in the high-risk group were current smokers, and their average age at gastric cancer diagnosis was 57 years. Eight times as many male as female patients were diagnosed with gastric cancer. At the individual level, smoking duration and smoking status were relatively strongly associated with gastric cancer, and a family history of cancer was also closely related.

The remainder of this paper is organized as follows. Section 2 discusses related work, and Section 3 gives an overview of DeepPrevention. Section 4 presents the deep RNN-based prediction model. Section 5 describes in detail the influential risk factors, the characteristics of the high-risk group, and the personal explanations. Finally, Section 6 discusses the results and concludes the paper.

2. Related Works

EHRs are widely used to predict clinical risk [26]. Weng et al. [27] improved cardiovascular risk prediction using the clinical data of 378,256 patients from UK family practices, with 22 risk factor variables. The authors applied four machine learning algorithms to UK primary care EHRs: logistic regression, random forests, gradient boosting, and neural networks. Their machine learning-based prediction models outperformed an established algorithm based on the American College of Cardiology guidelines.

Similar to our study, Taninaga et al. [6] predicted future gastric cancer risk with XGBoost and logistic regression. They used medical checkup data from 25,942 participants in Japan who underwent multiple endoscopies. They did not find a clear connection between future gastric cancer and either long-term H. pylori infection or the presence of chronic atrophic gastritis. Including biological background and blood test results improved prediction performance.

More recently, deep learning has attracted much attention in EHR-based clinical applications [28]. The most important characteristic of EHRs is that they are time-series records. Each patient's medical and visit history is represented as a sequence, and researchers have used many different representations of EHRs in deep learning-based prediction models. Deep Patient [18] uses unsupervised deep feature learning to represent EHR data, applying a stack of denoising autoencoders to learn a representation of the input data. Deepr (deep record) [29] analyzes the sequences of patients' diagnoses and treatments to predict their medical outcomes.

Recurrent neural networks [30] and their variants [31,32] can successfully predict patients' future medical outcomes. DeepCare [16] introduced convolutional LSTM (C-LSTM), an end-to-end deep dynamic memory neural network built on long short-term memory (LSTM). It infers illness states and predicts future medical outcomes from the illness history in medical records. In DeepCare, C-LSTM predicts the next disease stages, recommends interventions, and estimates unplanned readmission among diabetic and mental health patients.

A common criticism of deep learning models is that they cannot explain which features influence their predictions. In medical applications in particular, it is essential to describe the factors that contribute to a result. To this end, researchers have applied explainable artificial intelligence (XAI) to a variety of medical applications. For instance, one XAI model predicts acute critical illness from EHRs and gives visual explanations for each prediction [33]. In this study, we combined traditional statistical methods, namely the Chi-squared test and K-means clustering, with an explainable AI method, LIME. As a result, we can describe risk factors for the whole patient population, the characteristics of a high-risk group, and personal explanations for individual patients.

3. DeepPrevention Overview

3.1. Data Description

We extracted the medical data for this study from South Korea’s NHIS database for 2002–2013 and converted them to the OMOP-CDM format. A total of 29,557 atrophic gastritis patients were identified, of whom 771 had progressed to gastric cancer. We restricted the sample to patients aged 40 to 75, which resulted in a data set of 18,846 patients with atrophic gastritis, among whom 610 had progressed to gastric cancer. Our main consideration was the effect of lifestyle habits on the development of gastric cancer in patients with atrophic gastritis. We therefore used the following 13 features as inputs to the prediction model:

Demographics: age, sex, age at gastric cancer diagnosis
Smoking habits: frequency of smoking, smoking duration, current smoking status
- Frequency of smoking: less than half a pack of cigarettes a day, half a pack to less than one pack, one pack to less than two packs, or two packs or more
- Smoking duration: fewer than 5 years, 5–9 years, 10–19 years, 20–29 years, or 30 years or more
- Current smoking status: yes for smokers and no for non-smokers
Drinking habits: categorical values such as two or three times a month, once or twice a week, three or four times a week, and every day
Regular exercise: binary yes or no
Income: 10 levels, one for each 10% (decile) of total household income
Family history of cancer: binary yes or no
Body mass index: weight in kilograms divided by the square of height in meters
Number of endoscopic screening tests
Current status: 0 before gastric cancer diagnosis and 1 after gastric cancer diagnosis

To explore the characteristics of the data set, we visualized it in two dimensions using t-distributed Stochastic Neighbor Embedding (t-SNE) [34], as shown in Figure 1. t-SNE maps high-dimensional data to a two-dimensional space (component 1 and component 2). It preserves the local neighborhood structure of the data, so points that are close in the high-dimensional space stay close in the embedding. In Figure 1, red points represent gastric cancer patients and blue points represent atrophic gastritis patients. The visualization shows that the boundaries separating the classes are highly complex and that the class imbalance is severe.

In our data set, only about 3% of patients with atrophic gastritis developed gastric cancer. Over-sampling methods include the Synthetic Minority Oversampling Technique (SMOTE) [35] and Adaptive Synthetic Sampling (ADASYN) [36]. On extremely imbalanced data, these methods are known to cause overfitting and longer training times because they enlarge the training set [37]. Deep learning with class imbalance has recently attracted much attention [38], and very deep neural networks have been reported to be effective for highly imbalanced class distributions [39]. We therefore address class imbalance at the algorithm level: we propose a deep recurrent neural network with eight hidden layers that captures the features of the minority class.
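A quick calculation shows why accuracy says little at this level of imbalance. The sketch below uses the test-set counts reported in Section 4: 5650 atrophic gastritis patients and 187 gastric cancer patients. It scores a hypothetical classifier that labels every patient as atrophic gastritis. The function name is illustrative only.

```python
# Test-set class counts reported in Section 4 of this paper.
N_GASTRITIS = 5650   # negatives: atrophic gastritis without progression
N_CANCER = 187       # positives: progressed to gastric cancer

def majority_baseline(n_neg: int, n_pos: int) -> dict:
    """Score a hypothetical classifier that predicts 'no progression' for everyone."""
    total = n_neg + n_pos
    tp, fn = 0, n_pos      # every gastric cancer patient is missed
    tn, fp = n_neg, 0      # every atrophic gastritis patient is labelled correctly
    return {
        "minority_share": n_pos / total,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

if __name__ == "__main__":
    for name, value in majority_baseline(N_GASTRITIS, N_CANCER).items():
        print(f"{name:>15}: {value:.3f}")
```

With these counts, the trivial classifier reaches about 96.8% accuracy while missing every gastric cancer patient. For this reason, sensitivity, specificity, and AUC are the informative metrics here.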

3.2. Data Preprocessing

The data set consisted of categorical and numerical variables. For categorical variables, missing values were replaced with the most frequent value. For numerical variables, a missing BMI value was replaced with the mean of that patient's recorded BMI values, and a missing number of endoscopic screenings was replaced with 0. The numerical variables were then normalized. Adults in Korea undergo medical checkups every one or two years. We therefore converted the study data into a 3D tensor indexed by patient, checkup date, and feature. Algorithm 1 gives the preprocessing pseudo-code.

Algorithm 1 Pseudo-Code for Data Preprocessing

Input: X: NHIS medical checkup data of atrophic gastritis patients
Output: Y: 3D tensor of X indexed by patient, medical checkup time, and feature

Split X into categorical variables and numerical variables
for each feature f in X do
    if f is a categorical variable then
        replace missing values of f with the most frequent value
    elseif f is BMI then
        replace missing values of f with the mean of the patient's BMI values
    elseif f is the number of endoscopic screenings then
        replace missing values of f with 0
    endif
endfor
Normalize the numerical variables
Combine the numerical variables and categorical variables
Sort by PersonID, Year
max_length ← the largest number of medical checkups of any patient
for id in PersonID do
    group the records of id and pad them to max_length
endfor
Create a 3D array of shape (patients, max_length, features)
Return Y
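Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code:

- The record layout and column names (`person_id`, `year`, `sex`, `smoking_status`, `bmi`, `n_endoscopy`) are hypothetical.
- Categorical values are assumed to be integer-coded already.
- Min-max scaling stands in for the unspecified normalization.
- Shorter sequences are zero-padded at the end.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical column names; the paper does not list the OMOP-CDM field names.
CATEGORICAL = ["sex", "smoking_status"]   # assumed to be integer-coded already
NUMERICAL = ["bmi", "n_endoscopy"]

def preprocess(records):
    """Impute, normalize, and pad checkup records into a 3D nested list.

    `records` is a list of dicts (modified in place). Returns (person_ids, tensor),
    where tensor[patient][checkup][feature] follows the order NUMERICAL + CATEGORICAL.
    """
    # Categorical variables: replace missing values with the most frequent value.
    for col in CATEGORICAL:
        mode = Counter(r[col] for r in records if r[col] is not None).most_common(1)[0][0]
        for r in records:
            if r[col] is None:
                r[col] = mode
    # BMI: replace missing values with the mean of the same patient's observed BMI.
    observed_bmi = defaultdict(list)
    for r in records:
        if r["bmi"] is not None:
            observed_bmi[r["person_id"]].append(r["bmi"])
    for r in records:
        if r["bmi"] is None:
            r["bmi"] = mean(observed_bmi[r["person_id"]])  # assumes >= 1 observed value
    # Number of endoscopic screenings: replace missing values with 0.
    for r in records:
        if r["n_endoscopy"] is None:
            r["n_endoscopy"] = 0
    # Normalize numerical variables (min-max scaling is an assumption).
    for col in NUMERICAL:
        lo, hi = min(r[col] for r in records), max(r[col] for r in records)
        for r in records:
            r[col] = (r[col] - lo) / (hi - lo) if hi > lo else 0.0
    # Sort by person and year, group per person, and zero-pad to max_length.
    features = NUMERICAL + CATEGORICAL
    sequences = defaultdict(list)
    for r in sorted(records, key=lambda rec: (rec["person_id"], rec["year"])):
        sequences[r["person_id"]].append([float(r[c]) for c in features])
    max_length = max(len(seq) for seq in sequences.values())
    person_ids = sorted(sequences)
    padding = [0.0] * len(features)
    tensor = [sequences[p] + [list(padding) for _ in range(max_length - len(sequences[p]))]
              for p in person_ids]
    return person_ids, tensor
```

The nested list has the (patients, max_length, features) shape that a recurrent layer expects. In a real pipeline it would be converted to an array before training.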

3.3. DeepPrevention Architecture

DeepPrevention consists of a prediction module and an explanation module, as shown in Figure 2. The prediction module is built on a deep RNN [40] and captures the features of the time-series NHIS medical checkup data used in this study. At each time step, the model reads a patient's annual checkup data and returns a prediction of progression to gastric cancer. A patient was classified as likely to develop gastric cancer when the predicted probability was 0.8 or higher (Figure 2, left panel). To explain the prediction results, we then performed a statistical analysis of the patients predicted to develop gastric cancer under this 0.8 cutoff (Figure 2, center panel). We first used Chi-squared statistics [41] to detect the features that distinguished gastric cancer from atrophic gastritis. We then used K-means clustering [42] to identify the patients at higher risk. Finally, we generated a personal explanation of each prediction (Figure 2, right panel).

The proposed prediction model is a deep neural network with eight hidden layers. We chose this depth by increasing the number of layers step by step across experiments; Table 1 shows the results. For evaluation, we used sensitivity (Equation (1)), specificity (Equation (2)), and the AUC [43,44,45]. The AUC is the probability that a classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one. Algorithm 2 describes DeepPrevention in detail.

Algorithm 2 Pseudo-Code of DeepPrevention

Input: X: 3D tensor of patient data
Output: A1: set of gastritis patients predicted to progress to gastric cancer
        A2: set of patients predicted to remain gastritis patients
        B: features affecting the prediction
        C: group of patients at high risk
        D: visualization of personal explanations

// STEP 1: classify patients as gastritis patients or gastric cancer patients
Split X into training data trainX and test data testX
Train a deep recurrent neural network on trainX
Create a prediction model M with the optimized parameters
Predict the probability p[id] of progression for each patient id in testX using M
A1, A2 ← ∅, ∅
for id in PersonID do
    if p[id] ≥ 0.8 then A1 ← A1 ∪ {id}
    else A2 ← A2 ∪ {id}
    endif
endfor
Return A1, A2

// STEP 2: Chi-squared test to detect influential risk factors
Read A1, A2
Create contingency tables for the categorical variables
Calculate the Chi-squared value and p-value of each categorical variable
B ← ∅
for each variable v with p-value pv do
    if pv < 0.05 then B ← B ∪ {v}
    endif
endfor
Return B

// STEP 3: K-means clustering to identify a group of patients at high risk
Read A1
Choose K, the optimal number of groups, using the elbow method
Apply the K-means clustering algorithm to A1 with K clusters
C ← ids in A1 belonging to the high-risk group
Return C

// STEP 4: LIME to show the personal explanation of a prediction
Read prediction model M and A1
Generate a personal explanation of a patient in A1 by applying LIME to M
D ← visualization of the personal explanation
Return D
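STEP 1 of Algorithm 2 reduces to a threshold on each patient's predicted probability. The sketch below assumes, hypothetically, that the model's outputs are available as a dictionary from patient id to probability. Only the 0.8 cutoff comes from the paper.

```python
THRESHOLD = 0.8  # cutoff stated in the paper

def split_by_threshold(probabilities, threshold=THRESHOLD):
    """Partition patient ids into A1 (p >= threshold) and A2 (p < threshold).

    `probabilities` maps a patient id to the predicted probability of progression
    to gastric cancer (hypothetical input format).
    """
    a1 = {pid for pid, p in probabilities.items() if p >= threshold}
    a2 = set(probabilities) - a1
    return a1, a2

# Example with made-up probabilities for three hypothetical patients.
a1, a2 = split_by_threshold({"p1": 0.97, "p2": 0.80, "p3": 0.42})
```

Because the comparison is `>=`, a patient predicted at exactly 0.8 falls into A1, matching the "80% or more" rule in Section 3.3.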

4. Prediction Model and Evaluation

The prediction model is a deep neural network model with eight hidden layers, as shown in Figure 3.

For evaluation, we used sensitivity (Equation (1)), specificity (Equation (2)), and the area under the receiver operating characteristic curve (ROC-AUC) [43,44,45]. The ROC-AUC is the probability that a classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one. Sensitivity was our main consideration in predicting progression to gastric cancer, because only about 3% of patients had gastric cancer.

$$\text{Sensitivity} = \frac{\text{Number of true positives}}{\text{Total number of gastric cancer patients}} \tag{1}$$

$$\text{Specificity} = \frac{\text{Number of true negatives}}{\text{Total number of atrophic gastritis patients}} \tag{2}$$
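Equations (1) and (2), together with the rank-based definition of AUC given above, can be computed directly. The sketch below is illustrative. `pairwise_auc` compares every positive with every negative (the Mann–Whitney formulation), which matches the probabilistic definition but is quadratic in the number of patients.

```python
from itertools import product

def sensitivity(true_positives: int, n_gastric_cancer: int) -> float:
    """Equation (1): true positives / total number of gastric cancer patients."""
    return true_positives / n_gastric_cancer

def specificity(true_negatives: int, n_atrophic_gastritis: int) -> float:
    """Equation (2): true negatives / total number of atrophic gastritis patients."""
    return true_negatives / n_atrophic_gastritis

def pairwise_auc(pos_scores, neg_scores) -> float:
    """P(score of a random positive > score of a random negative); ties count 1/2.

    Quadratic in the number of patients: fine for illustration, whereas library
    implementations such as sklearn's roc_auc_score work from sorted scores.
    """
    wins = 0.0
    for sp, sn in product(pos_scores, neg_scores):
        if sp > sn:
            wins += 1.0
        elif sp == sn:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, 94 true positives among the 187 gastric cancer patients in the test set would give `sensitivity(94, 187)` ≈ 0.50.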

As Table 1 shows, the sensitivity increased as the number of hidden layers increased. On the basis of our results, we selected a deep RNN model with eight hidden layers. Then we applied dropout and L2 regularization to overcome overfitting to the minority class (i.e., gastric cancer diagnosis). As a result, we achieved an AUC of 0.84, a sensitivity of 0.50, and a specificity of 0.98 with the optimal prediction model as shown in Table 1.
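The following pure-Python sketch makes the idea of a deep RNN with eight stacked hidden layers concrete. It runs the forward pass of eight simple (Elman) recurrent layers and computes an L2 penalty. It is an illustration under assumptions, not the authors' Keras model:

- The layer type, hidden width, initialization, and the use of only the final hidden state are placeholders.
- Training is omitted, including dropout, which is active only during training.

```python
import math
import random

# Placeholder sizes (hypothetical), except that the depth of eight matches the paper.
N_FEATURES, N_HIDDEN, N_LAYERS = 12, 16, 8

def init_layer(n_in, n_hidden, rng):
    """Uniformly initialized weights for one simple (Elman) recurrent layer."""
    s = 1.0 / math.sqrt(n_in + n_hidden)
    w_x = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_hidden)]
    w_h = [[rng.uniform(-s, s) for _ in range(n_hidden)] for _ in range(n_hidden)]
    return w_x, w_h, [0.0] * n_hidden

def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_layer(layer, sequence):
    """h_t = tanh(W_x x_t + W_h h_{t-1} + b); returns h_t for every time step."""
    w_x, w_h, b = layer
    h, outputs = [0.0] * len(b), []
    for x_t in sequence:
        pre = [u + v + c for u, v, c in zip(matvec(w_x, x_t), matvec(w_h, h), b)]
        h = [math.tanh(z) for z in pre]
        outputs.append(h)
    return outputs

def predict_proba(layers, w_out, b_out, sequence):
    """Stack the layers (each consumes the previous layer's hidden states) and
    map the top layer's final hidden state to a probability with a sigmoid."""
    for layer in layers:
        sequence = rnn_layer(layer, sequence)
    logit = sum(w * h for w, h in zip(w_out, sequence[-1])) + b_out
    return 1.0 / (1.0 + math.exp(-logit))

def l2_penalty(layers, lam):
    """L2 term lam * sum(w**2) over recurrent weights, added to the training loss."""
    return lam * sum(w * w for w_x, w_h, _ in layers for row in w_x + w_h for w in row)

rng = random.Random(0)
layers = [init_layer(N_FEATURES if i == 0 else N_HIDDEN, N_HIDDEN, rng) for i in range(N_LAYERS)]
w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]

# Example: one patient with four annual checkups of placeholder feature values.
checkups = [[0.1] * N_FEATURES for _ in range(4)]
p = predict_proba(layers, w_out, 0.0, checkups)
```

The paper describes a prediction at every time step. A per-step variant would apply the output layer to each hidden state of the top layer instead of only the last one.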

Figure 4 shows the ROC curve, which is suitable for evaluating prediction models on imbalanced data. The solid blue line shows the diagnostic ability of the binary classifier: it plots the true positive rate against the false positive rate at various threshold settings. The dashed orange line shows the ROC points of a random classifier.

The prediction model was evaluated on a personal computer with an Intel Core i7-7500U CPU @ 2.70 GHz and 32 GB of RAM, running Windows 10. The implementation used Python 3.7.3 with the keras 2.2.4, sklearn 0.24.2, matplotlib 3.0.3, and seaborn 0.9.0 libraries [46].

Of the 18,846 atrophic gastritis patients and 610 gastric cancer patients, we used 70% as training data and 30% as test data. The test set comprised 5650 patients with atrophic gastritis and 187 with gastric cancer (total n = 5837). Table 2 shows the confusion matrix for these 5837 test patients.
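The 70/30 split can be done at the patient level, so that all checkups of one person fall on the same side. The sketch below is a simple random split with a hypothetical function name and seed. The paper does not state whether the split was stratified by outcome.

```python
import random

def patient_level_split(patient_ids, test_fraction=0.3, seed=42):
    """Shuffle unique patient ids and hold out `test_fraction` of them for testing.

    Splitting by patient keeps every checkup of one person in the same subset,
    which avoids leaking a test patient's history into training.
    """
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]   # (train_ids, test_ids)
```

A stratified variant would apply the same function separately to the gastric cancer and atrophic gastritis ids and merge the results. This would keep the roughly 3% minority share identical in both subsets.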

Table 3 presents the precision, recall, and F1-score for the study data. Because the data set was highly imbalanced, all three values were extremely high for atrophic gastritis and lower for gastric cancer.
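Per-class precision, recall, and F1-score, as reported in Table 3, treat each class in turn as the positive class. The sketch below uses 0 for atrophic gastritis and 1 for gastric cancer, following the current-status coding in Section 3.1. The function name is hypothetical.

```python
def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Precision, recall, and F1 for each class, treating that class as positive."""
    report = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    return report
```

The recall of class 1 is the sensitivity in Equation (1), and the recall of class 0 is the specificity in Equation (2).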

5. Explanation of the Prediction Results

The DeepPrevention prediction model correctly identified 94 of the actual gastric cancer patients in the test set. In this section, we interpret these prediction results.

5.1. Analysis of Risk Factors

We calculated Chi-squared statistics to determine which features significantly distinguished patients with gastric cancer from patients with atrophic gastritis. Of the 13 features, we selected the categorical variables: sex, alcohol habit, smoking status, smoking duration, frequency of smoking, family history of cancer, income, and exercise. We also transformed the number of endoscopic screenings and BMI into categorical variables and added them. The number of endoscopic screening tests was coded as yes if it was more than one and as no otherwise. BMI was coded as underweight, normal weight, or overweight. As Table 4 indicates, sex, smoking status, and smoking duration differed significantly between the groups. Contrary to our expectations, regular exercise and the number of endoscopic screenings were not related to gastric cancer incidence.
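The transformations and the test described above can be sketched as follows.

- The BMI cut-offs (18.5 and 25.0, the WHO adult bands) are assumptions, since the paper does not state them.
- The endoscopy rule follows the text literally: more than one screening maps to yes.
- `chi_squared_2x2` computes the Pearson statistic for a binary variable, such as sex, against the predicted group.

For variables with more categories, the same sum applies with (r − 1)(c − 1) degrees of freedom, and `scipy.stats.chi2_contingency` handles the general case. Note that it applies Yates' continuity correction to 2x2 tables by default.

```python
import math

def endoscopy_flag(count: int) -> str:
    """Binary coding described above: 'yes' if more than one screening, else 'no'."""
    return "yes" if count > 1 else "no"

def bmi_category(bmi: float, under: float = 18.5, over: float = 25.0) -> str:
    """Three BMI bands; the cut-offs are assumed WHO adult bands, not taken from the paper."""
    if bmi < under:
        return "underweight"
    return "normal weight" if bmi < over else "overweight"

def chi_squared_2x2(table):
    """Pearson Chi-squared test without continuity correction for a 2x2 table.

    table = [[a, b], [c, d]] holds observed counts (rows: groups, columns: categories).
    With 1 degree of freedom, the upper-tail p-value equals erfc(sqrt(stat / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))
```

A variable enters the set B of Algorithm 2 when its p-value is below 0.05.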

This interpretation has some limitations. First, although H. pylori infection is a well-known risk factor for gastric cancer, we did not consider the patients' infection status. Because the purpose of our study was to identify lifestyle-related risk factors, we considered only demographic and environmental factors. Second, our finding that sex, smoking status, and smoking duration are influential factors was based on only 93 patients: the actual gastric cancer patients whom the model predicted as gastric cancer patients. This sample is too small to generalize the findings. Despite these limitations, our study provides an improved understanding of the risk factors for progression from atrophic gastritis to gastric cancer.

5.2. Analysis of a Group of Patients at High Risk

To identify the group of high-risk patients, we applied K-means clustering [46] to the data set of 93 gastric cancer patients. Using the elbow method [47], the optimal K was 3, and groups 1, 2, and 3 contained 31, 36, and 26 patients, respectively. Figure 5a shows the distribution of each group's probability of progression to gastric cancer. Group 1 showed a markedly high probability of progression to gastric cancer, so we defined group 1 as the high-risk group. Figure 5b shows 3D scatterplots for each group.
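The clustering step and the elbow method can be sketched with Lloyd's K-means algorithm. This is an illustration, not the implementation used in the paper. It uses a deterministic farthest-point initialization instead of the random or k-means++ initialization common in libraries. The elbow is read off the inertia (within-cluster sum of squares) as K increases.

```python
import math

def kmeans(points, k, n_iter=100):
    """Lloyd's K-means. Returns (labels, centers, inertia).

    Initialization is deterministic farthest-point selection (an assumption made
    here for reproducibility): start from the first point, then repeatedly add the
    point farthest from all centers chosen so far.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    # Inertia: within-cluster sum of squared distances, the quantity behind the elbow plot.
    inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, inertia

def elbow_curve(points, k_values):
    """Inertia for each candidate K; the 'elbow' is where the decrease levels off."""
    return {k: kmeans(points, k)[2] for k in k_values}
```

In practice, one plots `elbow_curve` against K and picks the K after which inertia stops dropping sharply. The paper reports K = 3 for its data.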

Figure 6 shows boxplots of age at diagnosis for each group. The average age at diagnosis was 57.0, 69.0, and 45.9 years for groups 1, 2, and 3, respectively. Income, BMI, regular exercise, and the number of endoscopic screenings did not differ significantly between groups.

5.3. Personal Explanation

To give an individual perspective on the prediction model, the explanation module provides a patient-specific analysis. We visualized personal explanations using LIME in the interpretML library [48]. Figure 7a,b present two examples of personal explanations for high-risk patients, in which a local surrogate model explains the impact of the predictors. In both cases, sex, frequency of smoking, and smoking duration affected the prediction of developing gastric cancer. This is consistent with the risk factors identified by the Chi-squared tests in Section 5.1. The important features affecting the prediction differed from patient to patient. In Figure 7a, sex and income were identified as risk factors for progression from atrophic gastritis to gastric cancer, in addition to smoking habits. In Figure 7b, family history of cancer, alcohol habit, and exercise were identified as risk factors.
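The local-surrogate idea behind LIME can be sketched from scratch in four steps:

1. Perturb the patient's feature vector.
2. Query the black-box model on each perturbed sample.
3. Weight the samples by their proximity to the patient.
4. Fit a weighted linear model whose coefficients serve as the explanation.

This is a simplified illustration, not the interpretML or `lime` API. The Gaussian perturbation, the kernel, and all parameter values are assumptions. The real library also discretizes tabular features and scales perturbations using training-data statistics.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def local_surrogate(predict, instance, n_samples=500, noise=0.5, kernel_width=0.75, seed=0):
    """LIME-style explanation: returns (intercept, per-feature local effects).

    1. Perturb the instance with Gaussian noise (assumed sampling scheme).
    2. Query the black-box `predict` on every perturbed sample.
    3. Weight samples with an exponential kernel on their distance to the instance.
    4. Fit weighted least squares via the normal equations (X^T W X) beta = X^T W y.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [x + rng.gauss(0.0, noise) for x in instance]
        rows.append([1.0] + z)                  # leading 1.0 models the intercept
        targets.append(predict(z))
        weights.append(math.exp(-math.dist(z, instance) ** 2 / kernel_width ** 2))
    p = len(instance) + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets)) for i in range(p)]
    beta = solve(xtwx, xtwy)
    return beta[0], beta[1:]
```

For a nonlinear model, such as a deep RNN, the coefficients describe only the local slope around one patient. This is why the important features can differ from patient to patient, as in Figure 7a,b.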

We generated personal explanations for the 59 patients whose predicted probability of progression to gastric cancer was 0.97, the highest observed probability. The most frequent risk factor was age at diagnosis, which was identified as an important predictor in 36 patients. The next most frequent was smoking duration, found in 20 patients. These results suggest that individual-level analysis and treatment are important for preventing the development of gastric cancer.

6. Discussion and Conclusions

6.1. Discussion

In recent years, AI-based diagnosis and prognosis prediction have emerged in the field of gastric cancer [49]. DeepPrevention was developed to predict progression from atrophic gastritis to gastric cancer using medical checkup data. By contrast, Jiang et al. predicted gastric cancer survival using SVMs [50]. Deep neural networks have also been applied to predict early recurrence in advanced gastric cancer [51] and to diagnose metastatic lymph nodes from gastric cancer on computed tomography (CT) [52]. Table 5 summarizes AI-based prediction and diagnosis studies in the gastric cancer field. DeepPrevention achieved a higher AUC than two of these studies [50,51]. Gao et al. achieved a high AUC of 0.9541 because they used CT images rather than electronic health records.

Other AI-based applications in the gastric cancer field focus on prognosis prediction and diagnosis, whereas our study focused on preventing gastric cancer. That is, we identified a group of high-risk patients and the risk factors among atrophic gastritis patients. These results could help prevent gastric cancer in atrophic gastritis patients. Unlike other research that used EHRs, we used medical checkup data, because our main consideration was identifying lifestyle-related risk factors in atrophic gastritis patients. Smoking status and smoking duration were identified as important lifestyle factors influencing progression to gastric cancer.

Although we attempted to achieve higher sensitivity, the extremely imbalanced data limited sensitivity to 50%. We applied SMOTE, a well-known over-sampling method, to our data set, but this caused overfitting. We concluded that algorithm-level methods are more effective for extremely imbalanced data with highly complex class boundaries. We therefore used a deep stack of hidden layers to capture the characteristics of the minority class, with dropout and L2 regularization to avoid overfitting. To improve prediction performance, we plan to adopt a stacked ensemble method [53] combining SVMs, random forests, logistic regression, and deep learning.

6.2. Conclusions

In this study, we predicted which patients with atrophic gastritis were at high risk of developing gastric cancer and analyzed some of their characteristics. For this purpose, we developed DeepPrevention, which is composed of a prediction module and an explanation module. The prediction module is based on a deep recurrent neural network with eight hidden layers, regularized with dropout and L2 regularization. The prediction model achieved an AUC of 0.84, a sensitivity of 0.50, and a specificity of 0.98. The explanation module used Chi-squared tests to identify the features that significantly distinguish atrophic gastritis patients from those who developed gastric cancer. To identify a group of patients at high risk, K-means clustering was applied to the patients predicted to develop gastric cancer. Finally, LIME was applied to individual patients to give personal explanations of the predictions.

Explainable AI has attracted much attention recently [33,54,55]. In medical applications in particular, explainability is essential for both doctors and patients to understand the prediction results. In this study, we provided an explanation module that explains predictions at the population, group, and individual levels. At the population level, sex, smoking status, and smoking duration were identified as influential factors. At the group level, age at diagnosis distinguished the high-risk group, whose members were diagnosed with gastric cancer at an average age of 57 years. Income, BMI, regular exercise, and the number of endoscopic screenings did not differ significantly between groups. Finally, at the individual level, two of the analyzed lifestyle habits were influential in the progression from atrophic gastritis to gastric cancer: current smoking status and smoking duration.

Real-world medical applications often face imbalanced data such as that encountered in this study. For the extremely imbalanced data in this study, over-sampling caused overfitting. Furthermore, we believe it is important to build a model that preserves the real-world incidence of gastric cancer. We therefore proposed a deep recurrent neural network with eight hidden layers to capture the features of the minority class; the resulting model achieved a sensitivity of 0.5 and a specificity of 0.98. We are currently attempting to improve its sensitivity by combining other data sources to add patients and features, including related chronic diseases. In addition, we are developing a stacking ensemble using SVMs and random forests. In future work, we plan to use H. pylori infection information and consider genomic analysis.

Author Contributions

Conceptualization, H.H.K. and Y.S.L.; methodology, H.H.K. and Y.S.L.; software, K.J.L. and J.Y.K.; validation, S.-I.S., W.G.S. and H.H.K.; formal analysis, H.H.K.; resources, W.G.S.; data curation, S.-I.S.; writing—original draft preparation, H.H.K.; writing—review and editing, Y.S.L.; supervision, S.-I.S.; project administration, W.G.S.; funding acquisition, W.G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Health & Welfare, Republic of Korea, and a grant from the Korea Health Technology R & D Project through the Korea Health Industry Development Institute (KHIDI) (grant number HI19C0143).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are openly available at github https://github.com/lim1014/DeepPrevention (accessed on 30 June 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394–424.
2. Yoon, H.; Kim, N. Diagnosis and management of high-risk group for gastric cancer. Gut Liver 2015, 9, 5–17.
3. Sun, Y.; Lee, J.; Woo, H.; Shin, D.; Kong, S.; Lee, H.; Shin, A.; Yang, H. National cancer screening program for gastric cancer in Korea: Nationwide treatment benefit and cost. Cancer 2020, 126, 1929–1939.
4. Hamashima, C.; Shabana, M.; Okada, K.; Okamoto, M.; Osaki, Y. Mortality reduction from gastric cancer by endoscopic and radiographic screening. Cancer Sci. 2015, 106, 1744–1749.
5. Hamashima, C. Benefits and harms of endoscopic screening for gastric cancer. World J. Gastroenterol. 2016, 22, 6385–6392.
6. Taninaga, J.; Nishiyama, Y.; Fujibayashi, K.; Gunji, T.; Sasabe, N.; Iijima, K.; Naito, T. Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical checkup data: A case-control study. Sci. Rep. 2019, 9, 12384.
7. Karimi, P.; Islami, F.; Anandasabapathy, S.; Freedman, N.D.; Kamangar, F. Gastric cancer: Descriptive epidemiology, risk factors, screening, and prevention. Cancer Epidemiol. Prev. Biomark. 2014, 23, 700–713.
8. Kim, G.H.; Liang, P.S.; Bang, S.J.; Hwang, J.H. Screening and surveillance for gastric cancer in the United States: Is it needed? Gastrointest. Endosc. 2016, 84, 18–28.
9. Kumar, S.; Metz, D.C.; Ellenberg, S.; Kaplan, D.E.; Goldberg, D.S. Risk factors and incidence of gastric cancer after detection of Helicobacter pylori infection: A large cohort study. Gastroenterology 2020, 158, 527–536.
10. Cheung, D.Y. Atrophic gastritis increases the risk of gastric cancer in asymptomatic population in Korea. Gut Liver 2017, 11, 575–576.
11. Esteva, A.; Robicquet, A.; Ramsundar, B.; Kuleshov, V.; DePristo, M.; Chou, K.; Cui, C.; Corrado, G.; Thrun, S.; Dean, J. A guide to deep learning in healthcare. Nat. Med. 2019, 25, 24–29.
12. Ravi, D.; Wong, C.; Deligianni, F.; Berthelot, M.; Andreu-Perez, J.; Lo, B.; Yang, G. Deep learning for health informatics. IEEE J. Biomed. Health Inform. 2017, 21, 4–21.
13. Miotto, R.; Wang, F.; Wang, S.; Jiang, X.; Dudley, J.T. Deep learning for healthcare: Review, opportunities and challenges. Brief. Bioinform. 2018, 19, 1–11.
14. Xiao, C.; Choi, E.; Sun, J. Opportunities and challenges in developing deep learning models using electronic health records data: A systematic review. J. Am. Med. Inform. Assoc. 2018, 25, 1419–1428.
15. Shickel, B.; Tighe, P.J.; Bihorac, A.; Rashidi, P. Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health Inform. 2018, 22, 1589–1604.
16. Pham, T.; Tran, T.; Phung, D.; Venkatesh, S. Predicting healthcare trajectories from medical records: A deep learning approach. J. Biomed. Inform. 2017, 69, 218–229.
17. Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; Elhadad, N. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 1721–1729.
18. Miotto, R.; Li, L.; Kidd, B.A.; Dudley, J.T. Deep Patient: An unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep. 2016, 6, 26094.
19. Choi, E.; Bahadori, M.T.; Schuetz, A.; Stewart, W.F.; Sun, J. Doctor AI: Predicting clinical events via recurrent neural networks. JMLR Workshop Conf. Proc. 2016, 56, 301–318.
20. Gunning, D.; Stefik, M.; Choi, J.; Miller, T.; Stumpf, S.; Yang, G. XAI-Explainable artificial intelligence. Sci. Robot. 2019, 4, 1–2.
21. Du, M.; Liu, N.; Hu, X. Techniques for interpretable machine learning. Commun. ACM 2020, 63, 68–77.
22. Choi, E.; Bahadori, M.T.; Kulas, J.A.; Schuetz, A.; Stewart, W.F.; Sun, J. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. In Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 1–9.
23. Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means algorithm: A comprehensive survey and performance evaluation. Electronics 2020, 9, 1295.
24. Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
25. You, S.C.; Lee, S.; Cho, S.; Park, H.; Jung, S.; Cho, J.; Yoon, D.; Park, R.W. Conversion of National Health Insurance Service-National Sample Cohort (NHIS-NSC) database into Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM). Stud. Health Technol. Inform. 2017, 245, 467–470.
26. Goldstein, B.A.; Navar, A.M.; Pencina, M.J.; Ioannidis, J.P.A. Opportunities and challenges in developing risk prediction models with electronic health records data: A systematic review. J. Am. Med. Inform. Assoc. 2017, 24, 198–208.
27. Weng, S.F.; Reps, J.; Kai, J.; Garibaldi, J.M.; Qureshi, N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS ONE 2017, 12, e0174944.
28. Solares, J.R.A.; Raimondi, F.E.D.; Zhu, Y.; Rahimian, F.; Canoy, D.; Tran, J.; Gomes, A.C.P.; Payberah, A.H.; Zottoli, M.; Nazarzadeh, M.; et al. Deep learning for electronic health records: A comparative review of multiple deep neural architectures. J. Biomed. Inform. 2020, 101, 103337.
29. Nguyen, P.; Tran, T.; Wickramasinghe, N.; Venkatesh, S. Deepr: A convolutional net for medical records. IEEE J. Biomed. Health Inform. 2017, 21, 22–30.
30. Connor, J.T.; Martin, R.D.; Atlas, L.E. Recurrent neural networks and robust time series prediction. IEEE Trans. Neural Netw. 1994, 5, 240–254.
31. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
32. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
33. Lauritsen, S.M.; Kristensen, M.; Olsen, M.V.; Larsen, M.S.; Lauritsen, K.M.; Jorgensen, M.J.; Lange, J.; Thiesson, B. Explainable artificial intelligence model to predict acute critical illness from electronic health records. Nat. Commun. 2020, 11, 3852.
34. van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
35. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
36. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the IEEE International Joint Conference on Neural Networks, Hong Kong, China, 1–8 June 2008.
37. Chawla, N.V.; Japkowicz, N.; Kotcz, A. Editorial: Special issue on learning from imbalanced data sets. SIGKDD Explor. Newsl. 2004, 6, 1–6.
38. Johnson, J.M.; Khoshgoftaar, T.M. Survey on deep learning with class imbalance. J. Big Data 2019, 6, 27.
39. Ding, W.; Huang, D.; Chen, Z.; Yu, X.; Lin, W. Facial action recognition using very deep networks for highly imbalanced class distribution. In Proceedings of the APSIPA ASC, Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 1368–1372.
40. Pascanu, R.; Gulcehre, C.; Cho, K.; Bengio, Y. How to construct deep recurrent neural networks. In Proceedings of the Second International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014.
41. Miller, R.; Siegmund, D. Maximally selected chi-squared statistics. Biometrics 1982, 38, 1101–1106.
42. Sinaga, K.P.; Yang, M. Unsupervised k-means clustering algorithm. IEEE Access 2020, 8, 80716–80727.
43. Yerushalmy, J. Statistical problems in assessing methods of medical diagnosis with special reference to x-ray techniques. Public Health Rep. 1947, 62, 1432–1439.
44. Altman, D.G.; Bland, J.M. Diagnostic tests: Sensitivity and specificity. BMJ 1994, 308, 1552.
45. Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2011, 2, 37–63.
46. Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc. 1979, 28, 100–108.
47. Ketchen, D.J.; Shook, C.L. The application of cluster analysis in strategic management research: An analysis and critique. Strateg. Manag. J. 1996, 17, 441–458.
48. InterpretML. Interpret-Text-Alpha Release. Available online: https://github.com/interpretml/interpret (accessed on 30 June 2021).
49. Niu, P.H.; Zhao, L.L.; Wu, H.L.; Zhao, D.B.; Chen, Y.T. Artificial intelligence in gastric cancer: Application and future perspectives. World J. Gastroenterol. 2020, 26, 5408–5419.
50. Jiang, Y.; Xie, J.; Han, Z.; Liu, W.; Xi, S.; Huang, L.; Huang, W.; Lin, T.; Zhao, L.; Hu, Y.; et al. Immunomarker support vector machine classifier for prediction of gastric cancer survival and adjuvant chemotherapeutic benefit. Clin. Cancer Res. 2018, 24, 5574–5584.
51. Zhang, W.; Fang, M.; Dong, D.; Wang, X.; Ke, X.; Zhang, L.; Hu, C.; Guo, L.; Guan, X.; Zhou, J.; et al. Development and validation of a CT-based radiomic nomogram for preoperative prediction of early recurrence in advanced gastric cancer. Radiother. Oncol. 2020, 145, 13–20.
52. Gao, Y.; Zhang, Z.D.; Li, S.; Guo, Y.T.; Wu, Q.Y.; Liu, S.H.; Yang, S.J.; Ding, L.; Zhao, B.C.; Li, S.; et al. Deep neural network-assisted computed tomography diagnosis of metastatic lymph nodes from gastric cancer. Chin. Med. J. 2019, 132, 2804–2811.
53. Pari, R.; Sandhya, M.; Sanker, S. A multi-tier stacked ensemble algorithm for improving classification accuracy. Comput. Sci. Eng. 2020, 22, 74–85.
54. Gong, K.; Lee, H.K.; Yu, K.; Xie, X.; Li, J. A prediction and interpretation framework of acute kidney injury in critical care. J. Biomed. Inform. 2021, 113, 103653.
55. Lundberg, S.M.; Nair, B.; Vavilala, M.S.; Horibe, M.; Eisses, M.J.; Adams, T.; Liston, D.E.; Low, D.K.W.; Newman, S.F.; Kim, J.; et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2018, 2, 749–760.

Figure 1. t-SNE visualization.

Figure 2. DeepPrevention architecture.

Figure 3. A deep recurrent neural network architecture.

Figure 4. ROC-AUC graph.

Figure 5. (a) Distribution of the probability of the three groups; (b) 3D scatterplot.

Figure 6. Age at diagnosis of each group.

Figure 7. (a) An example of a personal explanation; (b) another example of a personal explanation.

Table 1. Performance evaluation by number of hidden layers.

Number of Hidden Layers	Sensitivity	Specificity	AUC
4	0.33	1.00	0.78
5	0.40	1.00	0.79
6	0.42	1.00	0.82
7	0.47	1.00	0.83
8	0.50	0.98	0.84

Table 2. Test data confusion matrix.

	Predicted Gastric Cancer	Predicted Atrophic Gastritis
Actual gastric cancer	93	94
Actual atrophic gastritis	25	5625

Table 3. Test data precision, recall, and F1-score.

	Precision	Recall	F1-score	Support
Atrophic gastritis	0.98	1.00	0.99	5650
Gastric cancer	0.79	0.50	0.61	187
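The per-class figures in Table 3 follow directly from the counts in Table 2 once gastric cancer is treated as the positive class (TP = 93, FN = 94, FP = 25, TN = 5625). The short sketch below recomputes them. `binary_metrics` is a hypothetical helper name; the formulas are the standard confusion-matrix definitions. On these counts, sensitivity ≈ 0.497, gastric cancer precision ≈ 0.788, and F1 ≈ 0.610 round to the 0.50, 0.79, and 0.61 in Table 3. TN/(TN + FP) ≈ 0.996 corresponds to the 1.00 atrophic gastritis recall, while the 0.98 atrophic gastritis precision corresponds to the negative predictive value TN/(TN + FN) ≈ 0.984.

```python
from typing import Dict


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard confusion-matrix metrics, treating gastric cancer as the positive class."""
    sensitivity = tp / (tp + fn)   # recall of gastric cancer
    specificity = tn / (tn + fp)   # recall of atrophic gastritis
    precision = tp / (tp + fp)     # precision (PPV) of gastric cancer
    npv = tn / (tn + fn)           # precision of atrophic gastritis
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": f1,
    }


# Counts taken from Table 2 (test data).
metrics = binary_metrics(tp=93, fn=94, fp=25, tn=5625)
for name, value in metrics.items():
    print(f"{name:12s} {value:.4f}")
```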

Table 4. Chi-squared analysis of demographic and lifestyle data.

Feature	Chi-Squared	p-Value
Sex	12.9966	0.0003
Alcohol habit	13.1282	0.0690
Smoking status	18.5011	0.0001
Smoking duration	20.9237	0.0003
Frequency of smoking	2.1212	0.5476
Family history of cancer	0.9882	0.3202
Income	0.4009	0.8184
Exercise	0.0	1.0
Number of endoscopic screenings	2.0452	0.5631
BMI	122.9448	0.4844
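Table 4 reports chi-squared statistics and p-values but not the degrees of freedom (df) of each test. A chi-squared test's p-value is the upper-tail probability of the chi-squared distribution with df = (rows − 1)(columns − 1) of the contingency table, so the same statistic can be highly significant or not depending on df. The sketch below computes that tail probability with the standard library only, using the recurrence Q(x, ν + 2) = Q(x, ν) + (x/2)^(ν/2) e^(−x/2) / Γ(ν/2 + 1). The df values in `ROWS` are assumptions, not values reported by the authors; they were chosen because they reproduce the tabulated p-values (df = 1 for sex, for instance, is what a 2 × 2 table would give, and any df gives p = 1 for the exercise statistic of 0). Alcohol habit and BMI are omitted because their df cannot be pinned down this way with confidence; BMI's statistic of 122.9448 with p = 0.4844 implies a test with on the order of 120 degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Base case: df = 1 uses erfc; df = 2 is a plain exponential.
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(half)), 1
    else:
        q, nu = math.exp(-half), 2
    # Step up two degrees of freedom at a time until nu == df.
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


# (feature, statistic from Table 4, ASSUMED df not reported in Table 4, reported p-value)
ROWS = [
    ("Sex", 12.9966, 1, 0.0003),
    ("Smoking status", 18.5011, 2, 0.0001),
    ("Smoking duration", 20.9237, 4, 0.0003),
    ("Frequency of smoking", 2.1212, 3, 0.5476),
    ("Family history of cancer", 0.9882, 1, 0.3202),
    ("Income", 0.4009, 2, 0.8184),
    ("Exercise", 0.0, 1, 1.0),
    ("Number of endoscopic screenings", 2.0452, 3, 0.5631),
]

for name, stat, df, reported in ROWS:
    p = chi2_sf(stat, df)
    print(f"{name:32s} chi2={stat:8.4f} df={df} p={p:.4f} (reported {reported})")
```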

Table 5. AI-based prediction and diagnosis in the gastric cancer field.

Ref.	Study Population	Number of Cases	Methods	Results
DeepPrevention	NHIS	29,557 cases	Deep RNN	AUC 0.8388
Jiang et al., 2018 [50]	Hospital	786 cases	SVM classifier	AUC 0.834
Zhang et al., 2020 [51]	Hospital	669 cases	Deep NN	AUC 0.831
Gao et al., 2019 [52]	Hospital	32,495 cases	FR-CNN	AUC 0.9541

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kim, H.H.; Lim, Y.S.; Seo, S.-I.; Lee, K.J.; Kim, J.Y.; Shin, W.G. A Deep Recurrent Neural Network-Based Explainable Prediction Model for Progression from Atrophic Gastritis to Gastric Cancer. Appl. Sci. 2021, 11, 6194. https://doi.org/10.3390/app11136194

AMA Style

Kim HH, Lim YS, Seo S-I, Lee KJ, Kim JY, Shin WG. A Deep Recurrent Neural Network-Based Explainable Prediction Model for Progression from Atrophic Gastritis to Gastric Cancer. Applied Sciences. 2021; 11(13):6194. https://doi.org/10.3390/app11136194

Chicago/Turabian Style

Kim, Hyon Hee, Young Seo Lim, Seung-In Seo, Kyung Joo Lee, Jae Young Kim, and Woon Geon Shin. 2021. "A Deep Recurrent Neural Network-Based Explainable Prediction Model for Progression from Atrophic Gastritis to Gastric Cancer" Applied Sciences 11, no. 13: 6194. https://doi.org/10.3390/app11136194

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
