Mathematical modeling for the prediction of cerebral white matter lesions based on clinical examination data

Cerebral white matter lesions are ischemic changes caused mainly by microangiopathy; they are diagnosed by MRI because they appear as abnormalities in MRI images. Because patients with white matter lesions do not have any symptoms, the lesions are often detected for the first time by MRI. In Japan, head MRI for the diagnosis and grading of cerebral white matter lesions is generally performed as an option during medical checkups. In this study, we develop a mathematical model for the prediction of white matter lesions using data from routine medical evaluations that do not include a head MRI. Linear discriminant analysis, logistic discrimination, Naive Bayes classifier, support vector machine, and random forest were investigated and evaluated by ten-fold cross-validation, using clinical data for 1,904 examinees (988 males and 916 females) from medical checkups that did include the head MRI. The logistic regression model was selected based on a comparison of accuracy and interpretability. The model variables consisted of age, gender, plaque score (PS), LDL, systolic blood pressure (SBP), and administration of antihypertensive medication (odds ratios: 2.99, 1.57, 1.18, 1.06, 1.12, and 1.52, respectively). The model showed an Area Under the ROC Curve (AUC) of 0.805, with a sensitivity of 72.0% and a specificity of 75.1% at the most appropriate cutoff value, 0.579, as given by the Youden Index. This model has been shown to be useful for identifying patients at high risk of cerebral white matter lesions, who can then be diagnosed with a head MRI examination in order to prevent dementia, cerebral infarction, and stroke.


Introduction
In Japan, it is generally recognized that the increase in national health expenditure that accompanies aging is a serious social problem [1]. Although the number of deaths by stroke has decreased drastically due to preventive treatment, stroke still ranks at the top of the health care expenditures in Japan [2]. It is estimated that 2 million people are currently bedridden, and that this number will increase to 3 million by 2025 [3]. Therefore, the early detection and prevention of cerebrovascular diseases are important for the reduction of health care expenditures [4].
[...] 1,044 subjects were diagnosed from their MRI results as having cerebral white matter lesions. In the head MRI, T1 weighted images (T1WI), T2 weighted images (T2WI), and Fluid Attenuated Inversion Recovery (FLAIR) images were obtained by using the MRI scanners MAGNETOM Symphony (Siemens Healthineers Japan, Tokyo, Japan) and MAGNETOM ESSENZA (Siemens Healthineers Japan, Tokyo, Japan).

Ethical consideration
This study was conducted based on the approval of the ethical review committee of Shin Takeo Hospital. For the protection of patient privacy, the patient data was collected with unlinkable anonymization by a third party, and was saved in a password-protected storage medium for research use only. The clinical data used in this study will be available upon request from readers to the corresponding author based on the data usage agreement and considering the patients' right of privacy.

General inspection and blood and biochemical tests
In the general inspection, six characteristics were recorded for the study: age, gender, systolic blood pressure (SBP), diastolic blood pressure (DBP), body mass index (BMI), and presence of visceral steatosis (for the determination of metabolic syndrome) [12]. The blood and biochemical tests were conducted with the laboratory test systems C8000 (Canon Medical Systems Corporation, Tochigi, Japan) and Acute (Canon Medical Systems Corporation, Tochigi, Japan), respectively. HbA1c was determined with the glycohemoglobin analyzer HA8181 (Arkray Inc., Kyoto, Japan). In the blood and biochemical tests, which were conducted as a part of the comprehensive medical checkup, LDL cholesterol (LDL), HDL cholesterol (HDL), LH ratio (LDL divided by HDL), triglyceride (TG), hemoglobin A1c (HbA1c), and blood glucose level (BS) were recorded.

Ultrasonic testing
The ultrasonic testing was conducted with the ultrasonic diagnostic equipment LOGIQ S7 Expert (GE Healthcare Japan, Tokyo, Japan) and Aplio 400 (Canon Medical Systems, Tochigi, Japan). Two characteristics were recorded: carotid plaque score (PS) [13] and plaque number (n-plaque). The PS was calculated as follows. The carotid artery was divided into four 15 mm long sections: the central side of the common carotid artery (CCA), the peripheral side of the CCA, the bifurcation of the CCA, and the central side of the internal carotid artery. Then, the sum of the maximum values of intima-media thickness exceeding 1.1 mm was calculated.
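The plaque score calculation described above can be sketched as follows. This is a minimal illustration in Python (the study itself used R), under one reading of the description: each section contributes its maximum intima-media thickness only when that value exceeds 1.1 mm. The function name `plaque_score`, the input layout, and the example thicknesses are hypothetical.

```python
from typing import Sequence

IMT_THRESHOLD_MM = 1.1  # thickness a section must exceed to contribute, per the text


def plaque_score(max_imt_mm: Sequence[float],
                 threshold: float = IMT_THRESHOLD_MM) -> float:
    """Sum the per-section maximum intima-media thickness values above the threshold."""
    return sum(t for t in max_imt_mm if t > threshold)


# Hypothetical maxima (mm) for the four 15 mm sections of one carotid artery
sections = [0.8, 1.4, 2.0, 1.1]
print(plaque_score(sections))  # only the 1.4 mm and 2.0 mm sections contribute
```

The function accepts any number of sections, so the same sketch applies if values from both carotid arteries are pooled; the text does not say whether they were.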

Questionnaire in the specific health examination
In the questionnaire in the specific health examination (shown in S1 Doc), six questions were answered by examinees when receiving a comprehensive medical checkup, regarding their experience with medications to reduce blood pressure, medications to reduce blood sugar or insulin injection, medications to decrease the level of cholesterol or of neutral fat, as well as drinking habits (everyday, sometimes, or rarely drink (cannot drink)), drinking volume (less than 180 ml, 180-360 ml, 360-540 ml or more than 540 ml), and smoking habits.
Additional data to replicate all of the figures, graphs, tables, statistics, and other values in this study is available at doi:10.5061/dryad.007467q as Data files: WM_data.

Assessment of each clinical examination data and questionnaire
R version 3.4.4 and suitable packages were used to perform all statistical analyses and statistical modeling in this study [14,15]. The relationships between each examination item or each answer of the specific medical examination questionnaire and the probability of the presence of cerebral white matter lesions were investigated. Student's t tests [16] were performed for continuous variables and Fisher's exact tests [17] were performed for categorical variables.
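For readers who want to see what these two tests compute, the following is a minimal standard-library Python sketch (the study used R, so this is not the authors' code). It gives the pooled-variance Student's t statistic for a continuous variable and the two-sided Fisher's exact p-value for a 2x2 table. The function names and the example numbers are hypothetical; the p-value of the t test is omitted because it requires the t distribution.

```python
from math import comb, sqrt
from statistics import mean, variance
from typing import Sequence


def student_t_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled (equal) variance."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / n1 + 1 / n2))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more probable than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical data: a continuous item in two groups, and a yes/no item by group
print(student_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
print(fisher_exact_two_sided(8, 2, 1, 5))
```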

Comparison among various models based on different algorithms
Four kinds of models, including linear, nonlinear, and stochastic models, were created, and their accuracy was compared. The variables were normalized to adjust the factor levels. For the linear modeling, logistic regression (LogReg) [18] was employed, and for the nonlinear modeling, support vector machine (SVM) [19] and random forest (RF) [20] models were constructed. Furthermore, a Naive Bayes classifier (NB) [21] was constructed as a stochastic model. The probability of the presence or absence of white matter lesions was used as the dependent variable. In each model, the cutoff value given by the Youden Index [22,23] was used as the most appropriate cutoff value to compare the performance among models.
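The cutoff selection by the Youden Index can be illustrated with a short Python sketch. The index is J = sensitivity + specificity - 1, and the chosen cutoff is the one that maximizes J over the predicted probabilities. The function name `youden_cutoff`, the convention that a score at or above the cutoff counts as positive, and the example scores are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A subject is predicted positive when score >= cutoff; labels are 0 or 1.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = float("nan"), -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical predicted probabilities and true labels (1 = lesions present)
scores = [0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1]
print(youden_cutoff(scores, labels))
```

Scanning only the observed scores is sufficient, because sensitivity and specificity change only at those values.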
The discrete (categorical) variables in this study were treated as follows. For the binary variables of the models, 0 or 1 was assigned for absence or presence, respectively, e.g., sex: X = 0 for male and X = 1 for female; medication to reduce blood pressure: X = 0 for "No" and X = 1 for "Yes"; medication to reduce blood sugar or insulin injection: X = 0 for "No" and X = 1 for "Yes"; medication to decrease the level of cholesterol or of neutral fat: X = 0 for "No" and X = 1 for "Yes".
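The coding scheme can be sketched in Python as follows. The binary coding follows the paragraph above; the two-indicator coding for three-level items follows the definition given with the final model for drinking habits (0,0 for rarely; 1,0 for sometimes; 1,1 for every day). The function names, dictionary keys, and field names are hypothetical.

```python
from typing import Dict, Tuple


def encode_binary(answer: str, positive: str = "Yes") -> int:
    """Return 1 for the 'presence' answer and 0 otherwise."""
    return 1 if answer == positive else 0


# Three-level items use two cumulative indicators (hypothetical key names)
DRINKING_CODES: Dict[str, Tuple[int, int]] = {
    "rarely": (0, 0),
    "sometimes": (1, 0),
    "everyday": (1, 1),
}


def encode_examinee(sex: str, bp_medication: str, drinking: str) -> Dict[str, int]:
    """Encode a few questionnaire items of one examinee as model inputs."""
    first, second = DRINKING_CODES[drinking]
    return {
        "sex": encode_binary(sex, positive="female"),  # 0 = male, 1 = female
        "bp_medication": encode_binary(bp_medication),  # 0 = "No", 1 = "Yes"
        "drinking_1": first,
        "drinking_2": second,
    }


print(encode_examinee("female", "No", "sometimes"))
```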

Variable selection
In the variable selection, graphical modeling [26] was performed with the R packages "corpcor" (Version 1.6.9) [28] and "qgraph" (Version 1.5) [29] so that, in consideration of multicollinearity [27], duplicate variables were not selected from the same cluster of strongly correlated variables. The strength of correlation was calculated with Pearson's product-moment correlation coefficient. To compare model performance, 10-fold cross-validation [30] was performed for each model.
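As a simplified illustration of the idea behind this step, the Python sketch below computes Pearson's correlation coefficient for each pair of variables and lists the pairs that are strongly correlated, from which only one representative would be kept. It is not the graphical modeling performed with "corpcor" and "qgraph"; the threshold of 0.7, the function names, and the example values are assumptions.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strongly_correlated_pairs(data: Dict[str, Sequence[float]],
                              threshold: float = 0.7) -> List[Tuple[str, str, float]]:
    """List variable pairs whose absolute correlation reaches the threshold."""
    pairs = []
    for a, b in combinations(data, 2):
        r = pearson_r(data[a], data[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical measurements for five examinees
data = {
    "SBP": [110, 120, 130, 140, 150],
    "DBP": [70, 75, 82, 88, 95],
    "LDL": [130, 100, 150, 110, 120],
}
for a, b, r in strongly_correlated_pairs(data):
    print(f"{a} and {b} are strongly correlated (r = {r:.2f}); keep only one")
```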
The procedure of the 10-fold cross-validation was as follows. First, the dataset was divided into ten fractions, and the first 1/10 fraction was used as a holdout set. Model training was performed with the remaining 9/10 fractions, and then the 1/10 holdout set was used to evaluate the trained model, and the values of the evaluation indices were recorded. After that, the first 1/10 holdout set was returned to the original data and the next 1/10 holdout set was taken out. This training and testing of the model was repeated until each of the fractions had been used as a holdout set.
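The fold rotation described above can be sketched in Python as follows. The shuffling step, the seed, and the function name `k_fold_indices` are assumptions made for illustration; the text does not state how examinees were assigned to fractions.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, holdout indices) for each of the k rotations."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # assumed: random assignment to fractions
    folds = [indices[i::k] for i in range(k)]  # k fractions of near-equal size
    for i in range(k):
        holdout = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, holdout


# With the 1,904 examinees of this study, each holdout has 190 or 191 subjects
for train, holdout in k_fold_indices(1904, k=10):
    print(len(train), len(holdout))
```

In each rotation a model would be fitted on `train`, evaluated on `holdout`, and the indices averaged over the ten rotations.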

Evaluation of discrimination performance and model selection
Accuracy, error rate, sensitivity [31], specificity [31], positive predictive value (PPV) [32], negative predictive value (NPV) [32], and Area Under the ROC Curve (AUC) [33] were used as evaluation indices. The accuracy was calculated as (number of true positives + number of true negatives)/total population. The error rate was calculated as (number of false positives + number of false negatives)/total population. The true positive rate (TPR, also called sensitivity) was calculated as number of true positives/number of positives. The true negative rate (TNR, also called specificity) was calculated as number of true negatives/number of negatives. PPV was calculated as number of true positives/number of predicted positives, and NPV was calculated as number of true negatives/number of predicted negatives. The ROC curves were plotted using the R packages "plotROC" (Version 2.2.1) [34] and "ggplot2" (Version 3.0.0) [35].
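The definitions above translate directly into code. The following Python sketch computes the six count-based indices from a confusion matrix; the function name and the example counts are hypothetical and are not taken from the study's tables.

```python
from typing import Dict


def evaluation_indices(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the count-based evaluation indices from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical holdout set: 100 subjects with lesions, 100 without
for name, value in evaluation_indices(tp=80, fp=40, tn=60, fn=20).items():
    print(f"{name}: {value:.3f}")
```

Accuracy and error rate always sum to one, so they carry the same information.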
The average values of the 10-fold cross-validations of each index were compared. Finally, based on the accuracy comparison among the four models, the model showing good accuracy in addition to good clinical interpretation was selected and established as the discriminant model for identifying the patients with cerebral white matter lesions.

Assessment of each clinical examination data and questionnaire
Using the head MRI, which is the gold standard for detection of white matter lesions, 1,044 out of 1,904 subjects (54.8%) were diagnosed with white matter lesions. As shown in Fig 1, the associations between the presence or absence of white matter lesions and other results from the clinical investigations, i.e., the general inspection, blood and biochemical tests, ultrasonic testing, and specific health examination questionnaire, were investigated. Table 1 shows the assessment of each clinical examination item and questionnaire answer. In the general inspection, significant differences in age, gender, SBP, DBP, and the presence of metabolic syndrome were found between groups. In the blood tests, significant differences were seen in HDL, HbA1c, and BS. In the ultrasonic tests, significant differences were found in PS and n-plaque. In the answers to the specific health examination questionnaire, significant differences were seen in the patients' experience with medications to reduce blood pressure and those to reduce blood sugar or insulin injection, as well as drinking habits, drinking volume, and smoking habits.

Variable selection for multivariate analysis
Graphical modeling was performed to avoid the multicollinearity problem that can result from the interactions among explanatory variables when creating a mathematical model. Fig 2 shows the results of the graphical modeling for variable selection. Four clusters of strongly correlated variables were identified. The first cluster contained PS and n-plaque; PS was selected from this cluster because it could capture the effects of both plaque number and plaque thickness [36]. The second cluster contained SBP and DBP; SBP, which is generally considered to be a risk factor of atherosclerosis [37], was selected from this cluster. The third cluster contained HbA1c and BS; HbA1c, which is thought to reflect the blood sugar level for the most recent several months, was selected from this cluster. The fourth cluster contained LDL, HDL, and LH ratio; LDL, also called "bad" cholesterol, which is regarded as a risk factor of arteriosclerosis [38], was selected from this cluster. From the specific health examination questionnaire, experience with medications to reduce blood pressure, experience with medications to reduce blood sugar or insulin injection, experience with medications to decrease the level of cholesterol or of neutral fat, and drinking habits [39] were selected.
Table 2 shows the results of the performance comparison among models by 10-fold cross-validation. The Youden Index [22,23] was used as the most appropriate cutoff value to evaluate the performance of each model. NB showed the highest average accuracy of 72.0% in the holdout datasets; RF showed the highest average sensitivity of 83.1%; in specificity and PPV, LogReg showed the highest averages, 79.4% and 79.1%, respectively; in NPV, NB showed the highest average of 69.8%; and in AUC, LogReg showed the highest average of 0.799.
Fig 3 shows the ROC curves of each model on the same test data set; the curves of all models almost overlap.

Creation of a clinical predictive model
Finally, the LogReg model was selected as a clinical predictive model because it showed the highest score in three of the six indices (specificity, PPV and AUC) by 10-fold cross-validation, and because of its clinical interpretability (e.g., odds ratio), usability, and model simplicity.
Fig 2. Variable selection by graphical modeling. The graphical modeling was performed using the R packages "corpcor" [26] and "qgraph" [27]. https://doi.org/10.1371/journal.pone.0215142.g002
Table 2. The four models, logistic regression (LogReg), Naive Bayes classifier (NB), support vector machine (SVM), and random forest (RF), were compared with 6 indices: accuracy, error rate (error), true positive rate (TPR, also called sensitivity), true negative rate (TNR, also called specificity), positive predictive value (PPV), negative predictive value (NPV), and Area Under the ROC Curve (AUC). https://doi.org/10.1371/journal.pone.0215142.t002
Fig 3. The ROC curves were plotted using the R packages "plotROC" [32] and "ggplot2" [33]. https://doi.org/10.1371/journal.pone.0215142.g003
The final discriminant model created in this study is as follows:
$$\cdots + 0.43x_{i7} + 0.15x_{i8} + 0.42x_{i9} + 0.37x_{i10} + 0.15x_{i11} + 0.24x_{i12} + 0.04x_{i13}$$
where the presence or absence of white matter lesions was used as the dependent variable $Y$ ($Y = 1$ when white matter lesions were present, and $Y = 0$ when white matter lesions were absent). The explanatory variables were defined as follows: $X_1$ is age; $X_2$ is sex ($X_2 = 0$ for male and $X_2 = 1$ for female); $X_3$ is PS; $X_4$ is LDL; $X_5$ is SBP; $X_6$ is HbA1c; $X_7$ and $X_8$ are the determination of metabolic syndrome ($X_7 = 0$ and $X_8 = 0$ for non-metabolic syndrome; $X_7 = 1$ and $X_8 = 0$ for the reserve of metabolic syndrome; $X_7 = 1$ and $X_8 = 1$ for metabolic syndrome); $X_9$ is the experience with medication to reduce blood pressure ($X_9 = 0$ for "No"; $X_9 = 1$ for "Yes"); $X_{10}$ is the experience with medication to reduce blood sugar or insulin injection ($X_{10} = 0$ for "No"; $X_{10} = 1$ for "Yes"); $X_{11}$ is the experience with medication to reduce the level of cholesterol or of neutral fat ($X_{11} = 0$ for "No"; $X_{11} = 1$ for "Yes"); $X_{12}$ and $X_{13}$ are the drinking habits ($X_{12} = 0$, $X_{13} = 0$ for "rarely drink (cannot drink)"; $X_{12} = 1$, $X_{13} = 0$ for "sometimes"; $X_{12} = 1$, $X_{13} = 1$ for "everyday"). For the $i$-th patient's data $x_i$, $\Pr(Y_i = 1 \mid X_i = x_i)$ is the probability that the $i$-th patient had white matter lesions, and $\Pr(Y_i = 0 \mid X_i = x_i)$ is the probability that the $i$-th patient did not have white matter lesions.
Table 3 shows the odds ratios estimated by the logistic regression model. Each variable was normalized and the estimate was adjusted to the same order. Table 4 shows the results of the logistic regression analysis performed on all of the data, instead of the train/test split dataset or cross-validation data. The confidence intervals of the odds ratios of age, gender, PS, SBP, and experience with medications to reduce blood pressure showed significance, as they all lay entirely above one. The ROC curve with logistic discrimination is shown in Fig 4. On the ROC curve, at the most appropriate cutoff value, 0.579, given by the Youden Index [22,23], 1 − specificity was 0.249 and sensitivity was 0.720; this is the point on the ROC curve farthest from the diagonal line of AUC = 0.5.
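As a worked check of how the coefficients relate to the reported quantities, the Python sketch below converts the coefficients that survive in the equation above into odds ratios (the odds ratio of a logistic regression coefficient is its exponential) and computes the Youden Index at the reported operating point. The coefficient of 0.42 for antihypertensive medication gives an odds ratio of about 1.52, which agrees with the value stated in the abstract. The variable and function names are hypothetical, and the coefficients of the first six variables are not available in this text.

```python
from math import exp

# Coefficients visible in the final model, keyed by variable index (x7 to x13)
surviving_coefficients = {7: 0.43, 8: 0.15, 9: 0.42, 10: 0.37, 11: 0.15, 12: 0.24, 13: 0.04}


def odds_ratio(beta: float) -> float:
    """Odds ratio for a one-unit increase in a logistic regression variable."""
    return exp(beta)


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden Index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


for index, beta in surviving_coefficients.items():
    print(f"x{index}: odds ratio = {odds_ratio(beta):.2f}")

# Reported operating point: sensitivity 0.720, specificity 0.751 (1 - 0.249)
print(f"J = {youden_j(0.720, 0.751):.3f}")
```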

Discussion
In the assessment of the questionnaire in Table 1, although smoking habits showed significance (p<0.001) in the between-groups comparison, among patients with white matter lesions the proportion of non-smokers (86.7%) was higher than that of smokers (13.3%). Generally, it is considered that smoking is a risk factor for cerebrovascular disorders [37], but this study showed the opposite result. Although smoking is said to be such a risk factor [36], smoking itself is not a direct cause of ischemic change due to microangiopathy, that is, white matter lesions. It is possible that other indirect and complex factors, including changes in blood constituents such as a decrease in "good" HDL cholesterol due to smoking habits, could lead to their expression. Therefore, it is probably necessary to investigate detailed temporal changes in the amount smoked in order to evaluate smoking habits effectively. In the specific health examination questionnaire in this study, the question regarding smoking habits only asked whether the participant was a heavy smoker, defined as having smoked a total of over 100 cigarettes or having smoked over a period of 6 months; it did not consider the past smoking history or the number of cigarettes smoked per day. Smoking could become a more meaningful variable if detailed information were included, such as the smoking index, which is the product of the number of cigarettes smoked per day and the smoking history (years). However, in Japan, specific health examination questionnaires in a similar format are used at most hospitals during medical checkups to check the patient's past medical history and lifestyle and to aid in patient consultation. They are therefore considered an important part of the patient consultation in medical checkups.
Therefore, in this modeling, the specific health examination questionnaires were regarded as an important factor from the viewpoint of their practical usage.
The results of the 10-fold cross-validation indicated that appropriate variable selection had been performed on the variables obtained from the limited clinical examination data, so that there was not much difference in accuracy among the created models. The models used in this study include linear (LogReg) and nonlinear (SVM and RF) models as well as a stochastic model (NB). SVM [40] and RF [41] are often reported as showing better discrimination performance than linear models. Furthermore, it is considered that Bayesian modeling, which is stochastic, shows less overfitting and better generalization ability than conventional machine learning [42]. However, some contradictory reports [43][44][45] have appeared recently; therefore, it was necessary to clarify the effect of the difference in algorithms on the discrimination performance. In this study, it was demonstrated that the model algorithm did not significantly influence the discrimination performance if appropriate variable selection had been conducted when constructing the predictive model.
Considering both the discrimination performance and the interpretability of the model, the linear model is better than the other models for the prediction of the presence of white matter lesions from clinical examination data. Using a logistic regression model, the presence of white matter lesions can be represented with a probability, and the odds ratio of a given risk factor can also be calculated. Therefore, it is considered to be suitable for clinical use among models in this study.
In this study, logistic discrimination was selected for the construction of the clinical model. There were no significant differences in accuracy, error rate, and AUC among the algorithms, but significant differences were observed in sensitivity and specificity. Generally, in screening tests performed before major invasive testing, sensitivity is important in order to reduce false negatives. However, the condition predicted in this study, cerebral white matter lesions, does not show subjective symptoms, does not lead to lethal symptoms, and does not require direct care. When diseases that cause cerebrovascular disorders, such as hypertension, hyperlipidemia, and atrial fibrillation, are discovered, they are treated. In practical use, the model is intended for people receiving general health examinations: by predicting the presence or absence of cerebral white matter lesions, it allows subjects predicted to be at high risk to be recommended for the brain dock. Therefore, in this modeling, specificity is more important than sensitivity. With the logistic discriminant model, which shows high specificity, the odds ratios of risk factors can also be calculated, which makes clinical evaluation easier.
Regarding the odds ratio of age (2.99 in Table 3), since age was standardized, the odds ratio for 1 year change in age was calculated to be 1.10, which is consistent with reports that slight white matter lesions in the elderly are an age-related phenomenon [46]. Advancing white matter lesions indicate a high risk of dementia and stroke [6], and it is important to prevent their progression.
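The conversion between the two odds ratios for age can be made explicit. If age is standardized by its standard deviation SD, the odds ratio per year equals the odds ratio per SD raised to the power 1/SD. The Python sketch below applies this relation; the function names are hypothetical. Because the SD of age is not given in this text, the second function only back-calculates the value implied by the two rounded figures (2.99 and 1.10), which comes to roughly 11.5 years and should be read as approximate rather than as a reported result.

```python
from math import log


def per_unit_odds_ratio(or_per_sd: float, sd: float) -> float:
    """Odds ratio per original unit, given the odds ratio per standard deviation."""
    return or_per_sd ** (1.0 / sd)


def implied_sd(or_per_sd: float, or_per_unit: float) -> float:
    """Standard deviation implied by a per-SD and a per-unit odds ratio."""
    return log(or_per_sd) / log(or_per_unit)


sd_age = implied_sd(2.99, 1.10)  # implied by rounded figures, not a reported value
print(f"implied SD of age: {sd_age:.1f} years")
print(f"odds ratio per year: {per_unit_odds_ratio(2.99, sd_age):.2f}")
```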
The odds ratio of PS was significant at 1.12, which is consistent with the fact that PS is commonly used as an indicator of arteriosclerosis [47]. In the facilities investigated in this study, a PS test is performed only in the brain dock course. Given this demonstration of an association of PS with cerebral white matter lesions, if PS were to be added as an optional testing item in the comprehensive medical checkup, it would be useful as a variable for cerebral white matter lesion prediction.
Variables such as the medication to reduce blood pressure, the medication to reduce blood sugar or insulin injection, and the medication to decrease the level of cholesterol or of neutral fat are data derived from the specific health examination questionnaire and are used as a substitute for variables from the past medical history. Naturally, not all hospitals conduct a comprehensive medical checkup and there are many comprehensive medical checkups performed by private companies in Japan; therefore, it seems that there are not so many comprehensive medical checkups performed in the primary care hospitals. Therefore, because the medical history data must be obtained from the questionnaire, it is difficult to identify the exact disease name and diagnostic results without a common questionnaire format. If information can be collected in a common format, such as using hierarchical categories regarding the medical history with the questionnaire, a past history can also be used as variables, which would very likely improve the discrimination performance.
Approximately 54.8% of the patients whose data were used in this study had been diagnosed as having white matter lesions, so the prevalence is high. The brain dock course in this study is more expensive than the general comprehensive health examination courses. Consequently, it is possible that many examinees who had risks or a medical history related to arteriosclerosis were included in the study. Because the subjects in this study may be a high-risk group, it is not possible to conclude that the prediction of cerebral white matter lesions can be applied to all subjects. However, it was shown to be possible to discriminate a risk group in subjects who had not received an MRI by using a prediction based on the clinical examination data.
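The practical effect of a high prevalence can be illustrated with Bayes' rule. The Python sketch below computes the positive and negative predictive values from the reported sensitivity (0.720) and specificity (0.751) at the study prevalence (0.548) and at a lower prevalence of 0.20. The lower prevalence is purely hypothetical, and the sketch assumes that sensitivity and specificity stay the same in a different population, which may not hold.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    fn = (1.0 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)


for prevalence in (0.548, 0.20):  # study prevalence, then a hypothetical lower one
    ppv, npv = predictive_values(0.720, 0.751, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

At the lower prevalence the PPV drops markedly while the NPV rises, which is why predictive values obtained in a high-risk group cannot be transferred directly to a general checkup population.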

Conclusions
To predict cerebral white matter lesions based on clinical examination data, a logistic regression model was selected from some candidate models, created by various algorithms, based on a comparison of accuracy and interpretability. The explanatory variables of the model were age, gender, PS, LDL, SBP, and the administration of antihypertensive medication. Variable selection was important to the establishment of a high accuracy model, but the model algorithm did not significantly influence the discrimination performance if an appropriate variable selection was conducted while constructing the prediction model. This model will allow clinicians to discriminate a risk group in subjects who have not received a head MRI test.
Supporting information
S1 Doc. Questionnaire in the specific health examination, which was translated from the form in Japanese, in Shin Takeo Hospital. (DOCX)