A Bayesian Approach to Identifying New Risk Factors for Dementia

Supplemental Digital Content is available in the text


INTRODUCTION
Dementia is one of the most disabling and burdensome health conditions worldwide. Globally, 35.6 million people were estimated to be affected by dementia in 2012, and this number is expected to increase to 60 million by 2030 and to 114 million by 2050. 1,2 In addition to its rapidly increasing incidence, dementia is the major cause of disability in the elderly population and imposes a huge economic burden worldwide. 3,4 Several risk factors for dementia, such as advanced age, head injury, depression, diabetes mellitus (DM), and vascular diseases, have been recognized. [5][6][7] However, other potential risk factors that might help clinicians develop appropriate treatment strategies for patients at the early stages of dementia and prevent the worsening of the condition remain either controversial or ignored.
Bayesian statistics was introduced in medical research in 1982; however, the US Food and Drug Administration approved and issued a draft guideline for its application in clinical research only in 2010. 8 Bayesian statistics, which is learning from evidence as it accumulates, is currently applied in all major areas of medical statistics, including clinical trials, epidemiology, meta-analyses and evidence synthesis, spatial modeling, longitudinal modeling, survival modeling, molecular genetics, and decision making for new technologies. 9 This approach, in short, is a way to combine the past (prior) with present (current study) to make decisions about the future (posterior conclusions). 8 To the best of our knowledge, few studies have focused on using Bayesian statistics to identify other potential risk factors for dementia.
Chinese characters are a type of hieroglyphics. The Chinese character for intelligence denotes good hearing, keen eyesight, and a strong heart or brain. Dementia is defined as a progressive global intellectual impairment occurring in clear consciousness. 10 Heart and brain diseases, such as acute myocardial infarction and stroke, are generally associated with increased risks of dementia. We hypothesized that older people with decreasing intellect (or dementia) are associated with poor hearing (hearing loss) and poor eyesight (senile cataract).
In this study, we identified new potential risk factors for dementia from nationwide longitudinal population-based data by using Bayesian statistics. We first tested the consistency of the results obtained using Bayesian statistics with those of classical frequentist probability for the 4 recognized risk factors for dementia, namely severe head injury, depression, DM, and vascular diseases. Then, we used Bayesian statistics to verify 2 new potential risk factors for dementia, namely hearing loss and senile cataract, determined from Taiwan's National Health Insurance Research Database (NHIRD).

METHODS
Data Sources and Ethical Consideration
Since its implementation in 1995, Taiwan's National Health Insurance (NHI) program has been providing comprehensive, unified, and universal healthcare services to approximately 99% of the Taiwanese population. 11 We used one of the subsets of Taiwan's NHIRD, developed by the NHI program using 1995 to 2010 data and consisting of 1 million patients (approximately 5% of the total Taiwanese population) randomly selected in 2010. The NHIRD contains data on patients' demographics, diagnoses, medication types, prescription dates, and dosages and durations of drug supply.
All data used for the present study are available upon application to the Center for Biomedical Resources of the National Health Research Institutes in Taiwan; applications are submitted to the executive committee (for more information, please refer to http://nhird.nhri.org.tw/en/). Furthermore, this study protocol was approved by an institutional review board (KMUHIRB-SV (II)-20150007), and informed consent was waived because of the use of previously stored deidentified medical information from the NHIRD.

Study Design and Sampling
The study group consisted of patients with dementia diagnosed on the basis of the International Classification of Disease, Ninth Revision, Clinical Modification (ICD-9-CM) diagnostic criteria (ICD-9-CM codes 290, 294.1-294.2, 331.0, A210, A213, and A222) between March 1995 and December 2010. To enhance diagnostic validity, we only selected patients who had inpatient diagnosis files with a primary or secondary diagnosis of dementia or outpatient diagnosis files with at least three consistent diagnoses of dementia. 12 We assigned their first visit for the diagnosis of dementia as their index date.
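The selection rule above can be sketched in code. This is a minimal illustration only: the record layout (a list of date, setting, and code tuples), the function names, and the use of prefix matching on the listed codes are all assumptions made for the example and do not describe the actual NHIRD file format or the authors' programs.

```python
from datetime import date

# Dementia codes listed in the text. Matching by string prefix is an
# assumption of this sketch, not a documented NHIRD convention.
DEMENTIA_PREFIXES = ("290", "294.1", "294.2", "331.0", "A210", "A213", "A222")


def is_dementia_code(code):
    """Return True if a diagnosis code starts with one of the listed prefixes."""
    return code.startswith(DEMENTIA_PREFIXES)


def dementia_index_date(visits, min_outpatient=3):
    """Return the index date for one patient, or None if the rule is not met.

    `visits` is a hypothetical list of (visit_date, setting, code) tuples,
    where setting is "inpatient" or "outpatient".
    """
    hits = [(d, s) for d, s, c in visits if is_dementia_code(c)]
    inpatient = sum(1 for _, s in hits if s == "inpatient")
    outpatient = sum(1 for _, s in hits if s == "outpatient")
    if inpatient >= 1 or outpatient >= min_outpatient:
        # Index date: first visit carrying a dementia diagnosis.
        return min(d for d, _ in hits)
    return None


# Hypothetical patient: three outpatient dementia diagnoses, one unrelated visit.
example = [
    (date(2003, 5, 2), "outpatient", "290.0"),
    (date(2003, 8, 9), "outpatient", "250.00"),
    (date(2004, 1, 15), "outpatient", "331.0"),
    (date(2004, 6, 30), "outpatient", "290.0"),
]
print(dementia_index_date(example))  # 2003-05-02
```

A patient with a single outpatient dementia code would return None under this rule, whereas a single inpatient dementia diagnosis is sufficient.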

Potential Risk Factors Associated in Patients With Dementia
We assessed patients with dementia before or on their index date to ascertain their histories of severe head injury (ICD  Table e-1 in the Appendix; http://links.lww.com/MD/B10). [13][14][15][16] In addition, we examined the inpatient and outpatient diagnosis files of patients without dementia between 1995 and 2010 to ascertain their histories of severe head injury, depression, DM, vascular diseases, senile cataract, and hearing loss. Sociodemographic data of enrolled patients were also recorded, including age, gender, geographic region, urban level, and monthly income (see Table 1).

Statistical Analysis
We derived the odds ratio (OR) by using the Bayesian approach. 17 We attempt to learn about the unknown distribution from given data, to make inferences about certain properties of the distribution, and to determine the relative likelihood that each possible distribution is actually the correct one. Suppose that the proportion $u$ of patients with dementia in a population in the presence of a risk factor is unknown, and let the prior distribution assigned to $u$ be a uniform distribution on the interval (0, 1); that is, the prior p.d.f. is $\xi(u) = 1$ for $0 < u < 1$. This "informationless" prior is chosen for the sake of objectivity and to reduce computing cost. Suppose there is a given random sample of $n$ persons who all have a risk factor, and for $i = 1, 2, 3, \ldots, n$, let $X_i = 1$ if the $i$th person has dementia and let $X_i = 0$ otherwise. Then, $X_1, X_2, X_3, \ldots, X_n$ form $n$ Bernoulli trials with parameter $u$, and we can determine the posterior p.d.f. of $u$. The p.f. of each observation $X_i$ is

$$f(x_i \mid u) = u^{x_i} (1 - u)^{1 - x_i}, \qquad x_i = 0 \text{ or } 1.$$

Letting $y = x_1 + x_2 + \cdots + x_n$, the joint p.f. of $X_1, X_2, X_3, \ldots, X_n$ can be written in the following form for $x_i = 0$ or 1 ($i = 1, 2, 3, \ldots, n$):

$$f_n(x \mid u) = u^{y} (1 - u)^{n - y}.$$

Because $\xi(u) = 1$, the posterior p.d.f. satisfies $\xi(u \mid x) \propto u^{y} (1 - u)^{n - y}$. It is a beta distribution with parameters $\alpha = y + 1$ and $\beta = n - y + 1$. Therefore, for $0 < u < 1$,

$$\xi(u \mid x) = \frac{\Gamma(n + 2)}{\Gamma(y + 1)\,\Gamma(n - y + 1)}\, u^{y} (1 - u)^{n - y}.$$

That is, combining the uniform prior distribution with the likelihood function of Bernoulli trials yields a beta posterior distribution. The mean of $u$ given the observation $x$ is

$$E(u \mid x) = \frac{y + 1}{n + 2},$$

which is also the estimate of $u$ based on the squared error loss function, as derived later. Let $u^{*}$ be the estimate of $u$; then, the odds of dementia in the presence of the risk factor are $u^{*} / (1 - u^{*})$. Similarly, we can get the odds of cases in a population in the absence of the risk factor; the OR is then obtained as the ratio of these two odds.
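The calculation described in this paragraph can be sketched as follows. The function names and the counts in the example are hypothetical and chosen only for illustration; they are not taken from the NHIRD or from Table 2. The sketch assumes the uniform prior stated above, so the posterior mean is (y + 1)/(n + 2).

```python
def posterior_mean(cases, n):
    """Mean of the Beta(cases + 1, n - cases + 1) posterior under a uniform prior."""
    return (cases + 1) / (n + 2)


def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)


def bayes_odds_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Ratio of the odds computed from the two posterior means."""
    u_exposed = posterior_mean(cases_exposed, n_exposed)
    u_unexposed = posterior_mean(cases_unexposed, n_unexposed)
    return odds(u_exposed) / odds(u_unexposed)


# Hypothetical counts: 120 cases among 1000 people with the risk factor,
# 500 cases among 9000 people without it.
print(round(bayes_odds_ratio(120, 1000, 500, 9000), 3))
```

With samples this large, the added pseudo-counts from the uniform prior change the estimate only slightly compared with the raw cross-product ratio.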
After obtaining the posterior probability distribution of $u$ using Bayes' theorem, we can identify the Bayes estimator, $\delta(X_1, X_2, X_3, \ldots, X_n)$, a real-valued function of the random variables $X_1, X_2, X_3, \ldots, X_n$ that specifies the estimated value of $u$ for each possible set of values of the observed data $X_1, X_2, X_3, \ldots, X_n$. An estimate, say $a$, is a specific value $\delta(x_1, x_2, x_3, \ldots, x_n)$ of the estimator determined using specific observed values $x_1, x_2, x_3, \ldots, x_n$. We use the squared error loss function, which is the most commonly used loss function in estimation problems: $L(u, a) = (u - a)^2$. The Bayes estimate $\delta^{*}(x)$ for any observed value of $x$ is the value of $a$ for which the expectation $E[(u - a)^2 \mid x]$ is minimum. This expectation is minimum when $a$ is chosen to be equal to the mean $E(u \mid x)$ of the posterior distribution; in other words, when the squared error loss function is used, the Bayes estimator is $\delta^{*}(X) = E(u \mid X)$. Therefore, we can use the mean of the posterior probability distribution as the estimate.
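The step from the squared error loss to the posterior mean can be made explicit. The following worked derivation is a sketch added for clarity; the symbol $\mu$ for the posterior mean is notation introduced here and does not appear in the original text.

```latex
\begin{align*}
E\big[(u - a)^2 \mid x\big]
  &= E\big[(u - \mu)^2 \mid x\big]
     + 2(\mu - a)\,E\big[u - \mu \mid x\big]
     + (\mu - a)^2,
     \qquad \mu = E(u \mid x) \\
  &= \operatorname{Var}(u \mid x) + (\mu - a)^2 .
\end{align*}
% The middle term vanishes because E[u - \mu | x] = 0.
% Var(u | x) does not depend on a, so the minimum is attained at a = \mu.
% For the Beta(y + 1, n - y + 1) posterior:
\begin{align*}
\mu = \frac{y + 1}{n + 2},
\qquad
\operatorname{Var}(u \mid x) = \frac{(y + 1)(n - y + 1)}{(n + 2)^2 (n + 3)} .
\end{align*}
```

The variance term is the minimum achievable expected loss, and it shrinks roughly in proportion to $1/n$ as the sample grows.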
We used the aforementioned Bayesian inference, given the evidence of a sample of 1 million people from the NHIRD, to determine the posterior probabilities of u. The posterior distribution is insensitive to variations in the prior distribution; this insensitivity rapidly increases with the sample size. 18 This is the reason why we do not use a strong subjective prior and instead choose an informationless prior. Furthermore, this choice also reduces the computation cost. The probability calculus does more than explain how states of belief (i.e., the proportion u of dementia in a population in the presence of a risk factor) decompose into prior and empirical (i.e., the NHIRD) elements; it also measures the probability u relative to the NHIRD. The prior distribution of u is a uniform distribution of subjective prior probabilities over the range (0, 1). Thus, using Bayes' theorem and the NHIRD sample, we calculated the corresponding posterior probability to be a beta distribution as stated earlier. The posterior probability is concentrated in a narrow region even though the prior probability curve is spread out, which reflects the considerable initial uncertainty regarding the parameter value.
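The insensitivity of the posterior to the prior at large sample sizes can be illustrated numerically. In the sketch below, the two non-uniform Beta priors, the 6% case proportion, and the sample sizes are assumptions chosen for illustration; only the uniform Beta(1, 1) prior corresponds to the analysis described in the text.

```python
def beta_posterior(cases, n, prior_a=1.0, prior_b=1.0):
    """Posterior Beta parameters after `cases` events in `n` Bernoulli trials."""
    return prior_a + cases, prior_b + n - cases


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5


def prior_spread(cases, n, priors):
    """Largest difference between posterior means across the given priors."""
    means = [beta_mean_sd(*beta_posterior(cases, n, a, b))[0] for a, b in priors]
    return max(means) - min(means)


# Uniform prior plus two deliberately opposed, hypothetical informative priors.
PRIORS = [(1, 1), (2, 50), (50, 2)]

for n in (50, 5000, 500000):
    cases = round(0.06 * n)  # hypothetical 6% case proportion
    _, sd = beta_mean_sd(*beta_posterior(cases, n))
    print(n, round(prior_spread(cases, n, PRIORS), 5), round(sd, 5))
```

The printed spread between posterior means narrows sharply as n grows, and the posterior standard deviation under the uniform prior shrinks with it, which is the behavior the paragraph describes.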

RESULTS
Demographic Characteristics of the Study Population
Of the 1,000,000 patients screened, we included 109,709 (11.0%) patients aged ≥65 years. A total of 6546 (6.0%) patients were diagnosed with dementia. Table 1 shows the distribution of sociodemographic characteristics of the study population. Patients with dementia were significantly older than those without dementia (80.2 years vs 74.3 years, P < 0.001). Moreover, the percentage of patients with dementia increased significantly across the young-old (65-74 years), old-old (75-84 years), and oldest-old (≥85 years) groups: 2.5%, 8.1%, and 16.7%, respectively (P < 0.001). The percentage of women with dementia was significantly higher than that of men with dementia (n = 3698, 6.5% vs n = 2848, 5.4%; P < 0.001); furthermore, patients with high income had a significantly lower incidence of dementia than those with lower income (2.8%, 5.2%, and 9.6%; P < 0.001).

Risk Factors for Dementia
Regarding the 4 recognized risk factors for dementia, ORs in the older population with a history of severe head injury, depression, DM, and vascular diseases were 1.701, 2.637, 1.207, and 3.469, respectively (Table 2); ORs for these risk factors were higher in men with dementia than in women with dementia except for DM. Furthermore, we tested ORs of other potential risk factors for dementia, including senile cataract and hearing loss. We observed that hearing loss (OR = 1.577) and senile cataract (OR = 1.492) were associated with an increased risk of dementia in the older population. In terms of sex-based differences, ORs of these risk factors were generally higher in men with dementia than in women with dementia (Table 2).

DISCUSSION
To the best of our knowledge, this is the first nationwide population-based study identifying new potential risk factors for dementia using Bayesian statistics. Overall, older age, female sex, and lower income were identified as independent risk factors for dementia in this study. By using Bayesian statistics to assess the older population in Taiwan, we reidentified the 4 recognized risk factors for dementia, namely vascular diseases, depression, severe head injury, and DM; their ORs ranged from 3.469 to 1.207. The results are in agreement with our hypothesis that the findings for the 4 recognized risk factors for dementia obtained using classical frequentist probability are consistent with those obtained using Bayesian statistics. Furthermore, we identified that hearing loss (OR = 1.577) and senile cataract (OR = 1.492) are associated with an increased risk for dementia.
Taiwan has been an aging country since 1993, and the older population exceeded 10% of the total population in 2005. 5 Our results showed that approximately 11% of the population was aged ≥65 years in 2010. The prevalence of dementia in the older population in Taiwan from 1995 to 1998 was 3.2%, which is lower than that in other developed countries. 5,[19][20][21][22] By contrast, our data demonstrated that the prevalence of dementia in the older population from 1995 to 2010 was approximately 6%, similar to that in other developed countries. [20][21][22] The risk of dementia for an individual can vary with area, population, and diagnostic criteria because various sociodemographic risk factors vary, except for age and educational level. A review on Asian populations, including that of Taiwan, has shown that older age, female sex, and lower socioeconomic status were the risk factors for dementia in the East. 19 These results are similar to those observed in our study.
Increasing evidence supports that increased risks of dementia are associated with its treatable medical comorbidities, including severe head injury, 5,23-25 depression, 26,27 DM, 24,28,29 and vascular diseases. 5,30,31 An analysis of 15 case-control studies from 1991 to 2001 concluded that a history of head injury is a significant risk factor for Alzheimer disease (AD, OR = 1.58), and the OR for this risk factor is higher in men than in women. 23,24 The OR in that analysis is lower than that in our study (OR = 1.70) because of differences in criteria and disease diagnosis, but the difference in ORs between sexes is similar. A meta-analysis of depression and AD risk using random-effects models revealed pooled ORs of 2.03 for case-control and 1.90 for cohort studies. 26 The OR for AD obtained in that meta-analysis is also lower than that obtained in our study for dementia (OR = 2.637) because of differences in diagnosis. The pooled adjusted risk ratio (RR) for dementia when patients with DM were compared with those without DM was 1.47. 28 Another study based on the Taiwanese population has shown DM (hazard ratio [HR] = 1.76) to be associated with an increased risk for AD. 29 A systematic literature review demonstrated that a history of stroke doubles the risk of incident dementia in the older population. 32 In brief, the ORs of the 4 recognized risk factors for dementia obtained in our study are mostly higher than those in other studies, likely because we evaluated dementia whereas other studies evaluated AD.
We hypothesized that poor hearing (hearing loss) and poor eyesight (senile cataract) are associated with an increased risk of dementia in older people. A prospective population-based study in the United States demonstrated hearing loss to be an independent predictor for developing dementia (HR = 1.27, confidence interval [CI] = 1.03-1.56). 33 Another preliminary investigation in Japan showed hearing loss to be independently associated with behavioral and psychological symptoms of dementia (OR = 4.65, 95% CI = 1.70-12.00). 34 The OR in our study (OR = 1.577) is lower than that of the latter study but higher than that of the former because these 3 studies evaluated different populations and used different statistical measures (OR vs HR) and dementia diagnoses. The association between cataract and dementia has been controversial. According to animal model studies and pathological reviews, cataract is a potential biomarker for dementia because both these diseases share common etiological mechanisms. [35][36][37] Moreover, cataract surgery is associated with improved cognitive performance and decreased dementia risk, possibly because the surgery decreases the risk of fall injury and head injury. [38][39][40] However, 2 other epidemiological studies have demonstrated a decreased association between cataract and dementia. 41,42 Our study found that cataract is a potential risk factor for dementia in the Taiwanese population (OR = 1.492). One possible reason is that the severity of cataract and the study population play major roles in the development of dementia. In brief, our results indicated that hearing loss and senile cataract are potential risk factors for dementia in the Taiwanese population.
Most research work is still based on classical frequentist hypothesis testing. Hypothesis testing assesses the effect of a given risk factor by measuring the OR and its 95% CI, which estimates the precision of the OR. A wide CI indicates a low level of precision of the OR, whereas a narrow CI indicates a high level of precision. The 95% CI does not report a measure's statistical significance. It would be inappropriate to interpret an OR with a 95% CI that spans the null value (eg, OR = 1) as indicating a lack of association between the risk factor and dementia. Because the frequentist interpretation of probability considers long-run frequencies rather than the data at hand, it is impossible to assign a probability to a hypothesis, given the data. Frequentist hypothesis tests do not technically test any hypothesis, given the data; instead, they test the data, given the hypothesis. The Bayesian approach tests the hypothesis (ie, u here), given the data.
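For comparison, the frequentist quantities mentioned here can be computed from a 2x2 table. The sketch below uses the standard Wald (Woolf) interval on the log-odds scale, which is one common choice and is an assumption of this example; the text does not state which interval the frequentist analyses used. The counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval from a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    z = 1.96 gives an approximate 95% interval.
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Hypothetical table: 120 of 1000 exposed and 500 of 9000 unexposed are cases.
or_hat, lower, upper = odds_ratio_ci(120, 880, 500, 8500)
print(round(or_hat, 3), round(lower, 3), round(upper, 3))
```

The width of the interval reflects the precision of the OR: smaller cell counts inflate the standard error on the log scale and widen the interval.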
A sample size that is too large or too small is a potential issue in classical frequentist hypothesis testing. When the sample size is too large, hypothesis testing becomes biased in favor of rejecting the null hypothesis. When the sample size is too small, the frequentist reliance on asymptotic approximations becomes inappropriate and the estimates become biased. By contrast, Bayesian inference does not force an artificial dichotomy between a null and an alternative hypothesis, allows exact probability statements about any hypothesis (ie, u here), and is not biased by a large sample size (eg, a sample of 1 million). 18 With a small sample size, Bayesian inference is also unbiased because of exact estimation (as opposed to asymptotic approximation) and still permits exact probability statements. Bayesian inference handles uncertainty better than frequentist inference, and the probability intervals of parameters are appropriately wide. In addition, the results shown in Table 2 suggest that Bayesian inference can reveal a trend similar to that of frequentist inference and thus can be helpful in assessing risks for dementia. Furthermore, because all observed data are used to compute the posterior probability distribution, any bias from data sampling, which is often criticized in frequentist hypothesis testing, can be avoided. Finally, the Bayesian approach is helpful in assessing the effect of the coexistence of multiple risk factors, which has not yet been discussed in studies using classical frequentist hypothesis testing.
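A probability statement about a hypothesis, given the data, can be illustrated with the beta posteriors obtained under the uniform prior. The sketch below estimates the posterior probability that the OR exceeds a threshold by drawing from the two posteriors with the standard-library `random.betavariate`. This is a Monte Carlo approximation rather than a closed-form calculation, and the function name, counts, number of draws, and seed are assumptions made for illustration.

```python
import random


def prob_or_above(cases_e, n_e, cases_u, n_u, threshold=1.0, draws=20000, seed=0):
    """Estimate P(OR > threshold | data) by sampling the two Beta posteriors.

    Exposed group posterior:   Beta(cases_e + 1, n_e - cases_e + 1)
    Unexposed group posterior: Beta(cases_u + 1, n_u - cases_u + 1)
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(draws):
        u_e = rng.betavariate(cases_e + 1, n_e - cases_e + 1)
        u_u = rng.betavariate(cases_u + 1, n_u - cases_u + 1)
        # OR > threshold, written as a cross-product to avoid division.
        if u_e * (1 - u_u) > threshold * u_u * (1 - u_e):
            hits += 1
    return hits / draws


# Hypothetical counts: 120 of 1000 exposed and 500 of 9000 unexposed are cases.
print(prob_or_above(120, 1000, 500, 9000))
```

When the two groups have the same underlying case proportion, the estimated probability falls near one half instead of near one.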

Strengths and Limitations
The main strength of our study is the use of nationwide population-based data that accurately represent the general population. The majority of the events could be traced and referral bias is minimized because Taiwan's NHI program is a single-payer, mandatory health insurance with affordable payments and services covering approximately 99% of the Taiwanese population. In addition, one of the major benefits of the Bayesian method is its ability to incorporate prior information. 43,44 While other risk factor assessment approaches use prior information by illustrating the levels or ranges of individual parameters in sensitivity analysis, the Bayesian method analyzes historical data sets or refers to expert and domain knowledge to determine what is known about biological parameters and processes. 18,45 Most traditional risk factor assessment methods do not use any of the quantitative information that could be gathered from historical experiences with other risk factors and thus treat each risk factor assessment as a new and independent problem. However, it is extremely computationally intensive to apply Bayes' theorem to complex models. 18,45 It often takes days of computing to conduct defensible decision analyses for assessments. We tested consistency between results obtained using classical frequentist probability and Bayesian statistics for the 4 recognized risk factors for dementia (head injury, depression, DM, and vascular disease). Moreover, Bayesian statistics could effectively help clinicians and researchers to explore more potential risk factors for diseases like dementia using big data such as Taiwan's NHIRD. In addition, our results may provide a good representation of risk factors for patients with dementia in ethnic Chinese populations. Furthermore, the diagnoses of dementia and its comorbidities were reliable because these health insurance claims are scrutinized by medical reimbursement specialists and subject to peer review.
Nevertheless, our study has some limitations. First, Taiwan's NHIRD does not include detailed information on biochemistry data, body mass index, clinical severity of diseases, family history, and lifestyle, which may be potential risk factors or comorbidities for dementia. As an example, the APOE gene is recognized as a substantial factor in the majority of patients with dementia, but these data are unavailable in the NHIRD. Similarly, lifestyle-related dementia risk factors, including physical activity, dietary habits, cigarette smoking, and alcohol consumption, were not available. Second, this study employed a retrospective study design, which tends to be more susceptible to biases than a prospective design 38; however, we avoided 2 possible major biases, related to sampling (by selecting random patients from nationwide data) and recall (by analyzing national health care records). Finally, all administrative databases are subject to possible coding errors or undercoding.

CONCLUSION
The findings of this study concur with our 2 hypotheses. First, our results for the 4 recognized risk factors for dementia, namely severe head injury, depression, DM, and vascular diseases, are consistent between classical frequentist probability and Bayesian statistics. Second, hearing loss and senile cataract were found to be potential risk factors for dementia in the older Taiwanese population. Bayesian statistics could help clinicians to explore more risk factors for illnesses, such as dementia, to develop appropriate treatment strategies for these patients.