Development of A Model for Predicting The 4-Year Risk of Systematic Knee Osteoarthritis in China: A Longitudinal Cohort Study

Objectives: We aimed to develop a model for predicting the 4-year risk of knee osteoarthritis (KOA) based on survey data obtained from a random, nationwide sample of Chinese individuals. Methods: We analyzed data from 8,193 middle-aged and older adults included in the China Health and Retirement Longitudinal Study (CHARLS). Incident systematic KOA was defined as being free of systematic KOA at baseline (CHARLS2011) and being diagnosed with systematic KOA at the 4-year follow-up (CHARLS2015). We estimated the effects of potential predictors on incident KOA using logistic regression models and validated the final model internally. Model performance was assessed in terms of discrimination (area under the receiver operating characteristic curve, AUC) and calibration. Results: A total of 815 incident cases of KOA were identified at the 4-year follow-up, corresponding to a cumulative incidence of approximately 9.95%. The final multivariable model included age, sex, waist circumference, residential area, difficulty with activities of daily living (ADLs)/instrumental activities of daily living (IADLs), history of hip fracture, depressive symptoms, number of chronic comorbidities, self-rated health status, and level of moderate physical activity (MPA). The bias-corrected AUC for this model was 0.704. The calibration curve showed satisfactory agreement between the observed and predicted incidence of systematic KOA. A simple clinical score was developed to quantify KOA risk easily from these factors. Conclusion: Our prediction model may aid in the early identification of individuals at the greatest risk of developing KOA within 4 years.
activities of daily living; KOA: Knee osteoarthritis; LPA: Light physical activity; MPA: Moderate physical activity; MS: Metabolic syndrome; OR: Odds ratio; PA: Physical activity; PRO: Patient-reported outcome; SD: Standard deviation; TRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis; VPA: Vigorous physical activity.


Background
Knee osteoarthritis (KOA) is among the most common chronic diseases leading to disability worldwide, carrying a substantial and increasing health burden [1,2]. The prevalence of systematic KOA and radiographic KOA in people over 60 years of age ranges from 10.0-16.0% and 35.0-50.0%, respectively [3][4][5][6][7], and approximately 250 million people worldwide are living with KOA. In the United States, the prevalence of KOA has increased twofold in men and threefold in women over the past 20 years [5], with systematic KOA affecting approximately 15.1 million individuals nationally [8]. The number of people with KOA in China is considerably higher than in the United States, owing to its larger population and higher prevalence rates. The increasing prevalence of KOA has in turn increased the socioeconomic burden on affected individuals and healthcare systems.
To date, there are no effective therapeutic strategies for KOA. Prediction models for KOA synthesize multiple factors to estimate an individual's risk of incident disease and may allow for early detection and prevention [9]. The Nottingham KOA model was an early model for predicting 12-year KOA risk in middle-aged adults and included easily obtainable factors such as age, sex, family history, body mass index (BMI), occupational risk, and history of knee injury [10]. However, this model was developed using data from only two communities in North Nottinghamshire rather than a random sample of the general population, which limits its validity in other populations [11]. Several studies have developed prediction models based on genomic data [12][13][14][15] or radiographic/clinical biomarkers [16] such as hip α-angle and spinal bone mineral density. However, these models are difficult to implement in clinical practice because of their high cost or complexity [17].
The primary risk factors for incident KOA include advanced age, female sex, overweight/obesity, knee injury, and smoking [18][19][20][21]; smoking has been associated with a decreased risk of KOA, whereas the other factors increase the risk. Although physical activity [22,23], occupational factors [18], ethnicity, and genetics [19] have also been associated with the incidence and/or progression of KOA, previous studies have reported inconsistent results owing to methodological differences. Other potential risk factors for the development of KOA include metabolic syndrome [24][25][26], waist circumference [27], and depressive symptoms [18,28], although findings regarding these factors remain controversial. Previous studies have reported a bidirectional association between osteoarthritis and certain comorbidities (e.g., hypertension, ischemic heart disease, and diabetes) [18,19], suggesting that these comorbidities may also influence the incidence and progression of KOA. However, existing KOA risk models have not included these potential risk factors, and no model is currently available for predicting KOA risk in the Chinese population. Therefore, in the present study, we aimed to develop a model that incorporates these potential factors to predict the 4-year risk of KOA, using survey data from a random, nationwide sample of Chinese individuals.

Study design and data source
The present retrospective cohort study used 4-year data from the China Health and Retirement Longitudinal Study (CHARLS), a nationwide study of Chinese adults aged 45 years or older whose detailed cohort profile has been published [29]. The national baseline survey was conducted between June 2011 and March 2012 (CHARLS2011), and 17,708 respondents from 150 counties/districts and 450 villages/resident committees were recruited using a multistage sampling strategy. Respondents are followed up every 2 years through face-to-face computer-assisted personal interviews. Detailed information on demographic background, socioeconomic status, biomedical findings, health status, and functioning was collected at baseline and at each follow-up using a structured questionnaire [29]. Blood samples were also obtained at each time point. The present study included participants recruited in CHARLS2011 and re-examined in CHARLS2015.

Outcomes
The primary outcome was incident systematic KOA during the 4-year follow-up period. In accordance with the definition used in a previous study, systematic KOA was defined as physician-diagnosed arthritis accompanied by concurrent pain in the knee joint [30]. Incident systematic KOA was defined as being free of systematic KOA in CHARLS2011 and being diagnosed with systematic KOA in CHARLS2015. Knee pain was assessed using the following question: "Are you often troubled by pain in any part of your body?" Participants who answered in the affirmative were then asked: "In what part of your body do you feel pain?"
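The case definition above combines two survey items across two waves. A minimal sketch in Python, assuming hypothetical boolean fields (`physician_dx_arthritis`, `often_in_pain`) and a set of reported pain sites; these names are illustrative and are not CHARLS codebook variables:

```python
from typing import AbstractSet, Optional


def has_systematic_koa(physician_dx_arthritis: bool,
                       often_in_pain: bool,
                       pain_sites: AbstractSet[str]) -> bool:
    """Systematic KOA: physician-diagnosed arthritis with concurrent knee pain.
    The pain-site item is only asked when frequent pain is reported."""
    return physician_dx_arthritis and often_in_pain and "knee" in pain_sites


def incident_status(koa_2011: bool, koa_2015: bool) -> Optional[int]:
    """None: prevalent at baseline (excluded); 1: incident case; 0: non-case."""
    if koa_2011:
        return None
    return 1 if koa_2015 else 0


# Hypothetical participant: arthritis diagnosis, but knee pain reported only in 2015.
baseline = has_systematic_koa(True, False, set())
follow_up = has_systematic_koa(True, True, {"knee", "lower back"})
print(incident_status(baseline, follow_up))  # 1
```

Returning `None` for prevalent cases keeps them out of the analysis set rather than silently coding them as non-cases.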

Model structure
We used multiple imputation by chained equations [31], generating five imputed datasets, to fill in missing values for systolic blood pressure, diastolic blood pressure, triacylglycerol, HDL cholesterol, fasting blood glucose, duration and frequency of physical activity, CESD-10 scores, and smoking status.
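The text does not state how results from the five imputed datasets were combined. If estimates were pooled, the standard approach is Rubin's rules: average the estimates, then add the average within-imputation variance to an inflated between-imputation variance. A minimal sketch with hypothetical log-odds coefficients (not values from this study):

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one parameter across m imputed datasets with Rubin's rules.
    Returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b    # total variance
    return q_bar, math.sqrt(total)


# Hypothetical coefficients and squared standard errors from five imputed datasets.
estimates = [0.30, 0.28, 0.31, 0.29, 0.27]
variances = [0.08 ** 2] * 5
q_bar, se = pool_rubin(estimates, variances)
print(round(q_bar, 3), round(se, 4))  # 0.29 0.0819
```

The `(1 + 1/m)` factor accounts for using a finite number of imputations; with only five datasets it is a noticeable correction.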
We used descriptive statistics (means and standard deviations for continuous data and counts and percentages for categorical data) to report key variables.
We conducted univariable and multivariable logistic regression analyses to develop a model for predicting the risk of KOA. All candidate variables were first evaluated in unconditional univariable logistic regression analyses, and variables were then selected for the multivariable logistic regression analysis according to their statistical significance and clinical relevance. In the multivariable analysis, stepwise selection based on the Akaike information criterion (AIC) was used to determine the final model structure. The coefficients, odds ratios (ORs), and 95% confidence intervals (CIs) were estimated via bootstrapping with 1,000 replications to obtain stable parameter estimates [32].
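Stepwise AIC selection repeatedly compares candidate models using AIC = 2k − 2 ln L (k parameters, maximized likelihood L) and accepts a change only if it lowers the AIC. A single such comparison can be shown without a model-fitting library because, for one binary predictor, the logistic maximum-likelihood fit reproduces each group's observed proportion. The counts below are hypothetical, not CHARLS data; a full stepwise search simply repeats this comparison over many candidate terms:

```python
import math


def bernoulli_loglik(events: int, n: int) -> float:
    """Maximized Bernoulli log-likelihood when every subject shares p = events / n."""
    p = events / n
    return events * math.log(p) + (n - events) * math.log(1 - p)


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik


# Hypothetical counts (not CHARLS data): events / participants in two groups.
exposed_events, exposed_n = 120, 1000
unexposed_events, unexposed_n = 80, 1000

# Intercept-only logistic model: 1 parameter, one common incidence.
ll_null = bernoulli_loglik(exposed_events + unexposed_events, exposed_n + unexposed_n)
# Intercept + one binary predictor: 2 parameters; the fit matches each group's own
# proportion, so the log-likelihood is the sum over the two groups.
ll_pred = bernoulli_loglik(exposed_events, exposed_n) + bernoulli_loglik(unexposed_events, unexposed_n)

aic_null, aic_pred = aic(ll_null, 1), aic(ll_pred, 2)
keep_predictor = aic_pred < aic_null  # a stepwise step keeps the variable if AIC drops
print(round(aic_null, 2), round(aic_pred, 2), keep_predictor)  # 1302.33 1295.39 True
```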

Internal validation
The multivariable models were internally validated using a bootstrap procedure (1,000 resamples drawn with replacement) to obtain bias-corrected estimates of predictive performance.
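Bias-corrected bootstrap validation is commonly implemented as Harrell's optimism correction: refit the whole modeling procedure in each resample, measure how much better it performs on its own resample than on the original data, and subtract the average gap from the apparent AUC. The text does not detail the exact procedure, so this is an assumption. In the pure-Python sketch below, `fit_mean_difference` is a hypothetical stand-in for the logistic model (a real application would refit the regression, including variable selection), the data are simulated, and 200 rather than 1,000 resamples are drawn to keep it quick:

```python
import random


def auc(scores, labels):
    """Mann-Whitney AUC: P(case score > non-case score), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(rows, labels):
    """Hypothetical toy model: weight each predictor by its case-minus-non-case
    mean difference and score each row by the weighted sum."""
    cases = [r for r, y in zip(rows, labels) if y == 1]
    controls = [r for r, y in zip(rows, labels) if y == 0]
    w = [sum(r[j] for r in cases) / len(cases) - sum(r[j] for r in controls) / len(controls)
         for j in range(len(rows[0]))]
    return lambda row: sum(wj * xj for wj, xj in zip(w, row))


def optimism_corrected_auc(rows, labels, fit, n_boot=200, seed=1):
    """Return (apparent AUC, bootstrap optimism-corrected AUC)."""
    rng = random.Random(seed)
    n = len(labels)
    model = fit(rows, labels)
    apparent = auc([model(r) for r in rows], labels)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rb = [rows[i] for i in idx]
        yb = [labels[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # AUC is undefined if a resample lacks cases or non-cases
        mb = fit(rb, yb)  # refit on the bootstrap sample
        optimism.append(auc([mb(r) for r in rb], yb) - auc([mb(r) for r in rows], labels))
    return apparent, apparent - sum(optimism) / len(optimism)


# Simulated data (not CHARLS): one informative predictor plus five noise predictors.
gen = random.Random(0)
labels = [1 if gen.random() < 0.3 else 0 for _ in range(150)]
rows = [[y + gen.gauss(0, 1)] + [gen.gauss(0, 1) for _ in range(5)] for y in labels]
apparent, corrected = optimism_corrected_auc(rows, labels, fit_mean_difference)
print(round(apparent, 3), round(corrected, 3))
```

Because the noise predictors let the toy model partly fit chance patterns, the corrected AUC is typically somewhat lower than the apparent one.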

Model performance
We assessed the predictive performance of the final model using discrimination and calibration measures. Discrimination refers to the model's ability to distinguish participants who experience the event from those who do not; in this study, it was quantified using the area under the receiver operating characteristic curve (AUC). Calibration refers to how closely the predicted risk agrees with the observed risk and was assessed visually using calibration plots.
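Calibration plots are typically built by grouping participants by predicted risk and plotting each group's mean predicted risk against its observed event rate; points near the diagonal indicate good calibration. A minimal sketch of that grouping step; the number of groups is an assumption (the text does not state it) and the data are toy values:

```python
def calibration_table(predicted, observed, groups=10):
    """Sort by predicted risk, split into roughly equal groups, and return
    (mean predicted risk, observed event rate, group size) per group."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # skip empty groups when n < groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(o for _, o in chunk) / len(chunk)
        table.append((mean_pred, obs_rate, len(chunk)))
    return table


# Toy example (not CHARLS data): two risk strata that happen to be perfectly calibrated.
predicted = [0.1] * 10 + [0.5] * 10
observed = [1] + [0] * 9 + [1] * 5 + [0] * 5
for mean_pred, obs_rate, size in calibration_table(predicted, observed, groups=2):
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={size}")
```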

Clinical scoring tool
We developed a points-based risk-scoring tool from the final model for easy clinical use, a widely used approach to clinical scoring [17]. This tool can be used to identify individuals at high risk of developing KOA during the following 4 years. We categorized continuous factors based on the results of meta-analyses and clinical practice guidelines. Points for each category were obtained by multiplying the β coefficients (log odds) from the multivariable logistic regression model by 10 and rounding to the nearest integer. The total score was calculated by summing the points for all variables. Sensitivity, specificity, and the AUC were calculated at different cut-off values, and the maximal Youden index was used to identify the optimal cut-off point [33]. The Youden index was calculated as sensitivity + specificity − 1.
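The scoring and cut-off rules above can be sketched in a few lines. The coefficients below are illustrative only: they are back-calculated as ln(OR) from adjusted odds ratios quoted in the Discussion (female sex 1.33, rural residence 1.24, hip fracture 1.53) and need not match the categories or points in Table 4. The Youden search uses toy scores, not study data:

```python
import math


def points_from_coefficients(betas):
    """Points per category: 10 x log-odds coefficient, rounded to the nearest integer.
    (Python's round() sends exact .5 ties to the nearest even integer.)"""
    return {name: round(10 * b) for name, b in betas.items()}


def youden_optimal_cutoff(scores, labels):
    """Return (J, cutoff, sensitivity, specificity) maximizing J = sens + spec - 1,
    treating a score >= cutoff as test-positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best


# Illustrative betas = ln(OR) for ORs quoted in the Discussion; Table 4 may differ.
betas = {"female": math.log(1.33), "rural": math.log(1.24), "hip_fracture": math.log(1.53)}
points = points_from_coefficients(betas)
print(points)  # {'female': 3, 'rural': 2, 'hip_fracture': 4}

# Toy total scores and outcomes (not CHARLS data).
scores = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 0, 1]
print(youden_optimal_cutoff(scores, labels))  # (0.75, 4, 1.0, 0.75)

# Youden index at the reported cut-off (sensitivity 63.3%, specificity 66.0%).
print(round(0.633 + 0.660 - 1, 3))  # 0.293
```

The sketch uses observed scores as thresholds. With integer scores, a midpoint cut-off such as 20.5 classifies participants exactly as a threshold of 21 does.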
The present study was conducted in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines for model development and reporting. All analyses were performed using STATA version 15.1 (STATA Corporation, College Station, TX) and R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria). All statistical tests were two-sided and p-values <0.05 were considered statistically significant.

Ethics statement
Given that the present study is a secondary analysis of publicly available CHARLS data, the Medical Ethics Board Committee of Peking University granted the study an exemption from review.

Results
In CHARLS2011, physical activity measures were available for 3,684 participants, and blood samples were available for 11,847 participants. Complete KOA data in both CHARLS2011 and CHARLS2015 were available for 9,204 of these participants. Seven participants were excluded because they declined body measurement assessments, leaving over 50% of variables unavailable; among them, one had been diagnosed with KOA in 2011 and one developed KOA in 2015. Of the remaining 9,197 participants, an additional 1,004 were excluded because they had KOA at baseline (CHARLS2011). Thus, data from 8,193 participants were used to develop the model, of whom 815 developed systematic KOA over the following 4 years. Table 1 shows the prevalence of systematic KOA in CHARLS2011 and the 4-year cumulative incidence of systematic KOA in CHARLS2015 for the whole population and for subgroups stratified by sex and age (≤60 years and >60 years).
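The participant flow above can be checked arithmetically; all counts are taken from the text:

```python
complete_both_waves = 9204    # complete KOA data in CHARLS2011 and CHARLS2015
declined_measurements = 7     # excluded: declined body measurements
prevalent_at_baseline = 1004  # excluded: KOA already present in CHARLS2011
incident_cases = 815          # new systematic KOA by CHARLS2015

analysed = complete_both_waves - declined_measurements - prevalent_at_baseline
cumulative_incidence = incident_cases / analysed
print(analysed, f"{cumulative_incidence:.2%}")  # 8193 9.95%
```

This reproduces both the analysis sample of 8,193 and the 9.95% cumulative incidence reported in the abstract.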
The mean age was 58.82 years (standard deviation [SD]: 9.01 years), and 4,251 participants (51.89%) were female. At baseline, 23.31% of participants had difficulty with ADLs/IADLs, 17.08% had metabolic syndrome, and 44.66% reported one or two chronic comorbidities. A history of hip fracture was reported by 252 (3.08%) participants. Other baseline characteristics are summarized in Table 2.

Univariable and multivariable analysis
Table 3 shows the results of the univariable and multivariable analyses based on the imputed dataset, along with the number of missing values for each variable. In the univariable analysis, age was identified as a risk factor for KOA, with the largest difference occurring between the 60-64 and 65-69 age groups.
Female sex, rural residence, history of hip fracture, ADL/IADL difficulty, severe depressive symptoms, more chronic comorbidities, poor health status, and higher levels of MPA were associated with an increased risk of developing KOA, whereas smoking was associated with a decreased risk. Although high BMI/waist circumference and metabolic syndrome were also positively associated with incident KOA, these associations were not significant. Given the important role of metabolism in the development of KOA, we included waist circumference and metabolic syndrome in the multivariable model because of their relatively small p-values.
The final prediction model included ten variables: age, sex, waist circumference, residential area, ADL/IADL difficulty, history of hip fracture, depressive symptoms, number of chronic comorbidities, health status, and level of MPA.

Model performance
The discrimination and calibration curves for the model are shown in Figure 1a and …

Clinical score model
We developed a simple clinical score based on the ten variables included in the final multivariable model (Table 4). Total scores range from 0 (lowest risk) to 51 (greatest risk). This clinical score may aid in identifying individuals at the greatest risk of developing KOA within the next 4 years. The AUC of the risk score was 0.713. Based on the maximal Youden index, the optimal cut-off was a score of 20.5, with participants scoring at or above this value being most likely to develop KOA within 4 years. At this cut-off, the sensitivity and specificity were 63.3% and 66.0%, respectively.
Following a previously published score model [16], the probability of incident KOA within 4 years was estimated by dividing the total risk score by 51 (the maximum possible score) and multiplying by 100%.
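The conversion described above is a linear rescaling of the points total. A minimal sketch; note that it reproduces the rule stated in the text and is not the same quantity as the predicted probability from the underlying logistic model:

```python
MAX_SCORE = 51  # maximum total score of the clinical score


def score_to_percent(total_score: int) -> float:
    """Rescale a total score (0-51) to a percentage, as described in the text."""
    if not 0 <= total_score <= MAX_SCORE:
        raise ValueError(f"score must be between 0 and {MAX_SCORE}")
    return total_score / MAX_SCORE * 100


print(round(score_to_percent(21), 1))  # 41.2
```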

Discussion
In the present study, we developed and internally validated a model for predicting the 4-year risk of systematic KOA in the Chinese population, based on data from the CHARLS cohort. We also developed an easy-to-use clinical score to identify individuals at the greatest risk of developing KOA. The final prediction model included ten convenient and accessible variables: age, sex, waist circumference, residential area, ADL/IADL difficulty, history of hip fracture, depressive symptoms, number of chronic comorbidities, self-reported health status, and level of MPA. Among these variables, age, sex, and waist circumference are the factors most commonly included in previous models for predicting KOA risk, whereas the others have been controversial or have not previously been considered in relation to KOA risk. To our knowledge, ours is the first model for predicting KOA risk in China, and our results suggest that this model could aid in the prevention of KOA and thereby help reduce the disease burden.
Although older age was identified as a risk factor for KOA in our study, the largest increase in risk was observed in the 60-69 years age group. A previous study estimated that the cumulative incidence of systematic KOA increases gradually from 45 years of age and rapidly after 55 years of age, with the peak rate of increase at approximately 65 years of age and no significant further increase after 70 years of age [34]. Taken together with these previous results, our findings highlight the need to prevent incident KOA in individuals aged 45 to 70 years. Consistent with previous studies [18,19], female sex was also identified as a significant risk factor for incident KOA: after adjustment for other factors, the odds of developing KOA were 33% higher in women than in men.
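The percentages quoted in this Discussion come from adjusted odds ratios, so "33% higher" refers to odds. With a 4-year incidence near 10%, an odds ratio somewhat overstates the corresponding risk ratio. The sketch below converts an OR into a percent change in odds and into an approximate risk ratio using the Zhang and Yu (1998) formula RR ≈ OR / (1 − p0 + p0 × OR). Using the overall incidence (9.95%) as p0 is an assumption, since the reference-group incidence is not reported here, and the formula is only a rough approximation for adjusted ORs:

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio."""
    return (odds_ratio - 1) * 100


def approx_risk_ratio(odds_ratio: float, p0: float) -> float:
    """Zhang-Yu approximation of a risk ratio from an odds ratio,
    given the outcome incidence p0 in the reference group."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)


or_female = 1.33  # adjusted OR implied by "33% higher" in the text
p0 = 0.0995       # ASSUMPTION: overall 4-year incidence used as reference-group incidence
print(round(percent_change_in_odds(or_female)), round(approx_risk_ratio(or_female, p0), 2))  # 33 1.29
```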
Overweight/obesity increases the risk of KOA, in part because obesity creates an abnormal loading environment for weight-bearing joints. Obesity may also contribute to the pathogenesis of KOA by promoting chronic low-grade systemic inflammation or by inducing mechanical damage to joint tissues, which in turn may lead to local inflammation. Together, these factors may weaken and degrade joint tissues [35]. Alternatively, the increased risk of KOA may be caused by the positive energy balance and metaflammation associated with obesity [36]. Metabolic syndrome has also been associated with the incidence and progression of KOA [36]. Although BMI has been shown to be an important predictor of KOA [37], Wallace et al. [35] reported that a larger abdominal size is associated with a greater risk of radiographic KOA than a high BMI. Further studies are required to determine whether BMI or waist circumference better captures the influence of mechaflammation and metaflammation on KOA risk. In the present study, we separately analyzed the effects of BMI, waist circumference, and metabolic syndrome on incident KOA in the Chinese population using univariable logistic regression analyses. However, none of these three factors was a significant predictor of incident KOA. Although similar non-significant results have been reported for other ethnic groups [38], additional studies are required to verify this lack of association in the Chinese population.
Our findings also indicated that depressive symptoms are a significant predictor of KOA risk. Participants with mild or moderate-to-severe depressive symptoms had approximately two and three times the odds of developing KOA, respectively, compared with those without depressive symptoms. Previous studies [39,40] have shown that KOA and other chronic diseases increase the odds of depression owing to long-term treatment, chronic pain, or high treatment costs. Indeed, the pooled prevalence of depressive symptoms in patients with osteoarthritis has been reported as 19.9% [39]. Xu et al. demonstrated that depression, hypertension, diabetes, arthritis, asthma, and osteoarthritis are prone to co-occur with other conditions [41]. Among patients without any comorbidities at baseline, the risk of KOA remained higher in those with depressive symptoms than in those without [42]. Although a bidirectional causal association has rarely been demonstrated between depression and arthritis or any other chronic disease, depressive symptoms may be a significant risk factor for KOA. Targeted strategies for addressing depressive symptoms may therefore aid in reducing the incidence of KOA.
As patients with chronic diseases often have comorbidities that interact with the disease in complex ways [41], we assessed 12 major types of comorbidities. The odds of developing KOA were 24% higher in participants with one or two comorbidities, and 54% higher in those with three or more comorbidities, than in those without comorbidities. KOA and some comorbidities may accelerate each other's progression [18], and comorbidities have been shown to predict worse pain and deteriorating physical function in patients with KOA [43]. The present study considered the effect of comorbidities when developing a prediction model for KOA in the Chinese population. Further studies including different ethnic groups are required to determine whether this model can be applied more broadly.
Although physical activity is a modifiable risk factor for KOA, reported associations between physical activity and KOA outcomes have been inconsistent. In the present study, we observed no significant association between VPA/LPA and incident KOA; however, MPA positively predicted incident KOA. After adjustment for other factors, the odds of developing KOA were 30% higher in individuals engaging in any level of MPA (low, middle, or high) than in those engaging in no MPA. Notably, a previous study reported that walking and other recreational activities did not increase the risk of OA in older adults [22]. Results from the Chingford cohort showed that work-related (OR: 1.48, 95% CI: 0.34-5.64) and sports-related (OR: 1.23, 95% CI: 0.54-2.81) physical activity were associated with an increased risk of osteophytes, whereas activities such as walking were associated with a decreased risk (OR: 0.60, 95% CI: 0.22-1.71) in middle-aged women [44]. However, because none of these differences was statistically significant, additional studies are required to verify these effects. Findings from the Framingham Heart Study [45] indicate that long hours of heavy habitual physical activity per day are associated with an increased risk of KOA (OR: 1.3 per hour), with stronger effects when strenuous physical activity is performed for more than 3 h per day (OR: 2.9, 95% CI: 1.2-6.9 for radiographic KOA; OR: 5.3, 95% CI: 1.2-24 for systematic KOA). In that study, the effects of MPA and LPA were not significant. The discrepancy between our findings and previous results may be due in part to differences in how physical activity was assessed or in population characteristics. Therefore, additional studies should verify the influence of different types of physical activity on the risk of KOA. Such studies should seek to determine the most appropriate type, duration, frequency, and intensity of physical activity for preventing KOA in different populations.
Assessments of difficulty in performing ADLs or IADLs may aid in predicting incident KOA. In the present study, the odds of developing KOA were 49.0% higher in individuals with ADL/IADL difficulty than in those without difficulty. Impairments in ADLs prior to KOA onset may represent an early warning signal for KOA. Hence, preventive interventions may be useful in reducing the incidence of KOA among those who have difficulty with ADLs/IADLs.
Our prediction model for KOA included one patient-reported outcome (PRO): self-rated health status. Numerous previous studies have focused on PROs, which may capture important disease-related information before the onset of clinical signs or pathophysiological changes [46]. Although Silverwood et al. noted that poor self-rated health status was a potential risk factor for KOA, no significant associations were observed [18]. In the present study, the odds of developing KOA increased as self-rated health status worsened. Compared with participants who rated their health status as "very good", those who rated it as "fair" had 21% higher odds of KOA, while those who rated it as "poor" or "very poor" had 199% and 267% higher odds, respectively. Self-ratings of health status comprehensively reflect one's physical and psychological function, as well as one's knowledge of and ability to cope with diseases and one's self-efficacy. Although previous researchers have identified several potential risk factors, most of these factors were derived from epidemiological analyses or clinical experience. Our findings highlight the need to consider the patient's perspective, which may further our understanding of KOA and help reduce its incidence.
Given that the knee and hip are the two most important weight-bearing joints, we sought to determine whether a history of hip fracture increases the risk of developing KOA. Our findings indicated that a history of hip fracture was associated with 53% higher odds of KOA. Related studies [47,48] have demonstrated that rheumatoid arthritis increases the risk of hip fracture owing to bone loss induced by chronic inflammation, glucocorticoid use, and physical inactivity. However, studies reporting an association between hip fracture and KOA incidence are rare. Further studies are therefore required to determine whether hip fracture can be used to predict KOA risk and to identify the potential mechanisms underlying this association.
In our study, the 4-year cumulative incidence of KOA in rural and urban areas was 10.9% and 8.1%, respectively. After controlling for other factors, rural residents had 24% higher odds of developing KOA than urban residents. Several previous studies have also noted that symptomatic KOA is more common among rural than urban residents of China [30,49].
Limited access to knowledge regarding the prevention of KOA and other chronic diseases, a lack of economic resources for timely treatment of chronic diseases, poor ability to manage one's health, and earlier impairments in physical function due to strenuous farm work may explain the increased risk of KOA in rural residents. These results suggest that policies and resources should be directed toward preventing KOA among residents of rural areas in China.
The present study has several limitations. First, although we used multiple imputation to handle missing data and bootstrapping to limit the influence of bias, incomplete data may still have biased our findings. Second, to predict KOA risk efficiently, we considered some new variables as predictors in our model. Although these variables were significantly associated with incident KOA, further studies are required to elucidate the mechanisms underlying these associations. Lastly, although we used a nationally representative cohort to develop the prediction model, the model was validated only internally. Therefore, external validation in other Chinese populations and in different ethnic groups remains necessary.

Conclusion
In the present study, we developed the first model for predicting the 4-year risk of systematic KOA in China using CHARLS data. Our simple score may aid in the early identification of individuals at the greatest risk of developing KOA within 4 years in clinical practice or community settings. Such early identification may allow for improved patient education and modification of certain risk factors, which may in turn reduce KOA incidence and progression.

Declarations
Ethics approval and consent to participate
The Medical Ethics Board Committee of Peking University granted the study an exemption from review.

Consent for publication
Not applicable.

Data availability statement
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
Competing interests