Development and validation of a nomogram to better predict hypertension based on a 10-year retrospective cohort study in China

Background: Hypertension is a highly prevalent disorder. A nomogram to estimate the risk of hypertension in Chinese individuals is not available. Methods: A total of 6201 subjects were enrolled in the study and randomly divided into a training set and a validation set at a ratio of 2:1. The LASSO regression technique was used to select the optimal predictive features, and multivariate logistic regression was used to construct the nomograms. The performance of the nomograms was assessed and validated by AUC, C-index, calibration curves, DCA, clinical impact curves, NRI, and IDI. Results: The nomogram140/90 was developed with the parameters of family history of hypertension, age, SBP, DBP, BMI, MCHC, MPV, TBIL, and TG. The AUCs of nomogram140/90 were 0.750 in the training set and 0.772 in the validation set. The C-indices of nomogram140/90 were 0.750 in the training set and 0.772 in the validation set. The nomogram130/80 was developed with the parameters of family history of hypertension, age, SBP, DBP, RDWSD, and TBIL. The AUCs of nomogram130/80 were 0.705 in the training set and 0.697 in the validation set. The C-indices of nomogram130/80 were 0.705 in the training set and 0.697 in the validation set. Both nomograms demonstrated favorable clinical consistency. NRI and IDI showed that the nomogram140/90 performed better than the nomogram130/80. Therefore, a web-based calculator for nomogram140/90 was built online. Conclusions: We have constructed a nomogram that can be effectively used in the preliminary and in-depth risk prediction of hypertension in a Chinese population based on a 10-year retrospective cohort study. Funding: This study was supported by the Hebei Science and Technology Department Program (no. H2018206110).


Introduction
Systemic arterial hypertension (hereafter referred to as hypertension) is the most common risk factor for cardiovascular diseases and the biggest contributor to world mortality from noncommunicable diseases (Mills et al., 2020; Burnier and Egan, 2019). Globally, the number of adults with hypertension increased from 594 million in 1975 to 1.13 billion in 2015; the increase was especially pronounced in low-income and middle-income countries (NCD Risk Factor Collaboration (NCD-RisC), 2017). The number of adults with hypertension is predicted to rise to 1.56 billion by 2025 (Kearney et al., 2005). In China, high systolic blood pressure (SBP) is the leading risk factor for both number of deaths and percentage of disability-adjusted life-years, and it accounted for 2.54 million deaths in 2017 (Zhou et al., 2019). In addition, according to the latest nationwide survey of 451,755 participants from 31 provinces in China, 23.2% (nearly 244.5 million) of Chinese adults have hypertension. The data generated in the China Patient-Centered Evaluative Assessment of Cardiac Events Million Persons Project have shown more serious results (Joint Committee for Guideline Revision, 2019). Among individuals with hypertension, while 46.9% are aware of their condition and 40.7% take prescribed antihypertensive medications, only 15.3% have their blood pressure under control. Hypertension imposes such a heavy economic burden on healthcare systems that it requires urgent attention. Early detection of hypertension is vitally important for its control and effective treatment, especially in high-risk subjects.
Hypertension, also known as high blood pressure, is characterized by a persistent elevation of blood pressure in the systemic arteries. Traditionally, hypertension was diagnosed in untreated participants with SBP ≥140 mmHg and/or diastolic blood pressure (DBP) ≥90 mmHg, or in those taking medication for hypertension. These criteria were broadly accepted by both the 2018 Chinese Guidelines for Prevention and Treatment of Hypertension and the 2018 European Society of Cardiology/European Society of Hypertension guidelines (2018 ESC/ESH) (Joint Committee for Guideline Revision, 2019; Williams et al., 2018). However, in November 2017, the American College of Cardiology and the American Heart Association published a guideline for the Prevention, Detection, Evaluation, and Management of High Blood Pressure in Adults (2017 ACC/AHA) (Whelton et al., 2018), which redefined the diagnostic criteria of hypertension from 140/90 mmHg to 130/80 mmHg for SBP/DBP. This conspicuous change in numerical thresholds increases the number of patients diagnosed with hypertension and raises questions about the clinical applicability of the new target, given the financial burden and clinical outcomes (López-Jaramillo et al., 2020). The applicability and potential impact of the 2017 ACC/AHA guideline need to be assessed prior to adopting it, especially in China.
Hypertension is regarded as a complex and multifactorial trait. It is well known that the pathophysiology of hypertension is shaped by the combined action of environmental, genetic, anatomical, neural, endocrinal, humoral, and hemodynamic factors (Rodriguez-Iturbe et al., 2017). For example, the Dietary Approaches to Stop Hypertension diet is reported to be closely related to a lower risk of hypertension (Navarro-Prado et al., 2020; Francisco et al., 2020). Moreover, psychosocial factors are also possible potentiators and triggers of hypertension. It has been shown that psychosocial stress, including occupational stress, socioeconomic pressure, anxiety, and depression, is associated with a greater risk of hypertension, and that hypertensive patients have a higher level of psychosocial stress than normotensive subjects (Liu et al., 2017). Therefore, a simple and reliable model that helps clinicians or subjects to estimate the risk of hypertension is urgently needed.
In the present study, we aimed to develop and validate a risk prediction model for the screening of hypertension by analyzing the routine parameters of physical examination in China.

Characteristics of subjects
In Group 140/90, with a cut-off value of 140/90 mmHg, the total prevalence of hypertension in 2019 was 24.77% (1536 subjects). At a ratio of 2:1, 4134 subjects were assigned to the training set and 2067 to the validation set. The prevalence of hypertension was 25.35% (1048 subjects) in the training set and 23.61% (488 subjects) in the validation set. The characteristics of subjects are shown in Table 1. There were no significant differences between the two sets in hypertension status in 2019, gender, family history of hypertension, smoking status, drinking status, age, SBP, DBP, height, weight, body mass index (BMI), white blood cell count (WBC), lymphocyte count (LYMPH), neutrophil count (NEUT), lymphocyte percentage (LYMPHP), neutrophil percentage (NEUTP), red blood cell count (RBC), hemoglobin (HGB), hematocrit (HCT), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean cell hemoglobin concentration (MCHC), red blood cell distribution width-coefficient of variation (RDWCV), red blood cell distribution width standard deviation (RDWSD), platelet count (PLT), mean platelet volume (MPV), plateletcrit (PCT), platelet distribution width (PDW), middle cell count (MID), middle cell percentage (MIDP), alanine aminotransferase (ALT), aspartate transaminase (AST), total protein (TP), albumin (ALB), total bilirubin (TBIL), direct bilirubin (DBIL), glucose (GLU), cholesterol (CHOL), triglycerides (TG), neutrophil-to-lymphocyte ratio (NLR), or platelet-to-lymphocyte ratio (PLR). In Group 130/80, with a cut-off value of 130/80 mmHg, the total prevalence of hypertension in 2019 was 37.92% (1430 subjects). At a ratio of 2:1, 2514 subjects were assigned to the training set and 1257 to the validation set. The prevalence of hypertension was 37.39% (940 subjects) in the training set and 38.98% (490 subjects) in the validation set.
The characteristics of subjects are shown in Table 2. [...] (Table 3). These 10 independent factors were used to construct the nomogram 140/90 (Figure 2A).
In Group 130/80, 21 variables had nonzero coefficients in the LASSO regression model based on the analysis of the training set. These variables included family history of hypertension, drinking status, age, SBP, DBP, weight, BMI, WBC, NEUT, LYMPHP, RBC, RDWSD, PCT, PDW, ALT, AST, TP, TBIL, GLU, CHOL, and TG (Figure 1C, D). Multivariate logistic regression analysis revealed that family history of hypertension, age, SBP, DBP, RDWSD, and TBIL were independent risk factors for hypertension (Table 4). These six independent factors were used to construct the nomogram 130/80 (Figure 2B).
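A nomogram of this kind is a graphical rendering of the fitted logistic regression model: each predictor contributes to a linear predictor, which is then mapped to a probability. The following is a minimal sketch of that mapping. The function name, the intercept, and the coefficient values are hypothetical placeholders for illustration only; they are not the fitted values reported in Table 3 or Table 4.

```python
import math


def predicted_risk(intercept, coefs, values):
    """Logistic-regression risk: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# HYPOTHETICAL coefficients and subject, for illustration only.
# These are NOT the coefficients estimated in the study.
example_coefs = {"age": 0.03, "SBP": 0.04, "DBP": 0.03}
example_subject = {"age": 45, "SBP": 122, "DBP": 76}

risk = predicted_risk(-9.0, example_coefs, example_subject)
print(f"Predicted risk (illustrative only): {risk:.3f}")
```

On a printed nomogram the same calculation is done by reading points for each predictor, summing them, and reading the risk off the total-points axis.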

Clinical utility of nomogram 140/90 and nomogram 130/80
The decision curve analysis (DCA) showed that the nomogram 140/90 and nomogram 130/80 each had a greater net benefit for the identification of hypertension than any single factor in their respective training sets (Figure 5A, B). Similar results were found in the validation sets (Figure 5C, D). In addition,

Website of nomogram 140/90
The web-based, user-friendly calculator of nomogram 140/90 (https://haijianglaoqi.shinyapps.io/Risk-ofhypertension/) was developed and is freely available online to help patients and physicians calculate the risk of hypertension.

Discussion
Accurate and timely diagnosis of hypertension is of great importance for effective therapy. Therefore, it is necessary to establish a model to estimate the risk of hypertension to aid in risk stratification and management. Some hypertension risk prediction models have been preliminarily developed in different populations in the past decade, such as Iranian, Korean, Japanese, and Indian populations (Bozorgmanesh et al., 2011; Lim et al., 2013; Otsuka et al., 2015; Sathish et al., 2016). In these models, the risk factors for hypertension varied widely across studies. Age, SBP, DBP, and current smoking status were the most common independent factors in the studied populations, and all of them were included in the four prediction models. However, there is no agreement among investigators as to what constitutes a major predictor. This suggests that a hypertension risk prediction model developed in one racial, ethnic, or national group may not be directly applicable to other populations. In China, some hypertension risk prediction models have also been established. Chen et al., 2016 constructed a sex-specific multivariable hypertension prediction model based on a northern urban Han Chinese population. The predictive model yielded an AUC of 0.761 for men and 0.753 for women. The limitation of their study is that it did not perform internal or external validation. Xu et al., 2019 constructed several predictive models for hypertension among Chinese rural populations. In the training set, AUCs ranged from 0.720 to 0.767 for men and from 0.740 to 0.809 for women. In the testing set, AUCs ranged from 0.722 to 0.773 for men and from 0.698 to 0.765 for women. Each of the two studies mentioned above was carried out in a single rural or urban area. Another prediction model for hypertension based on a large cross-sectional study was established recently (Ren et al., 2020).
Despite the large sample (73,158 samples), the predictive performance of that model was assessed only by the probability of disease (POD) index and AUC values (76.52% in the training set and 75.81% in the test set). Therefore, the present study might be the first to develop a nomogram for the prediction of hypertension based on systematic assessment and validation in China.
In this study, according to two diagnostic criteria of hypertension, we developed and validated two nomograms for the prediction of hypertension based on a 10-year retrospective cohort study in a Chinese population. Both nomograms were constructed mainly from physical examination data. The nomogram 140/90 incorporated 10 parameters: family history of hypertension, age, SBP, DBP, BMI, MCHC, MPV, TP, TBIL, and TG. The nomogram 130/80 incorporated six parameters: family history of hypertension, age, SBP, DBP, RDWSD, and TBIL. All parameters are readily available in routine health examinations. Therefore, these nomograms will be useful for in-depth risk assessment even without the assistance of physicians. Notably, receiver operating characteristic (ROC) analysis indicated that the AUC of nomogram 140/90 was higher than that of nomogram 130/80. The nomogram 140/90 also displayed good discrimination, with a C-index of 0.75, and good calibration. A C-index of 0.772 was still reached in the internal validation. DCA and the clinical impact curve showed that the model had good net benefits across the majority of threshold probabilities. Moreover, NRI and IDI were originally proposed to characterize the improvement in accuracy for predicting a binary outcome when new biomarkers are added to regression models. More recently, these two indices have been extended from binary outcomes to multicategorical and survival outcomes. For example, in 2020, Zhang et al. compared the predictive ability of a stroke prediction model (China-PAR) with the revised Framingham Stroke Risk Score (R-FSRS) for 5-year stroke incidence in a community cohort of Chinese adults. The two prediction models share five risk factors, with two additional factors in R-FSRS and six additional factors in China-PAR. The NRI and IDI values were assessed to compare the discrimination ability of the two prediction models.
Similarly, to better assess the performance of nomogram 140/90 and nomogram 130/80, the NRI and IDI were also used to determine the better model. In this study, the NRI and IDI showed that the nomogram 140/90 performed better than nomogram 130/80. Thus, the nomogram 140/90 is the more sensitive hypertension risk prediction tool under the premise of guaranteeing accuracy.
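For readers unfamiliar with these two indices, the following is a minimal sketch of how they can be computed for two sets of predicted probabilities. It assumes the category-free (continuous) form of the NRI and assumes that both models are evaluated on the same subjects with the same outcome labels; the source does not state which NRI variant was used or how the two nomograms were aligned, so the function names and the toy data are illustrative assumptions only.

```python
from statistics import mean


def continuous_nri(labels, p_old, p_new):
    """Category-free NRI: net upward moves in events plus net downward moves in non-events."""
    events = [(o, n) for y, o, n in zip(labels, p_old, p_new) if y == 1]
    nonevents = [(o, n) for y, o, n in zip(labels, p_old, p_new) if y == 0]
    nri_events = mean((n > o) - (n < o) for o, n in events)
    nri_nonevents = mean((n < o) - (n > o) for o, n in nonevents)
    return nri_events + nri_nonevents


def idi(labels, p_old, p_new):
    """IDI: change in mean predicted risk among events minus the change among non-events."""
    ev_old = mean(o for y, o in zip(labels, p_old) if y == 1)
    ev_new = mean(n for y, n in zip(labels, p_new) if y == 1)
    ne_old = mean(o for y, o in zip(labels, p_old) if y == 0)
    ne_new = mean(n for y, n in zip(labels, p_new) if y == 0)
    return (ev_new - ev_old) - (ne_new - ne_old)


# Toy data (made up for illustration, not study data).
toy_labels = [1, 1, 0, 0]
toy_old = [0.6, 0.4, 0.5, 0.2]
toy_new = [0.7, 0.5, 0.3, 0.3]
print(continuous_nri(toy_labels, toy_old, toy_new), idi(toy_labels, toy_old, toy_new))
```

Positive values of either index indicate that the new model separates events from non-events better than the old one.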
Multivariate logistic regression analysis revealed that gender was not an independent predictor for hypertension in our study. Similarly, researchers from Iran reported that sex was not an independent risk factor for hypertension (Talaei et al., 2014). Nonetheless, several previous studies have reported gender to be significantly associated with hypertension (Chen et al., 2016; Fidalgo et al., 2019; Zheng et al., 2014; Kshirsagar et al., 2010). It is still controversial whether the incidence of hypertension is associated with gender. In the future, the role of gender in blood pressure and its correlation with hypertension need further evaluation in larger population cohorts. In our study, the total incidence of hypertension was 24.77% and 37.92% at cut-off values of 140/90 mmHg and 130/80 mmHg, respectively. The reported prevalence of hypertension in China shows great geographical variation, ranging from 23.2% to 44.7% (Joint Committee for Guideline Revision, 2019; Lu et al., 2017; Asgari et al., 2020; Labasangzhu et al., 2020). The differences between these studies may have the following causes. Firstly, the surveys mentioned above were conducted in different periods and in different age groups by different organizations, which may result in inconsistencies. Secondly, different dietary habits and lifestyles among different populations may contribute to the observed differences. For example, the mean sodium intake of northern Chinese was notably higher than that of southerners (Heizhati et al., 2020). It has been well documented in epidemiological and clinical studies that excessive salt intake increases the risk of hypertension (Anderson et al., 2010).
There are also some limitations in our study. Firstly, as mentioned above, the nomogram was constructed based on multivariate analysis of physical examination data between 2009 and 2019. Loss to follow-up and missing data reduced the effective sample size and may threaten the internal validity of the study. Secondly, some subjects with secondary hypertension may also have been included in this study, so the diagnosis of hypertension may lack strictness. Thirdly, the medium prediction accuracy of the nomogram may suggest that other factors should be included. Many blood parameters were deleted because of missing data, which may have inevitably caused bias. The prediction accuracy could perhaps be improved in further studies with larger sample sizes and more variables. Further multicenter external validation should be performed to verify the discriminating ability and generalizability of our nomogram.

Study population and data collection
The current study was based on a large cohort study, named the Physical Examination Survey. The survey was conducted among subjects who underwent medical examination in a physical examination center of Hebei Province in 2009 and 2019. As shown in Figure 7, a total of 51,165 and 209,636 subjects who underwent physical examination in 2009 and 2019, respectively, were enrolled in this study. To avoid potential observational bias, we excluded subjects who had taken antihypertensive drugs before the medical examination. Subjects who did not finish the procedures of this survey or had missing data on the collected parameters were also excluded. After rigorous screening, 8020 subjects who underwent medical examination in both 2009 and 2019 were finally enrolled. At a cut-off value of 140/90 mmHg, 6201 subjects who had normal blood pressure in 2009 were enrolled in Group 140/90. At a cut-off value of 130/80 mmHg, 3771 subjects who had normal blood pressure in 2009 were enrolled in Group 130/80. The data of Group 140/90 and Group 130/80 were used to construct the nomogram 140/90 and nomogram 130/80 for predicting hypertension, respectively.
The socio-demographic and clinical parameters from the electronic medical records system were collected, including hypertension status in 2009, hypertension status in 2019, gender, family history of hypertension, smoking status, drinking status, age, SBP, DBP, height, weight, BMI, white blood cell count (WBC), lymphocyte count (LYMPH), neutrophil count (NEUT), lymphocyte percentage (LYMPHP), neutrophil percentage (NEUTP), red blood cell count (RBC), hemoglobin (HGB), hematocrit (HCT), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), MCHC, red blood cell distribution width-coefficient of variation (RDWCV), RDWSD, platelet count (PLT), MPV, plateletcrit (PCT), platelet distribution width (PDW), middle cell count (MID), middle cell percentage (MIDP), alanine aminotransferase (ALT), aspartate transaminase (AST), TP, albumin (ALB), total bilirubin (TBIL), direct bilirubin (DBIL), glucose (GLU), cholesterol (CHOL), TG, neutrophil-to-lymphocyte ratio (NLR), and platelet-to-lymphocyte ratio (PLR).
All procedures were approved by the Ethics Committee of Hebei General Hospital. All subjects' data were anonymized and de-identified prior to the analyses. The requirement for informed consent was therefore waived.

Definition and assessment
Hypertension was defined according to two diagnostic criteria: (1) SBP ≥ 140 mmHg or DBP ≥ 90 mmHg or antihypertensive medication use, according to the 2018 Chinese Guidelines and 2018 ESC/ESH guidelines; (2) SBP ≥ 130 mmHg or DBP ≥ 80 mmHg or antihypertensive medication use, according to the 2017 ACC/AHA guidelines. Blood pressure was measured after a minimum of 5 min of rest in a sitting position. BMI was computed as the ratio of body weight (kg) to height squared (m²). The blood samples were collected in the morning on an empty stomach.
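The two definitions and the BMI formula above can be expressed compactly in code. The following is a minimal sketch; the function and parameter names are illustrative assumptions and do not come from the study.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: body weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2


def is_hypertensive(sbp: float, dbp: float, on_medication: bool,
                    criterion: str = "140/90") -> bool:
    """Apply one of the two diagnostic criteria to a single measurement.

    criterion "140/90": 2018 Chinese Guidelines and 2018 ESC/ESH.
    criterion "130/80": 2017 ACC/AHA.
    """
    sbp_cut, dbp_cut = {"140/90": (140, 90), "130/80": (130, 80)}[criterion]
    return on_medication or sbp >= sbp_cut or dbp >= dbp_cut


# A reading of 135/85 mmHg without medication is classified differently
# under the two criteria.
print(is_hypertensive(135, 85, False, "140/90"))
print(is_hypertensive(135, 85, False, "130/80"))
```

A reading between the two thresholds is what makes Group 130/80 smaller than Group 140/90 at baseline and raises the prevalence at follow-up.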

Statistical analysis
For construction and validation of the nomogram, the subjects were randomly divided into a training set and a validation set at a ratio of 2:1. The comparability between the two sets was then evaluated. Continuous variables with normal distribution were described as means ± standard deviation and compared between the two sets with Student's t-test. Continuous variables with skewed distribution were described as median (25th percentile, 75th percentile) and compared with the Mann-Whitney U test. Categorical data were presented as numbers (percent) and compared with the chi-square test or Fisher's exact test.
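The random 2:1 division can be sketched as follows. The function name, the use of integer subject IDs, and the fixed seed are assumptions made for reproducibility of the example; the source does not describe the software or seed actually used.

```python
import random


def split_two_to_one(subject_ids, seed=0):
    """Shuffle subject IDs and return (training, validation) at a 2:1 ratio."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle so the split is repeatable
    n_train = (2 * len(ids)) // 3     # two thirds go to the training set
    return ids[:n_train], ids[n_train:]


# Group 140/90 had 6201 subjects; Group 130/80 had 3771.
train, valid = split_two_to_one(range(6201))
print(len(train), len(valid))  # 4134 2067
```

The resulting set sizes match those reported for Group 140/90 (4134 and 2067); applying the same function to 3771 subjects gives 2514 and 1257, as reported for Group 130/80.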
The LASSO regression technique was used to select the optimal predictive features in the training set. Then, multivariate logistic regression analysis was used to identify the independent factors by incorporating the features selected in the LASSO regression. Following the multivariate analysis, factors with a two-sided p value <0.05 were selected for developing the nomograms. The predictive accuracy of the nomograms was measured by the AUC of the ROC curve and the concordance index (C-index) in both the training and validation sets. The consistency between the actual outcomes and predicted probabilities was measured by the calibration curve. The clinical utility of the nomograms was measured by DCA and clinical impact curves for a population size of 1000. To compare the predictive accuracy of the nomogram 140/90 with that of nomogram 130/80, NRI and IDI were calculated.
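Two of the measures above can be illustrated with a short sketch: the AUC, which for a binary outcome equals the C-index, and the net benefit used in DCA. The code assumes the standard rank-based definition of the AUC and the usual net-benefit formula, TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The function names and the toy data are illustrative and are not taken from the study.

```python
def auc(labels, scores):
    """Probability that a random case scores higher than a random non-case.

    Ties count as 0.5. For a binary outcome this equals the C-index.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Decision-curve net benefit at one threshold probability pt (0 < pt < 1)."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (made up for illustration, not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
print(net_benefit(toy_labels, toy_scores, 0.3))
```

A decision curve is obtained by evaluating the net benefit over a range of threshold probabilities and comparing it with the treat-all and treat-none strategies.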

Conclusion
In conclusion, based on a 10-year retrospective cohort study, we developed and validated a simple and reliable nomogram to predict the risk of hypertension in the Chinese population. The nomogram demonstrated favorable predictive accuracy, discrimination, and clinical utility in the training set and validation set, indicating good performance in practical application. This visualization model and website will help patients and physicians predict the 10-year risk of hypertension and support better clinical management.