Development and external validation of a nomogram for predicting preterm birth at < 32 weeks in twin pregnancy

The purpose of this study was to develop a dynamic model to predict the risk of spontaneous preterm birth at < 32 weeks in twin pregnancy. A retrospective clinical study of consecutively asymptomatic women with twin pregnancies from January 2017 to December 2019 in two tertiary medical centres was performed. Data from one centre were used to construct the model, and data from the other were used to evaluate the model. Data on maternal demographic characteristics, transvaginal cervical length and funnelling during 20–24 weeks were extracted. The prediction model was constructed with independent variables determined by multivariate logistic regression analyses. After applying specified exclusion criteria, an algorithm with maternal and biophysical factors was developed based on 88 twin pregnancies with a preterm birth < 32 weeks and 639 twin pregnancies with a delivery ≥ 32 weeks. It was then evaluated among 34 pregnancies with a preterm birth < 32 weeks and 252 pregnancies with a delivery ≥ 32 weeks in a second tertiary centre without specific training. The model reached a sensitivity of 80.00%, specificity of 88.17%, positive predictive value of 50.33% and negative predictive value of 96.71%; ROC characteristics proved that the model was superior to any single parameter with an AUC of 0.848 (all P < 0.005). We developed and validated a dynamic nomogram model to predict the individual probability of early preterm birth to better represent the complex aetiology of twin pregnancies and hopefully improve the prediction and indication of interventions.

There are discrepant opinions on how precisely the risk of spontaneous preterm birth (SPTB) in twin pregnancies can be determined. More importantly, preterm birth is a complex syndrome with many causes and phenotypes. In twins, there is an additional pre-existing risk due to overdistension and the effect on the cervix but possibly also due to increased uterine irritation and subclinical inflammation after ART or physical and psychological maternal stress factors 16,17 . The great variety in PTB rates signifies that there are epigenetic transgenerational stress factors and determinants from the social environment and the health care system [18][19][20] . In addition, perinatal morbidity and mortality among twins vary by chorionicity, and monochorionicity is significantly associated with an increased risk of PTB 21 .
Previous studies have demonstrated the association between SPTB in twin pregnancies and specific clinical indicators, such as ethnic origin, age, nulliparity, chorionicity, body mass index (BMI), tobacco usage, history of previous preterm delivery, cervical length and funnelling [21][22][23][24][25][26][27][28][29][30][31] . Different combinations of clinical variables might indicate different likelihoods of SPTB. The purpose of this study is to synthesize an array of maternal demographic factors and clinical variables and develop a practical algorithm to calculate the risk of SPTB for twin pregnancies, similar to the first trimester genetic disease screening tools or the Framingham heart disease score 32 .

Results
Characteristics of the development and external validation groups. In total, 1013 asymptomatic twin pregnancies were eligible for the study, of which 727 collected from the Fujian Maternity and Child Health Hospital were assigned to the training group, while 286 from the Fujian Provincial Hospital were assigned to the external validation group (Fig. 1). In the whole study population, the numbers of positive cases of SPTB at < 28, 32, 34 and 37 weeks were 31 (3.06%), 122 (12.04%), 207 (20.43%) and 596 (58.84%), respectively.
There were no significant differences in maternal demographic and clinical characteristics between the training and validation groups (all P > 0.05), indicating that the features of the training and external validation groups were similar and that subsequent external validation would be representative (Table 1).
Predictive factors associated with SPTB at < 32 weeks. In the training group, we conducted univariate and multivariate regression analyses to detect the correlations between clinical variables and probabilities of preterm delivery before 28 weeks, 32 weeks, and 34 weeks by applying the AIC-based backward procedure (Table 2). Then, we constructed three ROC curves for predicting SPTB according to the results of multivariate analysis. By comparing the AUCs, we found that the predictive value for SPTB at < 32 weeks was the highest (Fig. 2). After comprehensively considering the predictive power and the number of positive cases of SPTB before the three gestational weeks, we chose to establish a predictive model for predicting PTB at < 32 weeks. Multivariate logistic regression analysis (< 32 weeks) showed that nulliparity, monochorionicity, lower prepregnancy BMI, previous preterm birth or late abortion, cervical funnelling and shorter cervical length were independent risk factors for SPTB at < 32 weeks.  ingful independent factors in multivariate regression analysis, we developed a nomogram to predict SPTB probability at < 32 weeks (Fig. 3). Each point could be determined based on the intersection of the vertical line from the variable to the point axis. Then, the total risk score was calculated by adding each variable point. The possibility of twin SPTB at < 32 weeks could be read on the total point axis. Furthermore, a user-friendly dynamic predicative nomogram was established and is available online (https:// zhanw enqia ng. shiny apps. io/ DynNo mapp/). The dynamic nomogram conveniently provided the individual probability of SPTB, which was calculated automatically by the input parameters of each subject (Fig. 4, PS: To facilitate readers' understanding, we specifically recorded a video of how to use the model, which is in the attachment). Harrell's concordance index value of the nomogram model in the training group was 0.848 (95% CI 0.809-0.892). When applied to the external validation group, Harrell's concordance index value in the external group was 0.782 (95% CI 0.735-0.826). The calibration curves indicated that the probability predicted by the nomogram was in good agreement with the actual probabilities in both the internal cohort and external cohort (Fig. 5).
Model performance test and risk stratification. Next, the restricted cubic spline curve showed that the risk escalated continuously with the increasing scores obtained from the nomogram, which proves the reliability of the model (Fig. 6). In the training group and external validation group, the AUCs of the nomogram predicting the probability of SPTB at < 32 weeks were 0.848 (95% CI 0.809-0.892) and 0.782 (95% CI 0.735-0.826), respectively. In both groups, the prediction accuracy of the nomogram was superior to that of any single predictor (all P < 0.005) (Fig. 7). With the ROC curve of the training group, the optimal cut-off value of the risk score (125. 16) was calculated based on the maximum Youden index. Then, the cut-off value categorized the training population into the low-risk group (155 twin pregnancies with risk score ≤ 125.16) and the high-risk group (572 twin pregnancies with risk score > 125.16) (OR 17.09, 95% CI 10.28-28.62, P < 0.05). The model reached a sensitivity of 80.00%, specificity of 88.17%, positive predictive value (PPV) of 50.33% and negative predictive value (NPV) of 96.71%. By using the same cut-off value in the external validation group, the results also proved the predictive performance of the nomogram (Table 3). Thus, we observed that the probability of SPTB in the high-risk group was significantly higher than that in the low-risk group (HR 0.537, 95% CI (0.382-0.756), P < 0.001), and gestational age at delivery was significantly earlier in the high-risk group (Fig. 8).

Discussion
In our retrospective analysis and external validation study, we developed a predictive model of SPTB at < 32 weeks based on maternal characteristics and sonographic cervical measurements to provide an accurate and comprehensive risk estimation, which can serve as an assessment tool to help physicians make decisions about further management of twin pregnancy. The reason we comprehensively considered all the above factors when building the model was that the predictive performance of a single maternal factor or cervix geometry (including length) is not satisfactory, primarily due to poor sensitivity [33][34][35][36][37] . The mechanism of SPTB involves various mechanical stimuli (two continuously growing foetuses and the expanding uterus) and biochemical stimuli (inflammatory factors, fetoplacental signals and steroid hormones) 38,39 . Compared to that in singleton pregnancies, the mechanism of SPTB in twin pregnancies is predominantly determined by overdistension, whereas the role of inflammation and microbiologic invasion of the amniotic cavity (MIAC) is relatively minor 16 . Overdistension of the lower uterine segment and smooth muscle stretch in the human cervix provokes proinflammatory cytokine secretion, and research on changes in the cervical microstructure has been published by Vink et al. 17,40,41 . Jose Villar et al. proposed the use of a phenotypic classification system of PTB that does not force any PTB into a predefined phenotype but instead relies on a new conceptual framework in which a maternal clinical phenotype of PTB potentially related to a certain perinatal outcome is characterized by all relevant conditions observed during pregnancy 18 .
A series of common clinical characteristics, such as age, race, BMI, history of PTB, previous uterine surgeries, and tobacco usage, may indicate the initial states and variations in the structure and function of the cervix, which contributes to the risk of cervical insufficiency 19,20,[25][26][27]42,43 . All these risk factors have interconnected effects and a computational framework for changing and remodelling the cervix. Our study is concordant with existing research indicating that nulliparity, lower prepregnancy BMI, history of PTB or late abortion, chorionicity, cervical funnelling and shorter cervical canal increase the possibility of SPTB in twin pregnancies. However, there is no risk calculation yet for SPTB before 32 weeks, which still represents a population with a tenfold increased risk for perinatal mortality compared to twins at term 44 . Our research incorporated maternal characteristics and biophysical tests of both cervical length and funnelling to develop a dynamic nomogram model that reached favourable PPV and NPV. Given that women with twin pregnancies are at high risk of preterm birth, better PPV www.nature.com/scientificreports/ and NPV indicate a higher rate of clinical diagnosis accuracy and a lower incidence of misdiagnosis. Therefore, our model may better guide clinical strategies, such as therapy decision-making and follow-up schedules, and could reduce complications for clinicians related to excessive monitoring and administration resulting from an undefined or inherently subjective risk assessment. Thus, the ability to generate a risk assessment and present it in the form of a percentage for each patient will enable caregivers to schedule more frequent follow-ups or administer targeted interventions, such as antenatal corticosteroids and tocolytic therapy, as well as transfer to a tertiary medical centre for patients at higher risk while reducing overtreatment and unnecessary hospitalization for those at lower risk. On the other hand, in the study design for the negative trials regarding PTB intervention, only a few researchers screened out and followed high-risk twin pregnancies, which may introduce confusion regarding indications for the interventions and result in bias when comparing outcomes 7,10,11,45 . To some extent, a lack of good care during surveillance frequently makes the difference in RCTs. It would be interesting in the future to determine whether the use of this tool to assess the indications for interventions and stratify patients according to risk could improve outcomes. Our study has some limitations. Most importantly, it is limited by its retrospective design. There is a possibility of confounding bias: patients with unmeasured or unobservable factors who were excluded may represent www.nature.com/scientificreports/ patients at higher risk, so our study might ignore the most clinically interesting population. Second, the study population in the two centres is limited to our own population (Asian), which limits generalizability to people of different races. For example, in many high-resource countries, the risk of PTB is associated with obesity and is not underweight 25,46 . However, this potential limitation may also be considered a strength. All women included in the study were followed up and treated only in the two tertiary medical centres, which limits the confounding factors associated with the heterogeneity in provider bias, such as clinicians' experience, and differences in the  www.nature.com/scientificreports/ process of monitoring and management for offering the intervention. Based on the model, researchers in other countries can make use of their own data on demographic characteristics to justify the odds for their population. The last limitation is that because of the incomplete data for cervical length before 20 weeks, our model may poorly predict very early PTB since we adopted cervical measurements during 20-24 weeks and applied the system relatively late for the high-risk population 47 . In the future, we should concentrate on earlier evaluation of our algorithm to prevent early mortality and severe morbidity.
In summary, we developed and validated a dynamic nomogram model to predict the individual probability of early preterm birth; this nomogram better represents the complex aetiology of twin pregnancies and hopefully improves our understanding of the indications for interventions and, therefore, our ability to predict when they will be needed.

Materials and methods
Study population. We retrospectively collected data from 1461 consecutively asymptomatic women with twin pregnancies in the Fujian Maternity and Child Health Hospital (with an annual delivery number of more than 20,000 and a specified preterm birth clinic for ambulatory patients) and the Fujian Provincial Hospital (with 2398 beds and an annual delivery number of more than 5000) from January 2017 to December 2019. This retrospective study was performed with approval from the Ethics Committee of the Fujian Maternity and Child Health Hospital and the Fujian Provincial Hospital (Ethical approval number: 2019-014). The data were anonymous, and the requirement for informed consent was therefore waived. The completion and reporting of the study was in accordance with STROBE guidelines. Subjects with any of the following conditions were excluded: incomplete records, genetic or structural abnormalities of either foetus, stillbirth of one or two foetuses, gestational age at birth < 20 weeks, twin birth weight < 500 g, monoamniotic or monochorionic twin pregnancy complicated by twin transfusion syndrome (TTTS) or twin anaemia-polycythaemia sequence (TAPS), placement of cervical cerclage, use of vaginal progesterone, maternal or foetal indications for iatrogenic PTB at < 32 weeks, or delivery at a medical centre other than ours. Women who gave birth before 20 weeks were excluded because in most cases, these women were likely to represent a unique subgroup of women whose cervical changes would have been detected very early and would be extremely obvious. Additionally, these women would not have had their cervical measurement at the indicated gestational stage in our study period, which was a major part of our research. As a result, we excluded 448 patients who met the exclusion criteria, and thus, 1013 patients met the inclusion criteria.
We assigned 727 samples collected from the Fujian Maternity and Child Health Hospital as the training group and 286 samples collected from the Fujian Provincial Hospital as the external validation group. All samples were reassessed by two obstetricians according to the inclusion and exclusion criteria (the flowchart showing the derivation of the development cohort and validation cohort is presented in Fig. 1).

Data collection.
Medical records were surveyed retrospectively, and the following data were extracted from patients' charts. Demographic characteristics included maternal age, prepregnancy body mass index (prepregnancy BMI), nulliparity, history of previous cervical surgery, history of tobacco usage, clinical data including validation of gestational age by first trimester ultrasound, chorionicity, history of previous preterm or late abortion (during 12-28 weeks), complications during pregnancy, use of assisted reproductive technology, cervical length (20-24 weeks) and cervical funnelling, and gestational age at delivery.
Gestational age was calculated from the last menstrual period (LMP) and confirmed by the foetal crown-rump length measurement at the first trimester ultrasonic scan. If a discrepancy of more than 7 days was observed, the sonographic gestational age was followed. Chorionicity was confirmed by identifying lambda and T signs with ultrasound imaging between 11 +0 and 13 +6 weeks of gestation 48 .
The ultrasound measurements were in accordance with a unified standard. All patients underwent transvaginal cervical length (TVCL) measurements between 20 and 24 weeks when the optimal image of the cervix was relatively easy to capture. The TVCL measurements of all subjects were performed by experienced sonographers www.nature.com/scientificreports/ at our ultrasound units. The ultrasound assessment was performed to measure the length of the cervical canal from the internal OS to the external OS and observe whether cervical funnelling appears with patients in the lithotomy position with an empty bladder. The measurement was repeated under gentle fundal pressure or the Valsalva maneuver unless severe cervical shortening was observed. Each examination was performed for at least 3 min as an evaluation period to detect the development of a "funnel", which was defined as the protrusion of the amniotic membrane of 3 mm or more into the internal os as measured along the lateral border of the funnel ( Fig. 9)  , and qualitative data are expressed as the number (percentage). The Wilcoxon-Mann-Whitney test or Fisher's exact test was performed to measure the distribution differences of variables between the development and external validation groups. Univariate and multivariate logistic regression analyses were used to detect the correlation between clinical variables and preterm birth at 28 weeks, 32 weeks, and 34 weeks by applying a backward procedure based on the Akaike information criterion (AIC). By drawing the ROC curve of the predicted probabilities of SPTB before three gestational weeks (28, 32, 34 weeks) with multivariate meaningful variables, the prediction power for SPTB before the three gestational weeks was compared. Based on these results, a nomogram model with higher predictive performance was established, and bootstrapping techniques were used for internal verification to improve the robustness of the model.

Model validation.
The performance of the nomogram models in identification and calibration was evaluated.
The discriminative ability and predictive ability of the model were evaluated through Harrell's C-index, and external crowds were introduced to further evaluate the predictive value of the model. The calibration curve was analysed by drawing the predicted probability of the nomogram and the actual occurrence of SPTB. Restricted cubic splines were used to evaluate the correlation between the model's predicted score and the risk of SPTB. Kaplan-Meier curves were generated to compare the pregnancy outcomes in the two groups with different risk stratifications. ROC curve analysis was used to evaluate the prediction performance of the nomogram model and that of each meaningful parameter. Table 3. Association between total risk scores and risks of PTB at < 32 weeks. TP true positive, FP false positive, FN false negative, TN true negative, FPR false positive rate (= 1-specificity), FNR false negative rate (= 1-sensitivity), PPV positive predictive value, NPV negative predictive value. a Adjustment for age, smoking, assisted reproductive technology, history of previous cervical surgery, gestational diabetes. *p-value < 0.05.   Figure 8. Survival curves of the high-risk group and the low-risk group in twin pregnancies. Kaplan-Meier curves were generated for GA at delivery. Log-rank test comparisons between the high-risk group and the lowrisk group showed significant differences (HR 0.537, 95% CI 0.382-0.756, P < 0.001). Ethics approval. All procedures performed in studies involving human participants were approved by the China Ethics Committee and the institutional ethical review boards of the Fujian Maternity and Child Health Hospital and the Fujian Provincial Hospital (Ethical approval number: 2019-014). Because the dataset contained no data enabling patient identification and all women received standard care, the study was exempt from informed consent requirements.