Development and internal validation of a clinical prediction model for spontaneous abortion risk in early pregnancy

Highlights
• A clinical predictive model for the risk of spontaneous abortion in early pregnancy.
• This model adds mental health factors compared to previous studies.
• Targeted interventions for high-risk women to reduce the risk of spontaneous abortion.


Introduction
Spontaneous abortion, defined as the unintentional termination of a pregnancy before 20 weeks gestation, is a common complication affecting 10%-15% of clinically recognized pregnancies. 1 It can occur due to embryonic abnormalities, uterine abnormalities, endocrine disorders, infection, lifestyle factors, and other etiologies. 2 Spontaneous abortion is not only a major pregnancy complication but also has significant psychological impacts on women. 3−5 However, most existing studies use retrospective case-control designs and investigate individual risk factors rather than developing comprehensive prediction models.
There remains a need for robust and well-validated prognostic models that can estimate the risk of spontaneous abortion in early pregnancy based on multiple demographic, clinical, and lifestyle predictors.
Accurate individual risk prediction can aid in counseling, monitoring, and timely interventions for high-risk women to mitigate adverse outcomes. A few scoring systems have recently been developed to predict recurrent pregnancy loss rather than first-time miscarriages. 6,7 However, those models have limitations such as small sample size (n < 500), inadequate validation, and suboptimal predictive performance (C-statistics < 0.7). Therefore, this study aimed to develop and internally validate a clinical prediction model to estimate the risk of first-trimester spontaneous abortion in pregnant women based on a wide range of predictors encompassing clinical, socio-demographic, lifestyle, and mental health factors.

Study participants
Patients were recruited from January 2021 to December 2022. Baseline data were collected at the first prenatal visit. Follow-up of pregnancy outcomes continued until delivery. A total of 9,895 pregnant women were enrolled in this study, including 9,306 in the normal pregnancy group and 589 in the spontaneous abortion group. The inclusion criteria were as follows: 1) pregnant women who received an ultrasound diagnosis of normal intrauterine pregnancy and agreed to take part; 2) pregnant women aged 18-48 years. The exclusion criteria were as follows: 1) pregnant women with reproductive system abnormalities; 2) pregnant women with autoimmune diseases, including antiphospholipid syndrome, systemic lupus erythematosus, undifferentiated connective tissue disease, Sjögren's syndrome, and others; 3) patients diagnosed with severe heart, liver, kidney, or hematopoietic diseases; 4) cases with insufficient data collection.

Study methods
Standardized questionnaires were developed in accordance with the survey plan, covering the age of the pregnant woman, gravidity, number of abortions, number of embryonic arrests, BMI, educational level, family income, history of hypertension, thyroid function, history of diabetes, history of polycystic ovary syndrome, assisted reproduction (if applicable), smoking and alcohol consumption history, exposure to pollution sources (including air pollution and radiation), frequency of staying up late, and recent home renovation status. The Chinese version of the 21-item Depression Anxiety Stress Scales (DASS-21) was used to evaluate the psychological condition of the pregnant women. The scale comprises three subscales (depression, anxiety, and stress) with a total of 21 items. Patients rated each item on a 4-point scale ranging from "0" (disagree) to "3" (strongly agree).
Regular ultrasound examinations were conducted to evaluate the embryonic health of patients. The spontaneous abortion group comprised those who experienced spontaneous abortion due to embryonic arrest, while those who did not were classified as the normal group. A database containing information on patients with spontaneous abortion was created and reviewed by another researcher.
The study protocol was approved by the Ethics Committee of Jinan Second Maternal and Child Health Hospital (Approval number: 2023-YBD-1-05). Prior to completing the questionnaire, the medical staff sought the opinions of the patients. Participation in the study required completion of the questionnaire. The researchers maintained strict confidentiality with regard to patients' personal information. As an observational study, this study follows the STROBE statement.

Statistical methods
EpiData 3.1 and SPSS 27.0 (IBM Corp., USA) were used for data entry and analysis. The two groups were compared using t-tests and chi-square tests, and factors with statistically significant differences in univariate analysis underwent logistic regression analysis to screen for factors influencing early spontaneous abortion. Multivariable logistic regression was performed on all candidate predictors, and predictors with p > 0.05 were sequentially removed by backward stepwise elimination to identify independent predictors of spontaneous abortion. Results were considered statistically significant at p < 0.05. Bootstrapping with 1000 samples was used to internally validate the model and adjust for optimism (overfitting). Discrimination was assessed by the C-statistic, and calibration was assessed using the Hosmer-Lemeshow test.
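The text does not state which bootstrap procedure was used to adjust for optimism. Assuming the commonly used optimism-correction approach (an assumption, not confirmed by the study), the adjusted C-statistic could be written as follows, where $C_{\text{apparent}}$ is the C-statistic of the final model in the full sample, $C_b^{\text{boot}}$ is the C-statistic of the model refitted in bootstrap sample $b$ and evaluated in that same sample, and $C_b^{\text{orig}}$ is the C-statistic of that refitted model evaluated in the original sample. These symbols are introduced here for illustration only.

```latex
\[
C_{\text{adjusted}}
  = C_{\text{apparent}}
  - \frac{1}{B}\sum_{b=1}^{B}\left(C_{b}^{\text{boot}} - C_{b}^{\text{orig}}\right),
  \qquad B = 1000 .
\]
```

Under this formulation, the averaged difference inside the sum is the estimated optimism, so a small gap between the apparent and adjusted values suggests limited overfitting.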

Univariate analysis of general information and clinical factors
This prospective cohort study comprised 9,895 participants, with 9,306 in the normal pregnancy group and 589 in the spontaneous abortion group (Fig. 1). The mean age of the spontaneous abortion group was 33.03 ± 6.12 years, which was significantly greater than the mean age of 30.60 ± 5.98 years in the normal pregnancy group (t = 9.51, p < 0.05).

Univariate analysis of pregnant women's mental health
Significant differences in depression status were found between the normal pregnancy group and the spontaneous abortion group (p < 0.05). In the spontaneous abortion group, a higher degree of depression was associated with a higher proportion of spontaneous abortion. The highest proportion of spontaneous abortion was observed in the group with moderate depression (31.07%), followed by severe depression (26.12%) and extremely severe depression (16.12%).
Significant differences were found in anxiety levels between the normal pregnancy group and the spontaneous abortion group (p = 0.021). In the spontaneous abortion group, a higher degree of anxiety was associated with a greater proportion of spontaneous abortions. Moderate anxiety was the most prevalent (30.39%), followed by severe anxiety (24.82%) and extremely severe anxiety (17.44%).
Significant differences in stress levels were observed between the normal pregnancy group and the spontaneous abortion group (p < 0.05). The proportion of spontaneous abortions increased with the severity of stress in the latter group, with the highest rate observed in cases of severe stress (32.37%), followed by extremely severe stress (26.89%) and moderate stress (26.89%) (Table 2).

Nine predictors were retained in the final model. The exponentiated coefficients represent the odds ratio for each predictor variable. For example, the odds of spontaneous abortion are multiplied by 1.069 (an increase of about 6.9%) for each 1-year increase in maternal age. The full model equation allows the calculation of predicted risks of spontaneous abortion for individual patients based on their predictor values. It can be incorporated into a nomogram, web calculator, or mobile app to obtain predicted risks.
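As a minimal sketch of how such an equation yields an individual risk, the Python snippet below converts the reported age odds ratio (1.069) back to a log-odds coefficient and applies the logistic function. The intercept and the single-predictor setup are hypothetical placeholders chosen for illustration; the fitted coefficients of the full model are not reproduced here.

```python
import math

# Odds ratio per 1-year increase in maternal age, as reported in the text.
AGE_ODDS_RATIO = 1.069

# Hypothetical intercept for illustration only; not taken from the study.
INTERCEPT = -5.0


def log_odds_coefficient(odds_ratio):
    """Convert an odds ratio back to its logistic regression coefficient."""
    return math.log(odds_ratio)


def predicted_risk(intercept, coefficients, values):
    """Apply the logistic function to the linear predictor."""
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


coefficients = {"maternal_age": log_odds_coefficient(AGE_ODDS_RATIO)}
risk_30 = predicted_risk(INTERCEPT, coefficients, {"maternal_age": 30})
risk_31 = predicted_risk(INTERCEPT, coefficients, {"maternal_age": 31})

# The ratio of the odds one year apart recovers the odds ratio.
odds_30 = risk_30 / (1.0 - risk_30)
odds_31 = risk_31 / (1.0 - risk_31)
print(f"risk at 30: {risk_30:.4f}, risk at 31: {risk_31:.4f}")
print(f"odds ratio per year: {odds_31 / odds_30:.3f}")
```

Note that the odds ratio multiplies the odds, not the probability, so the change in predicted risk per year depends on the baseline risk.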

Model performance
The discrimination of the model was excellent, with a C-statistic of 0.88 (95% CI 0.87−0.90). The C-statistic indicates the ability of the model to differentiate between patients who did and did not experience a spontaneous abortion (Fig. 2). After internal validation with bootstrapping, the optimism-adjusted C-statistic was 0.87, indicating minimal overfitting.
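To make the interpretation of the C-statistic concrete, the sketch below computes it as the proportion of (event, non-event) pairs in which the patient with the event received the higher predicted risk, counting ties as one half. The function name and the toy data are hypothetical and unrelated to the study cohort.

```python
def c_statistic(outcomes, risks):
    """Concordance over all (event, non-event) pairs; ties count as 0.5."""
    event_risks = [r for y, r in zip(outcomes, risks) if y == 1]
    non_event_risks = [r for y, r in zip(outcomes, risks) if y == 0]
    if not event_risks or not non_event_risks:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for event_risk in event_risks:
        for non_event_risk in non_event_risks:
            if event_risk > non_event_risk:
                concordant += 1.0
            elif event_risk == non_event_risk:
                concordant += 0.5
    return concordant / (len(event_risks) * len(non_event_risks))


# Toy data (hypothetical): 1 = spontaneous abortion, 0 = normal pregnancy.
toy_outcomes = [0, 0, 1, 0, 1, 1]
toy_risks = [0.02, 0.05, 0.04, 0.10, 0.30, 0.12]
print(round(c_statistic(toy_outcomes, toy_risks), 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups. The pairwise loop is quadratic in sample size, so a rank-based computation would be preferable for a cohort of several thousand women.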
Calibration refers to how closely the predicted risks agree with observed risks. The calibration plot showed good agreement between predicted and observed spontaneous abortion risks across tenths of predicted risk. The Hosmer-Lemeshow test also demonstrated good calibration (p = 0.27).
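The grouping behind the Hosmer-Lemeshow test can be sketched as follows: patients are sorted by predicted risk, split into groups, and the observed and expected event counts are compared within each group. This is an illustrative implementation under stated assumptions (equal-sized groups, hypothetical toy data), not the SPSS procedure used in the study. It returns only the test statistic; the p-value would come from a chi-square distribution with (groups − 2) degrees of freedom.

```python
def hosmer_lemeshow_statistic(outcomes, risks, groups=10):
    """Sum over risk groups of (observed - expected)^2 / (n * p * (1 - p))."""
    pairs = sorted(zip(risks, outcomes))
    total = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * total // groups:(g + 1) * total // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(r for r, _ in chunk)
        mean_risk = expected / size
        variance = size * mean_risk * (1.0 - mean_risk)
        if variance > 0:  # skip groups with degenerate predicted risk
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Toy data (hypothetical): two risk groups with observed rates 1/4 and 3/4.
toy_risks = [0.25] * 4 + [0.75] * 4
calibrated = [1, 0, 0, 0, 1, 1, 1, 0]
miscalibrated = [0, 0, 0, 0, 1, 1, 1, 0]
print(hosmer_lemeshow_statistic(calibrated, toy_risks, groups=2))
print(hosmer_lemeshow_statistic(miscalibrated, toy_risks, groups=2))
```

A statistic near zero (and hence a large p-value) indicates that observed and expected counts agree within groups, which is the sense in which p = 0.27 is read as acceptable calibration.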
A predicted probability threshold of >0.08 was selected based on the Youden index to optimize the balance of sensitivity and specificity. Using this threshold, the model classified 6.5% of patients as high risk. Among these women, the observed spontaneous abortion rate was 12.4%, compared to 4.7% in the low-risk group. The sensitivity and specificity were 72% and 84%, respectively. The negative predictive value was 97%, suggesting the model was very effective at identifying women at low risk of spontaneous abortion.
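The threshold selection described above can be illustrated with a short sketch. The Youden index is J = sensitivity + specificity − 1, and the chosen cut-off is the one that maximizes J. The function names and toy data below are hypothetical; patients with predicted risk strictly above the threshold are treated as high risk, matching the ">0.08" convention in the text.

```python
def classification_metrics(outcomes, risks, threshold):
    """Sensitivity, specificity and NPV when risk > threshold means high risk."""
    pairs = list(zip(outcomes, risks))
    tp = sum(1 for y, r in pairs if y == 1 and r > threshold)
    fn = sum(1 for y, r in pairs if y == 1 and r <= threshold)
    fp = sum(1 for y, r in pairs if y == 0 and r > threshold)
    tn = sum(1 for y, r in pairs if y == 0 and r <= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one event and one non-event")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


def youden_threshold(outcomes, risks):
    """Return the candidate threshold with the largest Youden index J."""
    best_threshold, best_j = None, -1.0
    for candidate in sorted(set(risks)):
        metrics = classification_metrics(outcomes, risks, candidate)
        j = metrics["sensitivity"] + metrics["specificity"] - 1.0
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


# Toy data (hypothetical): 1 = spontaneous abortion, 0 = normal pregnancy.
toy_outcomes = [0, 0, 0, 0, 1, 0, 1, 1]
toy_risks = [0.01, 0.03, 0.05, 0.07, 0.09, 0.10, 0.20, 0.40]
threshold, j = youden_threshold(toy_outcomes, toy_risks)
print(threshold, round(j, 2))
print(classification_metrics(toy_outcomes, toy_risks, threshold))
```

Because the Youden index weights sensitivity and specificity equally, a different cut-off may be preferable if missed cases and false alarms carry different clinical costs.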

Discussion
This study developed and validated a clinical prediction model for estimating first-trimester spontaneous abortion risk in Chinese women, demonstrating good discrimination and calibration. The model enables individualized risk assessment based on a multitude of demographic, clinical, lifestyle, and mental health predictors. With further validation, it holds promise to guide counseling and interventions for high-risk women.
Several robust predictors emerged, including advanced maternal age, obstetric history, chronic conditions like thyroid disorders and polycystic ovary syndrome, assisted reproduction, toxic environmental exposures, and poor mental health. 8,9 The wide range of factors underscores the complex multifactorial etiology of spontaneous abortion. 10 Advanced age likely contributes through age-related reductions in oocyte quality and uterine receptivity and an increase in embryo aneuploidy. 11 Recurrent pregnancy loss may reflect cumulative damage to endometrial function and the maternal-fetal interface. 12 Medical comorbidities such as thyroid disease can perturb the hormonal milieu and metabolic environment needed to sustain early pregnancy. 13,14 Assisted reproduction increases risks due to underlying subfertility and the effects of controlled ovarian stimulation and laboratory procedures. 15 Environmental toxins can disrupt placental development and trigger embryonic oxidative stress and DNA damage. 16,17−21
This study has several strengths. Firstly, the large prospective cohort allowed for the analysis of numerous candidate variables. Secondly, rigorous adherence to TRIPOD guidelines enhanced model development and internal validation. Finally, discrimination and calibration metrics indicate good predictive performance.
Limitations of this study include its single-center design and reliance on self-reported data, which may introduce residual confounding given the observational design. Furthermore, external validation and impact analysis are required prior to clinical application of the model. Future refinements incorporating emerging biomarkers and modifiable risk factors may further enhance its utility.

Conclusion
This study highlights that spontaneous abortion susceptibility is influenced by a complex interplay of maternal age, obstetric history, chronic medical conditions, mental health, and environmental factors. The prediction model enables individualized risk quantification to guide the management of high-risk women. With ongoing validation and refinement, it has significant potential to optimize outcomes and reduce the burden of this common pregnancy complication. A multidimensional approach addressing medical, psychological, and environmental health is recommended for optimal management of spontaneous abortion susceptibility.

Responses on the 4-point DASS-21 scale, from "0" (disagree) to "3" (strongly agree), indicate the respondent's emotional state during the past week, with higher scores indicating more intense feelings. The DASS score was used to classify the results into 5 levels: normal (depression score ≤9, anxiety score ≤7, stress score ≤14), mild (depression score 10-13, anxiety score 8-9, stress score 15-18), moderate (depression score 14-20, anxiety score 10-14, stress score 19-25), severe (depression score 21-27, anxiety score 15-19, stress score 26-33), and extremely severe (depression score ≥28, anxiety score ≥20, stress score ≥34). The total scale achieved a Cronbach's α coefficient of 0.890, indicating strong reliability and efficacy for evaluating the mental health condition of pregnant women. Medical staff guided pregnant women to scan a QR code and complete the personal information and questionnaires on smart devices during early pregnancy (within 6 weeks).
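The five-level classification can be expressed as a simple lookup. The sketch below encodes the cut-offs exactly as stated in the text; the function and variable names are hypothetical, and any scaling of raw item sums before applying the cut-offs is not described in the source and is therefore not modeled.

```python
# Inclusive upper bounds of the normal, mild, moderate and severe bands,
# taken from the cut-offs stated in the text.
CUTOFFS = {
    "depression": (9, 13, 20, 27),
    "anxiety": (7, 9, 14, 19),
    "stress": (14, 18, 25, 33),
}
LEVELS = ("normal", "mild", "moderate", "severe", "extremely severe")


def classify(subscale, score):
    """Map a subscale score to one of the five severity levels."""
    for upper_bound, level in zip(CUTOFFS[subscale], LEVELS):
        if score <= upper_bound:
            return level
    return LEVELS[-1]  # above the severe band


for subscale, score in [("depression", 15), ("anxiety", 8), ("stress", 34)]:
    print(subscale, score, classify(subscale, score))
```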

Table 1
Univariate analysis of general information and clinical factors of the research subjects.

Table 2
Single factor analysis of the mental health status of the research subjects.

Nine predictors were included in the final model based on clinical relevance and statistical significance on multivariate analysis (p < 0.05): maternal age, history of embryonic arrest, thyroid dysfunction, polycystic ovary syndrome, assisted reproduction, exposure to pollution, recent home renovation, depression score, and stress score.
Fig. 2. Receiver Operating Characteristic (ROC) curve for discrimination of early spontaneous abortion.