Quality of clinical prediction models in in vitro fertilisation: Which covariates are really important to predict cumulative live birth and which models are best?

The improvement in IVF cryopreservation techniques over the last 20 years has led to an increase in elective single embryo transfer, thus reducing multiple pregnancy rates. This strategy of successive transfers of fresh followed by frozen embryos has resulted in the acceptance of using cumulative live birth over complete cycles of IVF as a critical measure of success. Clinical prediction models are a useful way of estimating the cumulative chances of success for couples tailored to their individual clinical factors, which help them prepare for and plan future treatment. In this review, we describe several models that predict cumulative live birth and recommend which should be used by couples and/or their clinicians and when they should be used. We also discuss the most relevant predictors to consider when either developing new IVF prediction models or updating existing models.


Background
Over the last 15 years, IVF practice has shifted from predominantly transferring multiple fresh embryos at a time to transferring a single fresh embryo (preferably a blastocyst), followed by successive episodes involving the transfer of single frozen-thawed embryos [1–3]. This change has been triggered by improvements in extended culture and embryo cryopreservation techniques. Such practice has seen the reduction of multiple pregnancies without compromising live birth rates and has led to a shift in the way outcomes are reported [4,5]. The traditional focus on live birth rates per fresh cycle has expanded to incorporate cumulative live birth rates, which reflect the impact of frozen embryo replacements following an initial fresh transfer, as well as subsequent treatment episodes [6–9]. Cumulative live birth rates are more helpful to couples and clinicians because they allow them to plan their care over a period of time [10]. While useful for getting an overall picture of IVF success at a national or clinical level, average cumulative live birth rates are not suitable for personalised medicine given that many patient and treatment-level characteristics can affect the chances of live birth in every couple [11]. A way of estimating the chance of live birth by factoring in all of these important characteristics is to use clinical prediction models.
Clinical prediction models are mathematical equations that allow us to combine a number of patient characteristics to predict an outcome in an individual [12]. These models can be used to predict the chance of a diagnosis or a consequence of a medical condition over a specified period of time. The former is usually termed a diagnostic model and the latter a prognostic model. In reproductive medicine, we are usually concerned with predicting pregnancy outcomes by means of prognostic models. For prediction modelling, we are primarily interested in the absolute risk of an individual given their personal characteristics. The term absolute risk refers to the chance that a patient will have the outcome over some specified time period, e.g., a 20-year-old woman with unexplained infertility may have a 20% chance of live birth without IVF over the next two years. Absolute risk is different from relative risk, which concerns the chance of the outcome occurring for one group of patients compared with some other group, e.g., the chance of live birth without treatment over the next two years for the average woman with endometriosis relative to the average woman with unexplained infertility may be a half. Because the term 'risk' is often used for unfavourable outcomes, we tend to use the term 'chance' for favourable outcomes such as live birth.
Clinical prediction models have different uses, which must be decided before they are developed. They can be useful for providing evidence-based input for shared decision-making around interventions such as the choice of treatment, increased (or decreased) monitoring or referral to specialist care. They can also be useful to counsel patients or to stratify patients by disease severity for treatment or research (e.g., inclusion in randomised trials). Specifically, IVF prediction models may be useful for informing patients of their individual chances of having a baby to manage their expectations and allow them to prepare physically, emotionally and (where relevant) financially for future treatment. In this review, we examine existing IVF prediction models that attempt to predict the cumulative chances of a live birth over several complete IVF cycles (i.e., cycles involving fresh and frozen embryos created from a single oocyte retrieval episode) using clinical data. We will provide recommendations as to which models are best for clinical and patient use. We will also consider which predictors are the most important to include for researchers wishing to validate or revise existing IVF models.

Cumulative live birth prediction models
In early 2020, a systematic review found 35 IVF prediction models in existence and reported on their methodological quality and predictive performance [13].
Three models (published before the end of January 2019) predicted cumulative live birth per woman [14,15]. The first estimated the cumulative live birth up to three fresh embryo transfer attempts but excluded any subsequent frozen embryo transfers [14]. The other two models, developed using national UK data, predict the cumulative chance of a live birth over multiple complete cycles of IVF [15]. The term 'complete cycle' is used throughout this review and is always defined as all fresh and frozen embryo transfers resulting from a single episode of ovarian stimulation. The pre-treatment model calculates the cumulative chance of live birth and should only be used before starting the first IVF cycle. The predictors in this model include complete cycle number, female age, duration of infertility, previous pregnancy status, cause of infertility (tubal factor, male factor, anovulation or unexplained infertility) and type of treatment planned (IVF versus intracytoplasmic sperm injection (ICSI)). The post-treatment model revises the prediction at the time the woman undergoes her first embryo transfer and includes extra treatment-specific predictors such as the number of eggs collected, the number of embryos transferred (0–3) and the age of the embryo, i.e., blastocyst or cleavage stage. These models (available as the OPIS prediction calculators here: https://w3.abdn.ac.uk/clsm/opis) were also subsequently externally validated on an independent prospective cohort of 1515 women from The Netherlands [16] (see Table 1).
The pre-treatment model had a relatively low c-statistic of 0.62 in the external cohort and needed recalibration. The c-statistic is a measure of model discrimination. To understand what discrimination means in this context, imagine a random pair of patients from the external cohort, where one patient actually had a live birth and the other did not. The model should ideally have calculated a higher predicted chance of live birth for the patient who had the baby. If we repeat this for all possible pairs, then the proportion correctly assigned a higher prediction gives us the c-statistic. A c-statistic of 0.5 means that our model is no better at distinguishing between low- and high-risk patients than a coin toss, while a c-statistic of 1 means the model is perfect (which is never the case). Calibration, on the other hand, is concerned with the agreement between the predicted and observed events and is ideally assessed using a flexible calibration curve [17]. The post-treatment model performed better with a c-statistic of 0.71 and did not require recalibration. The validation study also updated the models by adding BMI, anti-Müllerian hormone (AMH) and antral follicle count (AFC). All three improved the discrimination in the pre-treatment model (c-statistic = 0.66), while no improvement was found in the post-treatment model (c-statistic = 0.71). The post-treatment model was adjusted for the number of eggs collected, which could also be seen as a reflection of ovarian reserve. On the basis of these results, the additional value of the ovarian reserve tests can be questioned when a prediction model includes treatment information such as the number of eggs, given the extra cost and physical burden associated with them. Female age is known to be correlated with ovarian reserve, which may reduce the added value of these tests [18].
The post-treatment model was recommended in the review by Ratna et al. (2020) [13], on the basis of its methodology, predictive performance and the quality of reporting [13,15,19]. However, the pre-treatment model (which had lower discrimination) is arguably more useful, given that its intended moment of application is before IVF begins. The biggest limitation of these UK models is that the data used to develop them are more than 13 years old, which may affect their accuracy when applied to today's patients. The Human Fertilisation and Embryology Authority (HFEA) data also lacked some potentially important predictors, such as BMI, paternal age, alcohol intake, smoking and markers of ovarian reserve.
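The pairwise definition of the c-statistic given above can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the convention of counting tied predictions as half, and the example values are assumptions made here and are not taken from any of the cited studies.

```python
from itertools import product


def c_statistic(predictions, outcomes):
    """Proportion of (live birth, no live birth) pairs in which the patient
    with the live birth received the higher predicted chance.
    Tied predictions count as half (an assumed, commonly used convention)."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    non_events = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one live birth and one non-event")
    concordant = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1.0
        elif p_event == p_non_event:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted chances and observed outcomes (1 = live birth)
predicted = [0.10, 0.25, 0.30, 0.45, 0.60, 0.70]
live_birth = [0, 0, 1, 0, 1, 1]
print(c_statistic(predicted, live_birth))
```

Enumerating every pair is quadratic in the number of patients, so for large registries a rank-based calculation would usually be preferred; the pairwise form is shown because it mirrors the explanation in the text.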

Models of note since the Ratna systematic review (2020–2022)
Since the Ratna review, two further model development studies are worthy of discussion. Both have predicted cumulative live births over multiple complete cycles (Table 1).

The US
Two prediction models have been generated in one study, which used national data from the Society for Assisted Reproductive Technology (SART) in the US [20]. A pre-treatment model estimates the individualised chance of cumulative live birth over the first three complete cycles. The post-treatment model predicts chances before starting the second complete cycle in couples whose first complete cycle was unsuccessful. The model is available as a prediction tool at sart.org. The pre-treatment model was adjusted for female age, previous full-term birth status, type of infertility (male factor, polycystic ovary syndrome, uterine factor, diminished ovarian reserve and unexplained infertility) and the female's BMI. A second pre-treatment model was also created for women who had an AMH measurement. An important assumption with these models is that continuous predictors, e.g., age, should have a linear relationship with the outcome. Age, BMI and AMH had a non-linear relationship with live birth and so were included in the models as restricted cubic spline terms. As the AMH level increased, so did the odds of live birth until around 5 ng/mL, when it steadied. A woman with an AMH of 5 ng/mL had 22% higher odds of live birth than a woman with an AMH level of 2.5 ng/mL. Unfortunately, because of the limitations of the data set, the authors could not include AMH as a predictor in the post-treatment model. They also could not assess the impact of clinics using different AMH assays. Although the models performed well in the SART data, they have yet to be externally validated using independent data sets.

The UK
A further UK-based IVF prediction model was recently published by the same research group that developed the 2016 models and OPIS calculator [15,21]. When a couple have concluded their first complete cycle of IVF and have not achieved a live birth, they may decide to undergo a second complete cycle. Couples who were successful may decide to have more children. The previous UK models can only be used to estimate the chance of live birth either before commencing the first IVF treatment or at the first embryo transfer attempt, which makes it more challenging for couples to prepare for the next stage of treatment. Using these models when the couple have finished their first complete cycle will not result in accurate predictions because they were developed using patient data measured before the first cycle. By the start of the second complete cycle, patient predictor values will have changed, e.g., they will be older, the duration of infertility will be longer and their cause of infertility may have changed. Further, many of the patients used to develop the models will not have had a second complete cycle, which means that the case mix will have changed. Further prognosticators from the first complete cycle, such as the number of eggs collected and the pregnancy outcome, will also be known. All of this new information was included in a model developed to estimate the chance of live birth in couples beginning a second complete cycle of IVF.
The model was developed on 49,314 women from the HFEA registry who started their second complete cycle between 1999 and 2008 using their own eggs and their partner's sperm. In addition to female age, the number of eggs retrieved in the first complete cycle and the outcome of the first complete cycle (live birth, pregnancy loss and no pregnancy) were found to be key predictors (see Table 1). Other predictors included the duration of infertility, tubal infertility, the type of treatment and the time between the first and second egg retrievals. The model was externally validated on 39,442 UK women who underwent their second complete cycle between 2010 and 2016. The c-statistic was 0.65, and calibration showed a systematic overprediction of live birth for all women. The parameter estimates were recalibrated, and subsequently, the model showed much improved calibration. It should be noted that the validation data are now 6 years old, which may affect the accuracy of the model for new patients. Also, as mentioned earlier, the HFEA registry does not have some potentially important predictors.
According to the UK's National Institute for Health and Care Excellence guidelines, women under 40 years of age should be offered three complete cycles of IVF through the National Health Service [4].However, because the local Clinical Commissioning Groups in the UK make their own decisions regarding access to IVF funding, this means that some parts of the country are offered anything from one to three fully funded complete cycles.Some are not provided any funding.Therefore, for many couples who do not have access to funding after one complete cycle, this model will be particularly helpful as it can provide their predicted chance of live birth if they were to continue treatment.This will help them plan ahead and prepare financially.

Important predictors of live birth
Knowing which characteristics are the most relevant for predicting live birth after IVF treatment is helpful for researchers wishing to either develop a model or, preferably, update existing models with new predictors that improve the performance. A systematic review of predictive factors in IVF by van Loendersloot et al. [11] found that female age, the duration of infertility, basal follicle-stimulating hormone and the number of oocytes were most relevant. However, the study called for better quality studies to focus on whether embryo quality and the number of embryos transferred would be useful predictors.
The developers of the two UK models [15] investigated the relative importance of each predictor. This was done by calculating the adequacy, which is the proportion of the final model's goodness of fit (measured using the -2 × log likelihood (-2LL) statistic) that is explained by the individual predictor [22,23]. For the final model (with all predictors included), the -2LL was calculated. Then, the same statistic was calculated again for a model that is only adjusted for the complete cycle number and the particular predictor of interest (e.g., female age). The smaller model's -2LL is calculated as a proportion of the final model's -2LL. This is repeated for each of the remaining predictors. The predictor with the largest proportion is said to explain the most variation in the outcome. For the pre-treatment model, female age explained 85% of the total variation explained by all predictors. However, when treatment predictors were included, they found that female age (44%), cryopreservation of embryos (39%) and the number of eggs (38%) each explained a similarly high amount of the total variation explained by all the predictors. None of the other published IVF prediction models investigated the adequacy or ranked predictors by importance.
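As a rough illustration of the adequacy calculation, the sketch below uses one common formulation in which the improvement in -2LL of the smaller model over a reference model is divided by the improvement achieved by the full model. The choice of reference model, the function name and all numbers are assumptions for illustration; the exact computation used in the cited study may differ.

```python
def adequacy(neg2ll_reference, neg2ll_reduced, neg2ll_full):
    """Share of the full model's improvement in fit that is achieved by a
    reduced model containing only the predictor(s) of interest.

    All arguments are -2 x log likelihood values (smaller means better fit).
    The reference model (e.g., intercept only) is an assumption of this sketch.
    """
    gain_reduced = neg2ll_reference - neg2ll_reduced
    gain_full = neg2ll_reference - neg2ll_full
    if gain_full <= 0:
        raise ValueError("full model does not improve on the reference model")
    return gain_reduced / gain_full


# Hypothetical -2LL values, not taken from any published model
print(adequacy(neg2ll_reference=1000.0, neg2ll_reduced=940.0, neg2ll_full=900.0))
```

Because each reduced model is compared with the same full model, the proportions for different predictors need not sum to 100%, which is consistent with the percentages reported for the post-treatment model.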
However, a limitation of the adequacy method is that the proportion can appear large even if a predictor is weakly associated with the outcome. This would occur if the full model had a small -2LL, i.e., does not explain much of the total variation. Furthermore, it will suffer from omitted variable bias, which refers to important unknown or unavailable predictors of live birth that could potentially change the adequacy of another predictor. Steyerberg recommends that we simply judge the importance of each predictor by looking at the relative effects (i.e., odds ratios) of the predictors and using clinical judgement [24]. In that respect, female age still comes out as the most important with an odds ratio (95% CI) of 1.66 (1.62–1.71) for a 31-year-old versus a 37-year-old. Note that age was not categorised but was included in the model as restricted cubic spline terms to account for the non-linear relationship between age and the outcome. The odds ratio presented is the 25th percentile versus the 75th percentile value for age, which is an easier way of interpreting the association for a non-linear relationship.
The SART pre-treatment model showed that female age, BMI and AMH had the strongest associations with live birth, as did age, BMI and the number of eggs collected for the post-treatment model [20]. BMI and AMH were unavailable in the UK database and so could not be included as predictors; however, the duration of infertility was not available in the US database. In all models, the causes of infertility had reasonably small associations with live birth, with male factor, diminished ovarian reserve, uterine factor and tubal infertility having the strongest association.
When predicting the second complete cycle, it is clear from the Ratna et al. [21] study that the number of eggs collected from the first retrieval and the outcome of the first complete cycle are also important to consider. For the latter predictor, the odds of live birth for women who had a previous IVF live birth were almost twice that for women who had no pregnancy at all over the first complete cycle. Women who had a pregnancy loss (and no live birth) in the first complete cycle had 35% higher odds of live birth than women who had no pregnancy over the first complete cycle.
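Because odds ratios describe relative rather than absolute chances, it can help to see how an odds ratio translates into a predicted chance for a given starting point. The sketch below assumes a hypothetical baseline chance of 20%, which is not a figure from any of the cited models, and applies an odds ratio of 1.35 (the 35% higher odds mentioned above).

```python
def apply_odds_ratio(baseline_chance, odds_ratio):
    """Convert a baseline chance to odds, multiply by the odds ratio,
    and convert the result back to a chance."""
    if not 0.0 < baseline_chance < 1.0:
        raise ValueError("baseline_chance must lie strictly between 0 and 1")
    baseline_odds = baseline_chance / (1.0 - baseline_chance)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline chance of 20% combined with an odds ratio of 1.35
print(round(apply_odds_ratio(0.20, 1.35), 3))
```

Under this assumed baseline, the chance rises from 20% to roughly 25%, which shows that 35% higher odds is not the same as a 35% higher chance of live birth.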

A note on useful complete cycle-specific live birth prediction models
Two further models are worthy of note. Although they do not predict live birth cumulatively over multiple complete cycles of IVF, they do predict over the first complete cycle of IVF. While the following studies do not specifically use the term 'complete cycle' in their articles, their approaches agree with our definition, i.e., all fresh and subsequent frozen-thawed embryo transfer cycles from one episode of ovarian stimulation.

The Netherlands
The model by van Loendersloot et al. [25] (identified by the Ratna et al. (2020) [13] review) predicts the chance of ongoing pregnancy over the first complete cycle. It also predicts ongoing pregnancy at each successive complete cycle for couples in whom all previous complete cycles were unsuccessful. It was developed using a cohort of 1326 couples treated at a single centre in The Netherlands. The model was adjusted for the number of previous failed cycles as well as female age, duration of infertility, basal FSH, previous ongoing pregnancy and causes of infertility. It also includes predictors based on laboratory data from the previous failed IVF cycle, e.g., fertilisation method (IVF/ICSI), number of embryos after egg retrieval, mean morphological score per day 3 embryo, presence of 8-cell embryos on day 3 and presence of morulae on day 3. It was externally validated on a data set from the same centre but from a more recent time period. The c-statistic was 0.68, and the model was updated following evidence of miscalibration. Two further independent validation studies using data from single centres in Italy and Belgium showed lower discrimination (both 0.64) and poor calibration [26,27]. However, the Italian study recalibrated the model to find better agreement, while the Belgian study fitted a new model to their own data.
We find that the van Loendersloot model is informative because it takes account of frozen-thawed cycles, giving patients a fuller picture of their likely chances of success over their current complete cycle of treatment. We recommend further large external validation studies for this model since it was developed and validated on data from one centre. For use in other centres, external validation of the model using data from those centres is recommended [28,29].

China
A model predicting cumulative live birth over the first complete cycle only was developed using data on almost 18,000 women from a university hospital in China [30]. Age, number of oocytes, number of good quality embryos (defined as an embryo with 6–12 blastomeres graded 1 and 2), fertilisation rate, treatment type (IVF versus ICSI) and duration of infertility were included as predictors. Age and oocytes were included as linear terms, meaning that the authors did not adjust for the known non-linear relationship between these predictors and live birth [15,20]. The model was internally validated using 10 times 10-fold cross-validation, which resulted in a c-statistic of 0.74. The model has yet to be externally validated, and the final model parameters, including the intercept, were not presented, which will make it difficult for independent investigators to conduct external validation on their data sets.
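For readers unfamiliar with the internal validation scheme mentioned above, the following sketch shows how the train/test splits for repeated k-fold cross-validation can be generated. It illustrates only the splitting step; the function name, seed and sample size are hypothetical, and the model fitting and c-statistic calculation that would be performed within each split are omitted.

```python
import random


def repeated_kfold(n_patients, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold
    cross-validation. Each repeat reshuffles the patients, splits them into
    k folds and uses every fold once as the test set."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        indices = list(range(n_patients))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for fold_number, test in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != fold_number for i in fold]
            yield repeat, fold_number, train, test


# 10 repeats of 10-fold cross-validation on a hypothetical cohort of 200 women
splits = list(repeated_kfold(200, k=10, repeats=10))
print(len(splits))  # one model fit per split
```

The performance estimate would then typically be averaged over all splits; how the cited study combined its results is not described in the text.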

Recommendations and further work
For prediction of a cumulative live birth over multiple complete cycles of IVF (where a complete cycle is defined as all fresh and frozen embryo transfers arising from a single episode of ovarian stimulation), we recommend the use of the UK and US models at pre- and post-treatment. All were developed on national-level data sets and followed the recommended reporting guidance for model development [19]. The pre-treatment models from both countries may be used before couples commence their first IVF cycle, while the post-treatment models are useful before starting a second complete cycle [15,20,21]. However, it is not guaranteed that using these models in countries outside the UK and the US will provide accurate predictions for their patients. Therefore, all of these models need to be validated on independent geographical data sets for use in other clinics and countries. Further, they need to be continually validated and updated using data collected within the population in which they were developed to prevent calibration drift [31]. Calibration drift can be caused by changes in case mix and IVF practice. Clinics or countries which display over- or under-prediction upon calibration assessment can still use the model after it has been recalibrated. This can be as simple as adjusting the model intercept to reflect the IVF success rates in that clinic or country. However, if that does not work, there are several other ways of correcting miscalibration [32].
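The simplest form of recalibration mentioned above, adjusting the model intercept, can be sketched as follows. The code searches for the constant shift on the log-odds scale that makes the average predicted chance equal to the observed live birth rate in the new population. The function names, the bisection search and the example numbers are assumptions for illustration and do not reproduce the method of any cited study.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def intercept_shift(predicted, observed_rate, tol=1e-10):
    """Find the shift on the log-odds scale such that the mean recalibrated
    prediction equals the observed live birth rate (found by bisection)."""
    linear_predictors = [logit(p) for p in predicted]
    low, high = -20.0, 20.0
    while high - low > tol:
        mid = (low + high) / 2.0
        mean_pred = sum(expit(lp + mid) for lp in linear_predictors) / len(linear_predictors)
        if mean_pred > observed_rate:
            high = mid
        else:
            low = mid
    return (low + high) / 2.0


def recalibrate(predicted, shift):
    """Apply the intercept shift to each original prediction."""
    return [expit(logit(p) + shift) for p in predicted]


# Hypothetical clinic: the model predicts 35% on average, but 25% had a live birth
original = [0.20, 0.30, 0.40, 0.50]
shift = intercept_shift(original, observed_rate=0.25)
updated = recalibrate(original, shift)
print(shift, sum(updated) / len(updated))
```

An intercept shift changes every prediction in the same direction but leaves the ranking of patients, and therefore the c-statistic, unchanged; it cannot correct predictions that are too extreme or too moderate, which is one reason other recalibration approaches may be needed.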
With respect to predictors that should be considered when developing new models or updating existing models, female age is the most important. Other factors that should be considered for prediction before starting treatment include the duration of infertility, female BMI and markers of ovarian reserve. Ovarian reserve markers make a statistically and clinically significant contribution in the prediction of live birth following IVF treatment. However, they do not appear to be as important as female age (with which they correlate quite highly). Previous research seems to suggest that out of the ovarian reserve markers, AMH is the most reliable [33–35], and it has been shown to have some association with live birth independently of age [36]. However, another systematic review concluded that AMH and AFC added nothing when included with age in the prediction of ongoing pregnancy after IVF [18]. Future IVF prediction studies should utilise large (possibly national level) data sets with which to externally validate existing recommended models. They should investigate the added value of including different ovarian reserve markers in these models to confirm whether AMH is the most predictive.
For models that predict from the point of treatment, the number of eggs collected, double versus single fresh embryo transfer and blastocyst versus cleavage stage transfer should be considered. Further research is needed into whether embryo quality measures add further predictive accuracy. If the sample size is not an issue, then it is important to include all known predictors, including those that are not strongly associated with live birth, to increase predictive accuracy. These include causes of infertility, previous pregnancy status and treatment type, e.g., IVF versus ICSI.

Summary
IVF prediction models that estimate the chance of live birth over multiple complete cycles of treatment are useful to provide a complete picture of a couple's likelihood of success. Models have been developed using national-level data in the UK and US for predictions before starting treatment. The UK has two further models, which provide revised predictions at later stages: one for use at the time of the first fresh embryo transfer and the other for use before starting a second complete cycle of treatment. The US has one further model for use at the start of the second complete cycle but only for couples whose first complete cycle was unsuccessful. Models developed using data from single centres in China and The Netherlands are able to predict pregnancy outcomes over the first complete cycle. The latter can also be used to predict each successive complete cycle assuming previous complete cycles failed. We recommend using the UK and US models, but both need continual validation using updated patient data to remain accurate for new patients. All of the models require external validation in different geographical regions to ensure that they provide accurate predictions in those countries (or centres). For researchers developing new prediction models, the most important patient predictors to include are female age, duration of infertility, BMI and ovarian reserve markers. When revising predictions using treatment data, the model should include the number of eggs collected. Further work is needed to determine the added predictive value of embryo quality.

Declaration of competing interest
None.

Table 1
Clinical prediction models predicting cumulative live birth for couples undergoing IVF including validation results and predictors.