
Development of machine learning models to predict gestational diabetes risk in the first half of pregnancy

Abstract

Background

Early prediction of Gestational Diabetes Mellitus (GDM) risk is particularly important because it may enable more efficacious interventions and reduce cumulative injury to mother and fetus. The aim of this study is to develop machine learning (ML) models for the early prediction of GDM using widely available variables, facilitating early intervention and making it possible to apply the prediction models in settings without access to more complex examinations.

Methods

The dataset used in this study includes records from 1,611 pregnancies. Twelve different ML models and their hyperparameters were optimized to achieve early and high prediction performance of GDM. A data augmentation method was used during training to improve prediction results. Three methods were used to select the most relevant variables for GDM prediction. After training, the models ranked highest by Area under the Receiver Operating Characteristic Curve (AUCROC) were assessed on the validation set. The models with the best results were then assessed on the test set as a measure of generalization performance.

Results

Our method identifies many possible models for various levels of sensitivity and specificity. Four models achieved a high sensitivity of 0.82, a specificity of 0.72–0.74, an accuracy of 0.73–0.75, and an AUCROC of 0.81. These models required between 7 and 12 input variables. Another possible choice is a model with a sensitivity of 0.89 that requires just 5 variables, reaching an accuracy of 0.65, a specificity of 0.62, and an AUCROC of 0.82.

Conclusions

The principal findings of our study are: early prediction of GDM within the early stages of pregnancy using routine examinations; the development and optimization of twelve different ML models and their hyperparameters to achieve the highest prediction performance; and a novel data augmentation method that allows several models to reach excellent GDM prediction results.


Introduction

Gestational Diabetes Mellitus (GDM) is defined as any degree of glucose intolerance with onset or first recognition during pregnancy [1, 2]. In 2017, an estimated 14% of pregnancies worldwide were affected by GDM [3]. The prevalence of GDM varies among countries and regions and is substantially impacted by the diagnostic criteria employed [3,4,5,6]. GDM is associated with increased risk of acute and chronic disease for both the mother and the developing fetus [1, 4, 7, 8]. Adverse fetal outcomes associated with GDM include increased risk of insulin resistance, macrosomia, preterm birth, respiratory distress, neonatal intensive care unit admission and stillbirth [9,10,11]. Adverse maternal outcomes associated with GDM include depression, a 7- to 10-fold increase in the risk of developing Type 2 Diabetes Mellitus (T2DM) relative to non-GDM women, elevated risk of liver and renal disease, more adverse lipid profiles, insulin resistance, and a twofold increase in the risk of cardiovascular disease [9,10,11].

There is no uniform consensus on the optimal criteria for the diagnosis of GDM. The first diagnostic test for GDM, recommended by O’Sullivan and Mahan in 1964 [12], employed a fasting three-hour oral glucose tolerance test (OGTT) using 100 g of glucose with whole-blood analyses, with two or more elevated measurements (fasting, 1 h, 2 h, or 3 h) required for a GDM diagnosis [9]. A series of protocol amendments followed, leading to the development of a two-step protocol based around an initial screening test (1 h, non-fasting 50 g glucose challenge with cut-offs ranging from 130 to 140 mg/dl) followed by a diagnostic glucose tolerance test (measuring fasting, 1 h, 2 h, and 3 h glucose levels) [9, 12]. More recently, based on the findings of the Hyperglycemia and Adverse Pregnancy Outcome (HAPO) Study, a one-step screening strategy proposed by the International Association of Diabetes and Pregnancy Study Groups (IADPSG) recommended the use of a fasting two-hour 75 g oral glucose tolerance test [13]. Although the one-step IADPSG approach has the obvious advantage of requiring only a single test and one elevated glucose measurement, its use has raised concerns regarding GDM overdiagnosis [9]. Interestingly, several studies have reported that the prevalence of GDM is two- to three-fold higher using the IADPSG one-step approach than with the two-step screen-and-diagnose protocol, with no clear improvement in pregnancy outcomes. Highlighting the lack of consensus in the field, Fu and Retnakaran [9] note that although the one-step IADPSG protocol is endorsed by the International Federation of Gynecology and Obstetrics, the American Diabetes Association and the World Health Organization (WHO), the two-step screen-and-diagnose protocol is endorsed by the National Institutes of Health and the American College of Obstetricians and Gynecologists [9].

Irrespective of the diagnostic approach used, the current paradigm has a number of inherent disadvantages. The OGTT is time consuming for clinicians and patients, cannot easily be applied to the total population, and is associated with a high false positive rate [14]. Results can be strongly impacted by pre-analytical laboratory practices; for example, room-temperature glycolysis by leukocytes and erythrocytes prior to centrifugation can reduce glucose levels by five to seven percent per hour [15]; in a recent Australian study of 12,317 women, when centrifugation was performed within ten minutes of sample collection, the GDM diagnosis rate nearly doubled from 11.6% to 20.6% using the IADPSG criteria [16]. Moreover, an OGTT at 24–28 weeks of gestation does not facilitate treatment early in pregnancy. As articulated by Sweeting and colleagues [11], although most international guidelines recommend early antenatal GDM testing for high-risk mothers, there is no current consensus on the testing approach or diagnostic thresholds [11]. There is also a lack of evidence to support improved pregnancy outcomes with the early diagnosis and treatment of GDM based on current approaches [11]. There is, however, evidence that a range of first trimester biomarkers can be used to predict GDM development later in pregnancy, and that fetal macrosomia can occur before a diagnosis of GDM is made [9]. What is clear is the expectation that early and accurate prediction of GDM risk can lead to interventions that help improve health outcomes for both mothers and babies [17,18,19].

State of the art

With this objective in mind, several models have been developed to predict GDM during the early stages of gestation [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]. Some of these models use simple variables, such as age, previous GDM, a first-degree relative with a history of diabetes, multiple pregnancies, fasting plasma glucose (FPG), glycated hemoglobin (HBA1c) and triglycerides [20]. A rapidly growing body of evidence shows that applying machine learning (ML) to data of this nature, and to more general biophysical and socio-economic metrics (i.e., easily obtained from a patient history early in pregnancy), may offer a new means of making early and accurate predictions of GDM risk [36]. Critically, such predictions may be scalable to a population level because they do not require liquid biopsies or the administration of screening or diagnostic tests, and they incur comparatively little per-test cost. ML approaches have shown success in the prediction of preeclampsia [37], the prediction of GDM from electronic health records [22], and pattern recognition [38]. In GDM prediction, various models have been used, including Deep Neural Networks (DNN) [20], Logistic Regression (LR) [21], Gradient Boosting [22], a combination of LR and Extreme Gradient Boosting (XGBoost) [23], and Random Forest (RF) with LR [24]. A recent review [36] of ML-based models for the prediction of GDM before 24–28 weeks of pregnancy reported the viability of this approach for making predictions from general patient data and emphasized the use of generic clinical variables. The best results of previously published models using similar input variables and GDM criteria are summarized in Table 8. Although several studies focusing on the prediction of GDM have been presented, a model that reaches high sensitivity and specificity for early prediction of GDM with the fewest possible variables is still clinically needed. Additionally, using variables that are widely available from screening examinations during pregnancy will allow broad application of the prediction model, including in low-income areas where more complex tests are not available or cannot be executed in a highly standardized fashion (i.e., with rigorous pre-analytical sample processing).

The main objective of our ML models is to predict the risk of developing GDM early in pregnancy in order to facilitate preventive treatment and reduce the risk of adverse maternal and fetal outcomes. As this was a retrospective study, all patients had OGTT data available for validation of the GDM diagnosis; the OGTT was not used to develop the models but rather to validate the diagnosis of GDM. In the present work we report the development of twelve different ML models, the optimization of their hyperparameters to achieve the highest classification performance, and the application of a variable selection process in which redundant variables were eliminated to improve model performance.

Materials and methods

Database

The dataset used in this study was obtained from patients attending the Obstetrics and Fetal Medicine Unit of the Hospital Parroquial de San Bernardo, Santiago, Chile. The dataset included records from 1,611 different pregnant patients, collected from 2019 to 2022. Only patients with complete records for all variables were included; patients with missing data were excluded. A diagnosis of GDM was made using the IADPSG/HAPO criteria for gestational diabetes [13, 39], i.e., a 75 g oral glucose tolerance test with fasting glycemia ≥ 92 mg/dl, or 2 h glycemia ≥ 153 mg/dl in the second trimester. Patients with Diabetes Mellitus diagnosed before pregnancy were excluded from the dataset. Data were obtained during regular maternal visits up to the 20th week of gestation. The third column of Table 1 lists the variables and the gestational week at which each was collected. Most of the data were obtained during the first maternal visit, which occurred anytime between the 4th and 20th weeks of pregnancy. A histogram (Fig. 1) shows the number of patients per gestational week for the first maternal visit. As in previous work [20, 22, 24, 27, 28, 30, 32, 35], our study was retrospective and therefore the dataset was available as described. Each continuous input variable (e.g., age, weight, height, Body Mass Index (BMI) at the first visit, and the first trimester fasting glucose level) was normalized by subtracting its average and dividing by its standard deviation. The database was divided into three partitions: training set (70%), validation set (10%), and test set (20%).
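As an illustration of the preprocessing described above, the sketch below standardizes the continuous variables and produces the 70/10/20 partition. It is a minimal sketch, not the authors' code: the column names, the stratification by GDM label, the random seed, and fitting the scaler on the training partition only are assumptions made for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical column names; the real schema is summarized in Table 1.
CONTINUOUS = ["age", "weight", "height", "bmi", "first_trimester_glycemia"]

def split_and_normalize(df: pd.DataFrame, label: str = "gdm"):
    """Split into 70% train / 10% validation / 20% test, then z-score the
    continuous variables (statistics taken from the training partition)."""
    X, y = df.drop(columns=[label]), df[label]

    # Carve off the 20% test partition first, then 10% of the total for validation.
    X_tmp, X_test, y_tmp, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(
        X_tmp, y_tmp, test_size=0.125, stratify=y_tmp, random_state=0)  # 0.125 * 0.8 = 0.10

    scaler = StandardScaler().fit(X_train[CONTINUOUS])
    for part in (X_train, X_val, X_test):
        part.loc[:, CONTINUOUS] = scaler.transform(part[CONTINUOUS])
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```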

Table 1 Clinical variables of the patients. IQR, interquartile range
Fig. 1 Histogram showing the number of first maternal visits per gestational week

Data augmentation

Data augmentation (DA) is a common method used in ML to improve training results [40, 41]. We designed a DA method for the training set, adapted to GDM prediction, by restricting the generated values to physiological ranges for each input. The ranges for the creation of new data were provided by a specialist in Obstetrics/Gynecology. The DA approach creates new patients for training the models based on the original patients, changing some input values slightly as follows (see the sketch after this paragraph): i) Age: newly created patients must be within ± 2 years of the original ones; ii) First Trimester Glycemia Test: newly created patients must be within ± 5 mg/dL of the original value, and only if the original patient has a result between 66 and 94 mg/dL, or over 105 mg/dL, in this test; iii) Height: newly created patients must be within ± 3 cm of the original ones; iv) Weight: newly created patients must be within ± 5 kg of the original ones; and v) BMI: the BMI was recomputed from the changed height and weight of the newly created patients. A new patient is not created if its BMI classification differs from that of the original patient. We used the BMI classification proposed by the WHO [42].
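The following sketch illustrates how the augmentation rules above could be applied to a single training record. It is a hypothetical implementation: the field names, the use of uniform jitter within the expert ranges, and the rejection-on-failure strategy are assumptions, since the exact sampling procedure is not specified here.

```python
import random
from typing import Optional

def bmi_class(bmi: float) -> str:
    """WHO BMI classification used to keep augmented patients in the same class."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"

def augment_patient(p: dict) -> Optional[dict]:
    """Create one synthetic training patient by jittering the original values
    within the expert ranges; return None if the WHO BMI class would change."""
    new = dict(p)
    new["age"] = p["age"] + random.uniform(-2, 2)
    # Glycemia is only perturbed inside the intervals allowed by the specialist.
    g = p["first_trimester_glycemia"]
    if 66 <= g <= 94 or g > 105:
        new["first_trimester_glycemia"] = g + random.uniform(-5, 5)
    new["height_cm"] = p["height_cm"] + random.uniform(-3, 3)
    new["weight_kg"] = p["weight_kg"] + random.uniform(-5, 5)
    new["bmi"] = new["weight_kg"] / (new["height_cm"] / 100) ** 2
    if bmi_class(new["bmi"]) != bmi_class(p["bmi"]):
        return None  # reject: augmented patient crossed a WHO BMI class boundary
    return new
```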

For the experiments we also considered a narrower, limited version of the DA ranges provided by the medical specialist. The original and the limited range values are shown in Table 2. Several DA settings were evaluated by enlarging the training set so that the total number of cases reached 120%, 140%, 160%, 180% and 200% of the original number of cases (100%).

Table 2 Data augmentation (DA) ranges of values provided by the medical specialist, and the limited ranges of values; both are used in the experiments

Prediction models

Twelve different ML models and their hyperparameters were optimized to achieve the highest prediction performance: Gaussian Naïve Bayes (GNB), Bernoulli Naïve Bayes (BNB), Decision Trees (DT), Support Vector Machines (SVM), Multi-Layer Perceptron (MLP), K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Extra Trees (ET) [43, 44], Balanced Random Forest (BRF) [45], Gradient Boosting (GB), implemented as Extreme Gradient Boosting (XGB) [46], and Light Gradient Boosting Machine (LGBM) [47]. All the models were trained on the training set, evaluating over 3,000 combinations of hyperparameters. For example, for the SVM, various types of kernels were used; for the MLP, different combinations of layers and solvers were used; for the tree-based models, various split criteria were used; and for the ensembles, different numbers of estimators were employed, among many other hyperparameters.
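For reference, the twelve model families can be instantiated from the libraries cited in the next subsection roughly as follows. This is a minimal sketch; the constructor arguments shown (e.g., probability=True for the SVM and the max_iter values) are illustrative assumptions rather than the hyperparameters actually tuned.

```python
from sklearn.naive_bayes import GaussianNB, BernoulliNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from imblearn.ensemble import BalancedRandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# One instance per model family; hyperparameters are tuned later by grid search.
MODELS = {
    "GNB": GaussianNB(),
    "BNB": BernoulliNB(),
    "DT": DecisionTreeClassifier(),
    "SVM": SVC(probability=True),        # probability estimates needed for ROC analysis
    "MLP": MLPClassifier(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(),
    "ET": ExtraTreesClassifier(),
    "BRF": BalancedRandomForestClassifier(),
    "XGB": XGBClassifier(eval_metric="logloss"),
    "LGBM": LGBMClassifier(),
}
```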

Model implementation and hyperparameters

The models were implemented in Python 3.9.12 using the Scikit-Learn [43], Imbalanced-Learn [45], XGBoost [46], and LGBM [47] libraries. The main hyperparameters used for each model are: GNB "var_smoothing" [43]; BNB "alpha" [44]; DT "criterion", "max_depth", "max_leaf_nodes", "splitter" [43]; SVM "kernel", "degree", "decision_function_shape", "C" [43]; MLP "solver", "hidden_layer_sizes", "activation", "learning_rate_init", "max_iter", "learning_rate", "early_stopping" [43]; KNN "algorithm", "leaf_size", "p", "n_neighbors" [43]; LR "C", "solver" [43]; RF, ET and BRF "n_estimators", "criterion" [43, 45]; XGB "n_estimators", "eta", "booster", "gamma", "max_depth" [46]; LGBM "n_estimators", "boosting_type", "learning_rate" [47].
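A hyperparameter grid of this kind might be declared as a plain dictionary per model, as in the hypothetical excerpt below; the actual names and value ranges explored are those listed in Table 3, not the placeholder values shown here.

```python
# Illustrative (placeholder) grids for two model families; Table 3 lists the
# hyperparameter names and value ranges actually explored in the study.
PARAM_GRIDS = {
    "SVM": {
        "kernel": ["linear", "poly", "rbf", "sigmoid"],
        "degree": [2, 3, 4],
        "C": [0.1, 1, 10],
        "decision_function_shape": ["ovo", "ovr"],
    },
    "MLP": {
        "solver": ["adam", "sgd", "lbfgs"],
        "hidden_layer_sizes": [(16,), (32,), (32, 16)],
        "activation": ["relu", "tanh"],
        "learning_rate_init": [1e-3, 1e-2],
    },
}
```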

Table 3 shows all the hyperparameters that were used in the Grid Search, and the range of values analyzed.

Table 3 Hyperparameters used in each model type

Model evaluation

The results obtained with each combination of hyperparameter values were assessed using fivefold cross validation (CV) [48] on the training set, performing a grid search over the hyperparameter values. Grid search finds near-optimal hyperparameter values via multiple evaluations of the various combinations. Input variable selection [49] was performed to choose the best variables for the prediction task, improving the model results and reducing redundant input variables for each model. The input variable selection was performed using three methods: the F-test of ANOVA (Analysis of Variance), the Chi-Square test, and Mutual Information (also known as Information Gain) [43]. The models were trained, evaluated, and tested with various combinations of input variables selected by these three methods. After adjustment with the training set, the top 15% of models ranked by the highest area under the ROC curve (AUCROC) [50] were selected and assessed on the validation set. Models with the best results on the validation set were selected to obtain a good balance between high sensitivity and good specificity [50]. Finally, the selected models were assessed on the test set as a measure of generalization performance. The test set was not used in any previous step involving training or selection of the best models. Models were also trained using DA on the training and validation sets, but no DA was performed on the test set. The best results were chosen using sensitivity and specificity as the main performance metrics. Accuracy, sensitivity, specificity and recall macro are measured at a specific decision threshold, which is determined using the validation dataset. The ROC curve is created from the different decision thresholds that modify the sensitivity, also known as the True Positive Rate (TPR), as a function of the False Positive Rate (FPR). The formulas are: Accuracy = (TP + TN)/(TP + FP + TN + FN), Sensitivity = TP/(TP + FN), Specificity = TN/(TN + FP), Recall Macro = (Sensitivity + Specificity)/2.
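The sketch below outlines one way the pipeline described above could be wired together for a single model family: a grid search with fivefold CV on the training set, a decision threshold chosen on the validation set, and the final metrics computed on the untouched test set. It is a simplified, hypothetical reconstruction; in particular, the threshold rule shown (the highest threshold reaching a target sensitivity) is an assumption, whereas the study keeps the top 15% of models by AUCROC and then balances sensitivity and specificity on the validation set.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix, roc_auc_score

def tune_and_evaluate(model, param_grid, X_train, y_train, X_val, y_val,
                      X_test, y_test, target_sensitivity=0.82):
    """Grid search with fivefold CV on the training set, threshold selection on
    the validation set, and final metrics on the untouched test set."""
    search = GridSearchCV(model, param_grid, scoring="roc_auc", cv=5, n_jobs=-1)
    search.fit(X_train, y_train)
    best = search.best_estimator_

    # Pick the highest decision threshold on the validation set that still
    # reaches the desired sensitivity (one possible selection rule).
    val_scores = best.predict_proba(X_val)[:, 1]
    threshold = 0.5
    for t in np.sort(np.unique(val_scores))[::-1]:
        y_hat = (val_scores >= t).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_val, y_hat).ravel()
        if tp / (tp + fn) >= target_sensitivity:
            threshold = t
            break

    # Generalization estimate on the test partition, never seen during tuning.
    test_scores = best.predict_proba(X_test)[:, 1]
    y_pred = (test_scores >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sens,
        "specificity": spec,
        "recall_macro": (sens + spec) / 2,
        "aucroc": roc_auc_score(y_test, test_scores),
    }
```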

Results

Population characteristics

A total of 1,611 pregnant women were included in this study. The database was partitioned into 1,127 cases for the training set, 161 for the validation set, and 323 (39 GDM-positive) for the test set. The prevalence of GDM was 14.21% (229/1,611). The input variables to the models are described in Table 1.

Variable selection

The twelve most relevant variables selected by the three methods (F-Test ANOVA, Chi-Square, and Mutual Information) are displayed in Table 4.

Table 4 The twelve most relevant variables for GDM prediction, selected using four methods: F-Test ANOVA, Chi-Square, Mutual Information and BRF

We selected the most important variables (features) in the dataset by removing irrelevant or redundant variables. This yields a small number of variables, which is useful for clinical application. The methods used for this purpose are commonly employed in ML (F-test of ANOVA, Chi-Square test, and Mutual Information); a sketch of this step is shown below. This variable selection also avoids overfitting and achieves improved performance compared to using all the features [49]. For example, variables such as Pregnancy Type or Stillbirth are not selected by the variable selection methods, but may decrease the performance of models such as the Multi-Layer Perceptron. Additionally, one of the models used to select variables was the BRF (see Table 4). The ranking obtained with a nonlinear model, the BRF, is similar to those obtained with the statistical methods, confirming that these are the relevant variables.
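A minimal sketch of this filter-based selection step, assuming the training data is held in a pandas DataFrame, is shown below; the top-k rule with k = 12 mirrors Table 4, while the handling of the chi-square requirement for non-negative inputs is an assumption about implementation details not stated here.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif

def rank_variables(X_train: pd.DataFrame, y_train, k: int = 12) -> dict:
    """Return the k top-scoring variable names for each of the three filter
    methods; chi-square requires non-negative inputs, so here it is assumed to
    be applied to the raw (un-standardized) values."""
    rankings = {}
    for name, score_fn in [("anova_f", f_classif),
                           ("chi2", chi2),
                           ("mutual_info", mutual_info_classif)]:
        selector = SelectKBest(score_func=score_fn, k=k).fit(X_train, y_train)
        rankings[name] = list(X_train.columns[selector.get_support()])
    return rankings
```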

Model performance

Table 5 shows the model type, the number of input variables, whether or not DA was used ("w/o DA" meaning data augmentation was not used for this model, "DA LE" data augmentation with the limited expert range, and "DA EO" data augmentation with the expert original range), and the following results: Accuracy, Sensitivity, Specificity, Recall Macro, AUCROC, False Positives (FP), False Negatives (FN), and FP + FN. Table 5 shows the top 4 models for each sensitivity level, with the model that has the highest AUCROC in bold type, for models with up to 12 variables. All these metrics were computed for each model on the test set. As mentioned in the Methods section, the test set was only used to test the generalization capacity of the models; it was not used to train or to select the hyperparameters of the models. In Table 5 we show the results of models that reached a sensitivity of 0.9231 or above on the test set (model numbers 1 to 16), while model numbers 17 to 36 show the results of models with sensitivity of at least 0.7949 but below 0.9231 on the test set. Models with high sensitivity minimize FN when screening patients. Sensitivity is important since the main goal is to prevent the serious consequences of GDM that may occur in mothers and babies even several years after pregnancy. Our method identifies many possible models for various levels of sensitivity and specificity. For example, model numbers 29–32 in Table 5 all have a high sensitivity of 0.82, a specificity in the range 0.72–0.74, an accuracy between 0.73 and 0.75, an AUCROC of 0.81, and a Recall Macro between 0.77 and 0.78. A model could be selected from these ranges to achieve a good compromise between low numbers of FN and FP, as shown in the last column of Table 5.

Table 5 Top four models for each sensitivity level, with sensitivity ≥ 0.9231 (models 1 to 16) and with sensitivity ≥ 0.7949 and < 0.9231 (models 17 to 36), and up to 12 variables

Another possible choice is model 17 (Table 5), with a sensitivity of 0.89, which requires just 5 variables (1TFG, Age, BMI, Maternal Weight, and Gravidity). This model reaches an accuracy of 0.65, a specificity of 0.62, a Recall Macro of 0.76, and an AUCROC of 0.82. Models 17–20 reach the same sensitivity of 0.89 with small changes in accuracy, specificity, Recall Macro and AUCROC. The best models for sensitivity 0.89 are all MLPs. As can be seen in Table 5 and in Fig. 2, there are several choices of models for various levels of sensitivity, with a trade-off in specificity.

Fig. 2 Surface with all models available, including various values of hyperparameters, for various levels of error (FP + FN), True Positives, and number of variables. The red dots represent the best models in bold type from Table 5 with sensitivity above 0.9231 (model numbers 1, 5, 9, and 13), and the yellow dots represent the best models from Table 5 with sensitivity above 0.7949 but below 0.9231 (model numbers 17, 21, 25, 29, and 33)

Figure 2 shows two different views of the same surface, plotting the model results for various values of the total number of errors (FP + FN), True Positives, and the number of input variables for each model. Several model choices are available for reaching high sensitivity (low FN) and high specificity (low FP) with a small number of input variables. On the surface plotted in Fig. 2, the red dots represent the best models shown in bold type in Table 5 with sensitivity above 0.92 (model numbers 1, 5, 9, and 13), and the yellow dots represent the best models from Table 5 with sensitivity above 0.79 but below 0.92 (model numbers 17, 21, 25, 29, and 33).

Figure 3 shows the ROC curves for each of the nine best models at fixed sensitivity levels, from a sensitivity of 1 down to a sensitivity of 0.79. These best models for each sensitivity level appear in bold type in Table 5. Figure 3(a) shows the ROC curves for the best models with sensitivities of 1, 0.9744 and 0.9487. Figure 3(b) shows the ROC curves of the best models with sensitivities of 0.9231, 0.8974 and 0.8718. Figure 3(c) shows the ROC curves of the best models with sensitivities of 0.84, 0.82, and 0.79. Finally, Fig. 3(d) shows the ROC curve for model number 29 in Table 5, which has the best Recall Macro (gray), compared with the same model trained with DA (cyan) and with the same model using fewer variables (pink). Model number 29 has the lowest FP + FN.

Fig. 3 a ROC curves of the best models with sensitivities of 1 (MLP, 12 variables), 0.9744 (MLP, 8 variables), and 0.9487 (SVM, 5 variables). b ROC curves of the best models with sensitivities of 0.9231 (SVM, 5 variables), 0.8974 (MLP, 5 variables), and 0.8718 (MLP, 7 variables). c ROC curves of the best models with sensitivities equal to 0.8462 (MLP, 6 variables), 0.8205 (SVM, 12 variables), and 0.7949 (SVM, 7 variables). d ROC curve for model number 29 in Table 5 with the best Recall Macro (gray, SVM, 12 variables), and a comparison of the same model with Data Augmentation (cyan), and a model with a lower number of variables (pink, SVM, 7 variables)

Table 6 shows the best models for different sensitivity levels with more than 12 input variables. Models 38, 42, 43 and 45 reached a slightly better FP + FN than our best selected models shown in Table 5; nevertheless, the number of required input variables is more than doubled. For example, model 25 requires 6 input variables while model 43 requires 15 input variables for the same sensitivity. A much larger number of input variables would be more difficult to implement in clinical practice.

Table 6 Best models for different sensitivity levels, with a number of input variables > 12

In the clinical context, one of the main concerns of GDM specialists is the balance between sensitivity and specificity. High sensitivity avoids missing patients with the condition (low FN), while high specificity decreases the number of FP. Tables 5 and 6 show a trade-off between sensitivity and specificity in our results, yielding a high, but not maximal, AUCROC. The models in Table 5 are ordered first by sensitivity level and then by other selected metrics, such as specificity and AUCROC. The main metrics used in the final selection of our models were sensitivity and specificity. We also used a Balanced Random Forest (BRF) model, which performs well on imbalanced datasets; it achieved good performance, although not better than that of the models presented in Tables 5 and 6.

Table S1 (Additional file 1) shows the mean AUCROC, 95% confidence interval, and standard deviation (STD) of the different models presented in Tables 5 and 6, calculated using ten different seeds for the initialization of the models.

Table 7 presents performance comparisons between models with data augmentation (w/DA) and without data augmentation (w/o DA); each comparison pairs the same model trained with and without DA.

Table 7 Comparison of performance between models with Data Augmentation (w/DA), and without (w/o) data augmentation

Discussion

The principal findings of this study are: i) early prediction of GDM within the early stages of pregnancy using routine examinations; ii) the development and optimization of twelve different ML models and their hyperparameters to achieve the highest prediction performance; iii) a data augmentation method that allows several models to reach excellent GDM prediction results; and iv) several models whose results are, in general, better than those of previously reported methods generated with similar input datasets, while allowing the selection among several alternatives to achieve a desired sensitivity and specificity.

A recent study by Pillay and co-workers [51] reported sensitivity and specificity data for two-step oral glucose challenge tests with 140 and 135 mg/dL cut-offs at or after 24 weeks of gestation [51]; these two cut-off levels had sensitivities of 82% and 93%, respectively, and specificities of 82% and 79%, respectively, when assessed against the Carpenter and Coustan criteria [51]. Interestingly, the authors also concluded that although the application of the one-step (IADPSG) protocol significantly increased the likelihood of GDM detection (11.5% vs. 4.9%; five randomised controlled trials, 25,772 subjects), there was no improvement in health outcomes [51]. It is possible that the IADPSG protocol overdiagnoses risk in the assessed populations, so that deploying interventions to patients who would otherwise go untreated conveyed no benefit. A second interpretation is that the interventions targeted to women detected with the one-step test were ineffectual when deployed at or towards the end of the second trimester. In keeping with the potential benefit of an ML-based system allowing earlier GDM risk prediction, it is tempting to speculate that earlier identification and intervention allocation may improve treatment benefit.

Comparison with state of the art

In the present study, the best performing models (i.e., SVM 12; Table 5), using data collected before 20 weeks of gestation, had a sensitivity of 82% and a specificity of 74%, coming quite close to the performance of the two-step protocol widely used in the United States at later gestations. In our study, we developed a group of 12 models for early prediction of GDM with data that are commonly acquired at the early stages of pregnancy during prenatal care visits to gynecologists/obstetricians. The ease of data collection should facilitate the future use of these models in clinical practice. Another important consideration is that sensitivity is crucial, since the main goal is to prevent serious consequences of GDM for mothers and babies, many of which will affect them for years after pregnancy. In cases of lower specificity (higher FP), additional tests could be used to improve diagnosis, although this would come with additional cost, inconvenience, and risk. Also, in many cases the main treatment involves diet and exercise, which are not harmful. According to our variable selection methods, the most important variables for GDM prediction were related to glucose metabolism (first trimester fasting glycemia), physical status (weight and BMI), age, and hypertension. The use of DA had a positive effect in most models, improving specificity by up to 51.43% and AUCROC by up to 3.70% at the same sensitivity. The best model results for each sensitivity level were reached with DA in 7 of 9 cases and without DA in 2 of 9 cases.

The limited public availability of the datasets underlying previously published work makes direct comparisons of model performance difficult [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]. Nevertheless, a general assessment can be made by comparing the ranges of results for different metrics obtained on various datasets. However, important aspects such as population characteristics and diagnostic criteria vary between the countries/regions of the different studies analyzed, and these aspects should be considered when comparing the datasets. Table 8 compares the model results from the present study against those of recent studies assessing ML-driven prediction of GDM risk. In general, our models achieved better AUCROC than comparable models generated with similar input variables and the same or similar GDM diagnosis criteria [20,21,22, 25,26,27,28, 30, 31]. As explained previously, sensitivity is important because of the possible adverse effects of GDM on the mother and baby later in life. Other models [20, 22,23,24, 29, 32,33,34,35] that required additional complex data are not listed in Table 8. In some cases, such as those presented in the meta-analysis [52], more complex variables were used in the models, such as ultrasound screening data or biochemical data on liver/renal/coagulation function at the prenatal visit. For example, a comparison between our model 33 (SVM, 7 variables, DA LE; Table 8) and the work of Wu and colleagues [20] (Table 8) shows a higher sensitivity (by 13.55%) and a higher specificity (by 6.14%). Our model 17 (MLP, 5 variables, DA EO; Table 8) vs. Pintaudi et al. [28] (Table 8) reached a similar sensitivity but an improved specificity (by 56.70%). A different criterion for GDM diagnosis was used by Kumar and coworkers [31] (WHO, 1999), under which GDM was diagnosed if fasting glucose in the OGTT was ≥ 126 mg/dL and/or the 2 h OGTT value was ≥ 140 mg/dL. Another model implemented by the same authors [31] used the same GDM diagnosis criteria as ours (IADPSG/HAPO), reaching an AUCROC of 0.73 with fivefold stratified CV. ML models have also been applied to the prediction of Diabetes Mellitus [53].

Table 8 Results of top models for various levels of sensitivity compared to those from the published literature using similar input variables and the same GDM diagnosis criterion

Table 9 lists the input variables used in each of the best models selected, including those used for comparison and those developed and selected by the authors. It can be observed that some of the best solutions require only five input variables. When choosing these models for a clinical application, only 5–7 variables would need to be measured in each patient to predict GDM. This would facilitate the application of these models in clinical practice. Developing accurate ML models for predicting GDM is an important step towards implementing early prediction and treatment strategies for patients. The next step should be to apply them prospectively in a clinical setting to validate and evaluate their performance.

Table 9 Input variables used in each model including those used for comparison, and those of the best models selected by our method

In the present study, twelve ML models and their hyperparameters were optimized for early (20 weeks of gestation or earlier) GDM prediction with high sensitivity, specificity, AUCROC, and Recall Macro. The models could predict GDM with a good degree of accuracy before 20 weeks of gestation, using variables that are widely available from screening examinations. The variables required by most of the models were age, weight, BMI, and FPG, which is consistent with previous publications [20,21,22, 25,26,27,28, 30, 31]. Variable selection was performed with three methods, and the results show that several models reached good performance with as few as 5–7 input variables, while other models required more, up to 12 variables. Choosing models with high GDM prediction performance, a low number of input variables, and widely available variables will facilitate the possible application of these models in low-income settings. Although patient data from previous publications are often not available, comparing the results obtained for various metrics shows that, in general, our models performed favorably in comparison with the existing literature. In conclusion, our data demonstrate that ML analysis of patient datasets from early pregnancy may serve as a cost-effective and efficacious means of detecting GDM risk early in pregnancy.

We described all the steps required to implement, train and test the models. In particular, we used a test partition that is separate from the training and validation partitions in order to assess the generalization capacity of the models. Much of the previously reported work did not explicitly state that an independent partition was used for testing [20, 21, 24,25,26,27,28,29,30,31,32, 35]. This study provides a valuable contribution by using and comparing a broad range of ML models (12), unlike many other studies that use only one type of model, such as Logistic Regression. Additionally, various metrics have been employed to compare the performance of each model, together with a wide range of variables that could potentially be selected for clinical implementation. This approach allows a more comprehensive assessment of the potential utility of different ML models in predicting GDM and facilitates the identification of the most effective models for future clinical implementation.

As with any study of this nature, the findings need to be assessed in light of the ground-truth dataset from which they were drawn. For the present study, we used a single-center population drawn from a socio-economically vulnerable medical center in Santiago, Chile. Accordingly, a cautious approach should be taken in extrapolating these findings to a wider socio-economic grouping and to the maternal situation in other regions. The strengths of this study include a well-characterized pregnancy cohort and robust data collection. Future iterations of this work will involve cross-population analysis of GDM risk and the comparison of predictive outcomes from different populations to assess the broad applicability of model performance. While the variables used in the different ML models show promising predictive capacity for GDM, the addition of other inputs such as biomarkers could further improve their performance. As such, future studies may consider incorporating additional data sources to enhance the accuracy of GDM prediction models.

These findings are of particular importance given the increasing prevalence of GDM in the maternal population and the significant impacts (both on patient well-being and financial) that derive from poorly controlled glucose levels in pregnancy. For example, recent modeling from the United States suggests that, in 2014, the short-term costs of GDM were $1.8 billion [54]. The cost of treatment for T2DM is routinely around $3,500 per year [55]. Given estimates that one in six pregnancies is impacted by GDM, even a small improvement in outcomes deriving from early risk identification and timely intervention would yield profound public health benefits and health system cost savings.

Conclusions

The principal findings of our study are: early prediction of GDM within the early stages of pregnancy using routine examinations; the development and optimization of twelve different ML models and their hyperparameters to achieve the highest prediction performance; and a novel data augmentation method that allows several models to reach excellent GDM prediction results. Several model results are, in general, better than those of previously reported methods generated with similar input datasets, and the provided results allow the selection among several alternatives to achieve a desired sensitivity and specificity. Choosing models with high GDM prediction performance, a low number of input variables, and widely available variables will facilitate the possible application of these models in most settings.

Availability of data and materials

The datasets used in this study are not publicly available for privacy reasons. The dataset is provided by the Hospital Parroquial de San Bernardo; access to these data may be granted to qualified researchers upon request and with the permission of this institution (sillanes@uandes.cl).

The code used for the analysis, data cleaning, and model implementation is not available for proprietary reasons and requires the data in order to be used. The models were implemented in Python using libraries that are publicly available to anyone who wants to replicate the experiments.

Abbreviations

1TFG: First trimester fasting glucose test

ANOVA: Analysis of variance

AUCROC: Area under the receiver operating characteristic curve

BMI: Body mass index

BNB: Bernoulli Naïve Bayes

BRF: Balanced random forest

CV: Cross validation

DA: Data augmentation

DNN: Deep neural network

DT: Decision tree

ET: Extra trees

FN: False negative

FPG: Fasting plasma glucose

FP: False positive

GB: Gradient boosting

GDM: Gestational diabetes mellitus

GNB: Gaussian Naïve Bayes

HAPO: Hyperglycemia and adverse pregnancy outcome

HBA1c: Glycated hemoglobin

IADPSG: International association of diabetes and pregnancy study groups

KNN: K-nearest neighbors

LGBM: Light gradient boosting machine

LR: Logistic regression

ML: Machine learning

MLP: Multi-layer perceptron

OGTT: Oral glucose tolerance test

RF: Random forest

SVM: Support vector machine

T2DM: Type 2 diabetes mellitus

WHO: World health organization

XGB: Extreme gradient boosting

References

  1. American Diabetes Association Professional Practice Committee. 2. classification and diagnosis of diabetes: standards of medical care in diabetes—2022. Diabetes Care. 2021;45:S17–38. https://doi.org/10.2337/dc22-S002.

  2. Wendland EM, Torloni MR, Falavigna M, Trujillo J, Dode MA, Campos MA, et al. Gestational diabetes and pregnancy outcomes - a systematic review of the World Health Organization (WHO) and the International Association of Diabetes in Pregnancy Study Groups (IADPSG) diagnostic criteria. BMC Pregnancy Childbirth. 2012;12:23. https://doi.org/10.1186/1471-2393-12-23.

  3. Cho NH, Shaw JE, Karuranga S, Huang Y, da Rocha Fernandes JD, Ohlrogge AW, et al. IDF Diabetes Atlas: Global estimates of diabetes prevalence for 2017 and projections for 2045. Diabetes Res Clin Pract. 2018;138:271–81. https://doi.org/10.1016/j.diabres.2018.02.023.

  4. Casagrande SS, Linder B, Cowie CC. Prevalence of gestational diabetes and subsequent type 2 diabetes among U.S. women. Diabetes Res Clin Pract. 2018;141:200–8. https://doi.org/10.1016/j.diabres.2018.05.010.

  5. Zhou T, Du S, Sun D, Li X, Heianza Y, Hu G, et al. Prevalence and trends in gestational diabetes mellitus among women in the United States, 2006–2017: a population-based study. Front Endocrinol. 2022;13:868094. https://doi.org/10.3389/fendo.2022.868094.

  6. Lee KW, Ching SM, Ramachandran V, Yee A, Hoo FK, Chia WA, et al. Prevalence and risk factors of gestational diabetes mellitus in Asia: a systematic review and meta-analysis. BMC Pregnancy Childbirth. 2018;18:494. https://doi.org/10.1186/s12884-018-2131-4.

  7. Lowe LP, Metzger BE, Dyer AR, Lowe J, McCance DR, Lappin TRJ, et al. Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study: associations of maternal A1C and glucose with pregnancy outcomes. Diabetes Care. 2012;35:574–80. https://doi.org/10.2337/dc11-1687.

  8. Vandorsten JP, Dodson WC, Espeland MA, Grobman WA, Guise JM, Mercer BM, et al. NIH consensus development conference: diagnosing gestational diabetes mellitus. NIH Consens State Sci Statements. 2013;29:1–31.

  9. Fu J, Retnakaran R. The life course perspective of gestational diabetes: an opportunity for the prevention of diabetes and heart disease in women. eClinicalMedicine. 2022;45:101294. https://doi.org/10.1016/j.eclinm.2022.101294.

  10. Plows J, Stanley J, Baker P, Reynolds C, Vickers M. The pathophysiology of gestational diabetes mellitus. Int J Mol Sci. 2018;19:3342. https://doi.org/10.3390/ijms19113342.

  11. Sweeting A, Wong J, Murphy HR, Ross GP. A clinical update on gestational diabetes mellitus. Endocr Rev. 2022;43:763–93. https://doi.org/10.1210/endrev/bnac003.

  12. O’Sullivan JB, Mahan CM, Charles D, Dandrow RV. Screening criteria for high-risk gestational diabetic patients. Am J Obstet Gynecol. 1973;116:895–900. https://doi.org/10.1016/s0002-9378(16)33833-9.

  13. Metzger BE, Gabbe SG, Persson B, Buchanan TA, Catalano PA, Damm P, et al. International association of diabetes and pregnancy study groups recommendations on the diagnosis and classification of hyperglycemia in pregnancy. Diabetes Care. 2010;33:676–82. https://doi.org/10.2337/dc09-1848.

  14. Agarwal MM, Dhatt GS, Shah SM. Gestational diabetes mellitus. Diabetes Care. 2010;33:2018–20. https://doi.org/10.2337/dc10-0572.

  15. Sacks DB, Arnold M, Bakris GL, Bruns DE, Horvath AR, Kirkman MS, et al. Guidelines and recommendations for laboratory analysis in the diagnosis and management of diabetes mellitus. Diabetes Care. 2011;34:e61–99. https://doi.org/10.2337/dc11-9998.

  16. Potter JM, Hickman PE, Oakman C, Woods C, Nolan CJ. Strict preanalytical oral glucose tolerance test blood sample handling is essential for diagnosing gestational diabetes mellitus. Diabetes Care. 2020;43:1438–41. https://doi.org/10.2337/dc20-0304.

  17. Choudhury AA, Rajeswari VD. Gestational diabetes mellitus - a metabolic and reproductive disorder. Biomed Pharmacother. 2021;143:112183. https://doi.org/10.1016/j.biopha.2021.112183.

  18. Bhavadharini B, Mahalakshmi MM, Anjana K, Uma R, Deepa M, Unnikrishnan R, et al. Prevalence of gestational diabetes mellitus in urban and rural tamil nadu using IADPSG and WHO 1999 criteria (WINGS 6). Clinical Diabetes and Endocrinology. 2016;2:8. https://doi.org/10.1186/s40842-016-0028-6.

  19. Crowther CA, Hiller JE, Moss JR, McPhee AJ, Jeffries WS, Robinson JS. Effect of treatment of gestational diabetes mellitus on pregnancy outcomes. N Engl J Med. 2005;352:2477–86. https://doi.org/10.1056/NEJMoa042973.

  20. Wu Y-T, Zhang C-J, Mol BW, Kawai A, Li C, Chen L, et al. Early prediction of gestational diabetes mellitus in the chinese population via advanced machine learning. J Clin Endocrinol Metab. 2020;106:e1191–205. https://doi.org/10.1210/clinem/dgaa899.

  21. Zheng T, Ye W, Wang X, Li X, Zhang J, Little J, et al. A simple model to predict risk of gestational diabetes mellitus from 8 to 20 weeks of gestation in Chinese women. BMC Pregnancy Childbirth. 2019;19:252. https://doi.org/10.1186/s12884-019-2374-8.

  22. Artzi NS, Shilo S, Hadar E, Rossman H, Barbash-Hazan S, Ben-Haroush A, et al. Prediction of gestational diabetes based on nationwide electronic health records. Nat Med. 2020;26:71–6. https://doi.org/10.1038/s41591-019-0724-8.

  23. Liu H, Li J, Leng J, Wang H, Liu J, Li W, et al. Machine learning risk score for prediction of gestational diabetes in early pregnancy in Tianjin China. Diabetes/Metabolism Res Rev. 2021;37:e3397. https://doi.org/10.1002/dmrr.3397.

  24. Wu Y, Ma S, Wang Y, Chen F, Zhu F, Sun W, et al. A risk prediction model of gestational diabetes mellitus before 16 gestational weeks in Chinese pregnant women. Diabetes Res Clin Pract. 2021;179:109001. https://doi.org/10.1016/j.diabres.2021.109001.

  25. Wang J, Lv B, Chen X, Pan Y, Chen K, Zhang Y, et al. An early model to predict the risk of gestational diabetes mellitus in the absence of blood examination indexes: application in primary health care centres. BMC Pregnancy Childbirth. 2021;21:814. https://doi.org/10.1186/s12884-021-04295-2.

  26. Guo F, Yang S, Zhang Y, Yang X, Zhang C, Fan J. Nomogram for prediction of gestational diabetes mellitus in urban, Chinese, pregnant women. BMC Pregnancy Childbirth. 2020;20:43. https://doi.org/10.1186/s12884-019-2703-y.

  27. Tong J-N, Chen Y-X, Guan X-N, Liu K, Yin A-Q, Zhang H-F, et al. Association between the cut-off value of the first trimester fasting plasma glucose level and gestational diabetes mellitus: a retrospective study from southern China. BMC Pregnancy Childbirth. 2022;22:540. https://doi.org/10.1186/s12884-022-04874-x.

  28. Pintaudi B, Vieste GD, Corrado F, Lucisano G, Pellegrini F, Giunta L, et al. Improvement of selective screening strategy for gestational diabetes through a more accurate definition of high-risk groups. Eur J Endocrinol. 2014;170:87–93. https://doi.org/10.1530/EJE-13-0759.

  29. Shen L, Sahota DS, Chaemsaithong P, Tse WT, Chung MY, Ip JKH, et al. First trimester screening for gestational diabetes mellitus with maternal factors and biomarkers. Fetal Diagn Ther. 2022;49:256–64. https://doi.org/10.1159/000525384.

  30. Pan Y, Hu J, Zhong S. The joint prediction model of pBMI and eFBG in predicting gestational diabetes mellitus. J Int Med Res. 2019;4:300060519889199. https://doi.org/10.1177/0300060519889199.

  31. Kumar M, Chen L, Tan K, Ang LT, Ho C, Wong G, et al. Population-centric risk prediction modeling for gestational diabetes mellitus: a machine learning approach. Diabetes Res Clin Pract. 2022;185:109237. https://doi.org/10.1016/j.diabres.2022.109237.

  32. Li L, Zhu Q, Wang Z, Tao Y, Liu H, Tang F, et al. Establishment and validation of a predictive nomogram for gestational diabetes mellitus during early pregnancy term: a retrospective study. Front Endocrinol. 2023;14:1087994. https://doi.org/10.3389/fendo.2023.1087994.

  33. Kurt B, Gürlek B, Keskin S, Özdermir S, Karadeniz Ö, Buçan-Kırkbir İ, et al. Prediction of gestational diabetes using deep learning and Bayesian optimization and traditional machine learning techniques. Med Biol Eng Computation. 2023. https://doi.org/10.1007/s11517-023-02800-7.

  34. Wu S, Li L, Hu K-L, Wang S, Zhang R, Chen R, et al. A Prediction Model of Gestational Diabetes Mellitus Based on OGTT in Early Pregnancy: A Prospective Cohort Study. The Journal of Clinical Endocrinology & Metabolism. 2023. https://doi.org/10.1210/clinem/dgad052.

  35. Wei Y, He A, Tang C, Liu H, Li L, Yang X, et al. Risk prediction models of gestational diabetes mellitus before 16 gestational weeks. BMC Pregnancy Childbirth. 2022;22:889. https://doi.org/10.1186/s12884-022-05219-4.

  36. Mennickent D, Rodríguez A, Farías-Jofré M, Araya J, Guzmán-Gutiérrez E. Machine learning-based models for gestational diabetes mellitus prediction before 24-28 weeks of pregnancy: a review. Artificial Intellig Med. 2022;132:102378. https://doi.org/10.1016/j.artmed.2022.102378.

  37. Li S, Wang Z, Vieira LA, Zheutlin AB, Ru B, Schadt E, et al. Improving preeclampsia risk prediction by modeling pregnancy trajectories from routinely collected electronic medical record data. npj Digital Med. 2022;5:68. https://doi.org/10.1038/s41746-022-00612-x.

  38. Zambrano JE, Benalcazar DP, Perez CA, Bowyer KW. Iris recognition using low-level CNN layers without training and single matching. IEEE Access. 2022;10:41276–86. https://doi.org/10.1109/ACCESS.2022.3166910.

  39. Coustan DR, Lowe LP, Metzger BE, Dyer AR. The Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study: paving the way for new diagnostic criteria for gestational diabetes mellitus. American J Obstet Gynecol. 2010;202:654.e1-654.e6. https://doi.org/10.1016/j.ajog.2010.04.006.

  40. Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019;6:101. https://doi.org/10.1186/s40537-019-0197-0.

  41. Montecino DA, Perez CA, Bowyer W. Two-level genetic algorithm for evolving convolutional neural networks for pattern recognition. IEEE Access. 2021;9:126856–72. https://doi.org/10.1109/ACCESS.2021.3111175.

  42. World Health Organization. A healthy lifestyle - WHO recommendations, https://www.who.int/europe/news-room/fact-sheets/item/a-healthy-lifestyle---who-recommendations; 2010 [Accessed 20 Dec 2022].

  43. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.

  44. Perez CA, Gonzalez GD, Medina LE, Galdames FJ. Linear Versus Nonlinear Neural Modeling for 2-D Pattern Recognition. IEEE Transact Syst Man Cybernetics - Part A: Syst Humans. 2005;35:955–64. https://doi.org/10.1109/tsmca.2005.851268.

  45. Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res. 2017;18:1–5.

  46. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2016. https://doi.org/10.1145/2939672.2939785

  47. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. in Advances in Neural Information Processing Systems, 2017.

  48. Cawley GC, Talbot NLC. on over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res. 2010;11:2079–107.

  49. Tapia JE, Perez CA, Bowyer KW. Gender classification from the same iris code used for recognition. IEEE Trans Inf Forensics Secur. 2016;11:1760–70. https://doi.org/10.1109/TIFS.2016.2550418.

  50. Fawcett T. An introduction to ROC analysis. Pattern Recogn Lett. 2006;27:861–74. https://doi.org/10.1016/j.patrec.2005.10.010.

  51. Pillay J, Donovan L, Guitard S, Zakher B, Gates M, Gates A, et al. Screening for Gestational Diabetes. JAMA. 2021;326:539. https://doi.org/10.1001/jama.2021.10404.

  52. Zhang Z, Yang L, Han W, Wu Y, Zhang L, Gao C, et al. Machine learning prediction models for gestational diabetes mellitus: meta-analysis. J Med Internet Res. 2022;24:e26634. https://doi.org/10.2196/26634.

  53. Cichosz SL, Johansen MD, Ejskjaer N, Hansen TK, Hejlesen OK. Improved diabetes screening using an extended predictive feature search. Diabetes Technol Ther. 2014;16(3):166–71. https://doi.org/10.1089/dia.2013.0255.

  54. Lenoir-Wijnkoop I, van der Beek EM, Garssen J, Nuijten MJC, Uauy RD. Health economic modeling to assess short-term costs of maternal overweight, gestational diabetes, and related macrosomia - a pilot evaluation. Front Pharmacol. 2015;6:103. https://doi.org/10.3389/fphar.2015.00103.

  55. Fitria N, van Asselt ADI, Postma MJ. Cost-effectiveness of controlling gestational diabetes mellitus: a systematic review. Eur J Health Econ. 2018;20:407–17. https://doi.org/10.1007/s10198-018-1006-y.

Acknowledgements

Not applicable

Funding

This work was supported in part by Agencia Nacional de Investigacion y Desarrollo (ANID) under Basal funding for Scientific and Technological Center of Excellence, IMPACT, #FB210024, and FONDECYT 1231675, also by the Dept. of Electrical Engineering, Universidad de Chile, and by the Dept. of Obstetrics and Gynecology, Faculty of Medicine, Universidad de los Andes.

Author information

Contributions

GC contributed the machine learning design of the study, statistical analysis, data cleaning, implementation of the machine learning models, and draft and revisions; MM performed data collection, data analysis, interpretation of results and paper editing; AP and MPM helped with data analysis, clinical interpretation and final draft revision; PE contributed to the machine learning and statistical methods, results, and revisions of the paper draft; MC and MWK performed data analysis, clinical interpretation and conclusions, and helped write the final version of the paper; SEI and CAP contributed to the conception and design of the study, designed the methods, interpreted the results and conclusions from both the clinical and the machine learning perspectives, and wrote and edited the final version of the paper.

Corresponding authors

Correspondence to Sebastian E. Illanes or Claudio A. Perez.

Ethics declarations

Ethics approval and consent to participate

The data usage was approved by the institutional review board (IRB) of the Hospital Parroquial de San Bernardo, Santiago, Chile, which determined that the research does not involve human subjects because the project uses previously collected, de-identified data. The “Comité de Etica del Hospital Parroquial de San Bernardo” waived the need for informed consent. All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

All authors declare no competing interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Table S1. Mean AUCROC (bracketed values are the 95% confidence interval) and standard deviation (STD) of the different models presented in Tables 5 and 6. STD is reported to four decimal places.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Cubillos, G., Monckeberg, M., Plaza, A. et al. Development of machine learning models to predict gestational diabetes risk in the first half of pregnancy. BMC Pregnancy Childbirth 23, 469 (2023). https://doi.org/10.1186/s12884-023-05766-4
