Fetal Gestational Age Prediction in Brain Magnetic Resonance Imaging Using Artificial Intelligence: A Comparative Study of Three Biometric Techniques

Accurately predicting a fetus’s gestational age (GA) is crucial in prenatal care. This study aimed to develop an artificial intelligence (AI) model to predict GA using biometric measurements from fetal brain magnetic resonance imaging (MRI). We assessed the significance of using different reference standards for interpreting GA predictions. Measurements of biparietal diameter (BPD), fronto-occipital diameter (FOD), and head circumference (HC) were obtained from 52 normal fetal MRI cases from Rush University. Both manual and AI-based methods were utilized, and comparisons were made using three reference standards (Garel, Freq, and Bio). The AI model showed a strong correlation with manual measurements, particularly for HC, which exhibited the highest correlation with actual values. Differences between GA predictions and picture archiving and communication system (PACS) records varied by reference, ranging from 0.47 to 2.17 weeks for BPD, 0.46 to 2.26 weeks for FOD, and 0.75 to 1.74 weeks for HC. Pearson correlation coefficients between PACS records and GA predictions exceeded 0.97 across all references. In conclusion, the AI model demonstrated high accuracy in predicting GA from fetal brain MRI measurements. This approach offers improved accuracy and convenience over manual methods, highlighting the potential of AI in enhancing prenatal care through precise GA estimation.


Introduction
Accurately predicting gestational age (GA) holds paramount importance in the field of obstetrics. It has numerous applications, including pregnancy dating, assessing fetal growth and development, determining the timing of delivery and interventions (such as administering steroids), and detecting fetal growth restriction, preterm labor, and other complications. The ability to make informed obstetrical management decisions is crucial for ensuring optimal maternal and fetal outcomes [1,2].
Additionally, it enables healthcare professionals to distinguish between normal and abnormal findings. GA is also crucial when interpreting test results, such as the maternal triple-screen blood test, which is normally conducted and interpreted between the 15th and 18th week of pregnancy [3].
According to updated guidelines, GA is crucial when interpreting maternal blood tests in the first trimester (10-13 weeks) or second trimester (quad screen; 15-22 weeks) [4]. These guidelines recommend the use of combined first-trimester screening, which includes ultrasound measurements and maternal blood tests, to assess the risk of chromosomal abnormalities in the fetus. The use of GA in interpreting maternal blood tests is important because it helps determine the appropriate reference ranges and cutoff values for these tests. GA provides valuable information about the stage of pregnancy and allows for accurate interpretation of test results. In cases where there is a discrepancy between GA based on the last menstrual period (LMP) and ultrasound estimates, US clinicians may revise the due date if the difference exceeds a certain threshold, typically ±7 days up to 20 weeks' gestation [5].
GA can be determined using the first day of the mother's last menstrual period (LMP) or ultrasound (US) measurements [6]. While US is generally more accurate, measurement errors and biological variability can also affect its accuracy. The accuracy of gestational age dating decreases with time since conception. In the first trimester, the error range is within 1 week, increasing to 1-2 weeks in the second trimester and up to 1 month in the third trimester [7]. When using menstrual dating alone to determine GA, estimations were inaccurate in 11% to 42% of cases. However, data still support the use of US dating, despite its sources of inaccuracy [2]. The American College of Obstetricians and Gynecologists (ACOG), the American Institute of Ultrasound in Medicine (AIUM), and the Society for Maternal-Fetal Medicine (SMFM) specifically recommend using US dating to determine GA if the LMP is unknown or inconsistent with US dates.
While sonography is the primary method for evaluating fetal anomalies, it has limitations regarding specificity and visualization. MRI can be a valuable adjunct to sonography, particularly when sonographic findings are inconclusive, as it can provide additional diagnostic information and help improve the accuracy of prenatal diagnoses [8,9]. MRI has certain key advantages over ultrasound as a method for GA estimation. These advantages include high spatial resolution, better soft tissue contrast, and the ability to visualize the whole extent of the brain irrespective of the fetal head position, enabling more accurate measurements. This translates to a 23% improved diagnostic accuracy of MRI over ultrasound at a GA of 18-24 weeks and 29% at >24 weeks [10]. Furthermore, better contrast resolution also aids in making measurements of finer anatomical structures such as the length/area of the vermis, which is usually challenging with ultrasound due to significant shadowing from posterior fossa osseous structures. MRI can also perform volumetric measurements of different brain structures, cortical gyrification, and depth of sulci, which have been shown to strongly correlate with GA. Importantly, these measurements remain valid even in the presence of congenital anomalies, such as hydrocephalus, where routine ultrasound measurements fail to predict GA [10].
Ultrasound parameters used to determine gestational age differ based on the trimester and include mean sac diameter (MSD), crown-rump length (CRL), biparietal diameter (BPD), corrected biparietal diameter (BPDC), and femur length. The MSD is used in the early part of the first trimester, up to 6 weeks. The CRL is used in the later part of the first trimester, up to 12 weeks. The BPD, BPDC, and femur length are all used for gestational age dating during the second and third trimesters. The BPD measures the diameter of a transverse section of the fetal skull at the level of the parietal eminences. The BPDC is calculated as $\mathrm{BPDC} = \sqrt{(\mathrm{BPD} \times \mathrm{FOD})/1.265}$, where the square root keeps the result in units of length. The occipitofrontal diameter, measured as the length from the nose to the occipital bone, is also an important biometric parameter for gestational age dating.
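As a small illustration of the corrected BPD, the sketch below computes BPDC from a pair of diameters. It assumes the square-root (area-corrected) form of the formula so that the result stays in millimeters; the function name `corrected_bpd` and the sample values are hypothetical and are not taken from the study.

```python
import math


def corrected_bpd(bpd_mm: float, fod_mm: float) -> float:
    """Area-corrected BPD in mm, assuming BPDC = sqrt(BPD * FOD / 1.265)."""
    if bpd_mm <= 0 or fod_mm <= 0:
        raise ValueError("diameters must be positive")
    return math.sqrt(bpd_mm * fod_mm / 1.265)


# Hypothetical diameters in millimeters, for illustration only.
example_bpdc = corrected_bpd(80.0, 100.0)
```

Under this form, a head whose FOD is exactly 1.265 times its BPD keeps its BPD unchanged; more rounded or more elongated heads are pulled toward that standard shape.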
Compared to ultrasound, fetal MRI offers several advantages for GA prediction. MRI provides better image resolution and tissue contrast, enabling more accurate measurements of fetal anatomy and development. Additionally, MRI can provide more comprehensive information about fetal brain development, which is crucial for detecting abnormalities and planning interventions. Fetal MRI is also less dependent on the operator's skill, which can lead to greater consistency and accuracy in GA prediction. These advantages make fetal MRI a promising tool for GA prediction and prenatal care.
While ultrasound is the primary modality for determining gestational age due to its widespread availability and accuracy, MRI can provide additional information in certain clinical situations. One such role is the evaluation of fetal anomalies: MRI is particularly useful in assessing fetal anomalies and structural abnormalities that may impact GA determination. It can provide detailed images of the fetus and surrounding structures, allowing for a comprehensive evaluation of fetal development and identification of any abnormalities [11,12].
AI has been utilized in processing anatomic fetal brain MRI to automatically predict landmarks and perform segmentation. Various AI models, including convolutional neural networks and U-Net, have been employed and achieved accuracy levels of 95% and higher. AI has shown potential in aiding the preprocessing [13] and post-processing [14] of fetal images, as well as in image reconstruction. Additionally, AI can be applied to tasks such as gestational age prediction, with an accuracy of one week [15], fetal brain extraction [16,17], fetal brain segmentation [18], and placenta detection. Furthermore, certain linear measurements of the fetal brain, such as cerebral and bone biparietal diameter, have been proposed as potential applications of AI in this field.
Deep learning techniques have shown great potential in accurately predicting GA in fetuses using magnetic resonance imaging (MRI). Artificial neural networks (ANNs) and convolutional neural networks (CNNs) are two types of deep learning computing paradigms that have been utilized for medical image recognition tasks [19]. CNNs have demonstrated high accuracy in predicting the chronological age of adults using brain MRI scans [20]. By applying these techniques to fetal MRI scans, researchers can accurately determine the gestational age of fetuses, which is essential for appropriate obstetrical management decisions and optimal maternal and fetal outcomes.
The application of artificial intelligence techniques, especially segmentation techniques, to extract fetal brain structures has revolutionized the field of prenatal imaging analysis. DynUNet (Dynamic U-Net) is a deep learning network architecture for image segmentation tasks. It has great potential in various image segmentation tasks and has been widely used in computer vision. Based on this framework, we automatically combine OpenCV (Open Source Computer Vision) edge detection, convex hull extraction, minimum circumscribed rectangle, and other algorithms to obtain key data such as BPD, FOD, and HC for GA prediction.
This study aims to apply deep learning techniques to fetal MRI subjects to determine GA accurately. For this purpose, we used three trusted variables: BPD, FOD, and HC. We measured them manually (performed by a radiologist) as well as by using an AI tool. Finally, we compared the accuracy of GA prediction using both methods in GA-BPD, GA-FOD, GA-HC, and average GA.

Dataset and Measurement
We obtained measurements of biparietal diameter (BPD), fronto-occipital diameter (FOD), and head circumference (HC) from a dataset comprising 52 normal fetal MRI cases with T2-HASTE sequences from Rush University. Both manual and AI-based methods were utilized to acquire these measurements. We also employed three reference papers (Garel, Freq, and Bio) for comparison purposes.
In this study, all patients included were those referred for MRI due to suspicions raised during ultrasound examinations. All available and normal cases with fetal brain MRI, which were accessible for downloading from the picture archiving and communication system (PACS), were enrolled in the study. Additionally, we excluded all cases with suspected cerebral and extracerebral fetal malformations to ensure a more homogeneous dataset. Cases that did not meet our predefined criteria for maternal obesity and oligohydramnios were also identified and excluded.
Manual measurements were performed by an expert radiologist using PACS imaging software. BPD was measured as the maximum distance between the inner edges of the parietal bones, while FOD was measured as the maximum distance between the inner edges of the frontal and occipital bones [21]. For HC, the measurement was calculated using a specific formula. The in-to-in method of measurement was consistently employed for all three parameters (BPD, FOD, and HC). The manual measurements were conducted on the raw PACS images. We did not employ any additional graphical overlays, boxes, or loops for manual measurements. The pictures below are samples of our manual measurement.

Axial T2-HASTE fetal MRI image for BPD measurement, defined as the widest diameter of the fetal skull measured in a transverse plane using the inner edge to inner edge method.
Sagittal T2-HASTE fetal MRI image for FOD measurement, defined as the distance between extreme points of frontal and occipital lobes using the inner edge to inner edge method.

AI-Based Fetal Brain Measurement Method:
Our AI model for fetal brain measurement leverages a combination of deep learning and computer vision techniques to automate the process:

• Fetal brain extraction: To initiate the measurement process, we employ the Dynamic U-Net tool, which is a deep learning pipeline based on the nnU-Net adaptive framework for U-Net-based medical image segmentation. Specifically, we utilize the PyTorch-based MONAIfbs (MONAI fetal brain segmentation) toolkit to perform automatic fetal brain segmentation on HASTE-like MR images.

• Defining the length and width of the brain: After successfully obtaining the brain mask, we utilize OpenCV for further analysis. We first apply the "findContours" function to extract the edges of the fetal head mask from the MRI images. Subsequently, we employ the "convexHull" function to determine the envelope, essentially creating a simplified outline of the fetal head. Then, the "minAreaRect" function identifies the minimum enclosing rectangle around this envelope. This rectangle is essential in measuring the brain dimensions.

• Measuring perimeter, length (FOD), and width (BPD): To calculate the perimeter of the fetal head (HC), we use the "arcLength" function, which provides the path length of the contours. The "minAreaRect" function not only identifies the rectangle but also gives us the width and height of the fetal head. These dimensions are used to measure the width (BPD) and length (FOD) of the brain.

• Choosing the median of axial series: Typically, patients have multiple sequences, such as axial, sagittal, and coronal. We address this by choosing the median of all axial series as the final automatic measurement result, ensuring a consistent and reliable outcome.
It is important to note that the BPD and FOD measured in the clinic are obtained from other coronal and sagittal sequences, respectively. Nevertheless, our AI model's results exhibit a high degree of correlation with these clinical measurements, demonstrating the effectiveness and accuracy of our automated fetal brain measurement process.
Figure 1 and Scheme 1 illustrate the automatic measurement process of biparietal diameter (BPD), fronto-occipital diameter (FOD), and head circumference (HC) on T2 fetal MRI. The yellow box represents BPD and FOD, while the red circle indicates HC measurement.

Scheme 1. An integrated methodology for fetal brain measurement: leveraging the PyTorch-based MONAIfbs toolkit for automated segmentation, employing OpenCV for dimensional analysis, and ensuring consistency through median selection of axial series.

Prediction of Fetal Age
To predict the age of the fetus, we used the measurements of BPD, FOD, and HC obtained both manually and automatically (by the AI model). We then compared the age predicted by the AI model versus the age predicted by the manual measurements on different metrics: BPD, FOD, HC, and corrected BPD.
To establish a reliable basis for comparison, the predicted age was determined by applying standard fetal growth charts based on biparietal diameter (BPD) and fronto-occipital diameter (FOD) measurements from established clinical references commonly used in prenatal care:
1. MRI of the Fetal Brain: Normal Development and Cerebral Pathologies, 1st ed., 2004, by C. Garel [22] (we refer to this reference as Garel in the paper); see Supplement Tables S1 and S2.
2. A second trusted reference, mainly for ultrasound: Hadlock FP, Deter RL, Harrist RB, et al. Fetal biparietal diameter: A critical reevaluation of the relation to menstrual age using real-time ultrasound. J Ultrasound Med 1982 [23]. We refer to this as Freq in our paper (Supplement Tables S3 and S4 are derived from the reference table in the Hadlock et al. paper).
The three references that we used in our paper, namely, Garel, Bio, and Freq, were not only based on expert consensus but were also cross-validated by a perinatologist, further confirming their clinical relevance.Furthermore, we sought the input of an experienced pediatric neuroradiologist to ensure the radiological aspects of the study were robust and aligned with clinical standards.
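To make the chart-based step concrete, the sketch below shows one possible way to turn a measurement into a GA estimate from a reference table. Linear interpolation between chart rows is an assumption here; the study may have used a different lookup rule. The function name `predict_ga_weeks` is hypothetical, and the toy chart is deliberately unrealistic so that it cannot be mistaken for values from Garel, Hadlock, or any other clinical reference.

```python
import numpy as np


def predict_ga_weeks(measure_mm, chart_mm, chart_weeks):
    """Interpolate GA (weeks) from a measurement using a reference chart.

    chart_mm and chart_weeks are parallel columns of a growth chart,
    with chart_mm strictly increasing.
    """
    chart_mm = np.asarray(chart_mm, dtype=float)
    chart_weeks = np.asarray(chart_weeks, dtype=float)
    if chart_mm.shape != chart_weeks.shape or chart_mm.size < 2:
        raise ValueError("chart columns must have the same length (at least 2)")
    if np.any(np.diff(chart_mm) <= 0):
        raise ValueError("chart measurements must be strictly increasing")
    # np.interp clamps to the first/last chart row outside the tabulated range.
    return float(np.interp(measure_mm, chart_mm, chart_weeks))


# PLACEHOLDER chart with artificial numbers, not clinical data.
TOY_CHART_MM = [10.0, 20.0, 30.0]
TOY_CHART_WEEKS = [1.0, 2.0, 3.0]
toy_estimate = predict_ga_weeks(25.0, TOY_CHART_MM, TOY_CHART_WEEKS)
```

Running the same measurement through several charts (for example, one table per reference) would produce one GA estimate per reference, which is the kind of comparison reported in the Results.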

Statistical Analysis
Statistical analysis was conducted using Python version 3. Paired t-tests were employed to analyze the differences between the age predicted by the AI model and the manual measurements. A significance level of p < 0.05 was used to determine statistical significance. Additionally, Pearson's correlation coefficient was calculated to assess the correlation between the age predicted by the AI model and the manual measurements.
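A minimal sketch of this analysis is given below, assuming SciPy is the statistics library (the paper does not name one). The function name `compare_predictions` is hypothetical, and any input arrays are expected to hold paired GA predictions in weeks for the same fetuses.

```python
import numpy as np
from scipy import stats


def compare_predictions(ga_manual, ga_ai, alpha=0.05):
    """Paired t-test and Pearson correlation between two sets of GA predictions."""
    ga_manual = np.asarray(ga_manual, dtype=float)
    ga_ai = np.asarray(ga_ai, dtype=float)
    if ga_manual.shape != ga_ai.shape:
        raise ValueError("inputs must be paired (same length)")
    t_stat, p_value = stats.ttest_rel(ga_ai, ga_manual)  # tests mean paired difference = 0
    r, _ = stats.pearsonr(ga_ai, ga_manual)              # linear association
    return {
        "t": float(t_stat),
        "p": float(p_value),
        "r": float(r),
        "significant": bool(p_value < alpha),
    }
```

The two statistics answer different questions: the paired t-test checks for a systematic shift between methods, while Pearson's r measures how closely the two sets of predictions move together.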

Ethical Considerations
This study received approval from our institutional IRB, and a waiver of informed consent was granted, given the study's retrospective nature. All patient data were anonymized and handled with utmost confidentiality throughout the entire duration of the study.

Results
The provided data include comparisons between GA measurements obtained from different methods (manual and AI) and different biometric parameters (BPD, FOD, BPDC, HC). All the outputs of our measurements are in Table 1. We compared the manual measurements performed by a radiologist with the measurements obtained through an AI model; we present the results in three main parts.
a. When BPD was used as an index for GA predictions, the differences between GA predictions in manual and AI measurements were as follows (in weeks):
b. When comparing BPD measurements with GA in the PACS (picture archiving and communication system), the differences were as follows (in weeks):

Corrected BPD
In the Freq reference, a corrected BPD measurement was also suggested (in weeks).The difference between the manual GA predictions and AI measurements was 0.47 weeks.
• When comparing the corrected BPD measurements with GA in the PACS, the differences were as follows (in weeks): GA in PACS vs. manual corrected BPD measurements: 1.30; GA in PACS vs. AI corrected BPD measurements: 1.24.

Fronto-Occipital Diameter (FOD)
a. When FOD was used as an index for GA predictions, the differences between the manually predicted GA and AI measurements were as follows:
b. When comparing FOD measurements with GA in the PACS (picture archiving and communication system), the differences were as follows:

• Garel reference: GA in PACS vs. manual FOD measurements:

Head Circumference (HC)
a. When HC was used as an index for GA predictions, the differences between GA predictions in manual and AI measurements were as follows:
b. When comparing HC measurements with GA in the PACS (picture archiving and communication system), the differences were as follows:
• Freq reference: GA in PACS vs. manual HC measurements: 1.40 weeks; GA in PACS vs. AI HC measurements: 1.05 weeks.
• Bio reference: GA in PACS vs. manual HC measurements: 1.74 weeks; GA in PACS vs. AI HC measurements: 1.26 weeks.
(HC measurements were not available in the Garel reference.)

Part 3.2: Comparison of Predictions among References
In this section, we compared the measurements obtained from the AI model and the manual method for three variables: biparietal diameter (BPD), fronto-occipital diameter (FOD), and head circumference (HC). The objective was to determine which reference yielded stronger correlations between the measurements.

Comparison of BPD using References
The following figure and tables present the correlation of the gestational age (GA) predictions based on biparietal diameter (BPD) measurements in the three different references, along with the correlation with the picture archiving and communication system (PACS) (Figure 2, Tables 2 and 3).

Comparison of FOD Using References
The figures and tables presented below illustrate the correlation of GA predictions based on fronto-occipital diameter (FOD) measurements using the three different references and the correlation with the picture archiving and communication system (Figure 3, Tables 4 and 5).

Comparison of HC and BPDC Using References
The figure and tables below show the correlation of the GA predicted according to the HC and BPDC in the three references, as well as with PACS (Figure 4, Tables 6 and 7).

Part 3.3: Comparison of Manual Measurements versus AI
In this part, we compared the manual (radiologist) measurement of indexes (BPD, FOD, HC) versus AI measurement. We utilized statistical measures, including mean absolute error (MAE) and root mean squared error (RMSE), to assess the accuracy of our predictions. Lower scores in these measures indicate higher prediction accuracy. Additionally, we employed the Pearson correlation coefficient (r) to evaluate the linear correlation between the AI and manual measurements. Values of the correlation coefficient close to 1 signify a strong positive correlation, indicating that the AI predictions align well with the manual measurements (Table 8, Figures 5-8).
For HC, the MAE is 7.2755 and represents the mean absolute difference between the actual (HC_sorted_m) and predicted (HC_a) values. It shows that, on average, the predicted value differs from the actual value by 7.2755 mm. The RMSE, which is the square root of the average squared difference between the actual and predicted values, is 8.1365. It measures the typical difference between actual and predicted values in the same units as the data (in this case millimeters). The Pearson correlation coefficient (r) is 0.9973, indicating a strong positive linear relationship between HC_sorted_m and HC_a. This means that when the value of HC_sorted_m increases, HC_a also tends to increase, and vice versa. The closer the r value is to 1, the stronger the correlation between the two variables. In this case, the strong correlation indicates that the predicted value (HC_a) closely tracks the actual value (HC_sorted_m).
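The three agreement measures discussed here can be computed as in the following sketch. The function name `agreement_metrics` is hypothetical, and the inputs stand for paired manual ("actual") and AI ("predicted") measurements in millimeters.

```python
import numpy as np


def agreement_metrics(actual, predicted):
    """Return (MAE, RMSE, Pearson r) for paired measurements."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape or actual.size < 2:
        raise ValueError("inputs must be paired and contain at least two values")
    error = predicted - actual
    mae = float(np.mean(np.abs(error)))           # average size of the error
    rmse = float(np.sqrt(np.mean(error ** 2)))    # penalizes large errors more heavily
    r = float(np.corrcoef(actual, predicted)[0, 1])
    return mae, rmse, r
```

Because r is unaffected by a constant offset between the two methods, a very high correlation can coexist with a nonzero MAE, which is why the error measures are reported alongside it.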

Discussion
In this study, we aimed to evaluate the performance of an AI model in predicting the gestational age (GA) of fetuses using three variables measured in fetal brain MRI: biparietal diameter (BPD), fronto-occipital diameter (FOD), and head circumference (HC). The study used a dataset of 52 normal fetal MRI cases from Rush University, which included T2 HASTE sequences. BPD, FOD, and HC measurements were obtained manually by a radiologist and by an AI model using a Dynamic U-Net model to extract the fetal brain and then measure BPD and FOD automatically.
The differences between manual and AI GA measurements vary depending on the specific biometric parameter used. However, the different measurements have a high correlation, indicating a consistent relationship between the methods. The analysis also suggests that AI-based measurements of HC show a stronger correlation with the actual values compared to BPD, FOD, and BPDC. We provide discussion of the results in the following three sections.

Discussion for Part 3.1 of Results with Biometric Measurements (BPD, FOD, HC)
The analysis of the data provides insights into the accuracy of predicting gestational age (GA) using different biometric measurements, namely, biparietal diameter (BPD), fronto-occipital diameter (FOD), and head circumference (HC).The key findings are discussed below:

BPD:
The BPD is one of the most measured parameters in the fetus.Campbell was the first investigator to link fetal BPD to gestational age [25]; however, since this original report, numerous publications on this subject have appeared in the literature [26][27][28].
The BPD may be rapidly and reproducibly measured by ultrasound examination from 12 weeks' gestation until the end of pregnancy.The BPD is imaged in the transaxial plane of the fetal head at a level depicting the thalamus in the midline, equidistant from the temporoparietal bones and usually with the cavum septum pellucidum anteriorly [29].Although several methods have been used to measure BPD, the most accepted method is measurement from leading edge to leading edge (outer to inner).
According to similar studies, the accuracy of estimating gestational age using BPD depends on the stage of pregnancy [30].Between 12 and 26 weeks of gestation, the BPD measurement can provide an estimation accurate to within ±10 to 11 days.As the pregnancy progresses beyond 26 weeks, the accuracy of BPD measurement decreases, and it can have an error of up to ±3 weeks near term [31].
Several factors can affect the accuracy of BPD measurements.Biological factors such as differences in maternal age, parity (number of previous pregnancies), pregnancy weight, geographic location, and specific population characteristics can contribute to variations in BPD measurements.Technical factors like measurement techniques, interobserver error, and the use of single or multiple measurements can also influence the accuracy of BPD in estimating gestational age [27,32].
Although most dating curves show a general relationship between BPD and gestational age, there can be significant differences in estimating gestational age based on a particular BPD measurement.Additionally, the accuracy of BPD measurements is highest when the shape of the fetal head is appropriately ovoid.If the head shape is unusually rounded or elongated, BPD measurements may overestimate or underestimate gestational age, respectively.
-BPD results in our study: the differences between the manual and AI measurements of BPD were relatively small across the different references, indicating good agreement.
According to the Garel reference, the difference in GA predictions was 0.66 weeks, demonstrating a close alignment between the two methods. Similar differences were observed when considering the Freq and Bio references. However, larger differences were observed when comparing BPD measurements with GA in the PACS, ranging from 1.24 to 2.17 weeks. These differences varied depending on the specific reference used, emphasizing the influence of reference selection on the accuracy of GA predictions.

FOD:
To assess the appropriateness of head shape, the BPD can be compared with the FOD [33]; the ratio of these diameters is called the cephalic index (CI). The normal range for the CI is between 0.70 and 0.86 (±2 standard deviations). In cases where the fetus has an abnormal cephalic index (which is rare, noted in fewer than 2% of fetuses before 26 weeks' gestation), gestational age may be estimated more accurately using other fetal parameters such as head circumference.
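The head-shape check described above can be sketched in a few lines. This is a minimal illustration, not part of the study's pipeline: the function names and the example diameters are hypothetical, and only the ratio definition and the 0.70 to 0.86 range come from the text.

```python
def cephalic_index(bpd_mm: float, fod_mm: float) -> float:
    """Return the cephalic index, i.e. the ratio BPD / FOD."""
    if bpd_mm <= 0 or fod_mm <= 0:
        raise ValueError("BPD and FOD must be positive")
    return bpd_mm / fod_mm


def head_shape_is_typical(ci: float, low: float = 0.70, high: float = 0.86) -> bool:
    """True when the CI lies inside the normal range quoted in the text."""
    return low <= ci <= high


# Illustrative diameters only, not study data.
ci = cephalic_index(bpd_mm=70.0, fod_mm=90.0)
print(f"CI = {ci:.3f}, typical head shape: {head_shape_is_typical(ci)}")
```

When the check fails, the text suggests relying on another parameter such as HC rather than on BPD alone.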
Similar to the BPD measurements, the differences between manual and AI measurements of FOD were also relatively small. According to the Garel reference, the difference in GA predictions was 0.59 weeks, indicating a strong agreement between the two measurement approaches. The difference was slightly smaller when considering the Bio reference. However, larger differences were observed when comparing FOD measurements with GA in the PACS, ranging from 1.77 to 2.26 weeks. As with BPD, these differences varied based on the specific reference used.

HC:
Head circumference (HC) measurement can be used to estimate gestational age similarly to the biparietal diameter (BPD) measurement. While tracing the outer perimeter of the head using a trackball on ultrasound equipment or a digitizer is the most reliable method for measuring HC, HC can also be calculated from the BPD and fronto-occipital diameters with a formula whose maximum error is 6% [34].
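The text does not state which formula is meant. As an illustration only, the sketch below assumes the common ellipse-based approximation HC ≈ π(BPD + FOD)/2; the function names and example values are hypothetical, and the formula actually used in [34] or in this study may differ.

```python
import math


def hc_from_diameters(bpd_mm: float, fod_mm: float) -> float:
    """Approximate HC as the perimeter of an ellipse with axes BPD and FOD.

    ASSUMPTION: uses HC ~ pi * (BPD + FOD) / 2, which is not confirmed
    to be the formula referred to in the text.
    """
    return math.pi * (bpd_mm + fod_mm) / 2.0


def relative_error(estimated_mm: float, traced_mm: float) -> float:
    """Fractional error of the formula-based HC against a traced HC."""
    return abs(estimated_mm - traced_mm) / traced_mm


# Illustrative values only, not study data.
hc_formula = hc_from_diameters(bpd_mm=70.0, fod_mm=90.0)
hc_traced = 260.0
print(f"formula HC = {hc_formula:.1f} mm, "
      f"relative error = {relative_error(hc_formula, hc_traced):.1%}")
```

Comparing the relative error against the 6% bound quoted above is one simple way to check whether a formula-derived HC is plausible for a given case.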
The accuracy of estimating gestational age through HC measurement is comparable to that of BPD measurement [35]. However, in cases where the fetus has an abnormal head shape, such as brachycephaly or dolichocephaly, HC may be a more precise indicator of fetal age compared to BPD [34].
-HC in our study: The differences between manual and AI measurements of HC were comparable to those observed for BPD and FOD. According to the Freq reference, the difference in GA predictions was 0.75 weeks, indicating a reasonably close alignment between the two measurement methods. The difference was slightly larger when considering the Bio reference. When comparing HC measurements with GA in the PACS, differences ranging from 1.05 to 1.74 weeks were observed. As with BPD and FOD, these differences varied based on the specific reference used.
The Pearson correlation coefficient is almost equal to 1, indicating a robust linear association between the HC measurements derived from the FOD and BPD-based formula and the AI-generated measurements.
Nonetheless, it is important to note that the MAE and RMSE are relatively elevated, suggesting that the calculated values tend to be consistently lower than the actual measurements (as shown in Figure 8). Given the clinical importance of these measurements, it is prudent to explore more suitable formulas for the computation of HC based on FOD and BPD.
The overall high correlation coefficients across all measurements suggest that the AI system's predictions align well with the manual measurements. This indicates that the AI system could be a valuable tool to assist with these measurements. However, further validation and testing in a broader range of clinical scenarios are necessary to ensure its performance in various settings.
Compared with the previous findings, the MAE and RMSE between the manual and AI measurements of HC are evidently larger. This discrepancy arises because manual HC is derived from equations involving BPD and FOD. In contrast, AI directly extracts HC from the original MRI image by accumulating each small polyline segment. The AI-based measurement of HC aligns better with the actual fetal values and exhibits a stronger Pearson correlation coefficient compared to BPD, FOD, and corrected BPD (BPDC). Thus, it can be considered the preferable method for measuring HC.
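The idea of accumulating small polyline segments can be illustrated as follows. This is a sketch under assumptions: the contour is taken to be an ordered list of (row, column) pixel coordinates around the head, and the pixel spacing is supplied by the caller. How the study's model extracts and orders the contour is not described here, so those details are hypothetical.

```python
import math


def polyline_perimeter(points, pixel_spacing_mm=(1.0, 1.0)):
    """Sum the lengths of all segments of a closed contour.

    points: ordered (row, col) pixel coordinates along the contour.
    pixel_spacing_mm: (row spacing, column spacing) in millimetres.
    """
    if len(points) < 3:
        raise ValueError("a closed contour needs at least three points")
    row_mm, col_mm = pixel_spacing_mm
    total = 0.0
    for i, (r0, c0) in enumerate(points):
        r1, c1 = points[(i + 1) % len(points)]  # wrap around to close the contour
        total += math.hypot((r1 - r0) * row_mm, (c1 - c0) * col_mm)
    return total


# Illustrative contour: a circle of radius 40 px sampled at 360 points.
circle = [(40 * math.sin(math.radians(a)), 40 * math.cos(math.radians(a)))
          for a in range(360)]
print(f"perimeter = {polyline_perimeter(circle):.2f} mm")
```

Because the perimeter is measured along the contour itself, it does not depend on an elliptical head-shape assumption, which is consistent with the explanation given above for the gap between formula-derived and directly extracted HC.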
The findings suggest that the AI model demonstrates good agreement with manual measurements across all three biometric measurements (BPD, FOD, HC). The differences observed in GA predictions between the manual and AI measurements highlight the importance of reference selection when interpreting the accuracy of the predictions. These results emphasize the need for further validation and testing of the AI model in diverse clinical scenarios to ensure its reliable performance for estimating gestational age using biometric measurements derived from fetal brain MRI.
The use of multiple fetal growth parameters, including BPD, HC, AC, and FL measurements, can improve the accuracy of gestational age assessment [31]. To enhance the precision of determining gestational age, Hadlock and colleagues employed a combination of multiple measurements [36,37]. When multiple parameters predict the same endpoint, combining their mean gestational ages increases the probability of correctly predicting that endpoint. This approach enhances accuracy compared to relying on a single parameter alone. However, if the estimates from different parameters are significantly different, averaging multiple parameters may decrease the accuracy of the best predictor(s). It is important to avoid averaging fetal growth parameters in certain conditions, such as fetal macrosomia, intrauterine growth retardation, and congenital anomalies, as it may not provide accurate results.
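The averaging rule and its caveat can be sketched as below. The function name, the input format, and in particular the 2-week discordance threshold are assumptions made for illustration; the text does not specify how large a disagreement should block averaging.

```python
from statistics import mean


def combine_ga_estimates(estimates_weeks, max_spread_weeks=2.0):
    """Average per-parameter GA estimates only when they agree.

    estimates_weeks maps a parameter name (e.g. "BPD") to its GA estimate in weeks.
    max_spread_weeks is an ASSUMED tolerance, not a value from the study.
    Returns the mean GA, or None when the estimates are too discordant to average.
    """
    if not estimates_weeks:
        raise ValueError("at least one estimate is required")
    values = list(estimates_weeks.values())
    spread = max(values) - min(values)
    if spread > max_spread_weeks:
        return None  # discordant: averaging could degrade the best predictor
    return mean(values)


# Illustrative values only, not study data.
concordant = {"BPD": 28.0, "HC": 28.4, "FL": 27.6}
discordant = {"BPD": 28.0, "AC": 32.5}
print(combine_ga_estimates(concordant))
print(combine_ga_estimates(discordant))
```

Returning None for discordant inputs leaves the decision to the clinician, in line with the advice to avoid averaging in conditions such as macrosomia or growth restriction.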

Discussion for Part 3.2 of Results: Comparison of Predictions between References
Comparing the Garel (Gael), Bio, and Freq references for accuracy in gestational age (GA) prediction using biometric measurements provides valuable insights into the performance of these reference charts. This analysis helps determine which reference chart is better suited for accurate GA estimation based on the measurements obtained from the AI model and manual methods.
In the case of BPD measurements, Table 2 demonstrates the differences in GA prediction between these references, revealing that the Bio reference yields the smallest difference (1.41 weeks) compared to the Garel (1.92 weeks) and Freq (1.90 weeks) references when measured manually. However, when using AI measurements, the Bio reference outperforms the other references even more, with a difference of 1.24 weeks compared to the Garel (2.17 weeks) and Freq (2.17 weeks) references. This suggests that the Bio reference consistently provides more accurate GA predictions for BPD measurements, regardless of whether manual or AI methods are employed.
For FOD measurements, Table 4 highlights the differences in GA prediction among these references. When measured manually, the Garel reference offers slightly better accuracy, with a difference of 1.89 weeks, compared to the Bio reference, which has a difference of 2.26 weeks. AI measurements follow the same ordering, with the Garel reference achieving a smaller difference of 1.77 weeks while the Bio reference remains at 2.26 weeks. Unlike the BPD results, where the Bio reference performed best, this suggests that the choice of reference affects the accuracy of GA prediction differently for each biometric measurement.
In the case of HC and BPDC measurements, as shown in Table 6, the differences in GA prediction between the references vary. The Bio reference again demonstrates better performance, with a difference of 1.74 weeks for manual measurements and 1.26 weeks for AI measurements. The Freq reference follows closely, with differences of 1.30 weeks (manual) and 1.24 weeks (AI). The Garel reference, on the other hand, has a slightly larger difference of 1.40 weeks for manual measurements but a smaller difference of 1.05 weeks for AI measurements. These results indicate that the Bio reference consistently provides accurate GA predictions for HC and BPDC measurements, followed closely by the Freq reference, while the Garel reference tends to yield slightly less accurate results.
In summary, when comparing the Garel (Gael), Bio, and Freq references, the Bio reference often emerges as the most accurate for GA prediction across various biometric measurements. It demonstrates robust performance in both manual and AI measurements, making it a reliable choice for GA estimation. However, the choice of reference may influence the accuracy of GA predictions for specific biometric measurements, emphasizing the importance of selecting the most suitable reference chart based on the clinical context and measurement methods used.
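The comparison in this subsection amounts to converting each measurement to a GA through a reference chart and averaging the absolute difference from the GA recorded in the PACS. The sketch below shows that computation under stated assumptions: the two charts are invented placeholders (they are NOT the Garel, Freq, or Bio values), GA is obtained by linear interpolation, and all names are hypothetical.

```python
from bisect import bisect_right

# HYPOTHETICAL charts: BPD (mm) at selected GAs (weeks). Placeholder numbers only.
CHARTS = {
    "ref_A": {"bpd_mm": [48.0, 62.0, 76.0, 88.0], "ga_weeks": [20.0, 25.0, 30.0, 35.0]},
    "ref_B": {"bpd_mm": [46.0, 60.0, 74.0, 86.0], "ga_weeks": [20.0, 25.0, 30.0, 35.0]},
}


def predict_ga(bpd_mm, chart):
    """Linearly interpolate GA from a chart, clamping at the chart ends."""
    xs, ys = chart["bpd_mm"], chart["ga_weeks"]
    if bpd_mm <= xs[0]:
        return ys[0]
    if bpd_mm >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, bpd_mm)  # xs[i - 1] <= bpd_mm < xs[i]
    frac = (bpd_mm - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + frac * (ys[i] - ys[i - 1])


def mean_abs_difference(bpd_values_mm, ga_pacs_weeks, chart):
    """Mean absolute difference (weeks) between predicted GA and PACS GA."""
    diffs = [abs(predict_ga(b, chart) - g)
             for b, g in zip(bpd_values_mm, ga_pacs_weeks)]
    return sum(diffs) / len(diffs)


# Illustrative cases only, not study data.
bpd = [55.0, 69.0, 82.0]
ga_pacs = [22.5, 27.5, 32.5]
for name, chart in CHARTS.items():
    print(name, round(mean_abs_difference(bpd, ga_pacs, chart), 2))
```

Running the same measurements through different charts makes the effect of reference selection explicit: the measurements are identical, and only the chart changes the reported difference.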

Discussion for Part 3.3 of Results: Comparison of Manual Measurements versus AI
In the provided results, we have the mean absolute error (MAE), root mean squared error (RMSE), and Pearson correlation coefficients (r) for the different measurements obtained from the manual and AI methods. Here is the interpretation:
HC: The Pearson correlation coefficient (r) is 0.9973, indicating a strong positive linear relationship between the manually measured HC and the AI-predicted HC.
The findings indicate that head circumference (HC) has larger mean absolute error (MAE) and root mean squared error (RMSE) values compared to other measurements such as biparietal diameter (BPD), fronto-occipital diameter (FOD), and corrected BPD (BPDC), suggesting a higher average difference between the actual and predicted values. Despite this, the HC measurements still demonstrate a strong correlation (r = 0.9973) with the AI-predicted values. These larger MAE and RMSE values in HC may be attributed to normal variations, while the AI-based HC measurement offers improved accuracy compared to manual measurements for predicting gestational age.
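The three statistics discussed here can be computed as in the following sketch. The paired values are invented for illustration and are not study data; the function names are hypothetical.

```python
import math


def mae(a, b):
    """Mean absolute error between paired measurements."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def rmse(a, b):
    """Root mean squared error between paired measurements."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Illustrative paired measurements in mm; the AI values sit slightly above the manual ones.
manual = [60.0, 70.0, 80.0, 90.0]
ai = [62.0, 71.0, 83.0, 92.0]
print(f"MAE = {mae(manual, ai):.3f}, RMSE = {rmse(manual, ai):.3f}, "
      f"r = {pearson_r(manual, ai):.4f}")
```

The example shows why a correlation close to 1 and a non-trivial MAE can coexist: Pearson's r is insensitive to a roughly constant offset between the two methods, whereas MAE and RMSE capture it directly.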
The study described has several advantages over similar studies:
1. Comprehensive evaluation: The study assesses the AI model's performance in predicting gestational age using multiple biometric measurements, providing a comprehensive analysis.
2. Comparison with different references: The study compares AI predictions with multiple references and assesses their correlation with the picture archiving and communication system (PACS).
3. Statistical evaluation: The study uses statistical measures like MAE, RMSE, and Pearson correlation coefficients to evaluate the accuracy and correlation of the AI model's predictions.
4. Inclusion of manual measurements: Manual measurements are included as a reference for comparison, allowing assessment of the agreement between AI and human experts.
5. Focus on AI versus manual measurements: The study compares AI and manual measurements, evaluating their accuracy and correlation for each biometric parameter.
6. Discussion of clinical implications: The study discusses the clinical significance of the findings, highlighting the importance of reference selection and the potential benefits of integrating AI models in prenatal care.
Overall, this study provides a comprehensive evaluation of an AI model's performance in predicting gestational age using fetal brain MRI measurements, offering valuable insights for researchers and practitioners in prenatal care.
In a similar study, Shi et al. [38] evaluated the ability of various biometric measurements derived from MRI to accurately determine the GA of fetuses in the second half of gestation. The study utilized MRI scans of 637 fetuses and evaluated nine standard fetal biometric parameters. Regression models were constructed to predict GA based on these measurements, and a polynomial regression model was found to be the best descriptor. The study concluded that MRI biometry measurements offer a potential estimation model of fetal gestational age in the second half of gestation. Both our study and that of Shi et al. contribute to the understanding of utilizing MRI-based biometric measurements for estimating gestational age. In addition, our study provides insights into the potential of AI models to enhance accuracy and efficiency in prenatal care.
In another study, Burgos-Artizzu et al. [39] aimed to assess the performance of an AI method for estimating GA in second- and third-trimester fetuses by analyzing fetal brain morphology on standard cranial ultrasound sections. The AI method was compared to existing formulas based on standard fetal biometry. The study used routine fetal ultrasound scans and analyzed transthalamic axial plane images from 1394 patients. The AI method, either alone or in combination with fetal biometric parameters, showed a 95% confidence interval error of 14.2 days and 11.0 days, respectively, compared to 12.9 days for the best method using standard biometrics alone. In the third trimester, the AI method combined with biometric parameters had a lower error of 14.3 days compared to 17 days for fetal biometrics, while in the second trimester, the errors were 6.7 and 7 days, respectively. The AI method performed particularly well in estimating GA for small-for-gestational-age fetuses. Both that study and ours demonstrate the effectiveness of AI in estimating GA using different imaging modalities. While the Burgos-Artizzu et al. study focuses on ultrasound-based analysis of fetal brain morphology, our project utilizes fetal brain MRI and biometric measurements. Both approaches show promise in improving the accuracy of GA estimation and have the potential to enhance prenatal care practices.
In a similar study, Kojita et al. [17] evaluated the performance of a deep learning model for predicting gestational age using fetal brain MRI acquired after the first trimester. The model was compared to the traditional method of estimating gestational age using BPD measurement. A total of 184 T2-weighted MRI scans from fetuses were included in the study. The deep learning model was trained on a subset of cases and validated on another subset, while the remaining cases were used as test data. The model's prediction of gestational age showed a substantial correlation with the reference standard (ρc = 0.964), outperforming the BPD prediction (ρc = 0.920). Both the model and BPD predictions had larger differences from the reference standard as gestational age increased. However, the upper limit of the model's prediction was significantly shorter than that of BPD. That study, like ours, concludes that deep learning can accurately predict gestational age from fetal brain MRI acquired after the first trimester, providing potential benefits for prenatal care in cases where early ultrasound measurements are lacking.
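The symbol ρc is not defined in this section. Assuming it denotes Lin's concordance correlation coefficient, which is the usual meaning of that notation, it can be computed as sketched below. The function name and the example values are hypothetical and are not data from Kojita et al. or from this study.

```python
def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments).

    rho_c = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    ASSUMPTION: this is what the symbol rho_c refers to in the cited work.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


# Illustrative GA values in weeks: reference standard versus a prediction.
reference = [22.0, 26.0, 30.0, 34.0]
predicted = [22.5, 26.2, 30.9, 34.4]
print(f"rho_c = {concordance_ccc(reference, predicted):.4f}")
```

Unlike Pearson's r, this coefficient is reduced by a systematic shift between the two series, so it measures agreement with the reference standard rather than linear association alone.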

Limitations:
Fetal MRI offers valuable insights but comes with considerations related to its limitations, fetal movement, expertise, and accessibility. Careful evaluation is needed when choosing the right imaging method in clinical practice. The limitations are detailed below:
1. Limitations of fetal brain MRI:
• Fetal movement issues: Fetal motion during MRI scans can lead to blurred images and impact measurement accuracy [40]. Techniques to minimize motion effects are still evolving.
• Limited spatial resolution: Fetal MRI, while having better spatial resolution than ultrasound, can still face limitations in visualizing tiny fetal structures [40]. This can affect the precision of gestational age measurements.
• Signal-to-noise ratio: Fetal movement can introduce challenges in acquiring clear images, even with fast single-shot sequences. Additionally, maintaining an adequate signal-to-noise ratio can be demanding.
• Rapid changes and geometric distortions: The fetal brain undergoes rapid developmental changes in utero, and small fetal structures can be distorted within the maternal anatomical context.
• Expertise and interinstitutional variability: Fetal MRI requires specialized expertise for image acquisition and interpretation, which may not be widely available. Protocols, imaging platforms, and operator practices for fetal brain MRI can significantly differ across institutions, leading to inconsistencies in image quality and interpretation [40].
• Cost and accessibility: MRI is costlier and less accessible than ultrasound. This can create disparities in access to advanced prenatal imaging.
2. Limitations of database size: We acknowledge the limitation of a relatively small database. This is a common challenge in medical AI research, and expanding the dataset is an avenue for future work. Our study leverages AI's ability to mitigate some of these limitations by optimizing measurement processes and reducing bias. The robustness of our statistical approach, such as the use of median combining, ensures that measurements are less affected by random errors or outliers.
3. While the number of patients may not have been statistically justified in a prospective study, this retrospective analysis provides an initial insight into the potential utility of AI in gestational age prediction using MRI measurements. We recognize that larger, prospective studies are warranted to further validate and refine these findings.
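The median combining mentioned under the database-size limitation can be illustrated as follows. The sketch assumes that several measurements of the same parameter (for example, from different slices or series of one patient) are reduced to a single value by taking their median; the text does not detail how the combination is performed in the study, so the function name and data are hypothetical.

```python
from statistics import mean, median


def combine_measurements(values_mm):
    """Reduce repeated measurements of one parameter to their median."""
    if not values_mm:
        raise ValueError("at least one measurement is required")
    return median(values_mm)


# Illustrative HC values in mm; the last one mimics a motion-corrupted series.
series_hc_mm = [251.0, 249.5, 252.2, 310.0]
print(f"median = {combine_measurements(series_hc_mm):.1f} mm, "
      f"mean = {mean(series_hc_mm):.1f} mm")
```

A single corrupted series pulls the mean well away from the other values, while the median stays close to them, which is the robustness property referred to in the text.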
Our study underscores the potential of deep learning and AI models in accurately predicting gestational age from second- and third-trimester fetal brain MRI. This is especially crucial for pregnancies that were underserved during the first trimester, because ultrasound-based measurements become less accurate later in gestation. Technical challenges and patient factors, such as maternal obesity, fetal positioning, and oligohydramnios, can limit the effectiveness of ultrasound. While fetal MRI has its challenges, our research demonstrates the potential of AI in overcoming some of these limitations and enhancing the accuracy of gestational age predictions. Further investigation and collaboration are necessary to continue improving prenatal care and expanding the scope of AI in fetal brain MRI.

Conclusions
In this study, we aimed to evaluate the performance of an AI model in predicting the gestational age (GA) of fetuses using biometric measurements obtained from fetal brain MRI, specifically biparietal diameter (BPD), fronto-occipital diameter (FOD), and head circumference (HC). In addition to manual measurements, we developed an AI model based on the Dynamic U-Net architecture to extract the fetal brain and calculate these variables automatically.
Our dataset included 52 normal fetal MRI cases with T2-HASTE sequences from Rush University. This study's results demonstrate the AI model's high accuracy and potential in predicting GA. The AI-based BPD, FOD, and HC measurements showed strong correlations with manual measurements. Notably, the AI-based HC measurement exhibited a stronger correlation with actual values compared to BPD, FOD, and corrected BPD (BPDC), suggesting its reliability as a recommended method for accurately predicting gestational age.
Comparisons between manual and AI measurements revealed small differences in BPD and FOD across different references. However, when comparing measurements with GA in the picture archiving and communication system (PACS), differences varied based on the reference used, highlighting the importance of reference selection.
BPD measurements are commonly used to estimate gestational age during pregnancy. However, their accuracy depends on factors such as gestational age, biological and technical variations, and the shape of the fetal head. Other fetal parameters may be used in cases where the head shape is abnormal to improve the accuracy of gestational age estimation.
Integrating AI models in prenatal care offers several advantages, including improved accuracy, automation, and time efficiency. The AI-based measurements demonstrated consistent correlations with manual measurements, supporting their reliability in assessing fetal development and monitoring pregnancies. The developed AI model provides an accurate and efficient prediction of gestational age, which can aid in clinical management, evaluation of fetal growth, and timely interventions. By reducing human error and variability associated with manual measurements, the AI model has the potential to enhance the precision and effectiveness of prenatal care.
In summary, this study underscores the potential of AI models in accurately predicting gestational age using biometric measurements derived from fetal brain MRI. The strong correlations between manual and AI measurements validate the accuracy of the AI model, particularly in HC predictions. The findings support the integration of AI as a valuable tool in prenatal care, empowering clinicians with automated and reliable GA prediction and contributing to improved decision making and patient care.

Figure 1. Automatic measurement of BPD, FOD, and HC on T2 fetal MRI. (a) AI-processed mask: The AI model directly processes the mask to obtain measurement data, as shown by the yellow box representing the biparietal diameter (BPD) and the length indicating the fronto-occipital diameter (FOD). The red circle depicts the measurement of head circumference (HC). (b) Overlay on original MRI: The mask from (a) is superimposed onto the original MRI image, visually representing the measurements. (c) Measurements on other series: Further showcasing the versatility of the AI model, measurements on other series of the same patient demonstrate its consistent performance across different image series.

Figure 2. Correlation of BPD in three references.

Figure 3. Correlation of FOD in three references.

Figure 4. Correlation of HC and BPDC in three references.
A.A., M.P.S. and S.A.; supervision, H.A.A., M.K. and S.B.; funding acquisition, S.B. All authors have read and agreed to the published version of the manuscript.
Funding: This project was funded by the Robert McCormick Diagnostic Chair Spending fund No. 840152-03 of Rush University Medical Center in Chicago, Illinois, belonging to Dr. Sharon Byrd.
Institutional Review Board Statement: This retrospective study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of Rush University Medical Center (ORA Number: 18111906-IRB01-AM04; Date of approval: 15 April 2019). Given the retrospective nature of the study, written informed consent was not obtained from the included pregnant women, as permitted by the Institutional Review Board guidelines for retrospective research.
Informed Consent Statement: Patient consent was waived due to the retrospective nature of the study and the use of anonymous imaging data.

Table 1. Names of outputs of our measurements.
GA in PACS vs. AI HC measurements: 1.26 weeks. (HC measurements were not available in the Garel references.)

Table 2. Difference of GA prediction according to BPD measurement.

Table 3. Pearson correlation coefficient scores of correlation with measurements.

Table 4. Difference of GA prediction according to FOD measurement.

Table 5. Pearson correlation coefficient scores of correlation with measurements.

Table 6. Difference of GA prediction according to HC and BPDC measurements.

Table 7. Pearson correlation coefficient scores for correlation with measurements.

Table 8. Statistical comparison of manual vs. AI measurements.
Reprod. Med. 2024, 5
- The MAE is 1.6442, indicating the average absolute difference between the actual BPD values and the predicted values obtained from either manual or AI measurements.
- The RMSE is 1.9790, representing the square root of the average squared difference between the actual and predicted BPD values. It gives an idea of the typical difference between the actual and predicted values.
- The Pearson correlation coefficient (r) is 0.9963, indicating a strong positive linear relationship between the manually measured BPD and the AI-predicted BPD. A value close to 1 indicates a strong correlation.