Manual versus deep learning measurements to evaluate cumulus expansion of bovine oocytes and its relationship with embryo development in vitro

Cumulus expansion is an important indicator of oocyte maturation and has been suggested to be indicative of greater oocyte developmental capacity. Although multiple methods have been described to assess cumulus expansion, none of them is considered a gold standard. Additionally, these methods are subjective and time-consuming. In this manuscript, the reliability of three cumulus expansion measurement methods was assessed, and a deep learning model was created to perform the measurement automatically. Cumulus expansion of 232 cumulus-oocyte complexes was evaluated by three independent observers using three methods: (1) measurement of the cumulus area, (2) measurement of three distances between the zona pellucida and outer cumulus, and (3) scoring cumulus expansion on a 5-point Likert scale. The reliability of the methods was calculated in terms of intraclass correlation coefficients (ICC) for both inter- and intra-observer agreement. The area method resulted in the best overall inter-observer agreement, with an ICC of 0.89 versus 0.54 and 0.30 for the 3-distance and scoring methods, respectively. Therefore, the area method served as the basis for a deep learning model, AI-xpansion, which reaches human-level performance in terms of average rank, bias and variance. To evaluate the accuracy of the methods, the results of cumulus expansion calculations were linked to embryonic development. Cumulus expansion was significantly greater in oocytes that achieved successful embryo development when measured by AI-xpansion, the area method or the 3-distance method, but not when measured by the scoring method. Measuring the area is the most reliable method to manually evaluate cumulus expansion, whilst deep learning automatically performs the calculation with human-level precision and high accuracy and could therefore be a valuable prospective tool for embryologists.


Introduction
Mammalian oocytes are surrounded by multiple layers of specialized somatic cells, called cumulus cells [1]. These cells are connected to the oocyte through transzonal projections, providing the oocyte with metabolites and regulatory molecules during the maturation process [2][3][4]. Likewise, during fertilization, the cumulus cells attract [5], trap [5], and select [6] spermatozoa, and prevent premature hardening of the zona pellucida [7], all of which are required for successful fertilization. As such, cumulus cells assist the oocyte in completing the processes of maturation, fertilization, and early embryonic development [6,7]. To carry out their function effectively, cumulus cells need to respond to gonadotropins by producing an extracellular mucinous matrix of hyaluronic acid, which causes the cumulus to expand [8][9][10][11][12]. Since adequate cumulus expansion has been correlated with high developmental potential [13,14], the degree of cumulus expansion is considered an important parameter of oocyte quality [15,16] and has been correlated with successful fertilization in mice [12] and embryo development in pigs [17].
Multiple methods for the assessment of cumulus expansion have been described [9][10][11][18][19][20][21][22][23][24][25][26][27], including both invasive and non-invasive techniques. Quantitative measurement of hyaluronic acid is the most precise approach [10,26], but it impairs further embryo development and can pose health hazards to the operator, since it requires the use of radioactive material [26]. The most frequently used method for cumulus expansion assessment is classification into groups by scoring the degree of cumulus expansion on a Likert scale [14,19,21,23,25,28]. This procedure was first proposed by Downs in 1989 [21] and is favored among embryologists since it is fast and easy to perform directly under the microscope while manipulating the oocyte for in vitro processing. No additional equipment is necessary and further in vitro development of the oocyte remains possible. The downside of this approach is that it is highly dependent on the expertise of the embryologist and is thus considered a subjective method. The development of built-in cameras and image analysis software, first reported by Furnus et al. in 1998 [27], enabled alternative methods such as measuring the area [9,27] and longitudinal axes of the cumulus cells [18,24,29,30], aiming to diminish subjectivity. Yet, all proposed visual methods are time-consuming and none of them can record the three-dimensional structure of the cumulus-oocyte complex (COC). Also, no direct comparisons had been made to evaluate the precision, accuracy, and repeatability of the proposed methods.
In all cases, the results remain dependent on the subjectivity introduced by the human observer. A possible solution to remove this human factor is to apply deep learning (DL) modeling. Deep learning is a subfield of artificial intelligence that is based on learning hierarchical knowledge from data rather than on rule-based programming [31]. The number of applications of artificial intelligence is growing in several areas of medicine and biology [32][33][34][35], and lately in assisted reproduction, in order to automate time-consuming and subjective tasks [36][37][38]. Non-invasive methods to assess cumulus expansion, both qualitatively and (semi-)quantitatively, depend heavily on human interpretation and require a considerable amount of time. Hence, a tool that can objectively and accurately automate this task could improve the reliability of the correlation between cumulus expansion and oocyte developmental capacity and facilitate the assessment for researchers and embryologists.
Cumulus expansion is an important marker of successful COC maturation and its assessment is often used in reproductive biotechnologies [12,39]. However, reports on cumulus expansion measurement lack uniformity, as none of the available non-invasive methods can be considered a gold standard. Besides, all of these techniques are time-consuming and involve a degree of subjectivity. The present manuscript reports a side-by-side comparison of three commonly used evaluation methods to identify the method with the best intraclass correlation coefficient (ICC) between and within observers. The goal of this study is to provide evidence-based data on the different measurement techniques, so that a consistent approach can be used for cumulus expansion measurement in future studies. Additionally, based on the results of this comparison, AI-xpansion, a DL method to automate cumulus expansion measurement, was created.

Materials and methods
Ethical approval was not necessary for this experiment, since ovaries were collected post-mortem from cows in a commercial slaughterhouse.

Oocyte collection and in vitro maturation
Bovine COCs were matured in vitro as previously described by Azari-Dolatabad et al. [18]. In brief, bovine ovaries were collected from a local slaughterhouse and processed within 2 h after collection. Ovaries were rinsed three times in warm (37 °C) physiological saline supplemented with kanamycin (25 mg/mL) and sterilized with 90 % ethanol. Cumulus-oocyte complexes were aspirated from follicles between 4 and 8 mm in diameter using an 18-gauge needle attached to a 10 mL syringe, and transferred into a 15 mL tube containing 2.5 mL of HEPES Tyrode's albumin-pyruvate-lactate (HEPES-TALP) medium. Cumulus-oocyte complexes were subsequently cultured individually in droplets of μL of maturation medium (TCM-199 supplemented with 20 ng/mL epidermal growth factor (EGF) and 50 μg/mL gentamicin) covered with 7.5 mL paraffin oil (SAGE oil for tissue culture, ART-4008-5P, Cooper Surgical Company) for 22 h at 38.5 °C in 5 % CO2 in humidified air.

Image acquisition
Images of COCs were acquired at 0 h of in vitro maturation (pre-IVM stage) and at 22 h of in vitro maturation (post-IVM stage) using an inverted Olympus stereomicroscope connected to a ToupCam camera with ToupView software (version 3.7.13270.20181102). Images were obtained at the same magnification (56X), with the plane of focus on the zona pellucida of a single oocyte in the middle of the field. Images were saved as PNG files at a resolution of 2592 × pixels in RGB. Sixty-eight images were excluded because the zona pellucida was not clearly in focus, the demarcation of the outer cumulus cells was not clear and/or the COC rotated in the 3-dimensional field during maturation, causing the pre- and post-maturation images to be acquired from different 2-dimensional fields. After exclusion, a total of 232 paired (before and after maturation) images of COCs were presented to three observers.

Cumulus expansion measurement
Cumulus expansion of bovine COCs during in vitro maturation was measured by three independent, experienced observers. The observers scored each COC by three different methods (area, 3-distance, and scoring method) in duplicate, at different times, and in random order.

Area
The area of the pre- and post-IVM COCs was measured by drawing the contour of the COC using the freehand selection tool in ImageJ software [40] (version 1.49q; National Institutes of Health, USA). To calculate the absolute growth, the pre-IVM COC area was subtracted from the post-IVM COC area. This absolute growth was then divided by the pre-IVM COC area to compute the cumulus expansion (%) (Fig. 1).
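The area-based calculation can be sketched in Python as follows. This is a minimal illustration, not the authors' ImageJ workflow: it assumes that each freehand contour has been exported as a list of (x, y) pixel coordinates, and it uses the shoelace formula as a stand-in for ImageJ's area measurement. The helper names `polygon_area` and `cumulus_expansion_area` are hypothetical.

```python
import numpy as np


def polygon_area(contour):
    """Area enclosed by a closed polygon given as (x, y) vertices.

    Shoelace formula; the last vertex is implicitly joined to the first.
    """
    pts = np.asarray(contour, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def cumulus_expansion_area(pre_contour, post_contour):
    """Cumulus expansion (%) = (post-IVM area - pre-IVM area) / pre-IVM area * 100."""
    pre_area = polygon_area(pre_contour)
    post_area = polygon_area(post_contour)
    absolute_growth = post_area - pre_area
    return 100.0 * absolute_growth / pre_area


# Toy contours (hypothetical values, in pixels): a 100 x 100 square grows to 150 x 150.
pre = [(0, 0), (100, 0), (100, 100), (0, 100)]
post = [(0, 0), (150, 0), (150, 150), (0, 150)]
print(cumulus_expansion_area(pre, post))  # 125.0
```

Because the expansion is a ratio of two areas measured at the same magnification, the pixel units cancel and no calibration to micrometres is needed for this particular quantity.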

3-Distance
The shortest, medium and longest distances between the zona pellucida and the outer border of the cumulus cells were measured in pre-IVM COCs using the straight line tool in ImageJ, and the average of these three distances was calculated. The same was done for post-IVM COCs. The average value of the pre-IVM COC was subtracted from that of the post-IVM COC, resulting in the absolute growth. Cumulus expansion (%) was then determined by dividing the absolute growth by the average value of the pre-IVM COC (Fig. 1).
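The 3-distance calculation described above reduces to a few lines of arithmetic. The sketch below is illustrative only: it assumes the three distances per image have already been measured (for example in ImageJ) and are supplied in the same unit; the function name `cumulus_expansion_3distance` and the example values are hypothetical.

```python
def cumulus_expansion_3distance(pre_distances, post_distances):
    """Cumulus expansion (%) from the shortest, medium and longest
    zona pellucida to outer cumulus distances before and after IVM."""
    if len(pre_distances) != 3 or len(post_distances) != 3:
        raise ValueError("expected exactly three distances per image")
    pre_mean = sum(pre_distances) / 3    # average distance, pre-IVM
    post_mean = sum(post_distances) / 3  # average distance, post-IVM
    absolute_growth = post_mean - pre_mean
    return 100.0 * absolute_growth / pre_mean


# Hypothetical distances in pixels (S, M, L): mean 45 before, 75 after IVM.
print(round(cumulus_expansion_3distance((30, 45, 60), (50, 70, 105)), 2))  # 66.67
```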

Scoring
The images of pre-IVM and corresponding post-IVM COCs were directly compared to evaluate the expansion of the cumulus cells, by assigning a score on a 5-point Likert scale as previously described by Downs [21]. In brief, the score ranged from 0 to 4: "0", no expansion; "1", separation of only the outermost layers of cumulus cells; "2", further expansion involving the outer half of the cumulus oophorus; "3", further expansion up to, but not including, the corona radiata; and "4", complete expansion, including the innermost corona radiata cells (Fig. 2).

AI-xpansion: a deep learning model for cumulus expansion measurement
As the area method resulted in the highest ICC for inter- and intra-observer agreement, this method was automated in AI-xpansion: a DL algorithm that recognizes the area of the COC automatically and can thus be used for measuring cumulus expansion. AI-xpansion, described in technical detail by Athanasiou et al. [41], exploits DL modeling to calculate cumulus expansion automatically. To construct the DL model, AI-xpansion combines transfer learning and image pre-processing with a U-Net architecture inspired by Ronneberger et al. [42].

Fig. 1. Cumulus expansion measurements applying the area and 3-distance method. For the area method, the contour of the cumulus cells was drawn to calculate the area before (a) and after (b) maturation. For the 3-distance method, the shortest (S), medium (M), and longest (L) distance between the zona pellucida and the outer border of the cumulus was measured and the arithmetic mean was calculated before (c) and after (d) maturation. The scale bar applies to all images.

Fig. 2. Scoring method to measure cumulus expansion. Cumulus-oocyte complexes were compared before and after maturation (rows) and scored (columns) as follows: "0" no expansion; "1" separation of only the outermost layers of cumulus cells; "2" further expansion involving the outer half of the cumulus oophorus; "3" further expansion up to, but not including, the corona radiata; "4" complete expansion including the innermost corona radiata cells. The scale bar of the lower image also applies to the image above.
The adopted U-Net architecture has a contracting path made up of four blocks. Each block includes two 3 × 3 convolutional layers, followed by a ReLU [43] activation layer and a 2 × 2 max-pooling layer with a stride of 2. One of these blocks also incorporates a dropout layer with a probability of 0.5. The expansive path comprises four blocks that include an up-sampling transposed convolutional layer, a concatenation layer, two 3 × 3 convolutional layers and a ReLU activation layer, followed by an additional convolutional layer at the end.
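To make the layer arithmetic of this encoder-decoder concrete, the sketch below traces feature-map shapes through four contracting and four expansive blocks. It does not build a network (the authors used Keras and TensorFlow). The input size of 256 × 256, the base of 64 filters doubling per block, 'same' padding (so 3 × 3 convolutions keep the spatial size), the absence of a separate bottleneck block and the single-channel output are all assumptions for illustration; dropout is omitted because it does not change shapes. The function name `trace_unet_shapes` is hypothetical.

```python
def trace_unet_shapes(size=256, base_filters=64, depth=4):
    """Return a list of (stage, (H, W, C)) tuples for a square input of side `size`.

    Assumes 'same' padding, filters doubling per contracting block and a
    transposed convolution that outputs as many channels as the matching skip.
    """
    if size % 2 ** depth:
        raise ValueError("size must be divisible by 2**depth")
    stages, skip_channels, s = [], [], size
    # Contracting path: two 3x3 convs + ReLU set the channel count,
    # then a 2x2 max-pool with stride 2 halves height and width.
    for i in range(depth):
        c = base_filters * 2 ** i
        skip_channels.append(c)  # this map is reused by a skip connection
        stages.append((f"down{i + 1} conv", (s, s, c)))
        s //= 2
        stages.append((f"down{i + 1} pool", (s, s, c)))
    # Expansive path: a transposed conv (stride 2) doubles height and width,
    # concatenation with the matching contracting output doubles the channels,
    # and two 3x3 convs + ReLU reduce them again.
    for i in reversed(range(depth)):
        c = skip_channels[i]
        s *= 2
        stages.append((f"up{depth - i} concat", (s, s, 2 * c)))
        stages.append((f"up{depth - i} conv", (s, s, c)))
    # Final convolution to a one-channel COC-versus-background mask (assumed).
    stages.append(("output", (s, s, 1)))
    return stages


for name, shape in trace_unet_shapes():
    print(f"{name:12s} {shape}")
```

The trace shows why the input side must be divisible by 2^4 = 16: each of the four pooling steps halves the spatial size, and the four up-sampling steps must land exactly on the sizes of the stored skip connections for concatenation to work.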
We used the open-source Keras library [44] with TensorFlow [45] as the underlying framework to implement this architecture. The Dice coefficient [46] was chosen to gauge the effectiveness of the segmentation method. This coefficient measures the spatial overlap between two areas and takes a value between 0 and 1, where 0 means no overlap and 1 indicates complete overlap. The equation for this coefficient is:
$$\mathrm{Dice}\big(y, f(x)\big) = \frac{2\,\lvert y \cap f(x)\rvert}{\lvert y\rvert + \lvert f(x)\rvert}$$
where y stands for the ground truth, x represents the input image, and f(x) is the model's prediction. To train the U-Net architecture, we used the Dice loss as the loss function, which can be written as:
$$\mathcal{L}_{\mathrm{Dice}}\big(y, f(x)\big) = 1 - \mathrm{Dice}\big(y, f(x)\big)$$
To improve AI-xpansion's performance, transfer learning from a publicly available dataset of melanoma images [47] was applied. This dataset was used to generate a first pre-trained DL model, which served as the starting point for training AI-xpansion with images of pre- and post-IVM COCs. From the data acquired in section 2.3, a subset of 100 COCs was randomly selected because of image annotation costs, which translates into 200 annotated images. Experts provided segmentation masks by manually drawing the area of pre- and post-IVM COCs. These images and their corresponding segmentation masks were used as the training input for AI-xpansion.
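The Dice coefficient and Dice loss can be written directly in NumPy for binary (or soft, values in [0, 1]) masks. This is a minimal sketch rather than the Keras tensor implementation used for training; the small smoothing term `eps`, which keeps the ratio defined when both masks are empty, is an assumption and is not specified in the text.

```python
import numpy as np


def dice_coefficient(y_true, y_pred, eps=1e-7):
    """Dice = 2|y ∩ f(x)| / (|y| + |f(x)|) for masks with values in [0, 1]."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true * y_pred)  # overlap of the two masks
    return (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)


def dice_loss(y_true, y_pred):
    """Dice loss = 1 - Dice, minimized during training."""
    return 1.0 - dice_coefficient(y_true, y_pred)


ground_truth = np.array([[1, 1], [0, 0]])
prediction = np.array([[1, 0], [1, 0]])
print(round(dice_coefficient(ground_truth, prediction), 3))  # 0.5
```

In the example, the masks share one foreground pixel and each has two, so the coefficient is 2 · 1 / (2 + 2) = 0.5.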
To evaluate the segmentation models, i.e. the ability of AI-xpansion to detect the cumulus demarcation, 10-fold cross-validation was used, since the dataset was small and this approach provided stable results [48]. In each fold, 90 COCs (pre- and post-IVM; 180 images) were used for training and 10 COCs (20 images) for validation. The model was trained on mini-batches of 32 for 200 epochs. For each fold, a segmentation mask was generated for the 20 validation images, resulting in a total of 200 masks after going through all the folds. The generated masks were then compared with the masks of the same COCs provided by each of the experts (Fig. 3), using the Dice coefficient [46] to evaluate the similarity between two masks. The mean Dice performance of the majority-vote model converged at around 95 %.
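The text does not say how the folds were formed, but the counts (90 COCs and 180 images for training, 10 COCs and 20 images for validation) suggest that splitting was done per COC, so that the pre- and post-IVM images of one complex never end up on both sides of a split. The sketch below shows such a grouped 10-fold split under that assumption; the function name `grouped_kfold` and the random seed are hypothetical.

```python
import numpy as np


def grouped_kfold(coc_ids, n_folds=10, seed=0):
    """Yield (train_images, val_images) per fold, keeping the pre- and
    post-IVM images of each COC together in the same fold."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(np.asarray(list(coc_ids)))
    for fold in np.array_split(ids, n_folds):
        val_ids = set(fold.tolist())
        train = [(c, stage) for c in ids.tolist() if c not in val_ids
                 for stage in ("pre", "post")]
        val = [(c, stage) for c in fold.tolist() for stage in ("pre", "post")]
        yield train, val


for k, (train, val) in enumerate(grouped_kfold(range(100)), start=1):
    print(f"fold {k}: {len(train)} training images, {len(val)} validation images")
```

With 100 COCs, every fold yields 180 training and 20 validation images, and across the ten folds each COC is used for validation exactly once, giving the 200 predicted masks mentioned above.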

Cumulus expansion and embryo development
Bovine COCs (n = 427, 14 replicates) were harvested and matured in vitro as described in 2.2, with the additional condition that only oocytes surrounded by a compact cumulus of at least 5 layers and with a homogeneously dark or slightly granular ooplasm were selected for further processing. Frozen sperm from a bull of known fertility was thawed in a 38 °C water bath and passed over a Percoll gradient (GE Healthcare Biosciences, Uppsala, Sweden) to select viable spermatozoa. The COCs were co-incubated with selected spermatozoa in individual droplets (1 COC/20 μL) of IVF-TALP medium supplemented with bovine serum albumin (BSA; Sigma A8806; 6 mg/mL) and heparin (20 μg/mL) at a final concentration of 1 × 10⁶ spermatozoa/mL. Droplets were covered with paraffin oil and incubated for 21 h at 38.5 °C in 5 % CO2 in humidified air. After in vitro fertilization, cumulus cells were removed by gentle pipetting (140 μm EZ-Tip®, CooperSurgical, Malov, Denmark), and excess sperm cells were removed by washing the presumed zygotes in HEPES-TALP medium. Subsequently, zygotes were transferred to individual droplets (1 zygote/20 μL) of synthetic oviductal fluid medium supplemented with 0.4 % BSA (Sigma A9647) and ITS (5 μg/mL insulin + 5 μg/mL transferrin + 5 ng/mL selenium) covered with paraffin oil. Embryos were cultured until day eight post-fertilization at 38.5 °C in 5 % CO2, 5 % O2 and 90 % N2. At day eight of culture, embryonic development was reported as the blastocyst rate (i.e. the number of embryos that developed to the blastocyst stage, defined by the presence of a blastocoel or cavity, divided by the total number of presumed zygotes).
Images of the COCs were acquired before and after maturation as previously reported (2.3), and cumulus expansion was evaluated by one observer using the area, 3-distance, and scoring methods (2.4) and by AI-xpansion (2.5).

Statistical analyses
To compare the three different methods for cumulus expansion measurement, the following variables were studied: (1) median cumulus expansion of all COCs considered, (2) inter-observer agreement, i.e. the variation in cumulus expansion as scored between the three different observers, (3) overall inter-observer agreement, i.e. the average of the two measurement repetitions for the inter-observer agreement (as the measurements were performed in duplicate) and (4) intra-observer agreement, i.e. the variation in cumulus expansion scored between repeated measurement for each of the observers.
The data were analyzed with Python, version 3.10.6. A two-way random effects model was used to evaluate inter-observer agreement and a one-way random effects model was used to evaluate the intra-observer agreement of each observer. The ICCs and their 95 % confidence intervals were then computed using the intraclass_corr function of the Pingouin Python statistical library [49], version 0.5.2. The code used for the evaluation is available at the following link: https://github.com/IIIA-ML/cumulus_expansion_variance_analysis. The ICC values were interpreted as proposed by Landis and Koch [50]: <0.20, poor agreement; 0.20-0.39, fair agreement; 0.40-0.59, moderate agreement; 0.60-0.79, good agreement; >0.80, very good agreement. Data are reported as ICC and 95 % confidence intervals.
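For readers who want to see what the two random-effects models compute, the NumPy sketch below derives single-rater ICCs from ANOVA mean squares. It is not the authors' code (they used Pingouin's intraclass_corr, whose output lists these forms as "ICC1" and "ICC2"), and the text does not state whether single-rater or average-rater forms were reported, so the single-rater forms are an assumption. For intra-observer agreement, the two repetitions of one observer play the role of the "raters". The function names `icc_single_rater` and `agreement_level` are hypothetical, and values of exactly 0.80 are assumed to count as very good.

```python
import numpy as np


def icc_single_rater(ratings):
    """Return (ICC1, ICC2) for an (n_targets, k_raters) array.

    ICC1: one-way random effects, single rater (intra-observer use here).
    ICC2: two-way random effects, absolute agreement, single rater
          (inter-observer use here).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_targets = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()  # between COCs
    ss_raters = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()   # between raters
    ms_targets = ss_targets / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_within = (ss_total - ss_targets) / (n * (k - 1))                   # one-way residual
    ms_error = (ss_total - ss_targets - ss_raters) / ((n - 1) * (k - 1))  # two-way residual
    icc1 = (ms_targets - ms_within) / (ms_targets + (k - 1) * ms_within)
    icc2 = (ms_targets - ms_error) / (
        ms_targets + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )
    return icc1, icc2


def agreement_level(icc):
    """Map an ICC to the agreement levels listed in the text."""
    if icc < 0.20:
        return "poor"
    if icc < 0.40:
        return "fair"
    if icc < 0.60:
        return "moderate"
    if icc < 0.80:
        return "good"
    return "very good"


print([agreement_level(v) for v in (0.89, 0.54, 0.30)])
# ['very good', 'moderate', 'fair']
```

Rows of `ratings` are COCs and columns are observers (or repetitions); ICC2 additionally penalizes systematic offsets between raters through the `ms_raters` term, which is why it is the natural choice when different observers are compared.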
A retrospective sample size calculation was performed in R (version 4.2.2) and RStudio (2023.09.0 Build 463), using a balanced one-way analysis of variance test. The significance level was set at 0.05, the power at 0.80 and the number of groups at 3, and the effect size was taken as the smallest reported difference between inter-observer ICCs, which was 0.13. This resulted in a minimum required number of 191 cumulus expansion measurements per technique. In this study, the cumulus expansion of 232 COCs was measured per technique.
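The sample size calculation can be approximated in Python with SciPy's noncentral F distribution. This sketch assumes that the value 0.13 was entered as Cohen's f and that power is computed with the convention of R's pwr.anova.test (noncentrality = k · n · f², denominator degrees of freedom k · (n - 1)); the text does not name the R function used, so this is an interpretation rather than a reproduction. The function names are hypothetical.

```python
from scipy import stats


def anova_power(n_per_group, k, f, alpha=0.05):
    """Power of a balanced one-way ANOVA with k groups and Cohen's f."""
    df1 = k - 1
    df2 = k * (n_per_group - 1)
    noncentrality = k * n_per_group * f ** 2
    f_crit = stats.f.ppf(1 - alpha, df1, df2)             # rejection threshold under H0
    return stats.ncf.sf(f_crit, df1, df2, noncentrality)  # P(reject | effect size f)


def min_n_per_group(k=3, f=0.13, alpha=0.05, power=0.80):
    """Smallest group size reaching the requested power."""
    n = 2
    while anova_power(n, k, f, alpha) < power:
        n += 1
    return n


print(min_n_per_group())
print(round(anova_power(232, 3, 0.13), 3))
```

Under these assumptions the search should return a group size close to the 191 reported above, and the power at the 232 COCs actually measured per technique comes out above 0.80.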
To compare the results of the DL method with those of the human observers, the similarity among the annotations of the three observers was computed and compared with the similarity between each human observer and the proposed method. Three metrics were used to evaluate the similarity between AI-xpansion and the human observers: average rank, bias, and variance. The first metric assesses how often AI-xpansion's estimates are closer to a given observer than those of the other observers. For the average rank metric, the estimators were ranked according to the proximity of their score to the score of a reference. For example, if observer 1 is set as the reference, the scores of observer 2, observer 3, and AI-xpansion are compared to the score of observer 1 and ranked from 1 to 3 according to which estimator has the smallest distance to the reference. The same procedure was repeated with each observer as the reference. A lower average rank indicates a better estimate, according to Supplementary Equation S1. Additionally, to support these results, the bias and variance between the AI-xpansion estimator and the observers' annotations were examined, according to Supplementary Equations S2 and S3, respectively.
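Because Supplementary Equation S1 is not reproduced here, the sketch below is only one plausible reading of the average rank metric as described: for each COC, the non-reference estimators are ranked by absolute distance to the reference (rank 1 = closest), and ranks are averaged over COCs. Tie-breaking by input order and all names (`average_ranks`, `rank_table`, the toy values) are assumptions.

```python
import numpy as np


def average_ranks(estimates, reference):
    """Average rank of each estimator's distance to `reference`.

    estimates: dict mapping estimator name -> sequence of cumulus expansion
    values, one per COC. Rank 1 means closest to the reference for that COC;
    ties are broken by dictionary order (stable sort).
    """
    ref = np.asarray(estimates[reference], dtype=float)
    names = [name for name in estimates if name != reference]
    dist = np.array([np.abs(np.asarray(estimates[name], dtype=float) - ref)
                     for name in names])  # shape: (n_estimators, n_cocs)
    # argsort of argsort turns distances into 0-based ranks per COC (column)
    ranks = np.argsort(np.argsort(dist, axis=0, kind="stable"), axis=0, kind="stable") + 1
    return {name: float(r) for name, r in zip(names, ranks.mean(axis=1))}


def rank_table(estimates, observers):
    """Average ranks with each human observer used in turn as the reference."""
    return {ref: average_ranks(estimates, ref) for ref in observers}


toy = {"O1": [10, 10], "AI": [10, 10], "O2": [11, 13], "O3": [15, 9]}
print(average_ranks(toy, "O1"))  # {'AI': 1.0, 'O2': 2.5, 'O3': 2.5}
```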
The association between cumulus expansion and embryo development was analyzed using the Mann-Whitney U test, with the significance level set at α = 0.05. The test was used to compare the cumulus expansion of COCs that did not develop successfully with that of COCs that did. The null hypothesis H0 was that both expansion samples come from the same distribution, and the alternative hypothesis was that one of the samples presents larger expansions than the other.
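A minimal sketch of this comparison with SciPy's `mannwhitneyu` is shown below. The text does not state which library was used for this test, so the SciPy call is an assumption, as are the helper name `compare_expansion` and the toy values. The two-sided alternative corresponds to the hypothesis as worded; `alternative="greater"` would instead test specifically whether expansion is larger in the COCs that reached the blastocyst stage.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_expansion(failed, succeeded, alternative="two-sided", alpha=0.05):
    """Mann-Whitney U test on cumulus expansion (%) of COCs that failed
    versus succeeded in reaching the blastocyst stage."""
    result = mannwhitneyu(succeeded, failed, alternative=alternative)
    return {
        "median_failed": float(np.median(failed)),
        "median_succeeded": float(np.median(succeeded)),
        "U": float(result.statistic),
        "p_value": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
    }


# Toy expansion values (%), not study data.
failed = [20, 35, 40, 55, 60, 62, 70, 75, 80, 90]
succeeded = [85, 95, 100, 110, 120, 125, 130, 140, 150, 160]
print(compare_expansion(failed, succeeded)["significant"])
```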

Results
Three different methods to measure cumulus expansion were evaluated by three observers. The working principle and user-friendliness (based on equipment and time requirements) of these methods are summarized in Table 1. The distribution of cumulus expansion measured by the three methods is illustrated in Supplementary Fig. S1.

Inter-observer agreement
For all three methods, the agreement between the observers was evaluated by calculating the corresponding ICC, as illustrated in Fig. 4a and Table 2. This ICC was calculated in duplicate, as the measurements were performed twice by every observer for every method. In both repetitions, the inter-observer ICCs for the area method showed a very good level of agreement, while the 3-distance method resulted in a moderate level of agreement. The inter-observer agreement for the scoring method was fair in both repetitions. An overall inter-observer agreement level was calculated for every method, taking both repetitions into account. This resulted in a very good overall agreement for the area method, a moderate overall agreement for the 3-distance method, and a poor overall agreement for the scoring method, as shown in Fig. 4b and Table 2.

Intra-observer agreement
The intra-observer agreement was evaluated by ICC calculations for every method and every observer (Fig. 4c; Table 2). Intra-observer agreement for observers 1, 2, and 3 was very good for the area method and moderate to good for the 3-distance method. The results for the scoring method varied per observer, with intra-observer agreement ranging from poor through moderate to good.

AI-xpansion processor capacity
As the area method resulted in the highest ICC values for measuring cumulus expansion manually, a DL model, AI-xpansion, was created based on this method. The pre-processing stage, during which the region of interest (i.e. the COC) was detected, correctly detected the region of interest in 98 % of the COCs. Detection failed because of a very low signal of the COC in one case and the presence of an oil droplet in the image, which interfered with the model, in the other. These two cases were excluded from further analyses, leading to a total of 98 COCs used to train the model as explained in section 2.5.

AI-xpansion performs similarly to human observers
The average rank metric, i.e. the comparison of the different scores among the estimators, is reported in Table 3. In 2 out of 3 cases, AI-xpansion had a lower average rank than the human observers, i.e. its estimates were closer to the reference observer than those of the other observers. Overall, the performance of AI-xpansion in measuring the COCs' area was similar to that of the observers (p = 0.15).
Bias and variance among the human observers and between the human observers and AI-xpansion are reported in Supplementary Tables S1 and S2, respectively. Measuring cumulus expansion using AI-xpansion resulted in lower bias and less variance than the human observers in 1 out of 3 cases. In the remaining 2 cases, AI-xpansion scored similarly to the human observers for both bias and variance, indicating that AI-xpansion reaches human-level performance.

Table 1
Comparison of methods to measure cumulus expansion. Methods were compared by three observers and evaluated for equipment and time requirements. +, easy or low; ++, moderate; +++, complicated or high. a Equipment required: apart from a stereomicroscope, software was needed to calculate cumulus expansion for the area and 3-distance methods.
A. Raes et al.

Cumulus expansion is related to embryo development
To evaluate the accuracy of the different methods, the association between embryo development and cumulus expansion was studied. Cumulus expansion was measured by the area method, the 3-distance method, the scoring method, and AI-xpansion. Embryo development was considered successful when the embryos reached the blastocyst stage at day eight post-fertilization. Of a total of 427 presumed zygotes, 118 developed successfully into a blastocyst (27.6 %), whilst 309 (72.4 %) failed and arrested their development at an earlier stage.
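The blastocyst rate defined in the Materials and methods (blastocysts at day eight divided by presumed zygotes) can be checked against the counts reported above with a one-line calculation; the function name `blastocyst_rate` is hypothetical.

```python
def blastocyst_rate(n_blastocysts, n_presumed_zygotes):
    """Blastocyst rate (%) = blastocysts at day 8 / presumed zygotes * 100."""
    return 100.0 * n_blastocysts / n_presumed_zygotes


print(f"{blastocyst_rate(118, 427):.1f} %")        # 27.6 %
print(f"{100 - blastocyst_rate(118, 427):.1f} %")  # 72.4 %
```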

Table 2
Intraclass correlation coefficients for three cumulus expansion measurement methods.Data are reported as intraclass correlation coefficients with their respective 95 % confidence intervals.

Discussion
In this study, the three most commonly used methods for non-invasive cumulus expansion evaluation were compared in terms of inter- and intra-observer agreement. Since no gold standard has been described, this study aimed to provide evidence-based data on the reliability of the different measurement techniques. We demonstrated that comparing the area of the COC before and after IVM is the most precise and repeatable non-invasive method to assess cumulus expansion manually. As a result, the area method was used as the basis to develop a DL model, AI-xpansion. This model measures the area of COCs with reliability similar to that of the human observers and objectively calculates cumulus expansion with limited time and labor requirements. A significant association was demonstrated between embryo development and cumulus expansion when measured by AI-xpansion, the area method and the 3-distance method. Consequently, oocyte developmental competence could partially be predicted by cumulus expansion using these three measurement techniques.
When comparing the three most commonly used techniques for manual evaluation of cumulus expansion, the area method was shown to be the least prone to subjectivity. The results for cumulus expansion were similar between the different observers, as reflected by the very good inter-observer agreement. From this, it can be concluded that the choice of observer is of little consequence for the area method [51]. The 3-distance and scoring methods resulted in moderate and fair inter-observer agreement, respectively (Fig. 4a). This indicates that cumulus expansion results obtained by multiple random raters will differ more from each other and that these methods are more vulnerable to observer bias [51].
An embryologist's self-consistency in rating cumulus expansion is highest when he or she employs the area method, as suggested by the very good intra-observer agreement for the area method for all three observers. The intra-observer ICC values for the 3-distance method ranged from moderate to good, although the values were close to each other for the three observers. This indicates that self-consistency in measuring the three distances is not optimal, and that this holds for all three observers. This contrasts with the high variation in intra-observer agreement for the scoring method between the different observers (i.e. poor, moderate, and good for observers 1, 2, and 3, respectively; Fig. 4c). Consequently, the scoring method appears to be the least reproducible way to evaluate cumulus expansion.
Although the scoring method is the most commonly used technique to evaluate cumulus expansion, this manuscript shows that it is also the least reliable. The inadequate levels of inter- and intra-observer agreement for the scoring method are in line with what is known in other research fields, where the use of scoring systems such as Likert scales is often debated due to their subjective nature [52][53][54]. The large variation in the morphological presentation of cumulus expansion (and COC shape as such) may contribute to subjective interpretation. Human observers can therefore be expected to interpret the categories of cumulus expansion differently, which may explain the low ICC values for the scoring method. Also, the design of the scoring system, e.g. the number of points on the Likert scale, could influence its reliability [52,55].
Overall, the area method was more repeatable than the 3-distance method. This can be explained by the fact that the three distances are chosen arbitrarily, so the selection of the "shortest", "medium" and "longest" distance may be ambiguous. Also, in some images the zona pellucida is not clearly distinguishable because of an overlying cloud of cumulus cells, which could also explain the moderate reliability of the 3-distance method. Nonetheless, the area method also leaves some (although limited) scope for subjectivity, as no ICC was equal to 1. This variation was probably caused by the demarcation of the outer cumulus cells, which can be open to interpretation, especially when the outer cumulus cells are completely expanded. Moreover, the inability to evaluate the 3-dimensional shape of the COC is an important limitation of all visual inspection methods, as positional changes of the COC in the 3-dimensional field could result in erroneous measurements and calculations. This drawback is absent in non-visual methods for cumulus expansion measurement, such as spectrophotometry and high-performance liquid chromatography to measure the amount of hyaluronic acid-degrading products in conditioned media [56]. Still, these methods are not routinely performed in the IVF lab, since specific equipment and expertise are required.
In recent years, assisted reproductive technologies have benefited from the increased use of artificial intelligence techniques such as DL and image segmentation [37,57-62], although segmentation of images of bovine COCs had not yet been reported. AI-xpansion was therefore developed to automatically measure the area of pre- and post-IVM bovine COCs by image segmentation and consequently calculate the relative expansion of the cumulus cells. Although AI-xpansion was created using bovine COCs, its algorithm could be extended to other species in which IVM is performed, such as horses [63], wildlife [64], and humans [65]. AI-xpansion reaches the same level of reliability as the human observers, as the average rank metric was not significantly different between the human observers and AI-xpansion. In addition, the performance of our DL model is at human level, since it generates results with similar bias and variance to those of human observers, and even outperformed them in one case. However, when cumulus cells expand extremely, the border between the cumulus and the background may become translucent, which can make it difficult to surpass human-level performance. If more images covering such extreme cases were presented to AI-xpansion in a follow-up study, this minor limitation could be resolved. Further refinement of AI-xpansion using more diverse COC images, or the investigation of other AI techniques for objective cumulus expansion measurement, could be an interesting topic for future studies. AI-xpansion measures cumulus expansion from a 2-dimensional image. Future research may focus on 3-dimensional imaging; for example, optimization of 3-D microscopy such as lens-free imaging [66] could make it possible to evaluate cumulus expansion in 3 dimensions without affecting the COC's developmental capacity.
The biological relevance of cumulus expansion during oocyte maturation is well known [7,25] and was confirmed in this study. The association between cumulus expansion and successful embryo development was demonstrated here for the first time in a bovine in vitro model with individual culture. According to the results obtained, cumulus expansion measured by AI-xpansion, the area method or the 3-distance method is significantly higher in COCs with successful developmental competence than in COCs that failed to develop. This is in agreement with previous studies, in which a significant positive correlation was shown between cumulus expansion (measured using derivatives of the area method) and fertilization potential in mice [12] and blastocyst development in pigs [17]. In these studies [12,17], COCs and embryos were cultured in groups, which prevents linking the cumulus expansion of a specific COC to that COC's developmental outcome. The present manuscript is the first to report a direct and individual association between cumulus expansion and embryonic development in a bovine in vitro model while considering different measurement methods.

Conclusion
The area method was the most reliable method to measure cumulus expansion by visual inspection, whereas the scoring method, which is most frequently used in the literature, was the least reliable. The area method was then used to create an objective alternative, the DL algorithm AI-xpansion. AI-xpansion could be a useful tool for embryologists and researchers in the in vitro embryo production lab to measure cumulus expansion with human-level performance. The biological relevance of measuring cumulus expansion was confirmed, since using AI-xpansion, the area method or the 3-distance method, we were able to show that median cumulus expansion was significantly increased in competent COCs compared with COCs that failed to reach the blastocyst stage.

Fig. 3. Visual comparison of segmentation masks.From the original cumulus-oocyte complex (COC) image (a), a segmentation mask was drawn around the contours of the COC by human observers (b).This mask was compared to the segmentation that was performed by the deep learning model, AI-xpansion (c).

Fig. 4. Intraclass correlation coefficients (ICC) and their 95 % confidence intervals were calculated and displayed according to their corresponding level of agreement for (a) inter-observer agreement, (b) overall inter-observer agreement, and (c) intra-observer agreement.Cumulus expansion of all cumulus-oocyte complexes was evaluated twice per method, resulting in two ICC values for inter-observer agreement per method.

Fig. 5. Distribution of cumulus expansion in cumulus-oocyte complexes (COCs) that resulted in unsuccessful and successful embryo development. Cumulus expansion was measured using (a) AI-xpansion, (b) the area method, (c) the 3-distance method and (d) the scoring method. Cumulus expansion was significantly different between unsuccessful and successful COCs (*p ≤ 0.05) when measured by AI-xpansion, the area method and the 3-distance method.

Table 3
Average rank metric calculated for comparisons among the human observers and between the human observers and AI-xpansion. This table shows the similarity between the average rank of three observers (O1-O3) and the deep learning method (AI-xpansion). Scores closer to zero indicate that the performance is closer to the reference observer (columns).