BEAN: Brain Extraction and Alignment Network for 3D Fetal Neurosonography

Brain extraction (masking of extra-cerebral tissues) and alignment are fundamental first steps of most neuroimage analysis pipelines. The lack of automated solutions for 3D ultrasound (US) has therefore limited its potential as a neuroimaging modality for studying fetal brain development using routinely acquired scans. In this work, we propose a convolutional neural network (CNN) that accurately and consistently aligns and extracts the fetal brain from minimally pre-processed 3D US scans. Our multi-task CNN, Brain Extraction and Alignment Network (BEAN), consists of two independent branches: 1) a fully-convolutional encoder-decoder branch for brain extraction of unaligned scans, and 2) a cascade regression-based branch for rigid alignment of the brain to a common coordinate space.


Introduction
Accurate brain extraction and alignment are crucial for reliable neuroimaging pipelines, with fetal neuroimaging being no exception. Improperly aligned images can make inter- and intra-subject comparisons a challenge, and potentially misrepresent structural differences. Similarly, an accurate brain extraction (i.e., binary masking of the brain) ensures that extra-cerebral structures such as the skull, amniotic fluid, and maternal tissues do not affect any subsequent analysis, with the added benefit of improving performance by reducing the amount of processed information. However, manual alignment and extraction of brain images are tedious, time-consuming tasks that require a high level of expertise, especially for three-dimensional (3D) images. The need for automated solutions to these tasks has resulted in several algorithms being proposed for more traditional 3D neuroimaging modalities such as Magnetic Resonance Imaging (MRI) (Klein et al., 2009, 2010; Makropoulos et al., 2018). However, although ultrasound (US) is the preferred imaging modality for prenatal care and fetal growth assessment, there is a lack of such solutions for 3D US, limiting its potential as a neuroimaging modality. Therefore, in this work we propose an automated solution specifically designed for the tasks of fetal brain alignment and extraction from 3D US scans. While MRI is arguably the most widely used neuroimaging modality and has therefore seen the largest amount of development for automated fetal brain alignment and extraction (Makropoulos et al., 2018; Studholme, 2011; Torrents-Barrena et al., 2019), it is not used as part of routine clinical practice during pregnancy. Although MRI is generally considered safe for fetal imaging (Ray et al., 2016), patients are typically referred for a secondary MRI scan for diagnostic confirmation only if abnormalities are suspected during the routine second-trimester anomaly US scan (Prayer et al., 2017). In contrast, the use of US to monitor and assess the development of the fetus during pregnancy is standard clinical practice worldwide (Haratz and Lerman-Sagie, 2018; Kim et al., 2008; Namburete et al., 2018), and it is used during the routine clinical examination to assess the anatomical integrity of the brain throughout gestation.
While the clinical adoption of 3D US over 2D US has been relatively slow due to higher costs and limited perceived benefits (Prager et al., 2010), current research has highlighted its potential, and the advantages of this modality have resulted in its recommendation by the updated International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) Practice Guidelines of 2021 (Paladini et al., 2021).
The ubiquity of US and the availability of large 3D US datasets, such as the INTERGROWTH-21st study (Papageorghiou et al., 2014), offer the possibility to study and assess the development of the brain from its early stages up until birth. However, the difficulty of analyzing 3D US brain data has limited its clinical use to 2D assessment, in which biometric measurements are manually collected for growth monitoring, or in which structures are qualitatively assessed (Paladini et al., 2007). This, in turn, severely limits the amount of anatomical information on which the assessment is performed. Additionally, 3D fetal neurosonograms can be used to perform substantially more complex tasks, such as automated extraction and evaluation of oblique planes (Carneiro et al., 2008; Huang et al., 2015; Li et al., 2017, 2018; Yaqub et al., 2015), anatomy-based gestational age prediction (Namburete et al., 2015, 2017; Wyburd et al., 2021), and 3D segmentation of fetal brain structures for volumetric assessment (Becker et al., 2010; Cerrolaza et al., 2017; Gutiérrez-Becker et al., 2013; Hesse et al., 2022; Huang et al., 2018; Velásquez-Rodríguez et al., 2015; Yaqub et al., 2013). A reliable, automated solution to fetal brain extraction and alignment would facilitate further development of such algorithms as well as their potential implementation for clinical assessment.
While clinical guidelines for the acquisition of 3D neurosonograms have been proposed in order to minimize positional variability of the brain between scans and facilitate analysis (Paladini et al., 2007; Pilu et al., 2007), the results rely heavily on the expertise of the sonographer and cannot always account for varying fetal pose during image acquisition. This results in an improved but still highly inconsistent brain alignment between fetal scans. A reliable and accurate solution for the tasks of brain alignment and extraction would therefore be paramount for 3D neurosonograms to become standard clinical practice.
To date, little work has been published on the automated extraction and alignment of the fetal brain from 3D US scans (Cerrolaza et al., 2017; Chen et al., 2012; Cuingnet et al., 2013; Kuklisova-Murgasova et al., 2013; Moser et al., 2020; Namburete et al., 2018). This can be attributed, in part, to the intrinsic challenges associated with this imaging modality, as well as the effect that temporal changes of the fetal head have on image quality. Specifically, during gestation, the brain grows in size and the internal structures undergo morphological changes (Prayer et al., 2006), causing a large intrinsic variability in the fetal cohort. The increasing ossification of the skull during gestation affects the interaction of the tissues with the US beam, resulting in shadows, occlusions, and reverberation artifacts that vary throughout pregnancy and depend on the orientation and position of the probe relative to the skull. Finally, due to the acoustic cavity created by the increasingly calcified skull, the hemisphere of the brain located closest to the US probe has most of its structural information obscured, resulting in an intrinsically asymmetric representation of the brain. All of these challenges make the development of a reliable and accurate automated solution for brain alignment and extraction difficult.
Deep Learning (DL)-based approaches proposed for medical imaging have increased dramatically (Suzuki, 2017), with the effectiveness of networks such as the U-Net (Ronneberger et al., 2015) and Spatial Transformer Networks (Jaderberg et al., 2015) having a profound impact on the development of segmentation and alignment solutions for medical data, respectively. Specifically, DL solutions have shown great potential for processing US image data, obtaining accurate predictions in tasks such as standard plane detection (Baumgartner et al., 2017; Chen et al., 2015; Yaqub et al., 2017) and fetal structure detection, localization, and segmentation (Gutiérrez-Becker et al., 2013a), in spite of the aforementioned image quality challenges. This makes DL an ideal candidate for developing a generalized solution to the automated brain alignment and extraction of 3D neurosonograms.
In this work, we build upon our preliminary results for automated fetal brain extraction from 3D neurosonograms (Moser et al., 2020) and propose an end-to-end, multi-task Convolutional Neural Network (CNN) that extracts and aligns the fetal brain from 3D US scans. Our multi-task CNN, Brain Extraction and Alignment Network (BEAN), consists of two independent networks: 1) a fully-convolutional, encoder-decoder network for brain extraction of unaligned scans, and 2) a cascade regression-based network for rigid alignment of the brain to a common coordinate space. These concurrent but independent networks result in a fast, accurate, and consistent alignment and extraction of the fetal brain from minimally pre-processed 3D US scans.
In summary, the contributions of this work are:
• We adapt the fully-convolutional fetal brain extraction network published in Moser et al. (2020) to predict masks at the full 160 × 160 × 160 resolution via a final spline upsampling step, which improves performance without significant computational cost.
• We propose a fully-automated alignment regression network with a cascade architecture that accurately aligns the fetal brain from minimally pre-processed 3D US scans to a common coordinate space.
• We combine both networks into a single, end-to-end, Brain Extraction and Alignment Network (BEAN) that returns the extracted and aligned brain from a minimally pre-processed 3D US scan, aiming to facilitate the development of neuroimage analysis pipelines for intrauterine brain development.

Related works
The importance of reliable and accurate brain extraction and alignment for neuroimage analysis has led to an extensive list of methods being proposed for MR imaging (Eskildsen et al., 2012; Guha Roy et al., 2018; Han et al., 2018; Huo et al., 2019; Isensee et al., 2019; Kleesiek et al., 2016; Klein et al., 2009, 2010; Mohseni Salehi et al., 2017; Nestares and Heeger, 2000), with libraries and toolboxes such as FSL (Jenkinson et al., 2012) and BrainVISA (Cointepas et al., 2001) being commonplace in most pipelines. In particular, there is an extensive list of methods proposed for fetal brain extraction and alignment from 3D MRI images. In contrast, considerably fewer solutions have been proposed for fetal brain 3D US scans, in spite of the increased use of 3D US during routine clinical assessment and the recent endorsement of this modality by ISUOG (Paladini et al., 2021). Therefore, in order to provide richer technical context and insight into our work, in this section we summarize relevant works proposed for both imaging modalities. However, it is important to note that MRI and US images are acquired through entirely different physical interactions, which results in differences in the structural information rendered, as well as in other intensity properties, such as contrast and sharpness. Therefore, while a method that works well for MRI could potentially be adapted to work for 3D US, such an implementation is non-trivial, and the lack of a one-to-one intensity mapping between US and MRI would necessitate major modifications to most intensity-based approaches.

Brain extraction: 3D US
Several publications have proposed methods for automated segmentation of cerebral structures from 3D US (Becker et al., 2010b; Gutiérrez-Becker et al., 2013b; Rodríguez et al., 2015; Velásquez-Rodríguez et al., 2015b; Yaqub et al., 2012, 2013; Hesse et al., 2022). However, there is currently very limited work published on the topic of automated fetal brain extraction from 3D US scans. Cerrolaza et al. (2017) proposed a structured geodesic random forest approach that implements structured labels and semantic features for automated segmentation of the fetal skull. While this solution showed good performance, it resulted in a significantly lower sensitivity when compared to a classic non-structured random forest method, as well as to a U-Net-based CNN method. In a subsequent study (Cerrolaza et al., 2018), their improved method for 3D skull segmentation used a two-step CNN approach. The first CNN performed an initial segmentation of the skull, which was then used to derive an Incidence Angle Map (IAM) and a Shadow Casting Map (SCM), providing relevant complementary information about the underlying physics of US image acquisition. The initial segmentation was then passed, along with the two derived maps, as input to the second CNN, which returned the final segmentation. Namburete et al. (2018) proposed a multi-task fully convolutional neural network that addressed brain localization, segmentation, and alignment to a canonical coordinate system. The alignment task of this work is discussed in Section 2.2. The brain segmentation task consisted of predicting the fetal brain segmentation of 2D US image slices with a CNN. These predictions were then stacked together, creating a 3D volume to which an ellipsoid was fitted as an approximation of the fetal brain. However, although this network showed good performance in their testing, it has two important limitations. Firstly, by predicting the brain segmentations from 2D images, the 3D context of the original volume is lost; this context is important information that could be used to improve the segmentation. Secondly, the ellipsoid approximation of the shape of the brain does not represent its actual morphology and will therefore always include background voxels in its brain segmentation.
To the best of our knowledge, our preliminary work (Moser et al., 2020) is currently the only approach that directly extracts the brain from fetal 3D US scans without relying on shape approximations. The work presented here is an extension of that network.

Brain alignment: US
Similarly to the task of fetal brain extraction from 3D US, there is very limited work published on the topic of automated fetal brain alignment from 3D US. Chen et al. (2012) proposed a method based on a coarse-to-fine strategy for fetal head registration and segmentation. The eye of the fetus was first detected through Gabor features (Kamarainen, 2012) to identify head pose, which was then followed by registering a reference model of the fetal head to the imaged head. However, the reliance on a reference model, as well as on the presence of strong edges in the image, strongly limits its implementation. It is important to note that this pipeline required the user to manually select a Volume Of Interest (VOI) in the original scan before the registration takes place.
Circumventing the need for user intervention, Cuingnet et al. (2013) proposed a multi-step approach for automated alignment of the fetal brain from 3D US scans. This relied on an anisotropic plate diffusion filter, a spheroidal shell approximation of the general shape of the skull, a weighted Hough transform to detect the mid-sagittal plane, and a minimum intensity gradient across the skull surface to detect the neck location. This method was tested on an age range of 19-24 gestational weeks. The performance of this approach was evaluated by computing the maximal distance inside the skull between three ground-truth planes (transthalamic, transventricular, and transcerebellar) and the ones obtained from their alignment. However, this approach is strongly limited by its reliance on a clear view of the neck and eye, which restricts its implementation in cases where these structures are occluded or affected by acoustic shadows.
As mentioned in Section 2.1, Namburete et al. (2018) proposed a multi-task fully Convolutional Neural Network (CNN) that addressed brain alignment to a referential coordinate system. The alignment was achieved by defining a parametric coordinate system based on skull boundaries, the location of the eye sockets, and head pose. The network performed 5 tasks for each 2D axial slice of a 3D US scan: 3 classification tasks (slice located near crown or neck, ante-posterior orientation of the head, and presence or absence of the eye in the slice) and 2 segmentation tasks (eye segmentation and brain segmentation). The transformation matrix for rigid alignment of the brain was then calculated by multiplying four separate matrices based on the combined predictions of the stacked 2D slices across the 5 tasks. This was tested on a gestational age range of 21 to 30 weeks. However, one strong limitation of this approach is the loss of 3D context, since predictions are made for each 2D slice independently. This also means that a post-processing step is needed to fuse the network predictions and estimate the transformation matrix for alignment, causing any discrepancies between slices to impact the alignment result. Similarly to Cuingnet et al. (2013), the reliance on a clear view of the eye also limits its implementation.
More recently, Wright et al. (2019) proposed using the LSTM spatial co-transformer introduced in Wright et al. (2018) to co-align images of the fetal head to a canonical pose, showing remarkable results. The network performs an initial alignment, which is then iteratively refined by groupwise registration with a saliency-weighted compounded volume. The predictions are performed independently for each image, allowing for the alignment of single scans. However, analogous to Wright et al. (2018), the network relies on multiple views of the same subject for training, which are not always available.
Also relevant to US brain alignment are the works of Kuklisova-Murgasova et al. (2013) and Wright et al. (2018), which are described in Section 2.4.

Brain extraction: MRI
Several solutions have been proposed for the task of fetal brain extraction from MRI scans. Anquez et al. (2009) proposed using a template-matching approach to locate the fetal eye, subsequently segmenting the surrounding skull bone using a graph-cut approach. However, this method relies on the eye structure being clearly visible inside the scan, as well as on a clear contrast between the skull and the brain. Automated solutions that rely on brain templates have also been proposed (Gholipour et al., 2012; Habas et al., 2010; Taimouri et al., 2015; Taleb et al., 2013; Tourbier et al., 2017; Wright et al., 2014). While these works showed good results for fetal brain segmentation of MRI scans, such solutions tend to struggle on scans that show a large deviation from the average, such as structurally abnormal and pathological cases.
Machine Learning (ML) solutions have also been explored for fetal brain extraction from MRI scans. Kainz et al. (2014) proposed a Random Forest (RF) classifier trained on 3D Gabor descriptors, with a subsequent segmentation refinement using a 2D level-set. Keraudren et al. (2014) used Scale-Invariant Feature Transform (SIFT) features clustered with k-means for localization, subsequently classified with a Support Vector Machine (SVM) and refined using an RF classifier on patches of the 2D slices.
More recently, the high performance of CNNs for this task has been thoroughly demonstrated by the works of Rajchl et al. (2016b), Rajchl et al. (2016a), Mohseni Salehi et al. (2017), Khalili et al. (2017), Khalili et al. (2019), Ebner et al. (2020), and Ranzini et al. (2021). These solutions rely on an initial 2D brain segmentation of each slice using CNNs, which can then be combined into a 3D volume. Although such an approach results in a more compact network, it greatly limits the contextual information available to the network. However, in the case of fetal brain MRI segmentation, the larger inter-slice spacing and potential motion corruption between neighbouring slices make this a compelling trade-off (Ebner et al., 2020).

Brain alignment: MRI
In the case of fetal MRI, research is heavily focused on the reconstruction of the 2D slices into a 3D volume rather than on the alignment of the 3D brain itself, due to the nature of how the MRI scans are generated (Ebner et al., 2020; Hou et al., 2018; Miao et al., 2016). While fast imaging methods such as Single-Shot Fast Spin Echo (SSFSE) can minimize the effects of fetal motion in-plane, motion corruption is still a common issue between slices of the stack. In contrast, while motion artifacts can still occur when acquiring 3D US scans, the shorter acquisition time of this imaging modality strongly minimizes their occurrence; the acquisition can also simply be repeated if such artefacts are noticed by the sonographer. A reconstruction of 2D slices into a 3D volume is therefore not needed for 3D US. However, two methods are particularly relevant to this work. Kuklisova-Murgasova et al. (2013) proposed a method for registration of a 3D US image to a 3D MRI image by creating a simulated US image from the MRI data, to which they applied block-matching and normalized cross-correlation to align it to the US volume. The registration was divided into two age-specific groups: one for 18 to 22 gestational weeks, and one for 29 gestational weeks. To assess the performance for the younger group, 4 brain structures were manually segmented in both US and MRI scans and compared after registration using label consistency as a similarity measure and gradient descent as the optimization method. Since the same structures were not visible in the older group, 5 other structures were manually segmented and the mean Target Registration Error (mTRE) of these labels after registration was calculated. Wright et al. (2018) proposed an LSTM spatial co-transformer for group-wise co-registration of MRI and 3D US scans to a canonical pose. This method refines the alignment by iteratively predicting residual transformations and subsequently warping the images to update them.

Proposed method
For this work, we aimed to develop a network that could accurately extract and align the brain from 3D US with a minimal amount of pre- or post-processing, while yielding state-of-the-art performance. This led us to design a network that performs each of these tasks independently, allowing the optimal network architecture to be used for each. The brain extraction step (the first half of our proposed BEAN network) was inspired by the U-Net (Ronneberger et al., 2015), which has been demonstrated to be a reliable, flexible, and accurate approach for image segmentation in a multitude of imaging modalities. As for the brain alignment step (the second half of BEAN), our initial source of inspiration was the Spatial Transformer Networks work of Jaderberg et al. (2015), since we wanted a simple yet robust solution to the challenging task of automated alignment of the fetal brain. Their work was an essential starting point for BEAN; we modified their architecture to allow for serial transformations to be performed on the original input, minimizing the loss of information caused by intermediate resampling steps.

Network architecture
Our proposed BEAN network, as shown in Fig. 1, is sub-divided into two networks: one for the task of brain extraction and the other for brain alignment. While these networks receive the same input and operate concurrently, they are entirely independent of each other, and the results of one have no impact on the results of the other.

Notation
To facilitate the understanding of our equations and results, we introduce the notation used for the volumes and parameters involved. We denote the original 3D US scan as V ∈ ℝ^(160×160×160) and the ground-truth brain mask as B ∈ {0, 1}^(160×160×160), with the values of their individual voxels described as V(x, y, z) and B(x, y, z), respectively. Similarly, the ground-truth alignment parameters are represented by θ, with individual parameters denoted θ_i. Their predicted counterparts are denoted by M and p, respectively. Finally, if a scan or brain mask has been aligned, the parameters used are noted as a superscript. For example, the ground-truth mask aligned with the predicted alignment parameters is represented by B^p, while the predicted mask aligned with the ground-truth parameters is M^θ.

Network pipeline
The BEAN network takes the original 3D US scan V as input and passes it to both the Extraction and Alignment components, as shown in Fig. 1. The predicted brain mask M as well as the predicted brain alignment parameters p are then passed as input, along with the original scan V, into the BEAN transformation layer T. In this layer, the predicted parameters p are used to calculate the corresponding similarity transform, which is then used to align the predicted brain mask (M^p) as well as the original scan (V^p) to the common coordinate space. Finally, the aligned scan V^p is multiplied with the aligned mask M^p, resulting in the masked and aligned scan V^p M^p. We note that the input volumes V are scaled down from 160 × 160 × 160 to 80 × 80 × 80 inside the networks before being passed to their first convolutional layers, as shown in Fig. 1. This reduces the memory requirements by 87% for both the Extraction and Alignment networks, allowing for training on a single consumer-grade GPU (Nvidia GTX 1080 Ti). However, the predictions of both networks are at the original size of 160 × 160 × 160: the Extraction network performs a final spline upsampling on the predicted mask, which is then binarised, while the parameters predicted by the Alignment network are independent of size, since the translation is normalised, as described in Section 3.1.4. All results and figures contained in this work have been generated at the 160 × 160 × 160 scale.
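For illustration, the flow of tensors through the pipeline can be summarized with the following minimal sketch (our reconstruction in PyTorch, with stand-in callables for the trained components, not the authors' released code):

```python
import torch

def bean_forward(V, extraction_net, alignment_net, transform_layer):
    """Sketch of the BEAN pipeline. V is a (1, 1, 160, 160, 160) scan;
    the three callables are hypothetical stand-ins for the trained parts."""
    M = extraction_net(V)          # predicted binary brain mask, 160^3
    p = alignment_net(V)           # 7 predicted similarity parameters
    V_p = transform_layer(V, p)    # scan aligned to the common space
    M_p = transform_layer(M, p)    # mask aligned with the same parameters
    return M_p, V_p, V_p * M_p     # masked-and-aligned scan via voxelwise product
```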

Brain extraction network
The brain extraction network is based on an encoder-decoder architecture that takes the original scan V as input and outputs a predicted binary volume M containing the extracted brain. The encoder comprises an AvgPool layer that downsamples the input scan from 160 × 160 × 160 to 80 × 80 × 80, followed by 5 Convolutional Blocks (see Fig. 1) with a 2 × 2 × 2 MaxPooling layer between them. The decoder consists of 4 Convolutional Blocks, with a 2 × 2 × 2 Upsampling layer and concatenating skip connections between them, as shown in Fig. 1. A Sigmoid layer of kernel size 1 × 1 × 1 predicts the brain probability maps. These are then upsampled back to the original size of 160 × 160 × 160 using spline interpolation, and subsequently a threshold of 0.5 returns the predicted binary mask M of the brain inside the original 3D US scan. All 3D convolutional layers of the extraction network have a kernel size of 3 × 3 × 3, with 8, 16, 32, 64, 64, 64, 64, 64, and 64 output channels, respectively. The only differences between this network and the one published in our preliminary results in Moser et al. (2020) are the downsampling and upsampling steps.
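A minimal PyTorch sketch of this architecture is given below. The channel counts and pooling/upsampling factors follow the text; the internal composition of each Convolutional Block (two 3 × 3 × 3 convolutions with ReLU) is our assumption, as the exact block is only shown in Fig. 1:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    # Assumed block composition: two 3x3x3 convolutions with ReLU
    # (the exact block is only specified in Fig. 1 of the paper).
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class ExtractionNet(nn.Module):
    def __init__(self):
        super().__init__()
        enc_ch = [8, 16, 32, 64, 64]                  # encoder output channels
        self.pool_in = nn.AvgPool3d(2)                # 160^3 -> 80^3
        self.enc = nn.ModuleList(
            [conv_block(i, o) for i, o in zip([1] + enc_ch[:-1], enc_ch)])
        self.dec = nn.ModuleList(
            [conv_block(64 + s, 64) for s in [64, 32, 16, 8]])
        self.head = nn.Conv3d(64, 1, 1)               # 1x1x1 sigmoid head

    def forward(self, x):
        x = self.pool_in(x)
        skips = []
        for i, block in enumerate(self.enc):
            x = block(x)
            if i < len(self.enc) - 1:                 # keep skip, then pool
                skips.append(x)
                x = F.max_pool3d(x, 2)
        for block, skip in zip(self.dec, reversed(skips)):
            x = F.interpolate(x, scale_factor=2)      # 2x2x2 upsampling
            x = block(torch.cat([x, skip], dim=1))
        prob = torch.sigmoid(self.head(x))            # 80^3 probability map
        # Trilinear interpolation stands in for the paper's spline upsampling
        # back to 160^3; thresholding at 0.5 yields the binary mask.
        prob = F.interpolate(prob, scale_factor=2, mode='trilinear',
                             align_corners=False)
        return (prob > 0.5).float()
```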

Brain alignment network
The brain alignment network consists of two cascading regression subnetworks that take the original scan V as input and output the predicted transformation parameters needed to align the input to a common coordinate space. These parameters, θ, consist of three Euler angles θ_α, θ_β, and θ_γ; three Cartesian shift values θ_x, θ_y, and θ_z, normalized to [−0.5, 0.5]; and a scaling factor θ_s. We have chosen a single scaling factor for all 3 dimensions of the volume in order to keep the voxel spacing isotropic and preserve the aspect ratio. Each subnetwork has the same architecture, with an AvgPool layer that downsamples the input scan from 160 × 160 × 160 to 80 × 80 × 80, followed by 5 Convolutional Blocks (see Fig. 1) and two Fully Connected Layers (FCL). The 3D convolutional layers of both subnetworks have a kernel of size 3 × 3 × 3 and 32, 64, 128, 256, and 512 output channels, respectively. The output of the final Convolutional Block is passed to two FCL with 256 and 7 output channels, respectively. The output of the first subnetwork (Coarse) is the set of global similarity transform parameters p_C, which is passed to the Coarse transformation layer T_C along with the original scan V. Inside T_C, a volume-centered similarity transform matrix is calculated from p_C and applied to V, resulting in the coarsely aligned scan V^(p_C). This V^(p_C) is then passed as input to the second subnetwork (Fine), which predicts the alignment parameters p_F required to refine its alignment to the common coordinate space. Finally, p_F is passed as input to the Fine transformation layer T_F, along with p_C. Here, the volume-centered similarity transforms calculated from p_C and p_F are multiplied, resulting in the final similarity transform, which is ultimately decomposed into the predicted alignment parameters p.

Fig. 1. Schematics of the BEAN network and its pipeline. The input V is a 160 × 160 × 160 3D US scan of the fetal brain, which is passed as input to the Extraction and Alignment networks, resulting in the predicted brain mask M and brain alignment parameters p. The BEAN transformation layer T combines V, M, and p, returning the aligned brain mask M^p, the aligned scan V^p, and the masked and aligned scan V^p M^p. The specific architecture of the Extraction and Alignment networks is shown below the BEAN pipeline. For each volume, the orthogonal midplanes of an example scan are shown. Note that the perceived change in the order of midplanes between input and output is due to the differences between the original position of the fetal brain as acquired by the sonographer and the canonical coordinate space.
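To make the cascade concrete, the sketch below builds a volume-centred 4 × 4 similarity matrix from the 7 parameters and composes the Coarse and Fine transforms by matrix multiplication, so the input volume only needs to be resampled once. The Euler convention and the de-normalisation of the translations are our assumptions, not specified in the text:

```python
import numpy as np

def euler_rotation(a, b, g):
    """Rotation matrix from three Euler angles (Z-Y-X convention assumed)."""
    ca, sa, cb, sb = np.cos(a), np.sin(a), np.cos(b), np.sin(b)
    cg, sg = np.cos(g), np.sin(g)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def similarity_matrix(p, size=160):
    """Volume-centred 4x4 similarity transform from the 7 parameters
    (alpha, beta, gamma, tx, ty, tz, s); shifts are normalised to [-0.5, 0.5]."""
    a, b, g, tx, ty, tz, s = p
    c = (size - 1) / 2.0
    to_origin, back = np.eye(4), np.eye(4)
    to_origin[:3, 3], back[:3, 3] = -c, c          # centre the rotation/scale
    A = np.eye(4)
    A[:3, :3] = s * euler_rotation(a, b, g)        # isotropic scale + rotation
    A[:3, 3] = np.array([tx, ty, tz]) * size       # de-normalise the shifts
    return back @ A @ to_origin

# The cascade composes the two predicted transforms into one, which is then
# decomposed into the final parameters p (decomposition not shown here).
p_coarse = [0.10, -0.20, 0.05, 0.01, -0.03, 0.02, 1.10]   # illustrative values
p_fine = [0.02, 0.01, -0.01, 0.00, 0.01, 0.00, 1.00]
T_final = similarity_matrix(p_fine) @ similarity_matrix(p_coarse)
```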

3D US Data
A total of N = 1185 3D US scans of the fetal brain were used for training (829) and testing (356) BEAN. These were obtained from the INTERGROWTH-21st (Papageorghiou et al., 2014) study dataset and represent the gestational range of 14.4 to 30.9 gestational weeks (GW), distributed as shown in Fig. 2. The 3D scans were obtained at eight sites in eight countries using a Philips HD-9 ultrasound machine with a V7-3 curvilinear abdominal transducer. The volumes were re-sampled to an isotropic voxel size of 0.6 mm in each dimension, and subsequently centre-cropped to a size of 160 × 160 × 160 voxels. For reference, the median dimensions of the original scans were 444 × 254 × 111 voxels (IQR = 59 × 88 × 36), while their median voxel size was [0.32, 0.51, 0.85] mm (IQR = [0.05, 0.15, 0.28] mm).
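A sketch of the described pre-processing (isotropic resampling to 0.6 mm followed by a 160³ centre-crop, zero-padding where a dimension is smaller) might look as follows; this is our reconstruction, not the study's pipeline code:

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(scan, voxel_size_mm, target_mm=0.6, out_shape=(160, 160, 160)):
    """Resample to isotropic 0.6 mm voxels, then centre-crop/pad to 160^3."""
    scan = zoom(scan, np.asarray(voxel_size_mm) / target_mm, order=1)
    out = np.zeros(out_shape, dtype=scan.dtype)
    src, dst = [], []
    for n, m in zip(scan.shape, out_shape):
        if n >= m:                          # crop centrally
            a = (n - m) // 2
            src.append(slice(a, a + m)); dst.append(slice(0, m))
        else:                               # zero-pad centrally
            a = (m - n) // 2
            src.append(slice(0, n)); dst.append(slice(a, a + n))
    out[tuple(dst)] = scan[tuple(src)]
    return out
```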

Data annotation
In order to train the proposed BEAN network, a separate set of ground-truth labels was generated for each of the brain extraction and alignment components of the framework.
To generate the ground-truth alignment parameters, each brain was manually aligned to a canonical coordinate space using a custom-built tool, as shown in Fig. 3(a). The brain was iteratively translated and rotated until the mid-sagittal, mid-coronal, and mid-axial planes corresponded to the central orthogonal planes of the volume. Finally, each brain was isometrically scaled to match the average brain size at 30 GW. The translation, rotation, and scaling transforms applied in each step were multiplied together, and the resulting transform T_θ was decomposed into the ground-truth alignment parameters θ.

Due to the complexity of visually assessing 3D US images, manual labeling of the brain extraction masks by a clinician would be extremely time-consuming and require a high degree of expertise. In order to circumvent this problem, we used the fetal brain atlas labels generated by Gholipour et al. (2017) as templates. These atlases were generated using MRI images and represent the average brain shape for each GW in the range of 21 to 38 weeks. An example of these atlases is shown in Fig. 3(b). We binarised each atlas to convert it into a brain mask and manually aligned each mask to the same coordinate space as the 3D US scans, as shown in Fig. 3(b). Finally, we aligned each brain mask to the 3D US scans of the same gestational age by transforming it using the inverse of the corresponding alignment similarity transform, T_θ^(-1), as shown in Fig. 3(c). These binary masks of the brain in their original locations are the ground-truth brain extraction labels B.
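A minimal sketch of this final step, moving an atlas mask from the canonical space back to a brain's original position, assuming T_θ is a 4 × 4 voxel-space matrix mapping original to aligned coordinates:

```python
import numpy as np
from scipy.ndimage import affine_transform

def unalign_mask(aligned_mask, T_theta):
    """Resample a canonically-aligned binary mask back to the original
    brain position, i.e. apply the inverse alignment T_theta^{-1}."""
    # scipy's affine_transform pulls values with the output->input mapping,
    # so passing T_theta here realises the inverse transform actively.
    out = affine_transform(aligned_mask.astype(np.float32),
                           T_theta[:3, :3], offset=T_theta[:3, 3], order=0)
    return (out > 0.5).astype(np.uint8)
```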

Training
In order to train and test BEAN with data representative of the entire dataset, we randomly selected 30% (356 volumes) as a holdout testing dataset. We performed a three-fold cross-validation with the remaining data, splitting it into 70% (580 volumes) for training and 30% (249 volumes) for validation. Note that the random selection of scans was performed proportionally to the gestational ages in order to ensure that the training, validation, and testing datasets represent the entire gestational range of the original dataset, with a Kolmogorov-Smirnov p-value > 0.6 for all data splits. The best model was then retrained with the entire training dataset and its performance was evaluated on the holdout testing dataset. The same split was used to train the Extraction and Alignment components of BEAN.
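An age-proportional split of this kind could be implemented as below; this is our reconstruction (the binning choice and tooling are assumptions), not the authors' code:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from scipy.stats import ks_2samp

# Stratify on gestational-week bins so every split covers the full age range.
ga = np.random.uniform(14.4, 30.9, size=1185)     # placeholder ages
idx = np.arange(len(ga))
bins = ga.round().astype(int)                     # 1-week strata (assumed)
train_idx, test_idx = train_test_split(idx, test_size=0.3, stratify=bins,
                                       random_state=0)
# Verify the split preserves the age distribution (paper reports KS p > 0.6).
print(ks_2samp(ga[train_idx], ga[test_idx]).pvalue)
```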
The Extraction network was trained end-to-end for 200 epochs, keeping the best-performing version, with a batch size of 4, the Adam optimizer, and an empirically selected learning rate of 10^-3. The loss function L_E was based on the Dice Similarity Coefficient (DSC) described in Section 4, and is shown in Eq. (2).
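Eq. (2) is not reproduced in this excerpt; a standard soft-Dice loss consistent with the description, written in the notation of Section 3.1.1 (the "1 −" offset is one common convention and is our assumption), would be:

```latex
\mathcal{L}_{E} = 1 - \mathrm{DSC}(M, B)
              = 1 - \frac{2\sum_{x,y,z} M(x,y,z)\, B(x,y,z)}
                         {\sum_{x,y,z} M(x,y,z) + \sum_{x,y,z} B(x,y,z)} \tag{2}
```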
The Alignment network was trained in two steps. First, the Coarse alignment subnetwork was trained for 200 epochs, with a batch size of 4, the Adam optimizer, and an empirically selected learning rate of 10^-4. The loss function L_A was the sum of the Mean Absolute Error (MAE) between the predicted alignment parameters p and the ground-truth parameters θ, and the L_E between the ground-truth brain extraction mask B aligned with θ and with p, i.e., between B^θ and B^p. This is described in Eq. (3), with n = 7 denoting the number of alignment parameters.
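Eq. (3) is likewise not shown here; a form consistent with the description (MAE over the n = 7 parameters plus the Dice term between the two aligned ground-truth masks) would be:

```latex
\mathcal{L}_{A} = \frac{1}{n}\sum_{i=1}^{n} \left| p_i - \theta_i \right|
              + \mathcal{L}_{E}\!\left(B^{\theta},\, B^{p}\right), \qquad n = 7 \tag{3}
```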
After training the Coarse alignment subnetwork, the Fine alignment subnetwork was trained in the same manner but with two important differences. Firstly, the trained weights of the Coarse subnetwork were transferred to the Fine subnetwork as a warm start to help it converge faster. Secondly, the input data is now the original scan aligned with the predictions from the Coarse subnetwork, V^(p_C), and the predicted parameters p_F aim to finish aligning this input to match the original scan aligned with the ground-truth parameters, V^θ.

Ethics statement
The 3D US scans used in this work were obtained from the INTERGROWTH-21st study, which was approved by the Oxfordshire Research Ethics Committee 'C' (ref: 08/H0606/139), as well as by the research ethics committees and corresponding regional health authorities of the participating institutions. Written consent to be involved in the project was provided by all participants (Villar et al., 2013).

Evaluation
In order to fully analyze the accuracy and reliability of BEAN, we define seven measures that describe the performance of the network's predictions against the ground-truth.

Statistical Significance
To assess the statistical significance when comparing these measures, we first determined normality with a D'Agostino and Pearson's test (D'Agostino, 1971), followed by a paired Student's t-test (Student, 1908) or a Wilcoxon signed-rank test (Wilcoxon, 1992), for normal and non-normal samples, respectively. We used a significance threshold of 0.05 for both tests.
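This procedure maps directly onto scipy.stats; a minimal sketch follows (whether normality is assessed on the paired differences, as done here, is our assumption):

```python
from scipy import stats

def compare(a, b, alpha=0.05):
    """Paired comparison of two result arrays: D'Agostino-Pearson normality
    check, then paired t-test (normal) or Wilcoxon signed-rank (non-normal)."""
    _, p_norm = stats.normaltest(a - b)      # D'Agostino & Pearson's test
    if p_norm > alpha:
        _, p = stats.ttest_rel(a, b)         # paired Student's t-test
    else:
        _, p = stats.wilcoxon(a, b)          # Wilcoxon signed-rank test
    return p, p < alpha                      # p-value and significance flag
```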

Centroid Distance
The Centroid Distance (CD) measures the Euclidean distance between the centroids of two binary volumes, with an ideal value of 0. For the Extraction network, it is calculated between the predicted mask M and the ground-truth B, and it helps evaluate how accurately the predicted mask locates the brain in the scan V. A more detailed description can be found in Appendix A.

Dice Similarity Coefficient
The Dice Similarity Coefficient (DSC) gives a measure of the overlap between the predicted brain mask and the ground-truth, with an ideal value of DSC = 1 if the overlap is perfect, and DSC = 0 if there is no overlap at all. A more detailed description can be found in Appendix A. For the Extraction network, analogous to Eq. (2), the DSC is calculated between the predicted brain mask M and the ground-truth mask B. For the Alignment network, the DSC is calculated between B^θ and B^p instead.

Hausdorff Distance
One of the limitations of the DSC is that, since it is a global measure, local discrepancies are often overlooked. To determine whether the predicted mask is consistent throughout the brain or whether there are specific areas of discrepancy with the ground-truth, we use the Hausdorff Distance (HD), as described in Huttenlocher et al. (1993). For the Extraction network, this is calculated between M and B, while for the Alignment network it is calculated between B^p and B^θ.
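For concreteness, the three mask-based measures can be computed as follows (a sketch in voxel units; the paper reports CD and HD in millimetres, which would require multiplying by the 0.6 mm voxel size):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def centroid_distance(M, B):
    """Euclidean distance between the centroids of two binary volumes."""
    return np.linalg.norm(np.argwhere(M).mean(axis=0)
                          - np.argwhere(B).mean(axis=0))

def dice(M, B):
    """Dice Similarity Coefficient between two binary volumes."""
    inter = np.logical_and(M, B).sum()
    return 2.0 * inter / (M.sum() + B.sum())

def hausdorff(M, B):
    """Symmetric Hausdorff distance between the voxel sets of two masks."""
    pm, pb = np.argwhere(M), np.argwhere(B)
    return max(directed_hausdorff(pm, pb)[0], directed_hausdorff(pb, pm)[0])
```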

Mean Squared Error
The Mean Squared Error (MSE) measures the average squared difference between the predicted alignment parameters p and the ground-truth θ, as described in Appendix A.

Symmetry Coefficient
As mentioned in Section 1, there is an asymmetry of structural information between the two brain hemispheres due to the interaction between the skull and the US beam. To assess whether the Extraction network predicts both sides of the brain symmetrically, we align the predicted brain mask M with the ground-truth alignment parameters θ, obtaining M^θ. We then mirror the right hemisphere through the sagittal plane and calculate the DSC between both hemispheres, as shown in Eq. (4). We name this measure the Symmetry Coefficient (SC); a visual representation is shown in Fig. 4.
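A sketch of the SC computation on an aligned mask M^θ; which array axis corresponds to the sagittal direction is an assumption here:

```python
import numpy as np

def symmetry_coefficient(M_theta, sagittal_axis=0):
    """Dice between the left hemisphere and the mirrored right hemisphere
    of a canonically aligned binary mask."""
    size = M_theta.shape[sagittal_axis]
    half = size // 2
    left = np.take(M_theta, range(half), axis=sagittal_axis)
    right = np.take(M_theta, range(size - half, size), axis=sagittal_axis)
    right = np.flip(right, axis=sagittal_axis)   # mirror through mid-sagittal
    inter = np.logical_and(left, right).sum()
    return 2.0 * inter / (left.sum() + right.sum())
```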

Spatial Spread of Landmarks
To assess the impact of the Alignment network of BEAN on the spatial distribution of the brain structures of the original scan, a single point was placed on each face, as well as on the geometric center, of each aligned 3D US scan, as shown in Fig. 8(a). Each color represents a different landmark (left, right, top, bottom, front, back, center). These landmarks were then transformed back to their respective unaligned positions, resulting in the original Spatial Spread of Landmarks (SSL) of the unaligned dataset shown in Fig. 8(b). Finally, the parameters predicted using a single alignment subnetwork (Coarse) and using both alignment subnetworks (Coarse+Fine) were used to align the landmarks (Fig. 8(c) and 8(d), respectively).
We define two measures to analyze the SSL of each case: the Mean Shift of SSL and the Mean Spread of SSL. To obtain them, we first calculate the mean position of each landmark. If we define the position of every individual landmark by x_{k,i} and its goal position by g_k, where k is the landmark and i is the scan number, we can calculate the mean position m̄_k of each landmark over all scans with Eq. (5), where N denotes the total number of scans.
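Eq. (5) is not reproduced in this excerpt; consistent with the description above, the per-landmark mean position would be:

```latex
\bar{m}_k = \frac{1}{N} \sum_{i=1}^{N} x_{k,i} \tag{5}
```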
The Mean Shift of SSL assesses the accuracy of the alignment. It is the mean Euclidean distance between the mean positions m̄_k and the goal positions g_k, as shown in Eq. (6), with K = 7 representing the total number of landmarks.

Mean Shift of SSL = (1/K) Σ_{k=1}^{K} ‖ m̄_k − g_k ‖₂    (6)
The Mean Spread of SSL assesses the precision of the alignment. It describes the spread of each landmark point cloud and is determined by the mean Euclidean distance between all points of a landmark and their mean position, as shown in Eq. (7).
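Eq. (7) is likewise not reproduced here; a form consistent with the description is:

```latex
\text{Mean Spread of SSL} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N}
    \sum_{i=1}^{N} \left\lVert x_{k,i} - \bar{m}_k \right\rVert_2 \tag{7}
```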

Network Consistency
Since the Extraction and Alignment networks of BEAN operate independently of each other, it is important to determine whether the results of both are consistent. For this, we use the volume V_M of the predicted mask M and the predicted scaling parameter p_s. As mentioned in Section 3.3, all scans have been scaled to match the average brain size at 30 GW, which has a volume denoted by V_30. Therefore, the scaling parameter p_s is proportional to the ratio between V_30 and the volume V_M of the extracted brain mask M, as shown in Eq. (8).
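Eq. (8) is not reproduced in this excerpt. Since an isotropic linear scale factor changes volume cubically, one plausible form of this proportionality, stated as our assumption rather than the paper's exact definition, is:

```latex
p_s \propto \left( \frac{V_{30}}{V_M} \right)^{1/3} \tag{8}
```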
We define the Network Consistency (NC) as shown in Eq. (9), with an ideal value of NC = 0.

Brain extraction
As stated in Section 4, we assess the performance of the Extraction network of BEAN by calculating the Centroid Distance (CD), Dice Similarity Coefficient (DSC), and Hausdorff Distance (HD) between the predicted brain mask M and the ground-truth B, as well as the Symmetry Coefficient (SC) of M. As a baseline comparison, we use the results already presented in Moser et al. (2020), which comprise the approximation of the brain mask generated by the method proposed in Namburete et al. (2018), as well as the best-performing configuration of the preliminary version of our network. As shown in Table 1, BEAN outperforms the solution presented in Namburete et al. (2018) in every measure. The Extraction network of BEAN obtained a mean CD of 0.98 mm, a DSC of 0.94, and an HD of 7.87 mm. These results represent a 276%, 9.6%, and 92% improvement over the results obtained with the method used in Namburete et al. (2018). Additionally, the standard deviations of our results also show a much higher precision for BEAN. Finally, BEAN shows a 22% improvement in the SC, with a result of 0.95 compared to the 0.74 obtained with Namburete et al. (2018). The addition of an upsampling step before thresholding the brain mask results in a small improvement when compared to our previous work: the CD and HD results are 27.9% and 13% smaller, both statistically significant (see Section 4). In contrast, DSC and SC showed no statistically significant changes.
Examples of the outlines of the predicted and ground-truth extractions can be seen for different views across a range of gestational ages in Fig. 5. The network performs accurately for the entire gestational range of our dataset, regardless of scale, orientation, and rotation. As the examples for gestational ages 26 and 30 show, the prediction works reliably even if parts of the brain are not contained in the 3D US beam or volume. When analyzing the performance difference between scans that were imaged from the left or right cerebral hemisphere, i.e., where the probe was located on the left or right side of the fetal brain, only the DSC result was statistically significant, but the relative difference between them was less than 1%.

Fig. 6. Evaluation performance of the Extraction network of BEAN for each GW. The average result of each measure is shown as a dashed line.

Table 1
Testing results of the fully-trained Extraction network compared against the method from Namburete et al. (2018) and our preliminary work from Moser et al. (2020). The best result for each measure has been highlighted in boldface font, with an arrow indicating whether a higher (up) or lower (down) value is desirable. CD: Centroid Distance of binary masks. HD: Hausdorff Distance. DSC: Dice Similarity Coefficient. SC: Symmetry Coefficient.

Fig. 7. The value of every parameter predicted by the Alignment network is plotted against its target value, with the color of each point representing the GW of the 3D US scan. The linear regression for each parameter is shown in blue, with its R² score displayed in the bottom-right corner. The line of equality is shown as a dashed line on every plot. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Brain extraction by gestational week
Fig. 6 shows the results of each performance measure for the Extraction network for each GW. The CD shows a consistent trend throughout the gestational period of 14 to 30 weeks. The DSC, however, shows a drop in performance between weeks 14 and 19. This is likely due to the smaller brain volume at the earlier GWs, which increases the sensitivity of the DSC to mislabeled voxels, since the overall number of true-positive voxels is smaller. We observe a similar trend for the SC, which is likely due to the same reason. Regardless of this effect, we note that the lowest average DSC is still around 0.90. The HD, however, shows a decrease in performance from 25 GW onward. As mentioned in Section 3.3, the ground-truth extraction masks are based on the atlases from Gholipour et al. (2017), which are created from MRI data. Unlike 3D US, this imaging modality has the ability to separate the cerebrospinal fluid from the brain tissue. As a result, the ground-truth masks are structurally different from what can be observed with US around the edges of the brain. This is particularly evident for the gap between the occipital cortex and the cerebellum, which can be seen in Fig. 3 and was discussed in depth in Moser et al. (2020). Since these gaps widen as the fetal brain develops, the HD between the ground-truth and the predicted masks increases as well.

Brain alignment
The Alignment network of BEAN is evaluated by calculating the MSE between the predicted parameters p and the ground-truth parameters θ, as well as the DSC and HD between the ground-truth mask B aligned with the predicted alignment parameters (B^p) and with the ground-truth parameters (B^θ). As a baseline comparison, we assess the performance of the Alignment network using a simple Mean Absolute Error (MAE) loss function L_MAE during training, as well as using only one subnetwork (Coarse) instead of two (Coarse+Fine). The results in Table 2 show that the combined loss function L_comb = L_MAE + L_DSC outperforms a basic L_MAE in every measure, with an improvement of 14.3% for MSE, 1.3% for DSC, and 10.6% for HD. However, this improvement is only statistically significant for the latter two measures, highlighting the fact that simply assessing the predicted parameters is not enough for a reliable evaluation of alignment performance.

Table 2
Testing results of the Alignment network. The results are shown for two types of loss functions used during training: the Mean Absolute Error (L_MAE) of the predicted parameters compared to the ground-truth, and the sum of the MAE and Dice Similarity Coefficient (L_MAE + L_DSC) losses, the latter computed on 3D ground-truth brain masks transformed with the predicted and ground-truth parameters. The results of using a single alignment subnetwork (Coarse) are shown to display the improvement from our cascading two-subnetwork approach (Coarse+Fine). The performance measures are the Mean Squared Error (MSE) of the predicted parameters, as well as the Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD) of the transformed masks. The values for HD are shown in voxels of the aligned volume. The best result for each measure has been highlighted in boldface font, with an arrow indicating whether a higher (up) or lower (down) value is desirable.

We note here that using L_DSC as a loss function by itself did not provide enough supervision for training and therefore never converged. Table 2 also shows that alignment with two cascaded subnetworks (Coarse+Fine) outperforms a single subnetwork (Coarse) in every measure, with improvements of 81.5% for MSE, 3.6% for DSC, and 24.4% for HD, all of which are statistically significant. We also note an improvement of 87.9% in the standard deviation of MSE, 12.2% in the standard deviation of DSC, and 5.3% in the standard deviation of HD, showing that the Coarse+Fine alignment is not only more accurate but also more consistent. We also remark that a third subnetwork was considered, but early results showed an inability to generalise, and it was therefore not developed further.
Table 3 shows the MSE results from Table 2 separated into rotation parameters (MSE_αβγ), translation parameters (MSE_xyz), and scaling parameter (MSE_s). These results show that while the inclusion of a Fine alignment step improves the performance for all parameters, the largest effect is on the rotation parameters, for which MSE_αβγ shows a decrease of 78% and 80% for L_MAE and L_MAE + L_DSC, respectively. In contrast, the added supervision of L_DSC has a smaller but more equitable impact across the parameters. For the Coarse alignment, the performance improvements shown in Table 2 are due to a 33% lower MSE_xyz, which has a bigger impact on the overall performance at this stage than the 6.5% and 182% increases of MSE_αβγ and MSE_s, respectively. However, L_DSC improves performance for all parameters when using a Coarse+Fine alignment.
The performance for each individual parameter can be seen in Fig. 7, where the predictions have been plotted against their target values, along with their linear regression and the corresponding R² score. The Euler angles p_α, p_β, and p_γ lie consistently on the line of equality, with only a handful of outliers and no visible correlation with gestational age. We observe that the angles are distributed in two distinct clusters as a result of the clinical acquisition protocol, which does not take into account the top-bottom and left-right orientation of the brain, as shown in the XY-planes of Fig. 5. One of the Euler angles also shows noticeably better performance than the other two, which appear more sensitive to the quality of the scan, since the outliers had large acoustic shadows or a higher amount of motion blur. The predicted Cartesian shifts p_x, p_y, and p_z are highly consistent, lying on top of the line of equality. These 6 parameters show only a weak negative correlation with the gestational age of the fetus, with Pearson's correlation coefficients lying between -0.04 and -0.26. Finally, the predicted scaling p_s shows a high degree of consistency, with the single outlier being a 30.4 GW scan with particularly strong acoustic shadows and some sections of the brain located outside of the 3D US beam. The expected correlation with gestational age is also clearly visible (Pearson's correlation coefficient of -0.94), with a slight drop in performance for the earliest GWs, where the required scaling is largest.

Fig. 8. Visualization of the Spatial Spread of Landmarks (SSL) as described in Section 4. A single point has been located at the middle of each face of each 3D US scan: top (blue), bottom (red), front (purple), back (orange), left (green), right (cyan), and center (black). These points are the goal positions for the aligned volumes, shown in (a) along with an outline of an ideally aligned brain of 30 GW for reference. Using the ground-truth alignment parameters, they have been transformed to their original positions in their unaligned scans (b). The possible orientations of the head when acquiring the scan cause the clusters to be divided among opposing faces of the volume. The results of a single alignment subnetwork (Coarse) and of our cascading two-subnetwork (Coarse+Fine) architecture are shown in (c) and (d), respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
As described in Section 4, Fig. 8 shows the SSL, with Fig. 8(a) showing the goal positions of our 7 landmarks (top, bottom, left, right, front, back, and center) as well as an outline of an ideally aligned brain of 30 GW for reference. Fig. 8(b) shows the positions of these landmarks in their original, unaligned scans. Here, the landmark positions are widely dispersed. The spread of the central landmark (black) shows a large variation in the location of the brain with respect to the scan. For the remaining landmarks, there is a large radial spread from the central landmark, which represents the scaling variation between volumes, as well as a large axial spread, which represents the rotational variation. Fig. 8(b) also shows that each landmark cluster is split along two sides of the volume. In comparison, Fig. 8(c) and 8(d) show the landmark positions after the Coarse and Coarse+Fine alignments, respectively. In both cases, the clusters have been dramatically reduced in size. The cluster of the central landmark is now considerably smaller, which indicates that the centers of the brains are closer to the center of the volume. The predicted scaling has greatly reduced the radial spread, while the rotation parameters have reduced the axial spread. We observe that the cluster split seen in Fig. 8(b) is no longer present. In comparison with the Coarse alignment, the Fine alignment reduces the spread of the clusters even further, but this improvement is not as remarkable as that from the original positions to the aligned positions.
For a quantitative assessment of these results, the Mean Shift and Mean Spread of SSL, as described in Section 4, of the original positions and of the Coarse and Coarse+Fine alignments are shown in Table 4. We observe a reduction in the Mean Shift of 96.9% after Coarse alignment, and of 97.19% after Fine alignment. The Mean Spread shows a 69.1% reduction after Coarse alignment, and a 73.9% reduction after Fine alignment.

Brain alignment by gestational week
To provide a deeper understanding of the performance of the Alignment network, we assess the predictions for each GW of our dataset, which ranges from 14.4 to 30.9 weeks. Fig. 9 shows the breakdown of each performance measure for the Coarse and Fine alignments.
The performance improvement, as well as the consistency throughout the range of gestational ages, when comparing both alignment steps is evident. The Fine alignment increases performance for every gestational age, especially around the extrema of the gestational age range. All three measures show a decline in performance for 14-16 GW as well as for 30 GW, which is expected due to their smaller sample sizes, as shown in Fig. 2. However, certain characteristics of the data are most likely related to this performance drop as well. In the case of the early GWs (14-16 GW), this is most likely a result of the larger transformation required, since all scans are aligned and scaled to the 30 GW scans. While scaling to the largest brains minimizes the information loss caused by downsampling, it results in errors in the predicted angles and shifts of smaller brains being amplified by the relatively larger scaling factor. We note that although the aforementioned performance drops are noticeable, the median performance is still high for every week, with GW 15, for example, obtaining a median MSE of 0.027 and a median DSC of 0.895.
As a final evaluation of the alignment performance, Fig. 10 shows the average volumes for several GWs. These have been split into Coarse and Fine alignment to qualitatively illustrate the differences between them.

Full network results
In Sections 5.1 and 5.2, we have shown the results of our full assessment of the performance of the brain extraction and brain alignment of BEAN. Since the Extraction and Alignment networks are fully independent, this represents the complete performance assessment of BEAN. However, in order to test whether both networks are consistent with one another, we must analyze the Network Consistency (NC) score introduced in Section 4. On average, BEAN obtained an NC score of 0.31 ± 1.10, indicating a very high degree of consistency between the Extraction and Alignment networks. This performance drops for earlier GWs, which is most likely due to the relative impact that each voxel has on the total predicted brain mask, as well as the larger scaling parameters. As shown in Fig. 11, 14 GW and 15 GW show the poorest NC, which is consistent with the results seen in Sections 5.1 and 5.2.
For a final demonstration of the performance of the network, we chose an example gestational age of 22 weeks and compared the mean results of the full network with the mean results obtained using the ground-truth mask and alignment parameters, as shown in Fig. 12. The results of BEAN are similar to the expected ground-truth, both in terms of alignment and extraction, highlighting its strong performance. The only major difference is the discrepancy in the space between the occipital cortex and the cerebellum, which can be clearly seen in the sagittal plane view, as mentioned in Section 5.1.

Discussion
In this paper we present BEAN, a multi-stage convolutional neural network that automatically extracts the fetal brain from minimally pre-processed 3D US scans and aligns it to a common space. BEAN extracts the fetal brain without the need for shape approximations, allowing the actual shape of the brain to be analysed, which is fundamental to the analysis of brain maturation. It also predicts the brain alignment parameters using the entire scan for context. As a result, it does not require specific structures such as the eyes to be clearly visible, something that is not always possible due to the unpredictable positional variability of the fetal head in the womb during gestation. The modularity of BEAN also allows for the predictions to be used as needed. The independent nature of the Extraction and Alignment subnetworks allows brain extraction to be performed without the need for alignment, and vice-versa. Similarly, the parametric prediction of the Alignment network makes it possible to perform partial alignments; e.g., the brains can be aligned with or without scaling. As a result, BEAN is a fast, modular, age-independent, and consistent solution for the extraction and alignment of the fetal brain from 3D US with state-of-the-art performance.
The methodology presented in this work aims to help automate the tasks of brain alignment and extraction from 3D neurosonograms in order to facilitate the development of neuroimage analysis pipelines for intrauterine brain development (Namburete et al., 2015; Yaqub et al., 2015). The lack of robust automated solutions for these tasks, in part due to the relatively lower contrast and higher prevalence of imaging artifacts when compared to other modalities such as MRI, has limited the usage of 3D US for research on brain maturation. It has also limited the potential of 3D US for clinical use, since clinical evaluation of 3D volumes without automated assistance tools is considerably harder than that of 2D planes. The recent recommendation by ISUOG to use 3D scans when possible (Paladini et al., 2021) has made the need for such automated brain extraction and alignment tools even more urgent.
To the best of our knowledge, BEAN is the first network that extracts the 3D fetal brain directly from a 3D scan without the need to analyse each 2D frame independently. As a result, BEAN has access to much richer contextual information, which results in better performance and consistency, as shown in Section 5. The extracted brain is a direct segmentation from the original 3D US scan and does not require any approximations of the brain shape. Additionally, while the performance improvements gained in comparison to Moser et al. (2020) by adding the upsampling step are small, it allows for state-of-the-art performance without requiring any changes to the network architecture or size, and with a negligible additional computational cost. The extraction performance achieved by BEAN rivals that of state-of-the-art solutions for fetal brain extraction from MRI mentioned in Section 2, such as Ebner et al. (2020), which achieved a similar DSC score of 0.94.
BEAN was trained on 829 and tested on 356 3D US scans spanning the gestational age range of 14 to 30 weeks, representing healthy fetuses from several ethnic and geographical groups. During this gestational period, the fetal head undergoes rapid structural changes, including ossification of the skull and development of brain structures, which result in a large intrinsic variability in the structural information of the 3D US scans. Nevertheless, without the need for any additional inputs or pre-processing (besides isotropic resampling and volume centre-cropping), our results show that BEAN performs consistently across this gestational range and structural variability, with only some performance loss around the edges of the gestational range, which is expected due to the broad age distribution of our dataset (Fig. 2).
Although BEAN was trained on data from a multi-site study, every 3D US scan was collected using the same model of US machine and transducer. As such, a possible limitation is that the performance of BEAN could vary for scans obtained with different equipment. While multi-site datasets of other modalities are being harmonised to remove scanner biases (Dinsdale et al., 2021), in 3D US the operator settings (e.g., time-gain compensation) have a much larger impact on the characteristics of the image. Therefore, the variability introduced by the settings used at different sites should partially counterbalance this limitation. Additionally, the data used to train BEAN came from the INTERGROWTH-21st study (Papageorghiou et al., 2014), which was specifically designed to image healthy women who experienced a normal pregnancy. As a result, the data used to train and test BEAN excludes scans exhibiting particularly abnormal brain development. This could potentially hamper BEAN's applicability in such cases and needs to be explored. The independent nature of the Extraction and Alignment subnetworks might also prove limiting to the performance of BEAN. Although this architecture has many benefits, as already discussed, training each task independently might limit the overall performance. Multi-task learning has been shown to improve the performance and flexibility of Deep Learning methods (Namburete et al., 2018; Ruder, 2017); a performance comparison of such an approach against the current architecture of BEAN is therefore still needed.
As mentioned in Section 5, BEAN shows decreased performance for scans of particularly poor quality, i.e., those with a large amount of acoustic shadows or motion blur.
In summary, BEAN presents a fast, consistent, and robust DL solution for the automated extraction and alignment of the fetal brain from 3D US scans to a common coordinate space. As a result, BEAN represents a helpful tool to aid the development of neuroimage analysis pipelines for the study of fetal brain development during gestation. The performance of the network should only improve with larger, more diverse datasets, and it will be interesting to see how it performs on abnormally developed brains.

Fig. 2. Histogram of the distribution of the gestational ages of the dataset (Papageorghiou et al., 2014), separated into training and testing data.

Fig. 3. Demonstration of the data labeling process. First, the individual 3D US scans are manually aligned to a common canonical space, as shown in (a), in order to obtain the ground-truth alignment parameters. This alignment is performed with the transform T_θ. Then, as shown in (b), the spatiotemporal atlases (tissue) from Gholipour et al. (2017) are binarised and aligned to the same canonical space as the scans (transform T′). Finally, (c) shows that the corresponding binary brain masks are aligned to the original position of the fetal brain for each GW using the inverse transformation T_θ^(-1) from the ground-truth parameters.

Fig. 4. Visual representation of how the Symmetry Coefficient (SC) is calculated. The SC is the Dice Similarity Coefficient (DSC) between the left hemisphere and the mirrored right hemisphere, with the intersection between them shown in magenta.

Fig. 9. Performance of the Alignment network of BEAN for each GW. The testing results of the Mean Squared Error (MSE) of the predicted parameters, as well as the Dice Similarity Coefficient and Hausdorff Distance of the transformed binary brain masks, are plotted for both the Coarse and Fine alignments. The average of each alignment step is shown as a dashed line of the corresponding color. Note that the y-axis scale for the parameter MSE is logarithmic, in order to show all outliers while still showing the improvements between the Coarse and Fine alignments.

Fig. 11. Evaluation of the Network Consistency (NC) measure described in Section 4 for each GW. It represents the difference between the scaling parameter p_s predicted by the Alignment network and the volume of the brain mask M predicted by the Extraction network, relative to the average brain volume at 30 GW, V_30. A result of zero represents ideal consistency between both networks. The means are shown as a dashed line.

Fig. 12. Comparison of the mean predicted output of BEAN, i.e., Extraction and Alignment, and the ground-truth for gestational week 22. Top: ground truth. Middle: prediction. Bottom: difference between ground truth and prediction.

Table 3
Mean Squared Error results of the Alignment network separated by parameter type: rotation parameters θ_α, θ_β, and θ_γ (MSE_αβγ); translation parameters θ_x, θ_y, and θ_z (MSE_xyz); and scaling parameter θ_s (MSE_s). The best result has been highlighted in boldface font.

Table 4
Evaluation results of the Spatial Spread of Landmarks (SSL) shown in Fig. 8. The best result for each measure has been highlighted in boldface font. The equations describing the Mean Shift of SSL and the Mean Spread of SSL are found in Section 4.