Spinal Cord Segmentation in Ultrasound Medical Imagery

In this paper, we study and evaluate the task of semantic segmentation of the spinal cord in ultrasound medical imagery. This task is useful for neurosurgeons to analyze spinal cord movement during and after the laminectomy surgical operation. Laminectomy is performed on patients who suffer from abnormal pressure on the spinal cord. The surgeon operates by cutting the bones of the laminae and the intervening ligaments to relieve this pressure. During the surgery, ultrasound waves can pass through the laminectomy area to give real-time exploitable images of the spinal cord. The surgeon uses them to confirm spinal cord decompression or, occasionally, to assess a tumor adjacent to the spinal cord. A freely pulsating spinal cord is a sign of adequate decompression. To evaluate the semantic segmentation approaches chosen in this study, we constructed two datasets using images collected from 10 different patients undergoing laminectomy surgery. We found that the best solution for this task is Fully Convolutional DenseNets if the spinal cord is already in the training set; if the spinal cord is not in the training set, U-Net is the best. We also studied the effect of integrating deep learning components such as Atrous Spatial Pyramid Pooling (ASPP) and Depthwise Separable Convolution (DSC) into both models. Finally, we added a post-processing step and detailed the training configurations to set for both models.


Introduction
Two-dimensional (2D) Ultrasound (US) is a standard modality in medical imaging [1][2][3][4][5]. It has several advantages. First, it does not have harmful effects on the human body. Second, it is relatively low cost compared with other modalities. Third, it provides real-time imaging during surgical operations [1,4,5]. The research community is actively following the advancements made in this modality. Laminectomy is a surgical operation performed by neurosurgeons and orthopedic surgeons to relieve compression on the spinal cord. This pressure may engender mild to severe back pain, difficulty in walking, or difficulty in controlling limb functions. It can also have other symptoms that interfere with daily life. The surgical procedure creates more space for the spinal cord and nerve roots to relieve abnormal pressure on the spinal cord by removing the laminae and the intervening ligaments [3]. Figure 3 shows the laminectomy surgery and the spinal cord. In normal situations, the spinal cord cannot be imaged using ultrasound because ultrasonic waves do not pass well through bones [8]. In fact, the spinal cord is completely enclosed inside the bones of the spinal column. During laminectomy, ultrasonic waves can pass through the created bone defect to give real-time exploitable images of the spinal cord. Ultrasound imaging can demonstrate the spinal cord and the surrounding structures, which helps greatly to confirm the adequacy of the spinal cord decompression [8]. Ultrasound is also able to detect spinal cord pulsation and record it in a video format [8]. The created video can be saved for later use and analysis. It is believed that a decompressed spinal cord tends to have better pulsation; spinal cord pulsation, however, varies from one person to another [8]. Providing an automatic solution for detecting the exact boundaries of the spinal cord is therefore helpful for automatic interpretation and analysis of spinal cord pulsation.
These spinal cord pulsation values can be interpreted alongside the electrocardiogram (the electrical activity of the heart) to determine whether both signals are synchronized or whether there is some latency to interpret [8].
The great challenge here, however, is the quality of the ultrasound image: boundaries can be obscured and some parts can have a foggy appearance [8]. This is due to the different attenuations applied to the sonographic waves while passing through the human body, which makes automatic segmentation more difficult to achieve. Nevertheless, the value such a solution would bring to neurosurgeons motivates further work on this problem. Figure 4 gives a clear representation of how the spinal cord appears in ultrasound imaging. Since 2012 [9], deep learning based approaches have shown impressive efficiency in object segmentation for multiple types of imaging (RGB imaging, aerial imaging, multi-spectral imaging, etc.) [10][11][12][13][14]. This was a source of inspiration for the medical imaging community to move towards broad adoption of these approaches for different medical imaging modalities (MRI, CT scan, 2D ultrasound, 3D ultrasound, etc.) [4].
Concerning ultrasound medical segmentation, different works have addressed the detection of different parts of the body [4] (breast image segmentation, vessel image segmentation, heart image segmentation, etc.). However, as demonstrated in Section 2, no prior work has treated spinal cord image segmentation in 2D ultrasound medical imagery. In this study, we deduced, based on the current state of the art of image semantic segmentation, the best solution to adopt for our specific task. Our objectives in this paper were:

•
Treating, for the first time, the problem of spinal cord segmentation in ultrasound imaging.

•
Introducing, based on the state of the art in the area of image semantic segmentation, the best model for two case studies. The first case study is when the spinal cord is already in the training set of the model. The second case study is when the spinal cord is not in the training set of the model. We constructed a separate dataset for each scenario and selected the best model in each case study.

•
Studying the integration of some successful deep learning components like ASPP (Atrous Spatial Pyramid Pooling) and DSC (Depthwise Separable Convolution) inside the selected models.

•
Improving the performance of the chosen models by adding a post-processing step and selecting the right configuration settings for both models.
The rest of the paper is organized as follows: Section 2 is dedicated to a review of related works that studied the spine anatomy in ultrasound imagery. Section 3 introduces the architecture of the selected models (FC-DenseNets and U-Net) and the deep learning components (ASPP and DSC) that we used for spinal cord ultrasound image segmentation. Section 4 discusses the experiments we made and the best approach to adopt for the different targeted scenarios.

Related Works
The analysis of ultrasound medical images of the spine has been the subject of many studies in recent years. Ultrasound is a safe, radiation-free imaging modality that is easy for neurosurgeons to use compared to other medical imaging modalities. Pinter et al. [5] introduced a real-time method to automatically delineate the transverse processes of the vertebrae in ultrasound medical imagery. The method creates, from the shadows cast by each transverse process, a three-dimensional volume of the surface of the spine. Baum et al. [6] used the results of this study to build a step-wise method for identifying the visible landmarks in ultrasound to visualize a 3D model of the spine. Sass et al. [7] developed a method for navigated three-dimensional intraoperative ultrasound in spine surgery. They registered the patient automatically using an intraoperative computed tomography before mapping it to preoperative image data to obtain visualized navigation of the common parts of the spine. Hetherington et al. [15] developed a deep learning model named SLIDE to discriminate many parts of the spinal column using only 2D ultrasound, with 88% cross-validation accuracy. The model acts in real time (40 frames per second). Conversano et al. [16] developed a method for estimating spine mineral density from ultrasound images, which is useful for the diagnosis of osteoporosis. Zhou et al. [1] developed an automated measurement of spine curvature from 3D ultrasound imaging, which is useful for the diagnosis of scoliosis. Inklebarger et al. [17] developed a method to visualize the transabdominal lumbar spine using portable ultrasound. They specified machine configuration settings and probe selection guidelines to follow in order to obtain a profitable image. Karnik et al. [18] made a review of applications of ultrasound imaging to the analysis of the neonatal spine, with emphasis on cases where it is the primary imaging modality to use. Di Pietro et al.
[19] made a similar study on the examination of the neonatal and infant spine with ultrasound. Ungi et al. [20] also reviewed applications of the tracked ultrasound modality in navigation during spine interventions. Chen et al. [21] made a pairwise registration of 2D ultrasound (US) and 3D computed tomography of the spine. The registration is performed using a Convolutional Neural Network (CNN) before being refined using an orientation code mutual information metric. Shajudeen et al. [22] developed a method for automatic segmentation of the spine bone surface in ultrasound images; the method can be extended to any bone surface presented in ultrasound images. Hurdle et al. [23] made a review about the use of ultrasound imagery for guidance in the diagnosis of many spinal pain conditions.
From the above, we observe that a great number of studies have treated the use of ultrasound imagery in the diagnosis of clinical symptoms associated with the spine. Nevertheless, they are limited to the identification and segmentation of the bones of the spine. To the best of our knowledge, no prior work has treated the identification and segmentation of the spinal cord from ultrasound imaging. During laminectomy surgery, ultrasound images of the spinal cord can be visualized and analyzed. Surgeons usually refer to these images to study the effect of the laminectomy surgery and the spinal cord pulsation to confirm the adequacy of spinal cord decompression. Hence, we targeted this problem in this study and aim to provide an automatic solution for the segmentation of the spinal cord using recent advances in deep learning algorithms.

Proposed Method
This section gives an in-depth presentation of the architecture of the models that have proven efficient in our task of segmenting the spinal cord in ultrasound imagery. As U-Net and FC-DenseNets have proven efficient in this task, we explain in detail the related concepts, namely the Convolutional Neural Network (CNN), U-Net, DenseNets, and Fully Convolutional DenseNets. We then introduce Atrous Spatial Pyramid Pooling (ASPP) and Depthwise Separable Convolution (DSC) and emphasize their usefulness in improving the performance of the chosen models.

CNN: Convolutional Neural Network
Given training data that contains N samples {(x_i, y_i), i = 1, ..., N}, where x represents the annotated input and y represents the label, a Convolutional Neural Network (CNN) is able to construct a model F that maps the relationship between the input data x and the output data y. The CNN is built by stacking a series of layers that perform operations like convolution using a kernel function, non-linear activation, and max pooling. A training process gives this CNN model the set of parameters that best fits this mapping relationship with minimal error. The training process includes five steps. The first step initializes the parameters and weights of the CNN with random values. The second step is the forward propagation phase. During this step, a training sample (x_i, y_i) is passed to the network, and x_i is transferred from the input layer to the output layer. Finally, we get the output o_i, which is formulated as:

o_i = F_L(w_L, F_{L-1}(w_{L-1}, ..., F_1(w_1, x_i)))

where L is the number of layers and w_L is the weight vector of the Lth layer F_L. The third step consists of estimating the loss function (or cost function), which calculates the error margin between the resulting output o_i and the correct output value y_i. The fourth step corrects the weight vectors w_1, w_2, w_3, ..., w_L to minimize this loss function, following the optimization problem:

min_{w_1, ..., w_L} Σ_{i=1}^{N} l(o_i, y_i)

where l is the loss function. Usually, as here in our work, the cross-entropy loss function is used, as we did in the training phase of the models adopted in this paper. In fact, we used a weighted version of cross-entropy to alleviate the imbalance between the class Spinal Cord and the class Other.
To solve this numerical optimization problem, we use back-propagation and stochastic gradient descent methods. After this step, a more adequate set of parameters and weights is given to our model. In the fifth step, we repeat the second, third, and fourth steps over all of the training data. Usually, this training ends with the loss function converging to a small value. This convergence is especially likely when using a state-of-the-art architecture like the architectures tested in this paper. Passing the full training data once through the CNN is called one epoch, and training CNNs usually involves running multiple epochs. Many techniques are used to make the cost function decrease faster. First, the batch normalization technique [24], introduced in 2015, has become a state-of-the-art method, which we used in our work, for normalizing the output of convolutional and fully connected layers before applying the non-linear activation function. This has a significant effect on making the loss function converge faster. Optimizers are also used to make the loss function converge faster. We used in our experiments ADAM (Adaptive Moment Estimation) [25], introduced in 2015, as it has become a state-of-the-art gradient descent optimizer.
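The weighted cross-entropy described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the function name is ours, and the default weight value only mirrors the 0.95 assigned to the spinal cord class in our experiments:

```python
import numpy as np

def weighted_cross_entropy(y_true, y_pred, w_pos=0.95, eps=1e-7):
    """Pixel-wise weighted binary cross-entropy.

    y_true: binary ground-truth mask (1 = spinal cord, 0 = other).
    y_pred: predicted probabilities in (0, 1).
    w_pos:  weight of the positive (spinal cord) class; the
            negative class receives 1 - w_pos.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(w_pos * y_true * np.log(y_pred)
             + (1.0 - w_pos) * (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()
```

With this weighting, missing a spinal cord pixel (a false negative) is penalized far more heavily than a false positive of the same confidence, which counters the dominance of the background class.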

Semantic Segmentation and U-Net Architecture
Semantic segmentation is the task of classifying every pixel inside an image into a meaningful category. In medical image analysis, semantic segmentation is a highly pertinent task: computer-aided diagnosis relies heavily on the accurate segmentation of the organs and structures of interest in the captured medical image modalities (MRI, CT, Fluoroscopy, Ultrasound, etc.). The success of Convolutional Neural Networks has profoundly influenced the area of semantic segmentation, and many architectures have been proposed. Among the most used architectures in medical image segmentation, U-Net [26] is a state-of-the-art model; Figure 5 represents the architecture of U-Net. U-Net is an encoder-decoder architecture: the encoder is the contracting part on the left, and the decoder is the expansive path on the right side. The encoder part contains a series of 3 × 3 convolutions, each followed by a ReLU (Rectified Linear Unit) activation function. A 2 × 2 max-pooling operation is performed after a series of consecutive convolutions for downsampling. The decoder part upsamples the feature vector obtained at the end of the encoder part through a series of 2 × 2 up-convolutions, so as to reconstruct the segmentation map at the size of the input image. Between the up-convolution operations, there is a series of 3 × 3 convolutions followed by ReLU, similarly to the encoder part. Skip connections are added by concatenating the feature map of the encoder part with the corresponding feature map of the decoder part. In the end, a sigmoid activation function is used to generate, for each feature vector, the desired class category. The network is trained end to end using back-propagation and stochastic gradient descent. As discussed in the experimental part of this study, U-Net proved efficient when the spinal cord pattern was not previously seen in the training set.
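The channel bookkeeping of U-Net's skip connections can be illustrated with a small NumPy sketch. The function names are ours, and nearest-neighbour upsampling stands in for the learned 2 × 2 up-convolution:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map,
    standing in for U-Net's learned 2x2 up-convolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_connect(decoder_feat, encoder_feat):
    """U-Net skip connection: upsample the decoder feature map and
    concatenate it with the matching encoder feature map along the
    channel axis."""
    up = upsample2x(decoder_feat)
    assert up.shape[:2] == encoder_feat.shape[:2], "spatial sizes must match"
    return np.concatenate([up, encoder_feat], axis=-1)
```

For example, upsampling a 32 × 32 × 128 bottleneck and concatenating it with a 64 × 64 × 64 encoder map yields a 64 × 64 × 192 input for the next decoder convolutions.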

DenseNets and Fully Convolutional DenseNets
In 2017, Huang et al. [27] proposed the architecture of Densely Connected Convolutional Networks (DenseNet) for image classification tasks. In this network, every layer is connected to every other layer in a feed-forward manner. Consequently, the input of every layer contains a concatenation of the feature maps of all preceding layers, and its own feature map is used as input in all subsequent layers. The network outperforms state-of-the-art networks on the image classification problem, making it a strong feature extractor that can be used as a building block for other tasks like semantic segmentation. The network presents many advantages over its competitors: it mitigates the vanishing-gradient problem and reinforces feature propagation and reuse with a smaller number of parameters. The architecture of the Dense block in DenseNet is illustrated in Figure 6. Formally, consider an image x_0 that passes through a DenseNet network containing L layers. Each layer applies a transformation F_l(), where l refers to the layer index. F_l() is composed of Convolution, Batch Normalization [24], ReLU [28], and Pooling [29]. If we set x_l as the output of the lth layer, then x_l = F_l(x_{l-1}) for a traditional convolutional network. But in the dense block of the DenseNet architecture, the lth layer receives as input the feature maps of all the preceding layers, as expressed in Equation (3):

x_l = F_l([x_0, x_1, ..., x_{l-1}])    (3)

where [x_0, x_1, ..., x_{l-1}] is the concatenation of the feature maps generated in the preceding layers.
The DenseNet architecture is reused in the semantic segmentation context by the Fully Convolutional DenseNet (FC-DenseNet) algorithm [30], which merges it inside a U-Net-like model. The architecture of FC-DenseNet is illustrated in Figure 7. The encoder path of FC-DenseNet corresponds to a DenseNet network that contains dense blocks separated by normal layers (Convolution, Batch Normalization, ReLU, and pooling). These normal layers form the transition down block that reduces the spatial resolution of each feature map using the pooling operation. The last layer of the encoder path is denoted the bottleneck of the network. FC-DenseNet adds a decoder path that aims to recover the original spatial resolution of the image. This decoder path contains Dense Blocks separated by Transition Up blocks. The Dense blocks are similar to their corresponding Dense blocks in the encoder. The Transition Up block contains the up-sampling operations (transposed convolutions) necessary to compensate for the down-sampling operations (pooling) in the encoder path. Similarly to U-Net, the Dense blocks of the two paths are connected by skip connections to guide the reconstruction of the input spatial resolution through the up-sampling part of the network. Note that in the down-sampling path, the output of the dense block is concatenated to the output of the previous transition block to form the input of the next Transition Down block. This operation is not used in the up-sampling path, in order to reduce computations, because a skip connection is already concatenated to every Dense Block input. The last layer in FC-DenseNet is a 1 × 1 convolution followed by a softmax layer to generate the per-class distribution for every pixel. The network is trained using a cross-entropy loss calculated in a pixel-wise manner. The network can be trained from scratch, without the need to pre-train a feature extractor on external data as done in many state-of-the-art segmentation algorithms.
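The concatenation pattern of Equation (3) can be sketched as a toy NumPy dense block, where each stand-in layer emits a fixed number of feature channels (DenseNet's "growth rate"). The function names are ours:

```python
import numpy as np

def dense_block(x0, layer_fns):
    """Toy dense block: the lth layer receives the channel-wise
    concatenation of the block input and all preceding layer outputs,
    i.e. x_l = F_l([x_0, x_1, ..., x_{l-1}]). The block output
    concatenates everything, as in FC-DenseNet's down-sampling path."""
    features = [x0]
    for F in layer_fns:
        x_l = F(np.concatenate(features, axis=-1))
        features.append(x_l)
    return np.concatenate(features, axis=-1)
```

With a growth rate of k channels per layer, a block of n layers applied to a c-channel input produces c + n * k output channels, which is why the number of channels grows steadily through the encoder.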
In our study, FC-DenseNet proved to be the best tested algorithm for segmenting the spinal cord when the pattern was already seen in the training set. In this case study, it outperforms U-Net.

Atrous Spatial Pyramid Pooling
As the shape and size of objects inside an image may differ, the concept of the image pyramid [31] was introduced to improve segmentation precision. It consists of extracting features at different scales in a pyramid-like approach before interpolating and merging them. But calculating the feature maps for every scale separately increases the size of the network and leads to heavy computations with a risk of over-fitting. This is why a method that combines the multi-scale information in an efficient way is needed. Spatial pyramid pooling (SPP) [32] was proposed to treat this problem. SPP was first introduced to handle randomly sized region proposals in object detection [32]. SPP divides the randomly sized images into spatial bins before applying the pooling operation on every bin and concatenating the results to obtain a fixed feature map size for the input image. Despite its efficiency in capturing multi-scale features from the image, SPP is not well adapted to image segmentation because the pooling operations lose the pixel details needed for this task. Hence, the normal pooling layers in SPP are substituted by atrous convolutions with different sampling rates, and the features extracted at every sampling rate are merged to obtain the final feature vector. This method is called Atrous Spatial Pyramid Pooling (ASPP). The atrous convolution gives a convolution kernel different receptive fields by merely changing the sampling rate. This approach is the basis of the state-of-the-art model DeepLab v3 plus [33], a segmentation model different in architecture from U-Net and FC-DenseNet; it is currently the best algorithm tested on the PASCAL VOC dataset [34]. This is why we decided to study the effect of inserting an ASPP module inside FC-DenseNet. Figure 8 illustrates the version of ASPP used in DeepLab v3 plus and in our experiments.
Figure 9 shows the insertion of the ASPP module inside the DenseNet block to form an ASPP Dense Block. We study the effect of this insertion in the experimental part.
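The effect of the sampling (dilation) rate on the receptive field can be seen in a one-dimensional toy example. This is an illustrative sketch under our own function names, not the DeepLab implementation:

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1-D atrous (dilated) convolution: the kernel taps are spaced
    `rate` samples apart, enlarging the receptive field from k to
    (k - 1) * rate + 1 without adding any parameters."""
    k = len(kernel)
    span = (k - 1) * rate + 1
    out = np.zeros(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * rate] for j in range(k))
    return out

def aspp_1d(x, kernel, rates=(1, 2, 4)):
    """Toy ASPP: apply the same kernel at several dilation rates and
    sum the aligned responses to merge multi-scale context."""
    outs = [atrous_conv1d(x, kernel, r) for r in rates]
    n = min(len(o) for o in outs)
    return sum(o[:n] for o in outs)
```

A difference kernel applied to a ramp signal shows the point: the same three weights measure change over a span of 3 samples at rate 1 and over a span of 9 samples at rate 4.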

Depthwise Separable Convolution
Recently, MobileNet [35] was introduced for efficient memory use, primarily designed for mobile and embedded devices. MobileNet is based on Depthwise Separable Convolutions (DSC), which have two benefits over traditional convolutions. First, they have a lower number of parameters to train compared to standard convolutions, which makes the model generalize better and reduces overfitting. Second, they need fewer computations for training. DSC consists of separating the standard convolution into two successive convolutions. The first convolution is performed separately over each channel of the input layer. Then, a 1 × 1 convolution is applied to the output feature maps from the previous step to get the final output layer. Figure 10 illustrates the difference between standard convolution and depthwise separable convolution.
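The parameter saving can be verified by counting weights. A minimal sketch (bias terms omitted; function names are ours):

```python
def conv_params(k, c_in, c_out):
    """Parameter count of a standard k x k convolution mapping
    c_in channels to c_out channels."""
    return k * k * c_in * c_out

def dsc_params(k, c_in, c_out):
    """Depthwise separable convolution: one k x k filter per input
    channel (depthwise), followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out
```

For a 3 × 3 convolution from 64 to 128 channels, the standard version needs 73,728 weights while the depthwise separable version needs 8,768, roughly an 8× reduction.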

Postprocessing
To refine the segmentation map generated by the semantic segmentation model, we applied a list of post-processing operations to improve the accuracy of the segmentation. Figure 11 represents the status of a segmentation map of the spinal cord before and after the post-processing step. The post-processing step is divided into four sub-steps. In every sub-step, we apply a different morphological operation on the segmentation map generated by the previous sub-step. The first sub-step removes small objects whose number of pixels is less than 1000. In fact, the ground truth always contains one connected blob corresponding to the spinal cord, with a number of pixels that is surely bigger than 1000. So, we removed small objects containing fewer than 1000 connected pixels, with the degree of connectivity set to 1; these removed objects evidently correspond to false-positive pixels. After that, we apply the second sub-step to the segmentation map generated by the first sub-step: removing small holes inside the spinal cord boundary. In fact, we are sure that the connected pixels corresponding to the spinal cord contain no holes, so we removed these holes based on this property. The third sub-step applies a morphological closing using a square filter of size 4 × 4 to make the boundary smooth and comparable to the boundary in real images. Finally, the last sub-step applies a morphological opening using a square filter of size 4 × 4; this is intended to counteract the effects of the closing operation by filtering out components that may exceed the real spinal cord boundary. We study the effect of the post-processing step in detail in the experimental part.
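The four sub-steps can be sketched with SciPy's morphology routines. This is an illustrative re-implementation of the described pipeline, not the authors' code; the 1000-pixel threshold and the 4 × 4 filter are the settings stated above:

```python
import numpy as np
from scipy import ndimage

def postprocess(mask, min_size=1000):
    """Post-processing sketch following the four sub-steps:
    1. remove connected components smaller than `min_size` pixels
    2. fill holes inside the remaining blob
    3. morphological closing with a 4 x 4 square filter
    4. morphological opening with the same filter
    """
    # 1. remove small objects (connectivity 1)
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    keep = np.zeros_like(mask, dtype=bool)
    for i, s in enumerate(sizes, start=1):
        if s >= min_size:
            keep |= labels == i
    # 2. fill holes inside the spinal cord blob
    filled = ndimage.binary_fill_holes(keep)
    # 3-4. closing then opening with a 4 x 4 square structuring element
    se = np.ones((4, 4), dtype=bool)
    closed = ndimage.binary_closing(filled, structure=se)
    return ndimage.binary_opening(closed, structure=se)
```

Since the ground truth is a single hole-free blob, steps 1 and 2 encode that prior directly, while the closing/opening pair only smooths the boundary.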

Experimental Results
This section confirms the efficiency of the chosen approaches (FC-DenseNet and U-Net) by describing the implemented experiments and discussing the obtained results.

The Used Datasets
To confirm the adopted approaches, we constructed two different datasets (Dataset-A and Dataset-B) based on ultrasound medical images collected from 10 patients during laminectomy surgical operations. The surgeries were performed in King Saud University Medical City. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the Institutional Review Board of King Saud University Medical City (Project No. E-15-1602).
The first dataset was built to test the efficiency of the algorithms when the spinal cord exists within the training set. This dataset was formed by collecting significant images from the ultrasound videos recorded during laminectomy surgery. By significant we mean the existence of the spinal cord boundary in the image, as the video sometimes contains many frames without clear spinal cord boundaries because the ultrasound probe was not placed in the right position on top of the spine where the laminae were removed. Then, we cropped the zone containing the spinal cord into 256 × 256 sized images. After that, we provided the pixel-wise manual segmentation of each image, labeled by an expert and guided by two neurosurgeons working in the field of spine surgery. We then subdivided the images of every patient into train and test sets following the 80/20 rule, which means putting 80% of the labeled data inside the train set and 20% inside the test set. Table 1 provides details about Dataset-A, used to validate the performance of the segmentation model on spinal cords already provided in the training set. We built Dataset-B to validate model performance on spinal cords not previously provided in the training set. It is formed of ten different subsets. Every subset Dataset-B_i is formed by putting all the images corresponding to the ith patient inside the test set and all the images of the other patients inside the train set. Metrics are then calculated by averaging the results generated by the segmentation model on every Dataset-B_i. We aim by this method to test the generalization of the model on spinal cord patterns not provided in the training set, in a cross-validation manner. Table 2 gives an idea of the composition of every Dataset-B_i inside Dataset-B. Note that in every Dataset-B_i, the test set always contains spinal cord patterns not previously learned by the model.
This allows us to judge the ability of the model to be applied to new images taken from a new patient, which is the most probable scenario in real cases.

The Evaluation Metrics
To measure the efficiency of the semantic segmentation model, five metrics are used: Intersection over Union (IoU), Accuracy, Recall, Sensitivity, and Dice coefficient. The most important metric, which allows judging the global efficiency of the model, is the IoU. IoU is calculated for every class independently before computing the average IoU over all the classes. Given two sets of data A and B, IoU is computed using the expression below:

IoU(A, B) = |A ∩ B| / |A ∪ B|

Going deeper, the IoU is computed using four measures: TP (True Positives), TN (True Negatives), FP (False Positives), and FN (False Negatives). If we consider a semantic class C, TP is the number of pixels of class C that were classified successfully as C by the algorithm. TN is the number of pixels that do not belong to the class C and that the algorithm did not associate with C. FP is the number of pixels that do not belong to C but that the algorithm falsely associated with the class C. FN is the number of pixels that belong to the class C but that the algorithm was not able to classify as C. Hence, IoU can be expressed explicitly as:

IoU = TP / (TP + FP + FN)

Similarly, the other metrics are computed using the following standard expressions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Recall = Sensitivity = TP / (TP + FN)
Dice = 2 × TP / (2 × TP + FP + FN)
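The four counts and the derived metrics can be computed directly from a pair of binary masks. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def segmentation_metrics(y_true, y_pred):
    """Compute IoU, accuracy, sensitivity (recall) and Dice for a
    binary segmentation mask from TP, TN, FP, FN pixel counts."""
    tp = np.sum(y_true & y_pred)    # cord pixels correctly found
    tn = np.sum(~y_true & ~y_pred)  # background correctly rejected
    fp = np.sum(~y_true & y_pred)   # background marked as cord
    fn = np.sum(y_true & ~y_pred)   # cord pixels missed
    return {
        "iou": tp / (tp + fp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
    }
```

Note the relationship visible in the formulas: Dice weights the true positives twice, so for the same prediction it is always at least as large as the IoU.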

Selection of the Best Performing Algorithm on Dataset-A
We began by evaluating state-of-the-art segmentation algorithms on Dataset-A. This aims to measure the ability of an algorithm to segment spinal cords that were already seen in the training set. We tested five state-of-the-art algorithms. First, we tested Fully Convolutional DenseNets [30]; we took the version FC-DenseNet103, as it is considered the best variant of FC-DenseNets [30]. We also tested DeepLab v3 plus [33], currently the state-of-the-art model on the Pascal VOC semantic segmentation dataset [34]. We tested PSPNet [36], currently the state of the art on the Cityscapes semantic segmentation dataset [37]. We also tested two other state-of-the-art algorithms, U-Net [26] and BiseNet [38]. To train these algorithms we used the Semantic Segmentation Suite [39], an open-source framework that contains implementations of several semantic segmentation algorithms in Tensorflow [40].
We trained the chosen algorithms on Dataset-A for 400 epochs. Figure 12 shows the IoU measured on the test set of Dataset-A after every epoch of training. It shows that FC-DenseNet103 and U-Net clearly outperform the other algorithms. FC-DenseNet103 is slightly better than U-Net because the DenseNet block is more able to capture the complex patterns provided in the data. The other algorithms are not capable of capturing the complexity of the data patterns as well on small datasets like Dataset-A. Table 3 shows the metrics measured for every one of the tested algorithms; it shows that FC-DenseNet103 outperforms all other algorithms on all the metrics. In order to further improve the accuracy, we picked out FC-DenseNet103 and applied a list of modifications, detailed in the next sub-sections.

ASPP DenseBlock
In order to capture the multi-scale features inside the data, we replaced the DenseNet block in FC-DenseNet103, illustrated in Figure 6, by the ASPP-DenseNet block illustrated in Figure 9. We noted that the modification improves the convergence of the algorithm without improving the IoU. Figure 15 shows the IoU change after every epoch of training. Table 4 shows the metrics measured before and after applying the ASPP block on FC-DenseNet103. ASPP is a widely used concept in semantic segmentation, and some studies show an increase in efficiency when integrating it with FC-DenseNet in tasks like breast tumor segmentation [41]. However, it does not help to improve the results in spinal cord segmentation. This is probably because the size of the spinal cord in the images has small variance and does not present multi-scale patterns to capture, unlike breast tumors [41], which differ largely in size.

Depthwise Separable Convolution
In order to reduce the size and the computations needed to train FC-DenseNet, we replaced all the convolutional layers in the network by Depthwise Separable Convolutions (DSC). As expected, the operation reduces the number of parameters without affecting the measured metrics; only a slight decrease in convergence is noted, as shown in Figure 16. Table 5 shows the metrics measured before and after applying Depthwise Separable Convolutions (DSC) on FC-DenseNet103, while Table 6 shows the reduction in size after the application of DSC.

Post-Processing
The morphological operations used in the post-processing step exploit geometric patterns existing in the data to improve accuracy after applying the model. By applying the post-processing operations explained in the Postprocessing subsection of Section 3, an increase in segmentation performance is noted, as shown in Figure 17 and Table 7.

Set of Training Configurations
To further fine-tune the FC-DenseNet network, we identified a set of configurations to use during training to improve the network's efficiency:
• Data augmentation: we used vertical and horizontal flips, brightness changes of up to 20%, and rotations in the range of 20 degrees.
• Weighted cross-entropy: we assigned a weight of 0.95 to the spinal cord class.
• ADAM optimizer: the best learning rate to use is 0.0001.
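The weighted cross-entropy above counters the class imbalance between the small spinal cord region and the large background. A minimal NumPy formulation, assuming the background receives the complementary weight 0.05 (the text only states the 0.95 cord weight):

```python
import numpy as np

def weighted_cross_entropy(pred, target, w_cord=0.95):
    """Pixel-wise binary cross-entropy with a per-class weight.

    pred   : predicted foreground probabilities in [0, 1]
    target : binary ground-truth mask (1 = spinal cord)
    """
    eps = 1e-7
    pred = np.clip(pred, eps, 1.0 - eps)
    # Assumed weighting: 0.95 on cord pixels, 0.05 on background pixels.
    weights = np.where(target == 1, w_cord, 1.0 - w_cord)
    ce = target * np.log(pred) + (1 - target) * np.log(1 - pred)
    return -np.mean(weights * ce)
```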

Selection of the Best Performing Algorithm on Dataset-B
After choosing the right algorithm and configuration on Dataset-A, we move to Dataset-B. With the experiments on this dataset, we aim to test the generalization of the model and to successfully segment new spinal cords that are not in the training set. To select the best performing algorithm, we ran the five segmentation models already tested on Dataset-A and trained each of them on every subset Dataset-B_i for 400 epochs. We then averaged all the measures to judge fairly which algorithm performed best on new spinal cords. As shown in Tables 8-12, U-Net [26] clearly outperforms the other state-of-the-art algorithms in this task.
We deduce from the experiments that U-Net is able to learn complicated patterns from small-sized data without memorizing the patterns in the train set. Hence, it will be the first-choice algorithm to adopt in the next steps of spinal cord analysis in ultrasound imagery. To further improve the performance of U-Net, we describe below the right configurations to use. The integration of ASPP inside U-Net was not tested, following the conclusion drawn from the experiments on Dataset-A: ASPP does not have a significant impact on accuracy because the patterns in our task have no multi-scale features to learn. However, we studied the impact of the other modifications on U-Net (DSC, post-processing, and the set of training configurations).
To reduce the size of U-Net and the computation needed to train it, we substituted all the convolutional layers in the model with Depthwise Separable Convolutions (DSC). We note that the size of the model was reduced by a factor of 4, while the global cross-validation accuracy decreased by only 0.0055, as shown in Tables 13 and 14. These are the operations and configurations to use with U-Net to improve segmentation accuracy:
• Post-processing: this step is important to implement, especially when dealing with patterns not learned from the train set. It has a significant impact on reducing the False Positives and False Negatives and, consequently, increasing the True Positives and True Negatives.
• ADAM optimizer: the best learning rate we tested for this task is 0.0001.
Table 15 illustrates the effect of these predefined configurations on the improvement of IoU on Dataset-B using U-Net. We conclude that the post-processing step and the set of training configurations have a significant impact on improving the segmentation accuracy of U-Net.
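The Dataset-B protocol evaluates each model on a patient whose images never appear in its training set and then averages the scores over folds. A minimal sketch of such a leave-one-patient-out split (the exact construction of the Dataset-B_i subsets is our assumption):

```python
def leave_one_out_folds(patients):
    """One fold per patient: train on everyone else, test on the held-out patient."""
    return [([p for p in patients if p != held_out], [held_out])
            for held_out in patients]

def average_metric(per_fold_scores):
    """Average a metric (e.g. IoU) over all folds for a fair comparison."""
    return sum(per_fold_scores) / len(per_fold_scores)
```

Averaging over folds is what allows a fair comparison between models: no model is rewarded for having seen the test patient's anatomy during training.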

Conclusions
In this study, we provided a solution to the problem of spinal cord segmentation in ultrasound imaging, based on the state-of-the-art algorithms in semantic segmentation. We constructed two datasets (Dataset-A and Dataset-B). Dataset-A tests the performance of an algorithm on spinal cord patterns already provided in the train set. Dataset-B tests the performance of an algorithm, in a cross-validation manner, on new spinal cord patterns that are not provided in the train set. On Dataset-A, FC-DenseNet103 outperforms all the state-of-the-art methods due to its capability to learn complex data patterns using the DenseNet block. On Dataset-B, U-Net is the best due to its ability to learn complex patterns from a limited amount of data without memorizing the train set patterns exactly. We found that integrating the Atrous Spatial Pyramid Pooling (ASPP) module did not improve the performance of the model on our task, probably because the spinal cord in ultrasound images has a small variance in size and offers no multi-scale features for ASPP to capture. We found that Depthwise Separable Convolution (DSC) significantly reduces the size of the model and the computation needed to train it without affecting the performance. We demonstrated that the post-processing step has a significant impact on improving the segmentation accuracy. We also identified a set of configurations to use during training to fine-tune the model for best performance. Our work is a first step towards the automatic analysis of the spinal cord using intra-operative ultrasound medical imaging. The next task will be to automatically extract the pulsation curve of the spinal cord from the ultrasound video. This is useful to better analyze the movement of the spinal cord during the laminectomy operation and to confirm the extent of decompression.