Bone age assessment from articular surface and epiphysis using deep neural networks

Bone age assessment is of great significance to the diagnosis of genetic and endocrine diseases. Traditional bone age diagnosis mainly relies on experienced radiologists examining the regions of interest in hand radiographs, which is time-consuming and may lead to large errors between the diagnosis result and the reference. The existing computer-aided methods predict bone age based on general regions of interest but do not explore specific regions of interest in hand radiography. This paper aims to solve such problems by performing bone age prediction on the articular surface and epiphysis from hand radiography using deep convolutional neural networks. The articular surface and epiphysis datasets are established from the Radiological Society of North America (RSNA) pediatric bone age challenge, where the specific feature regions of the articular surface and epiphysis are manually segmented from hand radiographs. Five convolutional neural networks, i.e., ResNet50, SENet, DenseNet-121, EfficientNet-b4, and CSPNet, are employed to improve the accuracy and efficiency of bone age diagnosis in clinical applications. Experiments show that the best-performing model yields a mean absolute error (MAE) of 7.34 months on the proposed articular surface and epiphysis datasets, which is more accurate and faster than the radiologists. The project is available at https://github.com/YameiDeng/BAANet/, and the annotated dataset is also published at https://doi.org/10.5281/zenodo.7947923.


Introduction
Bone age assessment not only plays a vital role in the health care of children but also helps monitor growth and development, especially in the diagnosis of genetic and endocrine diseases [1][2][3][4], such as adrenal cortical hyperplasia [5], precocious puberty [6], and pituitary dwarfism [7]. Therefore, accurate bone age assessment is of great significance for preventing and treating developmental diseases.
In clinical diagnosis, bone age assessment mainly depends on experienced experts examining hand radiographs with the Greulich-Pyle (GP) [8], Tanner-Whitehouse (TW2) [9], and RUS-CHN [10] methods. Specifically, the GP approach, a simple atlas-based method, determines the bone age by matching the hand radiograph with the most similar bone region in the reference atlas [8]. The TW approach is a scoring method that sums scores, which vary by race and gender, over a specific set of regions of interest [11]. The RUS-CHN method evaluates each bone in the hand radiograph, including the radius, ulna, metacarpal bones, and phalanges [12], as shown in Figure 1(b). These traditional methods estimate skeletal age by evaluating the regions of interest (ROIs) in hand radiography, but they are time-consuming and prone to subjective variability between the assessed and reference bone ages. Since deep learning has shown excellent performance in many fields [13][14][15], it has been adopted for bone age assessment, where the methods can be divided into ROI-free [16][17][18][19] and ROI-based [20][21][22] approaches. Among the ROI-free methods, Mutasa et al. [23] propose a hidden layer-customized residual network for age prediction, where the convolutional neural network (CNN) takes whole images as input and predicts bone age. Such networks can estimate skeletal age through feature extraction, but some feature regions of the bones are ignored [21]. The ROI-based approaches first obtain the regions of interest according to prior knowledge and then generate the predicted bone age. For example, Li et al. [24] propose a region aggregation graph convolutional network to obtain the bone age, where a CNN is utilized to extract the key regions. Liu et al. [21] propose a self-supervised attention network that performs bone age assessment by discovering the informative ROIs in hand radiography, as shown in Figure 1(c).
Although these ROI-based methods show comparable performance on bone age assessment, they do not investigate the specific ROIs of hand radiography, which are exactly the regions relied on in clinical diagnosis.
From the above observations, the existing methods for bone age assessment from hand radiography have several problems, which can be described as follows:
1) The clinical manual bone age diagnosis method is based on regions of interest, but it is very time-consuming and may produce large errors.
2) The ROI-free deep learning methods can predict bone age by performing feature extraction directly on the whole image, but some feature regions of the bones are ignored.
3) Although the ROI-based approaches can estimate the skeletal age through general regions of interest, most of them have not explored the specific ROIs used in clinical applications.
To address these issues, the articular surface and epiphysis datasets are proposed for bone age assessment using deep convolutional neural networks. Specifically, for the first problem, deep learning networks are utilized to assess bone age in the regions of interest to accelerate the process and improve accuracy. For the last two issues, the specific ROIs from hand radiography, i.e., the articular surface and epiphysis, are segmented and exploited to evaluate bone age. The main contributions of this article are described as follows:
1) The articular surface and epiphysis datasets, derived from the RSNA dataset [25], are proposed for bone age assessment, where the key feature regions are marked by the radiologists using LabelMe 4.6.0 [26].
3) Three bone feature regions, including articular surface, epiphysis, and epiphyseal line from hand radiography, are used as the benchmarks to perform bone age assessment. Extensive experiments demonstrate that the articular surface and epiphysis from hand radiography are beneficial for bone age assessment.
The rest of this paper is organized as follows: the proposed articular surface and epiphysis datasets are described in Section 2, while Section 3 presents deep convolutional neural networks for bone age assessment on the proposed datasets. Section 4 shows the experiments and results analysis, and the conclusions are given in Section 5.

Articular surface and epiphysis datasets
Although the existing methods can predict bone age through general ROIs, they do not explore specific regions of interest, so their predictions are difficult to interpret as a reference for clinical diagnosis. Motivated by this, the articular surface and epiphysis datasets, derived from the RSNA dataset, are proposed to investigate the effect of specific areas on bone age, where 1068 X-ray images with different ages (8-224 months) are randomly selected to form the new datasets. Specifically, there are 126, 365, 326, 223, and 28 images in the 9-48, 48-96, 96-144, 144-192, and 192-228 month groups, respectively, as shown in Figure 2(a). These samples cover all age groups, which helps ensure that the proposed datasets are representative for bone age assessment in clinical practice. During the experiments, the proposed database is split into a training set, a validation set, and a test set of 875, 96, and 97 images, respectively. Next, the articular surface and epiphysis datasets are described as follows.
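The split sizes reported above can be reproduced with a simple shuffled partition. The following is only a minimal sketch: the text does not state how the images were assigned to each subset, so the random shuffle, the seed, and the function name `split_indices` are assumptions made for illustration.

```python
import random

# Per-group image counts as reported in the text (months -> number of images).
GROUP_COUNTS = {"9-48": 126, "48-96": 365, "96-144": 326, "144-192": 223, "192-228": 28}


def split_indices(n_total, n_train, n_val, n_test, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into three disjoint subsets.

    The shuffle and the seed are assumptions; the paper only reports the sizes.
    """
    if n_train + n_val + n_test != n_total:
        raise ValueError("subset sizes must sum to the dataset size")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


n_images = sum(GROUP_COUNTS.values())
train_idx, val_idx, test_idx = split_indices(n_images, 875, 96, 97)
print(n_images, len(train_idx), len(val_idx), len(test_idx))
```

The group counts sum to the 1068 images mentioned in the text, and the three subsets are disjoint by construction.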

Articular surface dataset
As suggested in [32], the epiphyseal region in the articular surface is often used by radiologists for bone age diagnosis. For example, the distal bone and joint bone are closed in hand X-rays of adolescent subjects [33]. Based on this, the articular surface in hand radiography is used as the feature region to assess the bone age. An example of the articular surface is shown in Figure 2(b), where the articular surfaces of the radius, ulna, carpal, metacarpal-phalanges, phalanges, and epiphysis are marked as the feature regions. The specific implementation process is described as follows.
• During the screening process, we first sort all samples in the RSNA dataset by age, and then randomly select images from different age groups to ensure that the proposed datasets follow the original distribution.
• In the labeling process, two radiologists use LabelMe 4.6.0 [26] to independently mark bone feature regions on hand radiographs according to [34], where the bone regions are determined by shape, size, and color, and another expert reviews and aggregates the annotations.
• Finally, the resulting feature area images are combined with the bone age label to establish the proposed articular surface dataset.
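The last step above, pairing annotated regions with a bone age label, can be sketched as follows. This is an illustrative sketch only: the record mimics the JSON layout written by LabelMe (`shapes`, `label`, `points`, `imagePath`), while the file name, region labels, coordinates, bone age value, and the helper names `region_bounding_boxes` and `build_sample` are hypothetical and not taken from the published dataset.

```python
import json

# Hypothetical LabelMe-style record; file name, labels, and coordinates are made up.
EXAMPLE_RECORD = """
{
  "imagePath": "hand_0001.png",
  "imageHeight": 876,
  "imageWidth": 576,
  "shapes": [
    {"label": "radius", "shape_type": "polygon",
     "points": [[210.0, 700.0], [290.0, 705.0], [285.0, 760.0], [215.0, 755.0]]},
    {"label": "ulna", "shape_type": "polygon",
     "points": [[320.0, 710.0], [380.0, 712.0], [378.0, 765.0], [322.0, 760.0]]}
  ]
}
"""


def region_bounding_boxes(record):
    """Return (label, (x_min, y_min, x_max, y_max)) for every annotated polygon."""
    boxes = []
    for shape in record["shapes"]:
        xs = [point[0] for point in shape["points"]]
        ys = [point[1] for point in shape["points"]]
        boxes.append((shape["label"], (min(xs), min(ys), max(xs), max(ys))))
    return boxes


def build_sample(record, bone_age_months):
    """Combine the annotated feature regions with the bone age label (in months)."""
    return {
        "image": record["imagePath"],
        "regions": region_bounding_boxes(record),
        "bone_age_months": bone_age_months,
    }


record = json.loads(EXAMPLE_RECORD)
sample = build_sample(record, bone_age_months=120)
print(sample["regions"])
```

Bounding boxes are used here only to keep the sketch short; the polygons themselves would be needed to cut out the exact feature regions.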

Epiphysis dataset
The size and shape of the hand bones are also important indicators for diagnosing bone age [35]. For example, the size of the distal bone represents the development of the human epiphysis, which can provide an effective basis for predicting human height [36]. To this end, the epiphysis dataset is established for bone age assessment, where the detailed information is the same as for the articular surface dataset except for the feature region. An example of the epiphysis is shown in Figure 2(c), where the radius, ulna, carpal, metacarpal-phalanges, phalanges, and epiphysis are included in the new hand radiograph. In addition, the creation process of the epiphysis dataset is the same as that of the articular surface dataset.

Deep convolutional neural networks for bone age assessment on articular surface and epiphysis datasets
The key regions of interest from hand radiography, i.e., the articular surface and epiphysis, are often employed by radiologists to perform bone age diagnosis [8][9][10], but this is time-consuming and often leads to large errors between the predicted results and the actual ones. To solve this problem, deep convolutional neural networks are exploited for bone age assessment on the proposed articular surface and epiphysis datasets, where five popular deep learning models, i.e., ResNet50 [27], SENet [28], DenseNet-121 [29], EfficientNet-b4 [30], and CSPNet [31], are used as the feature extractors, and then a multi-layer perceptron is exploited to produce the bone age, as shown in Figure 3. These models are described as follows.
Figure 3. The framework of bone age assessment from hand radiography using deep neural networks, where the deep convolutional neural network is first regarded as the feature extractor to obtain the feature maps, and then the multi-layer perceptron is exploited to predict the final bone age.

Bone age assessment using residual convolutional neural network-ResNet50
The residual convolutional neural network (ResNet), one of the most widely used deep learning models, provides direct gradient feedback for network training through skip connections. These connections give the gradient a direct path to the very early layers of the network, making the gradient update of these layers easier. Based on this, ResNet50 is used as the predictor for bone age assessment, where SC and Conv denote the skip connection and convolutional layers, respectively, as shown in Figure 4(a). Specifically, the identity skip connections are embedded in the convolutional layers to prevent network gradient degradation, and the last multi-layer perceptron predicts the bone age by mapping the feature representation to the actual sample space.
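For reference, the usual residual formulation can be written with the symbols SC and Conv used above. This is the standard form from the ResNet literature [27], given here as a sketch; it is an assumption that it matches the exact notation of the equation intended in this section.

```latex
% X: input feature map of a residual block, Y: its output (both symbols are illustrative).
% For an identity shortcut, SC(X) = X; when the shape changes, SC is a 1x1 projection.
\[
  Y = \mathrm{Conv}(X) + \mathrm{SC}(X)
\]
```

Because the shortcut term passes through the block unchanged, its gradient reaches earlier layers directly, which is the property the paragraph above relies on.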

Bone age assessment using attention convolutional neural network-SENet
The attention mechanism [37] has attracted extensive attention in medical image analysis, such as [38][39][40]. SENet [28], an effective attention mechanism, is designed to learn the details and model the global dependencies to improve the network performance. Inspired by this, SENet is applied to obtain the bone age on the proposed dataset, where SQ and EX represent the squeeze operation and excitation operation, respectively, as shown in Figure 4(b). Specifically, the squeeze operation compresses each feature map into a feature vector, and the excitation operation generates the attention weights used to emphasize the detailed information.
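The squeeze and excitation steps can be illustrated on a tiny feature map. The sketch below uses plain Python lists instead of a deep learning framework, and the function names, layer sizes, and weights are hypothetical values chosen only to make the arithmetic visible; it is not the network used in the experiments.

```python
import math


def squeeze(feature_map):
    """Global average pooling: one scalar per channel (feature_map is C x H x W)."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_map]


def dense(vector, weights, bias):
    """Fully connected layer: weights has one row per output unit."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def excite(z, w1, b1, w2, b2):
    """Two fully connected layers (ReLU, then sigmoid) giving one weight per channel."""
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]


def se_block(feature_map, w1, b1, w2, b2):
    """Rescale every channel by its excitation weight."""
    weights = excite(squeeze(feature_map), w1, b1, w2, b2)
    return [[[weights[c] * value for value in row] for row in channel]
            for c, channel in enumerate(feature_map)]


# Two channels of size 2 x 2; weights are zero, so every channel weight is sigmoid(0) = 0.5.
fm = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
out = se_block(fm, w1=[[0.0, 0.0]], b1=[0.0], w2=[[0.0], [0.0]], b2=[0.0, 0.0])
print(squeeze(fm), out)
```

With trained weights, the sigmoid outputs differ per channel, so informative channels are amplified relative to the others.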

Bone age assessment using dense convolutional neural network-DenseNet
DenseNet is a convolutional neural network that uses dense blocks to maximize the information flow between layers: each layer takes the feature maps of all previous layers as input and passes its own feature maps to all subsequent layers [29]. Similarly, DenseNet is exploited to perform bone age assessment on the proposed dataset, defined as
$$\mathrm{DenseNet}(X_l) = F\left[\mathrm{SC}(X_0, \cdots, X_{l-1}) + X_0, \cdots, X_{l-1}\right], \qquad (3.3)$$
where SC and F denote the skip connection and convolution operation, respectively, as shown in Figure 5.
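Because each layer receives the feature maps of all previous layers, the number of channels grows linearly inside a dense block. The sketch below traces this growth for the standard DenseNet-121 configuration (64 initial channels, growth rate 32, blocks of 6, 12, 24, and 16 layers, and transition layers that halve the channels). These values come from the general DenseNet design [29] rather than from this paper, so treating them as the configuration used here is an assumption.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Each layer adds `growth_rate` channels to the concatenated input."""
    return in_channels + num_layers * growth_rate


def densenet_channel_trace(init_channels=64, block_config=(6, 12, 24, 16), growth_rate=32):
    """Return the channel count at the output of every dense block."""
    trace = []
    channels = init_channels
    for index, num_layers in enumerate(block_config):
        channels = dense_block_channels(channels, num_layers, growth_rate)
        trace.append(channels)
        if index != len(block_config) - 1:
            channels //= 2  # a transition layer halves the number of channels
    return trace


print(densenet_channel_trace())
```

The last value of the trace is the size of the feature vector that would be handed to the multi-layer perceptron after global pooling.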

Bone age assessment using efficient convolutional neural network-EfficientNet
EfficientNet is an adaptive convolutional neural network that optimizes the architecture by searching for the appropriate network width, depth, and input resolution [30], and it shows excellent performance on computer vision tasks [41,42]. Motivated by this, EfficientNet is used as the backbone for bone age assessment on the proposed articular surface and epiphysis datasets, where the MBConv module, containing the skip connection and squeeze-excitation, is applied to generate fine feature maps from the raw image.
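The scaling rule behind EfficientNet can be summarized as follows. This is the compound scaling formulation from the EfficientNet paper [30], restated as a sketch; the symbols $d$, $w$, $r$, $\phi$, $\alpha$, $\beta$, and $\gamma$ are not used elsewhere in this article.

```latex
% d: depth multiplier, w: width multiplier, r: resolution multiplier,
% \phi: compound coefficient chosen by the user,
% \alpha, \beta, \gamma: constants found by a small grid search.
\[
  d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi},
  \qquad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
  \quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1 .
\]
```

Larger variants such as EfficientNet-b4 correspond to a larger $\phi$, so depth, width, and resolution are increased together rather than one at a time.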

Bone age assessment using cross stage partial network-CSPNet
The Cross Stage Partial Network (CSPNet) is designed to provide rich gradient combinations while reducing the computation of deep learning models by splitting the gradient flow across different network paths [31]. Thanks to its excellent performance on computer vision tasks [43], CSPNet is employed to perform bone age assessment on the proposed dataset, as shown in Figure 7. Specifically, the feature maps of each stage are separated into two parts: one part passes through a dense block and a transition layer, and the other part is then integrated with the transmitted feature maps and passed into the next stage.
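The cross stage partial idea can be sketched on a toy example in which every channel is reduced to a single number for readability. The functions `toy_dense_block` and `toy_transition` are hypothetical stand-ins for real convolutional layers and do not reproduce the layers of CSPNet [31]; only the split, process, and merge pattern is the point.

```python
def split_channels(channels):
    """Split the channels into a bypass part and a part to be processed."""
    half = len(channels) // 2
    return channels[:half], channels[half:]


def toy_dense_block(channels, num_layers=2):
    """Toy dense block: each layer appends one channel computed from all existing ones."""
    features = list(channels)
    for _ in range(num_layers):
        features.append(sum(features) / len(features))
    return features


def toy_transition(channels, keep):
    """Toy transition layer: keep the last `keep` channels (stands in for a 1x1 convolution)."""
    return channels[-keep:]


def csp_stage(channels):
    """Process only one part of the channels and merge it with the untouched part."""
    bypass, to_process = split_channels(channels)
    processed = toy_transition(toy_dense_block(to_process), keep=len(to_process))
    return bypass + processed


print(csp_stage([1.0, 2.0, 3.0, 4.0]))
```

Only half of the channels go through the dense block, which is where the reduction in computation comes from, while the bypassed half gives the gradient a second, shorter path.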

Evaluation metric
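For all radiographs, the mean absolute error (MAE) and root mean square error (RMSE) between the predicted and actual ages are used as evaluation metrics [20], defined as
$$\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} \left| S_f(n) - \hat{S}_f(n) \right|$$
and
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left( S_f(n) - \hat{S}_f(n) \right)^2},$$
where $S_f(n)$ represents the actual bone age of the $n$-th radiograph, $\hat{S}_f(n)$ represents the predicted bone age, and $N$ is the number of radiographs.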
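Both metrics are straightforward to compute. The following minimal sketch uses plain Python and four made-up bone ages in months; the values are illustrative and are not results from the paper.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted bone ages (months)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between actual and predicted bone ages (months)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical ages in months; the absolute errors are 3, 4, 0, and 5.
actual_ages = [120.0, 60.0, 156.0, 94.0]
predicted_ages = [117.0, 64.0, 156.0, 99.0]
print(mae(actual_ages, predicted_ages), rmse(actual_ages, predicted_ages))
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few radiographs have large errors, which is why both are reported.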

Implementation details
These five deep learning models are implemented in PyTorch 1.8.0 [44] on a workstation with four NVIDIA GeForce GTX 1080Ti GPUs. The SGD optimizer [45] with an initial learning rate of 5e-5 is employed to train the networks, i.e., ResNet50, SENet, DenseNet-121, EfficientNet-b4, and CSPNet, while the epoch, batch size, and momentum are set as 8, 100, and 0.9, respectively; StepLR is used as the learning rate scheduler, and L1Loss is used as the loss function. In addition, all color images are scaled to 576 × 876 and normalized by subtracting the means of 0.485, 0.456, and 0.406 and dividing by the standard deviations of 0.229, 0.224, and 0.225 for the three channels, respectively. Furthermore, horizontal flip, vertical flip, and random rotation are applied to the proposed datasets for data augmentation.
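The normalization and the step-wise learning rate decay described above can be written out in a few lines. This sketch avoids any framework dependency and rests on stated assumptions: pixel values are taken to lie in [0, 1] before normalization, the initial learning rate is read as 5e-5, and the `step_size` and `gamma` values are hypothetical because the text does not report them.

```python
MEAN = (0.485, 0.456, 0.406)  # per-channel means reported in the text
STD = (0.229, 0.224, 0.225)   # per-channel standard deviations reported in the text


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose values are assumed to lie in [0, 1]."""
    return tuple((value - mean) / std for value, mean, std in zip(rgb, MEAN, STD))


def step_lr(initial_lr, epoch, step_size, gamma):
    """Step decay: multiply the learning rate by `gamma` every `step_size` epochs."""
    return initial_lr * gamma ** (epoch // step_size)


def l1_loss(predicted, target):
    """Mean absolute difference, the quantity minimized by an L1 loss."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


# step_size=10 and gamma=0.1 are illustrative assumptions, not values from the paper.
for epoch in (0, 9, 10, 25):
    print(epoch, step_lr(5e-5, epoch, step_size=10, gamma=0.1))
print(normalize_pixel((0.485, 0.456, 0.406)))
print(l1_loss([100.0, 50.0], [96.0, 56.0]))
```

Training with an L1 loss optimizes the same quantity that is later reported as the MAE, so the training objective and the main evaluation metric are aligned.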

Performances on articular surface dataset
To validate the effectiveness of the proposed articular surface dataset, five networks are used for bone age assessment. The experimental results are shown in Table 1, and it can be seen that the CSPNet [31] achieves the best performance for bone age assessment, with an MAE of 7.34 months and an RMSE of 9.75 months, demonstrating that the proposed articular surface dataset can be an effective database for bone age assessment using deep learning models. Furthermore, we also compare the proposed method with two automatic bone age assessment methods (i.e., SIMBA [46] and Chen et al. [47]) and two radiologists, and the results are listed in Table 2. It can be inferred that the proposed method performs better and faster than the automatic methods and manual diagnosis: the average reading times of the two radiologists are 37.22 s and 105.39 s, respectively, while that of ours is 0.07 s, showing that the proposed method is more accurate and faster than the radiologists.
In addition, the test results for each image are shown in Figure 9(a), and the test results of different age groups are plotted in Figure 9(c), where the first to fifth rows represent the results of the DenseNet-121 [29], ResNet50 [27], SENet [28], EfficientNet-b4 [30], and CSPNet [31], respectively. It can be noted that all models show high sensitivity to subjects between 9 and 192 months because their features are easily discriminated on the radiograph. Besides, DenseNet generates some slightly larger errors between 192 and 224 months, but the CSPNet [31] can achieve an MAE of 7.32 months, demonstrating the effectiveness of the proposed articular surface dataset.

Performances on epiphysis dataset
Similarly, these five popular deep learning networks are applied for bone age assessment on the proposed epiphysis dataset. The results are given in Table 1, from which we can infer that the MAE and RMSE obtained by all models are less than 10 months, which indicates that the epiphysis dataset is beneficial for bone age assessment. Moreover, the computer-aided methods of SIMBA [46] and Chen et al. [47], and two manual assessments, are used as the comparison methods to validate the superiority of the proposed method on the epiphysis dataset, and the test results are given in Table 2. From it we can see that the proposed method achieves the best performance on the epiphysis dataset at 0.07 s per scan, demonstrating that it is suitable for clinical practice. Moreover, the specific test results on the epiphysis dataset are shown in Figure 9, where the results of the DenseNet-121 [29], ResNet50 [27], SENet [28], EfficientNet-b4 [30], and CSPNet [31] are given in the first to fifth rows, respectively. It can be observed from Figure 9(b) that most of the models produce results close to the ground truth, indicating their excellent performance for bone age assessment. Furthermore, it can be seen from Figure 9(c) that the CSPNet [31] obtains an MAE of 13.2 months between 192 and 224 months, an improvement of 9.39 months compared with DenseNet, which shows that the performance of bone age assessment on the epiphysis dataset can be advanced with the optimization of the deep learning models.

Ablation studies of skeletal feature regions
In clinical diagnosis, radiologists mainly assess bone age by comparing bone characteristic areas in hand radiographs with a reference. In order to validate the impact of these bone characteristic areas, a number of ablation experiments are conducted on the feature regions, including the hand articular surface (HAS) region, hand epiphysis (HE) region, and hand epiphyseal line (HEL) region, as shown in Figure 8.

Impact of hand articular surface region
In medical imaging, radiologists often make bone age diagnoses by comparing hand articular surface regions. Moreover, the CSPNet [31] shows excellent performance on the proposed datasets. Motivated by this, we explore bone age assessment in the hand articular surface region using CSPNet [31]. The results are shown in Table 3. It can be seen that the CSPNet [31] achieves an MAE of 7.34 months and an RMSE of 9.35 months in the hand articular surface region, which shows that the hand articular surface region can be used for bone age assessment in clinical practice.

Impact of hand epiphysis region
The epiphysis is one of the important indicators for evaluating the growth and development of children. In general, the epiphyseal area has a linear relationship with age, and it can be used to assess bone age clinically [48]. Motivated by this, bone age assessment is conducted on the hand epiphysis region using CSPNet [31] for bone feature extraction. The results are given in Table 3. It can be observed that the CSPNet [31] achieves an MAE of 7.71 months and an RMSE of 9.95 months in the hand epiphysis region, demonstrating that it is feasible to use a neural network to evaluate bone age on the hand epiphysis region.

Impact of hand epiphyseal line region
Inspired by the above two feature regions, the hand epiphyseal line region is employed for bone age assessment using CSPNet [31]. To evaluate its effectiveness, we conduct a large number of experiments on the hand epiphyseal line region, and the results are listed in Table 3. It can be seen that the CSPNet [31] achieves an MAE of 12.28 months and an RMSE of 16.11 months in the hand epiphyseal line region, indicating that this region also carries information useful for bone age assessment in clinical practice, although the errors are larger than those of the other two regions in Table 3.
In summary, bone age can be evaluated in both the hand articular surface and epiphysis regions, but the performance of CSPNet [31] on the articular surface is better than that on the epiphysis. Therefore, we recommend the hand articular surface region as the benchmark for bone age assessment using deep learning models.

Limitations and future works
As shown in the experiments, five deep convolutional neural networks are employed for bone age prediction on the proposed articular surface and epiphysis datasets, and they are faster and more accurate than the radiologists. However, there are still some limitations to the proposed method. First, the annotations must be provided by radiologists before the bone age can be obtained from the convolutional neural networks. Second, the proposed articular surface and epiphysis datasets in this study have only 1,068 images, which is small compared with the original RSNA dataset of 14,236 images. Third, the proposed datasets mainly focus on the articular surface and epiphysis in hand radiography, so some regions of interest, e.g., the phalanx, are missed, and including them would help to improve the performance of bone age assessment.
In future work, three key measures will be taken. First, a segmentation network will be required as a pre-processing step to make the proposed method usable in practice. Then, more work will be performed to expand the proposed articular surface and epiphysis datasets so that they are more beneficial to clinical practice. Finally, more characteristic regions of interest will be incorporated into our research to improve the performance of bone age diagnosis.

Conclusions
In this paper, deep convolutional neural networks are employed for bone age prediction on the proposed articular surface and epiphysis datasets, where the convolutional neural networks are exploited to improve the accuracy and efficiency of bone age assessment. Extensive experiments on the proposed datasets demonstrate their effectiveness for bone age assessment using deep learning models, further assisting radiologists in clinical diagnosis.

Conflicts of interest
No potential conflict of interest.