Dental Age Estimation Based on X-ray Images

Chronological age estimation using panoramic dental X-ray images is an essential task in forensic sciences. Various statistical approaches have been proposed that consider the teeth and mandible. However, building automated dental age estimation based on machine learning techniques needs more research effort. In this paper, an automated dental age estimation approach is proposed using transfer learning. In the proposed approach, features are extracted using two deep neural networks, namely AlexNet and ResNet. Several classifiers are used to perform the classification task, including decision tree, k-nearest neighbor, linear discriminant, and support vector machine. The proposed approach is evaluated with a number of suitable performance metrics on a dataset that contains 1429 dental X-ray images. The obtained results show that the proposed approach has a promising performance.

To avoid the problems of facial age estimation, forensic scientists usually estimate the age of an individual by estimating the development stage of a tooth and linking the estimated stage to the most probable age. Like every part of the human body, teeth pass through different development stages. Dental X-ray images are the main and most commonly used technique in dental age estimation performed by forensic scientists [Čular, Tomaić, Subašić et al. (2017)]. Dental X-ray images contain all the information needed to assess dental development [Ozaki and Motokawa (2000)]. One method, presented by Haavikko, employs sketches of teeth at different development stages and a table that associates each stage with an age. Because of inter-sex variation, age estimates are provided for males as well as females. However, this manual age estimation is a tedious, time-consuming, and subjective task. Hence, automated dental age estimation is needed to improve the accuracy and repeatability of age estimation [Čular, Tomaić, Subašić et al. (2017)]. Traditional dental age estimation methods may involve several steps, including image preprocessing, segmentation, feature extraction, and classification or regression. The objective of these methods is to determine the age group of a person in the case of classification, or the exact age of a person in the case of regression. The success of each step in these methods is highly dependent on the success of the preceding steps. In other words, the success of the feature extraction step depends on the success of the segmentation step. Similarly, the success of the classification or regression step depends on the success of both the segmentation and feature extraction steps. In addition, segmentation and feature extraction are non-trivial, problem-dependent tasks [Razzak and Naz (2017)].
On the other hand, deep learning approaches have been effectively employed to solve many problems in several research areas, including computer vision, natural language processing, and object recognition. Deep learning-based methods are called end-to-end learning-based methods, in which deep neural networks such as convolutional neural networks work directly on the input images and produce the required output without the need to perform intermediate steps such as segmentation and feature extraction. However, designing and training deep neural networks is a difficult and time-consuming process. So, rather than designing and training deep neural networks from scratch, it is possible to use pre-trained deep networks to perform the required tasks. This is known as transfer learning. According to Castelluccio et al. [Castelluccio, Poggi, Sansone et al. (2015)], there are two ways to apply the concept of transfer learning. The first obtains the features extracted from the input images by reading the values of the last fully connected layer of the network [Athiwaratkun and Kang (2015)]. Then, it employs another classifier to perform the classification process. The second way involves modifying the structure of the network by dropping out high-level layers. This process is called network fine-tuning. In the proposed work, the first way is used to build the proposed automated dental age estimation approach. In this paper, a novel automated dental age estimation approach is proposed based on dental X-ray images. The objective of the proposed approach is to determine the age group of a person, among seven different age groups, using their dental X-ray images. The proposed approach embraces three main steps: image preprocessing, feature extraction, and classification.
The feature extraction step is performed using two deep convolutional neural networks, namely AlexNet and ResNet-101, while the classification step is performed using a number of well-known classifiers, including decision tree, k-nearest neighbor, linear discriminant, and support vector machine. The remaining sections of this paper are organized as follows: Section 2 covers different age estimation methods. Section 3 describes the proposed automated dental age estimation in detail. Section 4 includes the implementation details and results analysis. Finally, Section 5 contains the conclusion and future work.

Related works
Several approaches have been proposed in order to achieve accurate dental age estimation. This section covers some of the recent works in the field. Hemalatha et al. [Hemalatha and Rajkumar (2018)] have presented a classification model for dental age estimation for Indian children based on Demirjian's approach. Given the RMI tooth image, the proposed approach starts by preprocessing the images for noise removal and smoothing. Then, the teeth are segmented using an Active Contour Model (ACM) and various features are extracted from the segmented teeth. The extracted features include GLCM features, Hausdorff distance, geometric features, etc. Finally, a fuzzy neural network is used to perform the classification process. A dataset that consists of images belonging to 100 healthy persons with ages between 4 and 18 years is used to evaluate the proposed approach. The obtained results show that the proposed approach has 89% accuracy. Štepanovský et al. [Štepanovský, Ibrová and Buk (2017)] have compared the performance of twenty-two age estimation approaches in dental age estimation in terms of accuracy and complexity. The used dataset contains the dental X-ray images of 976 persons (662 boys and 314 girls). The experimental results show that the best methods are a tabular multiple linear regression model, an M5P tree model, and a support vector machine (SVM) model with a polynomial kernel function. Čular et al. [Čular, Tomaić, Subašić et al. (2017)] have introduced a dental age estimation method based on panoramic X-ray images. The proposed method employs both an active shape model (ASM) and an active appearance model (AAM) to detect the outer contour of teeth. Then, statistical models are used for feature extraction and a neural network for age estimation. A dataset that includes X-ray images of 203 persons is used to validate the performance of the proposed method.
The obtained results show that the ASM model and the AAM model have a mean absolute error (MAE) of 2.481 and 2.483, respectively. Sironi et al. [Sironi, Taroni, Baldinotti et al. (2018)] have proposed an age estimation method based on assessing the volume of the pulp chamber. The pulp chamber volume is measured with the help of 3-D cone beam computed tomography (CBCT) images. In the proposed work, a Bayesian network is used for dental age estimation. The proposed method is evaluated using a dataset that includes information on 286 healthy persons. The obtained results show that the proposed method has a promising performance in terms of accuracy, bias, and sensitivity. Tao et al. [Tao, Wang, Wang et al. (2019)] have proposed a dental age estimation approach in which the age is predicted using a multi-layer perceptron. Leave-one-out cross-validation is employed during the training process in order to address the overfitting problem. The experiments are conducted on a dataset that contains images belonging to 1636 persons (787 males and 849 females). The obtained results show the superiority of the proposed method over traditional methods, including Demirjian's method and Willems' method, in terms of RMSE, MSE, and MAE. de Back et al. [de Back, Seurig, Wagner et al. (2019)] have presented a dental age estimation approach based on dental X-ray images. The proposed approach employs a Bayesian convolutional neural network for both age prediction and uncertainty estimation. It has been evaluated by conducting experiments on a dataset that contains 12000 dental X-ray images. The obtained results show that the proposed approach has a concordance correlation coefficient of 0.91. Kim et al. [Kim, Bae, Jung et al. (2019)] have introduced another automated dental age estimation approach that depends on deep learning. In the proposed approach, a convolutional neural network (CNN) is employed for age estimation.
The used dataset contains dental X-ray images of 9435 individuals (4963 male, 4472 female) that have been organized into three age groups. The obtained results show that the proposed approach has a good performance. Asif et al. [Asif, Nambiar, Mani et al. (2019)] have proposed a statistical dental age estimation approach that depends on volumetric analysis of the pulp/tooth ratio. The authors have employed simple linear regression and Pearson correlation analysis in order to perform the intended task. The used dataset contains 300 CBCT scans of 153 males and 147 females, classified into five age groups. The obtained results show that the proposed approach has an MAE of 6.48. Farhadian et al. [Farhadian, Salemi, Saati et al. (2019)] have introduced another dental age estimation approach by means of the pulp-to-tooth ratio. The proposed approach employs a neural network to perform the age estimation task. Experiments conducted on a dataset of 300 CBCT scans have shown the superiority of the neural network over a regression model. The proposed approach has an RMSE of 4.4 and an MAE of 4.12.

Proposed work
In this section, an automated dental age estimation approach is proposed based on dental X-ray images. The proposed approach aims to determine the age group of persons using their dental X-ray images. As shown in Fig. 1, the proposed approach consists of three main steps: image preprocessing, feature extraction, and classification. In the image preprocessing step, the dental X-ray images are converted to the RGB color model and resized to a fixed size. Additionally, a data augmentation process is applied to enlarge the dataset. In the feature extraction step, the concept of transfer learning is adopted. Two pre-trained deep neural networks, namely AlexNet and ResNet, are employed to extract the discriminant features. Finally, a number of classification models are used to perform the classification task, including Decision Tree (DT), K-Nearest Neighbor (K-NN), Linear Discriminant (LD), and Support Vector Machine (SVM). Detailed descriptions of the different steps are given in the following subsections.

Image preprocessing
In this step, three main operations are performed to make the dental X-ray images ready for the subsequent steps. First, the dental X-ray images are converted into the RGB color model. Then, the images are resized to fit the input requirements of the deep neural networks used. The image size is set to 227×227 for AlexNet and to 224×224 for ResNet. Finally, a data augmentation step is performed to build a larger dataset that is suitable for deep neural networks. In this step, two operations are performed, namely translation and reflection. In the translation operation, the dental X-ray images are randomly shifted along the X-axis and Y-axis with a shift value bounded by the interval [-30,30]. In the reflection operation, the dental X-ray images are mirrored about the vertical axis. An example of the data augmentation process is shown in Fig. 2. The left column contains the original dental X-ray images. The middle column contains the translated versions. The last column contains the mirrored versions. The first row contains a dental X-ray image of a female, while the second row contains a dental X-ray image of a male.
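The preprocessing and augmentation operations described above can be illustrated with a minimal Python sketch. It is not the Matlab implementation used in this work: images are represented as plain lists of rows, the function names are hypothetical, the shift is assumed to be measured in pixels, exposed pixels are assumed to be filled with zeros, and mirroring about the vertical axis is assumed to mean a left-right flip.

```python
import random


def gray_to_rgb(image):
    """Replicate each grayscale pixel into three identical channels."""
    return [[(p, p, p) for p in row] for row in image]


def translate(image, dx, dy, fill=0):
    """Shift an image by dx columns and dy rows.

    Pixels exposed by the shift are set to `fill` (an assumption; the
    fill policy is not stated in the text).
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = image[src_y][src_x]
    return shifted


def reflect_left_right(image):
    """Mirror an image about its vertical axis."""
    return [list(reversed(row)) for row in image]


def random_translation(image, max_shift=30, rng=random):
    """Shift by independent random offsets drawn from [-max_shift, max_shift]."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(image, dx, dy)
```

Each original image then yields a translated copy and a mirrored copy, as in the columns of Fig. 2.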

Feature extraction
The Convolutional Neural Network (CNN) is a popular deep neural network architecture that has achieved many breakthroughs in fields including machine learning and computer vision. CNNs can successfully accomplish their work without being affected by tilting, translation, and scaling [Yu, Chang, Yang et al. (2017)]. CNNs usually include three layer types: convolutional layers, pooling layers, and fully connected layers (see Fig. 3). The role of a convolutional layer is to compute a weighted sum, add a bias value to it, and apply an activation function called the rectified linear unit (ReLU), defined in Eq. (1), to the result. The objective of the pooling layers is to manage overfitting by decreasing the number of features obtained from the convolutional layer. Finally, the fully connected layers gather all the features of the descriptor to be classified by the last layer [Thanh, Vununu, Atoev et al. (2018)].

ReLU(x) = max(0, x)    (1)

In the proposed work, two well-known pre-trained convolutional neural networks, AlexNet [Krizhevsky, Sutskever and Hinton (2012)] and ResNet-101, are used to perform the feature extraction step using the concept of transfer learning. Generally, transfer learning is used for a number of reasons. First, training a CNN from scratch using random initial values is difficult in the absence of large datasets. Hence, using the weights of a pre-trained network as initial values can be useful in addressing many of the problems at hand. Second, training a very deep network from scratch is a time-consuming process that needs sophisticated machines with expensive GPUs. Finally, there is no clear theoretical guidance that can help in selecting the appropriate topology, training method, parameter values, etc.
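The convolution, ReLU, and pooling operations described at the start of this subsection can be sketched for a single channel as follows. This is an illustrative example only: the function names are hypothetical, the convolution uses no padding and a stride of one, and the pooling windows are assumed not to overlap, none of which is specified in the text.

```python
def relu(x):
    """Rectified linear unit of Eq. (1): max(0, x)."""
    return max(0.0, x)


def conv2d_relu(image, kernel, bias=0.0):
    """Weighted sum over each window, plus bias, followed by ReLU.

    As in most CNN libraries, the kernel is applied without flipping
    (cross-correlation). No padding, stride 1.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            weighted_sum = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(relu(weighted_sum + bias))
        output.append(row)
    return output


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size-by-size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            row.append(
                max(
                    feature_map[i + a][j + b]
                    for a in range(size)
                    for b in range(size)
                )
            )
        pooled.append(row)
    return pooled
```

Pooling a 4×4 feature map with a 2×2 window leaves four values, which shows how pooling reduces the number of features passed to later layers.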

Feature extraction using AlexNet
AlexNet is a popular CNN that was proposed by Krizhevsky et al. [Krizhevsky, Sutskever and Hinton (2012)] to compete in the ILSVRC-2010 challenge for classifying the ImageNet database. It contains five convolutional layers and three fully connected layers, as well as max-pooling layers. All eight of these layers need to be trained. In AlexNet, the overfitting problem is addressed in a number of ways, including local response normalization, data augmentation, and the dropout approach, in which the output of each hidden neuron is set to zero with a probability of 0.5. Dropout is applied to the first two fully connected layers. In the feature extraction step, the last three layers of the original AlexNet are removed and replaced with three other layers that suit the classification problem at hand. The eliminated layers are the last fully connected layer, the softmax layer, and the output layer. The features are obtained from the last fully connected layers after completing the training process using our dataset. The length of each feature vector is 4096.
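The dropout mechanism mentioned above can be sketched in a few lines of Python. This is a simplified illustration with hypothetical function names, not the network's actual implementation: during training each activation is zeroed with probability 0.5, and at test time all activations are kept but scaled by the keep probability, following the scheme described by Krizhevsky et al.

```python
import random


def dropout_train(activations, drop_probability=0.5, rng=None):
    """Zero each activation independently with the given probability."""
    rng = rng if rng is not None else random.Random()
    return [
        0.0 if rng.random() < drop_probability else a
        for a in activations
    ]


def dropout_test(activations, drop_probability=0.5):
    """Keep every activation but scale it by the keep probability."""
    keep_probability = 1.0 - drop_probability
    return [a * keep_probability for a in activations]
```

Because a different random subset of neurons is silenced at every training step, no single neuron can rely on the presence of particular other neurons, which is the intuition behind dropout as a regularizer.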

Feature extraction using ResNet-101
Deep residual networks are extremely deep architectures that have achieved high accuracy and good convergence behavior in many recognition and classification problems. Deep residual networks have been proposed to address the problem of accuracy degradation that occurs when deep networks start to converge. The idea behind residual learning is shown in Fig. 4. Put simply, every few stacked layers fit a residual mapping rather than directly fitting the required underlying mapping. In other words, if the required underlying mapping is denoted by H(x), the stacked nonlinear layers fit another mapping, defined as F(x)=H(x)-x. Hence, the original mapping is reformulated as F(x)+x, which can be realized using feed-forward neural networks with shortcut connections.

Figure 4: The idea of residual learning
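The reformulation H(x) = F(x) + x can be made concrete with a minimal sketch. The names below are hypothetical, vectors are plain Python lists, and the activation that real networks typically apply after the addition is omitted for brevity.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where residual_fn computes the residual mapping F.

    The identity shortcut simply adds the input back to the output of
    the stacked layers, so both must have the same length.
    """
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut requires F(x) and x of equal length")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    """A residual mapping that outputs zeros, i.e. F(x) = 0."""
    return [0.0 for _ in x]
```

With `zero_residual`, the block reduces to the identity mapping, which illustrates why adding more residual blocks should not, in principle, make a network harder to fit than its shallower counterpart.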
ResNet is a very deep network that is designed and built on the principle of residual learning. It won the ILSVRC 2015 image classification challenge. Several variants belonging to the ResNet family have been designed, including ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. The number in the network name denotes the number of layers in the network. In the proposed work, the pre-trained residual network ResNet-101 is used to perform the feature extraction step. The length of the resulting feature vector is 2048.

Classification
In the proposed dental age estimation approach, the classification step is performed using a number of well-known classification models. The goal of these classification models is to determine the age group of a person based on the feature vector extracted from their dental X-ray image. The classifiers used include decision tree (DT) [Quinlan (1986)], K-nearest neighbor (K-NN) [Cover and Hart (1967)] with Euclidean distance and K set to 1, linear discriminant (LD) [Zhao, Chellappa and Nandhakumar (1998)], and support vector machine (SVM) [Vapnik (1998)] with different kernel functions.
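As an illustration of the simplest of these classifiers, the following sketch implements a nearest-neighbor rule with Euclidean distance and K = 1. It is a toy example with hypothetical function names and made-up two-dimensional feature vectors; in the proposed approach the feature vectors have 4096 or 2048 entries and the classifier is provided by Matlab.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def predict_1nn(train_features, train_labels, query):
    """Return the label of the training vector closest to the query."""
    best_index = min(
        range(len(train_features)),
        key=lambda i: euclidean_distance(train_features[i], query),
    )
    return train_labels[best_index]


# Toy usage with invented features and age-group labels.
features = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
labels = ["group_1", "group_2", "group_3"]
predicted = predict_1nn(features, labels, [1.0, 8.0])
```

With K = 1 there are no parameters to fit: the training set itself is the model, and prediction cost grows with the number of stored feature vectors.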

Implementation and experiments
This section includes the implementation details of the proposed dental age estimation approach. Additionally, it describes the used dataset. Moreover, several experiments are conducted to evaluate the proposed approach in terms of a number of performance measures.

Dataset description
The dataset used is obtained from Ebtisama clinic in Kuwait. It consists of 1429 dental X-ray images that belong to persons of different genders and different age groups. A sample of the dental X-ray images from the dataset is shown in Fig. 5.

Figure 5: A sample from the used dataset
In the proposed dental age estimation approach, eight age groups are considered. The details of the different age groups and the number of images contained in each group are shown in Tab. 1.

Implementation and experimental results
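The proposed automated dental age estimation is implemented using Matlab 2018a. The performance of the proposed approach is assessed using the following performance measures: accuracy, specificity, precision, recall, and F-measure. These measures are computed from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Specificity = TN / (TN + FP)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = 2 * Precision * Recall / (Precision + Recall)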
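The measures reported in this section (accuracy, specificity, precision, recall, and F-measure) can be computed from a confusion matrix as sketched below. The sketch assumes a one-vs-rest treatment of each age group, which is a common convention for multi-class problems but is not stated explicitly in the text; the function names are hypothetical.

```python
def one_vs_rest_counts(confusion, k):
    """Return (TP, TN, FP, FN) for class k.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[i][k] for i in range(n_classes)) - tp
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def class_metrics(confusion, k):
    """Compute the five performance measures for class k."""
    tp, tn, fp, fn = one_vs_rest_counts(confusion, k)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "specificity": safe_div(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
    }
```

Averaging the per-class values over all age groups gives one summary figure per measure; whether the reported tables use such an average is not specified here.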
The classifiers are evaluated using 10-fold cross-validation. The DT classifier is implemented with the maximum number of splits set to 20. The K-NN classifier is implemented with K=1. The standard SVM works on two classes only, while the proposed work needs a multi-class classifier. So, a modified version of the standard SVM called Multi-Class Support Vector Machine [Weston and Chris (1998)] is employed to build a classifier that is capable of differentiating among several classes. Moreover, several kernel functions are used with the SVM classifier, including the Gaussian, cubic, and quadratic kernel functions. AlexNet is trained with the following settings: the patch size is equal to 5, the number of epochs is equal to 6, and the learning rate is equal to 1e-4. Similarly, ResNet is trained with the following settings: the patch size is equal to 10, the number of epochs is equal to 6, and the learning rate is equal to 1e-4. All the experiments are executed on an NVIDIA GeForce 920M GPU with 4 GB of memory. The obtained results are shown in Tabs. 2-13. In addition, a summary of the obtained results for the compared methods is shown in Tab. 14. Based on Tab. 14, it is observed that the K-NN and DT classifiers have the best accuracy values with both AlexNet-based and ResNet-based features, while the SVM with the quadratic kernel function has the worst performance in terms of classification accuracy. Also, it is noticed that the K-NN classifier with AlexNet-based features provides the best performance in terms of specificity, followed by the DT classifier.
Regarding precision, the best performance is achieved by the DT classifier with either AlexNet-based or ResNet-based features and by the K-NN classifier with ResNet-based features. Finally, the best values of recall and F-measure are achieved by the K-NN classifier with AlexNet-based features and the K-NN classifier with ResNet-based features, respectively. Hence, the K-NN classifier is the best classifier, followed by the DT classifier, while the SVM with the quadratic kernel function is the worst. Also, based on the obtained results, the AlexNet-based features are better than the ResNet-based features. The obtained results are visualized in Fig. 6.
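The 10-fold cross-validation protocol used to evaluate the classifiers can be sketched as follows. This is a simplified illustration with hypothetical function names: the folds are contiguous and unshuffled, whereas a practical evaluation would normally shuffle (and possibly stratify) the samples first, and `fit_predict` stands for any classifier that trains on one set and predicts labels for another.

```python
def k_fold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k folds of near-equal size."""
    if n_samples < k:
        raise ValueError("need at least one sample per fold")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(features, labels, fit_predict, k=10):
    """Return the mean accuracy over k held-out folds.

    fit_predict(train_x, train_y, test_x) must return one predicted
    label per test sample.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(features), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(features)) if i not in held_out]
        predictions = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

For the 1429 images in the dataset, `k_fold_indices(1429, 10)` produces nine folds of 143 images and one fold of 142, so every image is used for testing exactly once.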

Conclusion and future work
Transfer learning has proved its effectiveness in many machine learning and object recognition problems. In this paper, a transfer learning-based dental age estimation approach has been proposed using dental X-ray images. The proposed approach can classify dental X-ray images into eight age groups. In the proposed approach, the dental X-ray images are preprocessed and the features are extracted using two well-known deep neural networks, namely AlexNet and ResNet. Finally, several classification models have been employed to perform the classification task, including decision tree (DT), linear discriminant (LD), k-nearest neighbor (K-NN), and support vector machine (SVM). Based on the experimental results, the AlexNet-based features are better than the ResNet-based features. Also, the K-NN classifier is the best in terms of the different performance metrics compared to the other classifiers. In the future, we intend to evaluate the proposed approach using a larger dataset and other classification models.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.