Sex Determination of Three-Dimensional Skull Based on Improved Backpropagation Neural Network

Sex determination from skeletons is a significant step in forensic anthropological analysis. Previous skeletal sex assessments relied on anthropologists' subjective visual evaluation of sexually dimorphic features. In this paper, we propose an improved backpropagation neural network (BPNN) to determine sex from the skull. A momentum term is added to improve the convergence speed and to avoid falling into local minima. A regularization operator is used to ensure the stability of the algorithm, and the Adaboost integration algorithm is used to improve the generalization ability of the model. A total of 267 skulls were used in the experiment, of which 153 were female and 114 were male. Six characteristics of the skull measured by computer-aided measurement are used as the network inputs. Two BPNN structures are tested, namely, [6; 6; 2] and [6; 12; 2], of which the [6; 12; 2] model has the better average accuracy. The classification accuracy is best when η = 0.5 and α = 0.9: the accuracy rate of the training stage is 97.232%, and the mean squared error (MSE) is 0.01; the accuracy rate of the testing stage is 96.764%, and the MSE is 1.016. Compared with traditional methods, the proposed method has stronger learning ability, faster convergence speed, and higher classification accuracy.


Introduction
Forensic anthropologists throughout the world face a tough battle in keeping up with changing criminal behaviour. One way to counteract criminal trends is to maximize the evidence that can be gleaned from incomplete and often fragmentary skeletal material. In this regard, sex determination remains a critical aspect of human identification from the skeleton in forensic cases, as it reduces the number of possible matches by 50% whilst also serving as baseline data for identification procedures such as facial reconstruction [1]. Therefore, sex identification for an unknown skeleton is important work. According to experience and previous studies [2][3][4], sex classification using the morphological characteristics of the pelvis has the highest accuracy. However, in most cases, only a complete skull rather than a whole skeleton is available, and because the skull is composed of hard tissue, it is easily preserved. Therefore, sex identification from the skull has become a core topic of forensic anthropology. The common sex classification methods are the morphological discriminant method and the measurement discriminant method. Traditional sex identification of the skull mainly depends on anthropologists' visual assessment of sexually dimorphic morphological traits, with conclusions drawn from naked-eye observation and experience. Krogman [5] used the morphological method to identify the sex of 750 skulls of known sex, and the correct rate was 82-87%. Ramsthaler et al. [6] used the kappa statistic to quantify the disagreement between two observers in visual morphological sex assessment and found a consistency of only 90.8%. With the rapid development of computer technology, computer-aided measurement is increasingly used to extract skull features. Shui et al. [7] selected 133 three-dimensional skull models from the Xi'an area, measured 14 skull indexes with computer software, established multiple sex discriminant functions with the stepwise Fisher method, and carried out a back-substitution test. The male discrimination rate was 87.5%, and the female discrimination rate was 86.7%. Liu [8] analyzed the feature point data of 142 orthotopic skull X-rays of Han individuals; using SPSS software, a discriminant regression equation was established, with an accuracy rate of 95%. Franklin et al. [9] used OsiriX software to calibrate 31 skulls of 400 skull reconstructions from Australian CT scans, measured 18 characteristics with MorphDb measurement software, and established a gender discriminant function with a recognition accuracy of 90%. Tanya et al. [10] used Sidexis XG software to measure the maxillary sinus on 50 adult digital skull radiographs. The maxillary sinus index was calculated, and discriminant function analysis was performed. The resulting discriminant equation determined gender with an accuracy of 68%. In summary, the morphological discrimination method is simple and easy to implement, but it depends too much on expert knowledge and subjective experience, lacks a sufficient theoretical basis, and has a low recognition rate. The measurement discrimination method is objective, and its recognition rate is higher, but most such methods rely on discriminant analysis to design prediction rules. However, the results obtained with these prediction models indicate that the relationship between the probability that an individual belongs to a certain sex and the explanatory variables (bone measurements) is not linear [11,12].
To solve these problems, in this paper, we propose a method of sex identification based on an improved BP neural network. It takes the skull features measured by computer software as input and the result of sex classification as output. By learning from the samples, the approximate functional relationship between input and output is determined so as to realize gender classification. This is a nonlinear classification method. Unlike discriminant function analysis (DFA), BPNN does not require distributional assumptions about the variables and is able to model all types of nonlinear functions between the input and output of a model [13]. The advantages of this method are as follows: firstly, it requires no professional qualification; secondly, it can closely approximate the complex nonlinear relationships in skull data; and finally, it can achieve a high recognition rate.

Materials.
This research is carried out on a database of 267 whole-skull CT scans (153 females and 114 males) of volunteers who mostly come from the Uighur ethnic group in the north of China (females aged 18-88 and males aged 20-84). The images of each subject are stored in DICOM format with a size of approximately 512 × 512 × 250. Each 3D skull surface is extracted from the CT images and is represented as a triangle mesh of about 220,000 vertices. All the skulls are substantially complete; that is, each skull contains all the bones from the calvaria to the jaw and has a full set of teeth.
All the samples are transformed into a uniform coordinate system so as to eliminate the inconsistencies in position, pose, and scale caused by data acquisition. The uniform coordinate system is determined by four skull landmarks: left porion, right porion, left (or right) orbitale, and glabella (denoted as $L_p$, $R_p$, $L_o$, and $G$). The Frankfurt plane [14] is determined by the three points $L_p$, $R_p$, and $L_o$. The coordinate origin (denoted as $O$) is the intersection of the line $L_pR_p$ with the plane that contains the point $G$ and is orthogonal to the line $L_pR_p$. We take the line $OR_p$ as the x-axis. The z-axis is the line through the point $O$ in the direction of the normal of the Frankfurt plane. Then, the y-axis is obtained as the cross product of the z- and x-axes. Once the uniform coordinate system is defined, all the prototypic skulls are transformed into it. Finally, the scale of all the samples is standardized by setting the distance between $L_p$ and $R_p$ to one unit, i.e., each vertex $(x, y, z)$ of the skull is scaled to $(x/|L_p - R_p|,\ y/|L_p - R_p|,\ z/|L_p - R_p|)$. One skull in the uniform coordinate system is shown in Figure 1.
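The construction of the uniform coordinate system can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_frame` and `to_uniform` are hypothetical, landmarks are plain 3-tuples, and the sign of the Frankfurt-plane normal depends on the order in which the landmarks are passed to the cross product.

```python
import math

def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def add(a, b):
    return tuple(p + q for p, q in zip(a, b))

def scale(a, s):
    return tuple(p * s for p in a)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def build_frame(Lp, Rp, Lo, G):
    """Hypothetical helper: origin, axes, and scale from the four landmarks."""
    d = unit(sub(Rp, Lp))                      # direction of the line Lp-Rp
    # O is the foot of the perpendicular from G onto the line Lp-Rp,
    # i.e. where the plane through G orthogonal to the line meets it.
    O = add(Lp, scale(d, dot(sub(G, Lp), d)))
    x_axis = unit(sub(Rp, O))                  # line O-Rp (assumes O != Rp)
    # Normal of the Frankfurt plane; its sign is an orientation assumption.
    z_axis = unit(cross(sub(Rp, Lp), sub(Lo, Lp)))
    y_axis = cross(z_axis, x_axis)             # y = z x x
    s = math.sqrt(dot(sub(Lp, Rp), sub(Lp, Rp)))  # |Lp - Rp|
    return O, x_axis, y_axis, z_axis, s

def to_uniform(v, frame):
    """Express vertex v in the uniform frame and divide by |Lp - Rp|."""
    O, x_axis, y_axis, z_axis, s = frame
    r = sub(v, O)
    return (dot(r, x_axis) / s, dot(r, y_axis) / s, dot(r, z_axis) / s)
```

Applying `to_uniform` to every mesh vertex places the two porions exactly one unit apart on the x-axis and puts the glabella in the plane x = 0.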
The data used in this paper included 267 skulls consisting of 153 females and 114 males derived from the Visualization Technology Institute of Northwest University in China. The collected data were then measured by computer-aided measurement.
There are six variables for gender determination in the 3D skull. They are cranial sagittal arc, cranial sagittal chord, apical sagittal arc, apical sagittal chord, occipital sagittal arc, and occipital sagittal chord. All measurements are represented by symbols, as shown in Table 1.

Backpropagation Neural Network.
In this paper, a specific type of artificial neural network (ANN), the BPNN, is used for gender determination. ANNs can be classified as feedforward or recurrent, according to their connectivity. The ability of an ANN to predict outcomes accurately depends on the selection of proper weights during training. Training, or learning, establishes the relationship between inputs and targets. The learning rules, defined as network processes, aim to adjust the weights and biases [15]. The BPNN uses steepest descent to continuously adjust the weights and thresholds of the network by backpropagation, so as to minimize the sum of squared errors of the network [16]. The three types of neural network learning are supervised, unsupervised, and reinforced [17]. In supervised learning, the network is provided with inputs and desired outputs or target values. In unsupervised learning, on the other hand, the weights and biases are modified only in response to the network inputs. The mean squared error (MSE) is used to measure the performance of the models. MSE is the average of the squares of the differences between each output and the desired output, given by the following equation:

$$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(d_n - y_n\right)^2,$$

where $d_n$ is the desired output, $y_n$ is the actual output, and $N$ is the number of outputs.

The ANN is trained by the backpropagation algorithm, in which the errors of the hidden layer units are determined by the errors of the output layer units [18]. The self-learning of the BP neural network has two parts: one is the forward transmission of information; the other is the backward transmission of the error between the expected output and the actual output. The structure of the BP neural network consists of three parts: input layer, hidden layer, and output layer. The model of the BP neural network is shown in Figure 2 [19]. The weight between the input layer and the hidden layer is $w_{ij}$, and the weight between the hidden layer and the output layer is $w_{jk}$. The transfer function of the neural network is the unipolar sigmoid function $f(x) = 1/(1 + e^{-x})$ [20].
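As a small illustration of the two quantities just defined, the following Python sketch implements the unipolar sigmoid, its derivative, and the MSE. The function names are illustrative only; the sigmoid is written in a numerically stable form so that large negative inputs do not overflow.

```python
import math

def sigmoid(x):
    """Unipolar sigmoid f(x) = 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_derivative(x):
    """f'(x) = f(x) * (1 - f(x)), the form used by backpropagation."""
    fx = sigmoid(x)
    return fx * (1.0 - fx)

def mse(desired, outputs):
    """Average of the squared differences between desired and actual outputs."""
    return sum((d - y) ** 2 for d, y in zip(desired, outputs)) / len(desired)
```

For example, `mse([1.0, 0.0], [0.5, 0.5])` averages two squared differences of 0.25 each.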
In the forward pass, the data are transmitted from the input layer to the hidden layer, where the output of input unit $i$ is $O_i = x_i$. After the hidden layer receives the data from the input layer, it first computes the weighted sum $net_j = \sum_{i=1}^{M} w_{ij} x_i$. The output of the hidden layer is then $O_j = f(net_j)$, and the data are transferred to the output layer through the transfer function, giving $net_k = \sum_{j} w_{jk} O_j$ and $y_k = f(net_k)$.
The learning rules for the standard BP network adjust the weighted coefficients, namely, the weighted coefficient between the input layer and the hidden layer and the weighted coefficient between the hidden layer and the output layer. In accordance with the gradient descent method, the two rules are

$$\Delta w_{jk} = -\eta\,\frac{\partial E}{\partial w_{jk}}, \qquad \Delta w_{ij} = -\eta\,\frac{\partial E}{\partial w_{ij}},$$

where $\eta$ is the learning rate, $d_k$ is the expected output value, and $E = \frac{1}{2}\sum_{k=1}^{L}(d_k - y_k)^2$ is the error [21]. With the sigmoid transfer function, for which $f'(net) = f(net)\,(1 - f(net))$, the adjustments $\Delta w_{jk}$ and $\Delta w_{ij}$ of the weighted coefficients are calculated as

$$\Delta w_{jk} = \eta\,\delta_k\,O_j, \qquad \delta_k = (d_k - y_k)\,y_k\,(1 - y_k),$$

$$\Delta w_{ij} = \eta\,\delta_j\,x_i, \qquad \delta_j = O_j\,(1 - O_j)\sum_{k=1}^{L}\delta_k\,w_{jk}.$$

Improved Backpropagation Neural Network.
In practical applications, the basic BP neural network algorithm has many shortcomings. The commonly recognized problems are that its convergence is slow and that it easily falls into a local minimum. In addition, it suffers from poor stability and low generalization ability. The proposed method addresses these deficiencies of the BP neural network algorithm. A momentum term is added to improve the convergence rate and to avoid falling into a local minimum. The selection of the learning step in the BP algorithm is very important. The convergence speed of the network increases as η increases, but if η is too large, it causes oscillation and instability. The easiest way to solve this problem is to add a momentum term, that is,

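$$\Delta w_{ij}(n+1) = \eta\,\delta_j(n)\,O_i + \alpha\,\Delta w_{ij}(n),$$

where α is the momentum coefficient, usually a value between 0 and 1; (n + 1) denotes the (n + 1)th iteration; $\delta_j(n)$ is the error signal of unit $j$; and $O_i$ is the output of the unit $i$ that feeds it. The term $\alpha\,\Delta w_{ij}(n)$ indicates that the (n + 1)th correction of $w_{ij}$ retains the nth correction to a certain extent. Adding momentum to the BP algorithm can not only fine-tune the correction of the connection weights and accelerate convergence but also help to avoid falling into local minima [22][23][24].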
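The weight update with momentum can be sketched as follows for a three-layer sigmoid network trained on the error $E = \frac{1}{2}\sum_k (d_k - y_k)^2$. This is a simplified illustration under stated assumptions and not the MATLAB implementation used in the paper: biases (thresholds) are omitted, the initial weight range and the helper names `init`, `forward`, and `train_step` are hypothetical, and the update uses the standard delta rule plus the term α·Δw(n).

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # inputs stay small in this sketch

def init(n_in, n_hid, n_out, seed=0):
    """Random weights w_ij (input-hidden) and w_jk (hidden-output)."""
    rng = random.Random(seed)
    w_ij = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]
    w_jk = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hid)]
    return w_ij, w_jk

def zeros_like(w):
    return [[0.0] * len(row) for row in w]

def forward(x, w_ij, w_jk):
    n_hid, n_out = len(w_jk), len(w_jk[0])
    o = [sigmoid(sum(x[i] * w_ij[i][j] for i in range(len(x)))) for j in range(n_hid)]
    y = [sigmoid(sum(o[j] * w_jk[j][k] for j in range(n_hid))) for k in range(n_out)]
    return o, y

def train_step(x, d, w_ij, w_jk, prev_ij, prev_jk, eta=0.5, alpha=0.9):
    """One update: delta_w(n+1) = eta * delta * input + alpha * delta_w(n)."""
    o, y = forward(x, w_ij, w_jk)
    # Error signals, computed before any weight is changed.
    delta_k = [(d[k] - y[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
    delta_j = [o[j] * (1.0 - o[j]) * sum(delta_k[k] * w_jk[j][k] for k in range(len(y)))
               for j in range(len(o))]
    for j in range(len(o)):
        for k in range(len(y)):
            prev_jk[j][k] = eta * delta_k[k] * o[j] + alpha * prev_jk[j][k]
            w_jk[j][k] += prev_jk[j][k]
    for i in range(len(x)):
        for j in range(len(o)):
            prev_ij[i][j] = eta * delta_j[j] * x[i] + alpha * prev_ij[i][j]
            w_ij[i][j] += prev_ij[i][j]
    # Error before the update, E = 1/2 * sum (d_k - y_k)^2.
    return 0.5 * sum((d[k] - y[k]) ** 2 for k in range(len(y)))
```

The `prev_ij` and `prev_jk` matrices hold the previous corrections Δw(n); initializing them with `zeros_like` makes the first step identical to plain gradient descent.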

L2 Regularization Method.
The BP neural network is weakly stable, so the predicted gender may be misjudged when the skull characteristics differ only slightly. Considering the overlap between the sexes in skull characteristics, this paper adds an L2 regularization term to the objective function to make the model more stable. After adding regularization, the objective function of the BP neural network becomes

$$\min:\ \frac{1}{2}\sum\left(y_r - y_p\right)^2 + \lambda\,\lVert C\rVert_2^2,$$

where $y_p$ is the gender classification predicted by the model and $y_r$ is the true sex classification. $\lambda$ is the regularization coefficient. $\lVert C\rVert_2^2$ is the regularization term; it is computed as the sum of the squares of all weight values (the L2 norm itself is the square root of this sum).
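The effect of the regularization term can be illustrated with a short Python sketch. It assumes a squared-error data term of the same form as $E$ and a penalty equal to the sum of squared weights; the value of `lam` is a placeholder, since the paper does not report the regularization coefficient, and the function names are hypothetical.

```python
def l2_penalty(weight_matrices):
    """Sum of the squares of all weights, i.e. the squared L2 norm."""
    return sum(w * w for matrix in weight_matrices for row in matrix for w in row)

def regularized_loss(y_pred, y_true, weight_matrices, lam=0.01):
    """Squared-error data term plus lam times the squared L2 norm of the weights."""
    data_term = 0.5 * sum((r - p) ** 2 for p, r in zip(y_pred, y_true))
    return data_term + lam * l2_penalty(weight_matrices)

def l2_gradient(w, lam=0.01):
    """Contribution of the penalty to the gradient with respect to one weight."""
    return 2.0 * lam * w
```

Because the penalty gradient is proportional to the weight itself, every gradient descent step shrinks large weights towards zero, which is what makes the trained model less sensitive to small differences in the inputs.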

Adaboost Integration Algorithm.
In addition to the above problems, the BP neural network is still too sensitive, and its generalization ability is not strong enough. Adaboost is a relatively mature and widely used ensemble algorithm, which can significantly improve the accuracy and generalization ability of an algorithm [25]. Several BP neural networks are combined so that the networks complement one another. The final result of the algorithm is the weighted combination of the results of all BP neural networks. For N training samples ((x 1 , y 1 , z 1 ), (x 2 , y 2 , z 2 ), . . . , (x N , y N , z N )), T BP neural networks are established, where T is specified manually. Then, the initial weight of each sample is set as follows:

$$D_1(i) = \frac{1}{N}, \qquad i = 1, 2, \ldots, N,$$

where $D_t(i)$ represents the weight of sample $i$ in the $t$th iteration.
Under the weights $D_t(i)$, the weak learner $h_t(x)$ (that is, the $t$th BP neural network) is trained, and the error $\varepsilon_i$ of each sample and the average error $\varepsilon_t$ are calculated. $\varepsilon_i$ and $\varepsilon_t$ are used to calculate the weight $W_t$ of the $t$th weak learner and to update the sample weights $D_{t+1}(i)$ used in the next iteration (that is, for the $(t + 1)$th BP neural network). The above steps are iterated T times to obtain the Adaboost integrated prediction method. When forecasting, the outputs of the weak learners are combined, each weighted by its $W_t$, to obtain the final prediction result. Using the improved BPNN algorithm, we can get more accurate results than other single nonlinear models.
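The reweighting loop described above can be sketched with the standard discrete AdaBoost formulas. The exact expressions used in the paper for $W_t$ and $D_{t+1}(i)$ are not recoverable from the text, so the textbook choices $W_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$ and $D_{t+1}(i) \propto D_t(i)\,e^{-W_t y_i h_t(x_i)}$ are assumed here. To keep the example short and runnable, a one-feature threshold classifier stands in for each BP neural network, and labels are coded as +1 and -1 rather than 1 and 0; all function names are hypothetical.

```python
import math

def train_stump(xs, ys, weights):
    """Stand-in weak learner: best weighted threshold on a single feature."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= thr else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    _, thr, polarity = best
    return lambda x: polarity if x >= thr else -polarity

def adaboost(xs, ys, T=5):
    n = len(xs)
    D = [1.0 / n] * n                       # D_1(i) = 1 / N
    learners = []
    for _ in range(T):
        h = train_stump(xs, ys, D)          # train weak learner under D_t
        eps = sum(w for w, x, y in zip(D, xs, ys) if h(x) != y)
        eps = min(max(eps, 1e-10), 1.0 - 1e-10)   # guard against log(0)
        W = 0.5 * math.log((1.0 - eps) / eps)     # weight of this learner
        learners.append((W, h))
        # Misclassified samples gain weight, correct ones lose weight.
        D = [w * math.exp(-W * y * h(x)) for w, x, y in zip(D, xs, ys)]
        total = sum(D)
        D = [w / total for w in D]          # renormalize to a distribution
    return learners

def predict(learners, x):
    """Sign of the weighted vote of all weak learners."""
    score = sum(W * h(x) for W, h in learners)
    return 1 if score >= 0 else -1
```

Replacing `train_stump` with a routine that trains a BP network on weighted samples would give the ensemble structure described in this section.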

Discussion
The data used are 267 skulls, including 153 females and 114 males. The data are collected using the metric method, and the measurements are stored in Excel. Figures 3 and 4 show that the BPNN structure used in this example has six inputs based on the skull variables (CSA, CSC, ASA, ASC, OSA, and OSC). The hidden layer in Figure 3 consists of 6 neurons, and the hidden layer in Figure 4 consists of 12 neurons. The output layer consists of two neurons, namely, female and male. After the layers of the BPNN are designed, the computation is implemented in MATLAB R2012a.
Before the learning process, the parameters to be used must be defined. In this research, the learning process was stopped after 100,000 epochs, the log-sigmoid was used as the activation function, the momentum (α) was set to 0.1, 0.5, or 0.9, and the learning rate (η) was set to 0.1, 0.5, or 0.9 (Table 2). The error computed in the output layer was backpropagated to the earlier layers in order to update the current input-hidden layer weights and hidden-output layer weights. By updating these weights, the network learns to reach the target. The target is 1 for female and 0 for male. In the algorithm, the error was calculated at the output, and the new values of the weights were computed in each layer until the error was reduced to an acceptable value. The performance of the ANN was measured by the MSE and by the total prediction accuracy of the network on the test data. Training is best when the ANN achieves the lowest MSE value.
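The parameter settings above define a small grid of experiments, which can be enumerated as in the following Python sketch. The values are those stated in the text; the assumption that every combination of structure, η, and α is run 10 times, and the names used here, are illustrative rather than taken from the authors' code.

```python
from itertools import product

STRUCTURES = [(6, 6, 2), (6, 12, 2)]   # [input; hidden; output] neurons
LEARNING_RATES = [0.1, 0.5, 0.9]       # eta
MOMENTA = [0.1, 0.5, 0.9]              # alpha
REPEATS = 10                           # assumed: each setting is run 10 times
MAX_EPOCHS = 100_000                   # stopping criterion

def experiment_grid():
    """Yield one configuration dictionary per training run."""
    for structure, eta, alpha in product(STRUCTURES, LEARNING_RATES, MOMENTA):
        for run in range(REPEATS):
            yield {
                "structure": structure,
                "eta": eta,
                "alpha": alpha,
                "run": run,
                "max_epochs": MAX_EPOCHS,
            }
```

Under these assumptions the grid contains 2 × 3 × 3 × 10 runs in total.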
Figure 3: The architecture of BPNN with the hidden layer consisting of 6 neurons.
Computational and Mathematical Methods in Medicine
In the learning process of the BPNN, the experiment is repeated 10 times, and the results are outlined in Tables 3 and 4. Table 3 reports the training and testing results obtained by running the experiment 10 times with the structural model [6; 6; 2]; the best results are obtained with η = 0.9 and α = 0.9, for which the average accuracy is 96.145% for training and 95.336% for testing. The results of the structural model [6; 12; 2] are given in Table 4, which indicates that each combination of η and α yields different results in both training and testing. The experiment was again repeated 10 times. The highest accuracy was found for η = 0.5 and α = 0.9, namely, 97.232% and 96.764% for the training and testing classification rates, respectively. The average accuracy results of the two structural models are shown in Figures 5 and 6. The results of the structural model [6; 6; 2] can be seen in Figure 5; the average accuracy of the training phase is higher than that of the testing phase. The results of the structural model [6; 12; 2] can be seen in Figure 6; the average accuracy of the training phase is also higher than that of the testing phase. Comparing the results of the two structural models, we can see that, for the same η and α, the average accuracy of both the training phase and the testing phase of the structural model [6; 12; 2] is higher than that of the structural model [6; 6; 2]. The comparison between the BP neural network and standard classification techniques for sexual dimorphism, that is, univariate and multivariate discriminant analysis (using six variables) and logistic regression (using six variables), is presented in Table 5.
The BP neural network using the six variables had an accuracy rate of 96.764%.
In this paper, two classic sex determination methods (i.e., discriminant analysis and logistic regression) were compared with an artificial neural network. The BP neural network using all six variables gives the best overall results (96.764%) and achieves the highest rate of correctly classified individuals. Mahfouz et al. [11] obtained a correct classification rate of 90.3% for the patella with the linear discriminant classification method and 96% with a feedforward backpropagation neural network. Usually, the correct rate of sex classification for patellae is only about 85% [26,27]. These results agree with other studies in which neural networks give better results than linear methods (e.g., logistic regression and discriminant analysis).

Conclusion
This paper presents a complete classification framework for gender determination in forensic anthropology. After analyzing the standard BP neural network algorithm, we propose an improved BP neural network algorithm that addresses the shortcomings of the standard algorithm. A momentum term is added to improve the convergence speed and to avoid falling into local minima. A regularization operator is used to ensure the stability of the algorithm, and the Adaboost integration algorithm is used to improve the generalization ability of the model. The final experiment shows that the [6; 12; 2] structure of the BPNN achieves the best results on the skull data set of this paper, namely, 97.232% in training and 96.764% in testing. Compared with other classification techniques, the BPNN improves gender determination by providing highly accurate results. Moreover, although we use CT scans to construct the 3D point cloud model of the skull in this work, the BPNN model we build can also deal with 3D models constructed in other ways, such as with a laser scanner or a 3D camera. Next, we should collect a larger sample to build a neural network-based model that can be applied in practice to the sex assessment of unknown bones in forensic cases.
In future work on gender determination, classification techniques can be combined to provide higher accuracy.

Data Availability
The .obj format 3D model data used to support the findings of this study may be released upon application to the Northwest University Visual Technology Institute via the email xnliu@nwu.edu.cn.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.