Gingivitis detection by Fractional Fourier Entropy and Particle Swarm Optimization

INTRODUCTION: We propose a gingivitis detection model that combines fractional Fourier entropy feature extraction with a neural network trained by particle swarm optimization. OBJECTIVES: To reduce the diagnostic burden on doctors, whose work is frequent and demands high concentration. METHODS: First, the fractional Fourier transform is applied to the collected gingival image, and Shannon entropy is computed from the resulting spectra to obtain the input values. The particle swarm optimization algorithm is then combined with the extracted feature vector to detect whether a patient has gingivitis. RESULTS: The experimental results show that this method reduces the storage needed for image detection and the complexity of the image information, and achieves a sensitivity of 79.00±1.61%, a specificity of 80.89±1.87% and an accuracy of 79.94±0.96%. CONCLUSION: The optimized algorithm classifies the sample data effectively and accurately, and its accuracy is higher than that of advanced gingival image diagnosis methods, making gingivitis diagnosis more accurate. Our neural network has good trainability and recognition ability, which contributes to intelligent medical detection methods for gingival treatment.


Introduction
Gingivitis is an acute or chronic inflammation of the gingival tissue caused by bacterial infection, foreign irritants and food impaction [1]. The main symptoms are red, swollen and painful gums, and sometimes bleeding [2]. If not treated in time, the inflammation may spread to deeper tissue and lead to periodontitis. Early symptoms of periodontitis are not obvious; bleeding may occasionally occur after brushing, similar to the symptoms of gingivitis [3]. However, by the time people notice abnormal loosening of teeth, periodontitis has often progressed to the middle or late stage, and the teeth show gaps and occlusion becomes difficult [4].
* Corresponding author. Email: yy284@leicester.ac.uk
Gingival inflammation and chronic periodontitis are commonly diagnosed by measuring the periodontal probing depth, checking for signs of bleeding during the measurement, and using imaging to assess the loss of alveolar bone. However, an examiner may obtain different results with different probes, and repeated measurements may cause the patient considerable pain. As a result, several generations of new probes have been invented to improve the accuracy of probing depth measurements. Moreover, the need for a non-invasive diagnostic approach is made more apparent by the fact that only trained dental specialists can perform the measurements accurately. At present, the number of patients seeking orthodontic treatment is increasing. Orthodontic treatment makes appropriate oral hygiene difficult and can easily cause temporary symptoms of gingivitis, so the dentist must monitor the gingival condition of these patients more carefully; insufficient care during diagnosis and treatment can easily lead to further periodontal deterioration. Therefore, image analysis is a key first step in monitoring the gingival condition of patients.
Alalharith, et al. [5] investigated and developed a fast regional CNN for the detection of gingivitis. Their approach was divided into two models, one that detects the regions of interest for tooth localization and a second that detects gingival inflammation, in order to automatically detect periodontitis in orthodontic patients from intraoral images. Obuchowicz, et al. [6] used different texture features to transform digital oral radiography images of patients with dental caries. Their experiments showed that K-means clustering and first-order features can significantly improve the detection of caries spots and delineate sharp areas at image edges well, while first-order features make demineralized areas observable and yield better lesion detection results. Li, et al. [7] proposed an intelligent gingivitis detection method based on the multichannel gray-level co-occurrence matrix and a PSO neural network. Their experiment also included a naive Bayesian classifier, wavelet energy combined with SVM, the extreme learning machine (ELM), and ELM combined with contrast-limited adaptive histogram equalization. The five detection methods were compared under the same conditions, and the optimal detection results were obtained.

Public awareness of the symptoms of gingivitis is weak. Improving medical equipment is one of the essential issues in the medical field, and clinical practice and records are extremely important in supporting the early detection and analysis of disease. Our work concerns the processing and intelligent detection of dental images. In the gingival image processing stage, fractional Fourier entropy is used to extract feature values from the image instead of using the whole image, which effectively reduces the complexity of image analysis in the detection system; extracting image features is an important step in improving detection efficiency. In this study, we feed the extracted image features to a multi-layer perceptron trained by particle swarm optimization for recognition, and test the model on dental images. The detection results show that the model has strong identification ability and good application prospects.

Methodology
In the gingivitis classification phase, color dental images are used to predict abnormal dental changes. The color photos were collected at random from the gum area of patients using a digital SLR camera. The intraoral image data set of this study was provided by Nanjing Stomatological Hospital. We manually cropped the 180 gingival pictures to the region showing teeth and gums. Fractional Fourier entropy was used to extract features from the color gingival images, and a multilayer perceptron based on particle swarm optimization was then used to identify gingivitis, evaluated by 10-fold cross-validation.

Gingival image feature extraction
In 2015, a new method, fractional Fourier entropy (FRFE), was put forward for image feature extraction [8,9]. This approach effectively mitigates the program crashes and wasted time caused by large amounts of data occupying memory. The detection accuracy of a program is influenced by the quality of the image features. As a new feature extraction tool in the time-frequency domain, FRFE can process the non-stationary signal of the source image almost perfectly. Many other methods have their own advantages in the image classification field. For instance, one commonly used method, the discrete wavelet transform [10], suffers from the difficulty of choosing the optimal wavelet function and decomposition level [11]. Although such methods have their strengths in image classification, FRFE was chosen to produce the feature vector in this experiment because of its better performance and precision [12].
FRFE is obtained by applying Shannon entropy to the spectra of the two-dimensional fractional Fourier transform. First, the gingival image is transformed and analyzed by the Fourier transform. The Fourier transform maps the time domain to the frequency domain, showing the frequency components of the image signal, and is usually used to analyze deterministic and stationary signals. However, because it does not account for non-stationary signals, the fractional Fourier transform was proposed and is commonly used to deal with them [13][14][15][16]. Shannon entropy, a landmark concept in communication theory, quantifies the information content of a signal [17]. In this experiment, it is used to compute the fractional Fourier entropy, which serves as the feature vector that stands in for the image information.
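We take the information in the gingival photo as the input signal $x(t)$. Its fractional Fourier transform is:

$$X_\alpha(u) = \int_{-\infty}^{\infty} K_\alpha(t, u)\, x(t)\, dt \qquad (1)$$

where $\alpha$ is the angle, $X_\alpha$ is the transform of the signal $x(t)$, and $t$ and $u$ denote time and frequency, respectively. $K_\alpha(t, u)$ is the kernel function:

$$K_\alpha(t, u) = A_\alpha \exp\left[ i\pi \left( t^2 \cot\alpha - 2tu \csc\alpha + u^2 \cot\alpha \right) \right] \qquad (2)$$

Here $i$ is the imaginary unit and $A_\alpha$ is the amplitude:

$$A_\alpha = \sqrt{1 - i \cot\alpha} \qquad (3)$$

The functions $\cot$ and $\csc$ diverge when $\alpha$ is a multiple of $\pi$. Taking the limit in these cases gives:

$$K_\alpha(t, u) = \begin{cases} \delta(t - u), & \alpha = 2n\pi \\ \delta(t + u), & \alpha = (2n + 1)\pi \\ A_\alpha \exp\left[ i\pi \left( t^2 \cot\alpha - 2tu \csc\alpha + u^2 \cot\alpha \right) \right], & \text{otherwise} \end{cases} \qquad (4)$$

where $n$ is any integer.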
The fractional Fourier transform contains characteristics of both the time domain and the frequency domain [18][19][20]. The two-dimensional discrete fractional Fourier transform of a gingival image has two angles, $\alpha$ and $\beta$, and is written $X_{\alpha,\beta}$. Let $N$ be a discrete random variable taking values in $\{n_1, n_2, n_3, \ldots, n_m\}$, and let $H$ denote entropy. The probability mass function is denoted $P(N)$, and $E$ is the expectation operator. The entropy is then:

$$H(N) = E\left[ -\log P(N) \right] = -\sum_{i=1}^{m} P(n_i) \log P(n_i) \qquad (5)$$

Finally, we apply the entropy operator $H$ to the 25 2D-FRFT spectra of the image. If $G$ is a gingival photo, the FRFE feature $F(G)$ is:

$$F(G) = H\left[ X_{\alpha,\beta}(G) \right] \qquad (6)$$

evaluated once for each of the 25 $(\alpha, \beta)$ angle combinations.
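The entropy-of-spectra idea can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: it uses the weighted-type discrete fractional Fourier transform (a linear combination of the four integer powers of the unitary DFT) rather than the chirp-kernel definition above, it parameterizes the transform by a fractional order (order 1 is the ordinary DFT) instead of an angle, it estimates probabilities with a 256-bin histogram of spectrum magnitudes, and the 5 x 5 order grid is purely illustrative. The names `wfrft`, `shannon_entropy` and `frfe_features` are hypothetical.

```python
import numpy as np

def wfrft(x, order, axis):
    """Weighted-type fractional Fourier transform of the given order along one axis."""
    x = np.asarray(x, dtype=complex)
    # Integer powers F^0..F^3 of the unitary DFT (F^4 is the identity).
    powers = (
        x,
        np.fft.fft(x, axis=axis, norm="ortho"),
        np.roll(np.flip(x, axis=axis), 1, axis=axis),  # F^2 maps index n to -n mod N
        np.fft.ifft(x, axis=axis, norm="ortho"),
    )
    out = np.zeros_like(x)
    for j, f_j in enumerate(powers):
        # Weight of F^j: mean of exp(-i*pi*k*(order - j)/2) over the four eigenvalue classes k.
        weight = np.mean([np.exp(-1j * np.pi * k * (order - j) / 2) for k in range(4)])
        out += weight * f_j
    return out

def shannon_entropy(values, bins=256):
    """Shannon entropy (in bits) of a histogram estimate of the value distribution."""
    hist, _ = np.histogram(values, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def frfe_features(image, orders=(0.6, 0.7, 0.8, 0.9, 1.0)):
    """One entropy value per (row order, column order) pair; the grid is an assumption."""
    image = np.asarray(image, dtype=float)
    features = []
    for a in orders:
        rows = wfrft(image, a, axis=0)
        for b in orders:
            spectrum = wfrft(rows, b, axis=1)
            features.append(shannon_entropy(np.abs(spectrum)))
    return np.array(features)

# Usage on a random stand-in for a grayscale gingival image.
demo_image = np.random.default_rng(0).random((16, 16))
demo_features = frfe_features(demo_image)  # 25 entropy values
```

The weighted-type transform was chosen here only because it is short and exactly additive in the order; a chirp-based discretization would follow the kernel definition more closely at the cost of more code.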

Multi-Layer Perceptron
A multi-layer perceptron (MLP) classifier is a classification technique based on a feed-forward artificial neural network (ANN) [21], composed of simple interconnected neurons, or nodes [22]. It models a nonlinear mapping between input and output vectors, as shown in Figure 1. The nodes are connected by weights and output signals, and each node output is a simple nonlinear transfer function applied to the weighted sum of the node inputs. Superimposing many such simple nonlinear transfer functions allows a multilayer perceptron to approximate nonlinear functions [23][24][25]. The output of a node is scaled by the connection weight and serves as input to the nodes of the next layer [26]. Information is thus processed in the forward direction, which is why multilayer perceptrons are also called feed-forward neural networks [27].

The structure of a multilayer perceptron is not fixed, but it usually consists of several layers of neurons. The input layer passes the input vector to the model and does not take part in the computation. A multilayer perceptron may contain more than one hidden layer, followed by an output layer. The network is fully connected: every node in one layer is connected to every node in the adjacent layers [28][29][30]. By selecting an appropriate set of weights and transfer functions, a multilayer perceptron can approximate any smooth, measurable function between input and output vectors.

Multilayer perceptrons learn through training [31]. Training first requires a set of training data. After the training data have been presented to the multilayer perceptron repeatedly and the weights of the network adjusted each time [32], the required model is obtained. This process encodes the attributes of the system in different parts of the neural network. If, after training, the multilayer perceptron is given an input vector that does not belong to the training pairs, it will simulate the system and produce the corresponding output vector. The error between the actual and the predicted function values indicates how successful the training was.
The network structure of the perceptron comprises the inputs, the output, the weights, the feed-forward operation and the activation function. The output of the perceptron can be written as:

$$y = f\left( \sum_{i=1}^{n} w_i x_i + b \right) \qquad (7)$$

where $x_i$ is an input, $w_i$ is the weight of the corresponding input $x_i$, $b$ is the bias, $f$ is the activation function, and $y$ is the expected calculation result. For convenience in matrix operations, the bias $b$ is treated as an extra weight $w_0$, and a corresponding dummy input $x_0$ fixed at 1 is added:

$$y = f\left( \sum_{i=0}^{n} w_i x_i \right), \quad x_0 = 1, \; w_0 = b \qquad (8)$$

$$\text{Logits} = w \cdot x + b \qquad (9)$$

The sum in equation (9) is called the perceptron's feed-forward operation; it has not yet been passed through the activation function and is written as logits.

A network with one hidden layer is called a single-hidden-layer neural network or two-layer perceptron. For the $l$-th hidden layer, the following properties generally hold: every neuron in layer $l$ is connected to the output of every neuron in layer $l - 1$, and the neurons within layer $l$ are not connected to each other. In addition, an MLP uses nonlinear activation functions, the back-propagation algorithm, and optimization algorithms including gradient descent, stochastic gradient descent and mini-batch gradient descent. For an MLP with $L + 1$ layers, this can be written in matrix form as follows:

$$a^{(l)} = f\left( W^{(l)} a^{(l-1)} + b^{(l)} \right), \quad l = 1, \ldots, L, \quad a^{(0)} = x \qquad (10)$$

where $W^{(l)}$ and $b^{(l)}$ are the weight matrix and bias vector of layer $l$, and $a^{(0)} = x$ is the input vector.

The advantages of the MLP are that it can learn nonlinear models and can learn in real time. However, the MLP also has disadvantages: with hidden layers, its loss function is non-convex and has more than one local minimum, so different random initial weights may yield different validation accuracy. In addition, an MLP requires tuning a number of hyperparameters, such as the number of hidden neurons, hidden layers and iterations.
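The feed-forward operation and the bias-folding trick described above can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the network used in the paper: the sigmoid activation, the layer sizes (25 inputs, 10 hidden neurons, 1 output) and the function names `mlp_forward` and `fold_bias` are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation (an assumed choice; the paper does not name one)."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases, activation=sigmoid):
    """Propagate x through fully connected layers: a = f(W a + b) at each layer."""
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        logits = W @ a + b        # feed-forward operation, before the activation
        a = activation(logits)
    return a

def fold_bias(W, b):
    """Treat the bias as an extra weight column paired with a constant input of 1."""
    return np.hstack([b[:, None], W])

# Usage with random parameters: 25 inputs -> 10 hidden -> 1 output (sizes are illustrative).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(10, 25)), rng.normal(size=(1, 10))]
biases = [rng.normal(size=10), rng.normal(size=1)]
x = rng.normal(size=25)
y = mlp_forward(x, weights, biases)

# Folding the bias gives the same logits as the explicit W x + b form.
x_aug = np.concatenate(([1.0], x))
same = np.allclose(fold_bias(weights[0], biases[0]) @ x_aug, weights[0] @ x + biases[0])
```

Folding the bias into the weight matrix only changes the bookkeeping, which is why it is convenient when all parameters of a layer need to be handled as a single array.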

Particle Swarm Optimization for gingivitis detection
The particle swarm optimization (PSO) algorithm is an evolutionary computing technique proposed by Kennedy, et al. [33] in 1995. It is a swarm intelligence algorithm, derived from studies of the foraging behavior of birds and designed to simulate it. While searching, birds let other birds know their location by sharing information. Through such cooperation, they judge which solution is best and pass that information on to the whole flock. Finally, the whole flock quickly gathers around the food source; in terms of the model, the best solution is found and the search converges.
The algorithm was originally inspired by the regularity of bird flock activity, from which a simplified model based on swarm intelligence was established. Based on the observation of animal group behavior, PSO uses the information shared by individuals in the group to make the movement of the whole group evolve from disorder to order in the problem-solving space, and so obtains the optimal solution. PSO is an evolutionary algorithm: it starts from random solutions and searches for the best solution by iteration, assessing the quality of each solution through its fitness. It has no "crossover" or "mutation" operations as in a genetic algorithm, so its rules are simpler; it searches for the global optimum by following the best value found so far.
The algorithm has attracted researchers' attention because of its advantages, such as simple implementation, fast convergence and high accuracy, and it has shown high performance in solving practical problems. PSO is a parallel algorithm. In particle swarm optimization, each bird (particle) in the search space represents a candidate solution of the optimization problem. Each particle has a fitness value determined by the objective function and a velocity that determines the direction and distance of its flight; the particles then follow the current optimal particle to search the solution space.
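To make the notion of a particle's fitness concrete for the PSO-trained MLP, the sketch below treats a particle as the flattened weights and biases of a one-hidden-layer network and scores it by mean squared error on labelled feature vectors. Everything specific here is an assumption for illustration: the paper does not state the encoding, the hidden layer size, the activations or the fitness measure, and the names `unpack`, `predict_proba` and `fitness` are hypothetical.

```python
import numpy as np

N_IN = 25       # one input per FRFE value
N_HIDDEN = 10   # illustrative hidden layer size, not taken from the paper
PARTICLE_DIM = N_HIDDEN * N_IN + N_HIDDEN + N_HIDDEN + 1

def unpack(particle):
    """Split a flat particle vector into (W1, b1, W2, b2) of a one-hidden-layer MLP."""
    i = 0
    W1 = particle[i:i + N_HIDDEN * N_IN].reshape(N_HIDDEN, N_IN)
    i += N_HIDDEN * N_IN
    b1 = particle[i:i + N_HIDDEN]
    i += N_HIDDEN
    W2 = particle[i:i + N_HIDDEN]
    i += N_HIDDEN
    b2 = particle[i]
    return W1, b1, W2, b2

def predict_proba(particle, X):
    """Network output in (0, 1) for each row of X (tanh hidden layer, sigmoid output)."""
    W1, b1, W2, b2 = unpack(particle)
    hidden = np.tanh(X @ W1.T + b1)
    return 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))

def fitness(particle, X, y):
    """Mean squared error against 0/1 labels; lower values mean a better particle."""
    return float(np.mean((predict_proba(particle, X) - y) ** 2))

# Usage on random stand-in data (12 samples, alternating labels).
rng = np.random.default_rng(0)
X_demo = rng.random((12, N_IN))
y_demo = np.array([0, 1] * 6)
particle_demo = rng.normal(size=PARTICLE_DIM)
score = fitness(particle_demo, X_demo, y_demo)
```

With this encoding the optimizer never needs gradients: it only evaluates `fitness` for each candidate vector, which is what allows a swarm-based search to replace back-propagation.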
PSO is initialized with a set of random solutions, and the best solution is then obtained by iteration. In each iteration, a particle updates itself by tracking two "extremes". The individual extremum is the best solution found by the particle itself, called $p_{best}$. The other extremum is the best solution found by the entire population; this global extremum is called $g_{best}$. Alternatively, instead of the entire population, only part of it can be used as a particle's neighborhood, in which case the extremum among all the neighbors is a local extremum. The fundamental idea of the PSO algorithm is to find the best solution through information sharing and cooperation among the individual particles of the swarm. Suppose each particle is a bird with two attributes, position and velocity. Each bird changes its direction of flight according to the position closest to food that it has found itself and the best position shared by the whole flock, so that with high probability the whole population ends up in the same place. This is the area closest to food for the flock, where conditions are suitable for living and food is easier to find. The process of PSO is shown in Figure 3. PSO is easy to implement, converges quickly, is precise and parallelizes easily. However, because the population is randomly initialized, the final result depends strongly on the initialization, and the parameter settings also have a great influence on the final result.
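The iteration described above can be sketched as follows. This is the commonly used inertia-weight variant of PSO, chosen as an assumption because the text does not give the update equations: each velocity is the sum of an inertia term, a pull toward the particle's own best position and a pull toward the global best. The hyperparameter values, the search bounds and the function name `pso_minimize` are illustrative, not the settings of the paper.

```python
import numpy as np

def pso_minimize(objective, dim, n_particles=30, n_iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimize objective(vector) with a global-best particle swarm."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    pos = rng.uniform(low, high, size=(n_particles, dim))   # random initial solutions
    vel = np.zeros((n_particles, dim))
    p_best = pos.copy()                                      # individual extrema
    p_best_val = np.array([objective(p) for p in pos])
    g_idx = int(np.argmin(p_best_val))
    g_best = p_best[g_idx].copy()                            # global extremum
    g_best_val = float(p_best_val[g_idx])

    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity: inertia + attraction to own best + attraction to swarm best.
        vel = inertia * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
        pos = np.clip(pos + vel, low, high)
        vals = np.array([objective(p) for p in pos])
        improved = vals < p_best_val
        p_best[improved] = pos[improved]
        p_best_val[improved] = vals[improved]
        g_idx = int(np.argmin(p_best_val))
        if p_best_val[g_idx] < g_best_val:
            g_best = p_best[g_idx].copy()
            g_best_val = float(p_best_val[g_idx])
    return g_best, g_best_val

# Usage: minimize the sphere function, whose minimum is 0 at the origin.
def sphere(v):
    return float(np.sum(v ** 2))

best, best_val = pso_minimize(sphere, dim=3)
```

Fixing the random seed makes a run repeatable, which helps when judging how sensitive the result is to the random initialization mentioned above.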

10-fold cross validation
In machine learning, it is generally not possible to use all the data to train a model; otherwise no data would be left to validate the model and evaluate its predictions. To solve this problem, ten-fold cross-validation is often used to test the accuracy of the algorithm. The most important function of ten-fold cross-validation is model selection, which can also be called hyperparameter selection. In this case, the data set is divided into three parts, a training set, a validation set and a test set, and the training and validation sets are split by N-fold cross-validation. The validation set is used to check how the neural network is training during learning, so as to determine appropriate hyperparameters. The test set is used to check the generalization capability of the neural network after training. Specifically, the candidate models are first evaluated on the training and validation sets, and the one with the minimum average error is selected. After the appropriate model has been selected, the training and validation sets can be combined and the model trained again on the combined set to obtain the final model, which is then evaluated on the test set to measure its ability to generalize.
The core idea of ten-fold cross-validation is to divide the data set several times and average the results of multiple assessments, so as to eliminate the adverse effects caused by unbalanced data division in a single partition.Because such adverse effects are more likely to occur on small data sets, ten-fold cross validation is more advantageous on small data sets.
Many experiments using a wide range of image databases and learning techniques have shown that ten-fold cross-validation is an appropriate choice for obtaining the best error estimate. Therefore, the data set is divided into ten parts, of which nine are used as training data and one as test data in each run. Each run yields an accuracy or error rate, and the average of the ten accuracies is used as the estimate of the algorithm's accuracy. For greater reliability, the ten-fold cross-validation is usually repeated several times and the results averaged. Some studies also suggest that five-fold or twenty-fold cross-validation performs almost the same as ten-fold cross-validation.
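The nine-to-one partitioning can be sketched as below for a data set of 90 gingivitis and 90 healthy images. The stratification (keeping both classes equally represented in every fold), the fixed seed and the function name `stratified_kfold_indices` are assumptions made for illustration; the text does not say how the folds were drawn.

```python
import numpy as np

def stratified_kfold_indices(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        # Shuffle the samples of this class and deal them into the k folds.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for f, chunk in enumerate(np.array_split(idx, k)):
            folds[f].extend(chunk.tolist())
    all_idx = np.arange(len(labels))
    for f in range(k):
        test_idx = np.array(sorted(folds[f]))
        train_idx = np.setdiff1d(all_idx, test_idx)
        yield train_idx, test_idx

# Usage: 90 gingivitis (1) and 90 healthy (0) images, as in the data set description.
labels = np.array([1] * 90 + [0] * 90)
splits = list(stratified_kfold_indices(labels))
# Each split trains on 162 images and tests on 18; the ten test accuracies
# would then be averaged to give the cross-validated estimate.
```

Repeating the procedure with different seeds and averaging again corresponds to the repeated ten-fold cross-validation mentioned above.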

Dataset
A total of 5 cases in our data set were provided by Nanjing Stomatological Hospital. The photos were taken at random from patients with gingivitis and from subjects with healthy gums. To ensure fairness and accuracy, uniform inclusion criteria were developed for the healthy gingiva group and the gingivitis group. A digital single-lens reflex camera was used to photograph randomly selected teeth with gingival inflammation or healthy teeth.
The training data set was composed of 90 gingivitis images and 90 healthy gingiva images. The field of view diameter is 26 ~ 100 mm with a voxel resolution of 0.2 ~ 0.41 mm. The regions of interest were cropped in width and length to reduce the variation among the dental regions used in the experiment and to ensure the quality of the experimental image data. Two sample images are provided in Figure 4.

The Results of Fractional Fourier Entropy
The fractional Fourier transform (FRFT) contains features of both the frequency and the time domain, and the frequency-domain features obtained at different angles have different feature spectra. We applied the FRFT to one gingivitis image. As shown in Figure 5, the 25 spectra correspond to different combinations of the two angles. When the vector value is (1.0, 1.0), the information entropy is basically unchanged, and spectral information is not included. When the vector approaches 0, the entropy value drops.

Figure 5. FRFT results of gingivitis images
After the discrete FRFT, the amplitude-frequency and phase-frequency characteristics of the image can be analysed and the spectrum calculated. The relationship between entropy and FRFT of different orders is shown in Table 1. The entropy value of the block matrix is calculated as the input value of the classifier.

The Results of Particle Swarm Optimization
We combined the PSO algorithm with the fractional Fourier entropy feature extraction method to establish a gingivitis detection model. Seven criteria were used to compare and validate the models: accuracy, specificity, F1 score, precision, sensitivity, Matthews correlation coefficient (MCC) and Fowlkes-Mallows index (FMI). MCC is a balanced measure of the quality of binary classification and can also be used when the classes are unbalanced. The value range of MCC is [−1, 1], where 1 means that the prediction is exactly the same as the actual answer, 0 means that the prediction is no better than random, and −1 means that the prediction completely disagrees with the actual result. In essence, MCC is the correlation coefficient between the actual values and the predicted results. FMI is the geometric mean of precision and recall calculated from the predicted and true labels, with a value range of [0, 1]; the closer the value is to 1, the better the model performs. The test data are shown in Table 2.

The ROC curve, which combines sensitivity and specificity, is shown in Figure 6. It faithfully reflects the relationship between the specificity and sensitivity of our detection model and is a practical indicator of the accuracy of classification methods. ROC curves make it easy to see how well our classifier recognizes gingival image samples at a given threshold. From the ROC curve, we found that the area under the curve (AUC) of our diagnostic method was about 0.8734. The closer the ROC curve is to the top-left corner, where sensitivity and specificity both equal 1, the more accurate the classification results tend to be. Therefore, the point on the ROC curve closest to that corner has the maximum sum of specificity and sensitivity. This point or a point adjacent to it is often referred to as the optimal critical point and serves as the diagnostic reference value. Through the ROC curve, the clinical accuracy of the diagnostic method can be observed, which supports the claim that our method can assist physicians in making judgments.
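For reference, the seven criteria can be computed from the four cells of a binary confusion matrix using their standard definitions, as in the sketch below. The function name `binary_metrics` is hypothetical, the example counts are made up for illustration and are not results from the paper, and the code assumes that no denominator is zero.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # recall, true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)          # harmonic mean of precision and recall
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    fmi = math.sqrt(precision * sensitivity)  # geometric mean of precision and recall
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "fmi": fmi,
    }

# Usage with made-up counts: 8 true positives, 1 false positive,
# 9 true negatives and 2 false negatives.
metrics = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

Because MCC uses all four cells of the confusion matrix, it stays informative when one class is much larger than the other, which is the property noted in the text.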

Comparison to State-of-the-art Approaches
We collected information on state-of-the-art methods and compared them with ours, in order to determine whether our combination of methods produces a new and better detection model. The first method for comparison is a detection model that uses the 2D discrete wavelet transform to extract the image feature set and a naive Bayes classifier. The second method is based on wavelet energy feature extraction and SVM classification. The third and fourth methods combine the gray-level co-occurrence matrix for image processing with ELM for detection; the fourth additionally applies contrast-limited adaptive histogram equalization. The fifth and sixth methods use feature extraction based on fractional Fourier entropy combined with a standard genetic algorithm; the sixth additionally optimizes the hidden neurons. Finally, our method uses particle swarm optimization with fractional Fourier entropy for feature extraction. The PSO algorithm finds the best solution through information sharing and cooperation among the individual particles of the population. Moreover, the model is simple and easy to implement and does not require extensive parameter tuning. The feature extraction method based on fractional Fourier entropy greatly alleviates the shortcomings of existing methods, such as large memory consumption, heavy computational load, and reduced detection accuracy caused by an excessively large receptive field.
Table 3 shows that the overall score of our method is generally better than that of the earlier methods. In particular, compared with the two most recent methods, which use the same feature extraction for the gingival image, our PSO method is about 1%-4% higher than the SGA method in sensitivity, specificity and accuracy. Compared with the standard genetic algorithm with hidden neuron optimization, the accuracy was about 0.8% better, the sensitivity about 1% better, and the specificity about 3% better. Compared with the original method, the overall score is about 13-18% higher, indicating that our algorithm achieves a substantial improvement.

Conclusions
In this paper, a gingivitis detection method that combines fractional Fourier entropy feature extraction with a neural network trained by particle swarm optimization (PSO) was proposed.
Fractional Fourier entropy was used to extract features from the gingival image samples, and the neural network optimized by PSO was used to classify the extracted feature data.
The experimental results show that our method has higher classification accuracy and is reasonable and reliable. It provides a new automatic detection model for the diagnosis of gingivitis and extends the performance and function of intelligent gingival diagnosis systems. The method offers a new way of assisting dentists in diagnosing gingivitis and is significant for improving the diagnostic efficiency of neural-network-based detection [40]. In the future, we will optimize the gingivitis detection algorithm to obtain higher accuracy with a less burdensome method.

Figure 1. The structure of MLP

Figure 2. The step function of the perceptron

Figure 3. The process of particle swarm optimization

EAI Endorsed Transactions on e-Learning | 02 2021 - 04 2021 | Volume 7 | Issue 21 | e5

A simple way to evaluate a model is to divide the entire data set into two parts, one for training and one for validation, namely the training set and the test set. However, this approach has two shortcomings. First, the choice of the final model and parameters largely depends on how the training set and the test set are divided in the experiment. Second, the model is usually tested with new data only at the end of training, so it is hard to tell whether the model has merely adapted to that particular test data, and its generalization ability and degree of overfitting cannot be determined. It is difficult to evaluate the generalization capability of a model when the data set is limited. Under the same conditions, cross-validation avoids feeding all the data directly into network training. Ten-fold cross-validation, for example, trains ten models and averages their results: the same model is tested on different parts of one data set while each training set is different, which is equivalent to extending the data set. To a certain extent, a model selected in this way has some generalization ability. Compared with the traditional partitioning method, it avoids the limitations and particularities of a fixed partition of the data set, and this advantage is more obvious on small data sets.

Figure 4. Image Sample of Gingivitis

Figure 6. The ROC curve of our diagnostic method for gingivitis detection (AUC = 0.8734)

Table 1. Entropy of FRFT results in gingivitis images

Table 2. Ten-fold cross validation results of the proposed method

Table 3. Comparison of our method with state-of-the-art algorithms