Application of Machine Learning to Stomatology: A Comprehensive Review

In recent years, machine learning methods have been widely used in various fields, such as finance, spatial sciences, smart grids, intelligent transportation, renewable energy, agriculture, and especially medicine. In the era of big medical data, the advantage of machine learning is that it can support prediction and diagnosis by analyzing large volumes of clinical data, with performance that is close to, and in some cases better than, that of clinicians. This article focuses on the application of machine learning techniques in the field of stomatology and describes in detail application cases involving oral cancer, dental caries, periodontitis, dental pulp diseases, periapical lesions, oral implants, and orthodontics. Finally, the research obstacles and future work are discussed.


I. INTRODUCTION
The combination of big data and artificial intelligence has been regarded as the "fourth industrial revolution" [1]. As shown in Fig. 1, machine learning (ML), as the main branch of artificial intelligence, is an interdisciplinary subject that draws on many fields [2]. By training on a large amount of data, a model based on ML techniques can acquire the ability to make predictions and decisions [3]. Commonly used ML techniques include Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayesian Classifier, Decision Tree (DT), Random Forest (RF), Extreme Learning Machine (ELM), fuzzy k-nearest neighbor (FKNN), Convolutional Neural Network (CNN), etc. ML techniques have been widely utilized in many fields (united with optimization cores or not) with superior results compared to the alternative solutions [4]-[31]. Financial institutions have used ML techniques to analyze the outbreak and contagion of systemic risks in financial networks to improve financial supervision systems, and to establish credit assessment models that help distinguish defaulters from non-defaulters [32], [33]. Besides, ML techniques can be used to predict stock prices and exchange rates [34]-[36]. ML models can monitor the status of wind turbines, report faults in time, and predict wind power [37]-[39]. Smart grids based on ML techniques can reduce greenhouse gas emissions by improving the efficiency of energy transmission and utilization [40]-[43]. An intelligent transportation system based on ML techniques can provide functions such as traffic flow prediction, accident detection, route optimization, pavement detection, etc. [44]-[49]. Intelligent buildings based on ML techniques can better serve occupants and reduce energy consumption [50]-[52]. The application of ML techniques in agricultural production systems can provide yield prediction, disease detection, weed detection, etc. [53]-[56]. The application of ML techniques to weather research can predict rainfall and snow [57], [58]. ML techniques can also help make better use of solar energy, for example by predicting solar radiation and optimizing solar system equipment [59]-[65]. ML techniques have increasingly been applied to speech recognition [66], [67], image recognition [68], face recognition [69], emotion recognition [70], [71], and action recognition [72]. In the field of medicine, ML techniques have been widely used in cardiology, ophthalmology, nephrology, radiotherapy, neurology, endocrinology, oncology, stomatology, psychology, etc. [73]-[86]. The advantage of ML techniques is the ability to analyze a large amount of data to achieve prediction and diagnosis.
The associate editor coordinating the review of this manuscript and approving it for publication was Jenny Mahoney.
The application of ML techniques to medical data has the potential to solve the problems of diagnostic difficulty, low diagnostic efficiency, and diagnostic errors. For example, SVM combined with the information gain (IG) method was used to analyze routine blood test results to achieve early diagnosis and predict the prognosis of patients with paraquat poisoning [87]. SVM combined with Fisher discriminant analysis of arterial blood gas (ABG) indexes, and an extreme learning machine with the gray wolf optimization algorithm (GWO-ELM) based on analysis of blood coagulation, liver, and kidney indexes, can predict the prognosis of patients with paraquat poisoning [88], [89]. Diagnosis systems based on the Particle Swarm Optimization (PSO) algorithm and an enhanced fuzzy k-nearest neighbor algorithm (PSO-FKNN), or based on maximum relevance minimum redundancy (mRMR) feature selection and the Kernel Extreme Learning Machine (KELM), can both allow early diagnosis of Parkinson's disease [90], [91]. An expert diagnosis system based on Fisher score, PSO, and SVM (FS-PSO-SVM) can assist doctors in diagnosing thyroid disease [92]. A model based on local Fisher discriminant analysis (LFDA) and SVM can diagnose hepatitis [93]. An SVM classifier based on rough sets (RS) was developed to diagnose breast cancer [94].
When it is difficult for doctors to predict and diagnose diseases early, ML techniques can detect slight changes and analyze large amounts of data to help solve these problems. The results obtained by ML techniques are highly accurate and even exceed human judgment in some cases. With the development of stomatology, dentists are paying increasing attention to disease prevention and proper treatment. However, because clinical data are extensive and complex and young dentists lack experience, it is difficult to attain this objective without the assistance of auxiliary tools. Therefore, ML techniques have been widely used in stomatology in recent years. This survey aims to review the applications of ML techniques in oral cancer, dental caries, periodontitis, pulp disease, periapical lesions, dental implants, and orthodontics, as shown in Fig. 2. Finally, the research obstacles and future work are discussed.

II. APPLICATION OF MACHINE LEARNING TO STOMATOLOGY
A. ORAL CANCER
The morbidity and mortality of oral cancer are gradually increasing around the world, accounting for 2% and 1.9% of the total morbidity and mortality of cancer in 2018, respectively [95], [96]. Oral Squamous Cell Carcinoma (OSCC), which has the characteristics of a high degree of malignancy and poor prognosis, is the most common form of oral cancer. OSCC usually occurs in the anterior 2/3 of the tongue, gums, floor of the mouth, lips, and so on. To decrease the morbidity and mortality of oral cancer, there are four applications of ML techniques in oral cancer.

1) PREDICT PROGNOSIS
Cancer biomarkers are defined as substances that are directly produced by tumor cells or by non-tumor cells induced by tumor tissue. The existence, pathogenesis, and prognosis of cancer can be judged by the detection of cancer biomarkers [97]. However, only a limited number of cancer biomarkers are used in the clinic [98]. By taking advantage of the ability of ML techniques to analyze large amounts of data, more accurate cancer biomarkers can be found. Small nucleolar RNAs (snoRNAs) play important roles in tumorigenesis [99]. Xing et al. [100] identified survival-related snoRNAs of head and neck squamous cell carcinoma (HNSCC) by Cox regression analysis. A five-snoRNA signature (SNORD114-17: ENSG00000201569, SNORA36B: ENSG00000222370, SNORD78: ENSG00000212378, U3: ENSG00000212182, and U3: ENSG00000212195) was finally obtained to construct a risk model. When a risk score was calculated from the five snoRNAs, patients with a high risk score often had a significantly worse prognosis. Yang et al. [101] revealed 90 upregulated differentially expressed genes (DEGs) in OSCC samples from the Gene Expression Omnibus (GEO) database. Of note, they found a rarely reported gene among the 90 DEGs, called Aurora Kinase A and Ninein Interacting Protein (AUNIP), which is involved in the regulation of the cell cycle [102], [103].
Furthermore, they used the LR model to assess the performance of AUNIP in diagnosing OSCC in The Cancer Genome Atlas (TCGA) database. Because AUNIP overexpression is closely related to the poor prognosis of OSCC patients, it can be used as a candidate biomarker to predict their prognosis. Messenger RNA (mRNA) plays an essential role in physiological and pathological processes and has potential predictive abilities [104], [105]. Cao et al. [106] constructed a predictive model based on multivariate Cox analysis, which can accurately predict the three- and five-year survival rates of patients using three mRNA markers (CLEC3B, C6, and CLCN1).
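The Cox-based risk models described above typically reduce to a weighted sum of marker expression values, followed by a split of the cohort into high-risk and low-risk groups. The following is a minimal sketch of that idea, not the implementation of any cited study. The marker names, coefficients, expression values, and the use of the cohort median as the cutoff are all assumptions made for illustration.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Linear predictor of a Cox-type model: sum of coefficient * expression."""
    return sum(coefficients[g] * expression[g] for g in coefficients)

def stratify(patients, coefficients):
    """Label each patient 'high' or 'low' risk, using the cohort median score as cutoff.

    The median cutoff is an assumption; the cited studies may use another threshold.
    """
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return groups, scores

# Hypothetical coefficients and expression values, for illustration only.
COEFS = {"marker_a": 0.8, "marker_b": -0.5, "marker_c": 0.3}
PATIENTS = {
    "p1": {"marker_a": 2.0, "marker_b": 1.0, "marker_c": 0.5},
    "p2": {"marker_a": 0.5, "marker_b": 2.0, "marker_c": 1.0},
    "p3": {"marker_a": 1.0, "marker_b": 0.5, "marker_c": 2.0},
    "p4": {"marker_a": 3.0, "marker_b": 0.2, "marker_c": 0.1},
}

groups, scores = stratify(PATIENTS, COEFS)
for pid in sorted(groups):
    print(pid, round(scores[pid], 2), groups[pid])
```

In practice the coefficients would come from a fitted multivariate Cox model, and the two groups would then be compared with survival analysis.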

2) PREDICT LYMPH NODE METASTASIS
Lymph node metastasis, which is the most common cancer metastasis pathway, can lead to poor prognosis [107], [108]. Bur et al. [109] accurately predicted pathological lymph node metastasis in OSCC patients with T1-2N0 disease by analyzing clinicopathological data using a decision forest algorithm, and the prediction was better than that obtained with the tumor depth of invasion (DOI) method. Contrast-enhanced CT is the most widespread imaging method for examining the status of cervical lymph nodes. Ariji et al. [110] used the deep learning system "DIGITS" and the CNN "AlexNet" to analyze contrast-enhanced CT images. The prediction of extranodal extension (ENE) had high accuracy, and the diagnostic performance was better than that of radiologists. Kann et al. [111] also used CT images to train a model based on a 3D CNN to predict ENE. Carnielli et al. [112] combined laser microdissection (LMD) and proteomics to analyze the invasive tumor front (ITF) and the inner tumor. They found that seven proteins (CSTB, NDRG1, LTA4H, PGK1, COL6A1, ITGAV, and MB) might be related to prognosis. Furthermore, they used selected reaction monitoring mass spectrometry (SRM-MS) to measure the abundance of the seven proteins in saliva samples of OSCC patients and then analyzed the SRM-MS results with ML techniques (Linear SVM, RBF SVM, DT, LR, RF, Perceptron, and Naive Bayes). Eventually, three specific peptides (LTA4H, COL6A1, and CSTB) in saliva could predict lymph node metastasis, and RF showed the best performance.

3) ASSESS RISK OF CANCERATION
Oral Potentially Malignant Disorders (OPMDs), which include oral leukoplakia (OLK), oral erythroplakia (OEK), oral lichen planus (OLP), and so on, have the potential to become cancerous [113], [114]. Before developing OSCC, most patients go through a long-term OPMD stage [115]. Therefore, it is vital to assess the risk of OPMD patients effectively. Visually enhanced lesion (VEL) scope and toluidine blue (TB) staining are the most widely used non-invasive detection techniques for assessing the risk of carcinogenesis of OPMDs [116]. Wang et al. [117] used the RF algorithm to establish a prediction model, which can predict the risk of canceration of OPMDs from the above-mentioned two non-invasive detection techniques and patients' personal information. Exfoliative cytology is also a non-invasive technique for primary screening and early diagnosis of OSCC [118]. Feres et al. [119] collected the results of exfoliative cytology, histopathology, and clinical follow-up of normal subjects, OLK patients, and cancer patients. Using the Peaks-RF algorithm and the information mentioned above, an oral cancer risk index (OCRI2) was constructed as a quantitative measure of cancer risk. For example, an OLK patient has a high risk of canceration when the index is greater than 0.5. In addition to patients with OPMDs, people who smoke regularly also have a high risk of oral cancer [120]. Dey et al. [121] obtained images of oral epithelial samples using a differential interference contrast (DIC) microscope, which is an optical tool that can provide a pseudo-three-dimensional (3-D) image. Then, they extracted the morphological and textural features of the cell images from habitual smokers, non-smokers, and pre-cancer patients. Based on these features, the cellular abnormalities of habitual smokers could be revealed by using an SVM classifier. Thus, the risk of developing oral cancer in habitual smokers might be predicted from cytomorphological features with an SVM classifier.

4) EARLY-DIAGNOSE CANCER
Many patients are diagnosed at an advanced stage and miss the best opportunity for early intervention and treatment, so it is necessary to explore effective methods of early diagnosis. Zlotogorski-Hurvitz et al. [122] combined Fourier-transform infrared (FTIR) spectroscopy with ML techniques (principal component analysis-linear discriminant analysis (PCA-LDA) or SVM classification) to detect subtle changes in the proteins, lipids, and nucleic acids of salivary exosomes, which enables early diagnosis of oral cancer. Hyperspectral imaging (HSI) is a non-contact and non-invasive optical diagnostic imaging technique. Lu et al. [123] detected tongue cancer in rats using HSI and seven classification models (Linear SVM, Ensemble LDA, LDA, RF, QDA, RBF SVM, and RUSBoost). Furthermore, they generated color-coded lesion prediction maps, which were similar to the gold-standard color maps. LDA showed the best predictive performance among the seven classification models.

B. PERIODONTITIS
Periodontitis is a chronic inflammatory disease that destroys periodontal tissue and can lead to tooth loss if left untreated [130]. Besides, periodontitis can increase the probability of atherosclerosis, rheumatoid arthritis, aspiration pneumonia, and cancer, affecting overall health. To better prevent and treat periodontitis, there are three applications of ML techniques in periodontitis.

1) ANALYZE RELATED MICROBIOTA
The occurrence and development of periodontitis result from the combined action of all the microorganisms in plaque [131]. Periodontitis-related microorganisms can be explored by applying ML techniques. Chen et al. [132] analyzed and compared the composition of the subgingival plaque microbial community between normal subjects and patients with periodontitis by employing a 16S rRNA metagenomics method. Nonparametric Kruskal-Wallis tests were used to assess the microbes in the microbial community, and the critical microbes were selected as the feature combination. The feature combination and machine learning algorithms (deep learning, SVM, RF, and LR) were then used to construct models that predict the health status of patients with periodontitis. Eventually, RF showed the best prediction performance. Torres et al. [133] combined supervised ML and genome assembly to find a new bacterium, called Candidatus Bacteroides periocalifornicus (CBP), associated with periodontitis in periodontal samples. Because CBP is closely related to the members of the red complex, a group of three microorganisms highly related to the development of periodontitis [134], it is speculated that CBP may be a candidate member of the red complex.
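To make the Kruskal-Wallis screening step above concrete, the sketch below computes the H statistic for each taxon across healthy and periodontitis groups and ranks taxa by it. It is a simplified illustration under stated assumptions: the taxon names and abundance values are hypothetical, the statistic omits the tie correction, and the cited study's exact selection procedure is not reproduced.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of sample groups."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical relative abundances per taxon, for illustration only.
HEALTHY = {"taxon_a": [1.0, 2.0, 3.0], "taxon_b": [1.0, 4.0, 5.0]}
DISEASED = {"taxon_a": [4.0, 5.0, 6.0], "taxon_b": [2.0, 3.0, 6.0]}

h_by_taxon = {t: kruskal_wallis_h([HEALTHY[t], DISEASED[t]]) for t in HEALTHY}
ranked = sorted(h_by_taxon, key=h_by_taxon.get, reverse=True)
print(ranked)
```

Taxa with a larger H separate the two groups more clearly and would be candidates for the feature combination passed to the downstream classifiers.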

2) DIAGNOSE PERIODONTITIS
Periodontitis has different clinical types with distinct responses to treatment. Therefore, it is important to diagnose and classify periodontitis accurately. Feres et al. [119] used an SVM classifier to analyze the levels of 40 bacterial species, obtained by checkerboard DNA-DNA hybridization of subgingival plaque samples from subjects with chronic periodontitis (ChP), generalized aggressive periodontitis (AgP), and periodontal health (PH). The method can classify AgP, ChP, and PH, and distinguish AgP from ChP. Although subgingival plaque samples reflect the condition of periodontitis well, sampling is an invasive procedure that requires the expertise of doctors. In a previous report, periodontal microorganisms were also found on the buccal mucosa and in supragingival plaque, besides subgingival pockets. Na et al. [135] found that the buccal site with the LMT algorithm and the supragingival site with LogitBoost showed acceptable performance in diagnosing periodontitis. In addition to plaque, many other factors are also closely related to periodontitis. Papantonopoulos et al. [136] established an artificial neural network (ANN) model to distinguish AgP from ChP based on monocytes, eosinophils, neutrophils, and the CD4/CD8 ratio. Arbabi et al. [137] established a diagnostic tool based on ANNs, which can accurately diagnose periodontal disease by evaluating age, sex, probing depth, plaque index, and attachment loss index.

3) DETECT ALVEOLAR BONE LOSS
Periodontitis can cause periodontal bone loss (PBL) [138], and early detection and treatment of PBL play an essential role in improving the treatment results of periodontitis. Kim et al. [139] proposed a deep neural transfer network (DeNT-Net) based on deep CNNs that can accurately diagnose PBL on panoramic dental radiographs and provide the corresponding tooth numbers of the lesions. Krois et al. [140] also detected PBL on panoramic dental radiographs using CNNs. When horizontal alveolar bone loss is excessive, the teeth loosen. Lee et al. [141] used a CNN algorithm to analyze radiographs to predict and diagnose tooth loosening caused by horizontal bone loss. However, the above three methods only detected the region of PBL and did not analyze it quantitatively. Chang et al. [142] developed an automated method based on CNNs for identifying the periodontal bone level, the cementoenamel junction (CEJ) level, and the long axis of each tooth to detect and classify PBL on dental panoramic radiographs.
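Once the CEJ level, the bone level, and the tooth long axis have been identified, a quantitative measure of PBL can be derived geometrically. The sketch below is one plausible formulation, not the method of the cited study: it projects the bone level onto the axis from the CEJ to the root apex and expresses the loss as a percentage of root length. The use of the root apex, the pixel coordinates, and the percentage thresholds used for staging are assumptions made for illustration.

```python
import math

def project_onto_axis(point, origin, direction):
    """Signed distance of `point` from `origin`, measured along `direction`."""
    dx, dy = direction
    norm = math.hypot(dx, dy)
    return ((point[0] - origin[0]) * dx + (point[1] - origin[1]) * dy) / norm

def bone_loss_percent(cej, bone_level, apex):
    """Bone loss along the tooth long axis as a percentage of root length.

    The axis is taken as the line from the CEJ to the root apex (an assumption).
    """
    axis = (apex[0] - cej[0], apex[1] - cej[1])
    root_length = math.hypot(*axis)
    if root_length == 0:
        raise ValueError("CEJ and apex must be distinct points")
    loss = project_onto_axis(bone_level, cej, axis)
    loss = max(0.0, min(loss, root_length))  # clamp to the root
    return 100.0 * loss / root_length

def stage(percent):
    """Map percentage bone loss to a stage; thresholds are illustrative assumptions."""
    if percent < 15:
        return "stage I"
    if percent <= 33:
        return "stage II"
    return "stage III or IV"

# Hypothetical landmark coordinates in pixels, for illustration only.
pct = bone_loss_percent(cej=(0.0, 0.0), bone_level=(1.0, 3.0), apex=(0.0, 12.0))
print(round(pct, 1), stage(pct))
```

Expressing the loss relative to root length makes the measure independent of image magnification, which matters on panoramic radiographs.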

C. DENTAL CARIES
Dental caries is regarded as one of the most common oral diseases and is the leading cause of tooth loss and pain [143]. Fortunately, it can be prevented, and its progress can be halted in the early stage. The microorganisms in dental plaque deposited on the surface of the teeth produce acidic substances, which demineralize the enamel surface and destroy the tooth structure. ML techniques are applied in the following three aspects.

1) PREVENT DENTAL CARIES
Microorganisms in dental plaque produce acidic substances that demineralize enamel, so the quantitative analysis of dental plaque can be used to evaluate the risk of dental caries. Dental plaque emits red fluorescence in Quantitative Light-induced Fluorescence (QLF) images [144]. Sultan et al. [145] established a prediction model based on CNN, which can analyze dental plaque in QLF images to assess caries risk. However, considering the cost of the equipment, digital cameras may be a better alternative. You et al. [146] established an AI model based on CNN. The model showed a satisfactory result in detecting dental plaque on primary teeth. It is worth noting that they used digital cameras to obtain tooth photos, which reduces the cost of clinical adoption. With the aging of the population, root caries seriously threatens the dental health of the elderly [147]. Hung et al. [148] combined ML techniques (SVM, extreme gradient boosting (XGBoost), RF, k-nearest neighbors (k-NN), and LR) with demographic and lifestyle factors to determine the risk of root caries and prompt doctors to intervene early. They found that SVM showed the best prediction performance, and age was the factor most closely related to root caries.

2) DETECT DENTAL CARIES
Doctors often miss caries when examining teeth with visual and tactile methods [149]. Imaging examination can improve the accuracy of diagnosis, but it is complicated for young doctors, who often overlook details in the image. Fortunately, this problem can be mitigated with the aid of ML techniques. Rad et al. [150] proposed a new dental X-ray image segmentation method based on the level set method, which has two steps: initial contour (IC) generation and intelligent level set segmentation. The segmented images were then analyzed in three steps (teeth isolation, feature map, detection process), and dental caries were found in the images. Patil et al. [151] used multi-linear principal component analysis (MPCA) for feature extraction and a neural network (NN) classifier to establish a dental caries diagnosis model, which can identify dental caries in X-ray images. Lee et al. [152] used a CNN algorithm to detect dental caries on periapical radiographs and reported that CNN algorithms showed excellent performance. Because of the anatomy of the proximal surfaces of teeth, it is hard for the doctor to observe proximal caries in X-ray images. Choi et al. [153] proposed an automatic detection system based on CNN and crown extraction, which improves the accuracy of proximal caries detection. In addition to X-ray images, near-infrared transillumination (TI) imaging is also an effective method for the detection of dental caries, because the transmission of near-infrared light varies with the degree of tooth mineralization. Casalegno et al. [154] presented a CNN model for automatically detecting and locating dental caries in TI images. In particular, the detection of dental caries on occlusal and proximal surfaces achieved high accuracy.

3) CLASSIFY DENTAL CARIES
The International Caries Detection and Assessment System (ICDAS) is the most widely used clinical evaluation system, which is divided into seven levels according to the severity of tooth damage [155]. Moutselos et al. [156] established a deep learning model (Mask R-CNN), which could detect dental caries in intraoral camera images and classify them according to ICDAS.

D. DISEASES OF DENTAL PULP AND PERIAPICAL LESION
Diseases of the dental pulp include pulpitis, pulp necrosis, and pulp degeneration. Root canal therapy is a common method for the treatment of dental pulp diseases; it is complicated and requires an accurate understanding of the anatomical structure of teeth. When diseases of the dental pulp are not treated in time or are treated unsuccessfully, inflammation can spread to the periapical region through the root canal, causing periapical lesions.

1) EXAMINE ANATOMY OF TEETH
In the process of permanent tooth eruption, the mandibular first molar is the first tooth to erupt. Because it is retained in the mouth for a long time and oral care is often delayed after eruption, the mandibular first molar has a high risk of dental caries and subsequent pulp diseases [157], [158]. Therefore, the mandibular first molar often requires endodontic treatment. The distal root of the mandibular first molar may show an extra-root variation, and misjudgment of the root anatomy of the mandibular first molar will lead to the failure of endodontic treatment [158], [159]. Cone-beam computed tomography (CBCT) is the gold standard for determining the number of roots, but it is not suitable for everyone because of its high radiation dose [160]. For this reason, panoramic radiography is widely used in the clinic. Hiraiwa et al. [161] used two deep learning systems (AlexNet and GoogLeNet) to detect the distal root structure of the mandibular first molar on panoramic dental radiographs. The diagnostic performance of both deep learning systems was slightly superior to that of radiologists with many years of experience.
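Comparisons between a deep learning system and human readers, such as the one above, are commonly summarized by accuracy, sensitivity, and specificity against a reference standard. The following minimal sketch computes these metrics from binary labels. The label vectors are hypothetical, and the choice of these three metrics is an assumption rather than a description of the cited study's evaluation protocol.

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = finding present)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against classes that are absent from the reference labels.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Hypothetical reference standard (e.g. from CBCT) and two sets of readings.
REFERENCE = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
MODEL = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
READER = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

model_metrics = diagnostic_metrics(REFERENCE, MODEL)
reader_metrics = diagnostic_metrics(REFERENCE, READER)
print(model_metrics)
print(reader_metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the finding is rare, because accuracy alone can look high for a model that rarely flags it.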

2) EVALUATE DIFFICULTY OF TREATMENT
Before endodontic treatment, dentists should carefully evaluate the difficulty level of the case and their own ability to handle it, in order to reduce the failure rate of treatment. The American Association of Endodontists (AAE) Endodontic Case Difficulty Assessment Form is used to assess the difficulty level of a case. Mallishery et al. [162] combined SVM with the AAE Endodontic Case Difficulty Assessment Form to evaluate the difficulty level of cases. This method is especially valuable in developing countries, where dentists often neglect referral guidelines; for example, young doctors can select an appropriate treatment according to the results of the evaluation or refer the case to an experienced doctor.

3) DIAGNOSE PERIAPICAL LESION
Tissue biopsy is the gold standard for the analysis of periapical lesions, but it is invasive and can prolong the treatment time. Okada et al. [163] constructed a noninvasive differential diagnostic tool for periapical lesions based on graph-based random walk segmentation and an LDA-AdaBoost classifier, which combines LDA with AdaBoost. Orhan et al. [164] designed an artificial intelligence diagnosis model based on deep CNN to determine the location of periapical lesions and calculate the lesion volume on CBCT images.

E. DENTAL IMPLANT
Dental implantation is a method to repair dentition defects and dentition loss. With the continuous expansion of dental implant indications, dental implant therapy is suitable for an increasing number of patients. Currently, many difficult problems can be addressed by using computer-aided systems based on ML techniques. The application of ML techniques is manifested in the following five main areas.

1) ANALYSE MECHANICALLY
When an implant-tooth-supported fixed denture treatment strategy is used, it is difficult for traditional methods to mimic the postoperative biomechanical behavior of alveolar bone because of the disruption of the original biomechanics and daily mastication. To solve this problem, Zhang et al. [165] used a poroelastic finite element model (FEM) and the kernel least mean square (KLMS) algorithm to predict the alveolar bone material properties and the optimal treatment before the operation.

2) PREDICT POSTOPERATIVE OUTCOME
Before the dental implant operation, the doctor chooses a dental implant plan according to the patient's individual situation and the doctor's own clinical experience, which is subjective. Liu et al. [166] used DT with both Bagging and AdaBoost techniques to predict the failure rate of implant surgery. This helps dentists alter treatment strategies appropriately to reduce the risk of failure. Ha et al. [167] applied DT and found that the mesiodistal position of the implant plays an essential role in successful dental implant surgery, so dentists should pay more attention to this factor.

3) PREDICT PERI-IMPLANT INFLAMMATION
Peri-implant inflammation is the most common postoperative complication of dental implants and often leads to dental implant failure [168]. Timely diagnosis and an appropriate treatment plan are the keys to the successful treatment of peri-implant inflammation. Peri-implant inflammation is divided into three subtypes (purely plaque-induced, prosthetically triggered, or surgically triggered peri-implantitis). Canullo et al. [169] used data mining tools comprising regression methods and C4.5 DT to evaluate the characteristics of peri-implant inflammation and found that the three subtypes were independent and had their own unique characteristics; hence, it is necessary to provide appropriate causal treatment.

4) CLASSIFY BONE MINERAL DENSITY
The alveolar bone mineral density is often divided into different categories according to the quantity and quality of cortical bone and cancellous bone. Goiato et al. [170] found that the survival rate of dental implants differed across alveolar bone mineral density categories. However, Misch et al. [171] found that the survival rate of all dental implants was similar when alveolar bone mineral density was correctly evaluated before the operation and an appropriate treatment was designed. Because bone mineral density largely depends on the shape of the bone trabeculae, the combination of machine vision and ML techniques can thoroughly analyze medical images of alveolar bone. Sorkhabi and Khajeh [172] proposed a 3D CNN method to evaluate the alveolar bone density from CBCT volumetric data.

5) DETECT MANDIBULAR CANAL
The inferior alveolar nerve (IAN) runs in the mandibular canal. The location of the mandibular canal is closely related to dental implant placement. Any injury to the IAN could result in temporary or permanent damage, which is a severe complication of dental implant surgery. Therefore, accurately locating the mandibular canal is a critical step before dental implant surgery. Kwak et al. [173] detected and segmented the mandibular canal on CBCT images by applying 3D U-Nets. Notably, the reported results were accurate. The detection of the mandibular canal on CBCT images using deep learning methods can improve the accuracy of locating the mandibular canal and reduce the manual labor of dentists.
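The agreement between an automatically segmented mandibular canal and a manual annotation is often quantified with overlap measures such as the Dice coefficient and intersection over union (IoU). The sketch below shows both on flattened binary voxel masks. The masks are hypothetical, and the choice of these two metrics is an assumption; the text above does not state which measure the cited work reports.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and IoU for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection
    # Two empty masks are treated as a perfect match.
    dice = 2 * intersection / (pred_size + truth_size) if (pred_size + truth_size) else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou

# Hypothetical 8-voxel masks (1 = canal), for illustration only.
TRUTH = [0, 1, 1, 1, 0, 0, 1, 0]
PRED = [0, 1, 1, 0, 0, 1, 1, 0]

dice, iou = dice_and_iou(PRED, TRUTH)
print(dice, iou)
```

For a thin tubular structure such as the canal, overlap scores can be sensitive to small boundary shifts, so distance-based measures are sometimes reported as well.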

F. ORTHODONTICS
Orthodontic treatment adjusts abnormal relationships between the maxilla and mandible, between the upper and lower dentition, and between teeth and jaws to achieve balance, stability, and esthetics of the stomatognathic system. As living standards rise, people pay more and more attention to orthodontics. Because the process of orthodontic treatment is lengthy, orthodontists need to improve their efficiency to meet the needs of society. The application of ML techniques can help address this problem.

1) SEGMENT TEETH
Tooth segmentation is one of the critical steps of computer-aided orthodontic technology, which needs to accurately locate and extract the tooth shapes in the patient's 3D digital dental model. The accuracy of tooth segmentation is closely related to the treatment results. Tian et al. [174] proposed a method that uses a sparse voxel octree and 3D CNNs to segment and classify teeth on 3D digital dental models. The accuracy of tooth segmentation by this method is 89.81%. Xu et al. [175] proposed a two-level hierarchical CNN structure, including teeth-gingiva labeling and inter-teeth labeling, for 3D dental model segmentation and refined the boundary using improved fuzzy clustering. The segmentation results can be directly applied to an orthodontic CAD system. Pei et al. [176] proposed a CBCT image segmentation approach based on a 3D exemplar-based random walk. The outcome of automatic segmentation was similar to that of manual segmentation. Juodzbalys et al. [177] developed a novel deep learning method called MeshSegNet, which is an extension of PointNet. Using this method, teeth were automatically labeled on raw dental surfaces obtained from a 3D intraoral scanner (IOS).

2) PRE-ORTHODONTIC DECISION-MAKING
Deciding whether to extract premolars plays an essential role in orthodontic treatment planning. Inappropriate decisions will increase the difficulty of orthodontics and even lead to failure. Jung and Kim [178] constructed an intelligent decision model for the diagnosis of extractions based on NN with a back-propagation algorithm. The decision result of tooth extraction is obtained by inputting 12 cephalometric results and six additional indicators into the model.

3) FACIAL MEASUREMENT
Facial measurement, which is one of the critical steps in orthodontic treatment, helps to relate the soft tissues to the hard tissues. In the clinic, manual facial measurement is an inefficient process. Of note, inaccurate location of anatomical structures will lead to inaccurate measurements. Rao et al. [179] used a deep learning model based on the You-Only-Look-Once (YOLO) architecture to recognize faces, and then used a facial landmark recognition model based on the active shape model (ASM) algorithm to recognize facial landmarks. The landmarks are connected and measured automatically, and the results are both more accurate and obtained much more efficiently than with manual measurement. When comparing algorithms, orthodontists tend to focus on accuracy and efficiency. Park et al. [180] compared two recent ML techniques (YOLO version 3 (YOLOv3) and Single Shot Multibox Detector (SSD)) for automatic identification of cephalometric landmarks. The results showed that YOLOv3 had a smaller identification error and a shorter mean computation time. YOLOv3 therefore seems to have more potential to be accepted by orthodontists.
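The identification error used to compare landmark detectors can be made concrete as the mean radial error (MRE), the average Euclidean distance between predicted and reference landmarks, together with the share of landmarks that fall within a tolerance. The sketch below illustrates both. The landmark coordinates, the pixel spacing, and the 2 mm tolerance are assumptions for illustration and are not taken from the cited comparison.

```python
import math

def radial_errors(pred, truth, mm_per_pixel=1.0):
    """Per-landmark Euclidean distance between prediction and reference, in mm."""
    return {name: math.dist(pred[name], truth[name]) * mm_per_pixel for name in truth}

def mean_radial_error(errors):
    """Average of the per-landmark errors."""
    return sum(errors.values()) / len(errors)

def success_detection_rate(errors, threshold_mm=2.0):
    """Fraction of landmarks whose error is within the given tolerance."""
    return sum(1 for e in errors.values() if e <= threshold_mm) / len(errors)

# Hypothetical pixel coordinates for three cephalometric landmarks.
TRUTH = {"sella": (100, 100), "nasion": (200, 80), "menton": (150, 300)}
PRED = {"sella": (103, 104), "nasion": (200, 80), "menton": (156, 308)}

errors = radial_errors(PRED, TRUTH, mm_per_pixel=0.1)  # assumed pixel spacing
mre = mean_radial_error(errors)
sdr = success_detection_rate(errors, threshold_mm=2.0)
print(round(mre, 3), sdr)
```

Computation time per image would be measured separately, since a detector can be more accurate yet too slow for routine chairside use.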
To sum up, ML techniques have great prospects in stomatology, especially in prediction and diagnosis. For prediction, the main highlights of ML are as follows. Biomarkers were screened to predict prognosis, lymph node metastasis, and the risk of oral cancer. ML combined with demographic and lifestyle factors can determine the risk of root caries. For diagnosis, the main highlights of ML are as follows. The combination of ML and images can help dentists diagnose many diseases, such as oral cancer, PBL, periapical lesions, and dental caries. Also, the mandibular canal and facial landmarks can be recognized automatically, and teeth can be segmented automatically in digital models. The challenges and solutions are discussed in detail in the next section.

III. CONCLUSION AND FUTURE WORKS
To sum up, ML techniques have been widely applied in stomatology because of several characteristics of the field.
1) Radiographic images are ubiquitous in stomatology and form the basis of follow-up treatment. They are often used to identify lesions and determine anatomical structures.
2) The data are sufficient. Some oral lesions are widespread, such as dental caries, periodontal diseases, and periapical lesions. Moreover, the popularity of electronic medical records makes it easy to obtain complete data.
3) Many oral diseases are multifactorial. The four-factor theory, comprising microorganisms, time, host, and food, is a recognized explanation of the cause of dental caries. The causes of temporomandibular joint disorder syndrome include immune factors, social factors, masticatory habits, anatomical factors, and so on.
4) The content of the data is varied, including radiographic images, demographic data, clinical presentations, and pathological examinations. Regarding the third and fourth points, ML techniques are good at processing such heterogeneous data and finding the connections among them.
5) Some tasks are routine and repetitive. There are many clear gold standards for diagnosing oral diseases on radiographic images. If an ML model can achieve a satisfactory diagnostic level, it will significantly improve medical efficiency and reduce medical costs.
ML techniques show their unique advantages in the field of stomatology, in both prediction and diagnosis. During the prediagnosis stage, ML techniques can predict disease risk and prognosis based on comprehensive patient information. Dentists can then select high-risk patients for early intervention and choose appropriate nursing methods after treatment. In this way, dentists can allocate medical resources effectively. During the diagnosis stage, ML techniques perform well compared with dentists, especially in the interpretation of radiological images and the integration of multiple factors. This is because ML techniques can recognize fine details and subtle changes in the images. Notably, ML techniques can maintain consistent performance when dealing with large amounts of data.
It is worth noting that the application of ML techniques in medical imaging is particularly prominent. Many algorithms are applied in medical imaging, such as NN, SVM, DT, and CNNs. Among these algorithms, CNNs have received the most attention [181]. A remarkable advantage of CNNs over other algorithms is that they can effectively extract features from medical images. The practical applications of ML in medical imaging are as follows [182].
1) Detection of anatomical structures. The detection and location of anatomical structures in medical images is an essential step in medical examination. For example, the mandibular canal can be located on CBCT images, and the distal root structure of the mandibular first molar can be detected on panoramic dental radiographs [161], [173]. Besides, combining cell images with ML can help pathologists analyze tissues. For example, oral epithelial cells obtained from habitual smokers and imaged with a DIC microscope can be used to assess the risk of developing oral cancer.
2) Segmentation. Before a specific diagnosis is made, it is essential to segment specific structures in medical images, especially teeth in orthodontic treatment. Teeth can be accurately segmented and labeled on digital images with the aid of ML [174], [175], [177].
3) Computer-Aided Detection. Lesion regions can be located on medical images to alert clinicians and thereby reduce the false-negative rate. PBL in periodontitis patients and dental caries can be detected on X-ray images [139], [140], [152], [153].
4) Computer-Aided Diagnosis. In addition to the clinician's diagnosis, Computer-Aided Diagnosis provides a second, objective diagnosis. Oral cancer can be diagnosed with various types of images [123], [124], [127].
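The false-negative rate mentioned under Computer-Aided Detection can be computed directly from a model's outputs. The sketch below is a minimal illustration with invented labels and scores: it assumes the detector outputs a score between 0 and 1 per image and that a decision threshold converts scores into alerts, neither of which is specified in the text.

```python
def confusion_counts(labels, scores, threshold):
    """Count true/false positives and negatives for a given decision threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        alert = s >= threshold
        if y == 1 and alert:
            tp += 1
        elif y == 1:
            fn += 1
        elif alert:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def false_negative_rate(labels, scores, threshold):
    """Share of true lesions that the detector fails to flag."""
    tp, _, _, fn = confusion_counts(labels, scores, threshold)
    return fn / (tp + fn)


def false_positive_rate(labels, scores, threshold):
    """Share of healthy cases that the detector flags by mistake."""
    _, fp, tn, _ = confusion_counts(labels, scores, threshold)
    return fp / (fp + tn)


# Invented example: 1 = lesion present, 0 = no lesion.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.2, 0.1, 0.05]

if __name__ == "__main__":
    for t in (0.5, 0.25):
        print(t, false_negative_rate(labels, scores, t), false_positive_rate(labels, scores, t))
```

In this toy example, lowering the threshold from 0.5 to 0.25 reduces the false-negative rate from 0.5 to 0.0 without changing the false-positive rate; on real data a lower threshold usually raises the number of false alerts, so the threshold has to be chosen with clinicians.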
However, there are still the following problems in the application of ML techniques.
1) It is challenging to obtain large, high-quality, and standardized data sets. To extract features accurately when training a model, researchers need large and accurately labeled data sets. However, differences in clinicians' professional levels lead to uneven data quality. Moreover, although the total amount of data on oral lesions seems sufficient in clinical practice, the data are scattered across different hospitals rather than shared, out of concern for patient privacy. As a result, very little data is available for training models compared with other fields.
2) It is risky to use predictive or diagnostic models. Although a model may achieve high accuracy in laboratory tests, it may still produce misdiagnoses in clinical practice because symptoms vary between individual patients.
3) Black box. To increase dentists' trust, models must be able to explain how they arrive at their diagnostic and predictive results. Because of the complexity of the algorithms, these models are regarded as black boxes whose internal mechanisms are difficult for researchers to explain.
4) Training a large ML model consumes much energy. Training a complex ML model produces CO2 emissions roughly equal to those of a round-trip flight between New York City and San Francisco [183].
Future studies should address the following points.
1) Protect the privacy of patients. Researchers must use legal and ethical means to obtain data, and the principle of confidentiality must be observed when the data are used. To overcome the difficulty of gathering data under this principle, researchers can exchange models instead of data, which not only protects patient privacy but also ensures sufficient training data.
2) Increase the credibility of the model. Doctors should be involved throughout model development, from the earliest stage to the last. As shown in Fig. 3, doctors should first determine the purpose of the model and collect relevant data, which must be fair and unbiased. Next, computer science researchers train and test the model. Finally, the model must be validated in the clinic before it is widely used. Most importantly, the black box should be explained.
3) Determine the extent to which the model is used. With the continuous updating of algorithms, some models show excellent diagnostic and predictive performance, which even exceeds that of clinicians. In some simple repetitive tasks, such as tooth labeling, ML can sometimes replace doctors and improve the efficiency of diagnosis and treatment. When making complex and important decisions, such as determining the appropriate type of operation, doctors should consider their own ability and the overall situation of the patient instead of over-relying on ML techniques. Therefore, more attention should be paid to how clinicians and ML can be combined most sensibly.
4) Update ML models in a timely manner. Besides the continuous optimization of ML algorithms, the growth of high-quality data can also substantially improve model performance.
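The suggestion above that researchers exchange models instead of data can be realized in several ways. One common approach, assumed here because the text does not name a specific method, is federated averaging: each hospital trains a model locally and shares only its parameters, and a coordinator averages them weighted by the number of local training cases. The function name, the flat parameter lists, and the hospital figures below are hypothetical.

```python
def federated_average(local_updates):
    """Average model parameters from several sites, weighted by sample count.

    local_updates: list of (weights, n_samples) pairs, where weights is a flat
    list of floats with the same length at every site. Only these parameters
    leave each hospital; the patient records stay local.
    """
    total = sum(n for _, n in local_updates)
    if total <= 0:
        raise ValueError("at least one site must contribute training samples")
    n_params = len(local_updates[0][0])
    averaged = [0.0] * n_params
    for weights, n in local_updates:
        if len(weights) != n_params:
            raise ValueError("all sites must share the same model architecture")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Hypothetical example: two hospitals with 100 and 300 training cases.
hospital_a = ([1.0, 2.0], 100)
hospital_b = ([3.0, 4.0], 300)

if __name__ == "__main__":
    print(federated_average([hospital_a, hospital_b]))  # [2.5, 3.5]
```

In practice this averaging step would be repeated over many training rounds, and sharing parameters alone does not guarantee privacy, so additional safeguards may still be required.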