Scope and challenges of machine learning-based diagnosis and prognosis in clinical dentistry: A literature review

Background: Machine learning (ML) has emerged as a branch of artificial intelligence dealing with the analysis of large amounts of data. The applications of ML algorithms have also expanded to health care, including dentistry. Recent advances in this field point to future improvements in diagnostic techniques and the prognosis of various diseases of the teeth and other maxillofacial structures. Aim: The aim of this literature review is to describe the basis for ML being applied to different dental sub-fields in recent years, to identify typical algorithms used in the studies, and to summarize the scope and challenges of using these techniques in dental clinical practice. Relevance for Patients: The proficiency of emerging technologies that have begun to show encouraging results in the diagnosis and prognosis of oral diseases can improve the precision in the selection of treatment for patients. It is necessary to understand the challenges associated with using these tools to effectively use them in dental services and ensure a higher quality of care for patients.


Introduction
Artificial intelligence (AI) can be defined as the non-biological ability of a machine trying to imitate human intelligence to accomplish complex tasks, such as problem-solving, object and word recognition, and decision-making [1][2][3][4]. The first reports of its applicability in Medicine appeared in the early 1970s, with the advent of some experimental computer systems [5,6] and new surgical research by Gunn, who explored the possibility of diagnosing acute abdominal pain with computational data analysis [7].
The major advances in the use of AI techniques in Medicine have been limited to the prognosis and diagnosis of diseases or health events, thereby enabling clinical decision-making through the application of machine learning (ML) algorithms that use supervised learning [8]. Accordingly, several examples support the promising findings that have resulted from the application of these technologies, further encouraging their use [9][10][11][12][13][14][15].
The use of ML has also been expanded to other domains of healthcare, including dentistry. According to the PubMed repository records, reports describing the favorable results of the application of these techniques in various clinical disciplines in dentistry began to appear in the early 1990s. The scientific literature on the subject highlights orthodontics as one of the specialties where the use of ML has brought palpable benefits, especially through the extraction of characteristics from radiographic images [16][17][18][19][20][21][22]. Many studies using data mining have yielded reliable results for differential diagnosis of a wide range of oral diseases [23,24]. These reports indicate high accuracy in diagnosis, allowing the standardization of procedures and time optimization in the analysis of large databases. In addition, previous studies report advances in Cariology, both in the generation of predictive risk models and in the estimation of useful patterns in the diagnosis of dental caries [25][26][27][28][29][30][31][32]. Thus, the recent application of ML in these fields may lead to future improvements in diagnostic techniques and in the prognosis of various diseases of the teeth and other maxillofacial structures [6].
DOI: http://dx.doi.org/10.18053/jctres.07.202104.012
However, there are few records that summarize the primary applications of ML in the wide spectrum of dental sub-fields and discuss the key factors guiding the application of these technologies in dental practice. Although the subject has gained importance in recent debates, most of the previous reviews focus on areas such as dental imaging and address specific algorithms. This is understandable if we take into account that the use of ML may not be the best solution for every problem. Even so, several recent reports bring the use of ML to other dental disciplines. Therefore, it is crucial to identify the impact of these techniques in the broad field of dentistry. Further, the challenges associated with the clinical application of these tools, such as the selection of dental datasets and reference standards, need extensive debate and discussion [4,6,33,34,35]. A comprehensive analysis of the current use of these techniques to support the decision-making of professionals would be of great interest to improve clinical judgment, reduce errors in diagnosis, and lower treatment costs in the dental office environment.
In this review, we describe the basis for ML being applied to dentistry and summarize the primary ML-based approaches in different dental sub-fields in recent times. We also analyze the typical algorithms used in studies that apply ML methodologies. Finally, we discuss some key factors that may guide the practical implementation of these technologies based on the current trends in this field.

ML
Recent years have seen rapid growth in data recording efforts due to the ability of computer systems to store and share large volumes of information. This explosion of information has often been called "big data." Therefore, it has become necessary to develop new procedures that combine statistical (mathematical) and computational patterns in the analysis and critical interpretation of datasets, to obtain valid criteria and make effective predictions [36]. In this context, ML emerges as a branch of AI that involves the analysis of large amounts of data [2]. ML algorithms learn from previous examples, performing useful inferences to assist decision-making [37,38]. In the ML environment, the goals are focused on recognizing significant data patterns within datasets as well as on proposing models that best explain the data [39].

ML paradigms
At present, the advances in these technologies allow for distinguishing between various types of learning methods and algorithms for data pattern recognition. Supervised, unsupervised, and reinforcement learning paradigms have been widely accepted for this purpose [39][40][41].
In the domain of supervised learning, ML operates through mapping functions between the available input and output variables [39]. By involving labeled variables, it enriches the analyses and brings them closer to true or established criteria. This offers an advantage for its implementation in medical practice, replacing or complementing expert opinions in solving various problems. Supervised learning is recognized as the most widely used and promising approach in this field. These techniques have generally been used in dentistry for the prediction of some events or diseases [25,42], as well as in the diagnosis and classification of specific dental and oral-maxillofacial conditions [43][44][45].
In contrast, unsupervised learning approaches involve algorithms that are executed only from input data [39]. Here, the algorithm is trained for finding similarities, allowing for non-linear and interactive combinations of various predictors, identifying patterns in the data, and achieving reliable results in the analyses performed. Unlabeled dental patient databases allow for the recognition of labels like those related to certain patterns of bone loss associated with periodontal disease. This can create clusters that allow further analysis. The techniques described above can also be combined in a semi-supervised approach [46].
Finally, reinforcement learning can be represented as an extension of dynamic programming techniques, in which the underlying model is arrived at through complex mechanisms of positive rewards and punishments [47]. In dental clinical practice, the high precision of algorithms using reinforcement learning has become evident in applications involving image processing [38,48,49,50,51].

Clinical Applications of ML Algorithms in Dental Practice
We conducted a review of original studies that reported applications of ML algorithms in different fields of clinical dental practice published during the past 10 years. Review papers, case series, case reports, editorials, letters, comments, conference abstracts, and papers limited to describing educational methodologies or the use of robotic devices were excluded from the search. A total of 542 titles were initially identified. After analyzing the abstracts of the studies that met the eligibility criteria, 62 papers were included in this review. We provide detailed comments on these findings and discuss some interesting aspects of the main ML algorithms used in these reports.

Applications in orthodontics (Table 1)
The vast majority of ML applications in dental clinical practice appear to be linked to diagnostic abilities. In this sense, ML algorithms have allowed us to optimize and improve the use of available data in Orthodontics, with substantial contributions to the diagnosis of dental maxillofacial anomalies for the assessment of treatment needs through the use of training datasets containing radiographic images [16,17,19,20,22,50,52,53,54,55,56,57].
A previous study in this sub-field successfully extracted the landmarks for craniofacial features and analyzed cephalometric variables to provide accurate diagnoses of dental deformities [53]. Standard procedures for cephalometric analyses are liable to multiple errors resulting from frequent deviations between the landmark locations identified by different observers. AI techniques contribute to improving these procedures. In this proposal for automated dental deformity diagnosis methods, support vector machine (SVM) algorithms showed the best performance (98% accuracy) among the classifiers compared. SVM-based models have gained popularity in the ML community. These classifiers are regarded as a highly effective means of separating data in multidimensional space [58,59]. It is possible to use this method to reduce the number of images required and increase the speed of the analysis when compared to the performance of the dentist. SVM algorithms have been described as appropriate for learning tasks in which the number of characteristics is large. Moreover, the complexity of SVM is not altered by the number of features found in the training set, which is a desirable property for its practical applications [60].
Several studies point to the success of the application of different types of neural network algorithms in the segmentation, automatic detection, analysis, and extraction of image features, to enable more effective diagnosis in Orthodontics [16,17,18,19,21,22,50,54,55,56,57,61]. A recent study that used convolutional neural network (CNN) algorithms with a large dataset achieved high-quality training, generating a high-precision model for the analysis of cephalometric data. The model corresponded well with the criteria chosen by experienced human examiners, thus meeting the gold standard at a faster rate [22]. The use of CNN algorithms has been widely disseminated in dentistry and it is recognized that these algorithms can create significant improvements in the quality of images by reducing dispersion and artifacts [35,41]. These deep learning algorithms are a priority in complex tasks in which there is a large amount of unstructured data, as would be the case with the classification of dental images, where their effectiveness has already been established. These classifiers have the ability to recognize hidden relationships between interdependent variables and estimate decision rules. They are preferred when precision is prioritized. However, classical algorithms are more useful for simple tasks in which there is no unstructured data and the interpretability of results is of higher priority [22,35,41].
Recent research has encouraged the use of neural networks for prediction tasks [18]. Hybrid genetic algorithms and artificial neural networks (ANNs) have been implemented to establish a prognosis of the size of non-erupted canines and premolars during the period of mixed dentition. The impacts of orthognathic treatment on facial attractiveness and age appearance for common patients and patients who have undergone cleft treatment have also been explored by applying CNNs [21,61]. One particular study does not recommend the application of CNN algorithms in some cases [61]. However, the report describes possible limitations related to the characteristics of the study population and the specific scores used to measure the perception of physical attractiveness. These limitations may have contributed to the negative outcome of using CNNs.
It is necessary to emphasize that to achieve high clinical precision with the use of neural network algorithms, it is advisable to increase the training data set, obtain better estimates of the weights of the connections, and increase the classification and predictive power. Furthermore, hybrid dental data collected on a large scale from multiple healthcare institutions and public registries could be useful during the training phase. Interested researchers should collaborate and share their data collections to bring new advances in this field [62].

Applications in periodontics (Table 2)
In the field of periodontics, ML algorithms have demonstrated good performance in working with molecular profile data, immunological parameters, bacterial profile data, or clinical and radiographic variables of affected patients [42,63,64,65,66,67]. Identification of bone levels, identification of bacteria in subgingival fluid samples, and analysis of gene expression profiles from periodontal tissue biopsies are crucial for obtaining clear evidence in the specific diagnosis of periodontal disease. This diagnosis could be complex for early-career practitioners during routine examinations, and the use of ML techniques can be highly productive in such cases. Most of these studies used SVM as the classifier in the analysis [63,64,65,67]. Two of the studies [64,67] used other algorithms such as naïve Bayes, ANNs, and decision tree algorithms. A decision tree provides a hierarchical organization from a root node, which is at the highest hierarchical level. It is commonly argued that the decision tree has high explanatory capacity and transparency, and also provides comprehensibility in the analysis [60]. Several authors have made suggestions to further improve the effectiveness and simplicity of the models [68,69]. Further, the use of ensemble methods makes it possible to construct more robust tree models, such as bootstrap aggregated (or bagged) decision trees, random forests, and boosted trees. Naïve Bayes is considered simple and computationally efficient and uses all available information to explain the decision, which is an advantage in the clinical environment [60].
Preliminary research applying ANNs and using abundant samples collected from previous studies shows evidence for the good performance of this classifier [42]. Models using ANNs have already demonstrated their effectiveness in other fields of Medicine [59,70].

Applications in oral medicine and maxillofacial surgery (Table 3)
In the last decade, the applications of ML-based diagnosis have been expanded to the segmentation and identification of maxillofacial cysts and other radiolucent lesions, as well as the diagnosis of other common oral diseases of growing interest in the oral medicine and maxillofacial surgery domain [23,24,43,44,71]. The gold standard for the final diagnosis of these lesions is a specific histopathological examination, which tends to be more invasive and entails greater cost and resources. Therefore, the advancement of new methodologies that lead to more simplified and precise diagnostic procedures requires special attention. Several studies have successfully applied CNN algorithms for this purpose [23,44,71]. One of the studies reported that the best performance was achieved by using SVM (96% accuracy) to classify dental periapical cysts and keratocysts in a set of 3D cone beam computed tomography (CBCT) images [43]. The study also explored other classifiers such as ANNs, k-nearest neighbors (k-NN), naïve Bayes, decision trees, and random forest. The application of the k-NN classifier is based on the principle that nearby data points have similar properties. Reports show that k-NN is very sensitive to irrelevant features, which can make learning inefficient and affect the interpretation of the final model. However, the transparency of this classifier makes it useful, as it reflects the intuition of human users. This characteristic is common to naïve Bayes classifiers as well [72]. In general, the selection of these algorithms for specific objectives is a difficult task. Further, the evaluation of their performances may differ according to the characteristics and pre-processing of the datasets [60,72].
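The k-NN principle described above, and its sensitivity to irrelevant features, can be illustrated with a minimal sketch. The function name `knn_predict`, the feature values, and the class labels are all invented for illustration and are not taken from any of the reviewed studies.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two informative features per lesion.
labels = ["cyst", "cyst", "cyst", "keratocyst", "keratocyst", "keratocyst"]
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]
prediction = knn_predict(points, labels, (1.1, 1.0), k=3)

# Same data with an irrelevant, large-scale third feature appended.
# The noise dominates the distance and changes the predicted class.
noisy_points = [
    (1.0, 1.0, 50.0), (1.2, 0.9, 60.0), (0.9, 1.1, 70.0),
    (3.0, 3.1, 10.0), (3.2, 2.9, 12.0), (2.9, 3.0, 11.0),
]
noisy_prediction = knn_predict(noisy_points, labels, (1.1, 1.0, 11.0), k=3)
```

In this toy case the query sits among the "cyst" examples on the two informative features, yet the unscaled irrelevant feature pulls it toward the other class, which is one reason feature selection and normalization matter for distance-based classifiers.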
Other studies have reported the application of case-based reasoning (CBR) in the analysis [24]. CBR provides feedback through the collection of previous cases and learns from these. Thus, new rules can be defined as new cases are added. Many conditions of the oral cavity show similar clinical signs and symptoms, which can often hinder the diagnosis [24,73]. This affects the reliability of the diagnosis and the subsequent treatment scheme to be followed by the patient. CBR technology has been valuable in developing a meticulous and systematic approach for the unique identification of these diseases, seeking a more exact characterization of the analogies and features that distinguish them. The results, though incipient, highlight the capability of the algorithms in improving the quality and efficiency of patient care through accurate diagnosis.
These techniques have also been applied in the classification of maxillary sinus lesions, detection and segmentation of structures, as well as classification of the state of dental artifacts for efficient image veracity checks [51,74,75,76,77].
The success of ML algorithms in predicting perioperative blood loss before orthognathic surgery has also been recently investigated [78]. The use of a random forest classifier could anticipate possible complications during the surgical procedures and calculate the likely perioperative blood loss before performing the surgery. This prediction could be useful in decision-making by professionals and patients in elective procedures and facilitate better management of surgical procedures. In a recent study, ANNs were used to determine the diagnosis of orthognathic surgery cases, showing a model success rate of 96% for diagnostic decisions [79].

Applications in forensic dentistry (Table 4)
Forensic dentistry and anthropological examinations have also been impacted by the advances of these modern tools. One of the studies that aimed to classify tooth types based on dental CBCT images applied CNN algorithms with an accuracy of 88.8% [80]. Several studies have focused on the application of automated methods in the estimation of age using teeth, an important aspect of forensic dentistry [48,81,82,83]. Manual methods that estimate age based on the stage of tooth development are painstaking and complex. An interesting study focused on the evaluation of skeletal patterns using automated algorithms [84]. Automated systems that contribute to improving the quality and speed of ML-based age estimation are very useful and warrant more attention devoted to their further development and validation [82].

Applications in endodontics (Table 5)
Recent studies employing these techniques in endodontics have indicated several performance advantages. However, some of these studies constitute pre-clinical studies and the generalization of their results has been limited. Among the applications in this area, locating the minor apical foramen using feature extraction procedures from radiographs stands out [85,86]. These ex vivo studies applied ANNs and revealed promising results that should be further explored.
The exact estimation of the working length is an important initial step to a successful root canal treatment. The introduction of improvements in locating the area of greatest constriction (minor apical foramen) in the root canal constitutes a key element of renewed interest for clinical endodontists.
Another study aimed to evaluate the performance of ANNs in the diagnosis of vertical root fracture using a moderate sample of extracted teeth [87]. Although the results were successful, a larger number of teeth and other dental groups should be included in the analysis to arrive at more reliable conclusions. Previous studies that collected in vivo data were also reviewed [49,88,89]. These studies used CNN algorithms to aid in the diagnosis of the number of roots of the mandibular first molars [88], to detect vertical root fractures on panoramic radiographs [89], and to improve the diagnosis of periapical pathosis on CBCT images [49]. One of the studies [88] found that deep learning systems using CNNs to process inexpensive radiographic images showed high accuracy (86.9%) in the differential diagnosis of a single or extra root in the distal roots of the mandibular first molars.

Applications in cariology (Table 5)
Findings from previous studies already reflect possible applications, both for the development of diagnostic systems for caries lesions through images and for the estimation of the prognosis of the disease [26,27,28,29,30,31,32,90]. Furthermore, dental caries continues to be a major oral health problem for many groups of the population [91][92][93]. In this context, new alternatives and tools that guarantee improvements in current methods of diagnosis and prognosis will be well-received among practicing dentists and patients.
One particular study [28] aimed to develop models for the prediction of root caries in adults and reported excellent performance on a large database. Among the algorithms used, the SVM-based method demonstrated the best performance for the identification of root caries, with an accuracy of 97.1%, precision of 95.1%, sensitivity of 99.6%, and specificity of 94.3%. However, the study reported some problems concerning the use of cross-sectional data, which could affect the predictive value of the model. Generalization and validation strategies using longitudinal data should be further explored. Another study, which stands out for the quality of its design, focused on the prediction of caries in geriatric patients using general regression neural networks (GRNN), and showed promising results, with a sensitivity of 91.41% on the training set and 85.16% on the test set [31]. GRNN represents an improved neural network technique based on nonparametric regression. These algorithms can be very useful for making predictions and comparisons of system performance in practice due to their ability to converge upon the underlying function of the data with only a few training samples. The additional knowledge required to obtain the adjustment successfully is relatively small and can be achieved without additional input from the user, which is highly advantageous [31,37].
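For readers less familiar with the performance measures quoted above, the following minimal sketch shows how accuracy, precision, sensitivity, and specificity are derived from the four cells of a binary confusion matrix. The function name `diagnostic_metrics` and the counts are hypothetical and do not reproduce the data of any reviewed study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard binary diagnostic metrics from confusion-matrix counts.

    tp: diseased cases correctly flagged      fp: healthy cases wrongly flagged
    tn: healthy cases correctly cleared       fn: diseased cases missed
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # all correct calls / all cases
        "precision": tp / (tp + fp),      # flagged cases that are truly diseased
        "sensitivity": tp / (tp + fn),    # diseased cases that were detected
        "specificity": tn / (tn + fp),    # healthy cases that were cleared
    }

# Hypothetical counts, invented for illustration only.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```

Reporting all four values together, as the study on root caries did, guards against judging a model on a single figure that may hide a weakness in one class.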
A novel study evaluated, for the first time, the cost-effectiveness of these technologies, which is a vital consideration for their application in the clinical environment [90]. The study reported encouraging results for the use of automated methods in the diagnosis of caries.

Applications in prosthetics, conservative dentistry, and implantology (Table 6)
Other subfields of dentistry such as prosthetics, conservative dentistry, and implantology have also benefited from the implementation of ML techniques. The use of ANN-based systems that support the prognosis of facial deformation after a complete prosthesis has been promoted in these fields. The experimental results showed that this method can predict the deformation of facial soft tissues quickly and accurately, which is valuable in making decisions to establish concrete treatment actions [94]. Other studies have focused on the classification of specific features of teeth using ANNs [95], the potential of ANNs in improving tooth color through computer-assisted systems [96], and the automated detection and classification of various dental restorations in panoramic radiographs using SVM [97]. The study using SVM reported high accuracy (93.6%).
Other applications focused on using CNNs to predict the probability of shedding composite resin crowns fabricated with computer-aided design [98]. Another study used the Extreme Gradient Boosting (XGBoost) algorithm to develop a clinical decision support model for the prediction of tooth extraction therapy related to the subsequent use of dentures [99]. The algorithm showed an accuracy of 96.2% and proved to be a powerful classifier and regressor, generally reaching optimal performance on structured data.
In the field of implantology, two studies demonstrated a good performance of CNNs in detecting implant systems [45,100]. The placement of dental implants as a branch of rehabilitation treatment has recently become common among patients who require more aesthetically and functionally sound dental prosthetic rehabilitation. At present, a considerable number of implant systems are placed with different fixation structures and characteristics, such that their identification based on routine radiographic images can be complicated for clinical dentists. Proper identification of these systems would reduce more invasive treatments in which the patient requires some type of repair or repositioning of the implant system. Another application in this area is focused on the prediction of patients' mean peri-implant bone levels, which is very useful for estimating the survival of the implant and exploring plausible treatment alternatives that lead to the best outcomes [101]. This study used SVM for modeling and reported moderate performance.

Key Factors Influencing the Clinical Implementation of ML
The findings presented above point to a promising future of ML algorithms applied in dental practice. In this section, we will outline some key factors that should be considered to effectively guide the practical application of these methodologies.

Defining the clinical uncertainty
The specific clinical uncertainty to be addressed must be clearly defined. The desired outcomes should be properly defined to guide the research. In addition, the objective should be adequately represented in terms of inputs and outputs, must correspond to plausible criteria, and should consider the real clinical scenario in addition to the results of other analyses. Reduction of heterogeneity between reports should be ensured with the use of clear and unambiguous guidelines. A previous study makes valuable suggestions in this regard [102].

Data management
Reproducing the findings of developmental studies in clinical practice poses certain challenges. Studies using ML attempt to generalize the links of input and output variables in the data set through the learning process. However, data sets are often confounded by noise. Therefore, the intrinsic characteristics of the data (e.g. categorical data, numeric data, and time-series), origin, volume, outliers, missing data, etc., could affect the reliability of the models and lead to false interpretations while evaluating performance [39]. For example, variations in classification criteria of periodontal disease (target condition) were detected when retrospective data were analyzed. Further, one must be aware of what is called "data leakage" [8,39,40]. Data leakage occurs when a certain attribute accidentally encodes the result (e.g. when the need for a partial denture already indicates the diagnosis of edentulism). Initially, covariates with the same meaning must be eliminated. Criteria on biological plausibility should guide the selection of covariates. Strategies to reduce data dimensionality are fundamental in ensuring simplicity and effectiveness of the models. Thus, data pre-processing, cleaning, and normalization are essential steps in all analyses [39,103]. Automated techniques such as constructive induction, attribute interaction discovery, and non-linear modeling approaches through embedded, filter, or wrapper methods have been used for data mining [104,105]. One of the reviewed studies that aimed to estimate predictors associated with peri-implantitis implemented some of these techniques in an attempt to reduce the dimensionality of the data [101].
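As a rough illustration of the data leakage example given above, the sketch below screens a small table for attributes whose values fully determine the target. The function `find_leaky_features`, the field names, and the records are hypothetical. This is only a screening heuristic: it would also flag unique identifiers or finely grained continuous variables, so any flagged attribute still needs clinical review.

```python
from collections import defaultdict

def find_leaky_features(records, target):
    """Return feature names for which every observed value maps to one target value."""
    feature_names = [name for name in records[0] if name != target]
    leaky = []
    for name in feature_names:
        # Collect the set of target values seen for each value of this feature.
        outcomes_by_value = defaultdict(set)
        for record in records:
            outcomes_by_value[record[name]].add(record[target])
        if all(len(outcomes) == 1 for outcomes in outcomes_by_value.values()):
            leaky.append(name)
    return leaky

# Hypothetical records: the denture attribute encodes the diagnosis itself.
records = [
    {"age_group": "60+", "needs_partial_denture": 1, "edentulism": 1},
    {"age_group": "60+", "needs_partial_denture": 0, "edentulism": 0},
    {"age_group": "40-59", "needs_partial_denture": 1, "edentulism": 1},
    {"age_group": "40-59", "needs_partial_denture": 0, "edentulism": 0},
]
leaky = find_leaky_features(records, target="edentulism")
```

Here only the denture attribute is flagged, because each age group contains both outcomes while the denture attribute reproduces the diagnosis exactly.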
Moreover, biases can be introduced when the data do not capture the epidemiological reality of events [40]. For example, a dental caries prediction model built on specific attributes of a population of adolescent dental students from high-income countries may not generalize to diagnostic inferences about periodontal disease in the whole population, especially if women were not included in the initial training data. These predictions would introduce certain disparities and false interpretations in the models. In this context, it is essential to collect representative samples based on idealized hypotheses. The use of traditional power calculation methods, considering the size of important clinical differences, similarity, non-inferiority, or superiority of the models using ML in contrast with the gold standard, may allow for later extrapolation of the results [106]. A previous study focused on the prediction of dental caries better illustrates this analysis [31]. Emphasis should be placed on the availability of data from real examples for subsequent application of the model in clinical practice [39,106]. Public access repositories that record specific medical and dental data could be very useful for achieving this goal [40]. One of the previous studies made use of public-use data from the National Health and Nutrition Examination Survey and reported good performance of ML classifiers in predicting root caries in this dataset [28]. Repositories with specific dental information are scarce. In this sense, efforts to create interoperable data sources should be expanded to provide support for the implementation of these methodologies [39]. This will, in turn, help in the evaluation of their performance by achieving more transparency and reproducibility in reporting.
We should note the variability related to differences in annotations and particular characteristics of the data sources (medical records, X-rays, photographs, laboratory tests, and models) between various repositories [39]. Therefore, we consider it essential to apply standards to AI-based technologies, creating a common nomenclature that facilitates the implementation of consistent methods of data storage and retrieval across public platforms. Moreover, dental records and medical reports should be preserved and submitted to expert committees to integrate into these repositories [107,108].

Training, validation, and test sets
Given the lack of certainty regarding which algorithm will yield the best classifier, it is useful to have non-overlapping, random datasets for training, validation, and/or testing. Thus, different algorithms can be fitted on the training set, the resulting models can be compared, tuned, and optimized on the validation set, and the performance of the final model can be evaluated on the test set [108]. This process reduces the possibility of overfitting, in which the model memorizes the characteristics of the training data and then fails when used on an independent sample. Similar success rates on the training set, validation set, and test set imply better generalization of the model. A study that used neural networks in the diagnosis of teeth extractions followed the above approach for evaluating the capacity of expert systems with ML and showed adequate performance for the different sets. Thus, the system could be tested more extensively to support decision-making during dental practice [19]. Other approaches to guide validation, such as resampling methods (e.g. bootstrapping and cross-validation), have been heavily recommended in clinical prediction model guidelines when it is not possible to obtain a separate set [4,109,110]. However, the results should be interpreted with caution, as only the best results may be reported when several algorithms are compared using this procedure.
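The partition into non-overlapping random sets described above can be sketched as follows. The function name `split_dataset`, the 70/15/15 proportions, and the fixed seed are assumptions chosen for illustration, not values prescribed by the cited guidelines.

```python
import random

def split_dataset(items, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle items once and cut them into disjoint train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is held out for testing
    return train, val, test

# Hypothetical patient identifiers; splitting by patient is assumed here so that
# records from one patient cannot appear in more than one set.
patient_ids = list(range(100))
train, val, test = split_dataset(patient_ids)
```

Because each identifier is assigned to exactly one list, the test set stays untouched while models are fitted and tuned on the other two.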
As the number of examples influences the learning process in ML, some strategies are expected to contribute to the improvement of training. Augmentation can be used when the image dataset is collected. This technique artificially expands the size of a dataset by creating modified versions of the images, improving the ability of fitted models to generalize what they have learned to new images. The use of transfer learning can also provide training opportunities and help improve model performance. Several previous dental studies have implemented these approaches [27,29,56,88]. Over-sampling and repeat-sampling techniques can also be applied to reduce biases and improve model performance [106,110].
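As a rough illustration of image augmentation, the sketch below generates modified copies of a tiny grayscale image stored as a nested list. The function names and the choice of transformations (two flips and one rotation) are assumptions made for the example; the studies cited above may have used other transformations and dedicated imaging libraries.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus three modified versions of it."""
    return [
        image,
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90_clockwise(image),
    ]


# Hypothetical 2 x 2 "image" of pixel intensities.
image = [[1, 2],
         [3, 4]]

for variant in augment(image):
    print(variant)
```

Whether a given transformation preserves the label depends on the task. A horizontal flip, for instance, swaps the left and right sides of a radiograph, so it may be unsuitable when laterality or tooth numbering is part of the target.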

External validation
Even when internal validation is applied by reserving a separate set of the same sample for testing, the model may remain too closely tailored to the training data, and the performance of the classifiers could be overestimated. Therefore, to extrapolate the results and ensure certainty in the clinical performance of the models, verification through an independent set (external validation) is crucial. External validation should be done in cohort studies, ideally with data acquired independently by means of geographic (e.g., a dataset collected from a department of pediatric dentistry in another district) or temporal (e.g., a dataset collected from other pediatric patients in the same department after 6 months) splits. Open access datasets, if available, could also be used [39,106].
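The geographic and temporal splits mentioned above can be expressed as simple filters over patient records. The sketch below is illustrative only: the record fields (`site`, `visit`), the site names, and the cutoff date are invented for the example and do not come from any cited dataset.

```python
from datetime import date

# Hypothetical records; field names and values are assumptions for illustration.
records = [
    {"id": 1, "site": "district_A", "visit": date(2020, 1, 15)},
    {"id": 2, "site": "district_A", "visit": date(2020, 9, 3)},
    {"id": 3, "site": "district_B", "visit": date(2020, 4, 22)},
    {"id": 4, "site": "district_A", "visit": date(2021, 2, 10)},
    {"id": 5, "site": "district_B", "visit": date(2021, 3, 5)},
]


def geographic_split(records, external_site):
    """Hold out every record from one site as the external validation set."""
    development = [r for r in records if r["site"] != external_site]
    external = [r for r in records if r["site"] == external_site]
    return development, external


def temporal_split(records, cutoff):
    """Hold out every record collected on or after the cutoff date."""
    development = [r for r in records if r["visit"] < cutoff]
    external = [r for r in records if r["visit"] >= cutoff]
    return development, external


dev, ext = geographic_split(records, external_site="district_B")
print([r["id"] for r in dev], [r["id"] for r in ext])

dev, ext = temporal_split(records, cutoff=date(2021, 1, 1))
print([r["id"] for r in dev], [r["id"] for r in ext])
```

The point of both functions is that the held-out records are selected by a rule fixed in advance (place or time), not by random sampling from the same pool used for model development.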

Unbalanced classes
One of the recurring problems encountered during medical diagnosis is unbalanced classes, where there is a disproportionate number of observations in each class (e.g., caries present or caries absent; oral cancer or not oral cancer). In this scenario, the classifiers exhibit poor precision in the minority class, as standard algorithms are designed to maximize accuracy and reduce error rates. They ignore the difference between types of misclassification errors, which can prove costly in medical or dental practice [111]. For example, classification in the diagnosis of oral cancer may favor false negatives (individuals with cancer but classified as negative). Consequently, these patients would not receive medical care. In this scenario, the accuracy metric may show a high value but give a misleading picture of the actual performance of the model. Solutions for this problem have been suggested both at the data level and at the algorithmic level. Solutions at the data level include different ways of resampling, such as random oversampling with replacement, random undersampling, directed oversampling (where the choice of samples to replace is informed, rather than random), directed undersampling (where the choice of examples to be eliminated is informed), oversampling with informed generation of new samples, or combinations of the above techniques. Solutions at the algorithmic level include adjusting the costs of the various classes to counteract the class imbalance; for example, in decision trees, the probabilistic estimate at the tree leaf can be adjusted [111,112].
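Two of the remedies listed above, random oversampling with replacement (data level) and class-dependent costs (algorithmic level), can be sketched as follows. The function names, the 90/10 class ratio, and the inverse-frequency weighting formula are assumptions chosen for illustration; they are one common convention, not the specific method of the cited works.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Resample minority classes with replacement until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        extra = rng.choices(pool, k=target - count)  # empty when k is 0
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


def inverse_frequency_weights(labels):
    """Class weights proportional to 1 / class frequency, so that errors
    on the minority class cost more during training."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical dataset: 90 "caries absent" (0) and 10 "caries present" (1).
labels = [0] * 90 + [1] * 10
samples = list(range(len(labels)))  # stand-ins for feature vectors

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
print(inverse_frequency_weights(labels))
```

Resampling of this kind is normally applied to the training set only, after the data have been split; otherwise duplicated minority examples can end up in both the training and test sets and inflate the measured performance.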

Reference standard (ground truth)
In the field of data mining, the definition of the reference standard is important for labeling the data and evaluating the performance of the classifier. Arriving at an optimal definition can be challenging. The clinical reference standard is understood as the best available method to establish the presence or absence of the target condition, whereas a gold standard would be an error-free reference standard [113]. Consequently, we can deduce that reference standards are often not perfect. A concrete example is the routine visual and tactile examination used by the clinician for the diagnosis of dental caries, which can lead to errors. In this context, sensitivity or specificity values may be overestimated or underestimated. Evaluating the correlation between the performance of the classifier and the reference standard is necessary, and the use of external data may generate new adjustments. Adopting a composite reference standard that combines the results of multiple imperfect tests may be another alternative. Routinely used clinical data may add additional value to the analyses [114]. For example, when evaluating the presence of periodontitis, clinical observations, such as pocket depth on probing, signs and symptoms of the periodontium, and bleeding, should be considered in addition to the bone level assessed using radiographic criteria. This will provide a more robust standard and reduce possible misinterpretation. Panel or consensus diagnosis including a sufficiently large sample of experts could generate greater credibility compared to the isolated judgment of a single expert. The inclusion of less experienced clinicians should be limited, as it can lead to errors in the interpretation of the algorithms' performance. In addition, the calibration process and methods to measure inter- and intra-examiner variability should be considered [115].
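One widely used measure of inter-examiner agreement is Cohen's kappa, which corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two examiners and builds a simple consensus label by majority vote. Both the choice of kappa and the majority rule are assumptions made for illustration; the text above does not prescribe a specific statistic or consensus rule, and the ratings are invented.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:  # both raters used one identical category throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def majority_vote(*raters):
    """Consensus label per case; an odd number of raters avoids ties
    for binary labels."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*raters)]


# Hypothetical ratings (1 = caries present, 0 = caries absent) for ten teeth.
examiner_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
examiner_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
examiner_3 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

print(round(cohen_kappa(examiner_1, examiner_2), 3))
print(majority_vote(examiner_1, examiner_2, examiner_3))
```

Kappa close to 1 indicates agreement well beyond chance, and values near 0 indicate agreement no better than chance. The same function can be applied to two rating sessions by one examiner to estimate intra-examiner variability.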

Classifier selection
The selection of the appropriate classifier still constitutes a gap in the literature related to the topic. Most of the classifiers proposed in the studies reviewed by us showed good performance. In order to extend these studies to clinical dental practice, interpretable models are preferred. These models allow for a better explanation of the selected parameters and will be more useful in decision-making [110]. Therefore, if several models show similar levels of performance when compared, the explainable ones should be prioritized. Accuracy results can be misleading, especially when using unbalanced classes. Metrics such as the confusion matrix and area under the receiver operating characteristic curve (AUC) can provide better insights into model performance in clinical practice [103,106,116]. Other metrics for evaluating models used for the detection or segmentation of objects in images have also been reported [102].
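The following sketch shows why accuracy alone can mislead with unbalanced classes and how the confusion matrix and AUC add information. It is a minimal illustration with invented data: the function names, the 5/95 class ratio, and the scores are assumptions. The AUC is computed from its rank interpretation, namely the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 = disease, 0 = no disease."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def summary(y_true, y_pred):
    """Accuracy, sensitivity, and specificity derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical screening sample: 5 patients with the disease, 95 without.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100  # a classifier that never detects the disease

print(summary(y_true, always_negative))
# accuracy is 0.95 although sensitivity is 0.0: every diseased patient is missed

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise AUC computation is quadratic in the number of cases, which is acceptable for a small illustration; larger evaluations would typically rely on an established statistics library.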

Deployment and clinical application
Some aspects should be carefully evaluated before extending the application of these technologies to clinical practice. In addition to their usefulness in decision-making, their cost-effectiveness, their benefits for the quality of services, and the acceptance of their introduction by patients and clinicians ought to be considered. However, studies on the practical usefulness of models using data mining in dentistry are rare [117]. A recent report that provides details on the evaluation of cost-effectiveness in the detection of proximal caries supported the possibility of introducing these tools in clinical practice [90]. In general, it is necessary to check out-of-sample datasets to verify the usefulness of these technologies before their dissemination in dental practice.
Well-designed observational studies should focus on evaluating the impacts of these tools over time, by comparing scenarios wherein these tools have been used with those wherein they have not been used. The literature reviewed by us lacks randomized clinical trials, which should also be encouraged to improve the body of evidence on the potential for data mining in dental practice [106,117]. Moreover, studies conducted across institutions may better reflect the capabilities of these technologies and give greater external validity to the results [39].
Another key point is that these analyses are often reserved for specialists in the area, with no appropriate interface provided for the use of these resources by health professionals in routine practice. The interdisciplinary scientific community should pursue innovative solutions to contribute to improvements in this area. In dentistry, our findings revealed the development of several automated systems that have shown good performance, especially when applied to image-based diagnosis. Exploratory studies with a focus on the aforementioned aspects will help to support the practical expansion of these technologies in the clinical setting.
Further, the community of health professionals should also offer formal acceptance of use in service after rigorous analysis of these technologies to guarantee the effectiveness of their application. These technologies will also be a source of data storage for future research and evaluation of the best evidence related to certain diagnosis or treatment schemes [8]. However, the utility of these systems will not replace the function of the clinician, who generally evaluates other aspects in relation to the clinical, psychological, and behavioral conditions of the patient. Rather, it will serve to facilitate certain decisions in the real environment. The financial aspects of the implementation of ML are usually another concern. As these techniques undergo improvements with their continued use in healthcare, greater financial contributions will be required for research and further development [107,116,118].

Final Considerations
The use of ML has recently been expanded to different clinical dental specialties. Algorithms such as CNNs and SVM have shown promise in boosting the future use of these resources in different aspects of dental clinical practice. They offer a vast arsenal of tools to support diagnosis and prognosis and to improve clinical decisions. The ethical aspects of accessing and handling large amounts of sensitive information deserve special attention. Meticulous data pre-processing is recommended to extract useful models. The use of longitudinal study designs and verification through clinical trials should be guided by the advances of these novel techniques. External validation must also be extended to guarantee the use of efficient and generalizable models. In addition, the homogenization of the methodologies for the presentation of these studies in clinical practice should be systematically improved for a better understanding of such proposals. Future studies in the field of dentistry should be aimed at obtaining solid evidence on the performance of these models and promoting their introduction in general practice.