Longitudinal study for dental caries calibration of dentists unexperienced in epidemiological surveys

Abstract This study aimed to make a longitudinal analysis of interexaminer calibration reproducibility in diagnosing dental caries in posterior teeth, by examiners without previous experience in epidemiological studies. A group of 11 inexperienced examiners underwent theoretical-practical training and calibration assessments, assisted by a standard examiner. An examiner who did not participate directly in the research selected 5-year-old children with and without caries. The D3 diagnostic threshold was used to evaluate dental caries, based on the World Health Organization (WHO) criteria. The initial calibration (baseline) was performed after the theoretical-practical training session, and consisted of examining 20 children; the second calibration occurred three months later, and involved evaluating another 18 children. The interexaminer agreement was obtained by kappa statistics, and by overall percentage agreement. The paired t-test was applied to compare the values for kappa means and overall percentage agreement between the time points studied. At baseline, the values for kappa (> 0.81) and overall percentage agreement (> 95.63%) were considered high. At the 3-month calibration assessment, all the examiners showed some decrease in both kappa (p < 0.0001) and overall percentage agreement (p = 0.0102). The calibration process currently proposed by the WHO is effective. However, reproducibility was not maintained over time for inexperienced examiners evaluating the posterior teeth of 5-year-old children, under epidemiological conditions.


Introduction
Accordingly, the decline in the prevalence of caries disease in some regions of the world in recent years has made kappa statistics the most appropriate choice for assessing reproducibility. 4
Five-year-old preschool children represent a significant challenge in epidemiological surveys, mainly because they may behave unpredictably, react to the environment, and move around frequently during the exam, making it difficult for the examiner to assess their oral health. In order to cope with these difficulties, examiners must be calibrated to evaluate this age group, since the largest number of caries lesions in dentin is found in children under six years of age. 21,22
In addition, it is difficult to evaluate posterior teeth reliably in the context of an epidemiological investigation. The prevalence of cavitated caries lesions is higher on the occlusal surfaces of posterior teeth than on the smooth surfaces of anterior teeth, and diagnosis can be a challenge on occlusal surfaces, considering their greater anatomical complexity. 13,23 Therefore, studies that report high values of agreement among examiners may not be portraying the actual difficulties involved in examining all the anterior and posterior tooth surfaces, and in diagnosing all the carious lesions that might be found. This difficulty may lead to an overestimation of the reported reproducibility values, and hence to an erroneously reported low prevalence of the disease.
This imbalance restricts how effectively the calibration process can be determined, because the same yardstick being used to measure the agreement level of experienced examiners should be able to determine how well the calibration methodology can measure high levels of agreement for new examiners.
In this context, the aim of this study was to conduct a longitudinal analysis of the interexaminer reproducibility of examiners without previous experience in epidemiological surveys, taking into account the difficulties in detecting lesions in the posterior teeth of children with primary dentition, under epidemiological conditions.

Methodology

Ethical aspects
This study was conducted in accordance with the Declaration of Helsinki, and was approved by the Research Ethics Committee of the School of Dentistry of Piracicaba (CAAE 06263219.4.3001.5418).

Study design

Characterization of the examiners
A group of 11 dental students from a postgraduate course was randomly selected. They had previous experience in performing dental examinations of children under 6 years old in an indoor setting, but not under epidemiological conditions. The group was assisted by a standard examiner with extensive experience as an examiner and coordinator of epidemiological surveys. 26

Characterization of the children examined
Five-year-old children from a public school, whose participation had been authorized by their parents or guardians, were included in the study. They did not use any orthodontic appliance, or have any physical or intellectual disability that could preclude their participation. Children both with and without dental caries were selected, with caries lesions at different stages of progression. The two groups exhibited low caries prevalence, both at baseline and after 3 months (Table 1).

Characterization of epidemiological examinations
Examinations were carried out at school, in an outdoor setting, after tooth brushing, with the examiner using the visual-tactile method, a flat #5 mouth mirror, and WHO 621 ball-point probes. During the assessment, the children remained seated in front of the examiner. Previously trained note takers recorded dental caries on WHO forms, using the D3 diagnostic threshold, 27 which is frequently used in epidemiological surveys. 7,9,27,28 It should be pointed out that only deciduous molars were evaluated.

Calibration process
The calibration process of the examiners was conducted in accordance with the WHO criteria. 26 Initially, a theoretical discussion lasting 4 hours was held to standardize the 11 examiners in regard to the index used for caries evaluation. The standard examiner gave a precise theoretical explanation of the index codes and criteria, after which a theoretical expository exercise was conducted, by exhibiting photos of each clinical situation that could be found in the exams.
In the practical training stage, the standard examiner performed a clinical demonstration of how the exams should be carried out, using 2 children, illustrating how the materials should be positioned, and showing how the forms should be organized. The examiner explained the ergonomic aspects of a dentist's working postures, together with an oral health assistant. Three four-hour training sessions were carried out under the same circumstances and conditions as would be done in an outdoor setting, as described above, to resemble the reality of epidemiological surveys as closely as possible. At this stage, discussions were allowed between the dentists and the standard examiner regarding the clinical findings, diagnostic criteria, coding and registration errors, with the purpose of reaching an acceptable level of agreement.

Calibration evaluation
Calibration (baseline) was performed when the standard examiner judged that the examiners were qualified, and when an overall percentage agreement above 85% and a kappa statistic above 0.80 had been obtained. At this stage, 20 children were evaluated in a 4-hour period under the same exam conditions, and in the same environment as the practical training session.
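For reference, the two agreement measures behind these thresholds can be written as follows. This is a sketch that assumes the standard two-rater (Cohen) form of kappa applied to dichotomized tooth-level calls; the text does not state which kappa variant was computed, and the symbols P_o, P_e, n_ij and N are notation introduced here for illustration.

```latex
% n_{ij}: number of teeth rated i by one rater and j by the other
% (1 = caries, 0 = no caries); N = n_{11} + n_{10} + n_{01} + n_{00}
P_o = \frac{n_{11} + n_{00}}{N}, \qquad
P_e = \frac{(n_{11} + n_{10})(n_{11} + n_{01}) + (n_{00} + n_{01})(n_{00} + n_{10})}{N^{2}}, \qquad
\kappa = \frac{P_o - P_e}{1 - P_e}
```

Under this notation, the overall percentage agreement is 100 P_o, and kappa discounts the agreement P_e expected by chance from each rater's marginal frequencies.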
Three months after the baseline calibration, a new calibration assessment was made. At this time, the D3 codes and criteria were adopted, but the conditions for holding the dental examinations were not established, since the aim of this stage was to assess whether the baseline calibration levels had been maintained after three months. The examiners evaluated another group of 18 children, following the same criteria as those proposed in the initial stage, during one 4-hour period. It should be pointed out that no discussions were allowed between the examiners and the standard examiner at this stage.

Statistical analysis
The data were dichotomized for analysis purposes into presence or absence of the disease in the posterior teeth; that is, the codes were grouped into two distinct blocks, one for present or past caries experience and one for absence of the disease. The interexaminer agreement values at baseline and after three months were obtained using the kappa statistic and overall percentage agreement. The paired t-test was applied to compare the values for kappa means and overall percentage agreement between the time points studied. The analyses were performed using the SAS statistical program (SAS Institute 2011 version 9.4, Cary, USA).
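The dichotomization and the two agreement measures described above can be sketched in a few lines of Python. This is an illustration only, not the SAS procedure used in the study: the function names are hypothetical, the set of WHO primary-tooth codes treated as caries experience is an assumption, kappa is assumed to be the two-rater Cohen form, and the example codes are invented.

```python
# Assumption: WHO primary-tooth codes counted as present or past caries experience
# (B = decayed, C = filled with decay, D = filled without decay, E = missing due to caries).
CARIES_CODES = {"B", "C", "D", "E"}


def dichotomize(codes):
    """Map each tooth code to 1 (caries experience) or 0 (no caries experience)."""
    return [1 if code in CARIES_CODES else 0 for code in codes]


def percent_agreement(a, b):
    """Overall percentage agreement between two equally long lists of 0/1 calls."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must score the same, non-empty set of teeth")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters and two categories (0/1)."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must score the same, non-empty set of teeth")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    p_a1, p_b1 = sum(a) / n, sum(b) / n              # share of teeth each rater calls carious
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)      # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for the eight primary molars of one child (not study data).
standard = dichotomize(["A", "A", "B", "A", "D", "A", "A", "B"])
examiner = dichotomize(["A", "A", "B", "A", "A", "A", "A", "B"])

print(f"agreement = {percent_agreement(examiner, standard):.1f}%")
print(f"kappa     = {cohen_kappa(examiner, standard):.2f}")
```

In practice the calls from all children examined in a session would be pooled per examiner before computing the two measures; a single child is used here only to keep the example short.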

Results
Table 1 describes the sample characterization in relation to prevalence of caries at baseline and after three months. The prevalence of caries in the two groups differed, with a higher prevalence at baseline than after three months. The prevalence of caries was also higher when only the group of posterior teeth was considered.
Table 2 shows the values for kappa and overall percentage agreement at baseline and after three months. At baseline, the interexaminer agreement for both kappa (> 0.81) and overall percentage agreement (> 95.63%) was high. After three months, a decrease was observed in the values for kappa and overall percentage agreement. Five examiners had kappa values below 0.61 (moderate agreement), and the other six had values ranging between 0.61 and 0.80 (substantial agreement). 16 The values for overall percentage agreement ranged from 89.58% to 97.22%.
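High overall percentage agreement alongside only moderate kappa, as reported above, is arithmetically possible when few teeth are carious, because chance agreement on sound teeth is then very high. The sketch below illustrates this with two invented 2x2 tables that share the same percentage agreement. It assumes the two-rater Cohen form of kappa; the function name and all counts are hypothetical and are not the study's data.

```python
def agreement_and_kappa(both_pos, a_only, b_only, both_neg):
    """Return (percentage agreement, Cohen's kappa) for a 2x2 table of tooth-level calls.

    both_pos: teeth both raters call carious; both_neg: teeth both call sound;
    a_only / b_only: teeth only rater A / only rater B calls carious.
    """
    n = both_pos + a_only + b_only + both_neg
    p_o = (both_pos + both_neg) / n                  # observed agreement
    p_a = (both_pos + a_only) / n                    # share rated carious by rater A
    p_b = (both_pos + b_only) / n                    # share rated carious by rater B
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)          # agreement expected by chance
    return 100 * p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical counts per 100 teeth, with the same four disagreements in each table.
higher_prev = agreement_and_kappa(28, 2, 2, 68)      # each rater calls 30% of teeth carious
lower_prev = agreement_and_kappa(3, 2, 2, 93)        # each rater calls 5% of teeth carious

for label, (agree, kappa) in (("higher prevalence", higher_prev),
                              ("lower prevalence", lower_prev)):
    print(f"{label}: agreement = {agree:.1f}%, kappa = {kappa:.2f}")
```

With these counts both tables give 96% agreement, while kappa is about 0.90 in the first and about 0.58 in the second. The example shows only that the two measures can diverge at low prevalence; it does not by itself explain the decrease observed in the study.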

Table 3 compares the values for mean kappa and overall percentage agreement for interexaminer reproducibility according to the time periods studied.
There was a significant decrease in both the mean kappa (p < 0.0001) and the overall percentage agreement (p = 0.0102) values, three months after the initial assessment (baseline).
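The paired comparison works on each examiner's own difference between the two time points. A minimal Python sketch of the test statistic follows; it is not the SAS analysis used in the study, the function name is hypothetical, and the kappa values are invented for illustration.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t-test on matched samples."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [b - a for b, a in zip(before, after)]   # per-examiner change
    sd = statistics.stdev(diffs)                     # sample standard deviation of the changes
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical kappa values for four examiners (not the study's data).
baseline = [0.90, 0.85, 0.88, 0.92]
three_months = [0.62, 0.58, 0.70, 0.66]

t_stat, df = paired_t(baseline, three_months)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The p-value is then read from a t distribution with the returned degrees of freedom, which a statistical package would normally supply. Pairing matters here because the same examiners are measured twice, so between-examiner variation is removed from the comparison.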

Discussion
In calibration exercises for examiners diagnosing dental caries, representative samples of the studied population with the highest prevalence of the disease are selected, so that reliable values of reproducibility can be achieved. 11,26 However, the current global context of lower and polarized caries prevalence should be taken into account, 1,2,21,29 since this selection cannot be made in practice when an epidemiological survey is performed.
Diagnosis using the WHO criteria for healthy anterior teeth is relatively easy, 27 since visualization is facilitated even under epidemiological conditions. Studies have shown that the kappa values for posterior teeth are considerably lower than those for anterior teeth. 13,30 Under these circumstances, therefore, it is essential to focus the calibration exercise on the group of teeth with the highest prevalence of the disease.
Thus, there is a gap in the literature regarding calibration for caries diagnosis. When examiners with no experience in the practice of oral health surveys are evaluated, it may not be possible to assess how effective the proposed methodology is for training new examiners.
In this study, the interexaminer kappa values were shown to be high at baseline. This outcome corroborated the results of other studies, 13,24,25 and reinforced the finding that the stages of theoretical-practical training are efficient in assessments conducted in the short term. At this time point, even inexperienced examiners had no difficulty in assessing the presence/absence of caries. This could be attributed to the theoretical and practical calibration stages having been completed recently, so that the codes and classification criteria were still fresh in the examiners' memory.
In contrast, there was a significant decrease in the mean kappa values for the calibration assessment after three months. Other longitudinal studies have shown that the kappa statistic remained high and stable 12 months after the theoretical-practical training stage. 13,24 This could be attributed to those studies having been conducted with examiners who were experienced in caries surveys, and who were used to the environment, the data collection, the calibration scheme, and the WHO criteria. In addition, the children evaluated were in the seven-year-old age group, and were selected with different levels of disease prevalence. 24
Surveys for collecting epidemiological oral health information on the population are an essential step in the organization, management and planning of public dental services. 6,21,33 Failure to diagnose cavitated lesions has costly economic implications, since the cost of dental treatment is much higher in these cases. 6,34 In the context of public oral health services, the calibration of new examiners is a practice required to train dentists who are not familiar with the indexes used, and the conditions of the outdoor settings where oral health surveys are often conducted. In these circumstances, the longitudinal design of this study was essential, because it demonstrated the failure to maintain the data reproducibility results over time. This failure may make the strategy used for calibrating the examiners in the present study inadequate for meeting future treatment demands in this age group. Moreover, it may lead to training unprepared examiners.
In our study, even within the context of the low prevalence of the disease, and the decrease in the number of children with caries lesions assessed after three months, the kappa value and the overall percentage of agreement dropped dramatically. In principle, this point could constitute a limiting factor of this study. However, it should be pointed out that although dentin caries prevalence has decreased worldwide, the number of initial enamel lesions may be increasing. This epidemiological change may have led to an increase in the disagreements in diagnosing conditions related to healthy and decayed teeth (enamel/dentin lesions). Furthermore, with time, examiners may not remember all the nuances regarding the WHO coding, and this may also give rise to disagreements among them.
In this respect, the methodology currently proposed by the WHO, 26 and followed in this study to evaluate caries in posterior teeth in the 5-year age group, proved ineffective over time in maintaining the calibration status of examiners inexperienced in epidemiological surveys. This ineffectiveness reinforces the clinical and social relevance of the present study. The only way inexperienced examiners can become better prepared is if more time is dedicated to theoretical-practical training than the time proposed at present. Moreover, additional periods of theoretical and practical classes for examining survey participants must be provided under epidemiological conditions. In view of the foregoing, a possible solution would be to recalibrate inexperienced examiners periodically, until they reach a minimum level of experience, so that reliable reproducibility values can be maintained over time.

Conclusion
The values for mean interexaminer kappa and overall percentage agreement of examiners with no previous experience in epidemiological caries surveys decreased when they were assessed three months after baseline. This outcome demonstrates that the calibration process currently proposed by the WHO does not maintain the initial calibration level of inexperienced examiners over time. The calibration process was based on the assessment by inexperienced examiners of the posterior teeth of groups of 5-year-old children, under epidemiological conditions. Therefore, a longer period of theoretical-practical training and periodic recalibration assessments are required until reliable levels of reproducibility can be maintained.

Table 2. Kappa values and overall percentage agreement for interexaminer reproducibility at baseline and after three months.

Table 3. Mean and standard deviation of kappa values and overall percentage agreement for interexaminer reproducibility in evaluating examiner calibration at baseline and after three months.