Evaluation of three classification systems for fractures of the distal end of the radius: Frykman, Universal and A.O.

Twelve evaluators (four residents and eight orthopedists) evaluated all the images at two different times, with an interval of one week. The inter- and intraobserver concordance was analyzed using the weighted Kappa coefficient. Student's t-test for paired samples was applied to verify whether there was a significant difference in the degree of interobserver concordance between the instruments. Results: The Universal classification showed large intraobserver reproducibility (κ = 0.72) and moderate interobserver reproducibility (κ = 0.48). The Frykman classification had moderate intraobserver and slight interobserver reproducibility (κ = 0.51 and 0.36, respectively). The A.O. group classification demonstrated slight intraobserver and interobserver reproducibility (κ = 0.38 and 0.25, respectively). Conclusion: The highest intra- and interobserver concordance was observed in the Universal classification, followed by Frykman and, finally, the A.O. group classification. The reproducibility of the classifications did not vary significantly with the degree of experience of the evaluator.

INTRODUCTION
Fractures of the distal end of the radius are defined as those that occur up to three centimeters from the radiocarpal joint 1 . They have an incidence of approximately 1:10,000 people, representing 16% of all fractures of the human body 2 . The most affected age group is between 60 and 69 years, mainly women, but the incidence among young people is increasing due to traffic accidents and high-energy sports injuries 1-3 . The high incidence in the elderly is correlated with osteoporosis, female sex, white race, and early menopause 1-3 .
The diagnosis of radius fractures is based on medical history, physical examination, and image evaluation, generally obtained with plain radiographs of the wrist in the anteroposterior (AP) and lateral views 1-3 . Fractures of the distal end of the radius are divided according to the pattern of the injury. Classifications are therefore important insofar as they help guide treatment decisions and indicate the prognosis of fractures 4 .
The Frykman classification comprises fracture patterns that can be classified from 1 to 8 5 . The Universal or Rayhack classification was created in 1990 and modified by Cooney in 1993 6 . It differentiates between intra- and extra-articular fractures, with or without deviation, and considers their reducibility and stability 6 . The A.O./OTA Group classification was created in 1986 and revised in 1990. It is divided into extra-articular (type A), partial articular (type B), and complete articular (type C) fractures. The three groups are organized in increasing order of severity concerning morphological complexity, difficulty of treatment, and prognosis 7 .
The studies currently found in the literature present very different methodologies and show low intra- and interobserver reproducibility for the different classifications of fractures of the distal end of the radius, without consensus on which system should be used in daily practice and in the conduction of scientific studies 4,8-10 .

OBJECTIVE
The objective of this work is to evaluate the reproducibility of the three main classifications, to define which one has the highest intra- and interobserver agreement, and to determine whether the training stage of the participants influences the evaluation.

METHODS
This is an observational study including imaging examinations of 14 patients seen in the emergency department of a public health hospital and diagnosed with a fracture of the distal end of the radius, from June to September 2017. All included patients had radiographs in two views, anteroposterior and lateral. Patients with an immature skeleton, those without satisfactory radiographs, and those with previous wrist fractures or deformities were excluded. For the assessment, 15 cases were presented to the evaluators, with one patient purposely repeated in order to improve the assessment of intraobserver precision.
Twelve orthopedists at different stages of training were selected as participants: eight members of the Brazilian Society of Orthopedics and Traumatology (two specialists in hand surgery and six non-specialists) and four resident physicians, one in the first year of training (R1), two in the second year (R2), and one in the third year (R3). The evaluators classified the fractures presented after a brief explanation of the classification systems, and consultation of the systems was allowed at any time during the evaluation. After seven days, the participants classified the same fractures again.
The study met all requirements concerning the rights of human beings and was approved by the institution's Research Ethics Committee (substantiated opinion No. 2,294,348).

Statistical analysis
The inferential analysis of intra- and interobserver concordance for the Frykman, Universal, and A.O. classifications used the weighted Kappa coefficient. Student's t-test for paired samples was applied to verify whether there was a significant difference in the degree of interobserver concordance between the instruments. Kappa values were interpreted as proposed by Landis and Koch in 1977 11 : values below zero represent deficient reproducibility; from zero to 0.20, insignificant; from 0.21 to 0.40, slight; from 0.41 to 0.60, moderate; from 0.61 to 0.80, large; and greater than 0.80, near-perfect agreement. The Kappa statistics were tested at a significance level of 5%.
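As a concrete illustration of the procedure above, the sketch below computes a linear-weighted Kappa between two ratings of the same fractures and maps it to the Landis and Koch bands. This is a minimal sketch, assuming linear weights; the ratings and category labels are hypothetical, not the study's data.

```python
def weighted_kappa(r1, r2, categories):
    """Linear-weighted Cohen's Kappa for two ordinal ratings."""
    n, k = len(r1), len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    obs = [[0.0] * k for _ in range(k)]          # observed proportions
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]        # marginals of rating 1
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]  # rating 2
    disagree_obs = disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)             # linear disagreement weight
            disagree_obs += w * obs[i][j]
            disagree_exp += w * row[i] * col[j]
    return 1.0 - disagree_obs / disagree_exp

def landis_koch(kappa):
    """Interpretation bands of Landis and Koch (1977), as used in this study."""
    if kappa < 0:
        return "deficient"
    if kappa <= 0.20:
        return "insignificant"
    if kappa <= 0.40:
        return "slight"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "large"
    return "near-perfect"

# Hypothetical type assignments by one rater at two different times
t1 = [1, 2, 2, 3, 1, 3, 4, 1, 2, 3]
t2 = [1, 2, 3, 3, 1, 2, 4, 1, 2, 3]
k = weighted_kappa(t1, t2, categories=[1, 2, 3, 4])
print(round(k, 2), landis_koch(k))
```

A ready-made alternative with the same weighting is `cohen_kappa_score(..., weights="linear")` from scikit-learn.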

RESULTS
Among the classifications, the best reproducibility was observed for the Universal classification, with a Kappa index of 0.72, considered large intraobserver reproducibility. In the interobserver evaluation, this index decreased to 0.48, corresponding to moderate reproducibility. The Frykman classification had a Kappa index of 0.51, considered moderate, in the intraobserver evaluation; in the interobserver evaluation, the index was 0.36, classified as slight. The A.O. classification showed slight intraobserver and interobserver reproducibility (κ = 0.38 and 0.25, respectively) (Tables 1 and 2).
When analyzing the classification of the repeated fracture, only one evaluator questioned whether the same radiograph had been previously evaluated. However, all the evaluators classified the lesion in the same way in at least one of the three systems. The Frykman classification showed reproducibility equal to the Universal classification, with seven correct answers each, while the A.O. classification presented five correct answers (Figure 1).
When analyzing the degree of education and experience of the evaluators, there was no statistically significant variation in the Kappa values at the 5% significance level.

DISCUSSION
The ideal classification of any fracture should provide enough information to help make appropriate treatment decisions and determine the prognosis, in addition to having satisfactory reproducibility and being easy to memorize 12 . The reproducibility of a system is based on inter- and intraobserver concordance, and a useful classification must be reproducible so that it can be widely accepted and allow different series to be compared 4,8 . In the present study, we analyzed the reproducibility of classifications of fractures of the distal end of the radius. A previous study 13 evaluated four classifications for distal radius fractures (Frykman, Melone, Mayo, and A.O.) and found that none of them showed high interobserver concordance (Kappa between 0.61 and 0.80). In the Frykman classification, the intraobserver concordance ranged from 0.40 to 0.60, and the interobserver concordance had an average Kappa index of 0.36. Regarding the complete A.O. classification, the mean intraobserver concordance ranged from 0.22 to 0.37 and, when the system was reduced to three categories, a concordance level of 0.58 to 0.70 was obtained. However, when reduced to three categories, the A.O. system has questionable value compared to other classifications.
Assessing the reproducibility of the A.O. classification in 30 radiographs of distal radius fractures classified by 36 observers with different levels of experience, Kreder et al., in 1996 14 , showed that interobserver concordance was best for the simplified classification (κ = 0.68) and progressively decreased when the groups (κ = 0.48) and subgroups (κ = 0.33) of this system were included. The Kappa index ranged from 0.25 to 0.42 for intraobserver concordance with the full A.O. system and from 0.40 to 0.86 with the simplified classification. There was no difference regarding the degree of experience of the observers in classifying "groups" and "subgroups." Illarramendi et al., in 1998 15 , used 200 radiographs classified by six observers with different levels of experience. For the Frykman classification, moderate interobserver reproducibility (κ = 0.43) and large intraobserver reproducibility (κ = 0.61) were obtained. For the A.O. classification, they found slight interobserver reproducibility (κ = 0.37) and moderate intraobserver reproducibility (κ = 0.57). However, to obtain these results, the authors simplified the Frykman and A.O. classifications, improving the reproducibility of both, which perhaps would not occur with the complete systems. Intraobserver reproducibility was greater than interobserver reproducibility, and concordance did not improve with increasing observer experience.
There is still no consensus on the ideal methodology for reproducibility studies of the classifications, since the number of image examinations analyzed and the number of evaluators influence the concordance of the answers 13-15 . In the present study, we chose to reduce the number of fractures, totaling 15 cases with two views each, so as not to make the process tiring, which could harm the results of the evaluations. However, in agreement with the previous studies on reproducibility, we found that the classifications evaluated were not satisfactory, with a result considered large only for the intraobserver concordance of the Universal classification; in the rest, the concordance was slight to moderate 13-15 . Another point of agreement with the studies cited is the small influence of the level of experience of the participants when classifying distal radius fractures, since there was no significant difference between residents and specialists 13,15 .
In addition, unlike previous research, we purposely repeated a case for a better assessment of intraobserver concordance. Many evaluators were unable to identify that they were classifying a repeated radiograph, confirming the difficulty of creating a highly reproducible classification system.

CONCLUSION
The highest intra- and interobserver concordance was observed in the Universal classification, followed by Frykman and, finally, the A.O. group classification; however, we found that the reproducibility of the classifications did not vary significantly with the degree of experience of the evaluator.