Level of evidence in hand surgery

Background Few investigations have analyzed the level of evidence in journals related to hand surgery, compared to other related research fields. The objective of this study was to assess the level of evidence of the clinical research papers published in the Ibero-American (RICMA), the European (JHSE) and the American (JHSA) Journals of Hand Surgery. Methods A total of 932 clinical research papers published between 2005 and 2009 (RICMA 60, JHSE 461, and JHSA 411) were reviewed. Two independent observers classified the level of evidence based on the Oxford International Classification, 5 being the lowest level and 1 the highest level. The observed frequencies of the level of evidence for each journal were compared with the expected frequencies by a chi-square (χ2) test for categorical variables with a significance level of 0.05. Results Inter-observer agreement analysis showed a Kappa of 0.617. Intra-observer agreement analysis showed a Kappa of 0.66 for observer 1 and a Kappa of 0.751 for observer 2. More than 80% of the papers in RICMA and JHSE, and 67.6% in JHSA, were level 4. No level 1 or 2 studies were published in RICMA, compared to JHSE (0.9% level 1 and 5.0% level 2) and JHSA (8.3% level 1 and 10% level 2). The percentage of level 3 papers published in RICMA (16.7%) was higher than in JHSE (11.1%) and JHSA (14.1%). All the results were statistically significant (χ2=63.945; p<0.001). Conclusions The level of evidence in hand surgery depends on the type of journal: the papers with the highest level of evidence were published in JHSA, followed by JHSE and finally RICMA. Knowing the status of the level of evidence published in hand surgery is the starting point for facing the challenge of improving the quality of our clinical research.


Background
Since the first system for classifying the level of evidence of clinical research papers was reported [1], Evidence-Based Medicine has become an important part of our clinical practice. Hand surgeons should understand the level of evidence in order to judge the reliability and utility of the data provided in a research paper. Few investigations have analyzed the level of evidence in journals related to hand surgery compared to other related research fields, such as orthopaedic surgery [2,3] and plastic surgery [4]. Only one hand surgery journal has been analyzed for level of evidence, over a six-month period, and compared to other orthopaedic publications [3]. To our knowledge, no paper comparing the level of evidence in clinical research published in three hand surgery journals over a period of five years has been reported before. The purpose of this paper was to assess the level of evidence of the clinical research papers published in RICMA, JHSE and JHSA.

Hypothesis
The researchers established the null hypothesis (Ho) that the variable "level of scientific evidence" was independent of the variable "type of journal".

Eligibility criteria and population study
Inclusion criteria. All clinical research articles published between January 2005 and December 2009 in the Ibero-American Journal of Hand Surgery (RICMA; "Revista Iberoamericana de Cirugía de la Mano"), The Journal of Hand Surgery European Volume (JHSE) and The Journal of Hand Surgery American Volume (JHSA).
Exclusion criteria. Animal studies, anatomical and cadaver studies, basic science studies, instructional course lectures, abstract supplements, short reports, letters to the editor and review articles were excluded from the study.

Assessment of level of evidence
The articles that met the inclusion and exclusion criteria were randomly assigned to two independent observers (YMH, LRM) with extensive clinical experience in hand surgery and a thorough knowledge of Evidence-Based Medicine. An approximately equal number of articles from each journal was assessed by each observer. The level of evidence for each article was assessed based on The Oxford Centre for Evidence-based Medicine Levels of Evidence (March 2009) http://www.cebm.net/index.aspx?o=1025 [5]. The observers were blinded to any previous information related to the level of evidence of the articles to be analysed (e.g.: the "level of evidence and type of study" statement that JHSA has included with the abstract of every clinical research paper since 2006). The articles were ranked according to their level of evidence from level 1 (highest level of evidence, e.g.: systematic review (SR), meta-analysis (MA), and individual randomized controlled trials (RCT) with a narrow confidence interval) to level 5 (lowest level of evidence, e.g.: expert opinion). The observers did not assess the sub-group level of evidence or the type of research article (Table 1).

Reliability analysis
Before starting the study, the reliability of the assessment was evaluated by analysing both the intra-observer and the inter-observer error. A random sample of 30 clinical research articles, drawn from the 872 papers published in the English language (461 from JHSE and 411 from JHSA), was assessed by the two independent observers assigned to the study. After 15 days, a second assessment was undertaken with the order of the articles changed. No papers from RICMA were included in the sample for the reliability analysis. This was done to avoid information bias [6], because the different languages used in RICMA (Spanish and Portuguese) could increase the intra-observer reliability. The intra-observer and inter-observer reliability was studied using the Kappa coefficient test with a significance level of 0.05.

Data analysis
For the assessment of the results, the number of articles at each level of evidence was expressed as a percentage of the total number of articles meeting the inclusion and exclusion criteria for the study period. The observed frequencies of the level of evidence for each journal were compared with the expected frequencies using a chi-square (χ2) test for categorical variables with a significance level of 0.05.
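The comparison of observed and expected frequencies described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the software or procedure used in the study: the function name `chi_square_independence` and the example counts are hypothetical, and the counts are NOT the study data. Each expected frequency is computed as (row total × column total) / grand total, and the degrees of freedom are (rows − 1) × (columns − 1).

```python
from typing import List, Tuple


def chi_square_independence(observed: List[List[int]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x c table.

    Rows could be journals and columns levels of evidence.
    Hypothetical helper for illustration only.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("empty row or column: expected count is zero")
            statistic += (obs - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 2 x 2 counts, NOT taken from the paper.
example_table = [[10, 20], [20, 10]]
stat, dof = chi_square_independence(example_table)
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

For a table of three journals by five levels of evidence, the degrees of freedom would be 8, and the statistic would be compared with the chi-square critical value for that many degrees of freedom at the 0.05 level.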

Results
Inter-observer analysis showed a Kappa of 0.617, with an asymptotic standard error (SE) of 0.117. Intra-observer analysis showed a Kappa of 0.66 (SE 0.114) for observer 1 and a Kappa of 0.751 (SE 0.103) for observer 2. All Kappa values were significant (p<0.001). More than 80% of the papers in RICMA and JHSE, and 67.6% in JHSA, were level 4. No level 1 or level 2 papers were published in RICMA, compared to JHSE (0.9% level 1 and 5.0% level 2) and JHSA (8.3% level 1 and 10% level 2). The percentage of level 3 papers published in RICMA (16.7%) was higher than in JHSE (11.1%) and JHSA (14.1%) (Table 2). All the results were statistically significant (χ2=63.945; p<0.001) and the null hypothesis (Ho) was rejected.
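As a rough illustration of what the reported standard errors imply, an approximate 95% confidence interval can be built around each Kappa value as Kappa ± 1.96 × SE. This interval is not reported in the paper; it is a sketch that assumes a normal approximation, and the function name `kappa_confidence_interval` is hypothetical.

```python
from typing import Tuple


def kappa_confidence_interval(kappa: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Approximate interval kappa +/- z * se (normal approximation, an assumption)."""
    return kappa - z * se, kappa + z * se


# Kappa values and asymptotic standard errors as reported in the Results.
reported = {
    "inter-observer": (0.617, 0.117),
    "intra-observer, observer 1": (0.66, 0.114),
    "intra-observer, observer 2": (0.751, 0.103),
}

intervals = {name: kappa_confidence_interval(k, se) for name, (k, se) in reported.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: {low:.2f} to {high:.2f}")
```

Under this approximation the inter-observer interval runs from roughly 0.39 to 0.85, which shows how much uncertainty a reliability sample of 30 articles leaves around the point estimate.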

Discussion
The results of this paper have demonstrated, with a good to excellent level of reliability, that the variable "level of evidence" is dependent on the variable "type of journal".

Reliability analysis
The use of Kappa is important because the often-used proportion of agreement does not allow for the fact that some agreement is due to chance. A statistically significant Kappa coefficient means that the agreement is different from zero (null agreement). However, the interpretation of the obtained Kappa values is subjective, and different classifications or guides have been proposed to interpret the Kappa coefficient in reliability analysis. In this paper, the inter-observer and intra-observer Kappa values, ranging from 0.617 to 0.751, can be considered to indicate a good to excellent level of reliability [7,8] in the assessment of the level of evidence and the type of journal. Similar results have been reported before. Obremskey et al. [3], in their assessment of the level of evidence in orthopaedic journals, reported a Kappa value of 0.62 for inter-observer agreement between inexperienced reviewers and a Kappa value of 0.75 for inter-observer reliability between experienced reviewers. No intra-observer agreement analysis was reported by those authors.
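The correction for chance agreement can be made concrete with a minimal sketch of Cohen's Kappa for two raters: Kappa = (Po − Pe) / (1 − Pe), where Po is the observed proportion of agreement and Pe is the agreement expected by chance from each rater's marginal frequencies. The function name `cohen_kappa` and the ratings below are hypothetical and are not the study data; the paper does not state which software was used.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(ratings_a: Sequence[int], ratings_b: Sequence[int]) -> float:
    """Unweighted Cohen's Kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)

    # Observed proportion of agreement (Po).
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement (Pe) from the marginal frequency of each category.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)

    if p_expected == 1.0:
        raise ValueError("Kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical level-of-evidence ratings (1 to 5) for ten articles, NOT study data.
observer_1 = [4, 4, 4, 3, 2, 4, 3, 4, 1, 4]
observer_2 = [4, 4, 3, 3, 2, 4, 4, 4, 1, 4]
kappa = cohen_kappa(observer_1, observer_2)
print(f"Kappa = {kappa:.3f}")
```

In this hypothetical example the two observers agree on 8 of 10 articles (Po = 0.80), but because both assign level 4 most of the time, chance agreement is already 0.42, so the Kappa value is noticeably lower than the raw proportion of agreement.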

Level of evidence and type of journal
Not many papers have studied the level of evidence in hand surgery journals or in related research fields, such as orthopaedic and plastic surgery journals. Sinno et al. [4] reviewed 726 articles from six different plastic surgery journals, and the level of evidence was assessed using a classification based on the Oxford Centre for Evidence [...]

A systematic review (SR) is generally better than an individual study. An experimental study (e.g.: a good quality RCT) is generally better than any observational study. For observational studies, a cohort study is generally better than any case-control study, and a case-control study is generally better than any case-series study. * By poor quality cohort study we mean a cohort study that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same objective way (preferably blinded) in both exposed and non-exposed individuals and/or failed to identify or control known confounders and/or had poor follow-up. The same applies to a poor quality case-control study, except that in this design the patients are identified based on the outcome: those with the outcome (e.g.: failed replant), called "cases", are compared with those who did not have the outcome (e.g.: had a successful replant), called "controls", and consequently we do not have "exposed" and "non-exposed" individuals.

[...] The level of evidence was assessed using a classification included in the guide for authors (JBJS-A grading system), which was very similar to the one developed by the CEBM, in order to demonstrate trends in the level of evidence over 30 years. Furthermore, Obremskey et al. [3] reviewed 382 clinical research articles from nine different journals in order to assess the level of evidence in orthopaedic journals. In this paper, 932 clinical research papers from three hand surgery journals were reviewed, which constitutes the largest population of clinical articles assessed for level of evidence reported until now.
The results of this paper demonstrate that most of the clinical articles published in hand surgery are papers with a very low level of evidence (more than 80% level 4 in JHSE and RICMA, and 67.6% in JHSA). Most of those papers were case series and, less frequently, poor quality cohort or poor quality case-control studies. These percentages were higher than those reported for orthopaedic journals (48% level 4 studies) [2], plastic surgery journals (40% level 4 studies) [4] and ophthalmology journals (58% level 4 studies) [9]. However, other surgical journals, such as ear, nose and throat (otolaryngology) journals, present a percentage similar to JHSE and RICMA (80% level 4 studies) [10]. The percentage of level 4 papers in JHSA was lower than in the other hand surgery journals investigated, and it was very close to the one published by Obremskey et al. [3], who reported 68.8% level 4 papers in a review of 32 articles published in JHSA from January to June 2003.
The percentage of papers with a higher level of evidence (level 1 and 2) was larger in JHSA (8.3% level 1 and 10% level 2) than in RICMA (0%) and JHSE (0.9% level 1 and 5% level 2). By comparison, other fields have reported 21% level 1 and 15% level 2 in orthopaedic journals [2], 3% level 1 and 16% level 2 in plastic surgery journals [4], 18% level 1 and 8% level 2 in ophthalmology journals [9], and 7% level 1 and level 2 in otolaryngology journals [10].
The percentage of level 3 papers (mostly case-control studies and non-consecutive cohort studies, or studies with a very limited population) published in RICMA (16.7%) was higher than in JHSE (11.1%) and JHSA (14.1%), and similar to other journals: 16% in orthopaedic journals [2], 16% in otolaryngology journals [10] and 16% in ophthalmology journals [9]. Some authors have criticized the low number of studies with a high level of evidence in surgery [11]. However, this criticism may seem overly severe if we take into account that surgical trials differ from trials that compare a medication with a placebo. Surgical procedures are invasive, it is difficult to randomise patients, blinding is a problem, and the trials are very expensive. Without high quality randomized trials, we cannot have a systematic review that synthesizes the evidence previously reported.
The lack of a trend analysis is a limitation of this paper and should be the subject of further studies, in order to understand how the level of evidence published in hand surgery journals has changed over time and how those changes relate to changes in the impact factor.
After reviewing articles published in journals from different parts of the world, other questions have arisen: whether the differences we have found reflect different regional priorities, how the resources available for research affect our findings, and whether different countries are the main contributors of high-level studies.