Reporting of Randomised Clinical Trials in Skull Base Surgery: A Fourteen-Year Review

Skull base surgery has experienced dramatic advances in the last decade. Recently, various surgical disciplines have reviewed the quality of their randomised controlled trials (RCTs). To our knowledge, this is the first review of RCT quality within skull base surgery. A systematic review of skull base surgery RCTs published between 2000 and 2014 was conducted. The literature search yielded 96 papers. Duplicates and trials that did not meet our inclusion criteria were excluded, leaving 28 papers for analysis. A total of 1785 patients participated across the trials. The Consolidated Standards of Reporting Trials (CONSORT) statement and the Jadad scale were used to assess the quality of reporting; these were our main outcome measures. The mean CONSORT score prior to 2011 was 16.9 (n = 17, range 13 – 22), and post 2011 was 17.5 (n = 11, range 12 – 22). The mean Jadad score was 3.1 (n = 28, range 2 – 5). CONSORT scores were found to increase significantly with both increasing sample size (rho=0.467, p=0.012) and Jadad score (rho=0.540, p=0.003). Linear regression showed that the CONSORT score increased by 0.36 (95% CI: 0.02 – 0.70, p=0.041) for each additional 10 patients included, and by 1.50 (95% CI: 0.58 – 0.24, p=0.002) for each increase of one in the Jadad score. There are common omissions related to randomisation, sample size calculations and availability of protocols. Reporting of RCTs in skull base surgery is comparable to that in other surgical disciplines. We recommend use of the CONSORT statement during protocol formation of RCTs to improve the reporting of trials.


INTRODUCTION
Properly conducted trials that follow established scientific methods are widely accepted as the foundation for establishing treatment efficacy and safety [1]. Their importance lies in the fact that evaluating a smaller population, in which variability in treatment outcome is analysed, can influence the future management of the general population. Retrospective studies carry a serious risk of bias, which could influence outcomes. As a result, the randomised controlled trial (RCT) is accepted as the gold standard for clinical investigation. However, RCTs are not without controversy, especially in surgical disciplines, where it may be difficult or even unethical to randomise patients to a non-surgical treatment arm.
RCT reporting should transparently convey the design, conduct, analysis and learning points [2]. Despite this, RCTs are still not being reported adequately [3][4][5]. Poor reporting makes trials difficult to interpret and apply. [ (Table 1) [9]. The Jadad scale is a similar tool used to assess the methodological quality of randomised controlled trials using a three-item system, resulting in a score from 0 (low-quality study) to 5 (high-quality study) (Table 2). It has been found to contain many of the important elements that have empirically been shown to correlate with bias, and it has known reliability and external validity [10].
Reviews of RCT reporting within various surgical specialities, including paediatric, general and trauma surgery, highlight multiple weaknesses that result in a lack of transparency in reporting [11][12][13].
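As an illustration of how the three Jadad items combine into a 0 to 5 score, the following is a minimal sketch based on the commonly described form of the scale (one point each for mentioning randomisation, double blinding and withdrawals, with one point added or deducted according to whether the described method is appropriate). The class and field names are hypothetical and are not taken from the reviewed papers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JadadAnswers:
    """Hypothetical record of one assessor's answers for a single trial report."""
    randomised: bool                      # is the trial described as randomised?
    randomisation_method: Optional[str]   # "appropriate", "inappropriate" or None if not described
    double_blind: bool                    # is the trial described as double blind?
    blinding_method: Optional[str]        # "appropriate", "inappropriate" or None if not described
    withdrawals_described: bool           # are withdrawals and dropouts accounted for?


def _item_score(mentioned: bool, method: Optional[str]) -> int:
    """Score one of the two method-dependent items (randomisation or blinding)."""
    if not mentioned:
        return 0
    score = 1
    if method == "appropriate":
        score += 1
    elif method == "inappropriate":
        score -= 1
    return score


def jadad_score(answers: JadadAnswers) -> int:
    """Return the total Jadad score, which lies between 0 and 5."""
    total = _item_score(answers.randomised, answers.randomisation_method)
    total += _item_score(answers.double_blind, answers.blinding_method)
    total += 1 if answers.withdrawals_described else 0
    return total


if __name__ == "__main__":
    example = JadadAnswers(True, "appropriate", False, None, True)
    print(jadad_score(example))
```

Under this sketch, an unblinded surgical trial can score at most 3, which is one reason blinding has such a strong influence on Jadad scores in surgical specialities.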
Having first been described in the late nineteenth and early twentieth centuries, skull base surgery has experienced dramatic advances over the last decade [14]. These include advances in surgical technique, neuronavigation and optics, as well as the involvement of specialities outside neurosurgery [15]. As a result, there is an understandable groundswell of interest and associated research within this domain. We aim to use the CONSORT guidelines and Jadad scale to assess the quality of reporting, whilst simultaneously highlighting areas of research and revealing aspects of skull base surgery yet to be subjected to RCT. This is highly relevant in an age of evidence-based medicine, owing to the importance of the quality of data collection and reporting. The aim of this paper is to analyse previous trials and provide a platform for effective future trials. To our knowledge, no previous paper has analysed the strength of reporting in skull base surgery. It is important to provide this information in order to assess the reliability of the data reported within different surgical disciplines.

We added a sub-item on providing a structured summary of trial design, methods, results and conclusions, and referenced the CONSORT for abstracts article
Item 2b (introduction): We added a new sub-item (formerly item 5 in CONSORT 2001) on "Specific objectives or hypotheses"
Item 3a (trial design): We added a new item including this sub-item to clarify the basic trial design (such as parallel group, crossover, cluster) and the allocation ratio
Item 3b (trial design): We added a new sub-item that addresses any important changes to methods after trial commencement, with a discussion of reasons
Item 4 (participants): Formerly item 3 in CONSORT 2001
Item 5 (interventions): Formerly item 4 in CONSORT 2001. We encouraged greater specificity by stating that descriptions of interventions should include "sufficient details to allow replication"
Item 6 (outcomes): We added a sub-item on identifying any changes to the primary and secondary outcome (endpoint) measures after the trial started. This followed from empirical evidence that authors frequently provide analyses of outcomes in their published papers that were not the pre-specified primary and secondary outcomes in their protocols while ignoring their pre-specified outcomes (that is, selective outcome reporting). We eliminated text on any methods used to enhance the quality of measurements
Item 9 (allocation concealment mechanism): We reworded this to include mechanism in both the report topic and the descriptor to reinforce that authors should report the actual steps taken to ensure allocation concealment rather than simply report imprecise, perhaps banal, assurances of concealment
Item 11 (blinding): We added the specification of how blinding was done and, if relevant, a description of the similarity of interventions and procedures. We also eliminated text on "how the success of blinding (masking) was assessed" because of a lack of empirical evidence supporting the practice as well as theoretical concerns about the validity of any such assessment
Item 12a (statistical methods): We added that statistical methods should also be provided for analysis of secondary outcomes
Sub-item 14b (recruitment): Based on empirical research, we added a sub-item on "Why the trial ended or was stopped"
Item 15 (baseline data): We specified "A table" to clarify that the baseline and clinical characteristics of each group are most clearly expressed in a table
Item 16 (numbers analysed): We replaced the mention of "intention to treat" analysis, a widely misused term, by a more explicit request for information about retaining participants in their original assigned groups
Sub-item 17b (outcomes and estimation): For appropriate clinical interpretability, prevailing experience suggested the addition of "For binary outcomes, presentation of both relative and absolute effect sizes is recommended"
Item 19 (harms): We included a reference to the CONSORT paper on harms
Item 20 (limitations): We changed the topic from "interpretation" and supplanted the prior text with a sentence focusing on the reporting of sources of potential bias and imprecision
Item 22 (interpretation): We changed the topic from "Overall evidence". Indeed, we understand that authors should be allowed leeway for interpretation under this nebulous heading. However, the CONSORT Group expressed concerns that conclusions in papers frequently misrepresented the actual analytical results and that harms were ignored or marginalized. Therefore, we changed the checklist item to include the concepts of results matching interpretations and of benefits being balanced with harms
Item 23 (registration): We added a new item on trial registration. Empirical evidence supports the need for trial registration, and recent requirements by journal editors have fostered compliance
Item 24 (protocol): We added a new item on availability of the trial protocol. Empirical evidence suggests that authors often ignore, in the conduct and reporting of their trial, what they stated in the protocol. Hence, availability of the protocol can instigate adherence to the protocol before publication and facilitate assessment of adherence after publication
Item 25 (funding): We added a new item on funding. Empirical evidence points toward funding source sometimes being associated with estimated treatment effects.

Inclusion/Exclusion Criteria
The full inclusion and exclusion criteria are summarised in Table 3. Articles were included if they assessed a living human population with any skull base disease using a prospective randomised controlled trial published between 01/01/2000 and 31/11/2014, with the full article available in English. All other articles were excluded.
A subsequent level of screening excluded duplicate articles, and publications not related to skull base surgery, such as endocrinology of the HPA axis, pregnancy and in-vitro fertilisation (IVF) treatment.

Method and Data Analysis
All papers that met our inclusion and exclusion criteria were obtained in full and subsequently appraised using the CONSORT statement and the Jadad scale by two independent observers. Further data were extracted from each paper, including the number of authors, location of study, methodology, number of patients, year of publication and synopsis of study. These factors were divided into two classes: those that were ordinal or continuous, and those that were categorical. Comparisons between ordinal or continuous variables were made using Spearman's correlation coefficients, with linear regression models produced where significant associations were found.
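For readers unfamiliar with Spearman's coefficient, it is the Pearson correlation computed on the ranks of the observations, with tied values sharing their average rank. The following is a minimal standard-library sketch of that calculation; it is not the SPSS procedure used in this study, and the sample sizes and scores in the example are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical sample sizes and CONSORT scores for six trials.
    sample_sizes = [20, 35, 40, 60, 90, 120]
    consort_scores = [14, 17, 15, 18, 18, 21]
    print(round(spearman_rho(sample_sizes, consort_scores), 3))
```

Because only the ordering of the values matters, the coefficient is unaffected by the skewed distribution of sample sizes that is typical of small surgical trials.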
Ordinal and continuous variables were then compared across categorical variables using Mann-Whitney or Kruskal-Wallis tests, as appropriate, with medians and ranges used as summary statistics. Where Kruskal-Wallis tests returned significant results, post hoc comparisons between all groups were made using Mann-Whitney tests, with the p-values Bonferroni corrected for the number of comparisons being made.
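The Bonferroni correction described above can be sketched as follows: each post hoc p-value is multiplied by the number of pairwise comparisons and capped at 1. The raw p-values shown are hypothetical and serve only to illustrate the arithmetic for three groups, such as three continents.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    groups = ["Europe", "Asia", "North America"]
    pairs = list(combinations(groups, 2))   # three pairwise comparisons
    raw_p = [0.01, 0.04, 0.5]               # hypothetical Mann-Whitney p-values
    for pair, p_adj in zip(pairs, bonferroni(raw_p)):
        print(pair, round(p_adj, 3))
```

With three groups there are three pairwise comparisons, so a raw p-value must fall below roughly 0.017 to remain below 0.05 after correction.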
Finally, comparisons between categorical variables were made using Fisher's exact test. All analyses were performed using IBM SPSS Statistics 22 (IBM Corp. Armonk, NY), with p<0.05 deemed to be indicative of statistical significance.
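To illustrate Fisher's exact test, the sketch below computes a two-sided p-value for a 2 x 2 table from the hypergeometric distribution, summing the probabilities of all tables that are no more likely than the one observed. The example collapses the study types into preoperative versus all others, using the counts given in the Results (0 of 8 preoperative studies blinded, 11 of 28 studies blinded overall). This collapsing is an assumption made for illustration, so the result need not match the p-value reported for the comparison across all study types.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, given fixed margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_observed = prob(a)
    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    # Sum over every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Rows: preoperative studies, all other studies. Columns: blinded, not blinded.
    print(fisher_exact_two_sided(0, 8, 11, 9))
```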

Study Selection
A combination of the aforementioned key words yielded 96 papers across all databases. Duplicates were subsequently excluded, leaving 73. Papers that were non-human, not written in English, cadaver based or otherwise did not meet our inclusion criteria were also excluded. This left a total of 28 (Fig. 1, Fig. 2). No RCTs were produced outside Europe, Asia and North America. The impact factor of journals ranged from 0.947 [33] to 6.310 [18] across 26 journals, whilst Surgical Neurology was discontinued and therefore did not receive a 2013 impact factor. The median impact factor was 2.347. Over half the studies were published in the last five years, with between 1 and 6 published per year.
All studies were prospective in nature. Blinding of participants, observers or surgeons occurred in 11 studies, whilst 4 used a placebo. A total of 3 double-blinded randomised controlled trials were conducted during the study period. These related to pre-operative medical treatment, perioperative haemostasis control and post-operative analgesia. Thirteen of the studies investigated acromegaly, whilst other areas included meningioma, prolactinoma, craniopharyngioma and pituitary adenoma. Eight studies did not address a specific condition, instead investigating any lesion within the skull base (Fig. 2).
Data were available for a total of 28 studies. The data were complete in all parameters being measured, with the exception of the impact factor, where two values were missing due to the journals being discontinued.
Since the CONSORT guidelines changed in 2010, the analysis was performed separately using both versions. However, since the scores from the two versions were so highly correlated (rho=0.967), both sets of analyses returned comparable results, and only the more recent version of CONSORT is reported throughout. Table 5 reports the correlations between the continuous factors being considered. CONSORT scores were found to increase significantly with both increasing sample size (rho=0.467, p=0.012) and JADAD scores (rho=0.540, p=0.003). In addition to this, higher impact factors of journals were observed in RCTs with a greater number of authors (rho=0.622, p=0.001), and in the more recently published papers (rho=0.529, p=0.005).

Fig. 1. Search strategy
Linear regression analysis was then performed to further quantify these relationships, the results of which are shown graphically in Fig. 3. The CONSORT score was found to increase by 0.36 (95% CI: 0.02 – 0.70, p=0.041, Fig. 3A) for each additional 10 patients included, and by 1.50 (95% CI: 0.58 – 0.24, p=0.002, Fig. 3B) for each increase of one in the JADAD score. The impact factor was found to increase by 0.34 (95% CI: 0.18 – 0.50, p<0.001, Fig. 3C) for each additional author, and by 0.17 (95% CI: 0.04 – 0.30, p=0.014, Fig. 3D) in each subsequent year of the investigation.
Table 6 reports the analysis of the categorical factors. As would be expected, studies with a placebo arm had significantly higher CONSORT (median 21 vs. 17, p=0.021) and JADAD (median 5 vs. 3, p=0.012) scores, with blinded studies also having significantly higher JADAD scores (median 4 vs. 2, p<0.001). In addition to this, both the number of authors (p=0.002) and the impact factor (p=0.003) were found to differ significantly by continent. Post-hoc analysis found that this was due to significant differences between Europe and Asia, with European papers found to have a significantly greater number of authors (median 8 vs. 5, post hoc p=0.018) and to be in significantly higher impact factor journals (median 3.35 vs. 1.75, post hoc p=0.021).
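The regression coefficients above are expressed per unit of the predictor, so a slope fitted per patient must be multiplied by 10 to give the change per 10 patients. The following is a minimal least-squares sketch of that rescaling. The data points are hypothetical, and only the coefficient of 0.36 per 10 patients is taken from the text.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical sample sizes and CONSORT scores, for illustration only.
    patients = [20, 40, 60, 80]
    consort = [15, 16, 17, 18]
    slope, intercept = ols_slope_intercept(patients, consort)
    print("change per patient:", slope)
    print("change per 10 patients:", slope * 10)

    # Reading the reported coefficient: 0.36 points per additional 10 patients
    # implies a predicted difference between two trials 50 patients apart of:
    reported_per_10 = 0.36
    print("predicted difference for 50 patients:", reported_per_10 * (50 / 10))
```

On this reading, two otherwise similar trials differing by 50 patients would be predicted to differ by just under two CONSORT points, although the wide confidence interval means the true difference could be considerably smaller.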
No significant differences in any of the outcomes by the type of study were detected. However, it must be noted that, due to the number of groups being compared and the small sample size, these tests had very low statistical power; hence the false negative error rate would be high in these analyses. Table 6 reports the rates of placebo and blinding usage by continent and study type. The only significant finding was that none of the eight preoperative studies employed blinding, compared to between 33% and 67% of the studies of other types (p=0.019).

DISCUSSION
A majority of the RCTs reviewed in this paper scored below 18 on the CONSORT statement, which raises questions about their validity.
There are common omissions related to randomization and blinding technique, sample size calculations and availability of protocols. Similar deficiencies in reporting randomization and blinding were highlighted in the Jadad score.

A combination of these omissions leads us to question the reliability of these trials. However, RCTs in skull base surgery score higher than those in other specialities [11][12][13]. In addition, blinding is difficult to achieve in surgical specialities, especially in relation to surgical technique. As a result, all cases of blinding related to medical treatment or post-operative pain control. Our results suggest that a greater number of authors is associated with publication in higher impact factor journals, which is consistent with the modern initiative towards collaborative research. Higher numbers of authors were present in papers from Europe, which differed significantly from papers from Asia. Higher CONSORT scores were also found in papers with larger sample sizes. This was the only continuous variable, in addition to Jadad score, that was significantly associated with CONSORT score.

Fig. 2. Distribution of skull base topics, skull base pathology and location of randomised controlled trials
It is interesting to note that the failure of surgical trials to adhere to the CONSORT statement has been recognised, with a revision made in 2008 to create a CONSORT statement for non-pharmacological treatment [44]. However, adherence to this revision was found to be even poorer than adherence to the original CONSORT statement [45]. In addition, comparing scores from this revision with those from the standard pre-2008 CONSORT statement would have made the analysis of results difficult. Ultimately, it was decided to use the standard CONSORT statements to analyse RCTs within this study period.

Limitations
The main limitation of this study is that the analyses performed have low statistical power because of the small sample size, despite the use of a fourteen-year data capture period. Papers published before this period would not be applicable to modern-day management. Consequently, we cannot conclude that there is no difference where tests were non-significant, only that the sample size did not allow us to detect one. A large genuine effect could be present in these areas, but detecting it would require a larger sample size.

CONCLUSION
The CONSORT statement was produced to reduce ambiguity in the design and reporting of RCTs, with empirical results highlighting correlation with bias. It also has known reliability and external validity. In skull base surgery, a relatively new field within medicine, there are deficiencies in the reporting of randomisation and blinding technique, sample size calculations and protocol availability. Despite this, multiple aspects of results and discussion were appropriately reported. We recommend that the CONSORT statement be considered during the conception of RCT protocols, addressing all of its points with a view to providing easily reproducible results and improving readers' understanding. This will produce less ambiguous study reporting. We would also welcome a repeat of our work in future years, when greater numbers of studies are available.

CONSENT
It is not applicable.

ETHICAL APPROVAL
It is not applicable.