A Review of Assumed and Reported Intracluster Correlations in Cluster Randomized Trials

Background Cluster randomized trials (CRTs) are widely adopted in health and primary care research. However, the cluster effect needs to be taken into account appropriately in the design and analysis of CRTs. The objectives of this study were (i) to review the reporting of intracluster correlations in CRTs; and (ii) to evaluate whether the intracluster correlation measures assumed in sample size planning are consistent with those obtained in the analysis. Methods The Aggregate Analysis of ClinicalTrials.gov (AACT) database was searched to identify CRTs registered between January 1, 2004 and March 27, 2016. The selected CRTs with accessible publications were screened according to eligibility criteria. Results Of the 281 CRTs identified, the percentage of studies accounting for the cluster effect increased annually. A total of 183 studies accounted for clustering in sample size estimation; among them, 43% adopted the intraclass correlation coefficient (ICC), but the exact estimated value of the ICC was provided in only 26% of the included studies. Across intervention types, there were no statistically significant differences between the assumed and reported values of the ICC (all p-values >0.05). Conclusion Although the difference between the ICC values assumed in sample size planning and those reported in the analysis was not statistically significant, deficiencies in CRTs remain common, such as low rates of accounting for the cluster effect in sample size calculations and of reporting intracluster correlation estimates. We also suggest that researchers become familiar with the properties of statistical approaches to improve the analysis of CRTs. Thus, recommendations and guidelines such as the CONSORT statement for CRTs should be further promoted among researchers.


Introduction
Cluster randomized trials (CRTs), or group randomized trials, involve interventions that are randomly assigned to groups or clusters of participants rather than to participants individually [1]. Because they allow the study of interventions that cannot be implemented at the individual level and help control experimental contamination across individuals, CRTs have been increasingly adopted in education, lifestyle modification, and public health research in recent years [2]. A principal difference between CRTs and individually randomized trials is that participants within the same cluster tend to show more similar responses than do those in different clusters [3]. To maintain statistical power equivalent to that of an individually randomized trial, the number of participants in a CRT needs to be increased. One of the pivotal determinants of this increase is the intracluster correlation, which is the ratio of the between-cluster variance to the total variance of an outcome variable. The intracluster correlation quantifies how strongly individuals within the same cluster resemble each other and can be reported using the intracluster correlation coefficient (ICC), often denoted by ρ, or the coefficient of variation (CV) [4]. The Consolidated Standards of Reporting Trials (CONSORT) statement, published in 1996 and extended to CRTs in 2004, incorporates the reporting of intracluster correlation as an index to judge the quality and reliability of CRTs [5]. Meanwhile, several journals require the reporting of intracluster correlation as a precondition for publication, to facilitate appropriate interpretation of results at both levels of inference [6].
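The sample size inflation driven by the intracluster correlation is conventionally expressed through the design effect, DE = 1 + (m − 1)ρ for equal cluster sizes m. The following is a minimal sketch of that calculation; the numbers in the example are hypothetical and purely illustrative:

```python
import math

def design_effect(icc: float, cluster_size: float) -> float:
    """Variance inflation factor for a CRT with equal cluster sizes m:
    DE = 1 + (m - 1) * rho, where rho is the ICC."""
    return 1.0 + (cluster_size - 1.0) * icc

def crt_sample_size(n_individual: int, icc: float, cluster_size: float) -> int:
    """Inflate an individually randomized sample size for clustering.
    round() guards against floating-point noise before taking the ceiling."""
    return math.ceil(round(n_individual * design_effect(icc, cluster_size), 6))

# Hypothetical example: 200 participants suffice under individual
# randomization; with clusters of 20 and ICC = 0.05, DE = 1.95,
# so 390 participants are required in total.
print(crt_sample_size(200, 0.05, 20))
```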
Previous reviews of published CRTs have found that most studies do not take the cluster effect into account, although there has been an increasing trend of articles including adequate planning and analysis of CRTs [7][8][9]. A review of cancer CRTs between 2002 and 2006 found that only 18 (24%) of the 75 articles reviewed accounted for the cluster effect adequately in their sample size calculations [10]. Petros et al. [11] reviewed all CRTs conducted in sub-Saharan Africa and reported that only 10 (20%) of 51 articles accounted for clustering in sample size or power calculations, and only 19 (37%) took clustering into account in their analyses. Although a few CRTs reported their intracluster correlation estimates, these reviews did not compare the assumed and reported intracluster correlation measures.
Furthermore, most literature reviews are based on published papers that were searched using the keywords "cluster randomized." As a result, publications that do not mention this term in their text might be omitted. Since there is no specific item to distinguish CRTs from other trials in ClinicalTrials.gov, a major international registry for clinical trials, this systematic review was conducted using a relational database of ClinicalTrials.gov, the Aggregate Analysis of ClinicalTrials.gov (AACT). In this study, we primarily evaluated whether the assumed ICC, a major intracluster correlation measure in sample size planning, was consistent with that obtained in the analysis.
We hypothesized that the ICC in sample size planning should not be under- or overestimated. Moreover, we determined the proportion of CRTs accounting for the intracluster correlation and profiled the intracluster correlation measures by cluster and intervention type. The findings of this review may provide a valuable reference for researchers planning CRTs.

Eligibility criteria
All completed CRTs registered in ClinicalTrials.gov between January 1, 2004 and March 27, 2016 were screened according to the following criteria:

1. Quasi-randomized controlled trials and trials that further allocated participants within clusters were removed.
2. Studies without available publications were excluded, as were those published in a language other than English.
3. Articles were also excluded if they contained "cluster" with a different meaning (such as gene cluster, cluster of differentiation, cluster of symptoms, and cluster headache).
4. Secondary analyses of previous publications, case reports, protocols, and pilot studies were also excluded.

Search strategy
A total of 211,437 records were registered in ClinicalTrials.gov from January 1, 2004 to March 27, 2016. These were obtained through the AACT and downloaded in SAS CPORT format. CRTs were identified using the search term "cluster" in the variables "brief_title", "official_title", "brief_summary", "detailed_description", or "keywords".

Data extraction
We reviewed each potentially eligible study to determine whether it satisfied the eligibility criteria. From each included study, we extracted the following information: author, journal, year of publication, title, country or countries of recruitment, sample size, trial phase, number of trial arms, type of intervention, number of clusters, and type of cluster.
We also extracted whether 1) the study took into account the cluster effect (e.g., ICC) in sample size estimation; 2) the study reported the cluster effect; 3) the study accounted for the cluster effect in analysis; 4) the analysis was conducted at aggregate or individual level. One of the authors extracted data on all items. The other author assessed all questionable items. In case of disagreement, consensus was reached after discussion.

Data synthesis
Results were summarized using frequencies and percentages for categorical variables. The assumed and reported ICCs were compared using the Wilcoxon signed-rank test, chosen in view of the sample size. The ICCs were summarized using medians and ranges. Statistical analyses were performed using SAS 9.4 (SAS Institute, Cary, NC, USA).
All p-values reported are two-tailed.
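As a sketch, the paired comparison described above can be reproduced with SciPy's implementation of the Wilcoxon signed-rank test (the analysis itself was run in SAS; the ICC pairs below are hypothetical values for illustration only, not the review's data):

```python
from scipy.stats import wilcoxon

# Hypothetical paired (assumed, reported) ICC values for illustration;
# these are NOT the values extracted in the review.
assumed  = [0.010, 0.020, 0.050, 0.030, 0.100, 0.040]
reported = [0.015, 0.018, 0.060, 0.020, 0.090, 0.050]

# Paired, nonparametric comparison of assumed vs. reported ICCs
# (two-sided by default).
stat, p_value = wilcoxon(assumed, reported)
print(f"W = {stat:.1f}, two-tailed p = {p_value:.3f}")
```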

Results
As summarized in Figure 1, a total of 211,437 records were registered in ClinicalTrials.gov from January 1, 2004 to March 27, 2016. Of these, 746 were retained after computational screening with the search term "cluster" in the variables "brief_title," "official_title," "brief_summary," "detailed_description," or "keywords" of the AACT database. A total of 218 records were excluded after abstract screening, and 147 records without publications were also removed. After full-text screening, 281 CRTs were included in the present review. Studies reporting the ICC were more numerous than those following the CONSORT recommendation.

Statistical analysis of cluster effect
Table 4 summarizes the statistical analysis methods adopted in the included CRTs. Most statistical software packages, such as the advanced statistics module of SPSS and R, have also implemented multilevel modeling [12]. Meanwhile, the revised CRT guidelines also suggest that authors report intracluster correlation measures; consequently, the number of such articles grew dramatically after 2010.

ICC for different intervention types
Our results indicated that an insufficient proportion of trials considered the cluster effect in sample size estimation; there thus remains considerable room for improvement.
Although the CONSORT statement recommends accounting for clustering in CRTs to maintain statistical power, several authors simply mentioned that the cluster effect was considered without providing an exact value of the intracluster correlation measure, or provided only unclear variance estimates. For example, some studies claimed that "sample size calculation took into account the complexity of the study design (i.e., cluster-randomized)" or that "estimation included the accounting for cluster effects" without reporting concrete methods [13,14]. Moreover, the present study found that the majority of the selected publications did not comply with the CONSORT statement, and only one third reported their intracluster correlation measures in the results. CRTs ought to report the magnitude of correlation within clusters for their primary outcomes to enable readers to assess whether the original sample size estimate, and hence the study's power, is adequate.
In our study, we found that the difference between the ICCs assumed in sample size calculation and those reported in the results was not statistically significant. This indicates that most CRTs did not under- or overestimate the cluster effect in their specific trial settings. However, intracluster correlation measures from previous studies are not always suitable for other trials, since they are highly variable and depend on cluster level, cluster size, and outcome. A previous study suggested that the upper confidence limit of the ICC estimate be used in sample size determination.
However, this approach may instead lead to an overestimation of the required sample size, and the redundant sample would require extra resources and increase the risk of loss to follow-up, particularly in trial settings spanning a wide geographical area [15]. Furthermore, to ensure that appropriate sample size estimates are obtained, a sensitivity analysis has been recommended to help determine the required sample size over a range of plausible ICC values instead of a single value [16,17].
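Such a sensitivity analysis amounts to tabulating the design-effect-adjusted sample size over a grid of plausible ICC values. A minimal sketch follows; the baseline sample size, cluster size, and ICC grid are all hypothetical:

```python
import math

def crt_n(n_individual: int, icc: float, cluster_size: int) -> int:
    """Cluster-adjusted total sample size via the design effect
    DE = 1 + (m - 1) * rho; round() guards against float noise."""
    de = 1.0 + (cluster_size - 1.0) * icc
    return math.ceil(round(n_individual * de, 6))

# Required total n assuming 200 participants under individual
# randomization and clusters of 25, across plausible ICC values.
for icc in (0.01, 0.02, 0.05, 0.10):
    print(f"ICC = {icc:.2f} -> n = {crt_n(200, icc, 25)}")
```

Reporting such a table, rather than a single point estimate, makes explicit how sensitive the required sample size is to the assumed ICC.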
Our results indicate that a large proportion of authors (>80%) applied statistical methods to account for cluster effects in their CRTs. Although the improvement of existing methods and the development of novel methods provide authors with more options, unawareness of the limitations of the various approaches may lead to biased results [18]. For example, it is well documented that applying GEE to CRTs with few clusters or small cluster sizes underestimates the standard error of the intervention effect [19]. Among the selected studies, a CRT evaluating the performance of a computerized malnutrition-screening system adopted GEE for analysis despite having only six clusters [20]. Applying such a statistical method increases the likelihood of obtaining a significant intervention effect but may inflate the type I error rate [21]. Previous comparison studies found that, for both binary and continuous outcomes, mixed-effects models are more efficient in attaining nominal power, even when the number of clusters and the cluster sizes are highly variable [22]. Conversely, mixed-effects models are less accurate and precise than GEE in the presence of missing values [23]. Therefore, researchers ought to be familiar with valid approaches to improve the analysis of their CRTs.
This study has several major limitations. Previous studies found that about 30-50% of all completed clinical trials are never published in academic journals [24]. As only CRTs with publications were extracted from the AACT for our analysis, a proportion of unpublished trials is likely missing. In addition to this publication bias, selective reporting of outcomes in articles is another potential limitation of our study. Among the trials evaluated, some publications only mentioned that the cluster effect was taken into account in regression models.

Figure 1 Flow chart of CRTs extraction