Overall similarity and consistency assessment scores are not sufficiently accurate for predicting discrepancy between direct and indirect comparison estimates

Objectives: Indirect comparison methods have been increasingly used to assess the effectiveness of different interventions comparatively. This study evaluated a Trial Similarity and Evidence Consistency Assessment (TSECA) framework for assessing key assumptions underlying the validity of indirect comparisons.
Study Design and Setting: We applied the TSECA framework to 94 Cochrane Systematic Reviews that provided data to compare two interventions by both direct and indirect comparisons. Using the TSECA framework, two reviewers independently assessed and scored trial similarity and evidence consistency. A detailed case study provided further insight into the usefulness and limitations of the framework proposed.
Results: Trial similarity and evidence consistency scores obtained using the assessment framework were not associated with statistically significant inconsistency between direct and indirect estimates. The case study illustrated that the assessment framework could be used to identify potentially important differences in participants, interventions, and outcome measures between different sets of trials in the indirect comparison.
Conclusion: Although the overall trial similarity and evidence consistency scores are unlikely to be sufficiently accurate for predicting inconsistency between direct and indirect estimates, the assessment framework proposed in this study can be a useful tool for identifying between-trial differences that may threaten the validity of indirect treatment comparisons.

Figure S1: Agreement between two independent assessors in the assessment of trial similarity and evidence consistency (Note: d - difference in the score between two assessors; SD - standard deviation; SE - standard error)
Figure S2: Mean disagreement (95% CI) between two independent assessors in final similarity and consistency scores
Table S1: Main characteristics of included CSRs and similarity/consistency assessment scores (TSA - Trial Similarity Assessment; ECA - Evidence Consistency Assessment; PSS - participant similarity score; ISS - intervention similarity score; OSS - outcome similarity score; TSS - average total similarity score; QSS - quality similarity score; PCS - participant consistency score; ICS - intervention consistency score; OCS - outcome consistency score; TCS - average total consistency score)
Table S3: TSA sheet for clinical similarity assessment
Table S4: QSA sheet for quality assessment of trials in meta-analysis
Table S5: ECA sheet for evidence consistency assessment

Notes - Instructions to assessors for TSA
The assessment of similarity for AIC should be based on the major study characteristics (trial participants, interventions and outcome measures) presented in systematic reviews (study table). This should draw on the information collated in Appendix 1, and on additional evidence or discussion from the original CSR or the primary trials included in the CSR. The assessment should focus on factors that, according to the original systematic reviews, may affect trial results or the generalisability of trial results. Trial sample size should be taken into account when there are differences in participants, interventions and outcomes among multiple AvB or among multiple AvC trials. For each specific field of items (i.e., each row in Appendix 2), the assessor first needs to decide whether the specific item is Applicable or Not Applicable, marking "1" if Applicable and "0" if Not Applicable. When marked "0", the score will automatically be locked at 0 and excluded from the final score calculation, to avoid giving weight to not applicable items. If marked "1", the assessor needs to further decide whether there are any important differences between AvB and AvC trials, and whether the relative effect of a treatment might differ because of the observed differences between AvB and AvC trials. Specifically, if no "other known relative effect moderators" are identified, this item should be marked as "not applicable". In case of insufficient or missing data, the options "Yes-missing data" and "No-missing data" are available.
For each specific field of items (i.e., each row in Appendix 2), the assessors need to make their judgements as a percentage score. For example, if an assessor is unclear about a specific item but, based on the available evidence, is more inclined to mark either "Yes" or "No", the judgement can be divided as 30% Unclear and 70% No, or 30% Unclear and 70% Yes.
If there is any evidence that the pooled relative effect of either AvB or AvC is very likely to differ because of the observed difference between AvB and AvC trials, "Yes" should be selected with a percentage value reflecting the assessor's judgement. This can be 100% Yes if there is substantial evidence, or the 100% can be split between "Yes and Unclear", "Unclear and No", or "Yes, Unclear and No".
Likewise, if it is unclear whether the pooled relative effect of AvB or AvC is affected by the observed difference between AvB and AvC trials, "Unclear" can be selected with a percentage value. If the assessor is unclear because no data are available at all, the Unclear column can be marked as 100%. If the assessor is unclear because of missing data, or because possible treatment moderators exist but only in a few trials so that the pooled relative effect may or may not be materially affected, the percentage can be split according to the assessor's judgement: the 100% can be split between "Unclear and No", "Unclear and Yes", or "Unclear, Yes and No".
If there is no difference between AvB and AvC trials, or the observed difference between them is very unlikely to have any important impact on the pooled relative effect of AvB or AvC, the assessment decision should be 100% "No". This can again be split between "Unclear and No" depending on the assessor's judgement. The judgement can be based on the following situations:
(1) There are no important differences between AvB and AvC trials.
(2) There is no evidence or reason to believe that the relative effect of AvB or AvC is associated with the factor or factors that differ between AvB and AvC trials.
(3) The relative effect of AvB or AvC may be associated with a factor that differs between trials, but only a very small number of (small) trials were involved and the pooled relative effect is not affected.
The final score of each applicable item will be converted from a percentage to a score between 0 and 5 using the equation: Item similarity score = (Yes% × 0.0 + Unclear% × 2.5 + No% × 5.0) / 100. The total score will be the average of the scores of all applicable items.
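As a small illustration of the item-score conversion and averaging described above, the following sketch applies the equation to two hypothetical items (the percentage judgements are invented for the example, not taken from any real assessment):

```python
# Sketch of the TSA item-score conversion: Yes contributes 0.0,
# Unclear 2.5 and No 5.0, weighted by the assessor's percentages.

def item_similarity_score(yes_pct, unclear_pct, no_pct):
    """Convert a percentage judgement into a 0-5 item score."""
    assert yes_pct + unclear_pct + no_pct == 100
    return (yes_pct * 0.0 + unclear_pct * 2.5 + no_pct * 5.0) / 100

# Hypothetical example: one item judged 30% Unclear / 70% No,
# a second applicable item judged 100% No.
scores = [item_similarity_score(0, 30, 70),   # -> 4.25
          item_similarity_score(0, 0, 100)]   # -> 5.0

# The total score is the average over all applicable items.
total_score = sum(scores) / len(scores)
print(total_score)  # 4.625
```

Not applicable items are simply left out of the `scores` list, which mirrors the rule that they are excluded from the final score calculation.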

Instructions to assessors for QSA
The quality of trials in the AIC will be assessed using Jadad's scale, modified according to Schulz's components. The data required to use this scale are usually available from completed CSRs. Each quality item of an individual trial is scored as 1 for adequate and 0 for no or unclear. The quality scores of multiple trials will be weighted by the total number of patients to calculate an average quality score for each of the three sets of trials.
Randomisation method: Select "1" if an appropriate method of randomisation was described, and "0" if the method was unclear or inappropriate. Appropriate methods of randomisation include: table of random numbers, computer generated, coin tossing, and dice throwing. Examples of inappropriate methods include date of birth, hospital numbers, and medical record numbers.
Allocation Concealment: Select "1" if trials reported using either central randomisation, numbered or coded bottles or containers, or a statement indicating that drugs were prepared by a pharmacy. A serially numbered, opaque, sealed envelope is another example of adequate allocation concealment. Select "0" if allocation concealment was unclear or inappropriate.
Blinding of participants: Select "1" if a trial reported that it was "double-blind", or that participants were masked to the intervention received, or it was a placebo-controlled trial; select "0" if patients were not masked.
Blinding of outcome assessor: Select "1" if a trial reported that it was "double-blind" or that outcome assessors were masked to the intervention that patients received; select "0" if the assessor was not masked.
Drop-outs and withdrawals: Select "1" if the number of dropouts was reported and was <20%; select "0" if the dropout rate was >20% or unclear.
The total quality score for each trial is the number of "1"s. Calculating the Quality Similarity Assessment (QSA) score: let QT_AC and QT_BC represent the patient-weighted average quality score for AvC trials and for BvC trials, respectively. The QSA score, ranging from 1 (very low) to 5 (very high), is calculated as: QSA = 5 - |QT_AC - QT_BC|.
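A minimal sketch of the patient-weighted quality averaging and the QSA formula above, using invented trial quality scores and patient numbers purely for illustration:

```python
# Sketch of the QSA calculation: per-trial quality scores (the count
# of "1"s on the modified Jadad scale) are averaged within each set
# of trials, weighted by patient numbers, and the QSA is 5 minus the
# absolute difference between the two weighted averages.

def weighted_quality(trials):
    """Average quality score of a set of trials, weighted by patients."""
    total_patients = sum(n for _, n in trials)
    return sum(q * n for q, n in trials) / total_patients

# Hypothetical (quality score, number of patients) pairs per trial.
avc_trials = [(4, 200), (3, 100)]   # AvC trials
bvc_trials = [(5, 150), (4, 150)]   # BvC trials

qt_ac = weighted_quality(avc_trials)   # (4*200 + 3*100) / 300 ≈ 3.67
qt_bc = weighted_quality(bvc_trials)   # (5*150 + 4*150) / 300 = 4.5
qsa = 5 - abs(qt_ac - qt_bc)
print(round(qsa, 2))
```

With these invented inputs the two sets differ by less than one quality point, so the QSA lands near the "high similarity" end of the 1-5 range.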

Notes - Instructions to assessors for ECA
The assessment of evidence consistency between DC and AIC should be based on the major study characteristics (trial participants, interventions and outcome measures) presented in systematic reviews (study table). This should draw on the information collated in Appendix 1, and on additional evidence or discussion from the original CSR or the primary trials in the CSR. The assessment should focus on factors that, according to the original systematic reviews, may affect trial results or the generalisability of trial results. Trial sample size should be taken into account when there are differences in participants, interventions and outcomes among multiple trials.
For each specific field of items (i.e., each row in Appendix 3), the assessor first needs to decide whether the specific item is Applicable or Not Applicable, marking "1" if Applicable and "0" if Not Applicable. When marked "0", the score will automatically be locked at 0 and excluded from the final score calculation, to avoid giving weight to not applicable items. If marked "1", the assessor needs to further decide whether there are any important differences between AIC and DC trials, and whether the relative effect of a treatment might differ because of the observed differences between AIC and DC trials. Specifically, if no "other known relative effect moderators" are identified, this item should be marked as "not applicable". In case of insufficient or missing data, the option "Yes-missing data" or "No-missing data" is available and should be selected.
For each specific field of items (i.e., each row in Appendix 3), the assessors need to make their judgements as a percentage score. For example, if an assessor is unclear about a specific item but, based on the available evidence, is more inclined to mark either "Yes" or "No", the judgement can be divided as 40% Unclear and 60% No, or 40% Unclear and 60% Yes.
If there is any evidence that the pooled relative effect of either AIC or DC is very likely to differ because of the observed difference between AIC and DC trials, "Yes" should be selected with a percentage value reflecting the assessor's judgement. This can be 100% Yes if there is definite evidence, or the 100% can be split between "Yes and Unclear" or "Yes, Unclear and No".
Likewise, if it is unclear whether the pooled relative effect of AIC or DC is affected by the observed difference between AIC and DC trials, "Unclear" can be selected with a percentage value. If the assessor is unclear because no data are available at all, the Unclear column can be marked as 100%. If the assessor is unclear because of missing data, or because possible treatment moderators exist but only in a few trials so that the pooled relative effect may or may not be materially affected, the percentage can be split according to the assessor's judgement: the 100% can be split between "Unclear and No", "Unclear and Yes", or "Unclear, Yes and No". Please note that the Intervention Consistency Score (ICS) in Appendix 3 could be higher than the Intervention Similarity Score (ISS) in Appendix 2, because it is based on 8 items rather than 7.
If there is no difference between AIC and DC trials, or the observed difference between them is very unlikely to have any important impact on the pooled relative effect of AIC or DC, the assessment decision could be 100% "No". This can again be split between "Unclear and No", "Unclear and Yes" or "Unclear, Yes and No" depending on the assessor's judgement. The judgement can be based on the following situations:
(1) There are no important differences between AIC and DC trials.
(2) There is no evidence or reason to believe that the relative effect of AIC or DC is associated with the factor or factors that differ between AIC and DC trials.
(3) The relative effect of AIC or DC may be associated with a factor that differs between trials, but only a very small number of (small) trials were involved and the pooled relative effect is not affected.
In principle, the total consistency score should be equal to or lower than the similarity score.
The final score of each applicable item will be converted from a percentage to a score between 0 and 5, and the total score will be the average of the scores of all applicable items.