Dear Editor,

We read with interest the recent article by Pediconi et al., which compared gadobutrol 1.0 M and gadobenate dimeglumine 0.5 M in patients scheduled for preoperative breast MRI [1]. Pediconi and co-workers are renowned for high-quality, rigorously performed intra-individual comparisons of contrast agents for breast MRI [2–4], so we were somewhat surprised by the article in question. In their study they concluded that gadobutrol at a dose of 0.1 mmol/kg body weight was non-inferior to an equivalent dose of gadobenate dimeglumine for breast lesion detection and sensitivity in lesion characterization in breast MRI. Unfortunately, major flaws and biases in the selection of the study population, in the design of the study, and in the statistical approach to data analysis render such conclusions invalid. We wonder whether an external statistician was consulted to judge whether "non-inferiority" was truly demonstrated. Our principal concerns with the article are outlined below:

  • Non-meaningful equivalence limit for non-inferiority: The authors state that “The sample size was based on the primary efficacy endpoint agreement in lesion detection of the index lesion, as defined by the investigator, assuming a rate of 88.5 % and defining a rate of 75 % to be a clinically meaningful limit for concluding equivalence of the contrast agents”. Basing the sample size on the primary efficacy endpoint (i.e. agreement in detection of the index lesion) with these numbers assumes an extremely large and clinically meaningless non-inferiority margin. Essentially, their approach is based on the deduction ratio (88.5/75 = 1.18): the assumed rate is 18 % higher than the limit, so a fall in agreement of 13.5 percentage points (about 15 % in relative terms) in detection of the index lesion is treated as acceptable. In other words, if the agreement between the two agents is as low as 75 %, non-inferiority will still be assumed. This is certainly not a valid way to test non-inferiority. Since no non-inferiority margin is defined in the statistical methods section, no conclusion of non-inferiority can be made on the basis of the study results.

  • Selection bias: Not all patients presenting with the relevant condition were included in the study and the exclusion of patients was not random. Moreover, not all patients who received contrast agent and had images available were included in the analysis of efficacy; 5 patients were excluded because of “major protocol violations” but no information is provided as to what the protocol violations were. The inclusion of patients for the analysis of efficacy was determined on the basis of grading of protocol deviations (minor/major) and the adequacy of images as identified by on-site investigators (who may or may not have been blinded to the contrast agent given) and off-site readers separately. This is a clear source of bias; it is inappropriate for patients and images to be filtered in terms of protocol deviations by on-site investigators prior to a blinded reading.

  • Verification bias: “The primary efficacy variable was the agreement rate in lesion detection of the index lesion, defined as the largest malignant lesion with a malignant biopsy result before staging MRI, based on the combined pre- and post-contrast images of MRI examination with both contrast agents”. This definition is fundamentally flawed: verification bias arises whenever the decision to perform the reference test is based on the result of the tests under examination. Here, agreement was based on “index” lesions which were identified on the two contrast-enhanced MRI examinations being compared. This will inevitably produce excellent agreement between the two examinations and lead one to conclude that there is no difference in lesion detection rate between the two contrast agents. The premise on which the comparison is made is therefore false, and as a result the entire study is compromised. Furthermore, basing the primary efficacy variable solely on the detection of the index lesion does not take into account multi-focal and multi-centric breast cancer lesions, the accurate detection and diagnosis of which can impact patient management decisions significantly.

  • Inappropriate analysis population: A relevant clinical population is a group of patients covering the spectrum of disease that is likely to be encountered in routine practice and for which the current or future use of the test will be beneficial. Diagnostic sensitivity can be overestimated if the test is performed in a group of patients already known to have malignant lesions (index lesions). In this study, patients with a large index lesion which was histologically confirmed as malignant were included in the analysis of sensitivity. Only 7 of 103 histologically confirmed lesions were benign. This preponderance of malignant lesions will artificially inflate the sensitivity of both contrast agents, resulting in no apparent difference between them in terms of sensitivity.

    The per-protocol (PP) population for the primary analysis defined in the statistical analysis section of the manuscript conflicts with the primary variable defined in the study design and in the presentation of results (Table 3). In the statistical analysis section, the primary analysis was based on PP patients with index lesions and all other breast lesions identified by on-site investigators, regardless of the availability of histology results. This does not match the results or the study design.

  • Inappropriate statistical methods used: It is not appropriate to test agreement across readers using logistic regression analysis. A large p value from logistic regression does not imply high agreement between two readers!

    The approach used to analyse preferences for one contrast agent or the other was also wrong. The authors used a sign test but excluded patients with “both MR studies equal”. Thus a different population was analysed and no inference can be made regarding the population studied. Moreover, the power of the study was decreased.

    Finally, several complicated modelling techniques are mentioned in the statistical analysis section (e.g. logistic regression, multinomial regression, mixed linear model, Poisson regression) but these tests are not relevant to the study and none of the analyses performed are identified in the results section.
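
To make our first concern concrete, the following short Python sketch (ours, not part of the article under discussion) works through the arithmetic of the stated design assumptions and shows how a pre-specified limit could be tested. The function names and the example of 80 agreements in 100 patients are hypothetical and chosen purely for illustration. The exact (Clopper-Pearson) lower confidence bound is one common choice, and we do not claim it is the only appropriate method.

```python
from math import comb


def implied_margin(assumed_rate, limit):
    """Return the (absolute, relative) drop from the assumed rate to the limit."""
    absolute = assumed_rate - limit
    relative = absolute / assumed_rate
    return absolute, relative


def exact_lower_bound(successes, n, alpha=0.025):
    """One-sided exact (Clopper-Pearson) lower confidence bound for a proportion.

    Solves P(X >= successes | p) = alpha for p by bisection.
    """
    if successes == 0:
        return 0.0

    def upper_tail(p):
        # Probability of observing at least `successes` agreements out of n.
        return sum(
            comb(n, k) * p**k * (1 - p) ** (n - k)
            for k in range(successes, n + 1)
        )

    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid) < alpha:
            lo = mid  # bound lies above mid
        else:
            hi = mid  # bound lies at or below mid
    return lo


def clears_limit(successes, n, limit, alpha=0.025):
    """True if the lower confidence bound for agreement exceeds the limit."""
    return exact_lower_bound(successes, n, alpha) > limit


if __name__ == "__main__":
    # Design assumptions quoted from the article: 88.5 % assumed, 75 % limit.
    abs_drop, rel_drop = implied_margin(0.885, 0.75)
    print(f"absolute drop: {abs_drop:.3f}, relative drop: {rel_drop:.3f}")

    # Hypothetical counts, for illustration only (not data from the study).
    print(exact_lower_bound(80, 100), clears_limit(80, 100, 0.75))
```

The sketch shows that the quoted ratio of 1.18 corresponds to accepting a fall of 13.5 percentage points in agreement. A conclusion of non-inferiority would require such a margin to be justified clinically and stated in advance.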