We read with interest the article by Mudgway et al. entitled “The Impact of Primary Tumor Surgery on Survival in HER2 Positive Stage IV Breast Cancer Patients in the Current Era of Targeted Therapy”.1 With significant advances in systemic therapy for human epidermal growth factor receptor 2 (HER2)-amplified breast cancer, whether primary site resection confers a therapeutic advantage in metastatic disease remains an important open question.

Cohort data are the main source for studying the role of surgery in metastatic disease across a variety of cancers. In metastatic colorectal cancer, for example, there are no randomized trial data comparing surgery with systemic therapy; nevertheless, on the preponderance of evidence, surgery remains the preferred treatment for patients with resectable disease.2 By contrast, we have more information on surgical selection and outcomes in patients with metastatic breast cancer, including several randomized controlled trials (RCTs) and a Cochrane review.3,4,5,6,7 The randomized trials have not provided a clear answer: two showed no difference in outcomes,3,6 another suggested a survival advantage with primary site surgery,5 and a fourth was closed early due to poor accrual.4 Perhaps not surprisingly, the Cochrane review could not demonstrate a clear survival advantage with primary site surgery either.7 It may be impractical to run an RCT powered to answer this question in the subset of patients with HER2-amplified breast cancer, and this is a data void that high-quality cohort studies can fill. In their manuscript, Mudgway et al. conclude that primary site surgery is associated with increased overall survival in patients with metastatic HER2-amplified breast cancer. Their results mirror those of other authors who used the National Cancer Data Base and similar techniques,8 but contrast with studies using different data sets and techniques,9 as well as with the published randomized trials. As with other difficult clinical questions, we are left wondering which analytic approach best addresses it.

Cohort data provide the large numbers needed to overcome limitations in study power. For the researcher and the reviewer, the main drawback of cohort studies is that treatment assignment, especially in metastatic disease, is subject to considerable selection bias.10,11,12 An armamentarium of statistical techniques is available to account for overt and unmeasured bias, including hierarchical modeling, cohort restriction and sensitivity analyses, propensity scores, and econometric techniques. In the present paper, the authors use multivariable regression, with propensity score matching as a type of sensitivity analysis. While both are appropriate approaches to the clinical question, the particulars of their application in this study, and therefore the validity of their outputs, remain unclear.

Propensity score techniques are a relatively recent addition to the surgical literature but are widely used in other settings.11,12,13 In essence, each subject’s probability of receiving the exposure of interest is modeled from the covariates available in the data; the resulting probability is that subject’s propensity score, which can then be used as an additional adjustment covariate or to match patients with similar characteristics in a quasi-randomized comparison. Propensity score models are valuable tools for analyzing cohort data and are implemented in most commercial statistical packages; however, their ease of use can undercut their responsible application. Most propensity scores are derived using probit or logistic regression, and, like any other regression, the final product (the propensity score) depends on the quality of the model used to generate it: the covariates included, their relationship with the exposure, and the model’s underlying assumptions and fit. Several studies have shown that, when applied indiscriminately, propensity models generally yield treatment effect estimates similar to those of ‘conventional’ multivariable regression.14 Most importantly, the models used to derive propensity scores are subject to the same biases as conventional regression models. In other words, the same ‘garbage-in-garbage-out’ principle that applies to multivariable analyses also applies to propensity score analyses. For example, the published trial data make it clear that response to systemic therapy and metastatic burden strongly influence patient selection for surgery.3,5,6 In the study by Mudgway et al., the absence of information on metastatic burden and disease response is an important limitation, but even more important is the noticeable lack of techniques to account for these and other missing data. The methods are too opaque for the reader to judge how well selection bias was mitigated.
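The ‘garbage-in-garbage-out’ point can be illustrated with a minimal sketch on entirely synthetic data. Everything here is our assumption for illustration, not the authors’ method or any package’s API: the variable names (`age`, `low_burden`), the effect sizes, the hand-rolled logistic regression, and the greedy 1:1 caliper matching. By construction, surgery has no effect on survival, and selection for surgery is driven partly by an unrecorded variable (low metastatic burden).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=1.0, epochs=400):
    """Logistic regression by batch gradient ascent; returns [intercept, coef_1, ...]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def propensity_scores(X, treated):
    """Fitted probability of treatment for each subject (the propensity score)."""
    w = fit_logistic(X, treated)
    return [sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) for xi in X]


def matched_difference(ps, treated, outcome, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement within a caliper.

    Returns the mean treated-minus-control outcome over matched pairs and the pair count.
    """
    controls = [i for i, t in enumerate(treated) if not t]
    treated_idx = [i for i, t in enumerate(treated) if t]
    diffs = []
    for i in treated_idx:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        if abs(ps[j] - ps[i]) <= caliper:
            diffs.append(outcome[i] - outcome[j])
            controls.remove(j)
    return sum(diffs) / len(diffs), len(diffs)


# Synthetic cohort (hypothetical): age is recorded, low metastatic burden is not.
random.seed(1)
n = 1000
age = [random.gauss(0, 1) for _ in range(n)]                 # standardized, measured
low_burden = [int(random.random() < 0.5) for _ in range(n)]  # unmeasured confounder
surgery = [int(random.random() < sigmoid(-0.5 * a + 1.5 * u))
           for a, u in zip(age, low_burden)]
# Survival depends on age and burden only, so the true effect of surgery is zero.
survival = [30 - 3 * a + 10 * u + random.gauss(0, 5)
            for a, u in zip(age, low_burden)]

n_surg = sum(surgery)
naive = (sum(s for s, t in zip(survival, surgery) if t) / n_surg
         - sum(s for s, t in zip(survival, surgery) if not t) / (n - n_surg))
ps_measured = propensity_scores([[a] for a in age], surgery)
ps_oracle = propensity_scores([[a, u] for a, u in zip(age, low_burden)], surgery)
est_measured, _ = matched_difference(ps_measured, surgery, survival)
est_oracle, _ = matched_difference(ps_oracle, surgery, survival)
print(f"naive: {naive:.1f}  matched on age: {est_measured:.1f}  "
      f"matched on age + burden: {est_oracle:.1f}")
```

With these assumed effect sizes, both the naive comparison and the comparison matched on a score built from the recorded covariate should show an apparent survival gain for surgery. Only a score that also includes the burden variable, which a registry lacking that field cannot supply, should move the estimate close to zero. Matching balances what the model contains, not what the data set omits.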

With the increase in propensity score modeling reported in the medical literature, there is a clear need for researchers to be transparent about their methods to facilitate peer review. Indeed, several publicly available best-practice guidelines, some specific to cancer research, address this issue.15,16,17 All agree that authors should state their rationale for selecting their statistical methods and model variables, how they handled missing data in their cohort, which diagnostics they used to assess the regression model, and how adjustment or propensity scores change the treatment effect estimates relative to the unadjusted cohort. Unfortunately for the reader, the details of the present study’s analyses are not shown, the models’ assumptions and performance are left unstated, and the treatment effects are unclear, since the authors mention both inverse-probability weighting and adjustment after propensity matching. The reader is left with the conclusion that patients selected for primary site surgery do better than those treated with systemic therapy alone, a conclusion contradicted by most of the RCTs. What remains unclear is whether the methodology is strong enough to support the implied conclusion that, in the subset of HER2-amplified patients, the RCT data should be disregarded in favor of these cohort data.
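Two of the reporting items discussed above can be made concrete with a minimal sketch: a covariate balance diagnostic (the standardized mean difference, SMD, one commonly reported check) and an explicit statement of the estimand. The eight-patient cohort, the function names, and the use of weighted variances in the SMD denominator are our illustrative assumptions; conventions for the denominator differ between sources. Weighting for the average treatment effect in the whole cohort (ATE) and in the treated (ATT) both balance the covariate here, but toward different reference populations.

```python
import math


def weighted_mean_var(x, w):
    """Weighted mean and (population) variance of a covariate."""
    total = sum(w)
    m = sum(wi * xi for wi, xi in zip(w, x)) / total
    v = sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / total
    return m, v


def smd(x, treated, w=None):
    """Standardized mean difference of covariate x, treated minus control."""
    w = w if w is not None else [1.0] * len(x)
    xt = [xi for xi, t in zip(x, treated) if t]
    wt = [wi for wi, t in zip(w, treated) if t]
    xc = [xi for xi, t in zip(x, treated) if not t]
    wc = [wi for wi, t in zip(w, treated) if not t]
    m1, v1 = weighted_mean_var(xt, wt)
    m0, v0 = weighted_mean_var(xc, wc)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)


def ipw_weights(treated, ps, estimand="ATE"):
    """Inverse-probability weights for the whole cohort (ATE) or the treated (ATT)."""
    if estimand == "ATE":
        return [1 / e if t else 1 / (1 - e) for t, e in zip(treated, ps)]
    if estimand == "ATT":
        return [1.0 if t else e / (1 - e) for t, e in zip(treated, ps)]
    raise ValueError("estimand must be 'ATE' or 'ATT'")


# Hypothetical eight-patient cohort; x = 1 marks a favourable baseline feature.
x = [1, 1, 1, 0, 1, 0, 0, 0]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
ps = [0.75 if xi else 0.25 for xi in x]  # treated share within each x stratum

w_ate = ipw_weights(treated, ps, "ATE")
w_att = ipw_weights(treated, ps, "ATT")
treated_x = x[:4]  # the first four patients are the treated group
print("SMD unweighted:", round(smd(x, treated), 3))
print("SMD ATE-weighted:", round(abs(smd(x, treated, w_ate)), 3))
print("SMD ATT-weighted:", round(abs(smd(x, treated, w_att)), 3))
# Both weightings balance x, but toward different reference populations:
print("treated mean of x, ATE:", round(weighted_mean_var(treated_x, w_ate[:4])[0], 3))
print("treated mean of x, ATT:", round(weighted_mean_var(treated_x, w_att[:4])[0], 3))
```

Under ATE weights the treated group is reweighted to resemble the whole cohort (mean of x of 0.5), whereas under ATT weights the controls are reweighted to resemble the treated (mean of x of 0.75). Because 1:1 matching of treated to control patients usually targets something close to the ATT, an inverse-probability-weighted estimate and a matched estimate need not agree, which is why the estimand should be stated.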

The authors use the word ‘impact’ in the title, implying a causal link between surgery and survival in this setting. This wording is problematic because most physicians and surgeons treating patients with breast cancer are not trained extensively enough in statistical methods to develop, independently, a nuanced understanding of the limitations of these data. It is easy to imagine someone concluding, based on the study’s title and presentation, that primary site surgery should be offered to patients with metastatic HER2-amplified breast cancer, and counseling those patients that they can expect to live longer with that approach. This is not supported by the data and is a potentially dangerous strategy. To be clear, we do not advocate withholding primary site surgery in metastatic disease; rather, we advocate that discussions on the matter should not cite survival benefits that are unproven in RCTs and unsubstantiated by opaque analyses of cohort data.

Surgery in the setting of metastatic disease is an exercise in patient selection, and we as a surgical community have generally been excellent at selecting the right patients. However, to derive the most accurate estimate of surgery’s value (rather than of the surgeon’s selection), we need carefully conducted and honestly reported cohort studies.