Key Points

The use of monoclonal antibodies (MoAbs) as a therapeutic option for metastatic colorectal cancer (mCRC) raised expectations of longer overall survival, lower toxicity and fewer grade ≥ 3 adverse events compared with cytotoxic chemotherapy.

The studies included in this meta-analysis showed longer overall survival and progression-free survival and a higher metastasectomy rate in patients with mCRC treated with MoAbs; however, heterogeneity between studies was high, and treatment was associated with severe adverse events.

It is important to assess the value and cost of interventions for both first- and second-line treatments when making choices. Marginal gains with associated high costs are difficult to justify within universal healthcare systems.

1 Introduction

Cancer is one of the leading causes of death worldwide, with more than 8.8 million deaths in 2015, up from 8.2 million in 2012 [1, 2]; breast, colorectal, lung, and stomach cancers are the most commonly diagnosed. The overall economic burden of cancer was estimated at US$1.6 trillion in 2010 and is rising [2]. Colorectal cancer (CRC) remains a worldwide public health problem, with 1.36 million new cases in 2012 [3, 4], corresponding to 10% of all patients diagnosed with cancer that year. Overall, CRC is the third most common neoplasm in men and the second most common in women [5], and it caused 694,000 deaths in 2012 [3].

CRC is curable if diagnosed at an early stage [6]. However, between 70 and 90% of CRC cases are currently diagnosed at an advanced stage, which has prompted initiatives, including biomarkers, to help identify patients earlier [5,6,7,8].

Since the 1990s, fluoropyrimidine-based chemotherapy (CT) (5-fluorouracil [5-FU] or capecitabine) has been the principal treatment for CRC, with demonstrated benefits in overall survival (OS) [9, 10]. Irinotecan and oxaliplatin are widely used in combination with 5-FU and leucovorin (folinic acid) as first- or second-line treatment for metastatic CRC (mCRC) [11, 12], and studies have shown that adding them to first-line treatment improves median survival by 2–4 months [9, 11]. Although the combination of 5-FU and oxaliplatin has improved survival rates, it has resulted in a higher incidence of severe adverse events, albeit with acceptable tolerability and maintained quality of life [11].

The use of biological agents, namely monoclonal antibodies (MoAbs), in combination with 5-FU/oxaliplatin or irinotecan has become widespread in an attempt to improve survival in patients with mCRC [6, 12,13,14,15]. However, the high prices of MoAbs have appreciably increased the cost of medicines, often with limited health gain versus current standards. The high cost of biological medicines, coupled with growing cancer prevalence, has raised concerns about the future sustainability of healthcare systems [16,17,18,19,20,21,22,23].

The MoAbs used to treat patients with mCRC include cetuximab (CETUX) and panitumumab (PANIT) [5, 14], which target the epidermal growth factor receptor (EGFR), and bevacizumab (BEVA) [5], which targets vascular endothelial growth factor (VEGF) [5, 24]. All three have improved progression-free survival (PFS) in patients with mCRC; however, concerns have been expressed about the extent to which they improve OS [14] and about their cost-effectiveness [25, 26]. In principle, healthcare payers should not accept high prices for new cancer medicines that improve PFS but have limited or no effect on OS, as this diverts resources from other high-priority disease areas [27]. However, this has to be balanced against the emotive nature of the disease and the anxiety experienced by patients with cancer [28].

Improved targeting of high-priced biological medicines could help address these concerns. According to Rougier and Mitry [10], the anti-EGFR MoAbs are restricted to patients without Kirsten rat sarcoma viral oncogene (KRAS) or NRAS mutations [29, 30], so KRAS testing before CETUX can conserve resources [32,33,34]. Even among patients with KRAS wild-type mCRC, however, approximately 45% are resistant to treatment with CETUX [31].

The National Institute for Health and Care Excellence (NICE) in the UK currently recommends CETUX and PANIT as options for previously untreated RAS wild-type mCRC, in combination with FOLFOX (folinic acid, 5-FU, oxaliplatin) or FOLFIRI (folinic acid, 5-FU, irinotecan). BEVA is currently not recommended in the UK, either in combination with intravenous 5-FU/folinic acid or with FOLFIRI, for first-line treatment of patients with metastatic carcinoma of the colon or rectum [35,36,37]. Australia’s Pharmaceutical Benefits Scheme does not mention BEVA for mCRC, although it is listed for epithelial ovarian, fallopian tube or primary peritoneal cancer [38, 39]. CETUX in combination with FOLFIRI has been approved by the Canadian Agency for Drugs and Technologies in Health for first-line treatment of patients with KRAS wild-type (KRAS-WT) mCRC [40]. NICE does not recommend CETUX (monotherapy or combination CT), BEVA (in combination with non-oxaliplatin CT) or PANIT (monotherapy) for mCRC after first-line CT [41].

However, the Avastin® Registry – Investigation of Effectiveness and Safety (ARIES) observational cohort study has evaluated BEVA as a treatment for mCRC in recent years [42].

A number of meta-analyses and other studies have shown no difference in health gain between these three MoAbs in patients with mCRC [41,42,43]. Three systematic reviews, by Hapani et al. [43], Lv et al. [44], and Rosa et al. [45], have also shown no additional clinical benefit, in terms of increased efficacy or reduced side effects, with BEVA compared with CT (i.e., FOLFIRI or FOLFOX) or BEVA with CETUX or PANIT. However, a sub-study by Hurwitz et al. [46] confirmed the effectiveness of BEVA in the KRAS-WT subgroup of patients with mCRC. Similarly, the meta-analysis of Wagner et al. [47], which evaluated five randomized clinical trials (RCTs) with 3101 participants comparing first-line CT with versus without BEVA, showed significant OS and PFS benefits in favor of BEVA-treated patients. We note, though, that in the no-BEVA group the authors included, in addition to CT regimens, vatalanib (another VEGF inhibitor), which was not included in our review. In addition, there was a high incidence of grade 3 and 4 adverse events, including hypertension, arterial thromboembolic events and gastrointestinal perforations, in patients treated with BEVA [43, 47].

Currently, in the Brazilian health system (Sistema Único de Saúde [SUS]), BEVA, CETUX and PANIT can only be used and funded after successful litigation against the state, or with 100% co-payment, since they are not incorporated into the health system. The Brazilian Health Technology Assessment Agency of SUS (Comissão Nacional de Incorporação de Tecnologias no SUS [CONITEC]) did not recommend the incorporation of CETUX for the management of mCRC because of concerns about its price and limited health gain, and BEVA and PANIT have not yet been evaluated for inclusion in SUS [48]. Consequently, successful litigation is the only means for patients to have these treatments funded within SUS. In the state of Minas Gerais (MG), public expenditure on BEVA, CETUX and PANIT resulting from successful litigation was approximately US$20 million between 2009 and 2016 (Fig. 1E in the electronic supplementary material). This is a concern, as these monies are then unavailable for cost-effective medicines in other high-priority disease areas. The current ex-factory monthly treatment costs, based on prices from the Câmara de Regulação do Mercado de Medicamentos (CMED) of the Agência Nacional de Vigilância Sanitária (ANVISA) [49], are US$2897.90 for BEVA, US$6585.10 for CETUX and US$3100.20 for PANIT. CETUX is therefore 127% more expensive than BEVA and 112% more expensive than PANIT, and PANIT is approximately 7% more expensive than BEVA.
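The percentage differences above follow directly from the quoted monthly prices. A minimal Python check, using only the three CMED/ANVISA figures given in this paragraph (the dictionary and function names are illustrative, not taken from the study):

```python
# Ex-factory monthly treatment costs (US$) quoted in the text (CMED/ANVISA).
MONTHLY_COST_USD = {
    "BEVA": 2897.90,
    "CETUX": 6585.10,
    "PANIT": 3100.20,
}


def percent_more_expensive(drug: str, reference: str) -> float:
    """Return how much more `drug` costs per month than `reference`, in percent."""
    drug_cost = MONTHLY_COST_USD[drug]
    reference_cost = MONTHLY_COST_USD[reference]
    return (drug_cost - reference_cost) / reference_cost * 100.0


for drug, reference in [("CETUX", "BEVA"), ("CETUX", "PANIT"), ("PANIT", "BEVA")]:
    # One line per pairwise comparison, rounded to one decimal place.
    print(f"{drug} vs {reference}: {percent_more_expensive(drug, reference):.1f}% more")
```

Run as is, this prints 127.2%, 112.4% and 7.0%. Each difference is expressed relative to the cheaper drug in the pair, so the figures are not symmetric: BEVA is about 56% cheaper than CETUX, not 127% cheaper.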

Given the conflicting evidence on the use of MoAbs in mCRC, as well as the considerable price differences among these three MoAbs and versus standard CT, the objective of this study was to evaluate the effectiveness and safety of BEVA, CETUX and PANIT, in combination with or compared with fluoropyrimidine-based CT alone, in patients with mCRC, through an updated systematic review and meta-analysis of prospective or retrospective observational cohort studies. We believe the updated review will help people better understand the benefits and harms of the different treatments in heterogeneous ‘real-world’ populations, reflecting routine clinical practice [50, 51]. This is important given the increasing costs of cancer treatments, the growing pressure on available resources [26, 27, 34, 52,53,54], and the extent of current litigation surrounding these three MoAbs in Brazil.

2 Methods

This systematic review was based on the recommendations of the guidelines for Meta-analysis Of Observational Studies in Epidemiology (MOOSE) [55], with the protocol registered with the International Prospective Register of Systematic Reviews (PROSPERO) under no. CRD42016046613 (http://www.crd.york.ac.uk/PROSPERO).

2.1 Database and Search Strategy

The databases searched for potentially eligible studies were MEDLINE/PubMed (Medical Literature Analysis and Retrieval System Online), LILACS (Latin American and Caribbean Health Science Literature), the Cochrane Library, and EMBASE. All sources were searched up to November 2017. We used various combinations of MeSH terms relating to the disease, interventions, and study types (see the electronic supplementary material, Table 1E). We supplemented this search with a manual search of the proceedings of the Annual Meeting of the American Society of Clinical Oncology and of the European Society for Medical Oncology between January 2014 and November 2017. We also hand-searched the Journal of Clinical Oncology, the British Journal of Cancer, the Journal of the American Medical Association, and the World Journal of Gastroenterology for the same period (January 2014–November 2017).

We also searched the grey literature registered in the Brazilian Digital Library of Theses and Dissertations, the Digital Library of Theses and Dissertations of the University of São Paulo (USP), the Capes Theses Database, the Digital Library of Theses of the Federal University of Minas Gerais (UFMG), and the ProQuest Dissertation and Theses Database, which includes academic, government and conference publications, books and reports.

2.2 Selection of Studies and Eligibility Criteria

We selected concurrent (prospective) and non-concurrent (retrospective) observational studies of patients with mCRC. The studies compared the effectiveness and/or safety of BEVA, CETUX, or PANIT combined with FOLFIRI, FOLFOX, fluorouracil, leucovorin and oxaliplatin (FLOX), fluorouracil and leucovorin (5-FU/LV), or other fluoropyrimidine-based CT combinations versus BEVA, CETUX, or PANIT, or versus any CT scheme, including fluoropyrimidine-based CT alone.

Inclusion criteria were studies published in Portuguese, English or Spanish; patients of either sex aged 18 years or older; stage IV mCRC; life expectancy > 3 months; and wild-type or mutant KRAS.

Studies were excluded if they compared doses, intervention methods, or clinical protocols; were reviews, case reports, or animal studies; were in vitro, pharmacodynamic and/or pharmacokinetic studies; were genetic and/or genomic studies; investigated other types of cancer; assessed concomitant therapies with MoAbs other than BEVA, CETUX, and PANIT; or included participants under the age of 18 years or with less than 3 months of follow-up.

2.3 Selection of Studies and Data Collection

The records retrieved from the electronic databases were collated in a single EndNote® library to remove duplicates. Study selection was performed in two stages, screening of titles/abstracts followed by full-text review, by two pairs of independent reviewers (WS and PA; JS and MS). Disagreements were resolved by a third reviewer (VA). Patient characteristics, treatment duration, and effectiveness and safety data were extracted into a purpose-designed, pre-tested Excel spreadsheet.

2.4 Assessment of Methodological Quality of Included Studies

We used the Newcastle-Ottawa scale to assess the methodological quality of the observational studies [55, 56]. The scale assesses each study in three domains: selection of the study groups; comparability of the groups; and ascertainment of the exposure or outcome of interest. The total score can reach nine stars, and studies scoring above six (i.e., seven stars or more) are considered to be of high quality.
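The scoring rule can be written out as a small Python sketch. The per-domain star limits (selection 4, comparability 2, outcome 3) are an assumption taken from the standard cohort-study version of the scale and are not spelled out in the text; the class and constant names are hypothetical. The tally at the end uses only the score distribution reported in Section 3.3.

```python
from dataclasses import dataclass

# Assumed star limits per domain in the cohort version of the Newcastle-Ottawa
# scale (selection 4, comparability 2, outcome 3; maximum total 9).
MAX_STARS = {"selection": 4, "comparability": 2, "outcome": 3}
HIGH_QUALITY_CUTOFF = 6  # "a score above six" means seven stars or more


@dataclass
class NosAssessment:
    selection: int
    comparability: int
    outcome: int

    def total(self) -> int:
        """Sum the stars after checking each domain stays within its limit."""
        for domain, limit in MAX_STARS.items():
            stars = getattr(self, domain)
            if not 0 <= stars <= limit:
                raise ValueError(f"{domain}: {stars} stars outside 0..{limit}")
        return self.selection + self.comparability + self.outcome

    def is_high_quality(self) -> bool:
        return self.total() > HIGH_QUALITY_CUTOFF


# Distribution of total scores reported in Section 3.3 (stars -> number of studies).
REPORTED_TOTALS = {9: 2, 8: 3, 7: 7, 6: 8, 5: 1}
n_high_quality = sum(n for stars, n in REPORTED_TOTALS.items() if stars > HIGH_QUALITY_CUTOFF)
print(f"{n_high_quality} of {sum(REPORTED_TOTALS.values())} studies scored seven stars or more")
```

With the reported distribution, 12 of the 21 studies meet the high-quality threshold, which is consistent with the overall judgment of moderate quality.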

The sources of funding of the identified studies were examined as potential sources of bias, because funding has been shown to influence findings in previous reviews [57,58,59]. Statements on conflicts of interest (COI) and sources of study funding were examined and documented, including whether the study was funded by the manufacturer of any of the evaluated MoAbs and whether any of the authors had links with, or received fees from, the pharmaceutical industry. Publication bias was assessed using funnel plots [60] for outcomes that included more than ten studies.

2.5 Outcome Measures

The primary outcome measures were OS, PFS, and post-progression survival (PPS). OS, measured as the time from diagnosis to death from any cause, is the most widely accepted measure of the outcomes of cancer treatment, especially among healthcare payers working with finite budgets, given concerns about the correlation of PFS and other surrogate markers with OS in solid tumors [4, 24, 61,62,63,64,65,66]. American and European oncology groups also agree that OS should be the principal outcome measure in clinical studies [4, 62, 64], although PFS is also mentioned.

PFS is used as a measure of the direct treatment effect in patients with metastatic cancer. However, using PFS in place of OS can be problematic, especially for funding decisions based on OS estimates, for instance in cost per quality-adjusted life-year (QALY) calculations [67]. That said, PFS is a validated endpoint that is relevant to patients. When used alone, however, it is not considered sufficient evidence of patient benefit, as exemplified by recent guidance from the American Society of Clinical Oncology [64]. For this reason, OS is recommended as the measure of effectiveness for new cancer medicines [65, 68], with a significant effect on OS necessarily entailing a significant effect on PFS [66].

The secondary outcome measures were the metastasectomy rate, the response rate or disease control rate assessed by the Response Evaluation Criteria In Solid Tumors (RECIST) [69], and the occurrence of severe adverse events, restricted to grade ≥ 3. Adverse events were documented according to the Common Terminology Criteria for Adverse Events (CTCAE, version 4.0) of the National Institutes of Health (NIH) National Cancer Institute [70]. The CTCAE describes and reports adverse events systematically, grading the severity of each event from 1 to 5: 1 = mild; 2 = moderate; 3 = severe; 4 = extremely severe, life threatening; and 5 = death due to the adverse event.
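The grade ≥ 3 restriction amounts to a simple filter over graded events. A minimal Python sketch, assuming a hypothetical (term, grade) record layout that is not the data format of the included studies:

```python
from enum import IntEnum
from typing import Dict, Iterable, Tuple


class CtcaeGrade(IntEnum):
    """CTCAE v4.0 severity grades as listed in the text."""
    MILD = 1
    MODERATE = 2
    SEVERE = 3
    LIFE_THREATENING = 4
    DEATH = 5


def is_severe(grade: CtcaeGrade) -> bool:
    """Only grade >= 3 events count as severe adverse events in this review."""
    return grade >= CtcaeGrade.SEVERE


def count_severe(events: Iterable[Tuple[str, CtcaeGrade]]) -> Dict[str, int]:
    """Count grade >= 3 events per adverse-event term (hypothetical record layout)."""
    counts: Dict[str, int] = {}
    for term, grade in events:
        if is_severe(grade):
            counts[term] = counts.get(term, 0) + 1
    return counts


example = [
    ("hypertension", CtcaeGrade.SEVERE),
    ("hypertension", CtcaeGrade.MILD),
    ("diarrhea", CtcaeGrade.MODERATE),
    ("gastrointestinal perforation", CtcaeGrade.LIFE_THREATENING),
]
print(count_severe(example))  # {'hypertension': 1, 'gastrointestinal perforation': 1}
```

Using an `IntEnum` keeps the grades ordered, so the "grade ≥ 3" rule is a single comparison rather than a list of allowed labels.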

2.6 Summary of Data and Statistical Analysis

Data from the studies were pooled using the random-effects model in Review Manager® (RevMan) software, version 5.3. Results are presented as mean difference (MD) in months for continuous variables and as relative risk for dichotomous variables, each with a 95% confidence interval (CI). Statistical heterogeneity was considered present when I² > 50% and the chi-square test gave p < 0.10; I² values above 75% were considered to indicate high heterogeneity [60]. A sensitivity analysis excluding one study at a time was conducted to explore the causes of heterogeneity [60], observing the changes in the I² and p values.
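The pooling, heterogeneity and leave-one-out steps described above can be sketched in Python. This is a minimal reimplementation assuming the DerSimonian-Laird estimator of between-study variance, which we understand to underlie RevMan's inverse-variance random-effects model; it is not RevMan's code, and the function names and example numbers are hypothetical.

```python
import math
from typing import List, NamedTuple

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


class Pooled(NamedTuple):
    md: float     # pooled mean difference (months)
    lower: float  # lower bound of the 95% CI
    upper: float  # upper bound of the 95% CI
    q: float      # Cochran's Q, compared with chi-square on k - 1 df
    i2: float     # I^2, in percent
    tau2: float   # between-study variance


def se_from_ci(lower: float, upper: float) -> float:
    """Recover a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * Z_95)


def random_effects(md: List[float], se: List[float]) -> Pooled:
    """Inverse-variance random-effects pooling with the DerSimonian-Laird tau^2."""
    k = len(md)
    if k < 2 or len(se) != k:
        raise ValueError("need at least two studies with matching standard errors")
    w = [1.0 / s ** 2 for s in se]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, md)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, md))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in se]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, md)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return Pooled(pooled, pooled - Z_95 * se_pooled, pooled + Z_95 * se_pooled, q, i2, tau2)


def leave_one_out(md: List[float], se: List[float]) -> List[Pooled]:
    """Sensitivity analysis: re-pool with each study excluded in turn (needs k >= 3)."""
    return [random_effects(md[:i] + md[i + 1:], se[:i] + se[i + 1:]) for i in range(len(md))]


# Hypothetical per-study MDs (months) and SEs, not data from this review.
md = [2.0, 5.0, 3.0, 12.0]
se = [1.0, 1.5, 1.2, 2.0]
overall = random_effects(md, se)
print(f"MD = {overall.md:.2f} ({overall.lower:.2f} to {overall.upper:.2f}); I2 = {overall.i2:.0f}%")
for i, res in enumerate(leave_one_out(md, se)):
    print(f"without study {i + 1}: MD = {res.md:.2f}; I2 = {res.i2:.0f}%")
```

With these hypothetical inputs, I² exceeds 75% overall, and dropping the outlying fourth study lowers it the most, mirroring the one-study-at-a-time approach. A chi-square p value for Q could be added with `scipy.stats.chi2.sf(q, k - 1)` if SciPy is available; it is omitted here to keep the sketch standard-library only.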

Because of the wide variety of fluoropyrimidine-based CT regimens, we grouped the studies in the forest plots according to the similarity of the treatments.

Where meta-analysis was not possible because heterogeneity in the measurement instruments and data precluded quantitative synthesis, the studies were synthesized qualitatively.

3 Results

3.1 Search Results and Included Studies

We identified a total of 2363 publications in the electronic databases. After removal of duplicates, 2175 articles were screened by title and abstract, and 269 were selected for full-text review. After full-text assessment, 21 studies were included in the meta-analysis (see the electronic supplementary material, Fig. 2E).
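The flow figures imply how many records were dropped at each stage. A small Python check using only the four counts reported here (the variable name is illustrative) makes these implied numbers explicit, which can help when drawing a selection flow diagram:

```python
# Study-selection counts reported in this section, in order.
FLOW = [
    ("records identified", 2363),
    ("after removal of duplicates", 2175),
    ("full texts assessed", 269),
    ("studies included", 21),
]

for (prev_label, prev_n), (label, n) in zip(FLOW, FLOW[1:]):
    # Each stage must be a subset of the previous one.
    if n > prev_n:
        raise ValueError(f"{label} ({n}) exceeds {prev_label} ({prev_n})")
    print(f"{prev_label} -> {label}: {prev_n - n} removed or excluded")
```

This gives 188 duplicates removed, 1906 records excluded at title/abstract screening, and 248 excluded after full-text review.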

3.2 General Characteristics of the Studies Included in the Meta-Analysis

All 21 observational studies included in the meta-analysis were cohort studies: seventeen had a non-concurrent design and four a concurrent design. Follow-up ranged from 6 to 37 months, although it was not reported in eight studies [69,70,71,72,73,74,75,76]. Study duration ranged from 36 months (3 years) to 132 months (11 years); two studies did not report this information [78, 79]. Eight studies declared COI, nine declared no COI, and three did not provide this information. With respect to funding, seven studies did not mention their sources of funding [71, 72, 76, 80,81,82,83], seven declared funding from the pharmaceutical industry [42, 73, 75, 78, 84,85,86], and the remaining studies were funded through other sources.

Nine studies assessed BEVA versus various CT regimens, including fluoropyrimidine-based CT (i.e., FOLFIRI, FOLFOX, and FLOX), four assessed BEVA versus CETUX, and only two compared CETUX versus PANIT. Five studies assessed maintenance of BEVA beyond disease progression versus no BEVA beyond progression (no BBP), i.e., BEVA versus no BBP (Table 1).

Table 1 General characteristics and methodological quality of systematic review cohort studies

3.3 Methodological Quality

In the methodological quality assessment, two studies [42, 84] obtained the maximum score of nine stars on the Newcastle-Ottawa scale, three scored eight stars, seven scored seven stars, eight scored six stars, and one scored five stars. Overall, the studies were therefore judged to be of moderate quality (Table 1). However, the funnel plot for OS was asymmetric, suggesting possible publication bias (see the electronic supplementary material, Figs. 3E and 4E).

3.4 Clinical Characteristics of the Patients in the Included Studies

To assess the comparative effectiveness and safety of the MoAbs (BEVA, CETUX, and PANIT) combined with CT or compared with CT alone, data from 10,180 participants in the 21 studies were analyzed. Study sizes ranged from 26 to 2526 patients.

Regarding the sociodemographic and clinical characteristics of the patients, mean ages ranged between 47 and 73 years, and 40.5% of patients were women. On average, according to data from five studies, 54% of the patients had wild-type KRAS and 45% had unknown KRAS status. In 14 of the studies, the primary tumor was located in the colon (rather than the rectum or other sites) in 75% of patients. Only four studies reported data on lymph node metastases, which occurred in 32% of patients. According to data from six studies, the proportions of liver and lung metastases were 56% and 29%, respectively. Seven studies reported data on primary tumor removal, with 72.5% of patients having undergone resection of their primary tumor. Regarding disease control, five studies showed a disease control rate of 74%, with progressive disease in 27% of patients according to RECIST (Table 2).

Table 2 Baseline clinical characteristics of patients

3.5 Summary of the Data

BEVA was the MoAb evaluated in the largest number of studies meeting the inclusion criteria. Because comparative observational studies of PANIT and CETUX were scarce, we compared a group ‘with BEVA’ against other therapeutic schemes that did not contain BEVA (no BEVA). This comparison yielded five intervention subgroups (Table 3): (a) BEVA + CT versus CT alone, without specification of the combination; (b) BEVA + FOLFIRI versus FOLFIRI; (c) BEVA + FOLFOX versus FOLFOX; (d) BEVA + CT versus CETUX + CT; and (e) BEVA versus no BBP (no BEVA maintenance beyond progression, with maintenance of some CT scheme). Only these two arms (BEVA vs no BEVA) were constructed, because a CETUX versus PANIT comparison was not feasible: only two studies [12, 75] made this comparison, and both reported the median as the measure of central tendency, unlike the other arms, where means were used, which made it difficult to include this arm in the analysis.

Table 3 Outcomes evaluated in the meta-analysis: BEVA vs schemes without BEVA (no BEVA)

3.6 Primary Outcomes

OS and PFS were assessed in all the intervention subgroups described, whereas PPS was assessed in only one intervention subgroup (BEVA vs no BBP) [42, 73, 81, 84, 85].

3.6.1 Overall Survival

We included 16 studies on OS, with 11,094 participants. OS differed significantly between the BEVA and no BEVA groups [13, 42, 71, 73, 74, 77, 78, 81,82,83,84,85,86,87] (MD = 4.07; 95% CI 1.69–6.45; p < 0.001; I² = 81%). However, when the clusters of CT regimens were examined separately, there were no statistically significant differences in the subgroups BEVA + CT versus CT alone [13, 71, 78, 82] (MD = 2.83; 95% CI −1.76 to 7.41; p = 0.23; I² = 87%), BEVA + CT versus CETUX + CT [74, 77, 86, 87] (MD = −0.52; 95% CI −7.7 to 6.67; p = 0.89; I² = 0%), and BEVA + FOLFOX versus FOLFOX [13, 83] (MD = 8.63; 95% CI −9.93 to 27.19; p = 0.36; I² = 96%).

The subgroup BEVA + FOLFIRI versus FOLFIRI [13] was assessed in only one study, and the OS difference was not significant. The only significant difference was in the BEVA versus no BBP subgroup [42, 73, 81, 83, 85] (MD = 4.89; 95% CI 1.91–7.87; I² = 73%), favoring BEVA (Table 3; Fig. 1).

Fig. 1

Overall survival forest plot

We performed heterogeneity tests taking into account the grouping of CT regimens (FOLFIRI or FOLFOX). Removing the two studies that used FOLFOX as the associated CT in the comparison arms with BEVA (Suenaga et al. [83] and Meyerhardt et al. [13]) reduced heterogeneity from I² = 81% to I² = 68%, but did not materially change the outcome of this meta-analysis.

In a sensitivity analysis excluding the study by Grothey et al. [85] from the BEVA versus no BBP grouping, heterogeneity within this grouping fell from I² = 73% to I² = 0%. Overall heterogeneity fell from I² = 68% to I² = 59%, and the direction of the effect did not change, with a statistically significant difference between the interventions (p < 0.00001).

3.6.2 Progression-Free Survival

We included 11 studies on PFS, with 3704 participants. PFS differed significantly between the BEVA and no BEVA groups (MD = 2.85; 95% CI 0.74–4.96; p = 0.008; I² = 94%). Because of this high statistical heterogeneity (I² = 94%; p < 0.00001), we performed sensitivity analyses within the intervention subgroups (Table 3; Fig. 2).

Fig. 2

Progression-free survival forest plot

In this analysis, excluding individual studies from any intervention group neither reduced the high heterogeneity nor changed the outcome. However, the combined exclusion of four comparisons, namely Hurwitz et al. [42] (BEVA vs no BEVA), Varol et al. [76] (BEVA + FOLFOX vs FOLFOX), Varol et al. [76] (BEVA + FOLFIRI vs FOLFIRI), and Turan et al. [82] (BEVA vs CT), gave MD = 3.30 (95% CI 2.17–4.42; p < 0.00001; I² = 0%).

3.6.3 Post-Progression Survival

For PPS, we included three studies with 2851 participants, all of which assessed BEVA versus no BBP [73, 84, 85]. The meta-analysis showed a significant PPS benefit for patients treated with BEVA (MD = 5.9; 95% CI 2.59–9.21; p = 0.0005), with high statistical heterogeneity (I² = 82%; p = 0.004). In the sensitivity analysis, excluding the study by Grothey et al. [84] reduced heterogeneity (I² = 0%; MD = 4.12; 95% CI 2.57–5.68; p < 0.00001) without changing the direction of the effect (Table 3; Fig. 3).

Fig. 3

Post-progression survival forest plot

3.7 Secondary Outcomes

For secondary outcomes, we analyzed 571 participants from five studies on the response rate or disease control rate measured by the RECIST scale [74, 77, 80, 82] and 1897 participants from two studies regarding the metastasectomy rate [71, 77].

For safety, we assessed eight studies that reported severe adverse events [72, 77, 83,84,85, 88, 89], including hypertension, arterial thromboembolism, venous thromboembolism, gastrointestinal perforation, bleeding, diarrhea, neutropenia, and other severe adverse events (Table 4).

Table 4 Severe adverse events (grade ≥ 3)

The meta-analysis of the disease control rate [74, 77, 80, 83, 90] showed no statistically significant difference between BEVA and no BEVA interventions, with low heterogeneity (I² = 10%; p = 0.18).

The metastasectomy rate [71, 77] was significantly higher with BEVA than with no BEVA (p < 0.0001), with no heterogeneity (Fig. 4).

Fig. 4

Metastasectomy rate forest plot

Regarding adverse events, there were no statistically significant differences between the interventions for arterial and venous thromboembolism, bleeding, diarrhea, neutropenia, or other severe adverse events (Table 4).

For gastrointestinal perforation (Fig. 5), we assessed five studies with 5182 participants. The difference was of borderline statistical significance (relative risk 1.89; 95% CI 0.99–3.59; p = 0.05; I² = 17%). In a sensitivity analysis excluding the study by Grothey et al. [84], the relative risk was 2.48 (95% CI 1.36–4.53; p = 0.003; I² = 0%), indicating a higher risk of this event with BEVA.
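For a single study, a relative risk and its CI are computed from a 2×2 table on the log scale. The Python sketch below uses the standard large-sample formula for the standard error of log(RR); the counts are hypothetical, chosen only to show how a doubled risk can still have a CI that crosses 1 when events are few, as with the borderline result above. It does not reproduce the pooled estimates, which combine several studies.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def relative_risk(events_a: int, total_a: int, events_b: int, total_b: int):
    """Risk ratio of group A versus group B with a log-scale 95% CI and two-sided p value."""
    if events_a == 0 or events_b == 0:
        raise ValueError("zero-event groups need a continuity correction (not handled here)")
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for a single 2x2 table.
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - Z_95 * se_log)
    upper = math.exp(math.log(rr) + Z_95 * se_log)
    z = math.log(rr) / se_log
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return rr, lower, upper, p


# Hypothetical counts: 20/1000 events with BEVA vs 10/1000 without.
rr, lower, upper, p = relative_risk(20, 1000, 10, 1000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}); p = {p:.3f}")
```

Here the risk is doubled, yet the interval spans 1 and p lies between 0.05 and 0.10, because 30 events in total give a wide interval on the log scale.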

Fig. 5

GI perforation forest plot. GI gastrointestinal

Severe hypertension was significantly more frequent with BEVA than with no BEVA (p = 0.007), with low heterogeneity (Fig. 6). Table 4 details the severe adverse events for BEVA compared with no BEVA schemes.

Fig. 6

Hypertension side effects forest plot

3.8 Conflict of Interest

For the COI analysis, we included nine studies with 8049 participants, all comparing BEVA versus no BEVA [13, 42, 71, 73, 74, 81, 83,84,85]. In the subgroup of studies declaring COI, the meta-analysis showed a significant difference in favor of patients treated with BEVA (MD = 4.85; 95% CI 2.30–7.40; p = 0.0002), with high statistical heterogeneity (I² = 74%, p = 0.004) (see the electronic supplementary material, Fig. 5E). The overall effect of the combined subgroups (COI and no COI) was also highly significant in favor of BEVA (p < 0.00001). Some of the studies may also have been influenced by publication bias, as suggested by the funnel plots.

3.9 Quality of the Studies (Newcastle-Ottawa Scale)

For the study quality analysis, we included 19 studies with 9856 participants comparing BEVA versus no BEVA [13, 42, 71,72,73,74, 76,77,78, 80,81,82,83,84,85,86,87, 90]. In the subgroup of studies scoring seven or more stars, there was a significant difference in favor of patients treated with BEVA (MD = 4.52; 95% CI 3.26–7.10; p < 0.00001), with high statistical heterogeneity (I² = 90%; p = 0.0001) (see the electronic supplementary material, Fig. 6E).

4 Discussion

Among the assessed MoAbs, BEVA was evaluated in the largest number of studies. The effectiveness studies comparing BEVA and no BEVA groups demonstrated statistically significant and clinically relevant benefits with BEVA, principally in OS, PFS, PPS and metastasectomy rates, but not in the disease control rate.

With respect to OS, results were similar across studies for patients given BEVA in combination with fluoropyrimidine-based CT regimens; however, better outcomes were found when patients who received BEVA were compared with those who received CETUX. In the subgroup analysis of BEVA maintenance versus no BBP, patients who continued BEVA + CT after disease progression had better outcomes than those in whom BEVA was stopped and CT alone (FOLFIRI or FOLFOX) was continued.

However, the effects on OS, PFS and PPS showed substantial heterogeneity, which is probably attributable to differences in the effects of BEVA therapy when combined with different CT regimens. One explanation for these findings is that studies tend to be more homogeneous when the same fluoropyrimidine-based CT regimen is used in each comparator arm. In the sensitivity analysis, the studies that compared BEVA + FOLFOX with FOLFOX (oxaliplatin-based) alone showed high heterogeneity; excluding them reduced heterogeneity without altering the clinical effects for the group.

In the PPS subgroup, the difference was also statistically significant, and excluding the study of Grothey et al. [85] reduced heterogeneity to I² = 0%.

The methodological quality of the studies was moderate, and one-third of them (7 of 21) declared funding from the pharmaceutical industry. This is important because previous meta-analyses by our group and others have shown that studies funded by pharmaceutical companies tend to report more favorable results for the companies' medicines than studies conducted by independent groups [57,58,59].

In addition, the high heterogeneity can be attributed to different treatment lines and different CT regimens combined with anti-angiogenic therapy, differences in anti-EGFR therapy related to KRAS status (wild-type or mutant), and differences in prognostic factors such as performance status, location of the primary tumor, and location of the metastases. Differences in declared COI and in study quality are further potential sources of heterogeneity. Given all these sources, substantial statistical heterogeneity (I² > 75%) is to be expected.

The use of MoAbs as a therapeutic option for mCRC raised expectations of longer OS, lower toxicity and fewer grade ≥ 3 adverse events compared with the cytotoxic CT previously used in patients with mCRC. BEVA in particular has been described as a milestone in clinical oncology [91]. In practice, however, the impact on OS has been only modest compared with current CT regimens.

The results of this meta-analysis point to a statistically significant advantage in favor of BEVA. This advantage may be considered clinically modest in terms of extending the lives of patients with metastatic disease; nevertheless, it represents a considerable advance in the options available for combination with fluoropyrimidine-based regimens. Most importantly, however, BEVA was associated with a statistically significant increase in severe adverse events, especially severe hypertension and gastrointestinal perforation, which must be factored into any treatment decision.

Within Brazil, any modest improvement in effectiveness, coupled with a significant increase in severe adverse events with BEVA, needs to be balanced against the growing number of successful lawsuits and the significant increase in costs and associated expenses for the MoAbs (Fig. 1E, electronic supplementary material). Such litigation reduces the resources available for other priority disease areas, although it increases access to these MoAbs for diseases such as mCRC and cervical cancer, among others, for which they are not currently incorporated into the Brazilian health system. This raises the need to assess their comparative effectiveness [51, 92] and their suitability as therapeutic alternatives for incorporation into the Brazilian health system [93].

Our findings should also provide guidance to the judiciary when ruling on the funding of these three MoAbs for patients with mCRC, as has been the case with insulin glargine [54].

We did not find any previous systematic reviews assessing the effectiveness and safety of the different MoAbs that were relevant to the current situation. Some randomized studies have compared different treatment schemes with or without the MoAbs and have shown conflicting results: some indicated benefits, whereas others showed increased treatment toxicity without any impact on patient survival [94,95,96,97,98,99].

The results of this analysis demonstrated that patients prescribed BEVA spent more time without disease progression and had a higher rate of resection than those receiving non-BEVA or other anti-EGFR-containing regimens, and that there were significant gains in OS when the data were re-analyzed to reduce the heterogeneity of the included studies. With respect to safety, most studies in the meta-analyses indicated no significant reduction in severe adverse events with BEVA.

In the meta-analysis by Lv et al. [44], the efficacy and safety of adding BEVA to CETUX- or PANIT-based therapy for patients with mCRC were estimated. The authors suggested that combining an anti-VEGF antibody (BEVA) with an anti-EGFR antibody (CETUX or PANIT) did not improve PFS, OS, or the disease control rate compared with the anti-EGFR antibody alone. In addition, the incidence of grade ≥ 3 adverse events was not significantly greater in the BEVA groups than in the CETUX/PANIT groups. These findings are similar in a number of respects to our own.

In contrast, the meta-analysis by Wagner et al. [47], which compared BEVA with no BEVA as first-line CT, showed significant benefits in OS and PFS in patients treated with BEVA. As we have previously pointed out, however, the no-BEVA group in Wagner et al. [47] also received vatalanib (another VEGF inhibitor) in addition to the CT schemes, and vatalanib was not included in our review.

We believe this justified a re-evaluation as more data became available. In our review, there was a high incidence of grade 3 and 4 arterial hypertension, arterial thromboembolic events and gastrointestinal perforation, which may compromise patient safety. Observational studies included in this analysis, such as Meyerhardt et al. [88], Grothey et al. [84], and Yang et al. [77], agreed with the RCT results in indicating a high risk to patients from BEVA. In the meta-analysis by Hapani et al. [43], the incidence of gastrointestinal perforation was 0.9% in patients receiving BEVA, with an associated mortality of 21.7%.

Hurwitz et al. [100], who compared various CT schemes with or without BEVA as first- or second-line treatment, found that adding BEVA was associated with significant gains in OS and PFS across subgroups defined by CT backbone (oxaliplatin-based or irinotecan-based) and extent of disease (liver metastases only or extensive disease), which is in accordance with the present study. The inclusion of more recent studies suggests an increase in OS, especially when the heterogeneity of the included studies is reduced. In addition, the incidence of grade ≥ 3 adverse events was higher in the groups that received combinations with BEVA, confirming earlier studies.

With respect to safety, we emphasize that most meta-analysis outcomes indicated no significant reduction in severe adverse events with BEVA. This implies that the expectation that MoAbs would reduce adverse events has not yet been fulfilled [43, 47].

The pooled results of our systematic review differed in some respects from those of individual studies. Individual studies reported an average survival close to 3 years and high 5-year survival rates, an increase of 20% compared with some trials in which patients were treated with CT alone [101]. However, for most patients, the improvement obtained with treatment would be palliative rather than curative [102]. The main expectations are therefore to prolong PPS and maintain quality of life for as long as possible rather than to improve OS.

It is important to recognize that mCRC is a chronic disease and that the survival gains from CT regimens combined with biological agents are typically seen only in the PPS period, although there were modest improvements in OS with BEVA. It is nevertheless necessary to evaluate the cost of new interventions according to the configuration of each health system and the associated implications for reimbursement and funding decisions. Marginal gains from high-cost medicines are not acceptable if they restrict funding for other, more effective interventions in this and other patient populations within health systems with universal access. This may well mean denying effective treatment options in other high-priority disease areas, which is increasingly difficult to justify within finite budgets [28, 61, 103]. However, it is important to note that advances made in first-line treatment will also be applied to second-line treatment [4].

In addition, adverse events have typically been assessed only for severe cases. In our review, we did not assess the effect of adverse events on patients' quality of life, e.g., BEVA-induced dysphonia (necrosis of the vocal cords) and BEVA-related hearing loss, because these events were not life threatening.

Another important finding was the presence of conflicts of interest (COI) in some of the studies. Thompson and others [104, 105] define a COI as a set of conditions in which professional judgment could be unduly influenced by secondary interests such as financial gain. Publications with a COI involving pharmaceutical companies are a concern because they may be persuasive while withholding negative results, more frequently report results favorable to the sponsor's medicine, or even delay the disclosure of results as a strategy to protect potential markets [106,107,108,109,110]. Some of the findings of the studies selected for this systematic review and meta-analysis may have been influenced by publication bias, as suggested by the funnel plots (see the electronic supplementary material, Figs. 3E.1 and 4E.2) and by Fig. 5E, in which the combined subgroups (COI and no COI) favored patients treated with BEVA (p < 0.00001).

We also performed a meta-analysis stratified by the quality of the non-randomized studies, assessed with the Newcastle-Ottawa scale in line with the MOOSE guidelines. Low-quality studies may distort the summary effect estimate (see the electronic supplementary material, Fig. 6E). Our findings show a statistically significant difference between studies rated six stars or fewer and those rated seven stars or more. Consequently, the high heterogeneity among the studies in this meta-analysis may be partly explained by differences in study quality.

We recognize that this systematic review has some limitations. We included only observational studies, and this study design is limited by its lack of randomization and by uncontrolled confounding factors. Some studies did not provide complete and/or accurate information and were therefore excluded from the quantitative analysis, which hindered our understanding of the heterogeneity found in some comparison groups. In addition, only five studies assessed the effect of discontinuing BEVA beyond disease progression, and most of the studies did not use the same combinations or treatment schemes, although 16 of the 21 studies used a fluorouracil-based CT scheme.

Therapeutic care (in terms of the types of interventions, therapeutic schemes, and the level of expertise) was also rarely described in detail in these studies.

Despite these limitations, we believe that our findings are robust in view of our methodological approach. This was confirmed by the sensitivity analysis, in which including or excluding studies in each comparison did not change the direction of most outcomes, although heterogeneity changed significantly.

5 Conclusions

The results of this meta-analysis point to a statistically significant advantage in favor of BEVA for the outcomes of OS, PFS, PPS, and the metastasectomy rate. This advantage may be considered clinically modest relative to the patient's lifetime in the metastatic stage. It increases the options for combining BEVA with fluoropyrimidine-based regimens, but the quality of life of these patients must not be overlooked, because one of the most important findings with BEVA is the statistically significant increase in adverse events associated with its use, especially severe hypertension and gastrointestinal perforation.

This review also highlights that studies directly comparing the effectiveness and safety of MoAbs in patients with mCRC are currently scarce, a gap that needs to be addressed given the funding choices payers face. Further observational studies comparing MoAbs combined with different CT regimens are also needed to assess OS and adverse events, given current concerns about their real-world impact on OS.

Combined with those from RCTs, these findings can be used to update clinical guidelines and thereby systematically promote better and more appropriate healthcare within universal healthcare systems, by establishing the magnitude of the benefits, risks, and costs of specific aspects of patient care. We also hope this review will be of interest to the judiciary when authorizing resources for high-cost medicines with limited benefits, since such decisions leave fewer funds for valuable medicines in other priority areas.