The adherence paradox: guideline deviations contribute to the increased 5-year survival of breast cancer patients

In German breast cancer care, the S1-guidelines of the 1990s were replaced by national S3-guidelines in 2003. The application of guidelines became mandatory for certified breast cancer centers. The aim of the study was to assess guideline adherence across time intervals and its impact on survival. Women with primary breast cancer treated in three rural hospitals of one German geographical district were included. A cohort study design encompassed women treated in 1996–97 (N = 389) and in 2003–04 (N = 488). Quality indicators were defined along inpatient therapy sequences for each time interval and distinguished between guideline-adherent and guideline-divergent medical decisions. Based on all quality indicators, a binary overall adherence index was defined and served as a group indicator in multivariate Cox-regression models. A corrected group analysis estimated adjusted 5-year survival curves. Of a total of 877 patients, 743 (85 %) and 504 (58 %) were included to assess 104 developed quality indicators and the summarizing binary overall adherence index. The latter significantly increased from 13–15 % (1996–97) to 33–35 % (2003–04). Within each time interval, no significant survival differences between guideline-adherent and guideline-divergent treated patients were detected. Across time intervals and within the group of guideline-adherent treated patients only, survival increased but did not differ significantly between time intervals. Across time intervals and within the group of guideline-divergent treated patients only, survival increased and differed significantly between time intervals. Infrastructural efforts contributed to the increase in process quality of the examined certified breast cancer center. Paradoxically, a systematic impact on 5-year survival was observed for patients treated divergently from the guideline recommendations. This is an indicator of the appropriate application of guidelines.
Maximizing guideline-based decisions, rather than the ubiquitous demand to maximize guideline adherence, is advocated.


Background
Breast cancer (BC) is the most frequent female malignancy, with approximately 1.65 million women diagnosed worldwide each year [1,2]. Increasing incidence and decreasing mortality rates are reported for developed countries. In Germany, these general trends are confirmed, and survival after BC is higher today than in the 1990s [3].

Guidelines before 2000
There are many reasons for these trends. The increasing effectiveness of therapy itself is certainly one crucial factor [4]. However, published research from clinical trials must also be disseminated and implemented comprehensively in daily routine. In the past, a small number of experts (the St. Gallen consensus panel) interpreted current trial results and published the state of the art in BC treatment [5][6][7]. Additionally, national [8] or European guidelines [9] provided treatment recommendations for physicians willing to improve their skills. Low acceptance and arbitrary application of these S1-guidelines were the norm, not the exception [10]. BC treatment depended mainly on the experience and knowledge of the individual physician. Consultations among colleagues and quality circles took place irregularly, and access to expertise from the other (medical) disciplines involved in BC treatment was not institutionalized. Overall, health care professionals in different settings cooperated in a "free interplay" (i.e., a liberally organized market) within a fragmented but competitive German health care system [11].

Guidelines after 2000
A joint effort of all stakeholders aimed to overcome these deficits and developed evidence-, consensus-, and outcome-based national guidelines for the early detection [10] and therapy [12] of BC. The application of these S3-guidelines was mandatory for centralized BC networks inspired by so-called "hub & spoke" models [13][14][15]. Hubs were academic institutions, and spokes were all related health care professionals. Network-wide monitoring of guideline application was ensured by quality management systems, which became officially certified [13]. Multidisciplinary counseling was ensured by expert panels (tumor conferences) hosted at comprehensive cancer centers. Integrated care models [16] were developed to overcome the aforementioned infrastructural deficits.

Effectiveness of guidelines
Only a few studies have addressed all (inpatient) therapy sequences, guideline adherence, and its impact on outcome measures. In Germany, the studies of Woeckel et al. [17,18] examined this topic and confirmed the general effectiveness of guideline adherence over the period 1992-2005. Based on these data, the authors called for the maximization of guideline adherence.
This approach is straightforward but rests on two critical assumptions. First, the study design may be appropriate as long as the general effectiveness of S3-guidelines is of concern. However, if the appropriateness of medical decisions relative to the guidelines in force at the time is of interest, this approach is not adequate: S3-guidelines were not released before 2003, so time effects induced by different guidelines cannot be captured.
Second, the conclusion that guideline adherence should be maximized was based on the assumption that medical decisions adherent to the guidelines are appropriate. This assumption holds if all of the physical and mental conditions of the patient agree with clinical algorithms, ancillary conditions, and patients' preferences. However, if one of these premises is not fulfilled, physicians are encouraged to decide against guideline recommendations [10,12].

Aim of the study
The objective of the study is to examine the impact of process quality on 5-year overall survival. In contrast to the above-mentioned studies, however, process quality is assessed according to the guidelines in force during each time interval. Guideline adherence and divergence are measured by a set of quality indicators defined along inpatient therapy sequences and related medical decisions. An overall adherence index is developed and two questions are examined: Is there a difference in survival between guideline-adherent and guideline-divergent treated patients, first, within each time interval, and second, across time intervals? It is hypothesized that process quality increased over time. In contrast to the cohort 1996-97, we expect an impact of process quality on survival for the cohort 2003-04. With respect to the cross-period analysis, we expect higher survival in 2003-04 for patients treated according to guidelines and no survival differences for patients treated divergently from guidelines.

Incidence-based full population survey
All women receiving primary BC treatment in two general hospitals and one specialized academic hospital located in the district of Marburg-Biedenkopf (Hesse, Germany) were included (entry cohort). Patients were identified from surgical schedule lists and subsequent histological confirmation of BC (ICD-10: C50). Physicians recruited patients by explaining the aims of the study and obtained written informed consent. The relevant data were extracted from patient record files and stored in a clinical register [19][20][21]. The study was approved by the local ethics committee of the Philipps University of Marburg (Germany) and conducted according to the Declaration of Helsinki.

Sample selection for analysis
The entry cohort encompassed all treated patients (total "workload"), but not all patients of the entry cohort could be analysed using standardized quality indicators. Therefore, heterogeneous patient groups with non-invasive tumours (pTis) and with distant metastasis or unknown metastasis status were excluded from further analysis, because their individual medical needs and the complexity of their therapy cannot be captured by standardized indicators. This step defined the institutional-invasive samples. These were further restricted by excluding non-resident patients to define the regional-invasive samples [19][20][21].
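The two-step sample definition can be sketched as a pair of nested filters. This is a minimal illustration only: the `Patient` record, its field names, and the stage and metastasis codes (`"pTis"`, `"M0"`, `"M1"`, `"MX"`) are hypothetical and not taken from the study's clinical register.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patient:
    # Hypothetical register fields, for illustration only.
    pt_stage: str    # pathological T stage, e.g. "pTis", "pT1", "pT2"
    m_status: str    # "M0" (no distant metastasis), "M1" (distant), "MX" (unknown)
    resident: bool   # True if the patient lives in the study district


def institutional_invasive(patients: List[Patient]) -> List[Patient]:
    """Drop non-invasive tumours and distant or unknown metastasis status."""
    return [p for p in patients if p.pt_stage != "pTis" and p.m_status == "M0"]


def regional_invasive(patients: List[Patient]) -> List[Patient]:
    """Further restrict the institutional-invasive sample to resident patients."""
    return [p for p in institutional_invasive(patients) if p.resident]


entry_cohort = [
    Patient("pT1", "M0", True),
    Patient("pTis", "M0", True),   # excluded: non-invasive
    Patient("pT2", "MX", True),    # excluded: unknown metastasis status
    Patient("pT2", "M0", False),   # kept institutionally, dropped regionally
]
print(len(institutional_invasive(entry_cohort)), len(regional_invasive(entry_cohort)))  # 2 1
```

Because the regional filter is applied on top of the institutional one, every regional-invasive patient is also institutional-invasive, mirroring the nesting of the two samples described above.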

Primary endpoint and follow-up
Five-year overall survival, regardless of cause of death, was defined as the primary endpoint. Observation time started at the date of surgical intervention. Vital status was verified through the official registry office responsible for each patient. Follow-up began in 10/2008 and ended in 2/2009.

Covariates for risk adjustments
Available risk factors, prognostic and predictive factors for BC [22] were integrated into the Cox model. Regressors of the final model were: age at surgical intervention, binary nodal status, binary tumour size, binary hormone receptor status, and binary adherence index. The information on treatment location and application of chemotherapy served as strata variables.
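Written out, the final model described above corresponds to a stratified Cox model of the following form. This is a sketch under stated assumptions: the symbols are ours, the stratum index s is assumed to run over the combinations of treatment location and chemotherapy, and the adherence index is assumed to be coded 1 = guideline-adherent, 0 = guideline-divergent, so that a negative coefficient (hazard ratio below 1) means a lower hazard of death for adherent treatment.

```latex
h_s(t \mid \mathbf{x}_i) = h_{0s}(t)\,
  \exp\!\big(\beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{node}_i + \beta_3\,\mathrm{size}_i
            + \beta_4\,\mathrm{receptor}_i + \beta_5\,\mathrm{adherence}_i\big),
\qquad
S_s(t \mid \mathbf{x}_i) = S_{0s}(t)^{\exp(\boldsymbol{\beta}^{\top}\mathbf{x}_i)},
\qquad
\mathrm{HR}_{\mathrm{adherence}} = e^{\beta_5}
```

Each stratum has its own baseline hazard h_{0s}(t), while the coefficients are shared across strata; this is why location and chemotherapy do not appear as regressors.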

Quality indicators of medical decision-making
Quality indicators were defined along relevant inpatient treatment sequences: surgical intervention (tumor, lymph nodes) together with radiotherapy, and chemo- and hormone therapy according to different risk categories [7,12]. Pre-operative diagnostic sequences and other systemic interventions (e.g., HER2/neu-targeted therapy, among others) were not available in 1996-97 and were excluded.
Quality indicators (QIs) operationalized guideline recommendations in two categories. The first category comprised recommendations that physicians should follow if all other ancillary conditions are fulfilled; these QIs were termed Guideline Adherent Decisions (GAD). The second category comprised medical decisions against guideline recommendations, termed Guideline Divergent Decisions (GDD). It is important to note that GADs and GDDs are not always the opposite of each other (i.e., they are not complementary). For the definition of QIs according to the S1-guidelines (1996-97) and S3-guidelines (2003-04), short and long descriptions are provided (see Additional files 1 and 2).

Adherence index
The developed QIs were aggregated into four indices describing the adherence status of each therapy sequence. In addition, all QIs contributed to one overall binary adherence index. QIs were aggregated as follows. First, each QI was assessed according to its category (GAD, GDD). Second, if all GADs were assessed as positive (i.e., adherent), the BC treatment of a patient was preliminarily classified as guideline-adherent by the overall adherence index; if even one GAD did not comply with guideline recommendations, the index was downgraded to guideline-divergent. Third, if even one GDD was assessed as positive (i.e., divergent), the inpatient primary BC therapy was classified as guideline-divergent by the overall adherence index. In this sense, a single unmet quality indicator overrode all other guideline-adherent indicators.
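The aggregation rule above can be sketched as follows. This is a minimal illustration, not the study's code: the sequence names and the `(GAD results, GDD results)` input layout are assumptions, and only indicators applicable to the patient are assumed to be passed in.

```python
from typing import Dict, List, Tuple

# Per therapy sequence: (GAD results, GDD results).
# A GAD entry is True if the recommended decision was taken (adherent);
# a GDD entry is True if the divergent decision was taken.
SequenceQIs = Tuple[List[bool], List[bool]]


def sequence_index(gad: List[bool], gdd: List[bool]) -> int:
    """Binary adherence for one sequence: 1 = guideline-adherent, 0 = divergent."""
    # Step 2: a single unmet GAD downgrades the sequence to divergent.
    # Step 3: a single positive GDD also downgrades it.
    # all([]) is True and any([]) is False, so a sequence without applicable
    # indicators counts as adherent in this sketch.
    return int(all(gad) and not any(gdd))


def adherence_indices(patient: Dict[str, SequenceQIs]) -> Tuple[Dict[str, int], int]:
    """Return the per-sequence indices and the overall binary index."""
    per_sequence = {name: sequence_index(gad, gdd) for name, (gad, gdd) in patient.items()}
    # Pooling all QIs gives the same result as requiring every sequence to be adherent.
    overall = int(all(per_sequence.values()))
    return per_sequence, overall


# Hypothetical patient record with four therapy sequences.
patient = {
    "surgery": ([True, True], [False]),
    "lymph_nodes": ([True], []),
    "chemotherapy": ([True], [False]),
    "hormone_therapy": ([False], [False]),  # one unmet GAD
}
per_sequence, overall = adherence_indices(patient)
print(per_sequence)  # {'surgery': 1, 'lymph_nodes': 1, 'chemotherapy': 1, 'hormone_therapy': 0}
print(overall)       # 0
```

The single unmet hormone therapy GAD is enough to classify the whole treatment as guideline-divergent, which is the "one indicator overrides all" behaviour described in the text.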

Statistics
Univariate statistics describe the clinical characteristics of the selected samples. The distributions of covariates between cohorts were compared by Chi-square, Kruskal-Wallis, Mantel-Haenszel Chi-square, Mann-Whitney U, or t-tests. The resulting p-values were adjusted for multiple testing by the Bonferroni-Holm method. Frequency counts described the quality indicators, and Chi-square tests adjusted for multiple testing with the Bonferroni-Holm method were applied. Multivariate survival analysis was performed using a Cox-regression model [23]. Multivariate survival curves were derived by the corrected group analysis method [24]. The significance level was set at α = 5 %. SAS 9.3 software was used.
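For readers unfamiliar with the Bonferroni-Holm step-down procedure, the adjustment can be sketched in a few lines. This is a generic illustration with made-up p-values, not the study's SAS code; statsmodels' `multipletests(..., method="holm")` implements the same adjustment.

```python
from typing import List


def holm_adjust(p_values: List[float]) -> List[float]:
    """Bonferroni-Holm step-down adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values
adjusted = holm_adjust(raw)
alpha = 0.05
print([round(p, 3) for p in adjusted])  # [0.03, 0.06, 0.06, 0.02]
print([p <= alpha for p in adjusted])   # [True, False, False, True]
```

Compared with plain Bonferroni (multiplying every p-value by m), the step-down multipliers shrink for larger p-values, so Holm is uniformly more powerful while still controlling the family-wise error rate.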

Analysis strategy
Univariate results of sampling and distributions of important covariates are presented first. Next, the number of developed quality indicators and the guideline adherence (divergence) of each therapy sequence and of the overall binary adherence index are presented. Finally, multivariate survival methods analyze every period (i.e., cohort) separately, followed by cross-period/cohort comparisons, first without the adherence index and then conditional on adherence status.

Sampling results
An entry cohort of 877 patients was reduced by 134 patients (15 %) due to loss to follow-up (1.9 %), non-assessable stage information (0.3 %), non-invasive tumours (6.3 %), non-assessable or distant metastases (5.2 %), or non-assessable margins of removed tumours (1.5 %). Excluded patients were randomly distributed over both cohorts (see percentages in Table 1), and no significant differences between included and excluded patients were detected (p-values not shown). After these exclusions, 743 patients (84.7 %) remained in the institutional-invasive samples and 504 (57.5 %) in the regional-invasive samples for analysis.
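The reported exclusion percentages can be checked against the sample sizes with a little arithmetic. The counts below are back-calculated from the rounded percentages in the text and are therefore approximate; they are not taken from Table 1.

```python
ENTRY_COHORT = 877

# Exclusion reasons and percentages of the entry cohort, as reported in the text.
exclusion_pct = {
    "loss to follow-up": 1.9,
    "non-assessable stage information": 0.3,
    "non-invasive tumours (pTis)": 6.3,
    "non-assessable or distant metastases": 5.2,
    "non-assessable tumour margins": 1.5,
}

# Back-calculated counts (approximate, because the percentages are rounded).
approx_counts = {reason: round(ENTRY_COHORT * pct / 100) for reason, pct in exclusion_pct.items()}
excluded = sum(approx_counts.values())
institutional = ENTRY_COHORT - excluded

print(approx_counts)
print(excluded, institutional)                       # 134 743
print(round(100 * institutional / ENTRY_COHORT, 1))  # 84.7
print(round(100 * 504 / ENTRY_COHORT, 1))            # 57.5
```

The back-calculated counts (17, 3, 55, 46, 13) add up to the reported 134 exclusions, consistent with the 743 patients of the institutional-invasive samples.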

Process quality indicators
In total, 104 quality indicators were defined: 51 Guideline Adherent Decisions and 53 Guideline Divergent Decisions. QIs common to both cohorts, owing to identical guideline recommendations, related to the surgical strategy: a total of 23 QIs referred to the sequences of breast conserving surgery and irradiation (BCS + RAD: 8 QIs) and modified radical mastectomy (15 QIs).

Adherence indices
The application of the defined QIs showed significant differences in guideline adherence between 1996-97 and 2003-04 (see Table 3). The relative share of guideline-adherent surgical treatments increased from 28.7 % (1996-97) to 52.8 % (2003-04) in the institutional-invasive sample (from 30.3 to 51.9 % in the regional-invasive sample). Chemotherapy adherence increased from 74.5 to 93.2 % (76.9 to 92.1 %) of treatments and hormone therapy adherence from 70.1 to 84.4 % (68.1 to 83.8 %). Only the lymph node dissection sequence failed to exhibit a significant difference between cohorts, owing to its already high quality level prior to the infrastructural changes.
The summarizing overall binary adherence index across all measured inpatient therapy sequences significantly increased from 13.3 % (1996-97) to 35.2 % (2003-04) in the institutional-invasive samples and from 15.1 to 33.5 % in the regional-invasive samples. In other words, process quality more than doubled, and the relative share of treatments divergent from guidelines declined from 86.7 to 64.8 % (84.9 to 66.5 %).
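The size of the increase and the complementary divergent shares follow directly from the reported index values, as this small worked calculation shows (all numbers are taken from the text).

```python
# Overall binary adherence index (% adherent): (1996-97, 2003-04), from the text.
samples = {
    "institutional-invasive": (13.3, 35.2),
    "regional-invasive": (15.1, 33.5),
}

results = {}
for name, (before, after) in samples.items():
    fold = after / before                       # relative increase of the adherent share
    results[name] = (fold, 100 - before, 100 - after)  # plus complementary divergent shares
    print(f"{name}: {fold:.2f}-fold; divergent {100 - before:.1f} % -> {100 - after:.1f} %")
```

This gives increases of about 2.65-fold (institutional-invasive) and 2.22-fold (regional-invasive), and divergent shares of 86.7 to 64.8 % and 84.9 to 66.5 %, matching the figures above.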

Multivariate 5-year survival estimates
Period-specific results
Next, the impact of the overall binary adherence index on survival was assessed. Several steps of model selection, model checking, and model fit procedures identified a relevant covariate set including the developed adherence index. Estimates of the final Cox-regression model are shown in Table 4, separately for each cohort and sample. For cohort 1996-97, both samples (institutional- and regional-invasive) show a negative association between the adherence index and the hazard of death: if a patient was treated according to the guidelines, the instantaneous risk of death (hazard ratio) declined and 5-year overall survival increased. However, this result is not significant, and a systematic effect of adherence on survival is not evident. This result is consistent for cohort 2003-04 and both defined samples. The corresponding multivariate survival curves were derived by the corrected group analysis (CGA) method. The results are shown in Table 5.
Taking all additional variables of the Cox model into account, the CGA method allows survival rates and the corresponding curves to be estimated [24]. The cohort-specific perspective and the institutional-invasive samples are presented first. Cohort 1996-97 exhibits a remarkable survival difference between the comparison groups (institutional-invasive: 84.5 − 76.8 = 7.7). However, confidence intervals and related p-values indicated that the results were not significant. The same result was obtained for cohort 2003-04, for which a small 5-year survival difference (87.7 − 86.3 = 1.4) was estimated. However, the survival curves behave differently, as Fig. 1a-b indicates. Figure 1a (left) shows the development for cohort 1996-97: the survival curves start to separate after 12 months and diverge further after 30 months, with the curve of guideline-divergent treated patients declining more than that of patients treated according to guidelines. In comparison, for cohort 2003-04 the survival differences between groups are very small, the decline occurs after 20 months, and a less steep development is observed for the guideline-divergent treated patients (Fig. 1b). If the analysis is restricted to the regional-invasive samples (i.e., resident patients), cohort 1996-97 displays small survival differences (83.4 − 79.9 = 3.5, see Table 5) and cohort 2003-04 considerable survival differences (91.0 − 84.0 = 7.0, see Table 5) between the comparison groups. Figure 2a-b illustrates these results. Figure 2a shows the survival curves of cohort 1996-97: the curves appear to separate after 12 months, and after 30 months the decline becomes steeper. The survival curves of cohort 2003-04 (Fig. 2b) exhibit a different pattern.
The curves separate from the beginning of the observation time, and the curve of guideline-divergent treated patients becomes steeper after 10 months. Thus, the survival curves differed substantially between cohorts and samples in terms of both survival level and curve development.
Legend: a refers to non-assessable stage information; b excludes non-invasive tumours (pTis); c excludes non-assessable metastasis status or distant metastasis; d excludes patients without any information.
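The averaging idea behind such adjusted survival estimates can be sketched as follows. This is a minimal illustration of one common formulation of corrected group prognosis (direct adjustment): predict each patient's survival from the stratified Cox model with the adherence indicator fixed to one value, then average over the pooled sample. The exact procedure of reference [24] may differ, for example in its weighting. The baseline survivals, coefficients, stratum labels, covariate order, and patients below are hypothetical placeholders, not study estimates.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Each patient: (stratum label, covariate vector).
# Assumed covariate order: centred age, node-positive, large tumour,
# receptor-positive, adherent (1) / divergent (0).
Patient = Tuple[str, List[float]]


def predicted_survival(s0_t: float, beta: Sequence[float], x: Sequence[float]) -> float:
    """Cox prediction S(t | x) = S0(t) ** exp(beta . x) for the patient's stratum."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return s0_t ** math.exp(linear_predictor)


def corrected_group_survival(patients: List[Patient], baseline_at_t: Dict[str, float],
                             beta: Sequence[float], group_pos: int, group_value: float) -> float:
    """Average predicted S(t) over the pooled sample with the group indicator fixed."""
    total = 0.0
    for stratum, x in patients:
        x_group = list(x)
        x_group[group_pos] = group_value  # assign every patient to the same group
        total += predicted_survival(baseline_at_t[stratum], beta, x_group)
    return total / len(patients)


# Hypothetical inputs, NOT estimates from the study.
baseline_60m = {"hospital_A/chemo": 0.85, "hospital_B/no_chemo": 0.90}
beta = [0.03, 0.60, 0.40, -0.50, -0.30]
patients = [
    ("hospital_A/chemo", [-5.0, 1, 1, 1, 0]),
    ("hospital_B/no_chemo", [10.0, 0, 0, 1, 1]),
    ("hospital_B/no_chemo", [2.0, 1, 0, 0, 0]),
]
adherent = corrected_group_survival(patients, baseline_60m, beta, group_pos=4, group_value=1)
divergent = corrected_group_survival(patients, baseline_60m, beta, group_pos=4, group_value=0)
print(f"adjusted 5-year survival: adherent {adherent:.3f}, divergent {divergent:.3f}")
```

The sketch evaluates a single time point (60 months); a full adjusted curve repeats the same averaging at every event time. Because both groups are averaged over the same covariate distribution, differences between the two curves reflect only the adherence coefficient.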

Cross-period results
To gain more insight into cross-period survival rates and patterns, the cohorts were compared regardless of adherence status (not shown in tables). In the institutional-invasive sample, the estimated survival rate was 79 % for cohort 1996-97 and 86 % for cohort 2003-04. The survival difference between cohorts was significant (p = 0.007). However, when the adherence information is added to the model and cross-period survival curves are estimated for guideline-adherent or guideline-divergent treated patients only, the picture becomes more intriguing.
First, if only guideline-adherent patients of the institutional-invasive samples were compared, the survival estimates (see Table 5) were almost identical for cohorts 1996-97 and 2003-04 (89.6 % vs 89.9 %). The survival differences were not significant. If this comparison is restricted to resident patients (i.e., the regional-invasive sample), the survival rate of cohort 1996-97 was noticeably lower than that of cohort 2003-04 (87.1 % vs 92.2 %), but the difference was still not significant. Figure 3a-b shows the survival curves. Second, only guideline-divergent treated patients were compared across cohorts. The institutional-invasive samples showed a survival rate of 76.4 % in cohort 1996-97 and 84.6 % in cohort 2003-04 (see Table 5). This difference was significant (p = 0.013). However, this result was not replicated in the regional-invasive samples (79.6 % vs 82.5 %; not significant). The survival curves are shown in Fig. 4a-b.

Discussion
Based on the defined set of quality indicators according to time-dependent guidelines and available medical knowledge, a more than two-fold increase in the process quality of medical decision-making, from the experts' point of view, was observed. This result benefits women with BC because the complexity of modern therapies continues to grow.

Period-specific comparisons
The process quality of cohort 1996-97 was expected to be low, and no survival differences between comparison groups were expected for this cohort. Indeed, no impact of process quality on survival was observed. For cohort 2003-04, a higher clinical process quality was hypothesized and an impact on survival was expected. Higher survival rates in the guideline-adherence group were expected but were not observed. Multivariate survival analysis revealed no significant association between the adherence index and 5-year overall survival in any of the defined samples.

Cross-period comparisons
The cross-period/cohort comparisons were intended to yield deeper insight into the mechanisms of temporal change. Cross-period comparisons without considering the overall binary adherence index showed a significant difference in survival rates of approximately 7 % (see subsection 'Cross-period results'). However, cross-cohort comparisons restricted to the adherence group revealed no significant survival differences. When the guideline-divergence groups of cohorts 1996-97 and 2003-04 were compared, systematic survival gains of 10 % were observed for the institutional-invasive sample. This survival increase exceeds the increase between periods regardless of adherence status by approximately 3 %. This excess survival can be characterized as a period effect and was not expected for this subgroup.
In the context of guideline development and assessment, this uncovered period effect appears inconsistent with the ubiquitous demand to maximize guideline adherence [17,18]. Is it not paradoxical that particular women with BC benefited most over the last decade from treatment that violated guideline recommendations?
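Survival gains can be expressed either in absolute percentage points or relative to the earlier survival rate, and the two scales give different numbers. The following worked calculation uses only the survival rates reported in the 'Cross-period results' subsection and shows both scales side by side.

```python
from typing import Tuple


def survival_gain(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in % of baseline)."""
    absolute = after - before
    relative = 100 * absolute / before
    return absolute, relative


overall_abs, overall_rel = survival_gain(79.0, 86.0)      # all patients, institutional-invasive
divergent_abs, divergent_rel = survival_gain(76.4, 84.6)  # guideline-divergent only
print(f"overall:   {overall_abs:.1f} points, {overall_rel:.1f} % relative")
print(f"divergent: {divergent_abs:.1f} points, {divergent_rel:.1f} % relative")
```

This yields about 7.0 and 8.2 percentage points, or relative gains of about 8.9 % and 10.7 %, respectively.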

Essence of guidelines
However, this is not inconsistent with the essence of guidelines: the identified paradox reflects the very nature of guidelines, which are meant to apply to the vast majority of patients rather than to all. Schulz et al. [10] emphasized that "if the individual situation requires deviations from guidelines, it is not only possible, it is mandatory to do so. Guidelines do not release physicians from their obligation to consider the clinical characteristics, somatic, psychological and social conditions of each patient".
In this respect, cohorts 1996-97 and 2003-04 differ substantially from an infrastructural perspective. In 2003-04, systematic, rational, and conscious decisions against guidelines were made and monitored by expert panels.

Why adherence paradox?
These multidisciplinary expert panels were introduced in the decade of cohort 2003-04 to put the essence of guidelines into practice. Expert panels run by leading physicians from all related disciplines (e.g., gynaecologists, oncologists, surgeons, pathologists, radio-oncologists, psycho-oncologists) gave consensual advice for further multimodal treatment [11]. Expert panels became an important forum for weighing guideline recommendations, the individual medical experience of various experts, patient preferences, and patients' social circumstances. Expert panels use guidelines as a starting point for joint recommendations and, if necessary, deviate from them systematically, rationally, and consciously to tailor an individualized therapy. Thus, the identified adherence paradox reflects this essence of guidelines and signals their appropriate application in certified BC networks [15].

Alternative approaches to define an adherence index
Most related studies use a rate-based/criterion-based approach with 5 to 20 quality indicators, mostly extracted from routine data [25][26][27][28][29][30]. These studies report guideline adherence between 80 and 100 %. If 33 indicators are used, adherence of medical decisions decreases to 52 % [17,18]. When medical decisions documented in patient record files were reviewed, 19 % (1993) and 54 % (1995) of 375 medical decisions appeared to be adherent to current guidelines [31]. Scientifically legitimated deviations increased from 42 % (1993) to 68 % (1995). When the same methodology was applied in an experimental design, a non-significant increase from 36 % (1996) to 40 % (1999) of 825 reviewed medical decisions was found [32]. Overall, the degree of adherence strongly depends on the length of the observation time [33], the age of the patient [34], and the number of quality indicators and therapy sequences included.

Adherence index and survival of related studies
Most studies refer only to selected therapy sequences (e.g., surgery, chemotherapy) [35,36] and disregard the effects of related interventions. Other studies assessed inpatient therapy with a small number of indicators and estimated 50 % lower hazard ratios associated with guideline-adherent treatment [37]. Woeckel et al. reproduced this result with a greater number of indicators but noted that the relationship between adherence and survival appears to be non-linear [17,18]. Indeed, socio-economic status (SES) seems to modify treatment effects, since social disparities in survival have been reported [38,39]. Hence, a systematic, positive, and linear relationship between adherence and survival cannot be replicated with incomplete multivariate models. In this sense, the present study is consistent with other reports [40,41].

Strengths of study
Data quality assessment prior to this study [19][20][21] ensured high data quality, epidemiological relevance, and reliable and valid survival estimates. Distinguishing between all selected patients and resident patients allowed confounding effects and related biases to be addressed in the survival analyses. The definition of quality indicators is based on "pathways of coherent decisions" and is superior to the rate-based/criterion-based methodology. For example, breast conserving surgery/mastectomy (BCS/MRM) together with irradiation (RAD) defines a compound therapy according to the guidelines [12]. Because this approach was applied to time-interval-specific guidelines, this study was able to identify the (unexpected) period effect.

Limitations of the study
Several of the 104 quality indicators did not include important variables necessary for guideline assessment. In particular, patients' preferences for treatment strategies are missing. Studies have shown that up to 50 % of patients disagree with physicians' treatment recommendations [42]. This comparatively high share of disagreement between patients (preferring mastectomy) and their physicians (favoring breast conserving therapy), in a sample recruited between 2001 and 2003, emphasizes that guideline deviations do not originate from medical experts alone. Additionally, some indicators refer to decisions and planned actions but not to actual "clinical performance". This limitation applies to chemo- and hormone therapies, whose time schedules strongly depend on the patients' physical condition. To address this general conceptual flaw, new categories such as "scientifically legitimate decisions" [31,32] or "justifiable guideline divergence" [43] seem more appropriate for relaxing the rigid distinction between guideline adherence and divergence.