Impact of the occurrence of a response shift on the determination of the minimal important difference in a health-related quality of life score over time

Background An important challenge in the longitudinal analysis of health-related quality of life (HRQOL) is the potential occurrence of a response shift (RS) effect. While the impact of the RS effect on the longitudinal analysis of HRQOL has already been studied, few studies have addressed its impact on the determination of the minimal important difference (MID). This study aims to investigate the impact of the RS effect on the determination of the MID over time for each scale of both the EORTC QLQ-C30 and QLQ-BR23 questionnaires in breast cancer patients.
Methods Patients with breast cancer completed the EORTC QLQ-C30 and the EORTC QLQ-BR23 questionnaires at baseline (time of diagnosis; T0), three months (T1) and six months after surgery (T2). Four hospitals and care centers participated in this study: the cancer centers of Dijon and Nancy and the university hospitals of Reims and Strasbourg. At T1 and T2, patients were asked to evaluate the change in their HRQOL during the last three months using the Jaeschke transition question. They were also asked to assess retrospectively their HRQOL level of three months earlier. The occurrence of the RS effect was explored using the then-test method, and its impact on the determination of the MID was explored using the anchor-based method.
Results Between February 2006 and February 2008, 381 patients were included, with a mean age of 58 years (SD = 11). For patients who reported a deterioration of their HRQOL level at each follow-up, an increase of the RS effect was detected between T1 and T2 in 13/15 dimensions of the QLQ-C30 questionnaire and in 4/7 dimensions of the QLQ-BR23 questionnaire. In contrast, in case of improvement, a decrease of the RS effect was observed in 8/15 dimensions of the QLQ-C30 questionnaire and in 5/7 dimensions of the QLQ-BR23 questionnaire. At T2, the MID became ≥ 5 points when taking into account the RS effect in 10/15 dimensions of the QLQ-C30 questionnaire and in 5/7 dimensions of the QLQ-BR23 questionnaire.
Conclusions This study highlights that the RS effect increases over time in case of deterioration and decreases in case of improvement. Moreover, taking the RS into account produces a reliable and significant MID.
Electronic supplementary material The online version of this article (doi:10.1186/s12955-016-0569-5) contains supplementary material, which is available to authorized users.


Background
In oncology clinical trials that include health-related quality of life (HRQOL) as an endpoint, the main objective is to assess the impact of the treatment on patients' HRQOL level over time. Consequently, a longitudinal assessment of HRQOL is desirable. The results of the longitudinal analysis of such data must be interpreted from both a statistical and a clinical point of view in order to be meaningful for both patients and clinicians [1,2]. The minimal important difference (MID) has been defined as the smallest change between two scores in a treatment outcome that a patient would identify as important [3][4][5][6].
For the European Organization for Research and Treatment of Cancer (EORTC) HRQOL questionnaires, the MID is generally fixed at 5 or 10 points for each score standardized on a 0-100 scale [2]. Nevertheless, this MID must be studied and determined for each HRQOL questionnaire and for each cancer site. This has already been done, for example, for the EORTC lung and brain cancer modules [7,8]. To our knowledge, it has not yet been done for the EORTC QLQ-BR23 module for breast cancer patients. Furthermore, the MID must not be ignored and must be taken into account in the interpretation of HRQOL results. Indeed, the longitudinal analysis of HRQOL remains complex, particularly because of the potential occurrence of a response shift (RS) effect, which characterizes the patient's process of adaptation to the illness and its treatment [9,10]. Thus, patients may not assess their HRQOL level with the same criteria over time.
The RS refers to a change in the meaning of HRQOL over time. The definition proposed by Sprangers and Schwartz consists of three components [9,10]:
-A recalibration: change in the respondent's internal standards of measurement;
-A reprioritization: change in the importance of the component domains that constitute HRQOL;
-A reconceptualization: redefinition of the concept of HRQOL.
Several methodological and statistical approaches have been proposed to characterize the occurrence of the RS effect, such as the then-test [9] or structural equation modeling [11]. The then-test consists of asking patients, after treatment, to report their current HRQOL level (post-test) and also, in retrospect, their pre-test level (then-test). The impact of the RS on longitudinal HRQOL analysis has also been studied in breast cancer patients [12]. However, few studies so far have focused on the impact of the RS effect on the determination of the MID [13]. The MID recommended for future studies could thus be under- or overestimated because of a potential RS effect.
The occurrence of the RS effect could affect the interpretation of change in HRQOL scores. In this case, the occurrence of the RS needs to be assessed to obtain a valid and reliable assessment of change over time. In particular, when longitudinal data are used to determine the MID, it is important to take this RS into account in order to assess the true change represented by the MID.
Many studies have aimed to estimate the MID without taking into account the occurrence of the RS [3,4]. To our knowledge, only one study has explored the impact of the RS on the determination of the MID. That study demonstrated that the recalibration component of the RS effect does not have an important effect in patients with multiple myeloma who respond to treatment, i.e. those for whom an HRQOL improvement was observed. However, the authors showed that the RS does have an important effect in case of deterioration of the patient's HRQOL level [13]. It therefore seems essential to study the impact of the RS effect in studies aiming to determine the MID, and to examine whether the RS affects patients who improve differently from those who deteriorate over time. Only two measurement times (at inclusion and after three months) were considered in the study of Kvam et al., which made it possible to detect the importance of the RS effect in case of deterioration or improvement of HRQOL, and its direction, after three months [13]. However, since the RS effect is a longitudinal process, it could be relevant to include more time points in order to study the change of the RS effect over time.
In this context, the objective of this work was to study the impact of the recalibration component of the RS effect on the determination of the MID in breast cancer patients between three measurement times using the EORTC QLQ-C30 cancer specific questionnaire and its breast cancer module QLQ-BR23.

Patients
Data came from a prospective, multicenter cohort study that included all women hospitalized for the diagnosis or treatment of primary breast cancer or for a suspicion of breast cancer. Patients who had other primary cancer sites were excluded. Patients already hospitalized or treated for breast cancer were not included. Written informed consent was obtained from all participants. The protocol was approved by the ethics committees ("Comité de Protection des Personnes"). The complete design of this study has been described in detail elsewhere [14].

Study design

HRQOL questionnaires
HRQOL was assessed using the EORTC QLQ-C30 cancer-specific questionnaire and its QLQ-BR23 breast cancer module. Three measurement times were used: at baseline (initial examination or initial hospitalization, T0), three months (T1) and six months later (T2).
The QLQ-C30 consists of 30 items measuring five functional scales (physical, role, emotional, cognitive and social functioning), a global health status (GHS), financial difficulties and eight symptom scales (fatigue, nausea and vomiting, pain, dyspnea, insomnia, appetite loss, constipation, diarrhea) [15]. One score is generated per dimension and standardized on a 0 to 100 scale, so that a high score reflects a high GHS, a high level of functioning or a high level of symptoms [16].
The QLQ-BR23 module is specific to breast cancer. It includes 23 items assessing four functional scales (body image, sexual functioning, sexual enjoyment, future perspectives) and four symptom scales (systemic therapy side effects, breast symptoms, arm symptoms, upset by hair loss) [17]. As for the QLQ-C30, one score is generated per dimension on a 0-100 scale, so that a high score represents a high level of functioning or a high level of symptoms.
Questionnaires were distributed to the patients by a clinical research assistant during hospitalization or after a consultation, or were sent by post.

Then-test assessment
In this study, the then-test method was used to detect changes in internal standards, namely the "recalibration" component of the RS [18].
At each follow-up time point, one prospective and one retrospective measurement were performed. For the retrospective measurement (then-test) at T1, patients were asked to re-evaluate their baseline HRQOL level (three months before). At T2 (six months), patients were asked to re-evaluate their HRQOL level at T1 (three months).

Assessment of change in HRQOL level
The anchor-based method was used to determine the MID according to the Jaeschke transition question [3]. At three (T1) and six months (T2), patients were asked to evaluate the change in their HRQOL over the last three months. The question was worded as follows: "During the past three months, do you consider your HRQOL:
-Did not change globally
-Deteriorated: very much, much, a little
-Improved: a little, much, very much"
Since the Jaeschke transition question was asked at T1 and then at T2, a patient could deteriorate between T0 and T1 and then improve between T1 and T2. Thus, patients in the "little worse" group at T1 could be in the "little better" group at T2.
To facilitate the interpretation of the results and to obtain sufficient numbers of patients in each category, we merged the two categories "very much" and "much" into a single category, yielding five response categories for the anchor (much better, little better, unchanged, little worse, and much worse).
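The merging of the seven raw responses into five anchor categories can be sketched as follows. This is an illustrative Python sketch only (the study itself used R); the raw response labels and the function name `collapse_anchor` are assumptions introduced for the example.

```python
# Hypothetical labels for the seven raw responses to the transition question.
# "very much" and "much" are merged on each side, giving five anchor categories.
COLLAPSE = {
    "deteriorated very much": "much worse",
    "deteriorated much": "much worse",
    "deteriorated a little": "little worse",
    "did not change": "unchanged",
    "improved a little": "little better",
    "improved much": "much better",
    "improved very much": "much better",
}


def collapse_anchor(response: str) -> str:
    """Map a raw transition-question response to one of the five categories."""
    return COLLAPSE[response.strip().lower()]


if __name__ == "__main__":
    print(collapse_anchor("Deteriorated a little"))
    print(sorted(set(COLLAPSE.values())))
```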

Statistical considerations and missing data
All dimensions of both the QLQ-C30 and QLQ-BR23 questionnaires were analysed except the hair loss dimension of the QLQ-BR23, because of missing data (few patients were concerned at the beginning of treatment).
Only patients with available data at each time measurement were included in the analyses.
All tests were performed at the 0.05 significance level with no adjustment for multiple testing. All p-values are given for information only, since the sample sizes for each test do not allow results with high statistical power. The analyses and tests were exploratory only: we were more interested in the clinical meaning of the differences than in their statistical significance.
Scores were calculated according to the recommendations of the EORTC scoring manual [15]: if at least half of the items per dimension were answered, the score was estimated on available items, i.e. considering that missing items were equal to the mean of answered items (simple imputation by the personal mean).
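As an illustration of this scoring rule, the sketch below computes one scale score with the "at least half of the items answered" rule and the linear 0-100 transformation of the EORTC scoring manual (reversed for functional scales; the GHS uses the non-reversed formula with a 7-point item range). It is a minimal Python sketch, not the study's R code; the function name and argument layout are assumptions.

```python
from typing import Optional, Sequence


def scale_score(items: Sequence[Optional[int]],
                item_range: int,
                functional: bool) -> Optional[float]:
    """Score one scale on 0-100; None marks a missing item or a missing score.

    item_range is the maximum minus the minimum possible item response
    (3 for items coded 1-4, 6 for the GHS items coded 1-7).
    """
    answered = [x for x in items if x is not None]
    # Scale is missing if fewer than half of its items were answered.
    if not answered or 2 * len(answered) < len(items):
        return None
    # Averaging the answered items is equivalent to imputing each missing
    # item with the patient's own mean on that scale.
    raw = sum(answered) / len(answered)
    scaled = (raw - 1) / item_range * 100
    # Functional scales are reversed so that a high score means high functioning.
    return 100 - scaled if functional else scaled


if __name__ == "__main__":
    print(scale_score([1, 2, None], item_range=3, functional=False))
    print(scale_score([4, None, None], item_range=3, functional=False))
```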
The missing data profile was explored in a previous study [19]. Missing data were considered to be missing at random.

Descriptive analysis
Baseline sociodemographics and clinical characteristics of the patients as well as baseline HRQOL scores were described using mean and standard deviation (SD) for continuous variables and frequencies with percentages for qualitative variables.

Detection of the recalibration effect of the RS
RS analyses were performed on patients with available scores at both the then-test and the corresponding pre-test. The mean differences between the prospective measure performed at T0 and the retrospective measurement performed at T1, as well as between the prospective measure performed at T1 and the retrospective measurement performed at T2, were calculated for each HRQOL score. Results were presented according to each response category of the anchor question in order to detect the magnitude of the RS effect according to the observed changes.
As the MID is defined as the smallest change between two scores, we were particularly interested in the two categories "little worse" and "little better" to interpret the results.
We looked at the direction of the RS effect: a positive (respectively, negative) value of RS indicates that patients had overestimated (respectively, underestimated) their HRQOL level, their functional or symptomatic level at the previous measurement time.
Then, we examined the change in the magnitude of the RS effect over time, that is, whether the RS effect had increased or decreased over time in absolute value.
Finally, the direction of the RS effect over time was also analyzed, i.e. whether the RS remained positive or negative at both follow-up time points.
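The three steps above (mean RS, change in magnitude, stability of direction) can be sketched as follows. This Python sketch is illustrative only; the sign convention (pre-test minus then-test, so that a positive value means the earlier level had been overestimated) is inferred from the description above, and the function names are assumptions.

```python
from statistics import mean


def rs_effect(pre_test, then_test):
    """Mean of (pre-test - then-test) over patients with both scores available."""
    diffs = [p - t for p, t in zip(pre_test, then_test)
             if p is not None and t is not None]
    return mean(diffs)


def describe_rs_change(rs_t1, rs_t2):
    """Return (change in magnitude, whether the direction stayed the same)."""
    if abs(rs_t2) > abs(rs_t1):
        magnitude = "increased"
    elif abs(rs_t2) < abs(rs_t1):
        magnitude = "decreased"
    else:
        magnitude = "unchanged"
    same_direction = (rs_t1 > 0 and rs_t2 > 0) or (rs_t1 < 0 and rs_t2 < 0)
    return magnitude, same_direction


if __name__ == "__main__":
    # Mean RS values reported in the Results for the "little worse" category.
    print(describe_rs_change(2.42, 10.9))     # insomnia
    print(describe_rs_change(5.04, -11.17))   # body image
```

With the insomnia and body image values reported in the Results, the first call describes an increase with an unchanged direction and the second an increase with a reversed direction.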
The impact of sample size was indicated by calculation of the 95% confidence intervals (95%CI).
The p-values calculated by the non-parametric Wilcoxon paired test were also presented to indicate the statistical significance of the RS.
The effect size (ES) was calculated to detect the magnitude of the recalibration component of the RS effect for each category of the anchor. The ES represents the mean change between the pre-test and the then-test divided by the standard deviation (SD) of the pre-test score. We used Cohen's generally accepted criteria for interpreting the magnitude of the ES in absolute value: an ES between 0.20 and 0.50 was considered a small change, between 0.50 and 0.80 a moderate change, and greater than 0.80 an important change [20].
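A minimal Python sketch of this effect size is given below. It assumes the sample SD of the pre-test scores of the paired patients and Cohen's conventional cut-offs of 0.2, 0.5 and 0.8; the "negligible" label for values below 0.2 and the function names are assumptions made for the example.

```python
from statistics import mean, stdev


def effect_size(pre_test, then_test):
    """Mean (pre-test - then-test) divided by the SD of the pre-test scores.

    Only patients with both scores are used; the SD is the sample SD
    (an assumption, the text does not specify the denominator).
    """
    pairs = [(p, t) for p, t in zip(pre_test, then_test)
             if p is not None and t is not None]
    pre = [p for p, _ in pairs]
    diffs = [p - t for p, t in pairs]
    return mean(diffs) / stdev(pre)


def cohen_label(es):
    """Classify the absolute ES with Cohen's conventional cut-offs."""
    size = abs(es)
    if size >= 0.8:
        return "important"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


if __name__ == "__main__":
    es = effect_size([40, 50, 60], [30, 45, 60])  # made-up scores
    print(es, cohen_label(es))
```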

Observed and adjusted MID
MIDs were determined by calculating the observed changes (i.e. without taking into account the RS effect) and the adjusted changes (i.e. taking into account the RS effect), given by post-test minus pre-test and post-test minus then-test respectively. Observed changes for each score were estimated on patients with the corresponding score available at both the post-test and pre-test measurement times. Adjusted changes for each score were estimated on patients with the corresponding score available for both the post-test and the then-test. In both cases, the mean differences were calculated for each HRQOL score according to each response category of the anchor question.
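The two definitions can be illustrated with the short Python sketch below, which computes the mean observed change (post-test minus pre-test) and the mean adjusted change (post-test minus then-test) per anchor category, each on the patients who have the two required scores. The record layout (keys `anchor`, `pre`, `then`, `post`) and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_changes(records):
    """Return (observed, adjusted) mean changes keyed by anchor category.

    Each record is a dict with keys 'anchor', 'pre', 'then' and 'post';
    None marks a missing score.
    """
    observed = defaultdict(list)
    adjusted = defaultdict(list)
    for r in records:
        if r["post"] is None:
            continue
        if r["pre"] is not None:
            observed[r["anchor"]].append(r["post"] - r["pre"])
        if r["then"] is not None:
            adjusted[r["anchor"]].append(r["post"] - r["then"])
    return ({k: mean(v) for k, v in observed.items()},
            {k: mean(v) for k, v in adjusted.items()})


if __name__ == "__main__":
    # Made-up scores for illustration, not study data.
    data = [
        {"anchor": "little worse", "pre": 70, "then": 80, "post": 65},
        {"anchor": "little worse", "pre": 60, "then": None, "post": 50},
        {"anchor": "little better", "pre": 50, "then": 45, "post": 55},
    ]
    print(mean_changes(data))
```

Note that for a patient with all three scores, the adjusted change equals the observed change plus the RS (pre-test minus then-test).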
The impact of sample size was indicated by calculation of the 95%CI of the mean difference.
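A confidence interval of this kind can be sketched as below. The text does not state which formula was used, so this Python sketch assumes a normal approximation (mean ± z × SD/√n); with the small numbers of patients per anchor category, a Student t quantile would give a wider interval. The function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(diffs, level=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(diffs)
    m = mean(diffs)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    half_width = z * stdev(diffs) / sqrt(n)
    return m, m - half_width, m + half_width


if __name__ == "__main__":
    # Made-up individual changes for one anchor category.
    print(mean_ci([-10, -5, 0, -5]))
```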
The global range for the observed and adjusted MID for all dimensions was then reported by questionnaire and measurement times.
Results of the observed and adjusted MID were finally compared to the 5-point MID threshold, which is widely used for the EORTC HRQOL questionnaires [2]. All analyses were performed using R statistical software (version 3.2.1) [21].
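The range reporting and the threshold comparison can be sketched as follows, in Python for illustration only (the analyses themselves were run in R). The sketch assumes that MID values are compared in absolute value; the function names and the example values are hypothetical and are not study results.

```python
def mid_range(mid_by_dimension):
    """Smallest and largest MID (in absolute value) across dimensions."""
    sizes = [abs(v) for v in mid_by_dimension.values()]
    return min(sizes), max(sizes)


def meets_threshold(mid_by_dimension, threshold=5.0):
    """Dimensions whose MID reaches the threshold in absolute value."""
    return sorted(d for d, v in mid_by_dimension.items() if abs(v) >= threshold)


if __name__ == "__main__":
    # Made-up mean changes for the "little worse" category.
    adjusted = {"pain": -18.0, "insomnia": -6.5, "role functioning": -4.0}
    print(mid_range(adjusted))
    print(meets_threshold(adjusted))
```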

Patients
Between February 2006 and February 2008, 381 patients with confirmed or suspected breast cancer were included in the four participating centers (Fig. 1). Questionnaire completion rates differed between centers because of logistical problems. Mean age was 58.4 (SD = 11) years. Three hundred and forty (89.2%) patients had confirmed breast cancer. The clinical and socio-demographic characteristics of all patients are described in Table 1.

Detection of the recalibration component of the response shift effect
Tables 2 and 3 present the results of RS effect at T1 and T2 for the QLQ-C30 and QLQ-BR23 questionnaires respectively (see Additional file 1: Tables S1 and S2 for complementary results).
For 18 of the 22 dimensions analysed across both questionnaires, an increase in the magnitude of the RS effect was observed in case of a small deterioration over time, i.e. between T0 and T1 and then between T1 and T2 (category "little worse" for the anchor at each follow-up time point).
To illustrate:
-For the insomnia dimension of the QLQ-C30 questionnaire, the mean RS effect was equal to 2.42 at three months, reflecting that patients had overestimated their baseline insomnia level, taking the retrospective measure at three months as the reference. The RS effect became larger after six months, increasing to a mean of 10.9, which reflects an overestimation of the insomnia level at T1. Thus, the magnitude of the RS effect increased at six months as compared to three months, with a positive direction of the RS at both time points.
-Regarding the body image dimension of the QLQ-BR23 module, the magnitude of the mean RS effect also increased between the follow-up time points, from 5.04 to 11.17 in absolute value, but the direction of the RS was reversed: the mean RS was equal to 5.04 at three months and to −11.17 after six months, the latter reflecting an underestimation of the body image pre-test score at T1.
The ES indicated a moderate RS effect in case of deterioration after six months for physical, role, cognitive, social and sexual functioning as well as for the pain, appetite loss and systemic therapy side effects dimensions (ES > 0.5), and an important RS effect for fatigue (ES > 0.8). Furthermore, statistical significance of the RS effect was indicated (for information only, since the sample sizes do not allow high statistical power).
In case of deterioration, the same direction of the RS between T1 and T2 was observed for 11/22 dimensions of the QLQ-C30 and QLQ-BR23 questionnaires. In case of improvement, the direction of the RS effect remained the same for only 3/22 dimensions (namely, the pain, dyspnea and appetite loss dimensions).

Observed and adjusted MID
Tables 4 and 5 present the results of the observed and adjusted MID at 3 and 6 months for the QLQ-C30 and QLQ-BR23 respectively (see Additional file 1: Tables S3 and S4 for complementary results). Based on the scales that showed both an increase of the RS effect in case of deterioration and a decrease of the RS effect in case of improvement (GHS, role, cognitive, social and sexual functioning, pain, insomnia, diarrhea, body image, breast and arm symptoms), the minimal and maximal values of the observed and adjusted MID were calculated in case of deterioration and improvement of HRQOL after three and six months (Table 6). The diarrhea scale was excluded from this analysis because it would have distorted the results, owing to the number of scores equal to zero. The financial difficulties and appetite loss dimensions were also added, since a marked increase of the RS effect was observed in case of deterioration and a relatively small increase in case of improvement.
A comparison of the observed and adjusted changes in case of a small deterioration (category "little worse") to a threshold of 5 points for both the QLQ-C30 and QLQ-BR23 questionnaires after three and six months is presented in Table 7.
The variations of the RS effect between 3 and 6 months are shown in Figs. 2 and 3 for several HRQOL dimensions. For example, Fig. 2 illustrates an increase of the mean RS effect in case of a small deterioration and a decrease in case of a small improvement for most dimensions of the QLQ-C30. Consistent results were also found for most dimensions of the QLQ-BR23 questionnaire and are illustrated in Fig. 3.

Discussion
The objective of this study was to explore the impact of the occurrence of the response shift effect on the determination of the MID over time in breast cancer patients using both the QLQ-C30 and QLQ-BR23 questionnaires.
Both an increase of the RS effect in case of deterioration and a decrease of the RS effect in case of improvement were observed for 7/15 dimensions of the QLQ-C30 questionnaire (global health status (GHS), role, cognitive and social functioning, pain, insomnia, diarrhea) and 4/7 dimensions of the QLQ-BR23 questionnaire (body image, sexual function, breast and arm symptoms). This indicates that the RS effect occurs differentially according to the change in patients' HRQOL level over time (i.e. deterioration or improvement of HRQOL).
For 13/15 dimensions of the QLQ-C30 questionnaire (except the emotional functioning and the constipation, diarrhea and insomnia dimensions) and 5/7 dimensions of the QLQ-BR23 questionnaire (except the sexual enjoyment and the future perspectives dimensions), a decrease of the RS effect was observed in case of improvement, with corresponding ES values close to zero and p-values ≥ 0.05. These values indicate that the impact of the RS effect on the determination of the MID became negligible in case of improvement after six months.
Regarding the QLQ-C30 questionnaire, the observed MID in case of deterioration after three months was between 5 and 26 points: a mean difference of 5 points was sufficient to conclude that the difference was clinically significant at T1. After taking into account the RS effect, the minimal difference considered as important for patients increased to 8 points. Furthermore, the adjusted MID ranged from 8 to 19 points, a narrower range than that of the observed changes (without taking into account the RS effect). At T2, however, the observed MID was between 0.5 and 10 points, whereas the adjusted MID was between 4 and 18 points. Thus, if we had not taken into account the occurrence of the RS effect, we could have wrongly concluded that a deterioration of 0.5 point is the MID for the patients, which seems very low. After taking into account the RS effect, the mean difference was between 4 and 18 points, which seems more relevant than the previous interval.
Comparing the results of adjusted MID in case of deterioration at T1 and T2, we find that the MID was between 8 and 19 points at T1 then became between 4 and 18 points at T2, which means that a smaller change of HRQOL can be considered as clinically significant to the patients. In other words, a difference of 4 points out of 100 was not enough to say that this difference was significant after three months; but the same difference became significant to the patient after 6 months. In addition, consistent results have been found for the QLQ-BR23 questionnaire in case of deterioration concerning the observed and the adjusted MID.
Regarding HRQOL improvement, no impact of the RS on the determination of the MID was observed. In contrast, the RS effect seemed to strongly affect the MID for deterioration. To illustrate, after six months the minimum observed MID was smaller than one point in case of deterioration for the QLQ-C30 (MID: 0.5-10) and the QLQ-BR23 (MID: 0.4-4), whereas after taking into account the RS effect the minimum MID was equal to 4 points for the QLQ-C30 and to 6 points for the QLQ-BR23. Thus, without taking into account the occurrence of the RS effect, we could wrongly conclude that a difference of less than 1 point is clinically significant to the patients.
For patients whose HRQOL improved, the minimum observed MIDs for the two questionnaires after three and six months were close to zero and remained close to zero after taking into account the RS effect, except for the observed MID after three months for the QLQ-BR23 questionnaire, which was equal to 2 points. We can conclude that a very small improvement over time can be considered as important for the patient. For 13 of the 15 scales of the QLQ-C30 questionnaire and all scales of the QLQ-BR23, the observed and adjusted MIDs were greater than 5 points after three months. After six months, in contrast, 11/15 scales of the QLQ-C30 and 5/7 scales of the QLQ-BR23 had an observed MID smaller than 5 points, which became greater than 5 points after taking into account the RS effect. Thus, the RS effect seems to have an important impact on the determination of the MID, notably after six months.
Our study confirms the earlier results reported by Kvam et al. between two measurement times, which showed that the RS has an important impact on the results in case of deterioration and is unimportant in case of improvement [13]. However, this previous study was limited to two measurement times.
Three measurement times were considered in our study, allowing us to evaluate the change of the RS effect and to identify specifically the dimensions for which an increase or a decrease of the RS effect was observed over time. Moreover, using three measurement times allowed us to compare the MID at three and six months and to show the importance of taking into account the RS effect in order to obtain a narrower MID interval and a MID that is significant after six months when compared with the five-point threshold.
Another strength of our study was the consideration of all dimensions of both the QLQ-C30 and the QLQ-BR23 breast cancer module. The study of Kvam et al. was conducted in patients with multiple myeloma, and the QLQ-MY20 multiple myeloma module [22] had not yet been validated when that study was designed, which justifies its restriction to the QLQ-C30.
Twenty-two dimensions were analysed in our study, which provided a large amount of information on the dimensions affected by the RS effect and on trends over time.
A limitation of our study is the use of the then-test method to assess the occurrence of the RS effect. This method must be planned when the study is designed and may be subject to recall bias. Moreover, it focuses on the recalibration component of the RS; we therefore recommend further research to determine the impact of the other components of the RS effect (reprioritization and reconceptualization) on the determination of the MID. Structural equation modeling may be preferable to the then-test method for detecting all three components of the RS in an HRQOL analysis. However, the Oort procedure, which is based on structural equation modeling to detect the RS effect, was developed for and mainly applied to the SF-36 questionnaire [11,23]. Research is still ongoing to adapt this procedure to the EORTC questionnaires [24].
Item response theory (IRT) may also be valuable in future research to assess the components of the RS and its impact on the determination of the MID [19].
Although the use of three measurement times was useful in this study, recall bias may have affected patients' answers over time. In addition, using only the anchor-based approach to compute the MID may bias the findings, as standard practice is to combine anchor-based and distribution-based methods. The limited number of patients per anchor category is also a limitation of our study. Hence, further studies with a larger number of patients per anchor category are needed to study the RS trends.
The heterogeneity of our data is considered also as a limitation of this study.