Use of the European Organisation for Research and Treatment of Cancer multiple myeloma module (EORTC QLQ-MY20): a review of the literature 25 years after development

The European Organisation for Research and Treatment of Cancer Quality of Life Multiple Myeloma Questionnaire (EORTC QLQ-MY20) was developed in 1996 to assess health-related quality of life (HRQoL) in patients with multiple myeloma. Since its development, new therapies have prolonged survival in patients with myeloma, and new combination agents are likely to impact HRQoL outcomes and their measurement. The aim of this review was to explore the use of the QLQ-MY20 and reported methodological issues. An electronic database search was conducted (1996-June 2020) to identify clinical studies/research that used the QLQ-MY20 or assessed its psychometric properties. Data were extracted from full-text publications/conference abstracts and checked by a second rater. The search returned 65 clinical and 9 psychometric validation studies. The QLQ-MY20 was used in interventional (n = 21, 32%) and observational (n = 44, 68%) studies, and the publication of QLQ-MY20 data in clinical trials increased over time. Clinical studies commonly included relapsed patients with myeloma (n = 15, 68%) and assessed a range of combination therapies. QLQ-MY20 subscales (disease symptoms [DS], side effects of treatment [SE], future perspectives [FP], body image [BI]) were defined as secondary (n = 12, 55%) or exploratory (n = 7, 32%) trial endpoints, particularly DS (n = 16, 72%) and SE (n = 16, 72%). Validation articles demonstrated that all domains performed well regarding internal consistency reliability (>0.7), test-retest reliability (intraclass correlation coefficient ≥0.85), and internal and external convergent and discriminant validity. Four articles reported a high percentage of ceiling effects in the BI subscale; all other subscales performed well regarding floor and ceiling effects. The EORTC QLQ-MY20 remains a widely used and psychometrically robust instrument.
While no specific problems were identified from the published literature, qualitative interviews are ongoing to ensure new concepts and side effects are included that may arise from patients receiving novel treatments or from longer survival with multiple lines of treatment.


INTRODUCTION
Multiple myeloma (MM) is a haematological cancer that affects multiple organs and is associated with complex symptoms [1]. However, due to advances in treatment options, MM survival rates have significantly improved in the past 25 years [2][3][4]. Despite the constantly evolving treatment landscape for MM, it remains an incurable and progressive disease that requires either continuous or intermittent therapies to maintain disease stability and sustain or prolong survival [5].
Disease symptoms, in addition to treatment side effects caused by multiple lines of therapy, can severely impact patients' wider health-related quality of life (HRQoL). For example, fatigue and pain are physical symptoms commonly reported by patients with myeloma that significantly impair HRQoL [6,7]. In addition to extended survival, it is important to understand how new and combination treatments may affect patients' lives; it is therefore recognised that patient-reported outcome (PRO) measures are vital in clinical trials and in the management of MM [8]. [9]. The original module, the QLQ-MY24, released in 1996, included 4 additional items under the domain of Social Support (SS), which was subsequently removed due to observed ceiling effects [10]. The QLQ-MY20 module is used in conjunction with the EORTC Core Quality of Life Questionnaire (QLQ-C30), which is designed for use in oncology patients more generally. The MM module has been translated into over 70 language versions [11], is the MM-specific measure most widely used globally and is one of the most extensively validated instruments for use in MM clinical research [10,12].
Since the module's development, the treatment for MM has changed [13]. The original validation of the QLQ-MY20 was largely in newly diagnosed patients and the module was focused on the expected side effects of conventional chemotherapy and steroids when it was originally developed [9]. The conventional chemotherapy in 1999 was mainly melphalan, cyclophosphamide, vincristine and doxorubicin. Although it is recognized that patients with myeloma can be treated with a variety of different chemotherapy drugs and regimens, it was felt that the side effects of conventional chemotherapy and steroids may more adversely affect the HRQoL of patients for a longer period of time. However, after 1999, non-chemotherapy treatments (proteasome inhibitors, immunomodulatory drugs, monoclonal antibodies and other novel agents) have been introduced. The increase in survival rates, coupled with the rapid progression in therapeutic options for patients with myeloma, has implications for the HRQoL outcomes and side effects for this population. Osborne et al. published a review in 2012 identifying issues important to patients and examining whether existing instruments comprehensively cover the current treatment landscape and patient experience [12]. While the QLQ-C30 and QLQ-MY20 were acknowledged as the instruments that had good conceptual coverage and had undergone the most extensive validation in patients with myeloma, no instruments were identified as covering all issues relevant to patients. This signifies the need for an MM module update that represents HRQoL while taking into account current therapy issues and the HRQoL concerns of patients today.
The EORTC guidelines provide a four-phase framework for updating existing modules [14]. As part of Phase I (generation of QoL issues), a literature review assessing the use of the QLQ-MY20 and any reported methodological issues was performed. This article details that literature review, which aimed to explore:
1. In which types of clinical studies the module has been used
2. To what extent the module has been used in both newly diagnosed and relapsed patients
3. The types of treatments/therapies the module has been used to assess
4. How and where the module-related endpoint is positioned within randomised controlled trials (RCTs)
5. How the module results are reported, and the prominence given to these results
6. The statistical results from QLQ-MY20 subscales in RCTs
7. PRO limitations identified from interventional studies and validity/reliability issues raised in psychometric validation studies

Abstracts were included if they reported a clinical study of any design that generated data using the QLQ-MY20/24, or a study to evaluate the QLQ-MY20/24, including the assessment of the psychometric properties of the module (validation study). Only abstracts reporting original research were included; thus, reviews, conference proceedings and book chapters were excluded. The full-text publications were sought for all references meeting these criteria. When a single study was reported across multiple references, only the most comprehensive or relevant publication (e.g., HRQoL focused) was retained. Clinical studies were categorized as interventional (i.e., RCTs, single-arm clinical trials, cross-over clinical trials) or observational (i.e., cross-sectional and longitudinal/cohort) study designs.

Data extraction
General information (e.g., author, title, and year and location of the study) was collected for all studies. For all clinical studies, information about the disease stage (i.e., newly diagnosed/relapsed) and other clinical outcome assessments (COAs, including patient-reported outcomes) used was extracted. For trials (RCTs, single-arm and cross-over), further information about the study design, reporting and presentation of results was extracted. Further in-depth extraction of RCTs was performed, including the type of statistical analysis applied to QLQ-MY20 data and comparisons between groups. For validation studies, data on the instrument structure and data distribution, reliability, validity and ability to detect change/interpretation of change scores were extracted.

Interrater agreement
Data extraction was initially performed by one reviewer; the first reviewer's extractions were subsequently checked by a second reviewer. Any cases of disagreement or uncertainty were then discussed, and consensus was established in all instances by the study team based on the inclusion criteria. All extracted statistical data were checked by a statistician to ensure accuracy.

RESULTS
The search yielded 502 unique records (Fig. 1), of which 74 publications were taken forward for review (33 full-text articles and 41 conference abstracts). Table 1 provides an overview of the study designs in which the QLQ-MY20 was used and the countries with which the author teams were affiliated. The studies had a wide international spread and in recent years there has been a growth in scientific publication on the use of the QLQ-MY20 in both clinical and instrument validation studies. There has been an increase in the use of the QLQ-MY20 in RCTs, single-arm clinical trials and cross-sectional observational studies over time.

Study designs where QLQ-MY20 is used
QLQ-MY20 instrument use in observational and interventional studies
When stated, interventional and observational studies included exclusively relapsed patients (n = 24/43, 55.8%, 14 of which were interventional), exclusively newly diagnosed patients (n = 10/43, 23.3%, seven of which were interventional), or a mix of newly diagnosed and relapsed patients (n = 9/43, 20.9%, none of which were interventional). Over time, both observational studies and clinical trials increasingly utilized the QLQ-MY20 with samples of relapsed patients and mixed samples of newly diagnosed and relapsed patients.
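
The population breakdown reported above can be reproduced with a few lines of arithmetic. The sketch below uses only the counts stated in the text (24, 10 and 9 studies out of the 43 that stated their population); the dictionary labels and the helper name `percentage` are illustrative choices, not terminology from the review.

```python
# Counts taken from the text above: studies that stated their patient population.
counts = {
    "relapsed only": 24,
    "newly diagnosed only": 10,
    "mixed newly diagnosed and relapsed": 9,
}

# Denominator: the number of studies that stated the population at all.
total = sum(counts.values())


def percentage(n, denominator):
    """Return n as a percentage of denominator, rounded to one decimal place."""
    return round(100 * n / denominator, 1)


shares = {label: percentage(n, total) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]}/{total} = {share}%")
```

The three shares are computed against the same denominator, so they describe only the studies that reported their population, not all 65 clinical studies.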

QLQ-MY20 instrument use in interventional trials
Trends over time (for the reporting periods 2006-2010, 2011-2015, and 2016-2020) were assessed across interventional studies and four notable trends were observed. Over time, the proportion of RCTs, relative to single-arm and cross-over trials, increased from n = 0 between 2006 and 2010 to n = 5/7 between 2011 and 2015 and n = 10/13 between 2016 and 2020. Similarly, the number and proportion of trials utilizing a sample of patients who had experienced their first or subsequent relapse, relative to being newly diagnosed, increased over time from n = 1/2 between 2006 and 2010 to n = 4/7 between 2011 and 2015, and n = 9/13 between 2016 and 2020. The average QoL sample size increased from 144 between 2006 and 2010 to 479 between 2011 and 2015 and 465 between 2016 and 2020. In recent years, more questionnaires have also been used in conjunction with the QLQ-MY20; between 2006 and 2010 only two additional questionnaires were used alongside the QLQ-MY20, whereas five were used between 2016 and 2020. No differences were observed in the types of treatments/therapies the QLQ-MY20 has been used to assess, the endpoint hierarchy that the QLQ-MY20 was selected for, the study phase it was used in or the presentation of QoL results in the form of tables, figures and/or in text. The review of interventional study papers highlighted the main limitations with the PRO instruments or analysis/results as reported by authors (Table 2). Some issues generally affect PROs rather than being specific to the QLQ-MY20, such as differential dropout or poor completion rates potentially biasing the analysis, low baseline levels of symptoms limiting the opportunity to show improvement, single-arm study designs, short-term PRO data collection, and a lack of standardization in the collection and analysis of PROs across trials, which limits comparison of results across studies.
Issues raised that may be more specific to the QLQ-MY20 were the need for thresholds for meaningful change at the individual patient level, the need for consistency across studies in definitions of meaningful change, a discrepancy between patient-reported 'tingling hands and feet' and clinician-reported peripheral neuropathy events, a higher incidence of adverse events (AEs) or more severe AEs not translating into an impact on the PRO scores, and a potential lack of sensitivity of current questions to pick up variations in HRQoL depending on the treatment administered. Another paper suggested that elements such as dosing convenience were currently not adequately measured by the available PROs. Table 3 summarises the results from the 15 RCTs with respect to comparisons of QLQ-MY20 scores between treatment groups. The statistical significance of any mean difference comparisons between groups and any time to deterioration (TTD) comparisons between groups is reported.

Role of QLQ-MY20 alongside clinical endpoints in RCTs
Most trials evaluated the meaning of the PRO results in context with the clinical results. Five of the 15 trials compared triplet versus doublet combination therapies. It was common in these studies for no statistically significant differences between treatment groups to be observed and for authors to interpret this as a positive result, demonstrating that the addition of an agent to the combination did not impact on HRQoL. Four studies reported statistically significant differences between groups for the SE subscale (lenalidomide ( ). One study reported longer time to deterioration for one arm for the DS subscale (once weekly vs twice weekly). One study reported longer time to deterioration for the SE subscale (Kd vs Vd). Another study reported differences between arms with respect to FP at later timepoints (IRd vs Rd). One small study [24] noted clinically relevant differences between cyclophosphamide-bortezomib-dexamethasone (VCD) plus placebo and VCD plus clarithromycin for DS and SE, and statistically significant differences with respect to BI.
In addition to these formal comparisons between treatment groups the RCTs also reported the proportion of patients with improved/stable/worsened QLQ-MY20 scores, association of clinical endpoints (response, time to progression and toxicity) with the QLQ-MY20 scales and the effect of age on HRQoL benefit.

DISCUSSION
The objective of this literature review was to review the use of the QLQ-MY20, since its first release 25 years ago, as the first validated module for patients with myeloma designed to be used with the QLQ-C30. There were a few drivers for this review. At the time of the original validation study, the majority of clinical trials were in newly diagnosed patients and there were limited data for validation of the QLQ-MY20 in relapsed/refractory patients. Since the original publication of the QLQ-MY20, the treatment landscape has changed dramatically and patients with myeloma now undergo multiple lines of treatment and relapses. We wanted to use this review to see whether the use of the questionnaire in relapsed patients has increased accordingly. The review aimed to summarise the range of studies in which the questionnaire has been reported, how the data from the QLQ-MY20 were reported and how the results impacted on the evaluation of the treatments in the studies alongside clinical endpoints. We also wanted to collate any further psychometric evaluations of the QLQ-MY20 to see whether any issues have emerged as the use of the questionnaire changed.
Seventy-four studies that used the QLQ-MY20 were reviewed following screening, comprising 15 RCTs, six single-arm or cross-over trials, 44 observational studies and nine instrument validation studies, indicating diverse and extensive use of the QLQ-MY20 in several different clinical settings and investigations. The review of the published literature did not highlight any specific problems with the QLQ-MY20; however, qualitative interviews are ongoing to further explore the patient experience of symptoms and side effects of novel treatments. A revised version of the QLQ-MY20 is therefore warranted to ensure all concepts of interest are captured; concepts assessed by the additional COAs reported should be explored further in Phases I and II (generation of QoL issues and construction of the item list) of modular development and considered for inclusion in the updated version of the QLQ-MY20.
The RCTs highlighted that often no differences between treatments were observed with respect to the QLQ-MY20 subscales, and that authors often concluded this was a desirable outcome, especially regarding the SE subscale (e.g., demonstrating that adding a further agent to a combination regimen does not have a detrimental impact on QoL). As new treatment regimens and new combination therapies continue to be developed, this should be a key consideration at the design stage for an RCT. In such cases the QoL comparisons should be framed as non-inferiority rather than superiority comparisons, with a sample size sufficient to declare non-inferiority where applicable. It is also important for robust meaningful change thresholds to be determined so that non-inferiority margins can be defined. To date there has been one study on deriving meaningful change [31], but further development of these thresholds may be required. The RCT data also supported the QLQ-MY20 subscales being related to clinical outcomes and supporting and supplementing the conclusions from the clinical endpoints. A number of studies investigated the relationship of the QLQ-MY20 scales with clinical outcomes such as time to progression and response.
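
To make the non-inferiority framing concrete, the following is a minimal sketch of how such a comparison might be checked for a symptom-type subscale. Everything in it is an assumption for illustration: the function name `noninferiority_check`, the scores, the margins of 10 and 2 points, and the use of a simple normal approximation. It assumes a scale on which higher scores are worse, and it is not the analysis used in any of the reviewed trials, which would typically rely on a pre-specified margin and a more suitable model for longitudinal data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def noninferiority_check(new_arm, control_arm, margin, alpha=0.05):
    """Non-inferiority check for a scale where HIGHER scores are WORSE.

    Non-inferiority is declared when the upper limit of the two-sided
    (1 - alpha) confidence interval for mean(new) - mean(control) lies
    below the pre-specified margin. Uses a normal approximation, which
    is a simplification for small samples.
    """
    diff = mean(new_arm) - mean(control_arm)
    std_err = sqrt(
        stdev(new_arm) ** 2 / len(new_arm)
        + stdev(control_arm) ** 2 / len(control_arm)
    )
    z = NormalDist().inv_cdf(1 - alpha / 2)
    upper = diff + z * std_err
    return {
        "difference": diff,
        "upper_limit": upper,
        "non_inferior": upper < margin,
    }


# Hypothetical subscale scores (0-100) for two trial arms; not real trial data.
new_arm = [10, 15, 20, 25, 30, 15, 20, 25]
control_arm = [12, 18, 22, 24, 28, 16, 20, 26]

# Hypothetical margin of 10 points; in practice the margin would be derived
# from an established meaningful change threshold.
result = noninferiority_check(new_arm, control_arm, margin=10)
print(result)
```

The same data can fail the check under a stricter margin, which is why the choice of margin, and therefore the meaningful change threshold behind it, drives both the conclusion and the required sample size.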
Indicative of the expansion of the treatment portfolio and changing prognosis for patients, the proportion of RCTs using the QLQ-MY20 increased over time from n = 0 in the first 5 years to n = 10/13 in the last 5 years. The proportion of trials in patients following their first or subsequent relapse, relative to being newly diagnosed, increased over time from n = 1/2 in the first 5 years to n = 9/13 in the last 5 years. Over these time periods there were no observed trends for QoL endpoints to move up the hierarchy; however, this could be due to the inevitable time lag between research and publication of findings. Similarly, there were no trends or improvements in the reporting of QLQ-MY20 results in tables/figures rather than text alone; generally, the reporting of the QLQ-MY20 included tables and/or figures throughout the period.
There were a few instances where limitations of the QLQ-MY20 were highlighted by individual papers. One issue was the need for work on meaningful change thresholds for the QLQ-MY20. Although this has since been addressed by Sully et al. [31], more studies in this area would be beneficial in the future. Some studies used an additional peripheral neuropathy questionnaire alongside the QLQ-MY20 and one noted a discrepancy between the QLQ-MY20 item 'tingling hands and feet' and clinician-reported peripheral neuropathy, which could indicate the need for more detailed items in the QLQ-MY20 on this side effect. Amongst the psychometric studies, the instrument performed consistently well. One potential issue found in some studies was a ceiling effect for the BI subscale; this may warrant further investigation and may be specific to certain populations.
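
As an illustration of what a ceiling effect means in practice, the sketch below computes the share of respondents scoring at the minimum and maximum of a 0-100 scale. The scores are hypothetical, the function name `floor_ceiling` is an illustrative choice, and the 15% flagging threshold is an assumption based on a commonly used rule of thumb rather than a criterion stated in the reviewed validation studies. It also assumes that, for BI, the maximum score represents the best possible outcome.

```python
def floor_ceiling(scores, minimum=0.0, maximum=100.0):
    """Return the proportions of scores at the scale minimum and maximum."""
    n = len(scores)
    at_minimum = sum(1 for s in scores if s == minimum) / n
    at_maximum = sum(1 for s in scores if s == maximum) / n
    return at_minimum, at_maximum


# Hypothetical single-item subscale scores on a 0-100 scale; not real data.
bi_scores = [100.0, 100.0, 66.7, 100.0, 33.3, 100.0, 0.0, 100.0, 66.7, 100.0]

floor_share, ceiling_share = floor_ceiling(bi_scores)

# Assumed rule of thumb: flag a scale when more than 15% of respondents
# sit at one extreme, since further improvement cannot be detected there.
THRESHOLD = 0.15
ceiling_flagged = ceiling_share > THRESHOLD

print(f"floor: {floor_share:.0%}, ceiling: {ceiling_share:.0%}, "
      f"ceiling flagged: {ceiling_flagged}")
```

A single-item subscale has few possible score values, so a concentration of respondents at the best score is more likely than on a multi-item scale.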
Potential limitations of our study include how comprehensively it captures usage of the QLQ-MY20. Our search will have identified studies reporting results from the QLQ-MY20, but we acknowledge that it will have missed any studies that used the instrument but did not publish results from it. There will also be key multiple myeloma trials not in this review because they used only the QLQ-C30 or a different PRO. Regardless, across the broad range of studies in which the QLQ-MY20 has been used, we have shown some of the trends over time in terms of patient populations and study designs.
In conclusion, the QLQ-MY20 has been shown to perform well psychometrically since its initial validation. The QLQ-MY20 scales have been supportive of clinical endpoints in RCTs and have been used to understand patients' QoL alongside improved response and time to progression outcomes. To maintain content validity in today's MM treatment landscape (i.e., to ensure the instrument is relevant to MM patients and captures their symptoms and the side effects of novel treatments and later lines of therapy), qualitative interviews with patients and health care professionals are underway, along with an update to the QLQ-MY20 to incorporate the findings.

DATA AVAILABILITY
All data generated or analysed during this study are included in this published article [and its supplementary information files].