Cross-cultural adaptation and validation of the French version of the Expanded Prostate cancer Index Composite questionnaire for health-related quality of life in prostate cancer patients

Background
Health-related quality of life (HRQoL) has become one of the major endpoints in oncology, which creates a need to validate cancer-site-specific survey instruments. This study aimed to perform a transcultural adaptation of the 50-item Expanded Prostate cancer Index Composite (EPIC) questionnaire for HRQoL in prostate cancer patients and to validate the psychometric properties of the French-language version.
Methods
The EPIC questionnaire measures urinary, bowel, sexual and hormonal domains. The first step, the transcultural adaptation of the original English version of the EPIC, was performed using the back-translation technique. The second step, the validation of the psychometric properties of the EPIC questionnaire, was performed in patients undergoing treatment for localized prostate cancer (Treatment group) and in patients cured of prostate cancer (Cured group). Patients also completed the EORTC QLQ-C30 and the QLQ-PR25 prostate cancer module to assess criterion validity. Two assessments were performed: before and at the end of treatment in the Treatment group, to assess sensitivity to change, and at a 2-week interval in the Cured group, to assess test-retest reliability. Psychometric properties were explored according to classical test theory.
Results
The first step showed good overall acceptability and understanding of the questionnaire. In the second step, 215 patients were included from January 2012 to June 2014: 125 in the Treatment group and 90 in the Cured group. All domains exhibited good internal consistency, except the bowel domain (Cronbach’s α = 0.61). No floor effect was observed. Test-retest reliability assessed in the Cured group was acceptable, except for bowel function (intraclass correlation coefficient = 0.68). Criterion validity was good for each domain and subscale. Construct validity was not demonstrated for the hormonal and bowel domains. Sensitivity to change was exhibited for 5/8 subscales and 2/4 summary scores in patients who experienced toxicities during treatment.
Conclusions
The French EPIC questionnaire appears to have adequate psychometric properties, comparable to those of the original English-language version, except for construct validity, which was not reported for the original version.


Background
Prostate cancer is the second most commonly diagnosed cancer and the sixth leading cause of cancer death among men worldwide, with an estimated 1.1 million new cases and 307,000 deaths in 2012 [1]. In France, prostate cancer has been the most frequent cancer in men for the last two decades, with around 54,000 new cases diagnosed in 2011 [2]. However, it ranks only fifth among causes of cancer-related death overall, with fewer than 10,000 deaths per year as of 2011 [2].
For prostate cancer diagnosed at a localized stage, several curative treatment strategies can achieve long-term remission. However, these treatments can cause significant urinary, sexual, digestive and hormonal impairment [3,4]. HRQoL is therefore an important endpoint for these patients. Moreover, the American Society of Clinical Oncology and the Food and Drug Administration now recognize HRQoL as a second primary endpoint when no effect of treatment on overall survival is observed [5][6][7].
In order to capture all symptoms specific to prostate cancer and the side effects of prostate cancer treatment, it is recommended to use disease-specific HRQoL questionnaires [8]. These questionnaires must also be adapted to the culture and the language of the study [9], especially in non-English-speaking countries, and validation of the psychometric properties of adapted questionnaires remains mandatory.
Few HRQoL questionnaires specific to prostate cancer have been validated in French. The prostate-cancer-specific QLQ-PR25 module of the European Organisation for Research and Treatment of Cancer (EORTC) is available and validated in French [10], but it explores symptoms only from a factual point of view. The University of California-Los Angeles Prostate Cancer Index (UCLA-PCI) is widely used in English-speaking countries [11]. The Expanded Prostate Cancer Index Composite (EPIC) was developed from the UCLA-PCI by adding items on urinary irritative and obstructive voiding symptoms and items addressing hormonal symptoms [12]. A validated French-language version of the EPIC, which would be better suited to epidemiological studies [13], is therefore essential to allow comparison of French data with existing reports in the international literature.
In this context, the objective of this study was to perform transcultural adaptation and validation of the French version of the EPIC questionnaire according to classical test theory. This study is part of the French National QALIPRO project that aims to investigate the long-term side effects of prostate cancer, in which the EPIC questionnaire is used.

Study design
The validation of the French version of the EPIC questionnaire was performed in two steps. Nine French cancer care centers and university hospitals participated in patient recruitment.
Step 1: Transcultural adaptation and qualitative validation (face validity)
The EPIC questionnaire was adapted into French using the back-translation technique. The French version was then pretested on a planned sample of 50 patients [14], recruited during urology or radiotherapy consultations. Patients filled out a debriefing questionnaire on their experience of completing the EPIC. Among other questions, they could indicate whether the questionnaire was too long or too complicated, and whether some items were disturbing, irrelevant, redundant or missing. The time required to complete the questionnaire was also recorded. This information indicated both the quality of the translation and the likely acceptability of the questionnaire.
Step 2: Quantitative validation of the psychometric properties of the questionnaire
Participants
For the second step, both patients undergoing treatment and cured patients were recruited and categorized into two groups:
- Cured group: patients considered cured of prostate cancer (≥3 years after diagnosis), regardless of initial treatment, and attending a follow-up consultation. Patients with recurrence were excluded.
- Treatment group: patients included prospectively before curative treatment for localized prostate cancer. Patients were eligible if they had a histologically confirmed diagnosis of localized prostate cancer and no previous treatment for their prostate cancer. Patients were excluded if curative treatment had already been performed or if they had metastases.
All patients had to have social security coverage, and had to be able to complete HRQoL questionnaires. All patients were fully informed of the study and provided signed written informed consent. The protocol was approved by the local ethics committee (Comité de Protection des Personnes Est II).
Questionnaires
Patients were required to complete the EPIC questionnaire, the EORTC QLQ-C30 cancer-specific questionnaire and the QLQ-PR25 prostate cancer module. The latter two were used to assess criterion validity.
The EPIC questionnaire is a 50-item prostate-cancer-specific instrument developed and validated in English [12]. Items are grouped into domains, and each item has four or five response categories. The questionnaire assesses four domains: urinary, bowel, sexual and hormonal. Each domain comprises two subscales: symptom severity (function subscale) and symptom-related impairment (bother subscale). The urinary domain can also be divided into two other subscales that combine function and bother items: a urinary incontinence subscale and a urinary irritation/obstruction subscale. The last item evaluates overall satisfaction. Scale scores (one score per subscale) are linearly transformed to a 0-to-100 scale. Higher scores represent better HRQoL, i.e., high function and low bother. A summary score is also generated for each domain, corresponding to the mean of the function and bother subscales. Scores were generated according to the recommendations of the questionnaire developers [15].
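The scoring principle described above (linear rescaling to 0-100, subscale scores, and a domain summary score equal to the mean of the function and bother subscales) can be sketched in Python. This is a minimal illustration, not the developers' scoring algorithm [15]. It rests on several assumptions: responses are coded 1 to n, each item carries a flag saying whether a higher code means better HRQoL, and a subscale score is the mean of its rescaled answered items. The missing-data rule (`min_fraction`) is hypothetical and is not taken from the source.

```python
from statistics import mean
from typing import List, Optional, Tuple

# One item = (response coded 1..n, or None if missing; number of response
# categories n; True if a higher code means better HRQoL). Assumed coding.
Item = Tuple[Optional[int], int, bool]


def item_to_100(response: int, n_categories: int, higher_is_better: bool) -> float:
    """Linearly rescale one response onto 0-100, where 100 = best HRQoL."""
    score = (response - 1) / (n_categories - 1) * 100
    return score if higher_is_better else 100 - score


def subscale_score(items: List[Item], min_fraction: float = 0.8) -> Optional[float]:
    """Mean of the rescaled answered items, or None if too many are missing.

    min_fraction is a hypothetical missing-data rule, not the developers' rule.
    """
    answered = [item_to_100(r, n, up) for r, n, up in items if r is not None]
    if not answered or len(answered) < min_fraction * len(items):
        return None
    return mean(answered)


def summary_score(function: Optional[float], bother: Optional[float]) -> Optional[float]:
    """Domain summary score: mean of the function and bother subscale scores."""
    if function is None or bother is None:
        return None
    return (function + bother) / 2


# Hypothetical answers: a 3-item function subscale and a 2-item bother subscale.
function = subscale_score([(5, 5, True), (4, 5, True), (1, 4, False)])
bother = subscale_score([(2, 5, False), (1, 5, False)])
print(function, bother, summary_score(function, bother))
```

Reverse-keying at the item level keeps every rescaled value on the same "100 = best" axis. As a result, a subscale score can be taken as a plain mean of its items.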
The QLQ-C30 includes 30 items and measures five functional scales (physical, role, emotional, cognitive and social functioning), global health status (GHS), financial difficulties and eight symptom scales (fatigue, nausea and vomiting, pain, dyspnea, insomnia, appetite loss, constipation, diarrhea) [16]. One score is generated for each dimension. Scores vary from 0 (worst) to 100 (best) for the functional dimensions and GHS, and from 0 (best) to 100 (worst) for the symptom dimensions, and were generated according to the EORTC Scoring Manual [17].
The QLQ-PR25 module contains 25 items assessing two functional scales (sexual activity and sexual functioning) and four symptom scales specific to prostate cancer (urinary symptoms, bowel symptoms, hormonal-treatment-related symptoms and incontinence aid). This module must be completed together with the QLQ-C30 questionnaire. As with the QLQ-C30, one score is generated for each scale and standardized on a 0-100 scale, such that a high score represents a high level of functioning or of symptoms [10].
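The EORTC linear transformation referred to above can be written as a short function. The formulas follow the EORTC QLQ-C30 Scoring Manual. The raw score is the mean of the scale's items. Functional scales are reversed so that 100 is best. Symptom scales and global health status are not reversed. According to the text, the QLQ-PR25 scales are standardized in the same way. Missing-item handling is deliberately omitted, and the function and argument names are illustrative.

```python
from statistics import mean
from typing import Sequence


def eortc_score(items: Sequence[int], item_range: int, kind: str) -> float:
    """Score one EORTC scale on 0-100 from its item responses.

    item_range is the difference between the highest and lowest possible
    response: 3 for items coded 1-4, 6 for the 1-7 global health status items.
    """
    raw = mean(items)  # raw score = mean of the items in the scale
    if kind == "functional":
        # Reversed so that 100 = best functioning.
        return (1 - (raw - 1) / item_range) * 100
    if kind in ("symptom", "ghs"):
        # 100 = worst symptoms, or best global health status for "ghs".
        return (raw - 1) / item_range * 100
    raise ValueError(f"unknown scale type: {kind!r}")


# Hypothetical responses
print(eortc_score([1, 1, 2, 1, 1], 3, "functional"))  # physical functioning
print(eortc_score([2, 3, 2], 3, "symptom"))           # fatigue
print(eortc_score([6, 5], 6, "ghs"))                  # global health status
```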
Measurement times
Patients were required to complete the questionnaires twice:
- Cured group: at baseline (T1) and again 2 weeks later (T2), to assess test-retest reliability;
- Treatment group: immediately before the initiation of treatment (T1) and again at the end of treatment (6 to 8 weeks after the first assessment, T2), to assess sensitivity to change.

Sample size calculation
For the qualitative validation, we planned to include 50 patients.
For the quantitative validation of the psychometric properties of the EPIC questionnaire, we planned to include 300 patients to ensure the statistical robustness of the analyses. In particular, exploratory factor analysis (EFA) of the dimensionality of the questionnaire requires at least 150 patients [18]. This corresponds to a minimum of 30 patients per response category for items with 5 response categories. We planned to include 100 patients in the Cured group and 200 patients in the Treatment group.
In the Treatment group, a minimum of 150 patients was required to detect a minimal clinically important difference (MCID) of 10 points in one dimension between the two measurements. This calculation assumed a standard deviation (SD) of 20 points, a type I error of 5% and a statistical power of 80%, with 75 patients in each of two groups defined by an external criterion.
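The arithmetic behind such a calculation can be checked with the usual normal-approximation formula for comparing two means, n per group = 2(z_{1-α/2} + z_{1-β})² / (Δ/σ)². The source does not state which formula it used, so this choice is an assumption. With Δ = 10, σ = 20, a two-sided α of 0.05 and 80% power, the formula gives 63 patients per group, which is fewer than the 75 per group stated above. The text does not say how the figure of 75 was reached (for example, whether a margin for dropout was added). The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mcid: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for comparing two means."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = z.inv_cdf(power)           # 1 - type II error
    effect_size = mcid / sd             # standardized difference, here 10/20 = 0.5
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(n_per_group(mcid=10, sd=20))  # 63 per group under these assumptions
```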

Statistical analysis
Baseline socio-demographic and clinical characteristics of the patients are described using mean ± SD or median (range) for continuous variables, and number (percentage) for qualitative variables.
Face validity was assessed using the debriefing questionnaire from step 1.
All other analyses were performed on the validation population from step 2, including all patients (both Cured and Treatment groups) at the first measurement point, unless a specific subpopulation is stated.
Acceptability of the questionnaire was assessed by the percentage of missing data (missing items and missing forms). A high proportion of missing forms may indicate poor acceptability of the instrument [8]. Information from the step 1 debriefing questionnaire was also used to gain additional insight into acceptability. For example, if patients found some questions disturbing or difficult to understand, or found the questionnaire too long, this could explain some missing data and could likewise lead to missing data in future studies. Answers to the debriefing questionnaire were summarized as numbers and percentages for each question across all step 1 patients. The mean ± SD time required to complete the questionnaire was also reported.
Floor and ceiling effects were estimated for each subscale and for the overall score. The number and percentage of patients who obtained the lowest or highest possible score for each subscale and overall domain were recorded. Floor or ceiling effects were considered to be present if more than 15% of the responders achieved the lowest or highest possible score respectively [19].
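A minimal sketch of the floor/ceiling rule follows. It assumes that scores are already on the 0-100 scale, that non-computable scores are stored as None and excluded from the denominator, and it uses the 15% threshold stated above. The function name is hypothetical.

```python
from typing import Dict, Optional, Sequence


def floor_ceiling(scores: Sequence[Optional[float]], lowest: float = 0.0,
                  highest: float = 100.0, threshold: float = 15.0) -> Dict[str, object]:
    """Percentage of responders at the lowest/highest possible score, and flags."""
    valid = [s for s in scores if s is not None]  # responders only
    n = len(valid)
    floor_pct = 100 * sum(s == lowest for s in valid) / n
    ceiling_pct = 100 * sum(s == highest for s in valid) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold,      # more than 15% at the minimum
        "ceiling_effect": ceiling_pct > threshold,  # more than 15% at the maximum
    }


# Hypothetical subscale scores (None = score not computable)
result = floor_ceiling([100, 100, 87.5, 50, None, 75, 100, 62.5, 90, 80, 100])
print(result)
```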
Reliability of the questionnaire was assessed using Cronbach's α coefficient for internal consistency and the test-retest method for repeatability. Cronbach's α was estimated for all subscales and domain summary scores and was expected to exceed 0.70 [20]. Test-retest reliability was assessed in the Cured group by estimating the intraclass correlation coefficient (ICC) between assessments T1 and T2 [21].
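Both reliability coefficients can be computed from their standard definitions. The sketch below assumes complete data. The text does not say which ICC form was used, so the code assumes the two-way random-effects, absolute-agreement, single-measure form (Shrout and Fleiss ICC(2,1)), computed from ANOVA mean squares. The function names are illustrative.

```python
from statistics import mean, variance
from typing import List, Sequence


def cronbach_alpha(items: List[Sequence[float]]) -> float:
    """items[j][i] is respondent i's score on item j (complete data only)."""
    k = len(items)
    totals = [sum(answers) for answers in zip(*items)]  # total score per respondent
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def icc_2_1(t1: Sequence[float], t2: Sequence[float]) -> float:
    """Two-way random-effects, absolute-agreement, single-measure ICC(2,1)."""
    rows = list(zip(t1, t2))  # one row per patient, one column per time point
    n, k = len(rows), 2
    grand = mean([x for row in rows for x in row])
    row_means = [mean(row) for row in rows]
    col_means = [mean(t1), mean(t2)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical data
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])  # two items, four respondents
icc = icc_2_1([1, 2, 3, 4], [2, 2, 4, 4])             # T1 and T2 scores
print(round(alpha, 3), round(icc, 3))
```

The absolute-agreement form penalizes a systematic shift between T1 and T2, such as the +0.5 mean shift in the example. A consistency-type ICC would ignore that shift.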
Convergent and discriminant validities were evaluated using multitrait scaling analysis [18] conducted separately for the eight subscales and for the four domains. The convergent validity of each item was assessed using Spearman's correlation coefficient between each item and its own subscale score, computed without including the corresponding item. The convergent validity was considered satisfactory if the correlation coefficient was higher than 0.40, in absolute value. For the discriminant validity, the correlation between each item and its own scale score was expected to be greater than the correlation between that item and the other scale scores. Similar analyses were conducted for each domain.
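The multitrait scaling rules can be sketched as follows. The sketch assumes complete data and items already scored in the same direction. As a simplification, a scale score is taken as the plain mean of its items, whereas the EPIC subscales are actually scored according to the developers' rules. Spearman's coefficient is computed as the Pearson correlation of average ranks. Convergence uses the corrected (item-excluded) correlation against the 0.40 threshold. Discriminant validity compares absolute values, which is also an assumption. All function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for m in range(i, j + 1):
            r[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def scale_mean(items: List[Sequence[float]]) -> List[float]:
    """Per-respondent mean of the given item columns (assumed scale score)."""
    return [mean(answers) for answers in zip(*items)]


def multitrait(scales: Dict[str, List[Sequence[float]]]) -> Dict[str, Dict[str, bool]]:
    """Convergent/discriminant checks for every item; each scale needs >= 2 items."""
    out = {}
    for name, items in scales.items():
        for j, item in enumerate(items):
            rest = [it for m, it in enumerate(items) if m != j]
            own = spearman(item, scale_mean(rest))  # item excluded from own scale
            others = [spearman(item, scale_mean(its))
                      for other, its in scales.items() if other != name]
            out[f"{name}[{j}]"] = {
                "convergent": abs(own) > 0.40,
                "discriminant": all(abs(own) > abs(o) for o in others),
            }
    return out


# Hypothetical data: two 2-item scales answered by five patients
a1, a2 = [1, 2, 3, 4, 5], [1, 3, 2, 5, 4]
b1, b2 = [2, 5, 1, 3, 4], [3, 4, 2, 1, 5]
res = multitrait({"A": [a1, a2], "B": [b1, b2]})
print(res)
```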
EFA was performed to assess the dimensionality of the questionnaire, using orthogonal rotation and imposing four factors. All items were included except the last item, which assesses overall satisfaction. EFA was performed on patients who had completed all items at the first measurement time point. The variance explained by the four factors was reported, and the resulting factors were interpreted [22].
Criterion validity was assessed using a correlation matrix between the EPIC domain summary scores and the QLQ-PR25 scores. The correlation between the EPIC and QLQ-PR25 scores assessing the same HRQoL domain was expected to be higher than 0.4, in absolute value. Conversely, the correlation between each EPIC domain summary score and other scales of the QLQ-PR25 assessing other HRQoL domains was expected to be lower.
Sensitivity to change was assessed among patients from the Treatment group. The change in scores between T1 and T2 was compared, using a paired t-test, according to the presence or absence of at least one toxicity of grade 2 or higher during treatment. We assumed that patients experiencing at least one toxicity would show a greater deterioration in HRQoL than patients without toxicity. The mean change between T1 and T2 and the SD of the change were reported. The Standardized Response Mean (SRM), i.e., the mean change divided by the SD of the change, was also estimated. An absolute SRM below 0.2 was considered a "small" change, between 0.2 and 0.5 "moderate", and above 0.5 "large" [23]. Each domain and subscale was analysed without any specific hypothesis as to whether the effect of toxicity was similar across scales. The median (range) time between the two assessments was also reported. We expected a clinically significant change of at least 10 points in all subscales and summary scores for patients who experienced toxicities during treatment.
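The change statistics described above reduce to a few lines. The sketch assumes complete paired scores on the 0-100 scale. It treats |SRM| between 0.2 and 0.5 inclusive as "moderate", since the text does not say how boundary values were classified. It returns the paired t statistic with n - 1 degrees of freedom. A p-value would additionally require the t distribution, for example via `scipy.stats.ttest_rel`, which returns the same statistic with a sign that depends on argument order. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def change_stats(t1: Sequence[float], t2: Sequence[float]) -> Tuple[float, float, float, float]:
    """Mean change, SD of change, SRM and paired t statistic (df = n - 1)."""
    diffs = [after - before for before, after in zip(t1, t2)]  # T2 - T1
    m, sd = mean(diffs), stdev(diffs)
    srm = m / sd                          # standardized response mean
    t_stat = m / (sd / sqrt(len(diffs)))  # paired t-test statistic
    return m, sd, srm, t_stat


def srm_label(srm: float) -> str:
    """Thresholds from the text; boundary handling is an assumption."""
    size = abs(srm)
    if size < 0.2:
        return "small"
    if size <= 0.5:
        return "moderate"
    return "large"


# Hypothetical sexual summary scores for five patients with toxicity
m, sd, srm, t_stat = change_stats([80, 75, 90, 60, 70], [60, 70, 65, 55, 50])
print(m, round(sd, 2), round(srm, 2), round(t_stat, 2), srm_label(srm))
```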
All analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA). All tests were two-sided, and the type I error was set at 0.05. No adjustment for multiple testing was performed.

Step 1: Face validity
Forty-six patients were included in step 1. None of them considered the questionnaire too long. Five patients (11.6%) reported finding at least one question disturbing. Most of these patients explained that the disturbing questions referred to symptoms they did not experience, so they had difficulty judging the impact of those symptoms on their HRQoL. Two patients found some questions disturbing because they were too private (questions 17 and 21, about sexuality). Four patients (8.7%) needed help completing the questionnaire. Twenty-eight patients (62.2%) considered that the questionnaire addressed relevant issues, and forty-one (91.1%) considered that it might concern other men. Finally, 38 patients (84.4%) declared that the questionnaire helped them deal more easily with problems or difficulties related to their disease. The mean time required to complete the questionnaire was 19.8 min (SD = 9.3). In light of these results, the French version of the EPIC questionnaire tested in step 1 was retained unchanged for step 2.
Step 2: Quantitative validation of psychometric properties
Study population
In step 2, 215 patients were included between January 2012 and June 2014: 125 patients in the Treatment group and 90 patients in the Cured group. The mean age was 68 years (SD = 6.6). The baseline sociodemographic and clinical characteristics of the study population are summarized in Table 1.

Validation of the psychometric properties
Acceptability
Nineteen patients (8.8%) did not fill out the baseline EPIC questionnaire: 11 patients (8.8%) in the Treatment group and 8 patients (8.8%) in the Cured group. Nine of these patients (47.4%) reported that they forgot to return the questionnaire. Eighty-one patients (37.7%) fully completed the baseline questionnaire.
Floor and ceiling effects
Table 2 gives the mean (SD) scores for all patients at baseline for each subscale and summary score. It also gives the percentages of patients with the lowest or highest possible score, which quantify floor and ceiling effects respectively. A ceiling effect (i.e., more than 15% of patients with the highest possible score) was observed for the bowel (15.35%) and hormonal (20.93%) summary scores. A ceiling effect was also observed for the subscales, except for the sexual subscales and the bother and irritation/obstruction subscales of the urinary domain. No floor effect was observed.
Reliability
Regarding internal consistency, Cronbach's alpha was ≥ 0.70 for each subscale except urinary function, bowel function and hormonal symptoms, where it was 0.63, 0.53 and 0.63, respectively (Table 2). Among the summary scores, Cronbach's alpha was < 0.70 for the bowel domain only (0.61). Table 3 displays the mean (SD) scores for each summary score and subscale for patients in the Cured group at the two measurement times, as well as the ICC for test-retest reliability. An ICC ≥ 0.70 was observed for each subscale and summary score, except for bowel function (ICC = 0.68).
Construct validity
Convergent validity was achieved for all items of the sexual function subscale: each item correlated above 0.4 with both the sexual function score and the sexual summary score (Table 4). The items of the sexual bother subscale met the convergent validity criterion with respect to the sexual bother score. They did not meet it with respect to the sexual domain summary score, with which each item correlated below 0.4. Convergent validity was also met for all items of the urinary bother subscale except item q30, which had a correlation of −0.28 with both the urinary bother subscale score and the urinary domain summary score. Likewise, convergent validity was met for all items of the bowel bother subscale except item q53, which had a correlation of −0.38 with the bowel bother subscale score and −0.31 with the bowel domain summary score. For all other dimensions, convergent validity was met for only a minority of items. This was particularly true for hormonal function, where no item met the criterion with respect to its own subscale. However, most items of the hormonal scale met the criterion with respect to the overall hormonal domain.
Discriminant validity was achieved for all items of both the urinary and sexual domains. Conversely, discriminant validity was not achieved for 5/7 items of the bowel function subscale, 1/7 items of the bowel bother subscale, 3/5 items of the hormonal function subscale and 2/6 items of the hormonal bother subscale.
The results of the EFA are summarized in Table 5. The four factors explained 53% of the variance. Items of the sexual function subscale were highly correlated with Factor 1, whereas those of the sexual bother subscale were correlated with Factor 2. Most items of the urinary domain were correlated with Factor 3, and those of the bowel domain with Factor 4. Finally, items of the hormonal domain were evenly distributed over the first three factors.
Criterion validity
The correlation between each EPIC summary score and the equivalent QLQ-PR25 scale was greater than 0.4 in absolute value, indicating good criterion validity (Table 6). The highest correlation was between the urinary symptoms scale of the QLQ-PR25 module and the urinary summary score of the EPIC questionnaire (−0.83). The urinary, sexual and hormonal summary scores of the EPIC were also highly correlated with the incontinence aid scale of the QLQ-PR25 (−0.66, −0.55 and −0.57, respectively).

Sensitivity to change
The median time between the two assessments was 7.8 weeks (range 3.2-12.8). Among the patients of the Treatment group, 55 (44%) experienced at least one toxicity of grade 2 or higher during treatment. For all subscales and summary scores, these patients showed a decrease in mean HRQoL between T1 and T2. The difference was clinically significant (10-point MCID) for 7/10 subscales and 2/4 summary scores (Table 7). Compared with patients without toxicity during treatment, patients who experienced toxicities showed a greater decrease in the sexual summary score (mean difference = −30.59 vs.

Discussion
This paper reports the cross-cultural adaptation and validation of the French version of the prostate-cancer-specific EPIC HRQoL questionnaire, using classical test theory. The qualitative step showed good overall acceptability and understanding of the questionnaire. Few patients found questions disturbing, and their comments showed that this was not due to translation problems but to the construction of the questionnaire itself. Some symptoms were not experienced by the patients, who therefore found it difficult to judge their impact, and some other items were perceived as too private. The number of patients who refused to participate could also have informed acceptability, but unfortunately these data could not be collected in every center. Ceiling effects were found for the bowel and hormonal summary scores, although the percentage of patients with the highest possible score was close to the 15% threshold for the bowel domain. A high ceiling effect was observed for the function and incontinence subscales of the urinary domain. Conversely, no floor effect was observed for any domain or subscale. Similar findings were reported for the original English version of the EPIC questionnaire [12].
Good internal consistency was observed for all summary scores except the bowel domain (Cronbach's alpha = 0.61). Repeatability was also demonstrated for all domains and subscales, except bowel function (ICC = 0.68). The reliability of the questionnaire was therefore good overall, except for the bowel dimension. In the original English version, good internal consistency and repeatability were observed for all summary scores and for each subscale except hormonal function (Cronbach's alpha = 0.51) [12].
We also observed good criterion validity, with high correlations between the EPIC domains and the equivalent scales of the QLQ-PR25. Wei et al. [12] compared the English version of the EPIC questionnaire with both the Functional Assessment of Cancer Therapy-Prostate (FACT-P) module [24] and the American Urological Association Symptom Index (AUA-SI) [25,26]. Each of these prostate-cancer-specific questionnaires generates only one overall score. Like the EPIC, the QLQ-PR25 has the advantage of generating a score for each HRQoL domain, and it may therefore provide a more precise comparison.
The sensitivity-to-change analysis showed a clinically and statistically significant change on 5 of the 8 subscales and 2 of the 4 summary scores in patients who experienced toxicity during treatment. Compared with patients without toxicity during treatment, those with toxicities had a greater decrease in the sexual summary score and in the urinary and sexual subscales. Wei et al. did not assess the sensitivity to change of the original English version of the EPIC, which precludes comparison with our results [12].
Regarding construct validity, the multitrait analysis and the EFA gave complementary results. Only the urinary domain and the sexual function subscale achieved good construct validity. Most dimensions showed good discriminant validity, except bowel and hormonal function. The bowel bother subscale also exhibited good convergent validity. The sexual function and bother subscales seemed to capture complementary aspects, since the EFA revealed a two-dimensional structure: these subscales were highly correlated with two separate factors. Moreover, the multitrait analysis showed that items of the sexual bother subscale were poorly correlated with the sexual summary score. The hormonal domain showed poor construct validity overall. Neither convergent nor discriminant validity was achieved, and its items correlated with three of the four EFA factors. These poor results for the hormonal domain could be partly explained by the small number of patients treated with hormone therapy (only 45 patients, 20.9%). Construct validity was not reported for the original English version of the EPIC questionnaire, so no comparison is possible. These poor results could also be partly explained by the small number of patients with complete data included in the EFA (only 111 patients), whereas at least 150 patients are required to ensure the robustness of this analysis [18].
Indeed, the main limitation of this study is the small number of patients in step 2: only 215 patients were enrolled, whereas 300 were initially expected. Nonetheless, this sample size may be sufficient for most of the statistical analyses, even though the statistical power was lower than expected. The sample was smaller because the inclusion period was longer than planned and the study had to be stopped for financial reasons.
One strength of this study is the assessment of all psychometric properties, including both reproducibility and sensitivity to change, which are not systematically assessed when validating HRQoL questionnaires. These classical test theory results, particularly those on construct validity, need to be complemented by a modern psychometric approach based on item response theory [27], such as Rasch-family models [28,29]. These analyses are planned and will be reported in a separate paper. They also have the advantage of being robust to missing data, so that all patients with at least one completed item can contribute [30]. These further analyses should confirm or rule out the trend observed in the EFA.
It would also be interesting to study the impact of other demographic and clinical characteristics on the validation results, such as age, relationship status, treatment modality and race. However, the sample size was too small to provide adequate variability for these additional analyses. We also plan to study differential item functioning using item response theory. The impact of race could be of interest. However, in France, as in several other European countries, current legislation does not allow data on racial origin to be collected routinely in studies without strong justification of the importance of this variable.
Another strength of this study is that both patients undergoing treatment and cured patients were included in the validation process. This supports the use of the EPIC questionnaire in future epidemiological studies performed long after diagnosis.
This adaptation and validation of the French version of the EPIC questionnaire was essential to enable the use of this tool in future French-language studies. It was also required for the second part of the QALIPRO study, which aims to investigate the long-term side effects of prostate cancer treatment. Moreover, although this questionnaire was developed for prostate cancer patients, some of the issues it addresses could also concern older men without prostate cancer. Indeed, in the step 1 debriefing questionnaire, forty-one patients (91.1%) considered that the questionnaire could concern other men without prostate cancer.

Conclusions
In conclusion, the French version of the EPIC questionnaire showed good psychometric properties in patients with prostate cancer, similar to those of the original English version. An item response theory analysis will complement these results.