Utility of the PHQ-9 to identify major depressive disorder in adult patients in Spanish primary care centres

Background The prevalence of major depressive disorder (MDD) in Spanish primary care (PC) centres is high. However, MDD is frequently underdiagnosed and consequently only some patients receive the appropriate treatment. The present study aims to determine the utility of the Patient Health Questionnaire-9 (PHQ-9) to identify MDD in a subset of PC patients participating in the large PsicAP study. Methods A total of 178 patients completed the full PHQ test, including the depression module (PHQ-9). Also, a Spanish version of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) was implemented by clinical psychologists that were blinded to the PHQ-9 results. We evaluated the psychometric properties of the PHQ-9 as a screening tool as compared to the SCID-I as a reference standard. Results The psychometric properties of the PHQ-9 for a cut-off value of 10 points were as follows: sensitivity, 0.95; specificity, 0.67. Using a cut-off of 12 points, the values were: sensitivity, 0.84; specificity, 0.78. Finally, using the diagnostic algorithm for depression (DSM-IV criteria), the sensitivity was 0.88 and the specificity 0.80. Conclusions As a screening instrument, the PHQ-9 performed better with a cut-off value of 12 versus the standard cut-off of 10. However, the best psychometric properties were obtained with the DSM-IV diagnostic algorithm for depression. These findings indicate that the PHQ-9 is a highly satisfactory tool that can be used for screening MDD in the PC setting. Trial registration Current Controlled Trials ISRCTN58437086. Registered 20 May 2013.


Background
Major depressive disorder in Spanish primary care
The vast majority of mental disorders in Spain are diagnosed in primary care (PC), which serves as a gateway to treatment and to the entire public health system [1]. In this context, emotional disorders are often misdiagnosed, with rates of up to 78% for depression, 71% for generalized anxiety disorder (GAD), and 86% for panic disorder [2]. Moreover, even among patients who are correctly diagnosed, only 35.8% of those with depression and 30.7% of those with any anxiety disorder receive adequate treatment [3] (i.e., most patients receive primarily pharmacological treatment, which is not recommended in clinical practice guidelines [4]). These mental disorders impose an important economic and societal burden on European countries, including Spain [5,6].
Major depressive disorder (MDD) is highly prevalent in Spanish PC centres, with 9.6% of attendees suffering from this disorder each year [7], although this figure is lower than the mean prevalence rate (19%) in European countries [8]. Nevertheless, due to the absence of systematic screening tests, general practitioners (GPs) only recognize about 60% of cases of MDD [3], partly because this condition is frequently comorbid with other physical, somatic, and/or psychological problems such as anxiety disorders or alcohol abuse [9]. Based on international guideline recommendations (such as the NICE) to manage depression, it is clear that improved assessment methods (for both screening and diagnosis) are needed to improve MDD identification in order to refer these individuals to the appropriate therapeutic intervention [10]. For this reason, screening tests are very helpful to obtain a quick, initial identification of a possible case of MDD; however, such tools are not sufficiently reliable to be used as the sole detection instrument [10,11]. Thus, clinical interviews are required as a second step to confirm diagnoses. The use of these screening tools followed by clinical interviews should increase the efficiency of PC centres and improve overall public health outcomes for MDD.
One screening test that could be used in PC centres to identify MDD is the PHQ-9 [12]. This self-report instrument is derived from the Primary Care Evaluation of Mental Disorders (PRIME-MD), which was originally developed to identify five mental disorders: depression, anxiety, alcohol abuse, somatoform disorder, and eating disorder. A systematic review of 16 studies that were carried out to identify depression [13] concluded that although there are many valid tools, the PHQ-9 is equal or superior to other instruments. In this context, given that the operating characteristics of these instruments are similar, selection of the optimal tool to identify MDD should depend on its feasibility, administration and scoring times, and the capability of the instrument to serve additional purposes, such as monitoring depression severity or response to therapy. Indeed, several meta-analyses recommended the PHQ-9 to identify depression in the PC setting because it can be administered easily, quickly, and in a wide range of clinical contexts [14,15]. For instance, Gilbody et al. [14] analysed 17 validation studies (> 5000 participants), concluding that the PHQ-9 has good psychometric properties (sensitivity 0.80, specificity 0.92) using either the ≥10 cut-off score or the "diagnostic algorithm" method. Manea, Gilbody and McMillan [15] analysed a total of 18 studies (7180 patients, 927 with MDD confirmed by diagnostic interviews), concluding that the PHQ-9 shows acceptable psychometric properties for MDD. In that study, using the widely recommended cut-off score of 10, sensitivity was 0.85 and specificity 0.89, with no substantial differences in pooled sensitivity and specificity for cut-off scores ranging from 8 to 11.
The PHQ-9 items closely follow the nine criteria specified in the DSM-IV diagnostic manual (the core criteria for MDD have not changed in the DSM-5). Patients use Likert scales to rate the presence of symptoms in the prior two weeks. Depending on frequency ("not at all", "several days", "more than half of the days", and "almost every day"), the nine items are scored from 0 to 3 points (total severity scores range from 0 to 27 points). Total scores of 10-14 points, 15-19 points, and 20-27 points indicate, respectively, moderate, moderately severe, and severe levels of depressive symptoms. When the PHQ-9 is used as a screening test, the most widely recommended cut-off value is 10, as previous research has demonstrated that this cut-off value provides the best combination of sensitivity (0.88) and specificity (0.88) [12]. The PHQ-9 has also been proposed for use as a diagnostic tool using a specific coding algorithm based on the DSM-IV criteria for MDD in which MDD is diagnosed if at least one of the two first symptoms (items) is rated with a 2 (more than half of the days) or a 3 (most days) and four of the remaining items are also rated with a score of 2 or 3 (with the exception of item 9 [suicide], in which a rating of 1 is sufficient). However, the general consensus is that the PHQ-9 can be used as a screening test but not as a diagnostic test [12][13][14][15].
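The two scoring approaches described above can be illustrated with a minimal Python sketch. It is not the authors' code, and the function names are hypothetical. It assumes that "one of the first two items plus four of the remaining items" means at least five symptoms in total, with at least one of items 1 and 2 among them, and that item 9 counts as present from a rating of 1.

```python
from typing import Sequence


def phq9_total(items: Sequence[int]) -> int:
    """Sum the nine item ratings (each 0-3), giving a total of 0-27."""
    if len(items) != 9 or any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("expected nine item ratings, each between 0 and 3")
    return sum(items)


def phq9_severity(total: int) -> str:
    """Map a total score to the severity bands given in the text."""
    if total >= 20:
        return "severe"
    if total >= 15:
        return "moderately severe"
    if total >= 10:
        return "moderate"
    return "below moderate"


def phq9_algorithm_positive(items: Sequence[int]) -> bool:
    """DSM-IV-based "diagnostic algorithm" (assumed reading, see lead-in)."""
    phq9_total(items)  # reuse the input validation
    # At least one of the two core symptoms (items 1 and 2) rated 2 or 3.
    core_present = any(r >= 2 for r in items[:2])
    # Items 1-8 count when rated 2 or 3; item 9 (suicide) counts from 1.
    n_present = sum(1 for r in items[:8] if r >= 2) + (1 if items[8] >= 1 else 0)
    return core_present and n_present >= 5


# Two hypothetical respondents, both at or above the cut-off of 10.
patient_a = [2, 3, 2, 2, 0, 0, 0, 0, 1]
patient_b = [3, 3, 3, 1, 1, 1, 1, 1, 0]
for name, items in (("A", patient_a), ("B", patient_b)):
    total = phq9_total(items)
    print(name, total, phq9_severity(total), phq9_algorithm_positive(items))
```

In this sketch both hypothetical respondents reach the 10-point cut-off, but only the first satisfies the algorithm, which shows why the two scoring methods can classify the same patient differently.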
The construct validity of the PHQ-9 has been demonstrated in PC patients in many countries, including Spain [16], Brazil [17], China [18], East Africa [19], Holland [20], South Africa [21], the US [22] and others. These studies indicate that the PHQ has a high convergent validity with other depression measures. However, questions have been raised with regard to the optimal cut-off scores for screening to obtain the most accurate results on the PHQ-9. For example, a meta-analysis [12] suggested that the PHQ-9 presented good screening properties with both the ≥10 cut-off and the "diagnostic algorithm" method, but that the cut-off point may be increased to ≥11 or ≥12 to obtain optimum specificity in some community-based studies. In a recent review, Kroenke et al. [23] argued against inflexible adherence to a single cut-off score; rather, those authors argue that the cut-off should be adjusted to the target population. Manea et al. [15] found no significant differences in sensitivity or specificity between a cut-off score of 10 and other cut-off scores (ranging from 8 to 11), but suggested that a cut-off of 11 may represent the best trade-off between sensitivity and specificity. Although the optimal cut-off point is controversial and may depend on the target population, the PHQ-9 presents a reasonably good sensitivity and specificity when used as a screening tool, regardless of the precise cut-off point. By contrast, in studies conducted to assess the validity of the "diagnostic algorithm", results have been more ambiguous. A recent meta-analysis performed to assess 27 validation studies of the PHQ-9 algorithm scoring method in various settings concluded that, in most cases, sensitivity was low but specificity was good [24]. Similarly, Mitchell et al. [25] conducted a meta-analysis of 26 publications reporting on 40 individual studies (n = 26,902 patients), finding that the best estimates of sensitivity and specificity for the PHQ-9 algorithm were 0.57 and 0.93, respectively. Thus, the PHQ-9 can be used as a screening test with different cut-off scores, but the psychometric properties of the "diagnostic algorithm" have not been as good.
Few studies have evaluated the Spanish version of the PHQ-9. The first study, by Diez-Quevedo et al. [26], was conducted to validate the Spanish version of the whole PHQ (including the 9 items for depression) in an inpatient setting, finding that the 9-item depression module yielded satisfactory sensitivity (0.84) and excellent specificity (0.92) for MDD compared to the gold standard at that time (i.e., the Structured Clinical Interview for DSM-III-R). However, the profile of patients in PC centres is likely to differ substantially from those treated in a psychiatric inpatient setting. A Spanish version of the PHQ-9 has also been evaluated for use in PC centres in Honduras, with all of the linguistic and cultural differences implied by that setting [27]. However, only one study has focused on a Spanish version of the PHQ-9 for Spain [16]. In that study, although the sample was obtained from Spanish PC centres, the PHQ-9 was administered by telephone, and thus the reported internal consistency of the PHQ-9 applies only to telephone administration. Consequently, little is known about how the PHQ-9 performs in Spanish PC centres, nor do we know the optimal cut-off criteria that would be most appropriate in this context in Spain.

Objectives
The aim of the present study was to assess the utility of the PHQ-9 as a screening test to identify MDD in patients at Spanish PC centres. We performed psychometric analyses to identify the sensitivity and specificity of the PHQ-9 total score to obtain the optimal cut-off value based on diagnoses obtained with the standardized clinical interview (Structured Clinical Interview for DSM-IV Axis I Disorders; SCID-I). Additionally, we tested sensitivity and specificity of the "diagnostic algorithm".

Setting
The study was conducted from January to December 2014 at five PC centres participating in the larger PsicAP study [28], a clinical trial designed to evaluate the diagnosis and treatment of emotional disorders among PC patients in Spain. The centres are located in several cities in Spain (two in Valencia, and one each in Albacete, Vizcaya, and Mallorca).

PHQ-9
The PHQ-9 [12] is part of the PHQ and consists of nine items to assess for the presence of the nine diagnostic criteria for major depression according to DSM-IV. The PHQ-9 evaluates the presence of the following symptoms over the previous two-week period: (a) depressed mood; (b) anhedonia; (c) sleep problems; (d) feelings of tiredness; (e) changes in appetite or weight; (f) feelings of guilt or worthlessness; (g) difficulty concentrating; (h) feelings of sluggishness or worry; (i) suicidal ideation. Items are answered on a four-point Likert scale from 0 to 3 as follows: 0 (never), 1 (several days), 2 (more than half of the days), and 3 (most days). Internal consistency was satisfactory in the current sample (McDonald's ω = .89) and all item-test correlations were >.40. A public version of the PHQ-9, written in Spanish for use in Spain and provided by the authors of the PHQ, was used in this study.

Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I)
The Spanish Version of this semi-structured interview [31] was conducted by clinical psychologists (7 in total) who had received intensive training by an expert clinical psychologist (see Cano-Vindel et al. [28] for more details). The interview sessions were supervised by the same clinical psychologist for the duration of the study. Patients were diagnosed with MDD when they presented at least five of the DSM-IV criteria during the last two weeks: that is ≥ one of the first two symptoms and ≥ four of the remaining symptoms.

Procedure
Patients with anxious, depressive or physical symptoms without a clear biological basis were asked by the GPs to participate in the PsicAP clinical trial (see Cano-Vindel et al. [28]). They received the Patient Information Sheet and provided informed consent. Next, an individual meeting was arranged to review the study details with the participants and to complete the PHQ and the other tests. Computerized versions of these tests were used in most cases. Patients with impaired vision received help in completing the questionnaires. Paper versions of the measures were provided to patients with difficulties using the computer. After completing the PHQ-9, participants were asked to participate in the SCID-I interview, which was then scheduled within a maximum of 2 weeks from completion of the PHQ-9. Prior to administration of the SCID-I, all participants received a Patient Information Sheet of this sub-study and signed an informed consent form. All clinical psychologists conducting the interviews were blinded to the results of the PHQ-9.
This study was approved by the Corporate Clinical Research Ethics Committee of Primary Care of Valencia (CEIC-APCV) (as the national research ethics committee coordinator) and the Spanish Medicines and Health Products Agency (AEMPS) (N EUDRACT: 2013-001955-11 and Protocol Code: ISRCTN58437086).

Data analysis
A receiver operating characteristic (ROC) curve analysis was performed using data from the 178 patients who completed the PHQ-9 and were interviewed with the SCID-I; this statistical analysis was performed using the pROC package [32] for the statistical programming environment R [33]. The following ratios were calculated: sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios. To evaluate the test's screening properties, we used the sum scores of the PHQ-9 and the "diagnostic algorithm". The optimal cut-off value to balance sensitivity and specificity was defined as the value corresponding to the maximum value of Youden's index, calculated as (sensitivity + specificity − 1) [34].
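As an illustration of the indices listed above, the following Python sketch computes them for a given cut-off and selects the cut-off that maximises Youden's index. The study itself used the pROC package in R; this sketch is not that code, the function names are hypothetical, and the toy scores and diagnoses are invented solely to make the example runnable.

```python
from typing import Dict, Iterable, Sequence, Tuple


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator else float("nan")


def screening_metrics(scores: Sequence[int], has_mdd: Sequence[bool],
                      cutoff: int) -> Dict[str, float]:
    """Classify score >= cutoff as screen-positive and compare with the
    reference standard (True = MDD on the clinical interview)."""
    tp = sum(1 for s, d in zip(scores, has_mdd) if s >= cutoff and d)
    fp = sum(1 for s, d in zip(scores, has_mdd) if s >= cutoff and not d)
    fn = sum(1 for s, d in zip(scores, has_mdd) if s < cutoff and d)
    tn = sum(1 for s, d in zip(scores, has_mdd) if s < cutoff and not d)
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "lr_pos": _ratio(sens, 1 - spec),
        "lr_neg": _ratio(1 - sens, spec),
        "youden_j": sens + spec - 1,
    }


def best_cutoff_by_youden(scores: Sequence[int], has_mdd: Sequence[bool],
                          cutoffs: Iterable[int]) -> Tuple[int, float]:
    """Return (cutoff, J) with the largest Youden's J; ties keep the lowest cutoff."""
    results = [(c, screening_metrics(scores, has_mdd, c)["youden_j"]) for c in cutoffs]
    return max(results, key=lambda pair: pair[1])


# Invented toy data: ten PHQ-9 totals with their interview diagnoses.
toy_scores = [4, 8, 11, 13, 15, 18, 9, 12, 20, 6]
toy_mdd = [False, False, False, True, True, True, False, True, True, False]
print(screening_metrics(toy_scores, toy_mdd, cutoff=10))
print(best_cutoff_by_youden(toy_scores, toy_mdd, range(8, 16)))
```

Because the predictive values depend on the prevalence in the sample, a sketch like this reports them only for the data at hand; they would not transfer directly to a setting with a different base rate of MDD.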

Study sample
All patients between 18 and 65 years (inclusive) who presented at one of these five PC centres for somatic or psychological complaints during the study inclusion period were invited to participate (n = 298). Of these, 260 participants (186 females) completed the PHQ and 178 (125 females) were interviewed using the SCID-I. In terms of socio-demographic variables, no differences were observed between the whole sample and the subset of participants who completed the SCID-I interview (as indicated by t-tests or chi-squared tests, depending on variable type; all p ≥ .35). The Vizcaya centre, however, had a slightly higher dropout rate. Table 1 shows the socio-demographic variables and data on prescription medications taken by the patients.

SCID-I-based prevalence
Of the 260 patients included in our study, 178 completed the clinical interviews with the SCID-I. The prevalence of MDD seen in our PC population was high: 129 of 178 patients (72.5%) met the criteria for MDD on the SCID-I, while 49 patients (27.5%) did not fulfil these criteria.

PHQ-based prevalence
Of the 260 patients who completed the PHQ, 141 (54%) met the criteria for somatization disorder (SD; PHQ-15 ≥ 5), 68% for MDD according to the DSM-IV "diagnostic algorithm" (n = 178) or 78% according to PHQ-9 scores ≥10 (n = 203), and 69% for GAD (GAD-7 ≥ 10; n = 180). A total of 110 participants (42%) met the criteria for panic disorder according to the modified algorithm of the PHQ-PD and 22% (n = 57) met panic disorder criteria according to the original algorithm of the PHQ-PD. Finally, 17% (n = 45) met criteria for eating disorder and 14% (n = 38) for alcohol-related disorder. As expected, comorbidity between disorders was high, particularly for comorbid MDD and GAD (n = 150; 57%), SD and MDD (n = 115; 44%), and GAD and SD (n = 117; 45%). Overall, 40% of the participants with MDD presented comorbidity with either GAD or SD (n = 104). We found no differences between the total sample of PHQ-9 respondents (n = 260) and the subsample of PHQ-9 and SCID-I respondents (n = 178) in terms of the proportion of participants that met criteria for one or more of the aforementioned disorders, nor with regard to comorbidities (all p > .61). See Table 2 for details.
Operating characteristics of the PHQ-9 using different cut-off scores
The ROC curve analysis showed that the PHQ-9 had an area under the curve of 0.89 (Fig. 1). The most widely used cut-off value for correctly identifying cases with MDD is ≥10. In our study, of the patients diagnosed with MDD according to SCID-I, 95% had scores ≥10 on the PHQ-9 while 67% of patients without a SCID-I diagnosis of MDD scored below the cut-off level (< 10). As a result, the PHQ-9 had a sensitivity of 0.95, a specificity of 0.67, positive and negative predictive values of 0.88 and 0.83, respectively, and positive and negative likelihood ratios, respectively, of 2.90 and 0.08. Increasing the PHQ-9 cut-off point to 12 yielded the following values: sensitivity, 0.84; specificity, 0.78; positive and negative predictive values of 0.91 and 0.66, respectively; and positive and negative likelihood ratios of 3.76 and 0.20, respectively. Most (84%) depressed patients (SCID-I diagnosis) had scores of 12 or higher, whereas 78% of patients without a depression diagnosis scored below the cut-off point. Moreover, according to Youden's index (sensitivity + specificity − 1), which identifies the cut-off value that best balances sensitivity and specificity, the most appropriate cut-off value was 14 (J = 0.66); the index was lower for other cut-off scores, as follows: 10 (J = 0.62), 11 (J = 0.63), and 12 (J = 0.62). With a cut-off score of 14, the PHQ-9 showed the following psychometric properties: sensitivity, 0.78; specificity, 0.88; positive and negative predictive values, 0.94 and 0.60, respectively; and positive and negative likelihood ratios, 6.33 and 0.26, respectively (Table 3 shows other possible cut-off points and confidence intervals).

Operating characteristics of the PHQ-9 using the "diagnostic algorithm"
Of the patients with a SCID-I diagnosis of MDD, 88% were also identified as having major depression according to the PHQ-9 "diagnostic algorithm". By contrast, 80% of non-depressed patients (SCID-I) did not reach the diagnostic cut-off point. Based on these data, the PHQ-9 presented a sensitivity of 0.88, a specificity of 0.80, positive and negative predictive values, respectively, of 0.92 and 0.72, and positive and negative likelihood ratios of 4.33 and 0.15, respectively. The highest value for Youden's index (J = 0.68) was obtained for the PHQ-9 "diagnostic algorithm". (Table 3 provides more details, including confidence intervals and alternative cut-off points).
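The relationships among the reported indices can be retraced with a short Python sketch that takes the sensitivity and specificity quoted above, together with the SCID-I prevalence in the interviewed sample (129 of 178), and derives Youden's index, the likelihood ratios, and the predictive values via Bayes' rule. This is an illustrative recalculation, not the authors' analysis; the function name is hypothetical, and because the inputs are rounded to two decimals, the recomputed values can differ slightly from those reported (the likelihood ratios in particular).

```python
from typing import Dict


def derived_indices(sens: float, spec: float, prevalence: float) -> Dict[str, float]:
    """Derive summary indices from sensitivity, specificity, and prevalence."""
    true_pos = sens * prevalence                # P(test+, MDD)
    false_pos = (1 - spec) * (1 - prevalence)   # P(test+, no MDD)
    true_neg = spec * (1 - prevalence)          # P(test-, no MDD)
    false_neg = (1 - sens) * prevalence         # P(test-, MDD)
    return {
        "youden_j": sens + spec - 1,
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


prevalence = 129 / 178  # MDD on the SCID-I in the interviewed subsample

# Sensitivity and specificity as quoted in the text (rounded values).
reported = {
    "cut-off 10": (0.95, 0.67),
    "cut-off 12": (0.84, 0.78),
    "cut-off 14": (0.78, 0.88),
    "diagnostic algorithm": (0.88, 0.80),
}
for label, (sens, spec) in reported.items():
    values = derived_indices(sens, spec, prevalence)
    print(label, {name: round(v, 2) for name, v in values.items()})
```

The same calculation makes the role of prevalence explicit: with roughly 72% of interviewed patients meeting MDD criteria, positive predictive values are high at every threshold, and they would be expected to fall in a sample with a lower base rate.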

Discussion
In this study, we assessed the utility of the PHQ-9 as a screening tool to identify MDD in users of Spanish PC services. The main appeal of the PHQ-9 is that it is an easy to administer and inexpensive self-report measure. Our main finding is that the PHQ-9 is of value in identifying MDD in patients at Spanish PC centres, but our findings suggest that a higher cut-off value (12 or more) or the "diagnostic algorithm" might be better than the standard 10-point cut-off value in order to improve specificity in this patient population.
Our results show that the PHQ-9 is a sensitive screening instrument for MDD, and in most cases it correctly identified individuals with MDD when the most common cut-off point (10 points) was used [12,13,35]. Unexpectedly, the specificity of the PHQ-9 in our study was much lower than reported in previous studies, suggesting more false positive diagnoses of MDD. Increasing the cut-off point to 12 resulted in a slight decrease in sensitivity but specificity improved to a more satisfactory value, yielding a more acceptable trade-off. At the 12-point cut-off, the positive predictive value increased while the negative predictive value decreased. According to Youden's index, the most appropriate cut-off score was 14 (J = 0.66), compared to cut-off scores of 10 (J = 0.62), 11 (J = 0.63), and 12 (J = 0.62). Using a cut-off point of 14, the sensitivity was 0.78 and the specificity 0.88. However, a good screening tool needs high sensitivity in order to keep false negatives to a minimum. For this reason, we suggest a cut-off score of 12 in the context of Spanish PC centres due to the high sensitivity (0.84) achieved with this cut-off level. However, the optimal cut-off in other populations may vary and other authors have recommended adjusting the cut-off point to suit the target population [13,14]. Given that sensitivity is vital in the PC setting, we believe that a moderate specificity (as found with the cut-off score of 10) is acceptable. Thus, rather than strictly following Youden's index, we believe that our recommendations are more appropriate for clinicians in this setting. Using the original DSM-IV algorithm to identify MDD, the results of the PHQ-9 were satisfactory, with a very high sensitivity (0.88) and good specificity (0.80). Consequently, the positive predictive value was quite high, the negative predictive value was acceptable, and the positive and negative likelihood ratios were, therefore, also good.
Moreover, Youden's index was highest (J = 0.68) when using the "diagnostic algorithm" compared to the cut-off scores. Overall, these results indicate that, from a psychometric perspective, the DSM-IV "diagnostic algorithm" is superior to the most common cut-off score of 10 and to the other suggested values (ranging from 12 to 14 points), with an excellent ability to correctly differentiate between depressed and non-depressed individuals. Furthermore, the satisfactory positive and negative predictive values of the PHQ-9 show that the test is excellent for ruling out non-MDD cases but can also adequately confirm MDD. These findings are also consistent with the Spanish validation study [26], which also found high sensitivity and specificity under these conditions, as the "diagnostic algorithm" was used in the depression section. Based on these findings, we believe the DSM-IV algorithm should be used with the PHQ-9. In contrast to some previous research [24,25,36], these results suggest that the PHQ-9 can be used as a screening test when the DSM-IV "diagnostic algorithm" is used. That said, it is important to stress that the "diagnostic algorithm" used for screening purposes should not be confused with a diagnosis of MDD. We agree with Mitchell et al. [25] that the PHQ-9 should not be used as the only source of information to confirm a clinical diagnosis. Thus, the "diagnostic algorithm" for the PHQ-9 may serve as a useful screening method to quickly and efficiently identify MDD or other depressive symptoms in the PC setting. However, patients with suspected MDD should be referred for a clinical interview performed by an experienced clinician to confirm the diagnosis and to determine secondary causes.
This study presents some limitations that may have contributed to the discrepant results compared with other studies. To start with, patient recruitment required a referral by the GP, who informed patients about this clinical trial involving psychological treatment. This recruitment approach likely resulted in some degree of selection bias, which may have partially affected our results. This influence may have been negative because it seems probable that the low specificity of the PHQ-9 observed in our sample using a 10-point cut-off value may be attributable to some participants exaggerating their symptoms on the questionnaire to ensure eligibility for treatment. This hypothesis is supported by the fact that many patients with scores >20 (indicative of severe depression) were diagnosed as only mildly depressed on the SCID-I interview. Additionally, in previous studies, patients scoring >20 on that test did not present severe MDD [37]. In fact, based on those findings, Zimmerman et al. [37] called for caution in using the PHQ-9 to guide treatment selection until the thresholds to define severity ranges have been empirically established. Importantly, based on these findings, we have since modified the protocol of the PsicAP study [28] to prevent misuse: patients with PHQ-9 scores above 20 are automatically interviewed with the SCID-I to confirm the severity of their depression. Another limitation is that many patients that participated in our study presented symptoms of other emotional disorders, such as anxiety, somatizations, and mood disorders. Given that anxiety and depression share common features [38], this may explain the high rates of comorbidity. Thus, it is possible that patients suffering from anxiety or somatizations may have depressive symptoms that did not meet DSM-IV criteria for MDD on the SCID-I. In turn, this would have affected specificity estimates in our data. 
In fact, it is possible that the "diagnostic algorithm" performed better than other cut-off values because it is better adapted to these circumstances that are typically observed in the applied clinical setting. Therefore, the PHQ-9 may have some ecological validity for PC settings, where comorbidity is high and resources and available time are scarce. However, more studies are needed in Spanish PC centres to replicate these results and to identify possible boundary conditions. Additionally, given that the DSM-5 and DSM-IV use the same algorithm to diagnose MDD, a fertile area for future research would be to investigate the relationship between the PHQ and the restructured broader diagnoses of DSM-5 affective disorders.

Conclusions
This is the first study to assess the PHQ-9 to obtain the optimal cut-off values for screening patients with MDD in the PC setting in Spain. The findings presented in this study indicate that the PHQ-9 is a valuable tool to help to identify suspected cases of MDD among patients treated at Spanish PC centres. Based on our results, in this population we recommend using a cut-off value of 12 or the DSM-IV "diagnostic algorithm" instead of the most common cut-off value of 10. Patients identified by the PHQ-9 screening tool with suspected MDD must be referred to specialised clinicians to confirm the diagnosis with other diagnostic measures and/or clinical interviews.

Availability of data and materials
The study data are only available upon request. The name(s) of the contact person(s) to request data are available upon request to all interested researchers. Legal and ethical restrictions make data available upon request and are in accordance with the nature of the data collection. Data are available from the promoter (Spain) for researchers who meet the criteria for access to confidential data. Contact: Psicofundación (Spanish Foundation for the Promotion and Development of Scientific and Professional Psychology). Address: Calle Conde de Peñalver, 45, 5º izquierda, 28006 Madrid, Spain.

Confidentiality
The study is conducted in accordance with the Spanish Data Security Law. All professionals participating in the study agreed to adhere to the Helsinki Declaration and to Spanish law. All health care professionals participating in the study are required to sign a form indicating their agreement to adhere to the above-mentioned declaration and Spanish law. The patient names and all other confidential information fall under medical confidentiality rules and are treated according to Spanish Data Security Law. The patient questionnaires are stored on a protected central server and saved in an encrypted database. The project complies with current guidelines in Spain and EU for patient protection in clinical trials with regards to the collection, storage and the keeping of personal data. Only direct members of the internal study team can access the data.
Authors' contributions
RMN Acquired, analysed and interpreted data. Wrote the original draft and led the revision process of the manuscript to give final approval for publication. Agreed to be accountable for all aspects of the work. ACV Contributed to conception and design. Revised the original manuscript and contributed to fit the work to its previous design. Gave final approval for publication. Acquired funding and agreed to be accountable for all aspects of the work. LAM Analysed and interpreted data. Revised the manuscript and contributed in the methodology and analyses of the work. Gave final approval for publication. Agreed to be accountable for all aspects of the work. FS Analysed and interpreted data. Revised the manuscript. Gave final approval for publication. Agreed to be accountable for all aspects of the work. PRR Acquired data. Revised the manuscript. Gave final approval for publication. Agreed to be accountable for all aspects of the work. CAM Acquired data. Revised the manuscript. Gave final approval for publication. Agreed to be accountable for all aspects of the work. MAFP Acquired data. Revised the manuscript. Gave final approval for publication. Agreed to be accountable for all aspects of the work. AMHP Acquired data. Revised the manuscript. Gave final approval for publication. Agreed to be accountable for all aspects of the work. All authors read and approved the final manuscript.

Ethics approval and consent to participate
The sample of this study comes from a multi-centre Randomized Clinical Trial with medication (N EUDRACT: 2013-001955-11 and Protocol Code: ISRCTN58437086) promoted by the Psicofundación (Spanish Foundation for the Promotion and Development of Scientific and Professional Psychology) and approved by the Corporate Clinical Research Ethics Committee of Primary Care of Valencia (CEIC-APCV) (as the national research ethics committee coordinator) and the Spanish Medicines and Health Products Agency. Approval was received from both agencies in November 2013, prior to study initiation in December 2013. Patient informed consent: Prior to study participation, all patients receive written and oral information in the Patient Information Sheet about the content and extent of the planned study. This includes information about the potential benefits and risks for their health. Patients who agree to participate are required to sign the informed consent form. In the case of patients who withdraw from the study, all data will be destroyed or the patient will be asked if he/she agrees to allow the use of existing data for analysis in the study. Patient participation in the study is completely voluntary and participants can withdraw at any time with no need to provide reasons and without negative consequences for their future medical care. The protocols used in this study pose no risk whatsoever to the participants. CBT is non-invasive at the cognitive level, except with regards to learning or teaching.

Consent for publication
Not applicable.