Systematic review and meta-analysis on the Patient Health Questionnaire-9 (PHQ-9) for depression screening in Chinese primary care patients

Background: Depression is a chronic illness that places a mounting burden on society. The 9-item Patient Health Questionnaire (PHQ-9) is a commonly used screening instrument in primary care settings to help enhance the detection of depression. There is a need to evaluate the effectiveness of the PHQ-9 in Chinese primary care settings.

Methods: A systematic review and meta-analysis was undertaken to evaluate the PHQ-9 among Chinese primary care patients. The MEDLINE, PubMed and PsycINFO databases and the Cochrane Library were searched for the period 2006 to 2016 for any report of studies that evaluated the performance of the PHQ-9. Four screening studies (n = 8,403) met the inclusion criteria. The quality of the studies was assessed according to accepted guidelines. Meta-analyses were conducted using a random-effects model.

Results: The pooled estimate of the area under the receiver operator characteristic curve for the PHQ-9 reported from four studies was 0.885 (95% CI: 0.805 to 0.965; SE: 0.0408; P < 0.001), which indicated good performance of the screening instrument. Heterogeneity between the screening studies was observed (I² statistic: 96.83%, P < 0.0001). Meta-analyses were limited by the small number of studies.

Conclusions: The PHQ-9 is a sensitive screening tool that is highly predictive of depression among Chinese populations in primary care settings. Further studies are, however, required to increase the power of the meta-analysis.

*Correspondence to: CHIU Chi Fai Billy, Family Medicine and Primary Care Centre, Hong Kong Sanatorium & Hospital, Hong Kong, Tel: +852 2572 0211; Email: hkuchiucf@gmail.com

Received: April 13, 2018; Accepted: April 27, 2018; Published: April 30, 2018
Billy CCF (2018) Systematic review and meta-analysis on the Patient Health Questionnaire-9 (PHQ-9) for depression screening in Chinese primary care patients. Fam Med Care, 2018, Volume 1(1): 1-2. doi: 10.15761/FMC.1000105


Introduction
The prevalence of depressive disorders has been estimated to be around 10% to 20% in primary care [1]. One in 10 patients is estimated to have mild-to-moderate depressive symptoms at the time of a primary-care consultation, of whom around one in four are identified as having depression by the doctor [2]. However, most of the epidemiological data to date have come from Western countries, whereas epidemiological data on mental illness in low- or middle-income countries have been lacking [3]. In view of the sociocultural differences between Western and Eastern societies, results from Western societies may have limited generalizability to Asian populations. This issue is expected to become increasingly important in China, as a rising trend of depression has been observed in many Asian populations, including Chinese populations.
Depressive disorders are associated with significant morbidity, disability and healthcare utilization in primary-care settings [4,5]. In many countries, treatment for depression is mainly provided in primary-care settings [6,7], in line with the World Health Organization recommendation that common mental illnesses should be treated in primary care [8]. However, up to 50% of depressive disorders go unrecognized in primary-care settings [9,10]. Moreover, Chinese patients tend to underutilize mental health services. According to a local study of 10,179 adult patients from primary-care settings in Hong Kong, only 24.3% of the 518 patients who screened positive for depression received services from healthcare professionals [11]. This indicates that depressive disorders are commonly undiagnosed and undertreated in primary-care settings, which may subsequently pose a significant medical burden on the community.
The use of an appropriate screening tool is essential to identify and manage patients with depressive disorders in primary care. The 9-item Patient Health Questionnaire (PHQ-9), which scores each of the 9 Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria on a scale of "0" (not at all) to "3" (nearly every day), has been well validated in primary-care settings in Western populations [12-15]. However, its performance in Chinese populations has not been well established. We therefore aimed to perform a systematic review to evaluate the diagnostic performance of the PHQ-9 as a screening instrument for depression in Chinese primary care patients [16-21].
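The scoring rule described above (nine items, each rated from 0 to 3) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the default cut-off of 10 is an assumption used only for illustration; it is not a threshold reported in this review.

```python
from typing import Sequence


def phq9_total(responses: Sequence[int]) -> int:
    """Sum nine PHQ-9 item responses, each 0 (not at all) to 3 (nearly every day).

    The total therefore ranges from 0 to 27.
    """
    if len(responses) != 9:
        raise ValueError("PHQ-9 expects exactly 9 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be an integer from 0 to 3")
    return sum(responses)


def screens_positive(responses: Sequence[int], cutoff: int = 10) -> bool:
    """Flag a positive screen when the total reaches the cut-off.

    The default cut-off of 10 is an illustrative assumption, not a value
    taken from the studies included in this review.
    """
    return phq9_total(responses) >= cutoff
```

For example, a patient answering "1" to every item scores 9 in total and would fall just below the assumed cut-off.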

Methods

Study selection
Search strategy: A literature search was performed in the MEDLINE, PubMed and PsycINFO databases (between 2006 and 2016) and the full Cochrane Library database, using "primary care", "screening", "depression" and "Chinese" as search terms to identify candidate articles for systematic review of the screening instrument.

Inclusion and exclusion criteria
Screening studies were included if the following criteria were met: use of the PHQ-9 to identify depressive disorders, Chinese subjects recruited from primary-care settings, and outcome variables that included at least an area under the receiver operator characteristic (ROC) curve with either a standard error or a 95% confidence interval (CI) to demonstrate the performance of the screening tool. Studies published in a language other than English were not included because translation was not feasible.

Quality assessment of studies
The studies identified in the search were assessed for inclusion based on their methodological quality. Four screening studies were finally selected for inclusion in the systematic review and meta-analysis. The selected screening studies were validation studies, in which lack of blinding is a potential source of bias [22]; blinding of assessors and outcomes was therefore the criterion for assessment of study quality [23].

Data extraction
Following the preliminary search and quality assessment of selected studies, data were extracted from the full text of journal articles and captured in a spreadsheet, which included data fields to collect information on the year of study, type of study design, randomization method, study population, sample sizes, types of screening instrument, and study outcomes.

Data analysis
Screening studies that reported the ability of the PHQ-9 to detect depressive disorders were included for meta-analysis. As ROC curves represent the inherent trade-offs between sensitivity and specificity for a screening instrument [24], we mainly performed meta-analysis of the area under the curve (AUC). To express the precision of each estimate on a common scale, the standard error (SE) for the AUC was derived from the 95% confidence interval (CI) according to the formula "SE = (upper limit - lower limit) / 3.92" for all included studies [16]. Meta-analysis of ROC curves was performed using MedCalc version 16.4.3 (MedCalc Software, Ostend, Belgium) [25]. All statistical tests were two-sided and the level of significance was set at 5%. As only a small number of studies could be selected and pooled, a funnel plot was not constructed to assess for publication bias.
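The SE formula quoted above follows from the width of a symmetric 95% CI, which spans 2 × 1.96 = 3.92 standard errors under a normal approximation. The sketch below is illustrative only (the function name is hypothetical) and uses the pooled interval reported in this review as a consistency check.

```python
def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error implied by a symmetric normal-approximation CI.

    With z = 1.96 the divisor is 2 * 1.96 = 3.92, matching the formula
    "SE = (upper limit - lower limit) / 3.92" used for a 95% CI.
    """
    if upper < lower:
        raise ValueError("upper limit must not be below lower limit")
    return (upper - lower) / (2 * z)


# Consistency check with the pooled estimate reported in this review:
# 95% CI 0.805 to 0.965 and SE 0.0408.
pooled_se = se_from_ci(0.805, 0.965)
print(round(pooled_se, 4))  # 0.0408
```

The same function applies to each included study's reported interval, provided that interval was itself constructed as the estimate ± 1.96 × SE.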

Results

Search results
A total of 80 related articles were identified from the PubMed/MEDLINE database (N = 31), the Cochrane Library (N = 17) and the PsycINFO database (N = 32) according to the preset search criteria. Following the selection of studies in accordance with the inclusion and exclusion criteria, only five full-text articles with complete data were retrieved for further methodological evaluation [26,27].
Of the five eligible studies that investigated the accuracy of the PHQ-9 instrument in Chinese populations [28-32], one study validated the Chinese/English bilingual version of the PHQ-9 for depression screening among immigrant Chinese Americans in primary care [32]. Since this study focused on immigrants who attended a community health center in Boston, USA, and the patients there might have been influenced by Western culture, the study might not be suitable for a meta-analysis evaluating the overall estimate of the AUC value for the PHQ-9 in Chinese populations. Therefore, four studies with a total of 8,403 Chinese subjects were included for evaluation of the performance of the PHQ-9 instrument for screening of depressive disorders in primary-care settings. The flowchart of study selection, from initial search results to final inclusion for meta-analysis, is shown in Figure 1.

Characteristics of included studies
The four included studies [28-32] mainly aimed to examine the reliability and validity of a screening instrument for depression in Chinese primary-care settings. Three studies were performed to determine the reliability and validity of the PHQ-9 instrument, while one by Chin et al. [28] focused on validating the use of the Center for Epidemiologic Studies Depression Scale (CES-D) instrument for screening Chinese primary care patients. In the study by Chin et al., the PHQ-9 was used to assess the convergent validity of the CES-D, as both are depression instruments measuring a similar construct. As the study also reported an AUC for the PHQ-9 instrument, it was retained in the subsequent meta-analysis after the final in-depth review of the included studies. A summary of the included studies for meta-analysis of the PHQ-9 instrument for screening of depression in Chinese primary-care settings is presented in Table 1 [33-37].

Summary of the results
Preliminary appraisal of study quality indicated that all investigators of the selected studies were blinded to the screening score results, and therefore all four studies were determined to be of good quality for inclusion in the meta-analysis. Since most of the studies presented an estimate with a 95% CI, the SE was calculated for all studies to standardize the measure of precision. Pooled estimates of AUC were calculated to illustrate the overall estimates. Considering the heterogeneity between studies, with an I² statistic of 96.83% (P < 0.0001), a random-effects model was adopted. Figure 2 shows the pooled AUC estimates for the PHQ-9 for depression screening. The pooled area under the ROC curves was 0.885 (95% CI: 0.805 to 0.965; SE: 0.0408; P < 0.001) (Table 2) [38].
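To illustrate how a heterogeneity statistic and a random-effects pooled estimate of this kind can be obtained, the following Python sketch implements the DerSimonian-Laird estimator, one common random-effects method. It is an assumption that this resembles the calculation performed by the software; the source does not specify the estimator. The input AUCs and SEs are hypothetical values chosen for illustration and are not the data of the included studies.

```python
import math
from typing import Dict, Sequence


def dersimonian_laird(estimates: Sequence[float],
                      std_errors: Sequence[float]) -> Dict[str, float]:
    """Random-effects pooling with Cochran's Q and the I-squared statistic."""
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two estimates with matching SEs")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    se_pooled = math.sqrt(1.0 / sum_w_star)

    # I-squared: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": se_pooled, "q": q, "tau2": tau2, "i2": i2}


# Hypothetical AUCs and SEs for four studies (NOT the included studies' data).
example = dersimonian_laird([0.80, 0.85, 0.90, 0.95], [0.01, 0.01, 0.01, 0.01])
print({name: round(value, 4) for name, value in example.items()})
```

When study estimates are precise but far apart, as in this hypothetical input, Q greatly exceeds its degrees of freedom and I² approaches 100%, which is the situation in which a random-effects model is usually preferred over a fixed-effect model.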

Discussion
Depression is a chronic illness that commonly affects patients' quality of life and poses a burden on society globally [39]. This non-communicable disease has a significant impact in primary-care settings [4,5], where subthreshold depressive symptoms are prevalent [40,41]. It is therefore essential to identify an appropriate screening tool to help tackle under-detection of the problem at the point of entry to the healthcare system for most people requiring health services.
In a local study that estimated the 12-month cumulative incidence and predictors of a positive screen for depressive symptoms using the PHQ-9 among primary-care patients [42], the cumulative incidence of PHQ-9-screened depressive symptoms was found to be higher than the incidence of depressive disorders reported in systematic reviews [3,43]. The screening instrument has been translated into Chinese and validated in Chinese primary-care patients [30,31,44], suggesting that it may be an appropriate tool to identify patients with depressive symptoms in local primary-care settings in view of its high sensitivity of 0.81 (95% CI: 0.68 to 0.89) and specificity of 0.88 (95% CI: 0.84 to 0.91), as shown in a previous systematic review [45]. However, data about its screening ability in Chinese populations are lacking. In the current systematic review and meta-analysis, statistical details of sensitivity and specificity were not available for the included studies, but data on the area under the ROC curve were obtained for meta-analysis. The pooled AUC value of the PHQ-9 was 0.885 (95% CI: 0.805 to 0.965), which exceeds 0.7, the threshold above which an instrument is considered a sensitive screening tool [46]. It is important to note, however, that the AUC value conveys little about the diagnostic properties of a screening modality, and that only a limited number of studies were included in this meta-analysis.
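One way to see why an AUC summarizes discrimination without fixing sensitivity or specificity at any particular cut-off is its rank interpretation: the AUC equals the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. The sketch below computes this directly; the function name and the PHQ-9 totals are hypothetical and are not taken from any included study.

```python
from typing import Sequence


def auc_from_scores(case_scores: Sequence[float],
                    noncase_scores: Sequence[float]) -> float:
    """AUC as the proportion of case/non-case pairs ranked correctly.

    A pair counts 1 when the case scores higher, 0.5 when the scores tie,
    and 0 otherwise (the Mann-Whitney formulation of the AUC).
    """
    if not case_scores or not noncase_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical PHQ-9 totals for illustration only.
cases = [12, 15, 9, 20]
noncases = [3, 9, 5, 11, 2]
print(auc_from_scores(cases, noncases))  # 0.925
```

Because the calculation never selects a cut-off, two instruments with the same AUC can still differ in sensitivity and specificity at the threshold actually used in practice.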

Limitations
The number of studies included in the systematic review was not adequate for meta-analysis. The sample size was too small to draw a conclusion on how well the PHQ-9 instrument could identify depressive patients in primary-care settings. This might also mask publication bias and heterogeneity between studies, and the current review is likely to be vulnerable to publication bias, a threat to the validity of systematic reviews [47]. However, the current study provides preliminary information for primary-care practitioners' reference, and further research in this area is required. The selection of papers for the systematic review was conducted by one investigator, and therefore the review process and selection of studies could be biased.

Implications and future directions
There are currently no local guidelines on the use of screening instruments for identifying patients with depressive symptoms in primary-care settings in Hong Kong. The present systematic review has demonstrated that the PHQ-9 is a sensitive screening instrument that is highly predictive of depression among Chinese patients in primary-care settings, suggesting that primary care physicians, as gatekeepers of psychiatric and other specialist services, could use this widely available instrument to help identify patients with depressive symptoms and provide treatment earlier. Because of the limited number of relevant studies, further evaluation studies are still needed to build the evidence for the utility of the PHQ-9, and the meta-analysis should be repeated when more studies are available.

Conclusion
The systematic review has demonstrated a high predictive ability of the PHQ-9 as a depression screening instrument in Chinese patients. The review and analyses were limited by the small number of studies that met the inclusion criteria. A further systematic review to compare different screening modalities is required when more studies become available.

Figure 1. Flowchart of screening study selection

Figure 2. Pooled Area under the ROC Curve

Table 1. Summary of included studies. AUC: Area Under the Receiver Operator Characteristic Curve; CES-D: Center for Epidemiologic Studies Depression Scale; HAMD: Hamilton Rating Scale for Depression; PHQ: Patient Health Questionnaire; Q-LES-Q SF: Short Form of the Quality of Life Enjoyment and Satisfaction Questionnaire; SE: Standard Error; SF-12 (v2) MCS: Short Form-12 Health Survey (version 2) Mental Component Summary

Table 2. Meta-analysis of the Area under the Receiver Operator Characteristic Curves for PHQ-9 Instruments.