Evaluating the diagnostic properties of the Whooley questionnaire as a case-finding instrument for depression among Chinese women during and after pregnancy

Abstract Purpose: The prevalence of undetected perinatal depression is rising in many countries, and more effort in screening and early identification is needed. Although the Whooley questionnaire is the recommended case-finding strategy for perinatal depression, no validated Chinese version exists. The aim was to evaluate the diagnostic accuracy and stability of the translated Chinese Whooley questionnaire against a gold-standard measure during pregnancy and the early postnatal period. Materials and Methods: This observational study recruited 131 pregnant women from an antenatal clinic in Hong Kong from September 2019 to May 2020. We translated the Whooley questionnaire into Chinese and evaluated self-reported responses against an interviewer-assessed diagnostic standard (DSM-IV criteria) in 107 women at 26–28 gestational weeks. We calculated sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio and diagnostic odds ratio, with DSM-IV diagnosis as the gold standard. Results: The Chinese Whooley questions had a sensitivity of 79% (95% CI 54.4–93.9), a specificity of 97% (95% CI 90.4–99.3), a positive likelihood ratio of 23.2 (95% CI 7.4–72.1) and a negative likelihood ratio of 0.2 (95% CI 0.1–0.5) in identifying antenatal depression. Conclusion: The translated Chinese Whooley questionnaire has acceptable diagnostic accuracy in identifying perinatal depression and can be implemented in health services for the Cantonese-speaking Chinese population.


Introduction
Depression is a major contributor to disability and the global burden of disease. Perinatal depression is depression during pregnancy and up to a year after childbirth. It affects approximately 20% of pregnant women [2]. A meta-analysis found that 10.7% of women suffer from antenatal depression [3], whereas a systematic review of 58 studies reported an overall postpartum depression rate of 17% [4]. An epidemiological study of 357 women in Hong Kong found that 18.9% of women in the second trimester and 22.1% in the third trimester suffered from antenatal depression [5].
Perinatal depression is linked to adverse outcomes in maternal health, family relationships, and the development of the infant [6]. Antenatal depression is associated with adverse neonatal outcomes, poorer self-reported health, and increased substance and alcohol abuse [7]. Postnatal depression has been found to harm women's relationships with their partners and infants [8] and with their families [9]. It also adversely affects the emotional and cognitive development of the infant [10]. Women with perinatal depressive symptoms have been reported to use more healthcare resources during pregnancy, which offers opportunities for screening and intervention [11]. Among women with postpartum depression, over 50% were diagnosed either before or during pregnancy [12].
The American Academy of Family Physicians and the American College of Obstetricians and Gynecologists recommend screening all postpartum women for depression, and women should be screened for depression at least once during the perinatal period. The Edinburgh Postnatal Depression Scale (EPDS) is the most commonly used tool; a cut-off score of 12 yielded sensitivities and specificities of 80% to 90% [13]. Screening for postpartum depression appears to be effective: a systematic review found a lower prevalence of postpartum depression at follow-up among women screened four to eight weeks after delivery [14]. The American College of Obstetricians and Gynecologists does not give guidance on the specific timing and frequency of screening; one approach is to screen at least once during pregnancy and again four to eight weeks after delivery. The American Academy of Pediatrics recommends that physicians screen mothers for postpartum depression when the infant is 1, 2, and 4 months old. Screening women for possible depression in the perinatal period is also important for implementing early interventions and preventing further adverse outcomes [15,16]. In the UK, a case-finding strategy using two "ultra-brief" questions, the Whooley questions [17], is endorsed by the National Institute for Health and Care Excellence (NICE) to identify perinatal depression. Individuals who screen positive then receive a more comprehensive clinical assessment for depression. During pregnancy, the Whooley questions have shown a sensitivity of 100% and a specificity of 68% [18].
Perinatal depression often goes undetected during pregnancy in Hong Kong. A local population-based study found the EPDS [19] to be reliable in identifying pregnant women at high risk of postpartum depression [20], and screening all pregnant women in their second trimester has been suggested as a preventive measure [20]. Although the EPDS has demonstrated good reliability and validity [21], it is not routinely used at the antenatal stage, possibly because it has more items and relatively lower sensitivity. Compared with the EPDS, the Whooley questions require less time and fewer resources to administer and are thus more accessible to healthcare systems with limited time and resources. Early screening and detection of depressive symptoms during pregnancy can help prevent perinatal depression. The next step is to determine which test can identify the likely presence or absence of depression, so that appropriate decision-making can be encouraged in the Hong Kong healthcare system.

Objectives
The primary objective was to evaluate the diagnostic accuracy of the Whooley questionnaire against the gold standard during pregnancy and the early postnatal period. We also aimed to assess the stability of positive and negative depression screens between the antenatal and postnatal stages, and to estimate whether earlier testing optimizes screening.

Materials and methods
This is a two-phase observational study (Figure 1). Participants were recruited by convenience sampling from the antenatal obstetrics and gynecology outpatient clinic of a public hospital in Hong Kong. Potential participants were approached sequentially at a routine antenatal talk at the clinic, and all eligible women were invited to participate. Sample size was estimated using the method developed by Flahault and colleagues [18,22]. Based on a study conducted in an obstetric setting [2] and a local study [5], we estimated the prevalence of perinatal depression to be 20%. Assuming an expected sensitivity of 95%, and requiring a 0.95 probability that the lower limit of the 95% confidence interval (CI) for sensitivity would not fall below the minimum acceptable value of 70%, we determined that a sample size of 120 would be required.
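The sample-size reasoning above can be sketched as a search over the number of depressed cases. This is a minimal sketch, not the authors' calculation: it assumes that the criterion is an exact one-sided binomial test at the 5% level (equivalently, an exact lower 95% confidence limit above 70%) with 95% power when the true sensitivity is 95%, and that the total sample is the number of cases divided by the assumed 20% prevalence. The function names are illustrative.

```python
from math import ceil, comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def cases_needed(se_expected: float, se_min: float, alpha: float = 0.05,
                 power: float = 0.95, n_max: int = 500) -> int:
    """Smallest number of depressed cases n such that, when the true sensitivity
    is se_expected, the exact one-sided lower (1 - alpha) confidence limit for
    sensitivity exceeds se_min with probability of at least `power`."""
    for n in range(1, n_max + 1):
        # Smallest number of true positives k for which the exact binomial test
        # of H0: Se = se_min rejects at level alpha (lower limit above se_min).
        k_crit = next((k for k in range(n + 1)
                       if binom_upper_tail(k, n, se_min) <= alpha), None)
        if k_crit is None:
            continue  # too few cases to ever exclude se_min
        # Probability of at least k_crit true positives when Se = se_expected.
        if binom_upper_tail(k_crit, n, se_expected) >= power:
            return n
    raise ValueError("no sample size found up to n_max")


PREVALENCE_PCT = 20  # assumed prevalence of perinatal depression (%)
n_cases = cases_needed(se_expected=0.95, se_min=0.70)
n_total = ceil(n_cases * 100 / PREVALENCE_PCT)
print(n_cases, n_total)
```

Under these assumptions the search should stop at 24 depressed cases, which at 20% prevalence corresponds to the 120 women stated above. Exact binomial power rises and falls in a sawtooth pattern as n grows, so the sketch returns the first n that meets the target rather than assuming every larger n does too.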

Inclusion and exclusion criteria
The inclusion criteria were (1) 18 years of age or older, (2) Cantonese-speaking, (3) resident in Hong Kong for more than 1 year, (4) singleton pregnancy, and (5) no serious medical or obstetric complications. Women who were illiterate, or who planned to move away from or be away from Hong Kong after the birth, were excluded from the study.
At the postnatal phase, women were excluded if their infants (1) were born at <37 gestational weeks, (2) had a birth weight of <2,500 g, (3) had severe medical conditions or congenital malformation, (4) were placed in Special Care Baby Unit for over 48 h, or (5) were placed in Neonatal Intensive Care Unit at any time.

Index test
The Whooley questionnaire comprises two yes/no questions, and a "yes" answer to either or both questions is considered a positive screen. The questionnaire was translated into Chinese by two local bilingual healthcare professionals, and the final translation was reached by consensus. The consensus version was then back-translated into English by a third, independent translator and compared with the original to ensure accuracy. All translators were bilingual, with native or near-native fluency in English and Chinese, and were familiar with the local dialect and culture.
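The scoring rule above, and the way each woman's screen is paired with her reference diagnosis to fill a 2 × 2 table, can be sketched as follows. This is an illustrative sketch with hypothetical function names and made-up records, not the study's data-handling code; the docstring paraphrases the original English Whooley items.

```python
from collections import Counter


def whooley_positive(q1_yes: bool, q2_yes: bool) -> bool:
    """Positive screen if the respondent answers 'yes' to either or both items.

    q1: during the past month, often bothered by feeling down, depressed or hopeless?
    q2: during the past month, often bothered by little interest or pleasure in doing things?
    """
    return q1_yes or q2_yes


def cross_tabulate(records):
    """Build a 2 x 2 table from (q1_yes, q2_yes, depressed_by_reference) tuples."""
    counts = Counter()
    for q1_yes, q2_yes, depressed in records:
        screen = whooley_positive(q1_yes, q2_yes)
        if screen and depressed:
            counts["TP"] += 1
        elif screen:
            counts["FP"] += 1
        elif depressed:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Hypothetical records: (item 1 "yes", item 2 "yes", DSM-IV depression)
example = [
    (True, False, True),    # positive screen, depressed      -> TP
    (False, True, False),   # positive screen, not depressed  -> FP
    (False, False, True),   # negative screen, depressed      -> FN
    (False, False, False),  # negative screen, not depressed  -> TN
    (True, True, True),     # positive screen, depressed      -> TP
]
print(cross_tabulate(example))
```

Because a single "yes" is enough, the rule favours sensitivity over specificity; requiring both items would be a different (and here untested) cut-off.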

Diagnostic gold standard
Telephone interviews were conducted to confirm the diagnosis using the DSM-IV diagnostic criteria for major depressive disorder [23]. The semi-structured interviews used the Structured Clinical Interview for DSM-IV, Clinician Version [24], with questions to identify depressive symptoms [25]; this interview has been shown to be valid both by telephone and in person [26,27].

Procedure
This two-phase study aimed to validate the use of the Whooley questions in the antenatal and postnatal periods [28]. The study is reported in accordance with the Standards for Reporting of Diagnostic Accuracy Studies (STARD) [29].
During the antenatal phase, participants self-completed a questionnaire consisting of demographic questions and the Whooley questionnaire at the antenatal clinic. A researcher was present to answer any queries.
During the postnatal phase, at about 5–6 weeks after birth, a copy of the questionnaire was mailed to participants with a return envelope, to be completed and returned within seven days. Non-respondents were contacted by telephone, either as a reminder or to complete the questionnaire by phone interview.
In addition, participants were invited to complete an antenatal and a postnatal diagnostic interview. The antenatal diagnostic interview was conducted by an experienced clinician-researcher within 14 days of completion of the Whooley questionnaire. The postnatal diagnostic interview was scheduled according to the participant's preferred time and day, as indicated on her returned questionnaire. To prevent bias, the interviewer was blinded to participants' responses to the Whooley questionnaire before the interviews.

Ethical approval
Ethical approval was obtained from the Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster (UW-18-649) and the Hospital Authority Kowloon Central/Kowloon East Cluster Research Ethics Committee (KC/KE-18-0278).
The study was conducted in accordance with the Declaration of Helsinki and its later amendments. All participants provided written informed consent before inclusion in the study.

Statistical analysis
Descriptive statistics were reported for demographic data. We assessed the diagnostic accuracy and utility of the Whooley questionnaire by examining its sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio and diagnostic odds ratio, with DSM-IV diagnosis as the gold standard. Each measure was reported with a 95% CI. A positive screen was defined as a positive response to at least one of the two Whooley items. Diagnostic accuracy and utility measures were calculated separately at the antenatal and postnatal time points.
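The measures listed above can all be computed from the four cells of a 2 × 2 table. The sketch below is an illustration, not the authors' Stata analysis: it assumes exact (Clopper-Pearson) intervals for the proportions and log-method intervals for the likelihood ratios and diagnostic odds ratio, and the function names are hypothetical. The log intervals are undefined if any cell is zero. With the antenatal counts from the Results it should reproduce the reported sensitivity interval (54.4–93.9%) and positive likelihood ratio interval (7.4–72.1) to rounding.

```python
from math import comb, exp, log, sqrt


def tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bisect_root(f, tol=1e-10):
    """Root of a function that increases on [0, 1], by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""
    # Lower limit solves P(X >= x | p) = alpha/2; upper solves P(X <= x | p) = alpha/2.
    lower = 0.0 if x == 0 else bisect_root(lambda p: tail_ge(x, n, p) - alpha / 2)
    upper = 1.0 if x == n else bisect_root(lambda p: alpha / 2 - (1 - tail_ge(x + 1, n, p)))
    return lower, upper


def log_ci(ratio, se_log, z=1.96):
    """Confidence interval for a ratio from the standard error of its logarithm."""
    return exp(log(ratio) - z * se_log), exp(log(ratio) + z * se_log)


def diagnostic_summary(tp, fp, fn, tn):
    """(estimate, lower, upper) for each measure; all cells must be non-zero."""
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    lr_pos, lr_neg = sens / (1 - spec), (1 - sens) / spec
    dor = (tp * tn) / (fp * fn)
    return {
        "sensitivity": (sens, *clopper_pearson(tp, tp + fn)),
        "specificity": (spec, *clopper_pearson(tn, tn + fp)),
        "ppv": (tp / (tp + fp), *clopper_pearson(tp, tp + fp)),
        "npv": (tn / (tn + fn), *clopper_pearson(tn, tn + fn)),
        "lr_pos": (lr_pos, *log_ci(lr_pos, sqrt(1/tp - 1/(tp + fn) + 1/fp - 1/(fp + tn)))),
        "lr_neg": (lr_neg, *log_ci(lr_neg, sqrt(1/fn - 1/(tp + fn) + 1/tn - 1/(fp + tn)))),
        "dor": (dor, *log_ci(dor, sqrt(1/tp + 1/fp + 1/fn + 1/tn))),
    }


antenatal = diagnostic_summary(tp=15, fp=3, fn=4, tn=85)
postnatal = diagnostic_summary(tp=10, fp=9, fn=3, tn=56)
for name, (est, lo, hi) in antenatal.items():
    print(f"{name:12s} {est:8.3f} ({lo:.3f} to {hi:.3f})")
```

Exact intervals are a conservative choice for small denominators such as the 19 antenatal and 13 postnatal DSM-IV cases; a Wald interval would be narrower but can misbehave near 0 or 1.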
Stability of positive and negative screens between the two time points was assessed from the number of cases that remained stable or changed status. We anticipated that the Whooley questionnaire would show the same stability as the DSM-IV interview in changes of case status between the two time points. Change in depression status between time points was tested with the McNemar test. All analyses were conducted using Stata version 16 [30].
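The McNemar test uses only the discordant pairs of the paired antenatal/postnatal table: women who were positive at one time point and negative at the other. The sketch below is a generic illustration with hypothetical counts (the cells of Table 3 are not reproduced in the text), and the function name is illustrative. It returns the chi-square statistic, with an optional continuity correction, together with both an asymptotic and an exact binomial p-value, since the two can differ noticeably when few pairs are discordant.

```python
from math import comb, erfc, sqrt


def mcnemar(b: int, c: int, continuity: bool = False):
    """McNemar test for a paired 2 x 2 table.

    b: positive antenatally, negative postnatally
    c: negative antenatally, positive postnatally
    Returns (chi-square statistic, asymptotic p-value, exact two-sided p-value).
    """
    n = b + c
    if n == 0:
        return 0.0, 1.0, 1.0  # no discordant pairs: no evidence of change
    diff = max(abs(b - c) - (1 if continuity else 0), 0)
    chi2 = diff**2 / n
    # Upper tail of the chi-square distribution with 1 df: P(|Z| > sqrt(chi2)).
    p_asymptotic = erfc(sqrt(chi2 / 2))
    # Exact test: under H0 the discordant pairs split as Binomial(n, 0.5).
    k = min(b, c)
    p_exact = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2**n)
    return chi2, p_asymptotic, p_exact


# Hypothetical discordant counts: 5 women changed from positive to negative,
# 1 changed from negative to positive.
print(mcnemar(5, 1))
```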

Results
From September 2019 to May 2020, 131 participants were recruited. A total of 107 participants completed the Whooley questionnaire and DSM-IV interview at the antenatal stage, and 78 of them (73%) also completed both at the postnatal stage (Figure 1). Baseline characteristics are shown in Table 1; overall, participants who remained in the study at the postnatal stage were similar to the whole antenatal sample. At the antenatal stage, the DSM-IV interview identified 19 participants (18%) with depression. Compared with the DSM-IV, the Whooley questionnaire yielded 15 true positives, 85 true negatives, 3 false positives and 4 false negatives, giving a sensitivity of 78.9%, specificity of 96.6%, positive predictive value of 83.3%, negative predictive value of 95.5%, positive likelihood ratio of 23.2 and negative likelihood ratio of 0.2 (Table 2). At the postnatal stage, the DSM-IV interview identified 13 participants (17%) with depression. The Whooley questionnaire yielded 10 true positives, 56 true negatives, 9 false positives and 3 false negatives, giving a sensitivity of 76.9%, specificity of 86.2%, positive predictive value of 52.6%, negative predictive value of 94.9%, positive likelihood ratio of 5.6 and negative likelihood ratio of 0.3 (Table 2). The McNemar test showed no significant change between the antenatal and postnatal stages for either the DSM-IV (χ² = 0.05, p = 1.00) or the Whooley questionnaire (χ² = 1.8, p = 0.26) (Table 3).

Discussion
In this study, the Chinese translation of the Whooley questions showed promising diagnostic characteristics in a Chinese antenatal population, with high specificity and moderate sensitivity. The Whooley questionnaire appeared to be stable as a screening tool for depression in the Chinese population during pregnancy and after birth.
In the current study, the DSM-IV interview identified depression in 18% of antenatal women. This is higher than the prevalence of 10.7% reported in a recent meta-analysis [31], but lower than the highest rate reported by individual studies, 24% [31]. It is also within the range of antenatal depression rates reported in studies in China, from 5.5% to 23.1% [32,33]. A local study in Hong Kong found antenatal depression rates of 22.1% in the first trimester and 18.9% in the second trimester [5], which are similar to the current findings.
At the antenatal stage, the positive predictive value was 83.3% and the negative predictive value was 95.5%; because predictive values depend on prevalence, they can be generalized only to populations with a similar prevalence of depression. The translated screening tool showed moderate sensitivity (78.9% antenatal, 76.9% postnatal) and high specificity (96.6% antenatal, 86.2% postnatal). Specificity and positive predictive value fell from pregnancy to postpartum. This could reflect chance variation: the sample size fell substantially (by 27%) at the postpartum stage, increasing the uncertainty of the estimated accuracy measures, as reflected in the wider confidence intervals. A larger study of postpartum women would be needed to confirm whether accuracy genuinely declines. The sensitivity was lower than the pooled sensitivity of 95%, while the specificity was higher than the pooled specificity of 65%, reported in a diagnostic meta-analysis of 10 studies in different population groups [34]. Similarly, a study of pregnant women in the UK reported a higher sensitivity (100%) and a lower specificity (68%) [18]. The homogeneous sample in the current study may have contributed to the lower sensitivity and higher specificity. It has also been suggested that using the Whooley questions in primary care or community settings may lead to lower sensitivity and higher specificity, as the population in primary care is more diverse than in specialist settings [35].
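The point that a smaller postpartum sample widens the confidence intervals can be checked with a quick calculation. The sketch below uses the Wilson score interval as a simple closed-form stand-in (not necessarily the interval method behind Table 2) and the specificity counts from the Results (85/88 antenatally, 56/65 postnatally). It also includes a hypothetical 63/65, roughly the antenatal specificity with the postnatal denominator, to separate the effect of the smaller denominator from that of the lower point estimate, since a proportion further from 1 has a larger variance.

```python
from math import sqrt


def wilson_interval(x: int, n: int, z: float = 1.96):
    """Wilson score interval for the binomial proportion x / n."""
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def width(interval):
    lo, hi = interval
    return hi - lo


antenatal = wilson_interval(85, 88)   # specificity, antenatal stage
postnatal = wilson_interval(56, 65)   # specificity, postnatal stage
# Hypothetical: 63/65 (about 0.969) mimics the antenatal specificity at n = 65.
same_p_smaller_n = wilson_interval(63, 65)
print(width(antenatal), width(same_p_smaller_n), width(postnatal))
```

Comparing the three widths shows how much of the difference between stages is attributable to the smaller denominator alone.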
The translated Whooley questionnaire shows promising diagnostic accuracy. Its high specificity at the antenatal (96.6%) and postnatal (86.2%) stages means that relatively few women without depression screened positive. A positive Whooley screen was 23.2 times (antenatal) and 5.6 times (postnatal) as likely in women with DSM-IV depression as in women without; equivalently, a positive screen multiplies the pre-test odds of depression by these factors. Conversely, a negative screen (negative likelihood ratio 0.2 antenatally and 0.3 postnatally) reduces the pre-test odds of DSM-IV depression about 4.6-fold and 3.7-fold, respectively. The diagnostic odds ratio was 106.3 at the antenatal and 20.7 at the postnatal phase, meaning that the odds of DSM-IV depression among women who screened positive were about 106 and 21 times those among women who screened negative, which indicates that the Whooley questions could be a useful screening tool for perinatal depression.
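The likelihood-ratio interpretation above follows from Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. As a worked check with the antenatal counts from the Results, the sketch below uses exact fractions. When the pre-test probability is the sample prevalence (19/107), the post-test probabilities reproduce the positive predictive value (15/18) and one minus the negative predictive value (4/89), and the ratio of the two likelihood ratios equals the diagnostic odds ratio. The function name is illustrative.

```python
from fractions import Fraction


def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Antenatal 2 x 2 counts from the Results: TP=15, FP=3, FN=4, TN=85.
tp, fp, fn, tn = 15, 3, 4, 85
prevalence = Fraction(tp + fn, tp + fp + fn + tn)   # 19/107
sens = Fraction(tp, tp + fn)                        # 15/19
spec = Fraction(tn, tn + fp)                        # 85/88
lr_pos = sens / (1 - spec)                          # 440/19, about 23.2
lr_neg = (1 - sens) / spec                          # 352/1615, about 0.22

p_pos = post_test_probability(prevalence, lr_pos)   # equals the PPV
p_neg = post_test_probability(prevalence, lr_neg)   # equals 1 - NPV
dor = lr_pos / lr_neg                               # diagnostic odds ratio
print(float(p_pos), float(p_neg), float(dor))
```

In a setting with a different prevalence, the same likelihood ratios give different post-test probabilities, which is why predictive values from this sample do not transfer directly to other populations.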
Screening or case-finding instruments are simple, quick, and inexpensive methods to help detect and manage depression in primary care settings [36]. They should be considered triage tests rather than replacements for existing assessment tools [29]. The aim of a triage test is not to diagnose, but to identify the patients who need further assessment. Adopting the two case-finding questions as screening tools in clinical settings could reduce the number of patients requiring further clinical assessment with more comprehensive tools, such as the EPDS and the Patient Health Questionnaire (PHQ-9), by over 50% [18].
The main strength of the study was the comparison with a gold-standard diagnostic interview. The integrity of the diagnoses and the follow-up enabled us to examine the stability of the Whooley questions. One limitation of the study was the smaller-than-expected sample size, which may have affected the precision of the sensitivity and specificity estimates. The target sample size was not reached because recruitment was cut short by local social unrest and the COVID-19 pandemic. The study could have been adapted to collect data on the impact of the COVID-19 pandemic on perinatal depression. In addition, the study sample was homogeneous, as we recruited from a single site and only included women in their third trimester and first three months after birth. Further research with a larger sample size, more diverse perinatal populations and longer follow-up is warranted, including work on improving sensitivity. Future studies should also assess the application of the Whooley questions at different trimesters of pregnancy and during the postpartum period. Lastly, future studies could evaluate how incorporating the two questions affects perinatal care and outcomes, for example by assessing their acceptability among health professionals and women.
Overall, our study found that the translated Whooley questions have promising diagnostic accuracy. As they consist of only two yes/no questions, they could be widely implemented in health services for the Chinese-speaking population in Hong Kong. Future research should examine the most effective approach to adopting the Whooley questions as a screening tool for perinatal depression.

Table 1. Characteristics of participants at the antenatal and postnatal stages.

Table 2. 2 × 2 contingency table of diagnostic properties of the Whooley questionnaire.

Table 3. Contingency table for the stability of the DSM-IV and Whooley questionnaire results.