Construct validity and internal consistency of the Patient Health Questionnaire-9 (PHQ-9) depression screening measure translated into two Ugandan languages.

Background: In Uganda, depression is a major public health issue because of its direct disease burden and as a risk factor and co-morbidity for other pervasive health issues. Psychometric assessment of translated depression measures is critical to public health planning to ensure proper screening, surveillance, and treatment of depression and related outcomes. We examined aspects of the validity and reliability of the Patient Health Questionnaire-9 (PHQ-9) translated into Luganda and Runyoro in a large population-based cohort of Ugandan adolescents and adults.
Methods: Data from the ongoing open cohort AMBSO Population Health Surveillance study were analyzed from the Wakiso and Hoima districts in Uganda. Descriptive statistics were calculated for the overall sample and stratified by translated language. Construct validity was assessed for each translated scale using confirmatory factor analysis for ordinal data. The internal consistency of each translated scale was assessed using Cronbach’s alpha, McDonald’s omega total and omega hierarchical.
Results: Compared to the Runyoro-speaking subsample from Hoima (n=2297), participants in the Luganda-speaking subsample from Wakiso (n=672) were older (27 vs. 21 years, p < 0.01) and a greater proportion were female (62% vs. 55%, p < 0.01). The Luganda-translated PHQ-9 had a sample mean of 3.46 (SD=3.26), supported a single-factor structure (RMSEA=0.05, CFI=0.96, TLI=0.94), and demonstrated satisfactory internal consistency (Cronbach’s alpha=0.73, McDonald’s omega total=0.76, McDonald’s omega hierarchical=0.53). The Runyoro-translated PHQ-9 had a comparable sample mean of 3.58 (SD=3.00), also supported a one-factor structure (RMSEA=0.08, CFI=0.92, TLI=0.90), and demonstrated satisfactory internal consistency (Cronbach’s alpha=0.72, McDonald’s omega total=0.76, McDonald’s omega hierarchical=0.57).
Conclusions: Our preliminary findings indicate that the Luganda and Runyoro translations of the PHQ-9 had satisfactory construct validity and internal consistency in our sample of Ugandan adolescents and adults. Future studies should expand on this promising work by assessing additional psychometric characteristics of these translated measures in other communities in Uganda.


Introduction
Depressive disorders (e.g., major depressive disorder [MDD], dysthymia) affect over 264 million people globally [1]. While MDD is a mood disorder characterized by persistent feelings of sadness, manifestations and severity of depression symptomology vary from person to person. Underlying causes of depression are not fully understood but have been attributed to a combination of genetic, biological, behavioral, psychological and environmental factors [2]. Depression can be a debilitating illness, with 5.45% of the total years lived with disability (YLD, a standardized metric used to measure disease morbidity) attributed to it globally [3]. In its most severe form, depression can lead to intentional self-harm and loss of life by suicide; neurobiological processes and genetic factors predisposing an individual to increased risk of suicide are not well understood [4]. Although depression is common worldwide, individuals residing in low- and middle-income countries (LMIC) experience a greater burden of risk factors for depression (e.g., poverty and food insecurity) and are more likely to go untreated due to mental health stigma and limited access to mental health services [5,6].
In sub-Saharan Africa (SSA), depression is the leading cause of YLD [7]. The estimated prevalence of depression in SSA ranges from 2-6% in the general population aged 15-49 years [3] and is significantly higher (up to 37%) among persons living with HIV (PLWH) [8]. Depression is a major public health issue because of its direct disease burden and as a risk factor and co-morbidity for other pervasive health issues including alcohol and other substance use disorders, HIV infection, intimate partner violence, and other chronic diseases [9]. In recent decades, understanding the epidemiology and burden of depression has become increasingly prioritized in SSA given its important role in shaping health outcomes, including risk of HIV acquisition [10] and HIV care and treatment outcomes [11].
However, resources devoted to mental health surveillance and service delivery are limited as other health issues continue to take precedence in this resource-constrained setting [5].
Although English is the national language of Uganda, it is neither the first nor the common local language of many Ugandans. The most widely spoken indigenous language in Uganda is Luganda, which is spoken by approximately 17% of the population [36]. Several studies have utilized the Luganda-translated PHQ-9 to screen for depression in the general population [19,20], PLWH [21,22], and female sex workers [23]. However, few assessed the scale's psychometric properties. Nakku et al. (2016) assessed the sensitivity, specificity, and positive predictive value of the translated PHQ-9 among a sample of individuals attending rural primary clinics in eastern Uganda [19] and also used the optimum cut-off for probable depression in a community sample in the same district [20]. Wagner et al. (2017) and Ortblad et al. (2020) used a cut-off score for probable depression identified from previous psychometric research of the PHQ-9 elsewhere in SSA [22,23] but did not examine the validity or reliability of the translated scale in their samples.
Rigorous psychometric research of depression measures is critical to public health planning efforts because it allows for more accurate estimates of disease burden, which can guide resource allocation in resource-constrained settings like SSA. Prior PHQ-9 translation and psychometric assessment efforts in Uganda have been in Luganda since it is the most widely spoken non-English language. However, Uganda is home to a diverse ethnic landscape and many languages are spoken throughout the country. In Midwestern Uganda (where the Banyoro people reside), Runyoro is the dominant language (spoken by ~670,000 Ugandans) [37]. To inform mental health needs in Uganda, it is important to have an accurate estimate of the burden of depression, including among those who speak languages other than Luganda or English. In this study, we examined aspects of the validity and reliability of the PHQ-9 translated into Luganda and Runyoro in a large population-based cohort of Ugandans residing in the Wakiso and Hoima districts, respectively. Both districts are located in regions of Uganda that experience a high burden of HIV (6%-8%) [38] and qualitative research in both areas suggests that mental distress and mental health disorders are common [39]. Considering that mental health assessment tools can perform differently in varying languages, we examined certain psychometric properties of the Luganda- and Runyoro-translated PHQ-9 scales separately. We theorized a priori that the translated scales would be unidimensional in both languages, given findings from previous studies in SSA [24,25,32-35]. We also hypothesized that the translated scales would have good internal consistency in both study samples. The present study provides valuable insight pertaining to the performance of the PHQ-9 in two Ugandan languages and the estimated burden of depression in a large, generalizable sample of Ugandans.

Study Population and Setting
This analysis includes data collected from the baseline round (May 2018-July 2019) of the Africa Medical and Behavioural Sciences Organization (AMBSO) Population Health Surveillance (APHS) study, which is an ongoing open cohort longitudinal mixed methods study implemented in Uganda. The primary aim of the APHS study is to monitor trends in infectious and non-communicable diseases, preventative as well as risk health behaviors, and family and population structures in the general population. The quantitative component of the APHS study is conducted continuously, with each subsequent round of data collection beginning immediately after the conclusion of the preceding round. At the beginning of every round, a household census is first conducted to enumerate the population in each study community and collect household sociodemographic information on all household members. Individuals who are eligible for survey participation are initially contacted via a hand-delivered invitation, with a follow-up call placed to those who do not reply. Individuals who participate and complete the survey are asked to invite any remaining eligible persons in their household in a third attempt to reach eligible community members. Prior to participation, written informed consent (or assent for minors) is obtained for all participants. As previously described [39], all census participants in enumerated households aged 13-80 years were eligible (i.e., met inclusion criteria) for participation in the quantitative baseline survey, which consisted of a structured questionnaire and collection of biological specimens for HIV and other sexually transmitted infections (STI) testing. The baseline round of the APHS was conducted in six communities across two districts in Central and Midwestern Uganda (i.e., Wakiso and Hoima).
Study communities were selected to represent the diverse community types that comprise the districts (i.e., an urban, peri-urban, and rural community in each district) to improve generalizability of the study findings to the broader context of Uganda. Participants in this ongoing study are followed up every 12-18 months for subsequent APHS data collection rounds. The study was approved by the Clarke International University-Research Ethics Committee (CIU-REC) in Uganda and the Uganda National Council for Science and Technology (UNCST).
The PHQ-9 was incorporated into the survey after data collection was initiated. As such, the sample included in this analysis is restricted to participants from the four communities (one urban community in the Wakiso district and three communities in the Hoima district) who were administered the survey that included the PHQ-9 from September 2018 to July 2019.

Data Collection
The survey was translated into Luganda and Runyoro (Table 1) by qualified and trained social workers fluent in each language and designated by the Common European Framework of Reference for Languages (CEFR) as C1 and C2 proficient English speakers. Data collection occurred in the most widely spoken local language (i.e., Luganda in the Wakiso district and Runyoro in the Hoima district). All data were collected in-person through one-on-one interviews which typically lasted 50-70 minutes. Data were collected by experienced research assistants who were trained in research ethics and fluent in the local language of each community. Participants were compensated for their time and travel costs with 7,000 UGX ($1.85 USD at the time of data collection) and 3,000 UGX ($0.79 USD at the time of data collection), respectively.

2.2.1 Depression.-Depression refers to persistent feelings of sadness affecting one's thoughts, actions, and behaviors. Depressive symptoms were measured using the 9-item PHQ-9 with 4-point Likert-type responses (0=not at all, 1=several days, 2=more than half the days, 3=nearly every day, Table 1). The scale items reflect both somatic and non-somatic (cognitive) symptomology as well as a single item measuring suicidal ideation, and were based on the version of the Diagnostic and Statistical Manual of Mental Disorders used at the time of the scale's development (i.e., DSM-IV) [40]. A summed score was calculated from item responses, with higher scores indicating more depressive symptomology (range=0-27). For additional descriptive purposes, PHQ-9 scores were categorized using previously established cutoff values (i.e., 0-4=no depressive symptoms, 5-9=mild depressive symptoms, ≥10=moderate/severe depressive symptoms) [14].
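The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code (the analyses were run in R and Stata). The function names `phq9_total` and `phq9_category` are hypothetical, and the top category is assumed to cover every score of 10 or more.

```python
from typing import Sequence

# Response coding given in the text: 0=not at all ... 3=nearly every day.
VALID_RESPONSES = {0, 1, 2, 3}


def phq9_total(responses: Sequence[int]) -> int:
    """Sum the nine item responses into a total score (range 0-27)."""
    if len(responses) != 9:
        raise ValueError("PHQ-9 requires exactly nine item responses")
    if any(r not in VALID_RESPONSES for r in responses):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(responses)


def phq9_category(total: int) -> str:
    """Map a total score to the descriptive categories used in the text."""
    if not 0 <= total <= 27:
        raise ValueError("total score must be between 0 and 27")
    if total <= 4:
        return "no depressive symptoms"
    if total <= 9:
        return "mild depressive symptoms"
    # Assumption: the top category covers every score from 10 to 27.
    return "moderate/severe depressive symptoms"


# Invented responses for one hypothetical participant (not study data).
example = [1, 0, 1, 2, 0, 0, 0, 0, 0]
score = phq9_total(example)
print(score, phq9_category(score))
```

Rejecting incomplete or out-of-range responses, rather than silently summing them, is a design choice of this sketch; the text does not describe how missing item responses were handled.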

2.2.2 Sociodemographics.-Age, sex, level of education (i.e., no formal schooling through post-college apprenticeship), and frequency of alcohol use (past 12 months) were collected as self-reported measures. District (i.e., Wakiso or Hoima) and community type (i.e., urban, peri-urban or rural) were determined by household location. HIV serostatus was determined through HIV testing (Determine™ HIV-1/2 Strip, Abbott Laboratories), followed by a confirmatory test using an HIV 1/2 STAT-PAK® Assay (Chembio Laboratories) for positive cases, and an SD Bioline HIV-1/2 3.0 test (Abbott Laboratories) if the results from the first two tests were inconsistent.

Sample Characteristics.-Descriptive statistics were calculated for the overall sample and stratified by translated language (i.e., Luganda-speaking individuals in Wakiso and Runyoro-speaking individuals in Hoima). Means and medians were used for continuous variables and statistical comparisons between participants from the two districts were estimated using two-sided t-tests with unequal variances or Wilcoxon rank-sum test for independent samples. Frequencies were used to describe categorical variables and likelihood-ratio Chi-squared tests were used for statistical comparisons between the two subsamples.
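For readers unfamiliar with the t-test with unequal variances (often called Welch's t-test), the statistic and its approximate degrees of freedom can be sketched as follows. This is a minimal standard-library illustration, not the study's analysis code; the function name `welch_t` and the toy ages are invented for the example.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard error of each group mean (sample variance uses n - 1).
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; no equal-variance assumption.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Invented ages for two hypothetical groups (not study data).
group_a = [19, 24, 31, 27, 35]
group_b = [18, 21, 20, 25, 22]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 3), round(dof, 2))
```

The two-sided p-value is then read from a t distribution with `dof` degrees of freedom, a step that statistical packages perform directly and that this sketch leaves out.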

Construct Validity.-Confirmatory factor analysis (CFA) was used to examine the underlying dimensions explaining the relationships between the items in each translated PHQ-9 scale. CFA for ordinal data [41,42] was conducted using the cfa function in the lavaan package in R [43]. We assessed model fit using robust estimators of the Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI), and Tucker-Lewis Index (TLI), with an RMSEA value less than 0.10 and CFI and TLI values greater than 0.90 suggesting an acceptable fit.
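The fit criteria stated above amount to three simple comparisons, sketched here in Python. The function name `check_fit` is hypothetical, and the example values are the rounded Luganda indices reported in this paper; the model fitting itself was done with lavaan in R and is not reproduced.

```python
from typing import Dict

# Cut-offs stated in the text for an acceptable fit.
RMSEA_MAX = 0.10
CFI_MIN = 0.90
TLI_MIN = 0.90


def check_fit(rmsea: float, cfi: float, tli: float) -> Dict[str, bool]:
    """Flag, per index, whether the stated criterion is met."""
    return {
        "rmsea": rmsea < RMSEA_MAX,  # smaller is better
        "cfi": cfi > CFI_MIN,        # larger is better
        "tli": tli > TLI_MIN,        # larger is better
    }


# Rounded fit indices reported for the Luganda-translated PHQ-9.
luganda = check_fit(rmsea=0.05, cfi=0.96, tli=0.94)
print(luganda, all(luganda.values()))
```

Because published indices are rounded to two decimals, a value reported as exactly 0.90 or 0.10 cannot be classified from the rounded figure alone; a check of this kind would need the unrounded estimates.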

Internal Consistency.-We used three reliability coefficients (i.e., Cronbach's alpha, McDonald's omega total, and McDonald's omega hierarchical) to provide a more comprehensive assessment of the internal consistency of each translated scale using the omega function in the psych package in R [44]. Assessing the scale's Cronbach's alpha is beneficial as it is frequently implemented and allows for comparison with other reported findings. However, Cronbach's alpha has some limitations [45,46] and has been shown to underestimate [47] or overestimate [48] internal consistency in certain scenarios. As such, we also assessed McDonald's omega total and omega hierarchical, which are useful alternative indicators of reliability that overcome some of the limitations of Cronbach's alpha [49,50].
All analyses were performed in R (Version 3.6.1, via RStudio) or Stata SE (Version 16.1).
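To make the internal consistency coefficients more concrete, the sketch below computes Cronbach's alpha from a respondents-by-items matrix and, as a simplified stand-in, an omega coefficient under a single-factor model with standardized loadings. Both function names and the toy data are invented for illustration. The omega function in the psych package works from a factor solution with general and group factors, a procedure this sketch does not attempt to reproduce, so omega hierarchical is not shown.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's item scores."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need two or more items, same count in every row")
    # Variance of each item across respondents, and of the summed score.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def omega_single_factor(loadings: Sequence[float]) -> float:
    """Omega for a one-factor model with standardized loadings.

    Assumption: unique variances equal 1 - loading**2 and are uncorrelated.
    """
    common = sum(loadings) ** 2
    unique = sum(1 - l ** 2 for l in loadings)
    return common / (common + unique)


# Invented two-item responses from four hypothetical respondents.
toy = [[0, 1], [1, 0], [2, 2], [3, 3]]
print(round(cronbach_alpha(toy), 3))
print(round(omega_single_factor([0.5, 0.5, 0.5, 0.5]), 3))
```

Alpha treats all items as equally related to the construct, whereas omega weights items by their loadings, which is one reason the two coefficients can differ for the same scale.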

Sample Characteristics
In the overall sample (N=2969), participants were on average 23 years old (standard deviation [SD]=13.91), approximately half were female (57%), 53% reported at least secondary education, approximately half lived in an urban area (52%), most reported never drinking alcohol in the past 12 months (71%), and 7% were HIV positive (Table 2).

Psychometric Findings for the Luganda-translated PHQ-9
Participants who completed the Luganda-translated PHQ-9 had an overall sample mean of 3.46 (SD=3.26, Table 2). In terms of item-level characteristics, Items 4 (i.e., "feeling tired or having little energy") and 9 (i.e., "thoughts that you would be better off dead or of hurting yourself in some way") had the highest and lowest item-level averages, respectively (Table 1). Across items, the magnitude of standard errors increased across increasing threshold parameters (e.g., Item 9 with a standard error of 0.08 for the threshold between response options 0 and 1 compared to a standard error of 0.20 for the threshold between more severe response options 2 and 3, Table 3). Over 80% of the sample responded with the lowest response option ("not at all") for Items 6 through 9 (i.e., "feeling bad about yourself," "trouble concentrating on things," "moving or speaking slowly," and "thoughts that you might be better off dead," Supplementary Table 1). Of the remaining response options, the response option indicating the most severe depressive symptomology (i.e., "nearly every day") was selected by less than 5% of the sample for each of the nine scale items.
CFA of the PHQ-9 translated into Luganda indicated that a one-factor structure was a reasonable fit for the data. The factor loadings from the nine scale items ranged from 0.45 to 0.76 (Table 3) and the robust RMSEA, CFI, and TLI values were 0.05, 0.96, and 0.94, respectively. In terms of internal consistency, the PHQ-9 translated into Luganda had a Cronbach's alpha, McDonald's omega total, and omega hierarchical of 0.73, 0.76, and 0.53, respectively.

Psychometric Findings for the Runyoro-translated PHQ-9
Participants who completed the Runyoro-translated PHQ-9 had an overall sample mean of 3.58 (SD=3.00, Table 2). In terms of item-specific characteristics, Items 4 (i.e., "feeling tired or having little energy") and 9 (i.e., "thoughts that you would be better off dead") also had the highest and lowest item-level averages, respectively (Table 1). Across all nine items, the magnitude of standard errors increased across increasing threshold parameters (e.g., Item 9 with a standard error of 0.04 for the threshold between response options 0 and 1 compared to a standard error of 0.21 for the threshold between more severe response options 2 and 3, Table 3). Over 80% of the sample responded with the lowest response option ("not at all") for Items 6, 8, and 9 (i.e., "feeling bad about yourself," "moving or speaking slowly," and "thoughts that you might be better off dead," Supplementary Table 1). Of the remaining response options, the response option indicating the most severe depressive symptomology (i.e., "nearly every day") was selected by 3% or less of the sample for each of the scale items.
CFA of the PHQ-9 translated into Runyoro also indicated that a single-factor structure was a reasonable fit for the data. The factor loadings from the nine scale items ranged from 0.45 to 0.73 (Table 3) and the robust RMSEA, CFI, and TLI values were 0.08, 0.92, and 0.90, respectively. In terms of the internal consistency coefficients, the PHQ-9 translated into Runyoro had a Cronbach's alpha, McDonald's omega total, and omega hierarchical of 0.72, 0.76, and 0.57, respectively.

Discussion
Our study evaluated certain psychometric properties of the PHQ-9 in Luganda- and Runyoro-speaking populations in Uganda. We evaluated the prevalence of depressive symptoms in our samples, assessed construct validity through CFA, and examined the internal consistency of the translated PHQ-9 scales in both samples. This is the first psychometric analysis of the PHQ-9 in Runyoro and provides valuable insight into the performance of this measure in a widely spoken language in Uganda. Furthermore, our work adds to the limited evidence regarding the performance of the PHQ-9 in Luganda, which is the most widely spoken language in Uganda.
The PHQ-9 means observed in our Luganda- and Runyoro-speaking samples were comparable to those reported in studies conducted in Uganda [22], Kenya [34], Ethiopia [27], and Ghana [51], but were lower than those reported in other studies conducted in SSA [23-25, 52, 53]. We also observed that the proportion of our sample that would be classified as moderately or severely depressed was comparable to that reported in another study conducted among a sample representative of rural communities in Uganda [20]. These consistent findings were expected given that our study populations were sampled to represent the overall populations in the districts of Wakiso and Hoima, although they should be considered preliminary and interpreted with caution given that evaluating appropriate depression cut-off scores was beyond the scope of this analysis. The observed response option frequencies were also expected, with a larger proportion of participants endorsing lower severity depression items. These findings suggest that the lowest item thresholds were estimated well in our samples, which is beneficial considering that it is important for psychometric studies to examine the spectrum of depression and provide information regarding different regions of that continuum. However, as individuals with severe levels of depression were not well represented in our sample, we were unable to adequately estimate the performance of items measuring severe depressive symptoms (i.e., Items 7-9). The notable increase in standard errors across increasing threshold parameters suggests that while lower depression severity was well estimated in our sample, future studies should evaluate the performance of the PHQ-9 in samples with a higher expected prevalence of depression and suicidal ideation (e.g., patients in clinical psychiatric settings).
It is also possible that certain mental health topics (such as suicide) are highly stigmatized in Uganda [54] and it may be more useful to use idioms in each language that target suicidal ideation but in terms that may be less stigmatizing and more culturally appropriate. Kaiser et al.'s systematic review of the meaning of the widely used idiom "thinking too much" underscores the heterogeneity of assigned meaning and the underlying constructs that a turn of phrase can have, which emphasizes the importance of centering questions and word choice in cultural contexts [55]. Suicidality has been captured in the Luganda-translated MINI among PLWH in Uganda, suggesting that severe depression can be measured in this population using translated and adapted measures [56,57]. While the MINI in its original English form uses explicit questions regarding suicidality (similar to Item 9 of the PHQ-9), neither study described their translation process nor provided the translated items. Future work should examine whether the Luganda translation of the suicidality item in the MINI was perhaps culturally adapted to improve performance in ways that can be applied to the PHQ-9 in this setting.
Our CFA findings indicate that the translated PHQ-9 scales had good construct validity with one underlying depression construct in each sample. Our findings are comparable to other studies in Kenya [25,34], Botswana [32], Ethiopia [33], Rwanda [58], and South Africa [35] that examined the factor structure of the PHQ-9 in different languages and found that the scale had a single-factor structure in those settings. Furthermore, the range of factor loadings observed in our samples was comparable to those reported in studies in Kenya [25] and South Africa [35]. This suggests that the translated scales performed as designed in two widely spoken languages in Uganda.
In terms of internal consistency, the Cronbach's alpha coefficient observed in our sample was comparable to Cronbach's alpha estimates reported in studies in South Africa [18], Kenya [25], Ethiopia [27], as well as Uganda and Zambia [23]. Compared to these studies, we expanded our assessment of internal consistency by examining alternative measures of reliability. The observed McDonald's omega hierarchical coefficient indicated that while a large proportion of the total variance in our sample was due to one primary factor, some of the variability observed across items was due to other factors unrelated to the depression construct. The source of this variability could be due to numerous factors, including translational and cultural issues affecting the meaning and interpretation of items. The reliability of the translated PHQ-9 scales should be further explored in future work (e.g., differential item functioning, inter-rater reliability, test-retest reliability).
Our study has several limitations. First, the one-on-one interviews may have introduced social desirability bias if participants underreported their depressive symptoms in the presence of the interviewer. To mitigate this source of bias, interviewers received training pertaining to mental health and how to foster a non-judgmental environment during survey administration. Second, data collection in Wakiso was only performed in an urban community, limiting the generalizability of our findings to peri-urban and rural communities in the district. Future data from the APHS study should be used to replicate our findings in a Wakiso sample with more variation in community type. Third, while the mean depression scores that we observed are consistent with the prevalence of depression in comparable general populations, we were unable to comprehensively examine the performance of the items pertaining to more severe depression symptomology because there were not enough individuals in our sample who endorsed these items (e.g., Item 9). Future validation studies should assess these more severe depression items in samples with higher expected depression symptomology (e.g., psychiatric samples or highly stigmatized populations with greater mental disorder burden). Fourth, considering that scale validation was not the primary aim of the APHS study, other aspects of the validity and reliability of the translated PHQ-9 scales (e.g., convergent validity, concurrent validity, test-retest reliability) could not be evaluated in this sample because those data were not collected. Furthermore, both of our samples consisted of predominantly younger individuals with only 35 and 116 individuals aged 50 years or older in the Runyoro- and Luganda-speaking samples, respectively. We were unable to examine construct validity across age categories given these sample size limitations, and this should be explored in future research.
Despite these limitations, our preliminary findings provide important information pertaining to the construct validity and internal consistency of the Luganda- and Runyoro-translated PHQ-9, an important step towards obtaining accurate estimates of the disease burden of depression among Luganda- and Runyoro-speaking Ugandans. Four percent of Uganda's healthcare budget is allocated to mental health services [59], and although Uganda's healthcare delivery is generally decentralized, the bulk of mental health resources are allocated to the national referral hospital in Kampala [60]. Accurate estimates of the prevalence of depression in the general population can inform both the need to increase the proportion of the healthcare budget allocated to mental health services as well as where (i.e., which clinical facilities and communities) mental health care service scale-up is most needed. Future studies should expand upon our work and evaluate other important psychometric properties of these scales in Luganda and Runyoro, including identifying optimum cut-off scores for probable depression and referral for care.

Conclusions
Accurate estimates of the burden of disease attributable to depression in Uganda are critical to resource allocation and the scale-up of mental health services. In Uganda and SSA more broadly, mental health services are limited, with small proportions of the overall public health budget set aside for mental health service delivery. Accurately quantifying the burden of depression can inform and facilitate re-allocation of funds to meet the public health need. The present study provides important preliminary information regarding the performance of the PHQ-9 in two widely spoken languages in Uganda, including one language (Runyoro) into which the measure had not been previously translated. We provide estimates for the prevalence of depressive symptomology in a generalizable community sample and describe the construct validity and internal consistency of the PHQ-9 in two languages. The preliminary findings regarding construct validity suggest that the Luganda- and Runyoro-translated PHQ-9 may be good measures for depression in comparable samples of Ugandans. Future work should assess different aspects of the validity of the translated PHQ-9 scales in this setting, which can be facilitated by including additional measures (e.g., other depression or related mental health measures) in future rounds of data collection. These future efforts could establish PHQ-9 translations that are culturally appropriate and allow for precise, real-time screening of depression and referral for treatment. Psychometric assessment of these translated scales is also needed in populations with higher levels of depression to assess how the items measuring severe depressive symptoms perform in this setting.