Assessment of Burden Among Family Caregivers of Schizophrenia: Psychometric Testing for Short-Form Zarit Burden Interviews

Objective: Although various short forms of Zarit Burden Interview (ZBI) have been developed, there is a lack of standard psychometric testing and comparison among them. The study aims to examine the psychometric properties of ten short versions of the most frequently used ZBI among a sample of schizophrenia caregivers and to find the one with the best performance. Methods: Cross-sectional door-to-door survey of ZBI-22 and a series of validated instrument data from 327 family caregivers of schizophrenia patients in a Chinese rural community were conducted from October 2015 to January 2016. Reliability was assessed using McDonald's omega coefficient (ω). Validity including concurrent validity, known group's validity, and criterion validity were assessed by Spearman correlations and Mann-Whitney U tests. Overall discrimination ability was evaluated using the area under the receiver operating characteristic curve (AUC). Results: Reliability was generally good for all short forms (ω = 0.69–0.84), except for the Gort ZBI-4 (ω = 0.58), which is acceptable considering its small item numbers. Concurrent validity was good across all various ZBI forms with significant negative correlations with patient's function (r = −0.34 to −0.48, p < 0.01), as well as significant positive correlations with caregiver's depression (r = 0.49–0.65, p < 0.01), and anxiety symptoms (r = 0.45–0.58, p < 0.01). Known groups' validity (carers with disease vs. without disease; carers being parents vs. spouse vs. others) showed inconsistent results among various short forms. Criterion validity was generally good for all short forms with significant positive correlations with Family Burden Interview Schedule (r = 0.67–0.75, p < 0.01), except for the Higginson ZBI-1(r = 0.57, p < 0.01). Discriminative ability was also good for all short forms (AUC range: 0.85–0.99), with various cutpoints proposed. Among all ten short forms, the Ballesteros ZBI-12 and the Gort ZBI-7 outperformed others with almost equally good performance in comprehensive psychometric testing. Conclusions: This study provides support for the reliability, validity, and discriminative ability of the ten various short forms of ZBI for use among schizophrenia family caregivers, with the Ballesteros ZBI-12 and the Gort ZBI-7 endorsed as the best ones.


INTRODUCTION
The Zarit Burden Interview (ZBI) is one of the oldest and most commonly used instruments for assessing caregiving burden at an international level (Knight et al., 2000). Initially developed more than 30 years ago, the ZBI was intended to measure burden and stress experienced by caregivers caring for people with dementia with 29 items on a four-point Likert type scale (Zarit et al., 1980). A revised ZBI (Zarit et al., 1985a,b) was later introduced with 22 items on a five-point Likert type scale, which has been widely translated into various languages, and validated across countries and cultures, such as Europe (Braun et al., 2010;Martin-Carrasco et al., 2010;Chattat et al., 2011), Africa (Imarhiagbe et al., 2017) and Asian countries like China (Wang et al., 2008;Lu et al., 2009) and Japan (Hirono et al., 1998) allowing for international comparison. Evidence also shows that the ZBI can be interpreted similarly across gender or across educational level (Lin et al., 2017a). Furthermore, a meta-analytic study has proved that the ZBI are reliable across populations of caregivers (i.e., spouses/partners, children, and parents), care-recipients (i.e., AD/dementia, physical illness, and mental illness), and most language versions (e.g., French, Spanish, and Chinese) (Bachner and O'Rourke, 2007).
The widespread use of ZBI in both research and clinical settings worldwide has fostered interest in simplification of the 22-item scale for ease of administration, minimization of respondent burden, and quick screening in busy occasions (Lin et al., 2017b). A number of short forms of ZBI have been proposed, with the number of items ranging from 1 to 14, as shown in Table 1. All those short forms are developed or assessed either by using classical test theory or item response theory, with theory-guided choice of item selection. Comparison of various abridged versions of ZBI have been conducted with consistent findings supporting for the reliability and validity of various short forms, while uniformly endorsing as the best one either the Bedard 12-item ZBI (Higginson et al., 2010;Hagell et al., 2017) or the Ballesteros 12-item ZBI (Lin et al., 2017b).
However, these previous performance comparisons fall short in four key aspects: First, among all ten short forms reported in the literature, only six versions have been assessed, rendering the comparison incomplete, and inconclusive (Higginson et al., 2010;Hagell et al., 2017;Lin et al., 2017b). Secondly, the majority of short forms were developed and tested in a population of dementia caregivers, little evidence exists on their psychometric properties among caregivers of mental illness, although the full 22-item form has already been widely used, and validated among such a population (Higginson et al., 2010;Flynn Longmire and Knight, 2011;Tang et al., 2016;Hagell et al., 2017;Lin et al., 2017b). Thirdly, previous validity tests were based on comparison between short forms of ZBI with self-designed questions, such as informal care hours and financial situations (Lin et al., 2017b), or using the full 22-item version as a self-comparison (Higginson et al., 2010), instead of standardized or validated instrument, thus weakening the robustness of validity results. Lastly, although the 12-item ZBI has been tested as having the best performance in previous studies that were mostly conducted in western countries (O'Rourke and Tuokko, 2003;Higginson et al., 2010;Hagell et al., 2017), it does not incorporate some cultural-sensitive salient concept in Asian societies (Lim et al., 2014), which may limit its use across countries that share similar Confusianism traditions (e.g., Japan, South Korea, and China) (Zhou et al., 2016;Lin et al., 2017b).
In light of the above-mentioned limitations of previous studies, the present study was conducted with the purpose of reassessing the psychometric properties of ten short forms of ZBI-22 among Chinese family caregivers of people with schizophrenia using a series of standardized or validated instrument for validity test.

Participants and Procedure
This is a descriptive, cross-sectional study with partial data coming from the research entitled "Family burden and caregiving experiences of schizophrenia in Chinese rural communities" , conducted in Ningxiang County, Hunan Province of China. The research recruited a sample of 352 primary caregiver s of schizophrenia through China's first program for treatment and management of serious mental illness ("Central Government Support for the Local Management and Treatment of Severe Mental Illnesses Project, " 686 Project) (Ma, 2012). The care recipient must be registered in the 686 Program and fulfilling the Chinese Classification of Mental Disorders-3(CCMD-3) or the International Classification of Diseases-10 (ICD-10) criteria for schizophrenia as diagnosed by special psychiatrist. The primary caregiver must be a family member who is living with the patient and taking the most responsibility of caring, 16 years old or above, and able to understand and communicate. After excluding 14 participants who refused to participate and 11 withdrawals, our final sample included 327 community-dwelling primary caregivers of patients with schizophrenia.
Data collection was conducted from October 2015 to January 2016. A team of nine post-graduates from the School of Public Health of Central South University were recruited as interviewers. The interviewers have a background of public health, psychology or psychiatry. All interviewers received a 1week uniform formal training to conduct the interviews provided by a psychologist before the formal study. The training was composed of half lecturing and half practice of role plays. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Institutional Review Board of the Xiangya School of Public Health of Central South University. We paid door-to-door visit at the primary caregivers' home and conducted face-to-face interviews with them for an average duration of 50-70 min, after obtaining written consent from primary caregivers for the study. Details of the study enrollment and procedure have been published elsewhere .

Zarit Burden Interview (ZBI)
The ZBI-22 (Table 1) consists of 22 items scored in 5-point Likert scale from 0 (never) to 4 (nearly always), except for the final item on global burden, rated from 0 (not at all) to 4 (extremely).  The total score ranges from 0 to 88 with higher scores indicating higher burden. In addition, several short forms of the ZBI-22 have been proposed as a rapid screening tool, which are scored according to the same principle as the original ZBI-22. These short forms include: the 14-item ZBI by Knight et al. (2000), two different 12-item ZBIs by Ballesteros et al. (2012) and Bedard et al. (2001), the 8-item ZBI by Arai et al. (2003), two different 7-item ZBIs by Zhou (2011) and Gort et al. (2005), the 6-item ZBI by Higginson et al. (2010), two different 4-item ZBIs by Gort et al. (2010) and Bedard et al. (2001), and finally, 1-item ZBI by Higginson et al. (2010) (Table 1). In the present study, the Chinese version of ZBI showed acceptable internal consistency with a McDonald's ω coefficient of 0.89.

Family Burden Interview Schedule (FBIS)
The FBIS (Pai and Kapur, 1981) was used to assess family burden and consists of 24 items rated on a 3-point Likert scale from 0 (no burden) to 2 (serious burden). The total scores range from 0 to 48 with higher scores showing higher burden. In the present study, the Chinese version of FBIS showed acceptable internal consistency with a McDonald's ω coefficient of 0.87.

Global Assessment of Functioning (GAF)
The GAF was used to assess patient function and consists of only one 100-point single item covering three major domains: social functioning, occupational functioning, and psychological functioning (American Psychiatric Association, 1994). The total scores range from 1 to 100, with higher scores indicating higher function. Examples are given for each ten-level interval.
Patient Health Questionnaire (PHQ-9) The PHQ-9 (Spitzer et al., 1999) was used to assess caregivers' depression symptoms and consists of 9 items rated on a 4-point Likert scale from 0 (not at all) to 3 (nearly every day). The total scores range from 0 to 27, with a cut-off point of 10 differentiating depression and non-depression (Manea et al., 2015). The Chinese version of the PHQ-9 demonstrated good internal consistency in the current study with a McDonald's ω coefficient of 0.89.

Generalized Anxiety Disorder Scale (GAD-7)
The GAD-7 (Spitzer et al., 2006) was used to assess caregivers' anxiety and consists of 7 items rated on a 4-point Likert scale from 0 (not at all) to 3 (nearly every day). The total scores range from 0 to 21, with a cut-off point of 10 differentiating anxiety and non-anxiety (Schalet et al., 2014). The Chinese version of the GAD-7 demonstrated good internal consistency in the current study with a McDonald's ω coefficient of 0.91.

Data Analysis
For the first step, internal consistency of all short scales was examined with McDonald's ω coefficient (McDonald, 1999;Zhang and Yuan, 2016). An ω value of 0.7-0.9 for long scales and 0.6 for short scales (e.g., four items) indicate good internal consistency, while an ω value of >0.9 suggests redundant items (Youden, 1950;Nunnally and Bernstein, 1994). Secondly, validity of the scales was tested and compared for the following three tests: concurrent validity, known group's validity and criterion validity. We first tested the normal distribution of FBIS score, GAF score, and GAD score using sktest and found none of them fit normal distribution (p < 0.01), and thus used non-parametric testing in the following validity testing.
Concurrent validity was measured using Spearman correlations with expected significant negative correlations with patient's function (GAF score), as well as significant positive correlations with caregiver's depression (PHQ-9 score), and anxiety symptoms (GAD-7 score) (Ji et al., 2014;Wang et al., 2017;Sun et al., 2018;Yu et al., 2018). We further performed Fisher r-to-Z test to compare the statistical significance of different correlations.
Known group's validity was assessed using Mann-Whitney U tests. We expect that caregivers with physical disease exhibited higher caregiver burden than those without physical disease . Also, caregiver burden varies according to different caregiving roles, such as parents, spouse, and others (Chang et al., 2017;Yu et al., 2017).
Criterion-related validity were measured by Spearman correlations between short-form total scores and the gold standard-FBIS total scores, with an expected correlation coefficient of above 0.7 (Higginson et al., 2010). The coefficient of determination (r 2 ) was also computed to assess how much the variance of the gold standard can be explained by respective short forms.
Thirdly, the discriminative performance of the short forms was assessed and compared with the receiver operating characteristic (ROC) curve (Coffin and Sukhatme, 1997), using the ZBI-22 cut-off 21 as the reference (Zarit et al., 1985a). The ROC was constructed by plotting sensitivity against 1specificity, with each point representing a sensitivity/1-specificity pair corresponding to a particular cutoff value. The closer a ROC plot is to the upper-left corner, the higher the overall accuracy of the test. A point closest to (0, 1) on the ROC curve indicates the ideal condition (100% sensitivity and 100% specificity) (Coffin and Sukhatme, 1997). The areas under the curves (AUC) were calculated using the trapezoidal method (Weinstein et al., 1980;Coffin and Sukhatme, 1997) to represent the scale's the ability to correctly classify those with and without burden. The range of the AUC is 0.5-1.0. A discriminative test is considered perfect if AUC = 1.0, good if AUC = 0.8-1.0, moderate if AUC = 0.6-0.8, and poor if AUC = 0.5-0.6; an area of 0.5 reflects a random rating model (Weinstein et al., 1980). 95% Confidence intervals of AUCs were computed. A P-value below 0.05 was considered as statistically significant. Besides, cut-points for the ZBI short forms were estimated based on the Youden index (Fischer et al., 2003).
All analyses were carried out using SPSS 16.0 (SPSS Inc., Chicago, IL, USA) and R 3.5.1. Table 2 shows the descriptive data on the sample. The mean (SD) age of the caregivers was 57.7 (12.5) years, and most of them were married (82.3%) and with primary education (59.9%). Slightly more than half of the caregivers were female (53.8%) and employed (52.9%). Most of their relationships to the care-recipient were parents (43.8%) and spouses (33.7%). The caregivers spent a median (q1-q3; minimal-maximum) of 15 years (9-25; 1-49) in caring for the patients. The median score of patient function as measured by GAF was 40.0 (20.0-61.0; 1-99). The mean score of depression, anxiety, and family burden as measured by PHQ-9, GAD-7, and FBIS were 9.75 (SD: 7.31), 9.31 (SD: 6.61), 23.62 (SD: 9.76), respectively. Reliability and validity results were shown in Table 3. Reliability was generally good with McDonald's ω coefficient ranging from 0.69 to 0.85 across various ZBI formats, except for the Gort ZBI-4 (ω = 0.58). The low ω value for the Gort ZBI-4 may be explained by the property of the McDonald's ω, which is not an independent measure. McDonald's ω is sensitive to the number of items and increase with increased item numbers (Sijtsma, 2009). Based on this theory, we believe a ω of 0.58 being acceptable for such a small item scale like the Gort ZBI-4. Among all short forms, the Ballesteros ZBI-12 showed the highest ω of 0.85, followed closely by Bedard ZBI-12, the Arai ZBI-8, and Zhou ZBI-7 (ω = 0.84 for all).

RESULTS
External concurrent validity was in general accordance with expectations and similar across the various ZBI forms, with significant negative correlations with GAF score (r = −0.34 to −0.48), and significant positive correlations with PHQ-9 score (r = 0.49-0.65) and GAD-7 score (r = 0.45-0.58). Among all the various ZBI forms, the Ballesteros ZBI-12 showed the highest correlations with the scores of GAF, PHQ-9, and GAD-7. A subsequent Fisher r-to-Z test found no statistical differences among those correlations of various short forms, indicating that all short forms performed equally well in concurrent validity.
For known groups' validity, among all short forms, five showed significantly higher burden scores in caregivers with disease than those without disease. For caregiving role and caregiver burden, eight forms exhibited significant higher burden scores in parent caregiver group, yet only five showed exactly the same pattern as the full ZBI scale: parents > spouse, others. Among all short forms, only three displayed favorable known group validity consistently: Higginson ZBI-1, the Gort ZBI-7, and the Ballesteros ZBI-12.
Optimal cut-points according to the Youden index relative to the gold standard cut-point of 21 on the ZBI-22 are shown in Figure 1 and Table 4. All shorter versions were overall successful in differentiating low-and high-burden individuals with AUCs ranging from 0.85 to 0.99. Among all short forms, the Ballesteros ZBI-12 showed the highest AUC (0.99), followed closely by all with the same AUC of 0.98.

DISCUSSION
This study assessed and validated ten short forms of the Zarit Burden Interview (ZBI) among family caregivers of people with schizophrenia. We found general support for the psychometric testing of all ten short forms. While previous studies proposed either the Bedard 12-item ZBI (Higginson et al., 2010;Hagell et al., 2017) or the Ballesteros 12-item ZBI (Lin et al., 2017b) as the best one, our comparison produced two structures that outperformed other structures in comprehensive psychometric properties: the Ballesteros 12-item and the  Compared to the full ZBI form, the short forms are easier to administer and less time-consuming due to the abridged items. The use of ZBI short forms facilitates rapid identification of caregiver burden and quick further assessment or referral in busy clinical situations and is thus favored by physicians. Also, it alleviates respondent burden of caregivers who focus more on the patients they cared for than their own feelings and thus only want to answer the briefest questionnaire.
Four key findings emerge from our analysis. First, we found high internal consistency of all short versions of the ZBI with McDonald's ω ≥ 0.70, except for Gort ZBI-4 (McDonald's ω = 0.58). On the one hand, it reflects the property of the McDonald's omega coefficient that is sensitive to the number of items and decreases with reduced items (Sijtsma, 2009). On the other hand, it may imply the potential item redundancy of the full ZBI scale and propose better Knight ZBI-14 (Knight et al., 2000) ZBI-22 (Zarit et al., 1985a) INTERNAL CONSISTENCY McDonald's ω   Frontiers in Psychology | www.frontiersin.org applicability of short forms, as shown in another study (Lin et al., 2017b). Second, while all ten ZBI short forms uniformly exhibited desirable concurrent validity correlations and criterion-related validity, there is inconsistency in the result of known groups' validity. Although caregiver burden has been widely proven to be positively associated with physical disease in the literature (Sethabouppha and Kane, 2005;Gater et al., 2014;Yu et al., 2017), only five ZBI short forms demonstrated significant higher burden scores in caregivers with disease than those without in the current study. Moreover, past studies have shown that caregiver burden varies according to caregiving role, with the majority supporting the highest burden in parents, followed by spouse, siblings, children, and others in turn (Lu, 2009;Hsiao and Tsai, 2014a,b). In the present study, eight out of ten short forms have successfully distinguished caregiver burden among various caregiving roles, while only 5 short forms showed exactly the same pattern as the full ZBI scale: parents > spouse, others. In sum, only the Higginson ZBI-1, the Gort ZBI-7, and the Ballesteros 12-item ZBI displayed favorable known group validity consistently, which were highly recommended for future use in screening for various groups.
Third, our findings found the Ballesteros 12-item and the Gort ZBI-7 outperformed other short forms with almost equally good performance in psychometric testing. This finding partly supports Lin's (Lin et al., 2017b) previous conclusion of Ballesteros 12-item ZBI as the best one, yet conflicts with the findings of Hagell and Higginson, both proposed the Bedard 12-item ZBI (Higginson et al., 2010;Hagell et al., 2017) as the best one. A careful comparison between the Gort ZBI-7 and the Bedard ZBI-12 showed that the latter included all items of the former except for the item 22 "Overall, how burdened do you feel in caring." One possible explanation may be that item 22 acts as an overall summary of the whole scale and is more representative of the full ZBI score than any other item. In fact, some authors even propose reducing the whole scale into an extreme one item-the Higginson ZBI-1 as a rapid assessment of caregiver burden (Higginson et al., 2010;Ballesteros et al., 2012). However, the measurement of item 22 alone is not enough to represent the whole scale because it is based on the answering and learning process of the all previous items, leading to its analysis in isolation from the rest of the information unpractical (Ballesteros et al., 2012). Another explanation may be that the item 22 acts as more like a supplement to the previous items than a summary, as shown in our analysis, the Gort ZBI-7 is actually a combination of the Higginson ZBI-1 and Higginson ZBI-6 but performed much better than the other two. It is likely that item 22 may have measures some salient concept that other items may not be able to capture, especially in Asian cultures. As a result, this item is an integral part of the scale that can neither be isolated, nor deleted.
Our fourth finding is cut-off values ranging from 4 to 17 for the various short forms of ZBI relative to that of the ZBI-22. The results were slightly different from those previously identified among family caregivers of people with cancer, brain injury, and dementia (Higginson et al., 2010;Hagell et al., 2017), which may be explained by the various sample as mentioned above and provides useful guidance for future use of short forms among such a population. Furthermore, all short forms displayed good AUC values of above 0.8, further corroborating their high discriminative performances to distinguish those with and without burden, which accords with past studies (Higginson et al., 2010;Hagell et al., 2017). The study falls short in several aspects. First, this is a cross-sectional study with no information on short forms' responsiveness to change, which is important for psychometric property in intervention and longitudinal studies (Kirshner and Guyatt, 1985). Second, we did not run test-retest reliability in the current study. Further study may consider collecting data in multiple time points to detect responsiveness to change and test-retest reliability. Third, we did not test the factorial structure of the ZBI and treated all the different short forms of ZBI as a unidimensional structure. Conflicting evidence exists on the factor analysis of both the 22-item ZBI and its short forms in past studies (Ballesteros et al., 2012;Branger et al., 2016;Tang et al., 2017). One possible explanation may be that factor analysis is sample dependent and may vary according to different study populations, thus making testing and comparing factor structures among various short forms in the current single study population implausible (Yu et al., 2015), future research may benefit from adding factor analysis into comparison among short forms across diverse populations. Fourth, the finding of the current study is based on quantitative analysis of different short form versions, there is a lack of qualitative evaluation of each item for cultural adaptation, future research may consider mixedmethod comparison for these short forms. Finally, the sample came from one single rural county of Hunan province, which may limit the study's representativeness. Future multi-center large sample comparative studies are warranted for firmer conclusions.
In conclusion, our observations provide initial support for the psychometric properties of various ZBI short forms for use among family caregivers of people with schizophrenia. The Ballesteros ZBI-12 and the Gort ZBI-7 showed the best performance in reliability and validity, which paves the way for future studies to further utilize and validate the scales.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

AUTHOR CONTRIBUTIONS
YY and SX contributed to the conception and design of the study. YY, ZL, WZ, XC, MH, and XZ contributed to the research conduction and data collection. YY and ZL contributed to data analyses and interpretation, analysis tools or data. YY drafted the article while ZL, WZ, XC, MH, XZ, and SX critically appraised it and revised it. All authors approved the final version of manuscript for submission and publication.

ACKNOWLEDGMENTS
The authors would like to thank all the families of the schizophrenia individuals we interviewed during the study for openly sharing their feelings and experiences. We'd also like to thank the health and family planning bureau of Ningxiang County and the government of the Liushahe town, Shungfupu town, Chengjiao xiang, and Yutan town for their administrative support, as well as all village/community doctors for guiding us to visit each household of the schizophrenia individuals in the rural areas of Ningxiang county, Hunan province.