Development and validation of a multi-dimensional scale to assess community health worker motivation

Background Ensuring that Community Health Workers (CHWs) are motivated is critical to their performance, retention and well-being, and ultimately to the effectiveness of community health systems worldwide. While CHW motivation is a multi-dimensional construct, there is no multi-dimensional measure available to guide programming. In this study, we developed and validated a pragmatic, multi-dimensional measure of CHW motivation.
Methods Scale validation entailed qualitative and survey research in Mali and Bangladesh. We developed a pool of work satisfaction items as well as several items assessing the importance of hypothesized sub-dimensions of motivation, based on the literature and expert consultations. Qualitative research helped finalize scale sub-dimensions and items. We tested the scale in surveys with CHWs in Mali (n = 152, 40% female, mean age 32) and Bangladesh (n = 76 women, mean age 46). We applied a split-sample exploratory/confirmatory factor analysis (EFA/CFA) in Mali, and EFA in Bangladesh, then assessed reliability. We also gauged convergent/predictive validity, assessing associations between scale scores and conceptually related variables.
Results The final 22-item scale has four sub-dimensions: Quality of supervision, Feeling valued and capacitated in your work, Peer respect and support, and Compensation and workload. Model fit in CFAs was good, as were reliabilities for the full scale (alpha: 0.84 in Mali, 0.93 in Bangladesh) and all sub-dimensions. To construct scores for the final scale, we weighted the scores for each sub-dimension by CHW-reported importance of that sub-dimension. Final possible range was -6 to +6 (sub-dimensions), -24 to +24 (full scale). Mean (standard deviation) full-scale scores were 5.0 (3.3) in Mali and 14.5 (5.3) in Bangladesh. In both countries, higher motivation was significantly associated with higher overall interest in their work, feeling able to improve health/well-being in their community, as well as indicators of higher performance and retention.
Conclusions We found that the Multi-dimensional Motivation (MM) scale for CHWs is a valid and reliable measure that comprehensively assesses motivation. We recommend the scale be employed in future research around CHW performance and community health systems strengthening worldwide. The scale should be further evaluated within longitudinal studies assessing CHW performance and retention outcomes over time.

clinics called Union Health and Family Welfare Centres (eg, facility-linked). FWAs list and mobilize pregnant women for services; counsel eligible couples on contraceptive methods and side effects; provide pills, condoms, and the second dose of injectable contraception; identify and refer couples for clinical contraception; and assist in immunizations. FWVs complete referrals they receive from FWAs; provide antenatal, postnatal, and reproductive clinical services and counseling, including IUD insertion; and immunize children.

Scale development
We identified relevant conceptual dimensions and developed a pool of 34 items based on a review of the literature and consultation with community health experts and in-country stakeholders. Initial dimensions identified included: quality of supervision, feeling secure and validated in your work, growth opportunity, adaptive management and support, peer respect and support, and compensation. We drafted about 4-8 items for each of these conceptual domains. Ten items were adapted from Bhatnagar's job satisfaction scale [17]. The set of items was introduced by "How would you rate your satisfaction with the following aspects of your work?" Items were statements such as "Support your direct supervisor gives you in your work", "Cooperation amongst CHWs" and "Amount of total financial incentives you receive." Response options were very dissatisfied, dissatisfied, satisfied, and very satisfied.
We also created several items to assess the importance to CHWs of each dimension of their work, drawing on the content of each hypothesized sub-dimension of the work satisfaction items. The set of items was introduced by "How important to you are the following aspects of your work?" with the interviewer asked to read each statement out in full, including parentheses. An item example is "Quality of supervision (that is, your supervisor(s) actively support and value your work, and treat you respectfully and fairly)". Response options were not very important, important and very important.

Survey methods
We implemented the scale in surveys with 152 CHWs in Mali (January 2020) and 76 CHWs in Bangladesh (November 2019-January 2020). In both settings, we applied a census-based sampling approach in which all active and consenting CHWs in the study sites were interviewed. In Bangladesh, among these 76 CHWs, we additionally observed 1260 CHW-client interactions and conducted follow-up interviews with 1384 clients about their counseling and care experience. This nested sampling in Bangladesh allowed for linking CHW surveys with observations and client surveys, which was helpful in assessing convergent validity.
Surveys were administered by trained research assistants in a private setting in the CHW's workplace, either in the community or at a facility, and lasted around 45 minutes. In Mali, surveys were administered in French or the local language and in Bangladesh they were conducted in Bengali. Research assistants administered the survey using the Ona.io platform on ARCHOS tablets in Mali, while surveys were paper-based in Bangladesh with data entry undergoing quality checks. No compensation was provided to CHWs for participation.

Qualitative research to assess sub-dimensions and content
We conducted eight focus group discussions (FGDs) across four sites with CHWs in Mali (with 59 total participants; January 2020), and 20 in-depth interviews (IDIs) with CHWs and supervisors in Bangladesh (October-November 2019). In Mali, all CHWs who participated in the quantitative survey were invited to participate in the FGDs. In Bangladesh, IDIs were conducted with FWAs assigned to geographic areas near 10 different UHFWCs (n = 10), FWVs working at five different UHFWCs (n = 5), and Family Planning Inspectors that supervise FWAs (n = 5). The choice of using FGDs vs IDIs, as well as means of identifying/selecting participants, was made by each country study team based on anticipated value of those methods to answering study research questions.
Study team members trained in qualitative methods facilitated the FGDs/IDIs. FGDs in Mali were conducted in a combination of local languages and French; IDIs in Bangladesh were conducted in the local language (Bengali). Interviews/discussions lasted between thirty minutes and one hour. Audio recordings were transcribed verbatim and translated into English. Data were analyzed using content analysis. Coding was performed in NVivo 12 software using a codebook based on hypothesized sub-dimensions of motivation, as well as related emergent themes.
Qualitative findings generally supported the relevance of each of the sub-dimensions covered by the scale items on the surveys, and often also the content of the particular items tested. These findings, including illustrative quotes, are described in more detail in the results section.

Factor analyses
All survey data analyses were conducted for each country separately, in Stata v15 [22]. We began by inspecting frequencies for responses to each item, to ensure items had enough variation across the four response categories (ie, from very dissatisfied to very satisfied).
We then conducted exploratory factor analysis (EFA) to explore the factor structure and item composition. In Mali, given a larger sample size, we split the sample randomly in half and conducted EFA on one half, and confirmatory factor analysis (CFA) on the other half in order to test the factor structure suggested by the EFA. In Bangladesh, we only conducted an EFA.
For the EFA in each country, we began by assessing the number of factors underlying the data by counting the number of eigenvalues >1, examining the scree plot (counting the number of factors above the 'elbow') and conducting parallel analysis (counting the number of factors above the superimposed line). Based on this, we specified that number of factors, and used promax rotation since the factors were correlated. We also inspected solutions with one fewer, and one greater, than that number of factors, to help clarify how well items were loading on each factor. For each item, we examined: factor loading, with loadings <0.3 suggesting removal; uniqueness (one minus communality), with values >0.7 or >0.8 suggesting removal; and cross-loading (loading on more than one factor), also suggesting removal [23].
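The eigenvalue count and parallel analysis described above can be illustrated with a short sketch. This is not the authors' procedure (they worked in Stata); it is a minimal Python illustration that assumes a respondents-by-items numeric array, uses Pearson correlations, and compares eigenvalues of the full correlation matrix with the mean eigenvalues of random normal data of the same shape. The function name `suggest_n_factors` and its parameters are hypothetical.

```python
import numpy as np

def suggest_n_factors(data, n_sims=200, seed=0):
    """Return (kaiser_count, parallel_count) for a respondents-by-items array.

    Assumption: eigenvalues come from the full Pearson correlation matrix,
    which may differ from the variant used in the original Stata analysis.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape

    # Eigenvalues of the observed item correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Eigenvalue-greater-than-one rule.
    kaiser = int(np.sum(observed > 1.0))

    # Parallel analysis: mean eigenvalues from random data of the same shape.
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))
        sims[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    benchmark = sims.mean(axis=0)

    # Count leading eigenvalues that sit above the simulated benchmark line.
    above = observed > benchmark
    parallel = p if above.all() else int(np.argmin(above))
    return kaiser, parallel
```

The two counts need not agree, which is one reason to inspect neighbouring solutions (one fewer and one more factor) as described above.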
For the CFA, we tested the hypothesized factor structure resulting from the EFA. We first tested each factor (ie, scale sub-dimension) separately. We examined item factor loadings, with non-significant loading or loading <0.3 suggesting item removal, as well as model fit statistics including the root mean square error of approximation (RMSEA; with cutoff <0.08 indicating adequate fit), comparative fit index (CFI) and the Tucker-Lewis index (TLI) (cutoff >0.90 for both), and the standardized root mean square residual (SRMR; cutoff <0.08) [24]. We then added correlated error terms suggested by modification indices, and re-examined model fit [25].
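The fit cutoffs listed above can be gathered into a small check. The sketch below is only an illustration in Python; the function name `check_fit` and the example values are hypothetical and are not results from this study.

```python
def check_fit(rmsea, cfi, tli, srmr):
    """Compare CFA fit statistics with the cutoffs stated in the text.

    Returns a dict mapping each index to True when its cutoff is met.
    """
    return {
        "RMSEA": rmsea < 0.08,  # root mean square error of approximation
        "CFI": cfi > 0.90,      # comparative fit index
        "TLI": tli > 0.90,      # Tucker-Lewis index
        "SRMR": srmr < 0.08,    # standardized root mean square residual
    }

# Hypothetical values, for illustration only.
result = check_fit(rmsea=0.05, cfi=0.95, tli=0.93, srmr=0.06)
print(all(result.values()))
```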
We then tested a higher-order model in which the factors (scale sub-dimensions) just described loaded on a single higher-order factor (ie, Motivation). We confirmed adequate 'factor loadings' of each sub-dimension on the higher-order factor. We then added correlated error terms suggested by modification indices, and re-examined model fit.

Reliability analyses
To assess reliability of the full scale and each sub-dimension, we calculated Cronbach's alpha (with values of >0.7 indicating adequate reliability and >0.8 good reliability) [26], as well as Ordinal Theta, a measure of reliability similar to alpha but more suitable for the limited number of response categories (eg, 4) and scale items (eg, for the sub-dimensions) [27]. We also examined correlations between the factors as further evidence for a higher-order factor structure.
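For readers unfamiliar with Cronbach's alpha, the standard formula is alpha = k/(k-1) × (1 − sum of item variances / variance of the summed score), where k is the number of items. The sketch below is a minimal Python illustration, assuming a complete respondents-by-items array with no missing values; the function name is hypothetical, and Ordinal Theta (which requires polychoric correlations) is not covered.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items array (no missing values)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```

When every item carries identical information the ratio of summed item variances to total variance reaches its minimum of 1/k, and alpha equals 1.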

Constructing final scale scores: weighting by importance
To generate a final score for each sub-dimension of Motivation, we calculated the mean of non-missing values for satisfaction items (each scored -2, -1, 1, 2 for very dissatisfied, dissatisfied, satisfied, very satisfied, respectively) for that sub-dimension. We then multiplied the mean score for each subdimension (which ranged from -2 to +2) by how important the respondent said it was to them, scored 1 = not very important, 2 = important, or 3 = very important. This resulted in a final sub-dimension score ranging from -6 to +6.
Of note, the negative (-2, -1)/positive (1, 2) scoring for satisfaction items (and hence mean satisfaction score) was preferable to a positive scoring (eg, 1, 2, 3, 4), since otherwise a low sub-dimension score for satisfaction multiplied by a high value on importance would result in a higher final score than if it were multiplied by a low value of importance. In other words, being unsatisfied with a dimension of your work, and considering that dimension of high importance, should result in lower motivation compared with considering it of low importance. A visual representation of the rationale for this scoring is included in Figure 1.
Finally, to generate a motivation score for the full scale (ie, Motivation), we added up the sub-dimension scores, resulting in a final full-scale score ranging from -24 to +24 (since there were four final factors).
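The scoring steps above can be sketched in a few lines of Python. The response codes follow the text; the function names, the use of `None` to mark a missing satisfaction response, and returning NaN when every item in a sub-dimension is missing are assumptions made for illustration, since the text does not specify those details.

```python
import math

# Codes as described in the text.
SATISFACTION = {"very dissatisfied": -2, "dissatisfied": -1,
                "satisfied": 1, "very satisfied": 2}
IMPORTANCE = {"not very important": 1, "important": 2, "very important": 3}

def subdimension_score(responses, importance):
    """Mean of non-missing satisfaction codes times the importance code.

    The result ranges from -6 to +6. Missing responses are passed as None
    (an assumption); if all are missing, NaN is returned (also an assumption).
    """
    values = [SATISFACTION[r] for r in responses if r is not None]
    if not values:
        return math.nan
    return (sum(values) / len(values)) * IMPORTANCE[importance]

def full_scale_score(subdimension_scores):
    """Sum of the four sub-dimension scores; ranges from -24 to +24."""
    return sum(subdimension_scores)

# Dissatisfaction with something very important scores lower than
# the same dissatisfaction with something not very important.
low = subdimension_score(["very dissatisfied", "dissatisfied"], "very important")
mild = subdimension_score(["very dissatisfied", "dissatisfied"], "not very important")
print(low, mild)  # -4.5 -1.5
```

With an all-positive coding (1 to 4) the ordering in this example would reverse, which is the rationale for the signed coding.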

Convergent validity analyses
To assess convergent validity (evidence of similarity between measures of theoretically related constructs [23]), we assessed associations between the scale scores (full and each sub-dimension) and conceptually related constructs/variables. We also assessed associations between scale scores and performance-related variables (eg, self-reported number of household visits conducted; client-reported outcomes), as well as variables suggestive of retention (eg, years working as a CHW; reporting frequently thinking of quitting). This contributed to establishing predictive validity, that is, evidence of the scale's ability to predict future outcomes. Measures are described in Table 1. Variables differed somewhat by country.

VIEWPOINTS RESEARCH THEME 3: COMMUNITY HEALTH INITIATIVES
We conducted multivariate regression analyses with the MM scale scores as the outcome; linear regression was employed in Mali; finite mixture modeling was employed in Bangladesh due to the bimodal distribution of motivation scores (as described further in the results section). We controlled for potential confounders, including gender (in Mali), age, education, marital status, number of years working as a CHW, and district/union. We chose to present multivariate rather than bivariate results to adjust for confounding of the true relationships between the variables of greatest interest.

RESULTS
A total of 152 CHWs completed surveys in Mali, and 76 CHWs in Bangladesh. About forty percent of respondents in Mali, and all respondents in Bangladesh, were female. Over half of all respondents had completed more than a secondary-level education. A majority were married, and over half lived in the community/village in which they work as a CHW. All respondents in Bangladesh, and nearly three-quarters in Mali, reported receiving financial compensation only. Other sample characteristics are included in Table 2. Bangladesh respondents were older and had worked as CHWs for longer than those in Mali.
Table 3 includes item response frequencies and means for Satisfaction items (note only final scale items are presented). CHWs in both countries tended to report being satisfied or very satisfied with most items. In general, reported satisfaction was lower in Mali than in Bangladesh. Satisfaction was lowest for items related to compensation and workload in both countries, particularly in Mali. There was a notable degree of variation in endorsement of different items within the Feeling valued and capacitated in your work and Compensation and workload sub-dimensions.

EFA results
In EFAs conducted with the pool of 32 candidate items in Mali (EFA sample) and Bangladesh, the number of eigenvalues >1, scree plot and parallel analysis indicated there were three or four latent factors in Mali, and three in Bangladesh. We tried three, four, and five factor solutions in each country. Certain items consistently performed poorly across factor solutions in one or both countries. Three items had factor loadings <0.3 and/or cross-loaded on multiple factors, and four others had high uniqueness (>0.7). (A list of dropped items is available from the authors upon request.) We therefore removed these seven items and re-ran EFAs. Four conceptually clear factors emerged, which we labeled Quality of supervision, Feeling valued and capacitated in your work, Peer respect and support, and Compensation and workload. The most consistent/cleanly loading sets of items in both countries were for the factors Quality of supervision and Peer respect and support. While most of the items eventually assigned to Feeling valued and capacitated in your work and Compensation and workload loaded clearly on those respective factors, some items had less consistent loadings; these few items were grouped with the dimensions we felt they fit best, for confirmatory testing in CFAs.

CFA results
We tested the four factors emerging from the EFAs in CFAs in Mali. CFAs suggested all items had good factor loadings for Quality of supervision and Peer respect and support. For Feeling valued and capacitated in your work, we dropped two items ("Your personal safety while working in the community" and "Your ability to improve the health and well-being of the community"), and for Compensation and workload we dropped one item ("Amount of non-financial incentives you receive"); these items had low factor loadings and caused inadequate model fit. We then re-ran CFAs for each sub-dimension with 22 total items. We added several correlated errors between items within each sub-dimension (as noted below Table 4), and assessed final model fit, which was good (Table 4). For the higher-order CFA, after adding additional correlated error terms between items in different sub-dimensions, final model fit for the higher-order model was adequate.
Final CFA factor loadings for each item, and loadings for each factor on the higher-order factor, are included in the middle column of Table 3. Correlations between the four sub-dimensions provided further support for the higher-order factor structure, although more so in Bangladesh than in Mali. Correlations ranged from 0.09 to 0.45 in Mali, and from 0.61 to 0.71 in Bangladesh (with the lowest values in each country corresponding to Quality of supervision with Compensation and workload, and the highest corresponding to Quality of supervision with Feeling valued and capacitated in your work).

Reliability
Internal consistency reliability is presented in

Assessing importance to CHWs of each factor
Table 6 presents CHWs' reports of how important each sub-dimension was to them (asked via reading out a composite statement describing the sub-dimension, in its entirety). In Mali, a majority answered "very important" for Quality of supervision and Compensation and workload, and "important" for the other two sub-dimensions. In Bangladesh, a large majority answered "very important" for each of the sub-dimensions.
CHW: community health worker.
* Original items were worded as follows: Growth opportunities (that is, having enough opportunities to improve your professional skills, make your own decisions, and have your ideas considered); Compensation (that is, sufficient amount and timeliness of incentives, including in relation to your workload and the time you spend with your family).
† 3 of 5 items in the final "Feeling valued and capacitated in your work" sub-dimension (Table 2) originally fell under "Growth opportunities", which is why this importance item was used in the present analyses.

Qualitative findings relating to motivation and its sub-dimensions
In both countries, CHWs who participated in the qualitative research recognized that they play critical roles in their communities and take pride in their work. Qualitative data in each country supported the relevance of each sub-dimension to motivation. While there was some consistency in how satisfied CHWs were with the different dimensions of their work, this also varied to some extent, particularly with respect to 'extrinsic' factors like amount of incentives or training received. Illustrative quotes for each sub-dimension are included in Table 7.
Quality of supervision, in terms of both 'soft' skills such as being friendly and respectful, and 'harder' skills like assisting in-person when problems arose, was described as critical to supporting CHWs to do their job well, including in relation to facing problems in the community, as well as finding solutions to reaching targets. In Mali, where CHWs had fewer years of experience than in Bangladesh and cover a range of health areas, they appreciated that supervisors engaged in a constant manner, in person and via phone, to answer questions and address problems as they arose.
Feeling valued and capacitated in your work was described in terms of feeling valued in the community and feeling valued and capacitated by the community health system, the latter relating both to ideas and autonomy as well as adequate training and supplies (eg, job aides, medicines).
Peer respect and support was consistently described in the interviews; in Bangladesh this was related both to mutual support between FWAs (ie, the community-based CHWs), and between FWAs and FWVs (the facility-based CHWs), and was seen as particularly important when confronted with community or religious pushback to family planning services.
Compensation and workload were described as important to staying motivated, and evaluated differently in relation to salary vs other forms of compensation such as travel reimbursements or incentive-based payments. Timeliness of compensation was described as a problem in Mali, but not in Bangladesh. Also in Mali, job security was perceived as threatened since they work for a nongovernmental organization (NGO) on a non-regular basis, rather than for the government.
Finally, CHWs and supervisors described certain contextual circumstances as limiting their ability to do their job well. Examples in Bangladesh were the high mobility of clients, particularly in urban settings, and residual cultural norms against family planning, and in Mali, the periodic political instability. CHWs seemed to see these factors as relatively immutable, and hence seemed to downplay them as central to their sense of motivation in their work.

Final scale scores (weighted by importance)
Descriptive statistics for the final scale scores are included in Table 8. The mean scores for the full scale and for each sub-dimension were lower in Mali than in Bangladesh. Compensation and workload was the sub-dimension with the lowest mean score in both countries, and had a negative value in Mali. The highest mean score was for Quality of supervision in Mali, and for Peer respect and support in Bangladesh. In Mali, scores did not differ significantly by the CHW's gender (data not shown).
The distributions of scale scores are presented in Figure 2, using histograms overlaid with smoothed kernel density estimates. In Mali, the scores were approximately normally distributed, though somewhat skewed towards higher scores (except for Compensation and workload). In Bangladesh, with the exception of Compensation and workload, which was normally distributed, the full scale and other sub-dimensions each had a bimodal distribution, consisting of two distinct 'normal' distributions, one with lower values and one with higher values. We conducted ancillary analyses to try to identify factors that may be causing the bimodal distribution in Bangladesh, but found that none of the following factors were associated (in finite mixture models, as described in more detail below): CHW cadre (FWA vs FWV), years working as a CHW, age, education, and union.

Convergent and predictive validity
Results of multivariate regression analyses are presented in Table 9 for Mali and Table 10 for Bangladesh. In Bangladesh, due to the bimodal distribution of the full scale as well as the sub-dimensions (except for Compensation and workload), we used finite mixture modeling (via the "fmm 2" command in Stata) to generate two coefficients, one for the distribution with lower values, and one for the distribution with higher values (Figure 2). Control variables are noted below each table.
In both countries, the full scale as well as each sub-dimension were positively and significantly associated with two items reflecting general work satisfaction/motivation: "Overall interest in job" and "Ability to improve health and wellbeing in your community". In Mali, the full scale and sub-dimensions were also associated with a general assessment of motivation ("Overall is motivated to work here") as well as reporting feeling adequately supervised (these two items were not included in the Bangladesh survey).
Turning to associations with variables suggestive of performance (controlling for potential confounders), in Mali a higher score on the full motivation scale was associated with visiting a larger number of households in the last month, as was the Quality of supervision sub-dimension. In Bangladesh, a higher score on the full scale and/or sub-dimensions was associated with several indicators of performance. These associations varied somewhat based on whether the CHW fell within the lower vs higher distribution of motivation. For observation-based performance, higher motivation around Quality of supervision and Peer respect and support was associated with higher performance among CHWs in the higher distribution; higher motivation around Feeling valued and capacitated in your work was associated with higher performance among those in the lower distribution. For client-reported quality of care, Peer respect and support was associated with higher quality of care but Feeling valued and capacitated in your work with lower quality of care, among CHWs in the higher distribution. Finally, among CHWs in the lower distribution, a higher score on the full motivation scale was associated with higher client-reported empowerment, Quality of supervision was associated with higher empowerment as well as trust in CHWs, and Feeling valued and capacitated in your work was associated with higher trust in CHWs.
Finally, for associations with variables suggestive of retention, in Mali, a higher score on the full motivation scale was inversely associated with frequently thinking of quitting, as were the Quality of supervision and Compensation and workload sub-dimensions, but there were no associations with number of years working as a CHW. In Bangladesh, there was a strong association between a higher score on the full motivation scale and number of years working as a CHW, but only for CHWs on the lower distribution of motivation.

DISCUSSION
We found that the new Multi-dimensional Motivation (MM) scale for CHWs was valid and reliable, and we encourage its use to track and evaluate interventions to improve CHW motivation. While perhaps unconventional, we believe the two-step approach of assessing key dimensions of satisfaction, then weighting each by its importance, operationalizes the construct of motivation in a pragmatic way. The scale demonstrated very good psychometric properties, as well as associations with other related variables, including indicators of performance and retention. The final four scale sub-dimensions align well with previous qualitative research with CHWs in other contexts as well as our own qualitative research [9,13,16].
The final scale sub-dimensions were similar to those we had hypothesized, particularly for Quality of supervision, Peer respect and support, and Compensation and workload. The Feeling valued and capacitated in your work sub-dimension includes a combination of items from three other originally hypothesized dimensions. In fact, recent multi-country qualitative research has shown that characteristics like feeling respected by the community, having agency in decision-making, and having the tools/resources to do your job often form a single theme around feeling valued [9].
Mean levels and distributions of motivation varied substantially by country, with the mean scores for the full scale and each sub-dimension being higher in Bangladesh than in Mali. This could owe to the more mature community health system in Bangladesh, including a formal cadre of CHWs who are regularly compensated for their work. In comparison, the CHWs in Mali were contracted by an NGO, leading to uncertain job security and income (as noted in qualitative interviews) due to fluctuations in funding availability. While there were relatively high mean scores in Bangladesh, the scale does not appear to have a 'ceiling effect' (that is, the distribution is not too highly skewed towards high motivation), at least in Mali and Bangladesh, unlike another recent scale related to CHW motivation (administered in Ethiopia, Kenya, Malawi and Mozambique) [18]. The distribution was bimodal in Bangladesh, suggesting that there may be two distinct motivation experiences among CHWs. It remains unclear why there is a bimodal distribution; no logical underlying factors (eg, CHW cadre, years as a CHW, union, age, education) appeared to explain it. We recommend future implementers and researchers pay close attention to the distributions of the full scale and its sub-dimensions, as well as reasons for any non-normal distributions.
The level of importance CHWs assigned to each dimension also differed between the two countries. In Mali, two sub-dimensions (Feeling valued and capacitated in your work and Peer respect and support) were seen by a majority as "important" rather than "very important", whereas in Bangladesh nearly all respondents reported that all sub-dimensions were very important. It may be less critical to weight the satisfaction sub-dimensions by importance in contexts like Bangladesh where CHWs see all dimensions as similarly important (given that doing so would have a minimal effect on variation of the final score). However, we believe it is still useful to assess relative importance in surveys and recommend following the weighting/scoring protocol described in this paper for comparability across contexts.
We believe the scale is likely to perform well in other contexts. Item content and wording are intended to be applicable to any geographic context, cadre of CHWs, and health area. We also based scale sub-dimensions/item content on research findings from multiple countries and types of CHW programs [9,13-16]. While we evaluated the scale in only two countries, they are in two different regions of the world with distinct socio-cultural contexts and community health systems, and varied scopes of work for CHWs. Still, the scale will benefit greatly from continued evaluation in different contexts.
The findings have several implications for use of this newly validated scale in future research and community health systems strengthening efforts. The scale provides an in-depth understanding of motivation among CHWs, and can be used to explore differences by relevant subgroups as well as changes over time. Longitudinal research is needed to further understand the scale's ability to predict outcomes of interest, as well as changes in these outcomes based on an intervention. In any research around CHW motivation and performance, as well as intervention effects, it is critical to assess the influence of contextual factors such as health system policy and practice, safety and security, and socio-cultural factors, among others [28,29]. Our qualitative findings in Mali and Bangladesh suggested that both health system and societal-level factors fundamentally shape CHWs' work experiences and well-being.
The MM scale captures several of the components put forth by the World Health Organization in its recent guidelines on health policy and system support to optimize CHW programs [2] and could be used to monitor global recommendations around the need for countries to document effects of community health systems strengthening strategies including CHW selection, pre- and in-service trainings, certification, remuneration, and career development. The MM scale could also complement studies on CHW job preferences such as discrete choice experiments [30,31]. The MM scale may be feasible to integrate into national CHW surveys [32], special studies, and/or community health roadmap monitoring strategies [2,33].
This study has several limitations. First, we evaluated the new scale in relatively small samples in two countries. However, findings about factor structure, item performance, and scale performance were quite similar in the two countries, and were further reinforced by qualitative findings. Second, sampling of CHWs for participation in surveys was non-random, potentially leading to selection bias. Third, we conducted qualitative research in parallel with scale development and refinement, rather than prior to it as is preferable [23]. Fourth, due to the cross-sectional nature of survey data, convergent and predictive validity analyses cannot demonstrate causality; as noted previously, longitudinal studies are needed. Finally, the generalizability of findings within the two countries, to other countries and CHW cadres, or to other health areas (eg, HIV/TB) is not assured, although the reasons noted above increase confidence in the applicability of the scale across varied contexts.

CONCLUSION
The availability of multidimensional measures of motivation among CHWs is essential to community health systems research and community health services management. These measures should be valid and reliable, as well as pragmatic and applicable across contexts. Findings from this study show that the MM scale meets these criteria. We hope that this new scale, alongside others developed as part of the Frontline Health Project, will have a positive impact on supporting CHWs and strengthening community health systems worldwide.