Measuring the psychological drivers of participation in collective action to address violence against women in Mumbai, India

Background: A growing number of global health interventions involve community members in activism to prevent violence against women (VAW), but the psychological drivers of participation are presently ill-understood. We developed a new scale for measuring three proposed drivers of participation in collective action to address VAW in the context of urban informal settlements in Mumbai, India: perceived legitimacy, perceived efficacy, and collective action norms. Methods: We did a household survey of 1307 men, 1331 women, and 4 trans persons. We checked for 1) social desirability bias by comparing responses to self-administered and face-to-face interviews, 2) acquiescence bias by comparing responses to positive and negatively worded items on the same construct, 3) factor structure using confirmatory factor analysis, and 4) convergent validity by examining associations between construct scores and participation in groups to address VAW and intent to intervene in case of VAW. Results: Of the ten items, seven showed less than five percentage point difference in agreement rates between self-administered and face-to-face conditions. Correlations between opposite worded items on the same construct were negative (p<0.05), while correlations between similarly worded items were positive (p<0.001). A hierarchical factor structure showed adequate fit (Tucker-Lewis index, 0.919; root mean square error of approximation, 0.036; weighted root mean square residual, 1.949). Comparison of multi-group models across gender, education, caste, and marital status showed little evidence against measurement invariance. Perceived legitimacy, efficacy and collective action norms all predicted participation in groups to address VAW and intent to intervene in case of VAW, even after adjusting for social capital (p<0.05). 
Conclusion: This is the first study to operationalize a measure of the psychological drivers of participation in collective action to address VAW in a low- and middle-income context. Our novel scale may provide insight into modifiable beliefs and attitudes community mobilisation interventions can address to inspire activism in similar low-resource contexts.


Introduction
Worldwide, violence against women (VAW) is a critical public health problem with severe human, emotional, and economic costs 1 . One form of VAW, intimate partner violence, affects 30% of women at least once in their lifetime, and is an important cause of mental, physical, sexual, and reproductive harm 2 . International declarations including the United Nations Sustainable Development Goals and the Convention on the Elimination of all forms of Discrimination Against Women have committed national governments to eliminating VAW 3 . However, our understanding of appropriate policies for achieving this is evolving.
Community mobilisation interventions have long been of interest to policymakers and practitioners as a means of addressing challenging societal and environmental barriers to achieving health 4 . They can be defined as interventions in which local individuals collaborate with external agents in identifying, prioritising, and tackling the causes of ill-health based on principles of bottom-up leadership and empowerment 5 . For example, interventions in South Africa and Uganda have trained volunteer activists to take action against violence, engaged community groups in reflection and action over unequal gender norms, and organised large-scale campaigns and marches [6][7][8] .
A key problem for the delivery of community mobilisation interventions is the extent to which they are able to successfully engage community members in activism 9 . Given the risks associated with standing up to perpetrators of violence, community mobilisation interventions primarily seek to engage individuals in addressing VAW as part of coordinated efforts rather than as isolated actors [6][7][8] . Collective action, defined here as voluntary joint action by a group of people in pursuit of a shared goal 10 , becomes a particularly apt construct for exploring activism. However, participation in collective action poses unique theoretical problems for research and practice, because socially related individuals making decisions together behave differently from single individuals making isolated decisions about whether to take action against violence 9 . Thus, collective action to address VAW overlaps with, but differs from, the related concept of 'bystander intervention' 11 by emphasising intentional participation in a collective effort rather than ad hoc crisis response by individuals.
Promoting women's capacity for collective action to address VAW also contributes to global policy commitments to advance gender equality and women's empowerment 12 . Researchers have extensively studied individualistic conceptions of agency and empowerment using measures of household decision-making power, spousal bargaining power, and access to material resources [13][14][15][16] . However, women's collective agency and empowerment remain poorly understood, and few quantitative measures are available 17 . Group participation 18 and social network structure 19 have been used in the past, but such proxies preclude researchers from separating out capacity for collective action from participation in such action. Groups and associations are often dynamic social structures, which form, disband or evolve based on perceived need 20 , thus capacity for collective action may be just as important to track as actual participation.
Social scientists have long studied capacity for collective action for environmental and political causes 21 and proposed a range of psychological drivers of participation in such action, which have yet to be widely applied to community mobilisation research in low- and middle-income countries. From a social psychology perspective, the main drivers are the perceived legitimacy of collective action, its perceived efficacy, and its relevance for community members' social identity 22 . From a sociological and economic perspective, an important driver is the extent to which social norms reward or punish participation in collective action 23 . We wanted to draw on these theories to develop a new scale for measuring drivers of participation in collective action to address VAW in a low-resource context and so obtain information on community members' capacity for such action.
Our study was embedded in an ongoing cluster-randomized controlled trial of a complex community intervention to prevent violence against women in urban informal settlements (slums) in Mumbai, India 24 . The primary outcomes were the prevalence of physical or sexual domestic violence and the prevalence of emotional or economic domestic violence, control, or neglect, both in the preceding 12 months. Secondary outcomes included non-partner sexual violence. The community mobilization intervention engaged community organisers in convening groups of women, men, and adolescents over a three-year period to address VAW on a platform of counselling, therapy, and legal services. Our research question addressed the extent to which it was possible to measure the psychological drivers of collective action against VAW in the context of urban informal settlements in Mumbai. Figure 1 shows the overall theoretical framework for our measuring tool. We have discussed the general conceptual basis for applying collective action theory to community mobilisation elsewhere 9 .

Amendments from Version 1
We have added an assessment of measurement invariance across gender, education, caste, and marital status. We have added a table with indirect effects of social capital on behavioural outcomes mediated by perceived legitimacy, perceived efficacy, and collective action norms to clarify that social capital has positive indirect effects on behavioural outcomes, even if its direct effects are null or negative. We have added supplementary data to the Results section describing baseline levels of social capital in the local communities. We have expanded our results section on social desirability bias to comment on gender differences in social desirability. We have rewritten our introduction and theoretical framework to clarify points raised by reviewers concerning women's empowerment and the concept of legitimacy. We have rewritten our methods section to clarify questions regarding our sampling strategy. We have also further emphasised issues around respondent understanding of violence and missing data in the discussion section.
Any further responses from the reviewers can be found at the end of the article.

Specifically for our context, community activism to prevent VAW may involve social dilemmas in which community members have an individual interest in abstaining from costly activism to change entrenched patriarchal norms perpetuating violence and letting others contribute, but no benefit is produced if nobody participates. To overcome such dilemmas, community members may be motivated through beliefs in the intrinsic rightness of participation in activism 22 , beliefs that their own participation makes a difference 22 , or beliefs that external rewards (or punishments) will ensue from participation (or nonparticipation) 23 . To measure these beliefs, we examined the following constructs:
Perceived legitimacy. This construct refers to the extent to which action against VAW is seen as justified. It aligns with a number of theories positing that perceived grievance, injustice, or deprivation 22 motivate collective action for social change, while perceived justification for the status quo demotivates collective action 25 . We divided the construct into three sub-constructs referring to respondent concern about VAW in general, acceptance of male power and control in the household, and beliefs about the acceptability of intervening in cases of VAW.
These three sub-constructs are thought of as contributing to respondent feelings regarding the intrinsic rightness (or wrongness) of action against VAW.
Perceived efficacy. This refers to the extent to which participation in collective action is seen as an effective approach to addressing VAW. This construct aligns with theories positing that individuals need to feel their participation is potentially impactful before they judge it worthwhile 22 . We divided it into three sub-constructs denoting respondents' perceived efficacy to achieve specific outcomes (e.g. stop violence or get the police to take action), perceived efficacy of specific interventions (e.g. group discussions or marches and rallies), and the perceived contribution of their own participation.
Collective action norms refer to the extent to which community members expect others to approve or disapprove of them taking action to address VAW. This construct aligns with a tradition proposing that social norms imposing rewards or penalties for participation in collective action affect the ability of collectives to maintain high levels of participation 23 . We divided it into two sub-constructs referencing respondents' perceptions about the reaction of family and community members to their participation in action.

Setting
The NGO Society for Nutrition, Education and Health Action (SNEHA) runs a program on primary, secondary, and tertiary prevention of violence against women and children in Mumbai, India 26 . The main beneficiaries of the program are residents of informal settlements, constituting 41% of Mumbai households 27 . These are characterized by overcrowding, insubstantial housing, insufficient water and sanitation, lack of tenure, and hazardous location 28 . Primary prevention is addressed through a combination of community group activities and resulting individual voluntarism. Secondary prevention includes local crisis response and psychological first aid by community organisers and referral to centres which provide counselling, legal, and psychotherapeutic support, with links to the police and medical, shelter, and social service providers. Tertiary prevention is provided primarily through referral to psychiatric and legal services.

Indicator selection
We conducted a literature review of social movements, collective action and community mobilisation to inform choice of indicators 4,9 . We did not conduct a formal expert review. Items were selected and adapted from existing surveys where possible. We selected interview questions to ensure that different aspects of each theoretical construct were captured and each indicator had local relevance (Extended data, Supplementary Table 1 29 lists survey items for main and complementary measures in full). We selected indicators of perceived legitimacy from the Australian National Community Attitudes towards Violence Against Women Survey (ANCAVAWS) of 2009 30 . We measured perceived efficacy by adapting existing indicators of collective efficacy in community mobilisation research 31 and adding indicators from ANCAVAWS 30 . We created our own items to measure collective action norms as no relevant existing measures were found, asking respondents what their family and community would think of them joining in activities to prevent VAW.

Complementary measures
Community social capital. We selected indicators of social capital from the World Bank Social Capital Assessment Tool 32 . Items represented a broad set of aspects of social capital including social networks, social cohesion, trust, cooperation, and altruism 33 . The items asked if respondents knew their neighbours, trusted them, cooperated with them and could rely on them in emergencies. Following previous analyses of cluster-level social constructs 34,35 , we used multilevel factor analysis to generate estimates of factor scores 36 . We modelled item values as arising from an individual's perception of social capital using a 1-factor model (ordinal α = 0.855). These individual perceptions were aggregated into a measure of community social capital using a 1-factor model at the cluster level.
Participation in groups to address VAW. We adapted prior indicators of participation in groups to address specific issues in community mobilisation research 31 . We first asked respondents whether they had participated in large-scale marches, rallies and protests, meetings organised by local community-based groups, or meetings of a non-governmental organisation in the past year. We then asked whether any of the community meetings or mass gatherings they had attended addressed VAW. If this was the case, we considered the respondent to have participated in a group to address VAW.
Intent to intervene in cases of VAW. We selected two indicators from the ANCAVAWS 30 . The indicators asked how respondents would react if they were present when a woman was being physically assaulted by her partner. The first indicator asked how they would react if the woman was a stranger, the second if she was a family member or close friend. If the respondent indicated they would physically intervene or "say or do something else to try to help", we classified them as intending to intervene in case of VAW.

Piloting
We conducted iterative rounds of testing and modification of survey questions.
LG developed survey items and ND and DO reviewed them. Qualitative researchers with extensive ethnographic experience in the same context also reviewed questions and translated them into Hindi and Marathi. LG conducted three unstructured group discussions with 14 interviewers about their understanding of the questions. LG observed 20 pilot interviews with local women and men, asking respondents clarifying questions where needed. These were not formal cognitive interviews 37 , but organic interactions emerging from respondents providing short anecdotes or observations in response to survey items. If respondents only answered "yes" or "no" to our questions, LG probed for the reason behind their answer to stimulate discussion. Interviews were conducted face-to-face using smartphones running the CommCare application. At the end of this process, questions appeared to be well understood by respondents and could be asked within 45 minutes.
We were concerned about potential social desirability bias 38 . Our survey involved respondents self-reporting motivation to take action against VAW to interviewers who they knew came from an organization dedicated to eliminating VAW. Respondents might have felt social pressure to provide pleasing answers. To check for this, we designed a system that allowed respondents to self-administer survey questions: the interviewer would read the questions aloud, hand their smartphone to the respondent, and ask them to press simple graphic icons to choose their answer without showing it to the interviewer. The smartphone application chose 1 in 7 respondents randomly to receive this type of interview. Given its logistically onerous nature, we only tested it on questions about perceived efficacy to achieve specific outcomes, perceived efficacy of specific interventions and collective action norms.
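The 1-in-7 allocation described above can be illustrated with a minimal sketch. It assumes an independent random draw per respondent; the actual mechanism inside the survey application is not described in the text, and the function name, seed, and number of simulated respondents are hypothetical.

```python
import random

def assign_mode(rng, p_self=1 / 7):
    """Return 'self-administered' with probability p_self, else 'face-to-face'."""
    return "self-administered" if rng.random() < p_self else "face-to-face"

# Hypothetical simulation: a fixed seed keeps the sketch reproducible.
rng = random.Random(2018)
modes = [assign_mode(rng) for _ in range(100_000)]
share_self = modes.count("self-administered") / len(modes)
print(f"share self-administered: {share_self:.3f}")
```

Under independent draws of this kind, the realised share of self-administered interviews fluctuates around 1/7 (about 14%) rather than matching it exactly.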
We were also concerned about acquiescence bias 38 . Respondents might feel pressured to agree to items regardless of content to avoid saying 'no' to the interviewer. Respondents might also agree with items without trying to properly understand them to finish the interview faster. To check for this, we ensured the survey contained both positively and negatively worded questions. If respondents agreed with questions without considering their meaning, they would agree to everything, including survey items making opposite statements. We tested this on questions about collective action norms.

Data collection
Between December 2017 and December 2019, we carried out a baseline survey of community attitudes to VAW in households across 54 informal settlement clusters. Each cluster contained about 500 households. Clusters were in four large urban informal settlements, chosen for their vulnerability, low risk of rehabilitation, low coverage by organisations working to address VAW, and low proportion of rental tenancies. From a random starting point in each cluster, 16 investigators selected 25 women and 25 men aged 18 to 65 years (a single interviewee per household) by visiting households sequentially. We thus obtained approximately 50 interviews per cluster. Participants were enrolled in person. Inclusion criteria were that respondents should fall into this age range and should provide signed consent.
The initial baseline survey comprised questions on attitudes to gender roles, gender equality, VAW, and bystander intervention, as described in our protocol 24 . Questions on action to address VAW were added later, resulting in 92 respondents missing data on these questions. After dropping these (3%), the final sample size was 2642, of whom 1307 were cis men, 1331 cis women, and 4 trans women. Although there is currently no consensus method for determining sample sizes for scale validation 39 , our sample size far exceeds the recommended minimum acceptable thresholds for factor analysis of 300 participants by Comrey and Lee 40 and 20 per survey item by Kline 41 (given we have 27 survey items).
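The two rules of thumb cited above reduce to simple arithmetic, sketched here for illustration. The function name and return structure are assumptions made for this example and are not part of the study's analysis code.

```python
def meets_sample_size_rules(n_respondents, n_items, per_item=20, absolute_min=300):
    """Check a sample against an absolute minimum and a per-item minimum."""
    per_item_min = per_item * n_items  # e.g. 20 respondents per survey item
    return {
        "absolute_min": absolute_min,
        "per_item_min": per_item_min,
        "meets_both": n_respondents >= max(absolute_min, per_item_min),
    }

# Figures taken from the text: 2642 respondents and 27 survey items.
check = meets_sample_size_rules(n_respondents=2642, n_items=27)
print(check)
```

With 27 items, the per-item rule implies a minimum of 540 respondents, so the final sample of 2642 clears both thresholds.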
We also randomised a calendared subgroup of 1899 respondents to receive either the self-administered or the face-to-face interview from June 2018 to December 2019. In total, 247 received the self-administered survey (13%) and 1652 received the face-to-face interview (87%). Interviews were conducted after provision of participant information sheets and signed consent. There was no requirement that the interview be private as the questions were not deemed sufficiently sensitive to put people at risk for answering them.
Data analysis
Item validity. We investigated item validity by checking for acquiescence and social desirability bias. To check for acquiescence bias, we compared responses to positively and negatively worded items on collective action norms using tetrachoric correlation 42 . We chose tetrachoric correlation as items were binary. A well-performing scale would show negative correlation between positively and negatively worded items for a construct.
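To give a feel for what a tetrachoric correlation measures, the sketch below uses the classical cosine-pi approximation based on the odds ratio of a 2x2 table. This is a swapped-in approximation for illustration only, not the estimator used in the study (which is not specified beyond the citation), and it is most accurate when both items are split close to 50/50. The cell counts are hypothetical.

```python
import math

def tetrachoric_approx(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    2x2 table layout: a = agree with both items, d = disagree with both,
    b and c = agree with one item but not the other.
    """
    if 0 in (a + b, c + d, a + c, b + d):
        raise ValueError("every row and column must contain observations")
    if b * c == 0:
        return 1.0   # no discordant pairs in one direction: limit of +1
    if a * d == 0:
        return -1.0  # no concordant pairs in one direction: limit of -1
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))

# Hypothetical counts for a positively and a negatively worded item:
# most respondents agree with one and disagree with the other.
r = tetrachoric_approx(a=100, b=900, c=800, d=200)
print(f"approximate tetrachoric correlation: {r:.2f}")
```

A negative value here corresponds to the pattern expected of a well-performing scale, whereas indiscriminate agreement with both items would push the correlation towards positive values.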
To check for social desirability bias, we compared answers to self-administered and interviewer-administered questions using Pearson chi-squared tests. If bias was absent, we would see little difference. We further checked for differential impact of self-administered interviews on response patterns by gender. We ran separate logistic regression models for each item against administration mode interacted with gender using robust variance estimators to account for clustering.
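A Pearson chi-squared comparison of agreement rates between two interview modes can be sketched as follows. The counts are hypothetical, the function name is an assumption, and the sketch ignores clustering, so it illustrates only the unadjusted test on a 2x2 table.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Rows are interview modes, columns are agree / do not agree.
    Returns the test statistic and the p-value on 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 60 of 200 agree when self-administered,
# 240 of 1000 agree when interviewed face-to-face.
chi2, p_value = chi2_2x2(a=60, b=140, c=240, d=760)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

If the two modes produced identical agreement rates, the statistic would be zero and the p-value one, which is the "little difference" pattern expected in the absence of bias.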

Construct validity.
We investigated construct validity 43 using categorical confirmatory factor analysis, comparing four different factor structures in order of decreasing model restrictiveness:
1. A unidimensional model relating all items to a single factor.
2. A three-dimensional model relating items directly to the three main constructs: perceived legitimacy, perceived efficacy and collective action norms.
3. A hierarchical model relating items to eight first-order factors representing the eight sub-constructs from our theoretical framework (see Figure 1). These first-order factors loaded onto three second-order factors representing our three main constructs.
4. An eight-dimensional model relating items to eight first-order factors as in Model 3, but without the three second-order factors present.
We assessed model fit using the Tucker-Lewis index (TLI) and the root mean square error of approximation (RMSEA) 44 . For the TLI, a good fit was indicated by a value greater than 0.95, a poor fit by a value less than 0.90, and an adequate fit by a value in between. For the RMSEA, a good fit was indicated by a value less than 0.06, a poor fit by a value greater than 0.08, and an adequate fit by a value in between 44 . We also computed weighted root mean square residual (WRMR) for which a good fit is usually indicated by a value less than 1.0 44 . However, the cut-off value of 1.0 is known to be overly sensitive to minor model deviations for sample sizes above 1,000, so WRMR is considered 'experimental' 45 .
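The cut-offs above can be written as a small classification rule. This is a sketch of the stated thresholds only; the function names are assumptions, and the example values are the fit statistics reported in the abstract for the hierarchical model.

```python
def classify_tli(tli):
    """Classify fit by Tucker-Lewis index using the cut-offs stated above."""
    if tli > 0.95:
        return "good"
    if tli < 0.90:
        return "poor"
    return "adequate"

def classify_rmsea(rmsea):
    """Classify fit by RMSEA using the cut-offs stated above."""
    if rmsea < 0.06:
        return "good"
    if rmsea > 0.08:
        return "poor"
    return "adequate"

# Values reported in the abstract for the hierarchical model.
tli_label = classify_tli(0.919)
rmsea_label = classify_rmsea(0.036)
wrmr_below_cutoff = 1.949 < 1.0
print(tli_label, rmsea_label, wrmr_below_cutoff)
```

WRMR is handled as a plain comparison against 1.0 rather than a three-way label, reflecting the caveat that this cut-off is treated as experimental for large samples.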
We assessed internal consistency using ordinal α, a modified version of Cronbach's α for ordinal data 46 . We did not assess test-retest reliability. Past experience in our context (namely vulnerable, low-literacy populations living in informal settlements in Mumbai) has shown that returning to re-interview respondents can create problems, as respondents believe their anonymity has been breached by us being able to track them down for a re-interview.
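One common way to obtain an ordinal α is to apply the usual α formula to a matrix of polychoric (or, for binary items, tetrachoric) correlations instead of Pearson correlations. The sketch below assumes such a matrix has already been estimated; the matrix values are hypothetical and the function name is an assumption, and the exact procedure used in the study is not detailed beyond the citation.

```python
def alpha_from_correlation_matrix(corr):
    """Alpha computed from a k x k correlation matrix.

    alpha = k / (k - 1) * (1 - trace(R) / sum of all entries of R)
    """
    k = len(corr)
    if k < 2 or any(len(row) != k for row in corr):
        raise ValueError("corr must be a square matrix with at least 2 items")
    trace = sum(corr[i][i] for i in range(k))
    total = sum(sum(row) for row in corr)
    return k / (k - 1) * (1 - trace / total)

# Hypothetical polychoric correlation matrix for three items.
polychoric = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
ordinal_alpha = alpha_from_correlation_matrix(polychoric)
print(f"ordinal alpha: {ordinal_alpha:.3f}")
```

Because α rises with the number of items as well as with their inter-correlations, sub-constructs with only two or three items can show modest values even when the items correlate reasonably well.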
Finally, we assessed measurement invariance across gender, education, caste, and marital status by comparing fit statistics (TLI, RMSEA, and WRMR) for models of configural invariance, invariance of first-order factor loadings, invariance of second-order factor loadings, and invariance of both 47 . For the purpose of this comparison, we used binary codes for education (any versus no education), caste (Open/General versus other caste groups), and marital status (currently married versus not currently married). We did not conduct χ² tests as these are known to be overly dependent on sample size, which creates oversensitivity to small deviations from H0 in large samples and spurious sensitivity to differences in sample size between comparison groups in measurement invariance tests 48 .
Criterion validity. We examined criterion validity 43 by checking for convergent validity. We calculated empirical Bayes estimates for each construct in our preferred model from the prior factor analysis 49 . We fitted separate generalized structural equation models for each factor with paths from social capital to the factor, from the factor to a behaviour-related outcome, and from social capital directly to the same outcome. We examined three outcomes: participation in groups to address VAW, intent to intervene in case of violence against a stranger, and intent to intervene in case of violence against a family member. We adjusted for clustering using robust standard errors. We modelled all outcomes as binary responses linked to predictors via a logit link. By checking whether each factor was associated with each outcome, even after adjusting for social capital, we obtained evidence for convergent validity. In case our preferred model was a hierarchical model, we fitted generalized structural equation models for second-order factors, but only logistic regression models for first-order factors, adjusting for social capital; in such a case, associations with second-order factors were our primary interest.
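Because outcomes are modelled with a logit link, each coefficient is a change in log-odds per unit of the predictor, and exponentiating it gives an odds ratio. The sketch below shows how a coefficient for a standardised factor score translates into a percentage change in odds per standard deviation. The function name is an assumption, and the coefficient is back-calculated from an odds ratio of 1.24 purely for illustration.

```python
import math

def pct_change_in_odds(beta):
    """Convert a logit coefficient into a percentage change in odds."""
    odds_ratio = math.exp(beta)
    return (odds_ratio - 1) * 100

# Illustrative coefficient: the log of an odds ratio of 1.24.
beta = math.log(1.24)
change = pct_change_in_odds(beta)
print(f"{change:.0f}% increase in odds per standard deviation")
```

Confidence limits can be converted the same way, by exponentiating each limit on the log-odds scale before expressing it as a percentage.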

Missing data
In total, 30% of respondents did not know the answer to at least one question on collective action. These respondents were slightly more likely to be younger, unmarried, Muslim, of non-scheduled caste, uneducated, and unemployed, although chance could only be ruled out for age, caste, and educational differences (p<0.05; see Extended data, Supplementary Table 2 29 ). In 86% of these cases, respondents were able to respond to at least 24 out of 27 questions and the proportion of "don't know" answers never exceeded 8% for any individual item.
We therefore used complete-case analysis for item validity. To correct for bias in assessing criterion validity, we imputed factor scores for the empirical Bayes estimates where items on collective action to address VAW were missing. We used weighted least squares estimation under a missing at random conditional on observables assumption 50 , modelling factor scores as dependent on age, marital status, religion, caste, education and employment.

Table 1 shows the demographic profile of the sample. Most respondents were 25-44 years old and married. Male respondents were more likely to be unmarried than female respondents. The majority of residents identified as Hindu or Muslim and belonged to a general or scheduled caste. In total, 43% of women and 32% of men did not have a high-school education, while 78% of women and only 24% of men had no employment. Employed women were substantially more likely than men to do home-based piecework. De-identified, individual-level results are available as Underlying data 29 .

Descriptive data
We found high levels of social cohesion in our communities, as 81-93% of respondents reported feeling at home in their neighbourhood or getting along with neighbours (see Extended data, Supplementary Table 3 29 ). 71-91% agreed that people came together to keep the neighbourhood free from crime or solve issues such as disruptions to the water supply. However, levels of trust were lower, as 58-61% stated that their neighbours could not be trusted or only looked out for themselves. 56% of women and 46% of men stated they did not recognise most people in their neighbourhood (p<0.001 for a gender difference).

Item validity
Social desirability bias. Table 2 shows item responses on the constructs of perceived efficacy to achieve specific outcomes, perceived efficacy of specific interventions, and collective action norms where questions were either self-administered or entered by the interviewer. Due to our large sample size, we found statistically significant differences for some items, even where these were small. For example, the proportion of respondents disagreeing with the item "together you can persuade families to support women facing domestic violence" only rose from 2% to 5% in the self-administered condition (p<0.001). Of the ten items, seven showed less than five percentage points difference in the proportion of respondents agreeing with the item between self- and interviewer-administered conditions.
However, the proportion agreeing that "your family members consider activities to stop VAW opposed to their own values" rose from 22% in the face-to-face to 34% in the self-administered condition (p=0.002). The proportion agreeing that "you would be embarrassed to say in public that you work to prevent VAW" rose from 4% to 11% (p<0.001), while the proportion disagreeing that sit-ins, strikes, and blockades are effective in preventing VAW fell from 64% to 54% (p<0.001). These items might have been particularly sensitive to social desirability bias.
The proportion of respondents providing "don't know" answers generally increased by 1-4 percentage points in the self-administered survey condition (p<0.05). However, for the item "People in your neighbourhood approve of you joining activities to stop violence against women", the proportion of "don't know" answers increased by 6 percentage points (p=0.005). With respect to gender differences in social desirability, we found no evidence for an interaction with gender in any indicator (p>0.05) except for the item "Your family members approve of you joining activities to stop violence against women" (p=0.008), where the self-administered interview had a clear effect on men's odds of agreeing (OR 0.51, p=0.037, 95% CI 0.27-0.97), but not women's odds (OR 1.36, p=0.20, 95% CI 0.85-2.19).
Acquiescence bias. Table 3 shows pairwise tetrachoric correlations of items for collective action norms. For all items except one, we found high negative correlations between items of opposite polarity within the same sub-construct, ranging from -0.75 to -0.63. For example, the correlation between the item "people in your neighbourhood approve of you joining activities to stop VAW" and "people in your neighbourhood would mock you for joining activities to stop VAW" was -0.63. The correlation between the item "you would be embarrassed to say in public that you work to prevent VAW" and the item "people in your neighbourhood approve of you joining activities to stop VAW" was only -0.12. However, it was still negative with sufficient evidence to reject a null hypothesis of zero correlation (p=0.016).
Except for one item, correlations between items of the same polarity within the same sub-construct were also high, ranging from 0.81 to 0.83. Correlations between items across sub-constructs were smaller in magnitude, ranging from -0.34 to 0.55. The correlation between the item "you would be embarrassed to say in public that you work to prevent VAW" and the item "people in your neighbourhood would mock you for joining activities to stop VAW" was only 0.29. However, it was still positive with strong evidence to reject a null hypothesis of zero correlation (p<0.001). These results suggest that, overall, respondents were not simply agreeing with all survey items regardless of their content.

Construct validity
Table 4 shows the results of confirmatory factor analysis, which indicated a poor fit for the unidimensional and three-factor models (TLI<0.9, RMSEA>0.05, WRMR>1) and an adequate fit for the hierarchical and eight-factor models (TLI>0.9, RMSEA<0.05, WRMR>1). There was little statistical reason to favour one of the two latter models. The TLI and RMSEA for both were nearly identical, although the WRMR for the eight-factor model was slightly better than that of the hierarchical model (1.837 vs. 1.949). We chose the hierarchical model to assess criterion validity, as it exhibited greater parsimony in the number of model parameters and was more consistent with our theoretical framework.
Figure 2 shows the factor loadings and correlations from the hierarchical model. All loadings and correlations were highly statistically significant (p<0.001). All except one were positive and negative in expected directions. For example, the loading on 'if a man mistreats his wife, then others should intervene' was positive (0.335), while all loadings on all other items for the same sub-construct were negative (≤-0.410). This made sense as the other items expressed the opposite attitude, that it was inappropriate to intervene in cases of violence.
However, the sub-construct 'concern for VAW' loaded weakly on its parent construct 'perceived legitimacy' (-0.095) compared to sub-constructs 'acceptability of male power and control' (-0.866) and 'acceptability of intervention in cases of VAW' (0.905). This suggests that 'concern for VAW' is better considered as falling into a separate class of construct of its own, as opposed to sharing a family resemblance to the other two sub-constructs.
Table 5 shows ordinal alphas for main and sub-constructs. We found generally high levels of internal consistency for both main and sub-constructs considering the small number of items per construct. Ordinal alphas for collective action norms (0.874) and perceived efficacy (0.831) were high, even if sub-constructs community norms (0.574) and personal efficacy (0.658) had moderately low scores. The legitimacy domain had a moderately low score (0.694), as did sub-constructs acceptability of intervening in violence (0.499) and general concern over VAW (0.662).
We did not find strong evidence against measurement invariance (Table 6). Comparing fit across models accounting for gender, education, caste, and marital status, both WRMR and TLI decreased slightly as constraints on factor loadings were lifted, while RMSEA stayed relatively constant. The decrease in WRMR indicates better fit for models allowing for heterogeneity across groups, but the decrease in TLI indicates a worse fit, possibly due to lower parsimony in the unconstrained models. Given these results, we did not see a need for separate measurement models by gender or demographic group.
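The point about parsimony can be sketched with the textbook formulas for TLI and RMSEA. This is an illustration under stated assumptions, not the study's analysis code: the chi-square values, degrees of freedom, and sample size below are hypothetical, the RMSEA uses N - 1 in the denominator (some software uses N, or applies a multi-group correction), and robust estimators for ordinal data use scaled versions of these statistics.

```python
import math

def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index from model and baseline chi-square statistics."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)

def rmsea(chi2_model, df_model, n):
    """Root mean square error of approximation (single-group form)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))

# Hypothetical values: a constrained model, and a less constrained model
# that frees 10 parameters without improving the chi-square.
constrained = tli(chi2_model=150.0, df_model=100, chi2_base=2000.0, df_base=120)
unconstrained = tli(chi2_model=150.0, df_model=90, chi2_base=2000.0, df_base=120)
rmsea_constrained = rmsea(chi2_model=150.0, df_model=100, n=2001)

print(round(constrained, 3), round(unconstrained, 3), round(rmsea_constrained, 4))
```

Because TLI works with the ratio of chi-square to degrees of freedom, freeing parameters that barely improve the chi-square lowers the index, which is one way a less constrained model can show a slightly worse TLI.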

Criterion validity
Figure 2. Factor loadings and correlations for higher-order model of drivers of collective action. All factor loadings and correlations are statistically significant at p<0.001. The three higher-order constructs collective action norms, perceived efficacy, and perceived legitimacy have been standardised to mean 0 and standard deviation 1. VAW, violence against women.
We found good evidence that perceived legitimacy, perceived efficacy, and collective action norms related to outcomes, even after adjusting for community social capital (Figure 3). For each standard deviation increase in perceived legitimacy, odds of participating in a group to address VAW increased 24% (p=0.001, 95% CI 10-41%). For perceived efficacy, odds increased 68% (p<0.001, 40-102%). For collective action norms, odds increased 55% (p<0.001, 30-85%). All three constructs were associated with intent to intervene in case of VAW (p<0.05), with stronger associations for intervening on behalf of a close friend or family member (29-144% increase in odds) than on behalf of a stranger (19-74%).
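For readers less familiar with logistic models, percentage changes in odds of this kind correspond to exponentiated regression coefficients. The sketch below is illustrative only and assumes a standard logit model in which the coefficient is the change in log-odds per standard deviation of the construct; the function name is hypothetical.

```python
import math

def percent_change_in_odds(beta):
    """Convert a log-odds coefficient into a percentage change in odds."""
    return (math.exp(beta) - 1.0) * 100.0

# An odds ratio of 1.24 per standard deviation corresponds to the
# 24% increase reported for perceived legitimacy.
beta_per_sd = math.log(1.24)
one_sd = percent_change_in_odds(beta_per_sd)

# Effects multiply on the odds scale, so two standard deviations
# correspond to 1.24 ** 2, not to twice 24%.
two_sd = percent_change_in_odds(2 * beta_per_sd)

print(round(one_sd, 1), round(two_sd, 1))
```

The same conversion applies to confidence limits: exponentiating the lower and upper limits of the coefficient gives the limits of the odds ratio.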
Social capital was itself positively associated with perceived legitimacy, perceived efficacy, and collective action norms (p<0.001). There was insufficient evidence that social capital was directly associated with participation in groups to address VAW and intent to intervene on behalf of a family member (p>0.05) after adjusting for mediators. Social capital itself was directly negatively associated with intent to intervene on behalf of a stranger, showing a 27-30% reduction in odds of intervening with one standard deviation increase in social capital. However, the indirect effect of social capital on outcomes through perceived legitimacy, perceived efficacy, and collective action norms was positive in all cases (Table 7). Overall, these results suggest that our three constructs did not simply predict outcomes due to their association with social capital. Table 8 shows associations between individual sub-constructs and our three outcomes. Point estimates showed positive associations with action to address VAW for all sub-constructs except acceptability of male power and control, which showed a negative association; this was consistent with a priori theoretical expectations. We found strong evidence that all eight sub-constructs were associated with participation in groups to address VAW (p<0.005). For all sub-constructs except two, we found evidence for an association with intent to intervene in VAW on behalf of a stranger (p<0.005) and a family member (p<0.05). However, we found no evidence for perceived efficacy of specific interventions being associated with intent to intervene in VAW on behalf of a stranger (p=0.102) and for general concern over VAW being associated with intent to intervene in VAW on behalf of either a stranger (p=0.108) or a family member (p=0.334).
Overall, this suggests the predictive value of our main three constructs, perceived legitimacy, perceived efficacy, and collective action norms, does not simply derive from a single sub-construct.

Discussion
To our knowledge, this is the first study to operationalize a measure of the psychological drivers of participation in collective action to address VAW in a low- and middle-income country context. Previous studies of participation in activism against VAW have addressed demographic correlates, but have not measured psychological drivers 51 . We developed our tool on the basis of a literature review of theories of collective action in social psychology, economics, and political science 9 . Testing the tool on household survey data collected in urban informal settlements in Mumbai, we found evidence for good item, construct, and criterion validity. Generalised structural equation models showed that our main three hypothesized constructs predicted both intent to intervene in cases of VAW and participation in groups to address VAW, as did almost all of their sub-constructs. Overall, we believe there is sufficient evidence to assert that our scale can provide useful insight into the drivers of collective action to address VAW in our context.
Confirmatory factor analysis revealed an adequate fit of a hierarchical factor structure, in which individual items loaded on first-order factors which themselves loaded on second-order factors representing our three main constructs. However, the subconstruct 'concern for VAW' loaded weakly on parent construct perceived legitimacy, had a low internal consistency (ordinal α = 0.662) and was not associated with intent to intervene in case of VAW (p>0.1). It may be that this sub-construct was poorly captured by generic questions on the prevalence and severity of VAW in the respondent's community. It may also be that abstract concerns over VAW bear little relationship to actual willingness to take action in concrete situations. Social movement researchers have long posited that at any given moment in time there are simply too many different potential causes for an individual to care about for the mere concern with an issue to trigger action 52,53 . Future versions of this scale might benefit from measuring alternatives to 'concern with VAW'.
Our analyses found perceived efficacy and collective action norms were more strongly associated with participation in collective action to address VAW than perceived legitimacy. We emphasize that the primary purpose of this paper was to validate a new measure of possible psychological drivers of collective action, rather than to provide causal evidence for their role in stimulating action to address VAW. Causality cannot be assumed from our associational analyses due to risks of confounding and reverse causality. Nonetheless, our results provide clues that community mobilisers might benefit from expanding beyond a pure focus on persuading residents of the wrongness of VAW towards engaging with their efficacy and normative beliefs. We also found stronger associations with intent to intervene on behalf of family compared to strangers, suggesting it may be easier for community mobilisers to encourage action on behalf of family members than on behalf of strangers. In a context in which extended family members often act as perpetrators of violence rather than supporters of victims 54 , violence prevention programmes might need to emphasise action to support non-family members rather than provide generic calls to action. These findings show the utility of our scale, although further research is required to fully disentangle these complex relationships.
We found no evidence for social capital being positively associated with participation in collective action after adjusting for psychological drivers. Past evidence paints a mixed picture of the role of general social capital in preventing domestic violence, as studies have found it variously beneficial 55 , harmful 56 , or neutral 57 . Although community-level social capital may increase access to support networks for women, it may also empower male community members to police women's use of such networks 58 . Social norms disapproving of violence may also be required to translate social capital into action: a trial of a violence prevention programme in Uganda found that social capital was only associated with bystander intervention in intervention areas, not control areas 51 . We even found that social capital was negatively associated with intent to intervene in case of VAW against a stranger. This echoes literature on the 'dark side of social capital' 59 , which suggests that tightly connected social networks can be detrimental to the health of perceived outsiders by excluding them from the support of network insiders. However, further research is required to unpack the relationship between social capital and VAW.
Surprisingly, we found little evidence for social desirability bias as most items showed little difference in agreement rates between self-administered and face-to-face conditions. Two items that showed more than a five percentage point difference concerned the views of family and neighbours: "your family members consider activities to stop violence against women opposed to their own values" and "you would be embarrassed to say in public that you work to prevent VAW." As we did not conduct our interviews in private, these differences may reflect respondents feeling better able to voice their opinion when hiding it from their neighbours and family members, rather than from the interviewer. Such biases could be overcome in future surveys by ensuring privacy for the respondent. The third item concerning the effectiveness of sit-ins, strikes, and blockades in stopping VAW might have been interpreted as expressing support for such strategies. Such support might have been controversial to express given the long history of violent clashes between police and residents over forced demolitions of people's homes in Mumbai's informal settlements 60 .
It is possible that the lack of difference between self- and interviewer-administered formats stemmed from respondents feeling insufficiently reassured by the self-administered interview to voice their true opinions. Some respondents may have had difficulty navigating the mobile phone technology on their own, as the proportion of "don't know" answers increased in the self-administered condition. However, this may also reflect respondents being more comfortable voicing genuine ignorance in self-administered interviews than face-to-face interviews.
There is no perfect way of measuring social desirability. Methods involving list randomization, randomized responses, or bogus pipelines 61 are too burdensome for respondents to work in low-literacy, large-N survey settings. Scales for measuring social desirability 62 require a leap of faith that biases exhibited on generic trait scales carry over into response patterns for the target construct. Our own manipulation reassured respondents enough to cause a 12-point shift in agreement rates for one item, while the direction of change in other items was generally consistent with respondents feeling free to express less positive attitudes about violence prevention in the self-administered condition. To the best of our knowledge using feasible methods of measuring social desirability bias, we do not have reason to suspect strong hidden bias.
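The comparison of agreement rates between interview conditions can be sketched as follows. This is a simplified illustration rather than the study's analysis code: the item names and counts are hypothetical, the five percentage point threshold mirrors the cut-off used in our results, and no adjustment for clustering or covariates is shown.

```python
def percentage_point_gap(agree_self, n_self, agree_f2f, n_f2f):
    """Agreement rate in the self-administered condition minus the
    face-to-face condition, expressed in percentage points."""
    if n_self <= 0 or n_f2f <= 0:
        raise ValueError("sample sizes must be positive")
    return 100.0 * (agree_self / n_self - agree_f2f / n_f2f)

def flag_items(counts, threshold=5.0):
    """Return items whose absolute gap exceeds the threshold."""
    flagged = []
    for item, (agree_self, n_self, agree_f2f, n_f2f) in counts.items():
        gap = percentage_point_gap(agree_self, n_self, agree_f2f, n_f2f)
        if abs(gap) > threshold:
            flagged.append(item)
    return flagged

# Hypothetical counts: (agree self-administered, n self-administered,
#                       agree face-to-face, n face-to-face).
hypothetical_counts = {
    "item_a": (60, 100, 72, 100),  # gap of about -12 percentage points
    "item_b": (50, 100, 52, 100),  # gap of about -2 percentage points
}

flagged_items = flag_items(hypothetical_counts)
print(flagged_items)
```

A negative gap in this sketch means lower agreement when respondents answer on their own, which is the direction expected if face-to-face interviews inflate socially desirable answers.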
Nonetheless, our scale has limitations. We tried to measure social identity, which refers to the extent to which community members feel a sense of shared group membership with others in their reference group 22 . We wanted to measure politicized collective identity as being an 'activist' 63 , since the category of 'women' as a whole had been criticized for being too large, vague, and internally divided to constitute an effective identity for feminist activism 64 . However, prior measures of activist identities for gender equality have asked respondents to self-identify as 'feminists' 65 , a term that was poorly understood in our setting. Items asking respondents if they thought themselves 'similar to' 66 activists trying to stop VAW were taken too literally and elicited the response that it would be impossible to know for sure as they had never met such people in person. Similarly, questions about whether respondents 'had a bond with', 'felt connected to', or 'felt strong ties with' 67 such activists elicited the response that they had never met such people, so how could they have ties with them? Asking people if they considered themselves part of the 'women's movement' was interpreted to mean participation in protest, as the term 'movement' (andolan) primarily signified mass protest. Questions asking respondents if they saw themselves as 'the kind of person' 68 who would take action against VAW ended up simply reflecting whether they in fact had taken such action. In the end, we decided not to measure this construct, but we cannot rule out the possibility that future researchers might discover creative ways of capturing it.
Our scale also relied on asking people whether they were willing to engage in activism to address 'violence against women' (mahila ke khilaaf hinsa) or 'domestic violence' (gharelu hinsa).
The World Health Organization and the Demographic Health Surveys recommend avoiding the term 'violence' wherever possible, as there is considerable individual variation in respondent interpretation of the term, including a tendency to consider only extreme practices (e.g. beating or choking) forms of violence, whilst ignoring 'milder' practices (e.g. slapping) 69,70 . However, in a survey setting, it would be unreasonably unwieldy to ask separate questions for attitudes to action against slapping, attitudes to action against kicking, attitudes to action against forced sex, etc. Global health researchers thus universally invoke the generic term 'violence' in questions on participation in action to address VAW 51,71 , as do researchers on bystander intervention 72 . However, this practice does cause confusion: respondents sometimes asked during interviews what the word 'violence' meant, which required interviewers to clarify. We piloted alternatives for the word 'violence', but these created even more confusion: 'forcing/coercing' someone (zabardasti karna) has the unfortunate alternate sense of 'insisting on something', while the word 'force' (bal) does not have an intrinsic negative valence like the word 'violence' and may even have positive connotations of 'strength' and 'power'. In future research, there may be merit in exploring more blunt phrases such as 'action to stop husbands beating their wives' as proxies for 'action to address violence'. During piloting, respondents said it was hard to distinguish collective action against different forms of VAW, as physical, sexual, and emotional violence tended to occur together for survivors of violence.
Finally, our study was limited by missing data, as 30% of respondents lacked data for at least one indicator. However, at most 8% of respondents did not know the answer to any single indicator. As stated earlier, we used complete case analysis for item validity on an item-by-item basis, while we used imputation to create an index of items to assess criterion validity. As respondents with missing data were more likely to be poorly educated, this may have upwardly biased assessments of scale validity.

Conclusion
We present a new scale for measuring the psychological drivers of collective action to prevent VAW, developed in the context of a community mobilisation programme in urban India. Our scale may offer fresh clues to modifiable beliefs and attitudes that global health interventions can address to maximally inspire activism. Discovering clues is highly relevant for a policy landscape in which participatory approaches to gender equality and health are rapidly gaining momentum 73 . We invite researchers and practitioners to adapt and test our scale in their own contexts in order to advance our knowledge of pathways to activism.

Thank you for the opportunity to review this interesting paper by Gram and colleagues. The paper attempts to fill a critical gap in the violence prevention literature: the psychosocial determinants of community voluntary participation in violence prevention efforts. The study was conducted in an LMIC, which is novel as a study context. The authors used a reasonable theoretical framework, original data, an adequately sized sample, and appropriate analytical strategies. I'll keep my comments to a selected few.

1. It seems to me that the authors relied on a value expectancy model like health belief model in formulating their hypothesis of the factors helping community members to Is this correct? If yes, the authors might want to be explicit about their use of a value expectancy model, and make a case for why a value expectancy model would apply to frame factors affecting motivation among the study population (i.e., namely vulnerable, low-literacy populations living in informal settlements in Mumbai). If not, the authors might want to use a framework to inform their hypothesis and use of three constructs.

2. Related to #1, I could also argue that community members may be motivated through beliefs in the legitimacy of the external agents. Community members may consider external agents as legitimate authority figures to carry out violence prevention efforts if the agents provide them with material incentives, have social ties with the political elites, or there are normative appeals for the external agents in the community (i.e., the agents represent a virtue [e.g., education, nationality, etc.] deemed desirable or valued in the community).

3. The authors define Perceived legitimacy as "This construct refers to the extent to which action against VAW is seen as a legitimate endeavour." This definition seems tautological to me (defining legitimacy with legitimacy), unless I am missing something obvious here. Also, lacking an adequate conceptualization, the basis of dividing perceived autonomy into three sub-constructs appears to be a bit ad hoc. Also, I would think that "General concern about VAW" would be a predictor of Perceived Legitimacy, and not a dimension or a sub-construct. I think I am struggling with perceived legitimacy as a construct and its relation with the sub-constructs.

4. The authors might want to disaggregate their main analysis by gender. Since the practice world seems to have transitioned from targeting women to targeting men to targeting both men and women for violence prevention interventions, understanding whether and to what extent men and women are different in intention to participate in community activism might be helpful for programmatic purposes.

5. I was curious as to why the authors did not control for the demographic correlates of participation in collective action to address VAW. If the objective of this study is to inform the practice world about the ways of mobilizing masses, understanding the contribution of psychometric factors relative to that of demographic factors would have been most useful, I think. Since the authors are likely to have access to the data on demographic correlates, why not rerun the model by adjusting for demographic correlates.

6. Perhaps I missed it, but I wonder about the goodness of fit statistics for the models in Table 6.

7. I don't know where #7 fits, or if this fits for this analysis at all, but I could not help but wonder about each of perceived legitimacy, perceived efficacy, and perceived norms being context-dependent (and therefore not a random occurrence!). We tend to formulate a perception or an opinion by applying a deliberate system of thinking (i.e., rational choice) or an automatic system of thinking (i.e., mental models, learned behavior). Given the study population, who allegedly may have limited access to information or knowledge about VAW, it's likely their perceptions have been informed by community perceptions or by the perception of individuals of influence in the community. Do the authors agree with this argument? If yes, does it make sense to fit multilevel models as opposed to individual level models in Table 6? Also, I think the authors have done it, but at a minimum the error terms should be adjusted for community level clustering.

8. Was it by design that the authors sampled 500 households from 54 informal settlement clusters? How many households are there per cluster on average? What was the sampling strategy: simple probability or proportional probability sampling? If simple, there are potentially 9-10 households from some clusters. If yes, calculation of standard errors needs to be adjusted for unequal number of households per cluster or potentially small number of households in some clusters. Minor point, but perhaps worth exploring, especially since the authors have adjusted for social capital levels at the cluster level.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 23 Jun 2020
Lu Gram, Institute for Global Health, London, UK
Thank you for your thorough critique and interesting, novel insights. We have revised the manuscript in accordance with your suggestions, wherever possible.

It seems to me that the authors relied on a value expectancy model like health belief model in formulating their hypothesis of the factors helping community members to Is this correct? If yes, the authors might want to be explicit about their use of a value expectancy model, and make a case for why a value expectancy model would apply to frame factors affecting motivation among the study population (i.e., namely vulnerable, low-literacy populations living in informal settlements in Mumbai). If not, the authors might want to use a framework to inform their hypothesis and use of three constructs.
This is an interesting comment. We appreciate the similarity between our conceptualisation and frameworks such as the Health Belief Model or the Theory of Planned Behaviour. The literature on collective action is vast and each of the constructs (legitimacy, efficacy, and norms) is independently associated with a sprawling body of theoretical literature [3,7]. For example, an enduring debate exists over whether such constructs represent conscious deliberation, heuristics, periphrastic shorthands for causes of behaviour, equilibrium outcomes of individual learning, equilibrium outcomes of cultural evolution, etc. [8][9][10].
Given restrictions on the level of detail we can realistically obtain through surveys in LMIC settings, we have abstracted away from these considerations in our framework. These distinctions may also matter ultimately less for programmatic purposes: if perceived legitimacy, efficacy and norms affect participation in collective action and intervention planners find low levels of these at baseline, it makes sense for them to address it regardless of the precise interpretation of this relationship in terms of the above theories. We have referenced a past paper discussing overarching theory and are hesitant to overcomplicate the current paper by adding further theory into the introduction [1].

Related to # 1, I could also argue that community members may be motivated through beliefs
Thank you for this comment. Our measure mainly seeks to capture the legitimacy of action against VAW as opposed to the legitimacy of external organisations working in this field. Poor, vulnerable women face overwhelming pressure to say to local service providers, NGOs and government agents that they are legitimate actors. As such, perceived legitimacy of organisations is unlikely to be captured accurately through a survey, even were we to use self-administered surveys to ask about this.
Perceived legitimacy of organisations also most likely increases participation in collective action through its effect on perceived legitimacy of action, perceived efficacy of action, and collective action norms. It is of course possible that community members will organise collective action to prevent VAW simply because they are told to do so by a reputable organisation, even if they personally believe such action is unjust, ineffective and embarrassing to do.
However, that is rarely how community mobilisation interventions operate, as they encourage behaviour change through participation rather than coercion. In the field of VAW prevention in particular, there are strong ethical objections to using material or political incentives to promote action against VAW -rewarding community members for identifying, referring or supporting survivors of violence in monetary terms or through political access would create perverse incentives for community members to 'find' more cases of violence.
For all the above reasons, we opted not to measure legitimacy of organisations in our scale.

The authors define Perceived legitimacy as "This construct refers to the extent to which action against VAW is seen as a legitimate endeavour." This definition seems tautological to me (defining legitimacy with legitimacy), unless I am missing something obvious here. Also, lacking an adequate conceptualization, the basis of dividing perceived autonomy into three sub-constructs appears to be a bit ad hoc. Also, I would think that "General concern about VAW" would be a predictor of Perceived Legitimacy, and not a dimension or a sub-construct. I think I am struggling with perceived legitimacy as a construct and its relation with the sub-constructs.
We defined perceived legitimacy in the same way as perceived efficacy, which we defined as the extent to which action was seen as effective. We conceptualised it as primarily related to our statement about community members being motivated (or demotivated) by the intrinsic rightness (or wrongness) of participation in collective action:
To overcome such dilemmas, community members may be motivated through beliefs in the intrinsic rightness of participation in activism 13 … (from Section: Theoretical Framework)
As such, participation in collective action to address VAW can be seen as more justified, the more community members are concerned about VAW, believe men have no right to control and dominate women, and feel interventions in cases of VAW do not violate privacy norms. We have now revised the paragraph to clarify this:
Perceived legitimacy. This construct refers to the extent to which action against VAW is seen as justified. It aligns with a number of theories positing that perceived grievance, injustice, or deprivation 13 motivate collective action for social change, while perceived justification for the status quo demotivates collective action 16 . We divided the construct into three sub-constructs referring to respondent concern about VAW in general, acceptance of male power and control in the household, and beliefs about the acceptability of intervening in cases of VAW. These three sub-constructs are thought of as contributing to respondent feelings regarding the intrinsic rightness (or wrongness) of action against VAW. (from Section: Theoretical Framework)
The division into three sub-constructs was based on feminist literature, which ascribes a major role to norms of privacy and male domination over women in perpetuating VAW [11]. The construct, "general concern about VAW," was informed by past conceptual models of community mobilisation which have included a dimension of "shared concern" with a particular issue [12]. For policy purposes, we thought it was relatively clear that differences in the influence of privacy norms, beliefs about male domination and control, and general concern for VAW had distinct programmatic implications.
However, we agree that further theoretical work may clarify the structure of our 'legitimacy' construct. We wanted to avoid an extended discussion on its exact meaning given the heterogeneous literature associated with it. Longer and precise definitions of unobservable constructs might be less ambiguous, but complete clarity can never be attained [13]. Highly specific questions about big concepts may produce more precise, but less valid, measurement, as the questions become over-specific [14]. As such, we have not significantly expanded our discussion of the concept in this paper.

The authors might want to disaggregate their main analysis by gender. Since the practice world seems to have transitioned from targeting women to targeting men to targeting both men and women for violence prevention interventions, understanding whether and to what extent men and women are different in intention to participate in community activism might be helpful for programmatic purposes.
We agree that it is important to disaggregate by gender for programmatic purposes. As stated in the discussion, we wanted the primary focus of the paper to be on measurement and look into substantive issues in forthcoming papers. However, we agree that it may be worthwhile to look at gender differences in the response to our measure. We have divided this task into three sub-tasks: looking at gender differences in the make-up of factor structure, looking at gender differences in the response to social desirability items, and looking at gender differences in convergent and criterion validity.
Gender differences in factor structure: We conducted measurement invariance tests across gender, education, caste and marital status and found little evidence for differences in model fit by these groups. The table is displayed in the Results section of the revised manuscript. We added the following text:
We did not find strong evidence against measurement invariance (Table 7). Comparing fit across models accounting for gender, education, caste, and marital status, both WRMR and TLI decreased slightly as constraints on factor loadings were lifted, while RMSEA stayed relatively constant. The decrease in WRMR indicates better fit for models allowing for heterogeneity across groups, but the decrease in TLI indicates a worse fit, possibly due to lower parsimony in the unconstrained models. Given these results, we did not see a need for separate measurement models by gender or demographic group. (from Section: Results -construct validity)
Gender differences in social desirability: We have also tested for interactions with these variables in measuring social desirability bias. Here, we find little evidence that female respondents were more strongly affected by social desirability bias than male respondents. The only item for which we found a significant interaction showed men displaying greater social desirability effects than women:
With respect to gender differences in social desirability, we found no evidence for an interaction with gender for any indicator (p>0.05) except for the item "Your family members approve of you joining activities to stop violence against women" (p=0.008), where the self-administered interview had a clear effect on men's odds of agreeing (OR 0.51, p=0.037, 95% CI 0.27-0.97), but not women's odds (OR 1.36, p=0.20, 95% CI 0.85-2.19). (from Section: Results -item validity)
Gender differences in associations between psychological constructs and behavioural outcomes: As mentioned above, we wanted the primary focus of the paper to be on measurement and look into substantive issues in forthcoming papers. While it is possible, even likely, that men and women will show different associations between different psychological drivers and participation in collective action, we do not have strong enough evidence to make a priori hypotheses which could inform us if the patterns observed in the data indicate high or low validity for our scale. As such, we think this issue is better explored in a separate paper looking at unpacking these associations in more depth, which we are currently preparing.

I was curious as to why the authors did not control for the demographic correlates of participation in collective action to address VAW. If the objective of this study is to inform the practice world about the ways of mobilizing masses, understanding the contribution of psychometric factors relative to that of demographic factors would have been most useful, I think.
Since the authors are likely to have access to the data on demographic correlates, why not rerun the model by adjusting for demographic correlates?
The results do not change when adjusted for demographic correlates. As stated above, we wanted the primary focus of the paper to be on measurement and disentangle issues of reverse causality, confounding, and effect modification in forthcoming papers. It is conventional in measurement theory to use simple correlations to check for convergent validity, just as factor analysis is carried out on crude correlation matrices, so we have kept adjustments to the bare minimum [15][16][17]. We have briefly mentioned substantive implications of our results to illustrate how our scale might be used in practice. We have now further clarified this in the main text:
These findings show the utility of our scale, although further research is required to fully disentangle these complex relationships.

Perhaps I missed it, I wonder about the goodness of fit statistics for the models in
These are simple logistic regression models, so goodness of fit statistics comparable to those displayed in Table 4 are not available for them. Goodness of fit statistics for logistic regression models, such as the Hosmer-Lemeshow χ² test or Tukey's link test, are sensitive to omitted variable bias, which is not relevant when we are primarily using the models to establish simple correlations to assess convergent validity.
I don't know where #7 fits, or if this fits for this analysis at all, but I could not help but wondering about each of perceived legitimacy, perceived efficacy, and perceived norms being context-dependent (and therefore not a random occurrence!). We tend to formulate a perception or an opinion by applying a deliberate system of thinking (i.e., rational choice) or an automatic system of thinking (i.e., mental models, learned behavior). Given the study population, who allegedly may have limited access to information or knowledge about VAW, it's likely their perceptions have been informed by community perceptions or by the perception of individuals of influence in the community. Do the authors agree with this argument? If yes, does it make sense to fit multilevel models as opposed to individual level models in Table 6?
This is an interesting idea. As mentioned in our answer to Point 1, there is a large and lively debate as to the extent to which human preferences, beliefs and norms have come about through deliberation, imitation, learning, cultural evolution, etc. For the purpose of this paper, we are simply investigating whether we can measure perceived legitimacy, perceived efficacy, and perceived norms at an individual level. Further research is required to establish the precise ways in which individual perceptions interact at a community level. We are not sure we can assume one mechanism rather than another at this stage. 56% of women and 46% of men do not recognise most people in their neighbourhood (Supplementary Table 3), so there may not necessarily be strong peer effects operating at a community level.

Also, I think the authors have done it, but at a minimum the error terms should be adjusted for community level clustering.
Indeed, the error terms are already adjusted for community level clustering.
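To illustrate what adjusting error terms for community-level clustering does, here is a minimal sketch on an intercept-only model (the standard error of a proportion). It is not the authors' estimation code, which used regression models; the toy data, function names, and the G/(G-1) small-sample factor are assumptions made for illustration.

```python
import math
from collections import defaultdict

def naive_se(values):
    """Standard error of the mean, treating observations as independent."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(var / n)

def cluster_robust_se(values, clusters):
    """Cluster-robust (sandwich) standard error of the mean.

    Residuals are summed within each cluster before squaring, so
    within-cluster correlation inflates the variance estimate.
    """
    n = len(values)
    mean = sum(values) / n
    totals = defaultdict(float)
    for v, c in zip(values, clusters):
        totals[c] += v - mean
    g = len(totals)                      # number of clusters
    meat = sum(t ** 2 for t in totals.values())
    return math.sqrt(g / (g - 1) * meat) / n

# Hypothetical toy data: 4 clusters of 5 respondents, 1 = agrees with an item.
y = [1, 1, 1, 1, 1,  1, 1, 1, 1, 0,  0, 0, 0, 0, 1,  0, 0, 0, 0, 0]
cl = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5
print(f"naive SE = {naive_se(y):.3f}, cluster-robust SE = {cluster_robust_se(y, cl):.3f}")
```

In this toy example agreement is concentrated in two clusters, so the cluster-robust standard error is roughly twice the naive one; ignoring clustering would overstate precision.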
Was this by design that the authors sampled 500 households from 54 informal settlement clusters? How many households are there per cluster on the average? What was the sampling strategy: simple probability or proportional probability sampling? If simple, there are potentially 9-10 households from some clusters. If yes, calculation of standard errors needs to be adjusted for unequal number of households per cluster or potentially small number of households in some clusters. Minor point, but perhaps worth exploring, especially since the authors have adjusted for social capital levels at the cluster level.
The data were collected in a survey conducted before our community intervention began in a cluster randomised controlled trial [18]. The trial includes 54 clusters of ~500 households each, identified within much larger agglomerated informal settlements. We took a systematic random sampling approach similar to other surveys: clusters were selected at random and households within them visited systematically until 25 women and 25 men had completed questionnaires in each cluster. The clusters were of equal size and each yielded ~50 questionnaires. We have now clarified this: Between December 2017 and December 2019, we carried out a baseline survey of community attitudes to VAW in households across 54 informal settlement clusters. Each cluster contained about 500 households. Clusters were in four large urban informal settlements, chosen for their vulnerability, low risk of rehabilitation, low coverage by organisations working to address VAW, and low proportion of rental tenancies. From a random starting point in each cluster, 16 investigators selected 25 women and 25 men aged 18 to 65 years -a single interviewee per household -by visiting households sequentially. We thus obtained approximately 50 interviews per cluster. Participants were enrolled in person. Inclusion criteria were that respondents should fall into these age groups and should provide signed consent.
(from Section: Data Collection)
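The reviewer's concern about cluster sizes can be made concrete with the standard design-effect approximation, sketched below. The respondent and cluster counts come from the paper; the intracluster correlation (ICC) values and the coefficient of variation of cluster size (cv) are hypothetical inputs chosen only for illustration, and the function name is ours.

```python
def design_effect(mean_cluster_size, icc, cv=0.0):
    """Approximate design effect: 1 + ((cv^2 + 1) * m - 1) * icc.

    With cv = 0 (equal cluster sizes) this reduces to 1 + (m - 1) * icc.
    """
    return 1 + ((cv ** 2 + 1) * mean_cluster_size - 1) * icc

n_total = 2642               # respondents reported in the paper
n_clusters = 54              # clusters reported in the paper
m = n_total / n_clusters     # about 49 interviews per cluster

for icc in (0.01, 0.05, 0.10):   # hypothetical ICC values, not from the paper
    deff = design_effect(m, icc)
    print(f"ICC={icc:.2f}: DEFF={deff:.2f}, effective n={n_total / deff:.0f}")
```

Because each cluster yielded approximately 50 interviews, cv is close to zero here and the equal-size formula applies; unequal cluster sizes (cv > 0) would only increase the design effect.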

Aksha Centre for Equity and Wellbeing, Mumbai, India
This article contributes to the growing body of evidence on violence against women in India, and is unique in several ways and goes beyond other studies in several ways. The study, situated in informal settlements in Mumbai, India, focuses on a rarely studied dimension, namely the key factors associated with participation in collective action to address such violence. In doing so, it presents an evidence-informed and thoughtful theoretical framework, and also develops an evidence-informed scale to measure three drivers of collective action, that is, perceived legitimacy, perceived efficacy and collective action norms. And it tests, in various ways, the robustness of this scale and its components -social desirability, by comparing responses using self-administered and face-to-face interviewing techniques; acquiescence bias by comparing positively and negatively worded questions; construct validity; and criterion validity. Authors conclude that their scale is indeed robust, offering "clues to modifiable beliefs and attitudes that global health interventions can address to maximally inspire activism". Although the methodology appears to me to be sound and well-explained, I do not feel qualified to comment on its rigour and appropriateness. I have a few observations on other issues.
Objectives are not clear, and a clear statement of objectives is needed. It appears to be both "to develop a new scale measuring drivers of participation in collective action to address VAW" and to better understand the associations of perceived efficacy, collective action norms and perceived legitimacy with participation in collective action. Although there is a considerable discussion of the latter, authors emphasise that the "sole" purpose of the paper is to validate the new measure of drivers. Obviously causality cannot be assessed from a cross-sectional study, but associations certainly can. Authors need to be very clear about what the objectives are, and if one is the "sole" purpose of the paper, justify the need to address the second as well.
The study was based on a survey of 2642 respondents residing in 54 informal settlements of the city. The sample was equally divided between men (1307) and women (1311) aged 18-65, and analysis was performed for the entire sample. Would a gender-segmented analysis have provided authors with a more nuanced set of insights? As is well known, India is a patriarchal and gender stratified society, characterised by men exercising a sense of entitlement to control women, and women largely submissive to unbalanced gender norms (see for example, Santhya et al. 2013 1 ; Jejeebhoy et al. 2013 2 ). Domestic violence is widespread, with one-third of married women reporting an experience; it is, moreover, widely justified, with half (52%) of women and two-fifths (42%) of men justifying violence for at least one of a host of reasons, including going out without permission, neglecting the household and disrespecting in-laws (International Institute for Population Sciences (IIPS) and ICF, 2017 3 ). In this hierarchical setting, it is likely that men and women will respond differently to the items on the scale developed in this study, for example, embarrassment to declare engagement in violence prevention work, or attitudes about whether protests are effective in stopping violence, and differences would emerge in perceived legitimacy, efficacy and social norms, as well as social desirability responses. Even if the sample size does not permit a sub-group analysis, it seems worthwhile to conduct the analysis separately for men and women to observe any key disparities, with regard to both the scale as well as the correlates.
Authors themselves raise questions about their respondents' understanding of the term 'violence against women' and note that they were required to explain the term to respondents in some instances. Given that the whole study focuses on violence against women, this possible ambiguity in understanding of the term is cause for concern. Both the World Health Organisation (World Health Organisation, 2005) 4 and the Demographic and Health Surveys (see, for example, Kishor and Johnson, 2004) 5 have recognised this ambiguity, and have posed, instead of a single question, a battery of questions relating to different acts of physical violence, as well as sexual and emotional violence. Both research programmes recognise the need to specifically define various acts of violence precisely so that the violence measure is not affected by different understandings of what constitutes violence. In India, many associate the term 'violence' with extreme practices such as severe beating that results in injury, choking and so on; a slap is an everyday event that few would consider violence, and few would agree that participation in action to counter such an act is warranted. Authors describe steps taken if the respondent asked for clarification, but it is likely that others who did not request clarification may have had a limited definition of violence. An introduction that defines what is implied by the term and applied to all respondents may have reduced these ambiguities in terminology. While this limitation cannot be addressed, it would be helpful to highlight this limitation at greater length, and speculate about possible effects on the scale or other outcomes.
Given the limited voice that women and girls exercise in families in India, many personal questions can raise discomfort levels among them. Hence, typically, sections of surveys dealing with these issues are conducted in private, without mothers and mothers-in-law listening in. Without an assurance of privacy, it is very possible that women's responses would be more likely to adhere to traditional norms and expectations than if privacy is ensured. In this case, while it is true that personal experiences are not elicited, attitudes about the acceptability of intervention and collective action or perceptions about how families would react to their participation in collective action to stop a behaviour that is perceived as so acceptable, may also cause discomfort to young women who are audible to their mothers-in-law, and result in responses that are acceptable to those listening in. While here too, nothing can be done now, it would be helpful to highlight this limitation and offer some ideas about how it might have affected findings.
Small points. Social capital is discussed, and adjustments made for it, but there is no indication in the paper of the level of social capital in the study sample, it would be good to highlight this or explain why it has been omitted. Also, authors may want to think of a friendlier way of displaying the information provided in Figure 3.
Overall, the paper addresses important and poorly understood issues relating to participation in community action. It is perhaps the only study of its kind in India that has measured social desirability and acquiescence biases, has developed a scale to encompass the various drivers of participation, and has shown the extent to which these constructs are associated with outcomes such as participation in groups and intent to intervene in incidents of violence. It is very well-written, if dense. My suggestions for revision are relatively small, and call for a more consistent statement of objectives, a possible gender segmented analysis, and a stronger acknowledgement of study limitations.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Gender, notably female empowerment, violence against women; adolescent health and development; sexual and reproductive health and rights; India; programme evaluation.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Author Response 23 Jun 2020 Lu Gram, Institute for Global Health, London, UK
We appreciate you taking your time to provide your valuable insights from your many years of working in this field and in this context. We have implemented your constructive suggestions as much as possible.
Objectives are not clear, and a clear statement of objectives is needed. It appears to be both "to develop a new scale measuring drivers of participation in collective action to address VAW" and to better understand the associations of perceived efficacy, collective action norms and perceived legitimacy with participation in collective action. Although there is a considerable discussion of the latter, authors emphasise that the "sole" purpose of the paper is to validate the new measure of drivers. Obviously causality cannot be assessed from a cross-sectional study, but associations certainly can. Authors need to be very clear about what the objectives are, and if one is the "sole" purpose of the paper, justify the need to address the second as well.
Thank you for this. We completely understand your concern. There is a chicken-and-egg problem in building measurement and substantive theory in that we need valid measurements to provide evidence for substantive theory, but substantive theory is also needed to interpret and validate new measures. In a relatively nascent subject like women's collective action to prevent VAW, we have neither prior measures to compare against nor substantive theory backed by strong evidence. Past scale validation studies running into this problem have combined the two objectives into one [6]. We opted for running checks on convergent and discriminant validity using simple correlations and regressions, while leaving the detailed disentanglement of associations between our measure and outcomes to a follow-up paper (which we are writing). The brief discussion of the implications of different magnitudes of associations with perceived efficacy, legitimacy and norms was merely meant to illustrate how our scale might be useful to policy-makers. To clarify this, we have now replaced "sole" with "primary" and added a sentence to the end of the paragraph:
These findings show the utility of our scale, although further research is required to fully disentangle these complex relationships.
(from Section: Discussion)
The study was based on a survey of 2642 respondents residing in 54 informal settlements of the city. The sample was equally divided between men (1307) and women (1311) aged 18-65, and analysis was performed for the entire sample. Would a gender-segmented analysis have provided authors with a more nuanced set of insights? As is well known, India is a patriarchal and gender stratified society, characterised by men exercising a sense of entitlement to control women, and women largely submissive to unbalanced gender norms (see for example, Santhya et al. 2013 1; Jejeebhoy et al. 2013 2). Domestic violence is widespread, with one-third of married women reporting an experience; it is, moreover, widely justified, with half (52%) of women and two-fifths (42%) of men justifying violence for at least one of a host of reasons, including going out without permission, neglecting the household and disrespecting in-laws (International Institute for Population Sciences (IIPS) and ICF, 2017 3). In this hierarchical setting, it is likely that men and women will respond differently to the items on the scale developed in this study, for example, embarrassment to declare engagement in violence prevention work, or attitudes about whether protests are effective in stopping violence, and differences would emerge in perceived legitimacy, efficacy and social norms, as well as social desirability responses. Even if the sample size does not permit a sub-group analysis, it seems worthwhile to conduct the analysis separately for men and women to observe any key disparities, with regard to both the scale as well as the correlates.
We fully agree that it is key to look at gender differences in how people respond to the scale. We have divided this task into three sub-tasks: looking at gender differences in the make-up of factor structure, looking at gender differences in the response to social desirability items, and looking at gender differences in convergent and criterion validity.
Gender differences in factor structure: We conducted measurement invariance tests across gender, education, caste and marital status and found little evidence for differences in model fit by these groups. The table is displayed in the Results section of the revised manuscript. We added the following text:
We did not find strong evidence against measurement invariance (Table 7). Comparing fit across models accounting for gender, education, caste, and marital status, both WRMR and TLI decreased slightly as constraints on factor loadings were lifted, while RMSEA stayed relatively constant. The decrease in WRMR indicates better fit for models allowing for heterogeneity across groups, but the decrease in TLI indicates a worse fit, possibly due to lower parsimony in the unconstrained models. Given these results, we did not see a need for separate measurement models by gender or demographic group.
(from Section: Results - construct validity)
Gender differences in social desirability: We have also tested for interactions with these variables in measuring social desirability bias. Here, we find little evidence that female respondents were more strongly affected by social desirability bias than male respondents. The only item for which we found a significant interaction showed men displaying greater social desirability effects than women:
With respect to gender differences in social desirability, we found no evidence for an interaction with gender in all indicators (p>0.05) except for the item "Your family members approve of you joining activities to stop violence against women" (p=0.008), where the self-administered interview had a clear effect on men's odds of agreeing (OR 0.51, p=0.037, 95% CI 0.27-0.97), but not women's odds (OR 1.36, p=0.20, 95% CI 0.85-2.19).
(from Section: Results - item validity)
Gender differences in associations between psychological constructs and behavioural outcomes: As mentioned above, we wanted the primary focus of the paper to be on measurement and look into substantive issues in forthcoming papers. While it is possible, even likely, that men and women will show different associations between different psychological drivers and participation in collective action, we do not have strong enough evidence to make a priori hypotheses which could inform us if the patterns observed in the data indicate high or low validity for our scale. As such, we think this issue is better explored in a separate paper looking at unpacking these associations in more depth, which we are currently preparing.
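The odds ratios and confidence intervals quoted above for the family-approval item can be checked for internal consistency with a short calculation. The sketch below backs out the standard error of log(OR) from each 95% interval and recomputes a Wald p-value; it assumes the intervals are symmetric on the log scale and is a reader's sanity check, not the authors' analysis code. Function names are ours.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Back out the standard error of log(OR) from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def two_sided_p(odds_ratio, se_log_or):
    """Two-sided Wald p-value for H0: OR = 1, using the normal distribution."""
    z = math.log(odds_ratio) / se_log_or
    return math.erfc(abs(z) / math.sqrt(2))

# Figures quoted in the response: (OR, CI lower, CI upper).
reported = {
    "men": (0.51, 0.27, 0.97),
    "women": (1.36, 0.85, 2.19),
}

for group, (or_, lo, hi) in reported.items():
    se = se_from_ci(lo, hi)
    p = two_sided_p(or_, se)
    print(f"{group}: SE(log OR) = {se:.3f}, Wald p = {p:.3f}")
```

The recomputed p-values land close to the reported 0.037 and 0.20; small discrepancies are expected from rounding of the published figures.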
Authors themselves raise questions about their respondents' understanding of the term 'violence against women' and note that they were required to explain the term to respondents in some instances. Given that the whole study focuses on violence against women, this possible ambiguity in understanding of the term is cause for concern. Both the World Health Organisation (World Health Organisation, 2005) 4
Thank you for raising this issue. We agree that it is important to reference the WHO and DHS recommendations and have added these points to further emphasise this limitation. Reviewer 2 was also unclear about the main message of this paragraph, so we have extensively rewritten it to clarify:
Our scale also relied on asking people whether they were willing to engage in activism to address 'violence against women' (mahila ke khilaaf hinsa) or 'domestic violence' (gharelu hinsa). The World Health Organization and the Demographic Health Surveys recommend avoiding the term 'violence' wherever possible as there is considerable individual variation in respondent interpretation of the term, including a tendency to consider only extreme practices (e.g. beating or choking) forms of violence, whilst ignoring 'milder' practices (e.g. slapping) [INSERT REFERENCES 65 & 66]. However, in a survey setting, it would be unreasonably unwieldy to ask separate questions for attitudes to action against slapping, attitudes to action against kicking, attitudes to action against forced sex, etc. Global health researchers thus universally invoke the generic term 'violence' in questions on participation in action to address VAW [39, 55], as do researchers on bystander intervention [56]. However, this practice does cause confusion: respondents sometimes asked during interviews what the word 'violence' meant, which required interviewers to clarify. We piloted alternatives for the word 'violence', but these created even more confusion: 'forcing/coercing' someone (zabardasti karna) has the unfortunate alternate sense of 'insisting on something', while the word 'force' (bal) does not have an intrinsic negative valence like the word 'violence' and may even have positive connotations of 'strength' and 'power'. In future research, there may be merit in exploring more blunt phrases such as 'action to stop husbands beating their wives' as proxies for 'action to address violence'. During piloting, respondents said it was hard to distinguish collective action against different forms of VAW, as physical, sexual, and emotional violence tended to occur together for survivors of violence.
(from Section: Discussion)
Given the limited voice that women and girls exercise in families in India, many personal questions can raise discomfort levels among them. Hence, typically, sections of surveys dealing with these issues are conducted in private, without mothers and mothers-in-law listening in. Without an assurance of privacy, it is very possible that women's responses would be more likely to adhere to traditional norms and expectations than if privacy is ensured. In this case, while it is true that personal experiences are not elicited, attitudes about the acceptability of intervention and collective action or perceptions about how families would react to their participation in collective action to stop a behaviour that is perceived as so acceptable, may also cause discomfort to young women who are audible to their mothers-in-law, and result in responses that are acceptable to those listening in. While here too, nothing can be done now, it would be helpful to highlight this limitation and offer some ideas about how it might have affected findings.
Thank you for this point. We have discussed this in connection with social desirability, as our findings did indicate that respondents showed the greatest difference in answers on the question concerning family attitudes and norms. We felt we had already addressed this point in the existing text, so we did not want to add more text.
Small points. Social capital is discussed, and adjustments made for it, but there is no indication in the paper of the level of social capital in the study sample, it would be good to highlight this or explain why it has been omitted.
We have added a brief descriptive paragraph about our social capital data to supply this information. We added the precise numbers into a Supplementary Table to preserve readability.
We found high levels of social cohesion in our communities, as 81-93% of respondents reported feeling at home in their neighbourhood or getting along with neighbours (see Extended data, Supplementary Table 3 [20]). 71-91% agreed that people came together to keep the neighbourhood free from crime or solve issues such as disruptions to the water supply. However, levels of trust were lower, as 58-61% stated that their neighbours could not be trusted or only looked out for themselves. 56% of women and 46% of men stated they did not recognise most people in their neighbourhood (p<0.001 for a gender difference).
(from Section: Results - Descriptive statistics)
expertise to confirm that it is of an acceptable scientific standard.
Author Response 23 Jun 2020 Lu Gram, Institute for Global Health, London, UK
Thank you so much for your constructive feedback. We have replied point-by-point below:
I wonder if a note around limitations is needed about the fact 30% could not answer one item (or more), and the implications of this for the scale.
This is a good suggestion. We have added a note in the limitations section.
Finally, our study was limited by missing data, as 30% of respondents lacked data for at least one indicator. However, at most 8% of respondents did not know the answer to any single indicator. As stated earlier, we used complete case analysis for item validity on an item-by-item basis, while we used imputation to create an index of items to assess criterion validity. As respondents with missing data were more likely to be poorly educated, this may have upwardly biased assessments of scale validity.

Also, did this vary by administration technique?
You are right that it would be useful to check for this. There were indications that the self-administered technique produced slightly more "don't know" answers. This might either be due to respondents feeling more comfortable admitting to not knowing the answer in the self-administered condition, or because the technique was more difficult for them to use. We have added detail on this to the Results and discussed it in the Discussion:
The proportion of respondents providing "don't know" answers generally increased by 1-4 percentage points in the self-administered survey condition (p<0.05). However, for the item "People in your neighbourhood approve of you joining activities to stop violence against women", the proportion of "don't know" answers increased by fully 6 pp (p=0.005).
(from Section: Results - Item validity)
Some respondents may have had difficulty navigating the mobile phone technology on their own, as the proportion of "don't know" answers increased in the self-administered condition. However, this may also reflect respondents being more comfortable voicing genuine ignorance in self-administered than in face-to-face interviews.
(from Section: Discussion)
On page 9, the first mention of "clinically significant" appears, without really clarifying what this means (in terms of scales). Could this be clarified?
Thank you for this comment. We agree the term is misleading and it would be simpler to just state the relevant magnitudes. We have changed the phrase "[these differences] were not clinically significant" to "[these differences] seem small", followed by just stating the magnitudes.
In the discussion, it is stated that an important thing for community activists to do would be to shift from persuading residents of the wrongness of VAW towards engaging them around their efficacy and normative beliefs. I think this is a key point that gets lost a little, and it is an important thing to consider in terms of intervention development.
We agree that this is an important point to make. We are a little hesitant to further draw out the point: as pointed out by Reviewer 3, there is a distinction between a paper seeking to validate a new scale and a paper seeking to draw new substantive conclusions about the world. In this paper, we only looked briefly at associations with behavioural measures to establish convergent validity and strengthen the plausibility of our scale. To truly unpack the complex relationships between psychological drivers of collective action, participation in such action, and domestic violence, we would need to take into account a host of issues (confounding, reverse causality and effect modification) that fit better into a separate paper. We are currently in the process of writing this paper and will definitely take your advice into account in framing its conclusions. We have now added a caveat about this: These findings show the utility of our scale, although further research is required to fully disentangle these complex relationships.
(from Section: Discussion)
Closely linked, is potentially the need to think a little more of the implications of the analysis for social norms theory. Social norms are briefly mentioned at the start of the paper, and it would be worth returning to this concept again in the discussion. In some ways, based on the point above, the authors may be suggesting social norms are not the main issue, but rather the need to support people to recognise they can make a difference. A reflection on social norms theory, would therefore be useful.
This is an interesting suggestion. In line with our point above, we are wary about adding too much reflection on substantive implications of this research. However, we will certainly include consideration of social norms theory in our follow-up paper on substantive issues.
Thank you for asking this question; it made us realise that the paragraph was unclearly written. We have extensively rewritten the paragraph, so that the message is clearer.
Our scale also relied on asking people whether they were willing to engage in activism to address 'violence against women' (mahila ke khilaaf hinsa) or 'domestic violence' (gharelu hinsa). The World Health Organization and the Demographic Health Surveys recommend avoiding the term 'violence' wherever possible as there is considerable individual variation in respondent interpretation of the term, including a tendency to consider only extreme practices (e.g. beating or choking) forms of violence, whilst ignoring 'milder' practices (e.g. slapping) [INSERT REFERENCES 65 & 66]. However, in a survey setting, it would be unreasonably unwieldy to ask separate questions for attitudes to action against slapping, attitudes to action against kicking, attitudes to action against forced sex, etc. Global health researchers thus universally invoke the generic term 'violence' in questions on participation in action to address VAW 39, 55 , as do researchers on bystander intervention 56 . However, this practice does cause confusion: respondents sometimes asked during interviews what the word 'violence' meant, which required interviewers to clarify. We piloted alternatives for the word 'violence', but these created even more confusion -'forcing/coercing' someone ( zabardasti karna) has the unfortunate alternate sense of 'insisting on something', while the word 'force'/ bal does not have an intrinsic negative valence like the word 'violence' and may even have positive connotations of 'strength' and 'power'. In future research, there may be merit in exploring more blunt phrases such as 'action to stop husbands beating their wives' as proxies for 'action to address violence'. During piloting, respondents said it was hard to distinguish collective action against different forms of VAW, as physical, sexual, and emotional violence tended to occur together for survivors of violence.
(from Section: Discussion) In the analysis, why did the authors choose not to conduct an exploratory factor analysis (EFA) of the items before conducting a confirmatory factor analysis (CFA)? An EFA is normally recommended when new constructs are being measured with little empirical evidence of their validity. I encourage the authors to consider EFA first to refine the measurement model, and then CFA.

3. Did the authors assess the measurement invariance of the constructs of interest across theoretically relevant subgroups in the study site? Some demonstration of within-sample measurement invariance would strengthen the claim that the measures developed are useful across diverse and theoretically relevant populations of interest.

4. I am concerned about the absence of a relation between social capital (a resource for collective action) and participation in collective action. I encourage the authors to discuss in greater detail their measure of social capital.

5. Some reference to the theoretical literature on women's empowerment is warranted. How do resources for women's agency, including collective agency, factor into the conceptualization of this analysis?

6. Thank you for the opportunity to review this important manuscript. I wish the authors all the best in this important work.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Measurement; violence against women prevention.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
constructs have been explored extensively outside the context of preventing violence against women in low- and middle-income contexts [3,4]. Some social psychologists and anthropologists whom we consulted even thought it redundant to test these constructs in the context of VAW. We do believe evidence is needed, as past studies have primarily taken place in high-income contexts on issues unrelated to violence against women or health more broadly, but we used CFA rather than EFA because:

○ We were testing constructs that had already been proposed elsewhere.

○ We were unsure how to compare the results of EFA and CFA, given that hierarchical CFA imposes a number of identifiability restrictions on the relation between lower- and higher-order factors that have no equivalents in EFA.

○ Given that running EFA and CFA on the same dataset is generally not recommended, to avoid overfitting [5], we were also unsure about how to use the results of EFA post hoc now that we have conducted our CFA.

This is a good point. We have now added measurement invariance tests across gender, education, caste and marital status. Please see the new table in the results section of the updated manuscript. While RMSEA stays nearly constant (up to rounding errors), WRMR decreases with fewer parameter restrictions, indicating a better model fit. However, TLI also decreases with fewer parameter restrictions as we go from scalar invariant to configural invariant models, indicating a worse fit, possibly because TLI penalizes lack of parsimony. All in all, the changes in TLI, RMSEA and WRMR seem small and all models have good fit statistics on the TLI and RMSEA by conventional criteria. Given these results, we did not feel justified in using separate models by gender, education, caste or marital status in estimating our latent constructs. We have now added this information in the text itself:

Finally, we assessed measurement invariance across gender, education, caste, and marital status by comparing fit statistics (TLI, RMSEA, and WRMR) for models of configural invariance, invariance of first-order factor loadings, invariance of second-order factor loadings, and invariance of both [INSERT REFERENCE 59]. For the purpose of this comparison, we used binary codes for education (any versus no education), caste (Open/General versus other caste groups), and marital status (currently married versus not currently married). We did not conduct χ² tests as these are known to be overly dependent on sample size, which creates oversensitivity to small deviations from H0 in large samples and spurious sensitivity to differences in sample size between comparison groups in measurement invariance tests [INSERT REFERENCE 60].
(from Section: Data analysis -construct validity) We did not find strong evidence against measurement invariance (Table 7). Comparing fit across models accounting for gender, education, caste, and marital status, both WRMR and TLI decreased slightly as constraints on factor loadings were lifted, while RMSEA stayed relatively constant. The decrease in WRMR indicates better fit for models allowing for heterogeneity across groups, but the decrease in TLI indicates a worse fit, possibly due to lower parsimony in the unconstrained models. Given these results, we did not see a need for separate measurement models by gender or demographic group.
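As an illustration of why TLI can fall when constraints are lifted even though absolute misfit does not grow, the following minimal Python sketch computes RMSEA and TLI from chi-square statistics. It assumes the textbook single-group formulas for maximum likelihood estimation; the function names and all numbers are hypothetical, and software fitting ordinal data with robust estimators (the setting in which WRMR is reported) computes these indices differently, so this is not a reproduction of the authors' analysis.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Textbook RMSEA point estimate (assumed formula, using n - 1)."""
    # Misfit in excess of the degrees of freedom, floored at zero.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def tli(chi2_model: float, df_model: int,
        chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis index relative to a baseline (null) model."""
    ratio_base = chi2_base / df_base      # misfit per df, baseline model
    ratio_model = chi2_model / df_model   # misfit per df, fitted model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


# Hypothetical values: the same chi-square with fewer degrees of freedom
# (a less parsimonious model) yields a lower TLI.
constrained = tli(chi2_model=100.0, df_model=50, chi2_base=1000.0, df_base=100)
unconstrained = tli(chi2_model=100.0, df_model=40, chi2_base=1000.0, df_base=100)
```

Because TLI works with chi-square per degree of freedom, freeing parameters only raises it if the chi-square drops proportionally more than the degrees of freedom do.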
(from Section: Results -construct validity) I am concerned about the absence of a relation between social capital (a resource for collective action) and participation in collective action. I encourage the authors to discuss in greater detail their measure of social capital.
Thank you for highlighting this. We realise our wording in the Results section has been misleading: while we did not find a direct association between social capital and participation in collective action after adjusting for potential mediators, we did find an indirect association: social capital increased mediator variables, which in turn increased participation. We have added a table with indirect effects of social capital to clarify this (see main document) and also changed the text: Social capital was itself positively associated with perceived legitimacy, perceived efficacy, and collective action norms (p<0.001). There was insufficient evidence that social capital was directly associated with participation in groups to address VAW and intent to intervene on behalf of a family member (p>0.05) after adjusting for mediators. Social capital itself was directly negatively associated with intent to intervene on behalf of a stranger, showing a 27-30% reduction in odds of intervening with one standard deviation increase in social capital. However, the indirect effect of social capital on outcomes through perceived legitimacy, perceived efficacy, and collective action norms was positive in all cases (Table 8). Overall, these results suggest that our three constructs did not simply predict outcomes due to their association with social capital.
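To make the arithmetic behind these statements concrete, the sketch below converts a logistic regression coefficient into a percentage change in odds and sums indirect effects as products of path coefficients. This is a simplified illustration under stated assumptions: the product-of-coefficients approach and every path value shown are hypothetical and are not taken from Table 8 or from the authors' model.

```python
import math


def pct_change_in_odds(log_odds_coef: float) -> float:
    """Percentage change in odds for a one-unit (here, one SD) increase."""
    return (math.exp(log_odds_coef) - 1.0) * 100.0


def total_indirect_effect(paths: list[tuple[float, float]]) -> float:
    """Sum of a*b products, one (a, b) pair per mediator.

    a: effect of the exposure on the mediator.
    b: effect of the mediator on the outcome.
    """
    return sum(a * b for a, b in paths)


# A 27-30% reduction in odds corresponds to odds ratios of 0.73-0.70.
lower = pct_change_in_odds(math.log(0.73))
upper = pct_change_in_odds(math.log(0.70))

# Hypothetical paths through three mediators (legitimacy, efficacy, norms).
indirect = total_indirect_effect([(0.2, 0.5), (0.1, 0.4), (0.3, 0.1)])
```

A negative direct coefficient and a positive summed indirect effect can coexist, which is the pattern described for intent to intervene on behalf of a stranger.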
(from Section: Results -criterion validity) The relationship between social capital and intervention in domestic violence is likely complex. Greater social cohesion can act as a resource for collective action to prevent VAW, e.g. fostering collective action norms rewarding participation in such action, and it may increase perceived efficacy as community members feel their close links with neighbours empower them to overcome common threats. However, cohesive communities led by male community leaders may also exert stronger patriarchal control over women. As a validation paper, we did a first check to see if the scale produces plausible evidence for convergent and discriminant validity. We plan to unpack these complex relationships in greater depth in future papers on substantive theory. We have now clarified this:

We found no evidence for social capital being positively associated with participation in collective action after adjusting for psychological drivers. violence may also be required to translate social capital into action: a trial of a violence prevention programme in Uganda found that social capital was only associated with bystander intervention in intervention areas, not control areas 39. We even found that social capital was negatively associated with intent to intervene in case of VAW against a stranger. This echoes literature on the 'dark side of social capital' 45, which suggests that tightly connected social networks can be detrimental to the health of perceived outsiders by excluding them from the support of network insiders. However, further research is required to unpack the relationship between social capital and VAW.
(from Section: Discussion) We were a little unsure as to what additional detail is needed for our measure of social capital. In case it has to do with choice of indicators, we have clarified that these are now all spelled out in full in the extended data.