Measurement properties of the project-level Women's Empowerment in Agriculture Index

Highlights
• We tested the measurement properties of the project-level Women’s Empowerment in Agriculture Index in two projects/sites.
• We used IRT methods to assess item sets used in pro-WEAI indicators for intrinsic, instrumental, and collective agency.
• One item set capturing intrinsic agency in the right to bodily integrity was measurement equivalent across the two projects.
• We recommend refining the other item sets to improve their measurement properties.
• We offer steps to test, refine, validate, and shorten pro-WEAI and other women’s empowerment scales in development studies.


Framework for Women's Empowerment
Women's empowerment, a multidimensional construct (Agarwala & Lynch, 2006; Lukes, 1974; Malhotra & Schuler, 2005; Mason, 2005), is the process whereby women claim new resources that may enhance their agency, or ability to make strategic life choices that enable them to achieve individual or collective goals (Kabeer, 1999). Human resources may include formal or informal schooling or training that expands valued knowledge or skills. Economic resources may include income, savings, or property. Social resources may include informal or formal networks of access or support, typically outside the family. We conceptualize resources as primarily observable, or measured directly in surveys, such as grades of schooling, chickens owned, or organizational memberships. Observed resource variables are depicted with squares in Figure 1.
[Figure 1]
Agency is the ability to make strategic life choices in contexts where these choices once were denied (Kabeer, 1999). Contexts of constraint may include patriarchal family systems and institutions that privilege men, often the focus in discussions of women's empowerment. Contexts of constraint also may include other oppressive systems, such as poverty. Pro-WEAI, and the framework presented here, conceptualize agency as a multidimensional construct. Intrinsic agency (power within) refers to the process by which one develops a critical consciousness of one's own aspirations, capabilities, and rights (Batliwala, 1994; Freire, 1972; Kabeer, 1999; Komter, 1989; Stromquist, 1995). Instrumental agency (power to) is strategic action to achieve one's self-defined goals. Collective agency (power with) is joint action to achieve shared goals (Bandura, 2000; Freire, 1972; Kabeer, 1999; Lukes, 1974; Rowlands, 1995, 1997, 1998; Stromquist, 1995). These types of agency are derived conceptually from multi-dimensional typologies of power described first by Komter (1989) with respect to gender and rooted in the seminal works of Freire (1972) and Lukes (1974), who wrote on power and freedom from oppression without explicit reference to gender. 'Power over,' also discussed in this literature, is excluded from this framework and pro-WEAI, as
Electronic copy available at: https://ssrn.com/abstract=3363753
relations govern women's and men's behavior (Presser & Sen, 2000).
Efforts to measure women's empowerment directly also have been challenged. First, with recent exceptions (Miedema et al., 2018), scholars have given more attention to measuring economic resources than to measuring human and social resources for women's empowerment (Grootaert, Narayan, Jones, & Woolcock, 2004). Second, scholars have focused more on measuring instrumental agency than on measuring intrinsic and collective agency (James-Hawkins, Peters, VanderEnde, Bardin, & Yount, 2016; Smith, 2003; Thorpe, VanderEnde, Peters, Bardin, & Yount, 2015), such that transformative changes in intrapersonal critical consciousness and collective actions among women have been understudied (Brody et al., 2017; O'Hara & Clement, 2018). Third, the field has tended to construct observed indicators, such as summative scales, to capture latent constructs, like women's agency (Kumar et al., 2019; Mahmud, Shah, & Becker, 2012). Thus, researchers have ignored potential variation in the relationships of items with latent-agency constructs and possible systematic measurement error in these items. Fourth, with some exceptions (Agarwala & Lynch, 2006; Cheong et al., 2017; Crandall, Rahim, & Yount, 2015; Miedema, Haardörfer, Girard, & Yount, 2018; Yount et al., nd; Yount, VanderEnde, Zureick-Brown, Hoang, et al., 2014; Yount et al., 2016), scholars have not fully assessed the measurement properties of agency scales, including their measurement equivalence across meaningful groups, such as program beneficiaries and non-beneficiaries, program types, geographic contexts, and time. Given these limitations, the 'end users' of tools to measure women's empowerment cannot discern the utility of one scale over another, and researchers and practitioners continue to construct measures using inconsistent terms, item sets, and methods, diminishing the capacity to make meaningful global comparisons.
Novel approaches to develop and validate measures of women's intrinsic and instrumental agency in the household have included the use of psychometric methods, such as factor analysis, item response theory (IRT) methods, and structural equation modeling (Cheong et al., 2017; Crandall et al., 2015; Miedema et al., 2018; Yount, VanderEnde, Zureick-Brown, Hoang, et al., 2014; Yount et al., 2016). Such methods help to identify survey question sets that are valid, observed items of latent constructs, like women's agency. To be valid, item sets should operationalize well-defined constructs and should be empirically (psychometrically) 'comparable' across settings, social groups, and time. Using these methods, Yount and colleagues have identified three indices of women's intrinsic agency. The first (critical consciousness of women's right to bodily integrity) uses attitudinal questions about IPV against women that are psychometrically comparable across genders (Yount, VanderEnde, Zureick-Brown, Hoang, et al., 2014), age-at-marriage groups (Yount, VanderEnde, Dodell, & Cheong, 2016), and countries (Miedema et al., 2018). The second index (perceived self-efficacy) validates the general self-efficacy scale in young Qatari women (Crandall et al., 2015). The third index (critical consciousness of women's social and economic rights) uses attitudinal items derived from qualitative research that are comparable across Qatari and non-Qatari women (Yount et al., nd).
Other analyses by Yount and colleagues have identified two indices for women's instrumental agency. The first (women's influence in household decisions) uses items capturing a woman's influence in decisions about her earnings, her husband's earnings, large household purchases, daily household purchases, seeking medical treatment, and visits to family and friends; psychometrically, these questions are valid at the national level (Miedema et al., 2018; Yount et al., 2016) and are comparable across age-at-marriage groups (Yount et al., 2016), multiple East African countries (Miedema et al., 2018), and time (Cheong et al., 2017).
The second index (freedom of movement) uses survey questions capturing the ability of women to visit venues outside the home; psychometrically, these questions are valid at the national level (Yount et al., 2016), and are comparable across age-at-marriage groups (Yount et al., 2016) and time (Cheong et al., 2017). These efforts have identified general measures of women's intrinsic and instrumental agency that are empirically comparable across a range of contexts, population groups, and time periods, and they invoke a call for similar measures of women's collective agency.

The Women's Empowerment in Agriculture Index (WEAI)
In 2012, the US Agency for International Development, IFPRI, and the Oxford Poverty and Human Development Initiative launched the Women's Empowerment in Agriculture Index (WEAI) (Alkire et al., 2013) as a monitoring and evaluation tool to assess population levels and changes over time in women's empowerment in agriculture across countries, regions, and population groups. The WEAI measures women's empowerment through a household survey that focuses conceptually on women's agency. The WEAI consists of two sub-indices. The Five Domains of Empowerment Index (5DE) is designed to measure the incidence (headcount) and intensity of dis-empowerment. The Gender Parity Index (GPI) is designed to provide information on women's empowerment relative to that of men in their households (Alkire et al., 2013).

Motivations for pro-WEAI
Since its launch, the WEAI has undergone several adaptations (Malapit et al., 2017). Pro-WEAI, the most recent adaptation, is designed for diagnosing disempowerment and assessing impact in agricultural development projects. The WEAI and pro-WEAI are based on the Alkire-Foster counting methodology for index construction (Alkire & Foster, 2011; Malapit et al., 2019), applied to measure intrinsic, instrumental, and collective agency; however, the two indices differ in notable ways. First, the WEAI is designed to capture levels and trends in women's empowerment in agriculture at the national level, whereas pro-WEAI is designed for impact evaluation by agricultural development projects. Second, pro-WEAI includes new indicators, such as IPV attitudes and freedom of movement.
The main advantages of pro-WEAI, relative to other measures of women's empowerment, are its intuitive design and broad applicability. Pro-WEAI provides an 'information platform' (Alkire, 2018) to measure women's empowerment in agriculture and, to some extent, more broadly. As a headline figure, pro-WEAI provides an overall measure of women's empowerment in agriculture that is designed to be comparable at all levels for which the data are representative, such as intervention or population groups. Pro-WEAI also can be disaggregated into two sub-indices (the 3DE and GPI) and 12 indicators, each of which is designed to capture distinct aspects of intrinsic, instrumental, or collective agency (Malapit et al., 2019). Pro-WEAI indicators for intrinsic agency include autonomy in income, self-efficacy, attitudes about IPV, and respect among household members. Pro-WEAI indicators for instrumental agency include input to productive decisions, ownership of assets, access to and decisions on credit, control over use of income, work balance, and visiting important locations. Pro-WEAI indicators for collective agency include group membership and membership in influential groups. Using this decomposition, one can, in theory, assess how changes in the joint distribution of indicator-level achievements contribute to changes in the overall index value. The capacity for this decomposition stems from the counting-based approach used to construct pro-WEAI, which requires that the definitions, thresholds, and weights used to create each indicator are explicit.
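To make the counting-based logic concrete, the following is a minimal sketch of how explicit indicator weights and a cutoff can be aggregated into a single index. The indicator names, equal weights, 0.75 cutoff, respondent data, and the aggregation rule (share at or above the cutoff, plus the mean score of the remainder) are illustrative assumptions, not the published pro-WEAI specification.

```python
def adequacy_score(adequate, weights):
    """Weighted share of indicators in which one respondent is adequate (0/1)."""
    total = sum(weights.values())
    return sum(weights[name] * adequate[name] for name in weights) / total


def counting_index(respondents, weights, cutoff):
    """Aggregate respondent-level scores with a counting rule (assumed form)."""
    scores = [adequacy_score(r, weights) for r in respondents]
    empowered = [s >= cutoff for s in scores]
    share_empowered = sum(empowered) / len(scores)
    below = [s for s, e in zip(scores, empowered) if not e]
    mean_below = sum(below) / len(below) if below else 0.0
    # Respondents at or above the cutoff count fully; others count by their score.
    index = share_empowered + (1.0 - share_empowered) * mean_below
    return {
        "share_empowered": share_empowered,
        "mean_score_below_cutoff": mean_below,
        "index": index,
    }


# Hypothetical indicators and respondents, for illustration only.
weights = {"income_autonomy": 1, "ipv_attitudes": 1, "credit": 1, "group_membership": 1}
respondents = [
    {"income_autonomy": 1, "ipv_attitudes": 1, "credit": 1, "group_membership": 1},
    {"income_autonomy": 1, "ipv_attitudes": 1, "credit": 1, "group_membership": 0},
    {"income_autonomy": 1, "ipv_attitudes": 0, "credit": 0, "group_membership": 1},
]
result = counting_index(respondents, weights, cutoff=0.75)
print(result)
```

Because every weight and threshold is an explicit input, the index can be recomputed after changing a single indicator to see how much that indicator contributes.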
The broad applicability of pro-WEAI may be thought to impede the accurate measurement of women's empowerment in agriculture in local contexts. Indeed, debate continues about the universality or context-specificity of measures for women's empowerment (Alkire et al., 2013; Malhotra & Schuler, 2005; Mason, 1986; Mason & Smith, 2003; Richardson, 2018; Yount et al., nd). What may be indicative of empowerment among women in Bangladesh, such as joint decision making on salient agricultural decisions, may not be indicative of empowerment among women in Ghana, where norms around agriculture differ (Seymour & Peterman, 2018). Researchers have shown that item sets measuring women's social assets, intrinsic agency, and instrumental agency are cross-nationally comparable, while other items are context-specific, across diverse countries in East Africa (Miedema et al., 2018). This work suggests that the general concepts of women's intrinsic and instrumental agency are meaningful cross-culturally, that core question sets can measure aspects of women's empowerment across contexts, and that other questions work well for in-depth analyses within contexts (Miedema et al., 2018).
Although pro-WEAI was designed to be comparable across different agricultural systems, countries, and cultural contexts (Malapit et al., 2019), pro-WEAI does not ignore cultural differences. The design of pro-WEAI involved qualitative research to explore concepts of empowerment in diverse rural settings, and the suite of pro-WEAI instruments includes customizable qualitative guides designed to capture nuanced local meanings and processes of women's empowerment (Meinzen-Dick, Rubin, Elias, Mulema, & Myers, 2019). Also, the quantitative pro-WEAI index has undergone sensitivity analysis to test its robustness to alternative specifications (Malapit et al., 2019). The conceptual basis for pro-WEAI, for example, which indicators to include and how to define and weight them, does not prioritize any one country or context over another. In practical terms, the indicators in pro-WEAI are defined and weighted to be applicable across the widest possible set of circumstances.
Given the design advantages of pro-WEAI, this analysis aimed to assess the measurement properties of survey question (item) sets in pro-WEAI related to intrinsic, instrumental, and collective agency and to make recommendations for pro-WEAI's refinement as an impact evaluation tool. We leveraged item response theory (IRT) methods to assess the measurement properties of the aforementioned item sets, collected across two GAAP2 projects in Bangladesh (South Asia) and Burkina Faso (West Africa). The analysis reveals the utility of IRT methods for assessing the measurement properties of item sets used to construct selected pro-WEAI indicators, guiding refinements of pro-WEAI to improve its indicator-specific and overall measurement properties, and underscoring the value of shortening the full pro-WEAI and creating from that a validated short-form pro-WEAI for national and program-level monitoring.

Study Contexts
This analysis uses quantitative baseline data from two GAAP2 projects: Targeting  package, and the comparison group is women in similar savings groups in non-program areas who did not receive the BRB intervention package.

Samples and Data
This analysis uses data from the baseline pro-WEAI survey from each project (Appendix 1). The baseline survey in TRAIN was administered between November 2016 and February 2017 to 5,040 households in which at least one woman 18-35 years was present (some households did not include a male adult). The baseline survey in BRB was administered in May 2016 to a subset of households, including 380 women (190 intervention group; 190 comparison group), as well as their husbands or the male heads of household, for a total of 760 respondents. Only the women's responses were analyzed here.
In the pro-WEAI survey, the target beneficiary and spouse were asked questions about household decision-making around production and income; access to productive capital; access to financial services; time allocation; group membership; frequency and freedom of movement; intra-household relationships including respect for household members; autonomy in decision-making using vignettes inspired by the Relative Autonomy Index (RAI) (Ryan & Deci, 2000); self-efficacy using the new general self-efficacy scale (Chen, Gully, & Eden, 2001); and attitudes about IPV against women using validated items from the Demographic and Health Surveys (DHS) (Yount, VanderEnde, Zureick-Brown, Hoang, et al., 2014). 2 Data for descriptive analysis of the two samples consisted of socioeconomic and demographic information from the household rosters. Data for the IRT analyses consisted of responses to seven question (item) sets used to construct two indicators for intrinsic agency, three indicators for instrumental agency, and one indicator for collective agency collected in both project sites. In the present analysis, we followed the definitions of intrinsic agency (critical consciousness of capacity and rights) and instrumental agency (behavioral action) and kept some item sets distinct. Appendix 2 compares the item sets as we have organized them and how they contributed to each pro-WEAI indicator (Malapit et al., 2019).
Here, intrinsic agency in the right to bodily integrity was captured using women's yes/no responses to the question, 'Is a husband justified in hitting his wife…,' in five situations, such as 'she burns the food?,' 'she goes out without telling him?,' and 'she neglects the children?' Intrinsic agency or autonomy in use of income was captured using vignettes, inspired by the Relative Autonomy Index (Ryan & Deci, 2000), which sought to measure the motivations behind women's actions with respect to their income, distinguishing external and internal forms of regulation. We analyzed women's responses to the question, 'How similar are you to someone who…' behaves in four ways with respect to her income, including 'uses her income as determined by necessity,' 'uses her income how her family or community tells her she must' (external), 'uses her income how her family or community expects because she wants them to approve of her' (external) and 'chooses to use her income how she wants to and thinks is best for herself and her family'
Accounting for skip patterns in the pro-WEAI questionnaire, item-level missingness due to non-response was generally low across project sites and constructs of agency (Results). For all IRT analyses except one, missingness was included as a response category, so the influence of missingness on estimated model parameters could be assessed. In the IRT analysis of instrumental agency in income, only one observation was missing data on staple grain farming, so missingness could not be included as a response category. In this case, the missing observation was assumed to be missing at random and was dropped from the analysis.

Analysis
We chose item response theory (IRT) methods to examine item sets designed to measure dimensions of intrinsic, instrumental, and collective agency in pro-WEAI in the two project sites. IRT methods, a family of statistical techniques for analyzing latent variables, allow researchers to assess the empirical relationships between observed items, such as responses to survey questions, that are theorized to be causal expressions of a person's status along the continuum of an unobserved (latent) trait (Lord, 1980). IRT methods have several advantages over other psychometric methods in scale development. First, IRT methods uniquely allow comparison of estimated latent traits and item characteristics, as they are placed on a common scale.
Second, IRT methods allow estimation of the standard error of measurement, which may differ across levels of the latent trait and is general across populations (Embretson & Reise, 2000). Third, IRT methods allow items to vary in difficulty, and take this information into account when scaling the items. Fourth, IRT methods are useful to explore and to test the functional form of item-level response options, such as those intended to be ordinal (e.g., 0=not at all, 1=small extent, 2=medium extent, 3=high extent). Finally, IRT methods can be applied to reduce a valid 'long form' (full pro-WEAI) to a valid short-form (short-form pro-WEAI) (Meade & Lautenschlager, 2004) that captures, as precisely as possible, the desired range of values along each latent-trait continuum being measured. 5 Here, we followed the steps described in Toland (2014) and in Tay, Meade, and Cao (2015) to assess the measurement properties of item sets designed to measure dimensions of agency collected in the baseline pro-WEAI surveys of two independent GAAP2 projects. We assessed the item sets, described above, designed to measure intrinsic agency in: the right to bodily integrity, autonomy in use of income, livelihood activities, and group membership. We also assessed item sets, described above, designed to measure instrumental agency in: livelihoods activities, the sale or use of outputs and income generated from livelihoods activities, and borrowing. Together, these item sets contributed to six of the 12 pro-WEAI indicators: autonomy in income, attitudes about IPV against women, input in productive decisions, access to and decisions on financial services, control over use of income, and group membership (Malapit et al., 2019). Our analytic steps are summarized in Table 1.
[Footnote 5] Researchers often desire long questionnaires to measure several constructs. Long questionnaires may produce 'transient measurement errors' (Schmidt, Le, & Ilies, 2003) if participants respond carelessly when overwhelmed or bored by the assessment. Such concerns have spurred researchers to create short forms from valid long forms (Stanton, Sinar, Balzer, & Smith, 2002). Researchers should assess whether the short form has comparable within-sample measurement properties to the long form and functions similarly across independent samples (Smith, McCarthy, & Anderson, 2000). Errors creating short forms may arise because guidance for developing them is recent (Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith et al., 2000; Stanton, Sinar, Balzer, & Smith, 2002) and non-existent in the women's empowerment literature. Using short forms with poor or poorly understood measurement properties may impede progress toward identifying comparable measures of women's empowerment to monitor SDG5.
[Table 1]
After clarifying the purpose of the analysis (Step 1), we considered relevant item response models for each item set (Step 2). In general, we estimated unidimensional IRT models for item sets theorized to capture single indicators for agency. For the five IPV-attitudes items capturing women's intrinsic agency in the right to bodily integrity, we chose a two-parameter logistic (2PL) response model for dichotomous outcomes, expressed as:
$$P(X_{is} = 1 \mid \theta_s, a_i, b_i) = \frac{\exp[a_i(\theta_s - b_i)]}{1 + \exp[a_i(\theta_s - b_i)]} \qquad (1)$$
where $X_{is}$ denotes the response of woman $s$ to item $i$ (0 or 1), $\theta_s$ denotes the 'ability' or level of the latent trait $\theta$ for woman $s$, $b_i$ denotes the threshold or 'difficulty' of item $i$, and $a_i$ denotes the slope or 'discrimination' of item $i$. The difficulty refers to the level of the latent trait at which the probability of an endorsed response to the item (IPV not justified, =1) is 0.5. 6 The discrimination refers to an item's capacity to distinguish respondents at specific levels of the latent-agency trait, with larger values suggesting a greater discrimination ability.
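As a numerical illustration of the 2PL response function, the sketch below computes the probability of an endorsed response for one item. The parameter values are hypothetical and are not estimates from TRAIN or BRB.

```python
import math


def p_2pl(theta, a, b):
    """Probability of an endorsed response (X = 1) under a 2PL model.

    theta: respondent's latent-trait level
    a:     item discrimination (slope)
    b:     item difficulty (threshold)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: discrimination 1.7, difficulty 0.5.
a_item, b_item = 1.7, 0.5
for theta in (-2.0, 0.5, 2.0):
    print(theta, round(p_2pl(theta, a_item, b_item), 3))
```

When the latent-trait level equals the item difficulty, the probability is exactly 0.5, which matches the definition of difficulty given above; a larger slope makes the curve steeper around that point.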
For autonomy in income, we chose the graded-response (GR) model (Samejima, 1969), which can be considered an extension of the 2PL model for use with items having two or more ordered response categories k, where k=1,…,K. For items designed to measure autonomy in income, response options were designed to be ordered on a 0-3 scale, as described above. The GR model estimates a unique discrimination parameter for each item across its K ordered response categories as well as K-1 between-category thresholds, each of which indicates the level of the latent-agency trait needed to have a 50% chance of responding above that threshold.
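The structure of the GR model can be sketched as follows: cumulative 2PL-type curves at each threshold are differenced to obtain category probabilities. The slope and thresholds are hypothetical values chosen for a four-category (0-3) item.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def gr_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded-response model.

    thresholds: the K-1 ordered between-category thresholds.
    Returns a list of K probabilities (lowest to highest category).
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # P(response at or above category k), padded with 1 and 0 at the ends.
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item with one slope and three thresholds (K = 4 categories).
probs = gr_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.2])
print([round(p, 3) for p in probs])
```

At a latent-trait level equal to the second threshold (0.0 here), the probability of responding in the top two categories combined is 0.5, which is the interpretation of a between-category threshold.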
Finally, for intrinsic agency in livelihoods activities and instrumental agency in livelihoods activities, outputs/income generated from livelihoods activities, and borrowing, respondents were asked about their participation (yes/no), and participants were asked about their level of agency with respect to each activity, scored 0-2 or 0-3, as discussed above. Although the response options for items in these sets appeared to be partially ordered, we used a nominal response model (NRM) (Bock, 1972) to test this assumption. NRMs are a class of IRT models that handle unordered, polytomous data, so they allow the ordering of response options to be examined rather than assumed.
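A minimal sketch of a nominal response model for a single item is given below, using a slope-intercept parameterization in which each response category has its own slope and intercept. The parameter values are hypothetical; in this parameterization, the ordering of the category slopes is what indicates whether the categories behave as ordered.

```python
import math


def nrm_category_probs(theta, slopes, intercepts):
    """Category probabilities for one item under a nominal response model.

    slopes, intercepts: one value per response category (assumed
    slope-intercept form; identification constraints are not enforced here).
    """
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    shift = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - shift) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-category item (e.g., did not participate, low, high).
slopes = [0.0, 0.8, 1.6]
intercepts = [0.0, 0.5, -0.2]
low = nrm_category_probs(-3.0, slopes, intercepts)
high = nrm_category_probs(3.0, slopes, intercepts)
print([round(p, 3) for p in low], [round(p, 3) for p in high])
```

With increasing slopes, the lowest category is most likely at low trait levels and the highest category at high trait levels; slopes that are out of order would suggest the response options are not functioning as an ordinal scale.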
After considering the relevant item response models, we conducted univariate analysis of the TRAIN and BRB samples to describe their demographic characteristics and to explore all item sets planned for inclusion in the IRT analyses (Step 3). This step helped to ensure that response options were not too sparse and the missing-at-random assumption was reasonable.
Then, before fitting IRT models, we assessed the assumption of uni-dimensionality for item sets with binary and ordered response options (IPV-attitudes and autonomy in income, respectively) (Step 4). This assumption, that one continuous latent variable can explain the item responses (Toland, 2014), is implied in the construction of each pro-WEAI indicator. Uni-dimensionality can be assessed a priori using non-IRT methods, such as exploratory factor analysis (EFA) or confirmatory factor analysis (CFA) for dichotomous and ordered polytomous items. EFA is recommended when minimal prior research exists on a construct, whereas CFA is recommended for well-theorized, validated constructs. Because prior WEAI instruments and theory informed the item sets used to construct the pro-WEAI indicators, we performed CFA separately for the five IPV-attitudes items and for the four autonomy-in-income items. We used three indices as guides to assess fit for a unidimensional CFA model: the comparative fit index (CFI should be ≥ 0.95), Tucker Lewis Index (TLI should be ≥ 0.95), and root-mean-square error of approximation (RMSEA should be ≤ 0.06 and 90% CI ≤ 0.06) (Hu & Bentler, 1999; Yu, 2002). Because CFA is not well suited for item sets with nominal or partially ordered response options, we skipped the step of testing unidimensionality for the other item sets.
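The fit guidelines above can be expressed as a simple screening rule. The sketch below assumes that the 90% CI criterion refers to the upper bound of the RMSEA interval, and the fit values passed in are hypothetical rather than results from this study.

```python
def cfa_fit_checks(cfi, tli, rmsea, rmsea_ci_upper):
    """Compare CFA fit indices with the guideline cutoffs cited in the text.

    Assumption: the 90% CI criterion is applied to the interval's upper bound.
    """
    checks = {
        "cfi_ok": cfi >= 0.95,
        "tli_ok": tli >= 0.95,
        "rmsea_ok": rmsea <= 0.06,
        "rmsea_ci_ok": rmsea_ci_upper <= 0.06,
    }
    checks["all_met"] = all(checks.values())
    return checks


# Hypothetical fit values for two unidimensional models.
good = cfa_fit_checks(cfi=0.98, tli=0.97, rmsea=0.04, rmsea_ci_upper=0.055)
poor = cfa_fit_checks(cfi=0.91, tli=0.88, rmsea=0.09, rmsea_ci_upper=0.11)
print(good["all_met"], poor["all_met"])
```

These cutoffs are guides rather than strict tests, so a borderline model would still warrant inspection of the individual indices.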
Then, for agency item sets with well-fitting unidimensional CFA models, or with nominal response options, we performed IRT analyses, evaluating model assumptions and testing competing models (Step 4). We first evaluated the model assumption of local independence (LI). LI means that the only influence on a woman's response to an agency item is that of the latent trait variable being measured. No other agency item and no other latent trait variable influences the woman's item responses. Thus, for a given woman with a known agency score, her response to one item is independent of her response to any other item. Violating the LI assumption is problematic because model estimates, model fit statistics, and derived scores and associated standard errors can be distorted, and thus, differ from the construct being measured (Toland, 2014). Local dependence (LD) can occur, for example, when similar wording is used across question stems or items, such that women cannot distinguish them and select the same response category repeatedly. To assess the LI assumption, we used the (approximately) standardized LD χ² statistic for each item pair in a set (W.-H. Chen & Thissen, 1997).
LD χ² statistics with absolute values greater than 10 were considered large, providing evidence of probable LD and residual variance unaccounted for by the unidimensional IRT model. LD χ² statistics with absolute values between 5 and 10 were considered moderate, providing evidence of possible LD. LD χ² statistics with absolute values less than 5 were considered small and inconsequential. Sensitivity analyses were performed in which single items with the highest LD χ² statistic were removed to assess their impact on violation of the LI assumption and to see if an item subset could be found that met the LI assumption.
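The screening thresholds for local dependence can be sketched as a small classifier over item pairs. The pair labels and statistic values below are hypothetical, and treating values of exactly 5 or 10 as moderate is an assumption, since the text does not state how boundary values were handled.

```python
def classify_ld(stat):
    """Label a standardized LD chi-square statistic by its magnitude."""
    magnitude = abs(stat)
    if magnitude > 10:
        return "large"
    if magnitude >= 5:  # boundary handling is an assumption
        return "moderate"
    return "small"


def flag_item_pairs(ld_stats):
    """Return only the item pairs with moderate or large LD statistics."""
    labels = {pair: classify_ld(value) for pair, value in ld_stats.items()}
    return {pair: label for pair, label in labels.items() if label != "small"}


# Hypothetical statistics for three item pairs.
ld_stats = {
    ("burns_food", "argues"): 2.3,
    ("goes_out", "neglects_children"): -6.8,
    ("argues", "refuses_sex"): 14.1,
}
flagged = flag_item_pairs(ld_stats)
print(flagged)
```

Pairs flagged as large would be the first candidates for the sensitivity analyses in which one item of the pair is removed and the model is re-estimated.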
Other assumptions assessed were functional form and model-data fit (Step 4). Regarding functional form, the GR model implies that all threshold parameters are ordered and that each item has a common slope across its response categories (Toland, 2014). To assess functional form, we graphed all response option functions against the latent-trait continuum to check whether each theoretically higher response option was more likely to be selected than prior response options at higher levels of the latent-trait continuum. We then assessed model-data fit at the item and model levels. The standardized chi-squared (S-χ²) item-fit statistic was used to test the degree of similarity between model-predicted and empirical (observed) response frequencies for each item response category. A statistically significant S-χ² value indicated the model did not fit a given item. Poorly fitting items were candidates for removal, usually one at a time; and the item response model was re-estimated with the remaining items. With reasonable item-level model fit, we then assessed model-level fit by comparing IRT models with different levels of complexity. 7 We used the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) to compare the fits of competing models.
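The model-level comparison relies on the standard definitions of the two information criteria, sketched below. The log-likelihoods, parameter counts, and sample size are hypothetical and serve only to show how a simpler and a more complex model would be compared; lower values indicate the preferred model.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: (log-likelihood, number of estimated parameters).
models = {"common_slope": (-1260.0, 6), "item_specific_slopes": (-1238.0, 10)}
n_obs = 380

for name, (loglik, k) in models.items():
    print(name, round(aic(loglik, k), 1), round(bic(loglik, k, n_obs), 1))

best_by_bic = min(models, key=lambda m: bic(models[m][0], models[m][1], n_obs))
print("Preferred by BIC:", best_by_bic)
```

BIC penalizes additional parameters more heavily than AIC once the sample exceeds a handful of observations, so the two criteria can disagree when the gain in fit from a more complex model is modest.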
When model assumptions held, we then described, graphically and numerically, the item properties that included the estimates of the thresholds and slopes as well as the precision for each item, item subset, or full scale at a particular location or range of the latent-trait continuum. Item characteristic curves (ICCs) related the probability of endorsing each response option (e.g., 0=IPV justified versus 1=IPV not justified) for an item as a function of the level of the latent-agency trait. Together, the ICCs allowed us to assess visually the distribution of the location parameters for each item along the latent-trait continuum, the strength of the relationship between each item and the latent trait (discrimination), and if items had multiple response options, their empirical ordering. Item information curves (IICs) provided information about the precision of a specific item along the latent-trait continuum. The total information curves (TICs) depicted the sum of the IICs and indicated the precision of the entire item set along the latent-trait continuum. IICs and TICs could be used to decide which item pairs or sets had similar (redundant) precision, and therefore, were candidates for dropping.
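For dichotomous items under a 2PL model, item information, total information, and the implied standard error of measurement have simple closed forms, sketched below. The item parameters are hypothetical, not estimates from the pro-WEAI item sets.

```python
import math


def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def total_information(theta, items):
    """Sum of item information across an item set at a given trait level."""
    return sum(item_information(theta, a, b) for a, b in items)


def standard_error(theta, items):
    """Standard error of measurement implied by the total information."""
    return 1.0 / math.sqrt(total_information(theta, items))


# Hypothetical (discrimination, difficulty) pairs for a five-item set.
items = [(2.0, -1.0), (1.8, -0.6), (2.2, -0.4), (1.5, 0.0), (1.9, 0.3)]
for theta in (-2.0, -0.5, 1.5):
    print(theta, round(total_information(theta, items), 2),
          round(standard_error(theta, items), 2))
```

An item is most informative at the trait level equal to its difficulty, so an item set whose difficulties cluster in one region measures precisely there and imprecisely elsewhere; items with overlapping information curves are the natural candidates for dropping.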
As a final step (Step 6), we assessed the measurement equivalence for item sets across the TRAIN and BRB samples that met IRT model assumptions of uni-dimensionality, local independence, and within-setting model-data fit. Following established guidelines (Tay et al., 2015), we investigated whether any items displayed differential item functioning by comparing the difficulty and discrimination parameters across TRAIN and BRB, holding constant the latent-trait level.
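The idea behind differential item functioning can be illustrated descriptively: if an item functions equivalently, two women at the same latent-trait level should have the same endorsement probability regardless of sample. The sketch below compares group-specific 2PL curves over a grid of trait levels. It is only a descriptive illustration with hypothetical parameters, not the formal testing procedure used in the analysis.

```python
import math


def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def max_icc_gap(params_group1, params_group2, grid):
    """Largest absolute difference in endorsement probability between two
    groups' item characteristic curves, holding the trait level constant."""
    return max(
        abs(p_2pl(theta, *params_group1) - p_2pl(theta, *params_group2))
        for theta in grid
    )


grid = [x / 10.0 for x in range(-30, 31)]  # trait levels from -3 to 3

# Hypothetical (discrimination, difficulty) estimates for one item per sample.
equivalent = max_icc_gap((1.8, -0.5), (1.8, -0.5), grid)
shifted = max_icc_gap((1.8, -0.5), (1.8, 0.4), grid)
print(round(equivalent, 3), round(shifted, 3))
```

A gap near zero is consistent with equivalent functioning, whereas a sizable gap driven by a difference in difficulty or discrimination would mark the item for formal testing.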

Characteristics of Respondents
As shown in Table 2, most women in the TRAIN sample had received some formal schooling. 8 Few women in either sample participated in wage employment (9% TRAIN; 19% BRB). Among women who participated in non-agricultural activities, most reported being able to access the information they needed to make informed decisions. Qualitatively, relatively fewer women in TRAIN than in BRB solely or jointly cultivated land (24% versus 99%) and solely or jointly owned land (11% versus 66%). A minority of women in both samples (32% TRAIN, 11% BRB) solely or jointly held an account at a bank or other formal financial institution. Most women in both samples reported being able to access basic food, clothing, and medicines for themselves and their children (Table 2). Among women who had children under five in both samples, most reported having access to child care, if needed (Table 2). The ability of households to borrow money from various sources differed across samples. Qualitatively, a lower percentage of households in TRAIN than BRB reportedly could borrow from group-based microfinance (35% versus 76%) or informal credit groups (24% versus 96%); whereas a higher percentage of households in TRAIN than BRB reportedly could borrow from formal (64% versus 25%) and informal (62% versus 28%) lenders.
[Table 2]
Table 3 shows the distributions of baseline responses in the TRAIN and BRB samples with respect to intrinsic-agency item sets considered for IRT models. For intrinsic agency in bodily integrity, 1.1% or fewer values were missing for any item, and responses were adequately distributed across response options. In TRAIN, the prevalence of justifying IPV ranged from 5.0% to 26.9% across situations (items). In BRB, this prevalence ranged from 21.7% to 56.3% across items. Qualitatively, women justified IPV more often when a wife argues with her husband than if she burns the food.

Preliminary Inspection of Agency Item Sets
[ Table 3] For intrinsic agency in livelihoods activities, very few responses were missing for any item in TRAIN, and less than 1.8% of responses were missing for any item in BRB. As expected, the percentage of women who did not participate in agricultural activities varied by activity. In TRAIN and BRB, more than 90% of women reportedly did not participate in fishpond agriculture, and more than 80% did not participate in wage and salary employment. Among women in TRAIN, a majority did not participate in high-value crop farming, raising small livestock, non-farm economic activities, and occasional large household purchases. For non-participating women, questions were not asked about the extent to which they felt they could participate in decisions about these activities. For women who reported participating in specific livelihoods activities, the distributions of their responses about felt capacity to influence decisions varied across activities. In both samples, a majority of women felt they could participate to a medium or high extent in decisions about staple grain farming, raising poultry, and routine household purchases. In BRB, a majority of women felt they could participate to this extent in decisions about raising small livestock, non-farm economic activities, and occasional large household purchases. A majority of participating women in TRAIN felt they could participate to any (small, medium, high) extent in decisions about raising large livestock. In both samples, among the minorities of women who participated in customarily male-dominated livelihoods activities, a majority felt they could participate to a medium or high extent in decisions about those activities. Thus, higher intrinsic agency in livelihoods activities was related to whether or not women participated in the activity at all.
Regarding autonomy in income, none of these questions were filtered by skip patterns, and little to no data were missing for other reasons in both samples. In TRAIN, a majority of women reported that they were somewhat or completely like others who used their income according to necessity or how others told them or expected them to (three items); however, a majority of women also reported that they were somewhat or completely like others who chose to use their income how they wanted to (one item). In BRB, a majority of women consistently reported being somewhat or completely like others who used their income as they chose and somewhat or completely different from others who used their income according to necessity or how others told or expected them to.
Regarding intrinsic agency to influence group decisions, most women in TRAIN reported that either the group was not present or they were not an active member, so follow-up questions about felt ability to influence group decisions were not asked. In BRB, high percentages of women had either missing data for presence of the group (mostly reflecting 'don't know' responses), or reported no group or non-participation in the group. For women in BRB who reported being an active group member, the majority felt they could influence decisions to a medium or high extent.
Regarding the item set designed to capture instrumental agency in livelihoods activities, the extent of missingness and (by design) non-participation were similar to the item set for intrinsic agency in livelihoods activities. Among women who reported participating in specific livelihoods activities, the majority reported engaging in some or most/all of the decisions for that activity. Participating women also reported engaging in some or most/all decisions regarding the outputs and income generated from specific livelihoods activities. Thus, within both samples, substantial similarities were observed in the distributions of responses for item sets designed to capture intrinsic and instrumental agency in livelihoods activities. Among women who participated in each activity, there was a tendency to report a medium/high extent of intrinsic agency and some/most-all instrumental agency.
Finally, regarding women's instrumental agency in decisions about borrowing money, women generally reported that their households were not involved in borrowing money from specific sources (especially in TRAIN). When households were involved, a minority of women in TRAIN reported being involved in decisions about borrowing, and most women in BRB reported not being involved in these decisions.

Evaluating the Assumption of Uni-Dimensionality
As a next step, we evaluated the assumption of uni-dimensionality by fitting a one-factor CFA to each intrinsic-agency item set with low missingness and binary or ordered response options (right to bodily integrity, autonomy in income

Evaluating the Assumption of Local Independence
As a next step, for each estimated IRT model, we evaluated the assumption of local independence using the LD χ² statistic for item pairs in sets for which the assumption of uni-dimensionality was met in CFA or response options were nominal (and CFA was not estimated). [ Table 5] Table 6 presents model estimates and item-level fit statistics (S-χ²) for IRT analyses of all five IPV-attitudes items (Panel 1) and a subset (Panel 2). In Panel 1, S-χ² statistics indicate a satisfactory fit for all five IPV-attitudes items in BRB and for three of the five items in TRAIN. The items 'she refuses to have sex with him' and 'she burns the food' showed poor model-data fit at the item level in TRAIN. To address this issue, we removed the item 'she burns the food,' which had the poorest fit (the highest S-χ² value), and reestimated the model (Panel 2). Based on S-χ² test statistics, model-data fit for the four remaining items was good in both samples.
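The logic of a pairwise local-dependence check can be sketched as follows. This is a minimal illustration, not the authors' software output: it assumes two binary items under a 2PL model, a standard-normal latent trait approximated on an equally spaced grid, and the standardization (χ² − df)/√(2·df), which is one common way such a statistic is standardized. The item parameters and the observed table are hypothetical; the cut-offs of 5 and 10 follow Table 1.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing a binary item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def normal_grid(n_points=61, lo=-6.0, hi=6.0):
    """Equally spaced nodes with normalized standard-normal weights."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + i * step for i in range(n_points)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def expected_pair_table(item_i, item_j, n):
    """Model-implied 2x2 frequencies for an item pair, assuming that
    responses are independent once the latent trait is held constant."""
    nodes, weights = normal_grid()
    table = [[0.0, 0.0], [0.0, 0.0]]
    for theta, w in zip(nodes, weights):
        p_i = p_2pl(theta, *item_i)
        p_j = p_2pl(theta, *item_j)
        for u in (0, 1):
            for v in (0, 1):
                prob_u = p_i if u else 1.0 - p_i
                prob_v = p_j if v else 1.0 - p_j
                table[u][v] += n * w * prob_u * prob_v
    return table

def standardized_ld_chi2(observed, expected):
    """Pearson chi-square on the 2x2 table, standardized by its df."""
    x2 = sum((observed[u][v] - expected[u][v]) ** 2 / expected[u][v]
             for u in (0, 1) for v in (0, 1))
    df = 1  # (2 - 1) * (2 - 1) for two binary items
    return (x2 - df) / math.sqrt(2.0 * df)

def classify(z):
    """Apply the cut-offs listed in Table 1."""
    size = abs(z)
    if size < 5:
        return "likely local independence"
    if size < 10:
        return "questionable LD"
    return "likely LD"

# Hypothetical (discrimination, difficulty) pairs and observed counts.
item_a = (1.0, 0.0)
item_b = (1.2, 0.3)
expected = expected_pair_table(item_a, item_b, n=1000)
observed = [[450, 50], [50, 450]]  # answers agree far more than the model implies
z = standardized_ld_chi2(observed, expected)
print(round(z, 1), classify(z))
```

In this sketch, a pair whose observed cross-tabulation departs strongly from the model-implied table is flagged, which is the pattern expected when respondents give the same answer to similar-sounding questions.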

Assessing Model Fit
[ Table 6] For all other item sets (not shown), all or nearly all items exhibited significant S-χ² values, providing strong evidence that model-predicted and observed response frequencies differed. We experimented with removing items having the highest S-χ² values, but model-data fit improved little, and the assumption of local independence remained untenable (results available on request). Consequently, we continue to present results only for the four IPV-attitudes items for which the assumptions of uni-dimensionality, local independence, and model-data fit at the item level were met. We then discuss, with illustrative graphs, some challenges of interpretation regarding results for the other item sets for which model assumptions were not met. We focus on graphs for selected items and item sets for intrinsic and instrumental agency in livelihoods activities and discuss possible reasons for the challenges of interpretation they expose.

Comparing Competing Models
In our next step for the analysis of the four IPV-attitudes items, we compared the posited 2PL IRT model having a separate discrimination parameter for each item to a 1PL IRT model, where the discrimination parameter was fixed at one across all four items. The AIC was larger in the alternative 1PL IRT model (AIC=15377.10) than in the original 2PL IRT model (AIC=15343.41), whereas the reported BIC values were close and slightly lower for the 1PL model (BIC=15456.26) than for the 2PL model (BIC=15462.14). On the AIC, this finding suggests that the more parsimonious common-slope model was insufficient to capture the extent of cross-item heterogeneity in discrimination parameters.
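The two criteria can be sketched as below. The helper names are illustrative, and the comparison simply reuses the four values reported above; smaller values indicate better relative fit. Because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 in any sizeable sample, the BIC leans toward the simpler model more strongly than the AIC does.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for a model with k free parameters."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n respondents."""
    return -2.0 * loglik + k * math.log(n)

def preferred(values):
    """Name of the model with the smallest criterion value."""
    return min(values, key=values.get)

# Values as reported in the text for the four IPV-attitudes items.
reported = {
    "AIC": {"1PL": 15377.10, "2PL": 15343.41},
    "BIC": {"1PL": 15456.26, "2PL": 15462.14},
}

for criterion, values in reported.items():
    gap = abs(values["1PL"] - values["2PL"])
    print(f"{criterion}: smaller for {preferred(values)} (difference {gap:.2f})")
```

Applied to the reported values, the AIC is smaller for the 2PL model by about 34 points, while the two BIC values differ by about 6 points.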

Evaluating and Interpreting Results
To assess and interpret results of the final 2PL IRT model for the four IPV-attitudes items, we relied on item characteristic curves (ICCs), item information curves (IICs), and total information curves (TICs). Figure 2 shows a matrix plot of the ICCs for the four IPV-attitudes items from model estimates shown in Table 6, Panel 2. The value of theta where the ICCs intersect with one another in each graph, that is, where the two response options are equally probable, gives the estimate of the difficulty parameter for each item. Consistent with descriptive findings in Section 4.2, the ICCs show that 'she argues with him' is the most difficult item on which to answer 'not justified' in both samples. For TRAIN, 'she argues with him' also has the steepest slope and thus is the most discriminating of the four items. For BRB, 'she neglects the children' has the steepest slope and is the most discriminating.
[ Figure 2] Figure 3 displays a matrix plot of the item information curves. The graphs show that 'she neglects the children' for the BRB sample and 'she argues with him' for the TRAIN sample provide maximum precision around the mean level of the latent trait, where θ=0. Figure 4 presents the total information curves for the same four IPV-attitudes items for both samples. Both curves suggest that the item set provides more precision around the mean level of the latent-agency trait, where θ=0, and less precision at higher and lower levels of the latent trait. The TICs for both samples also are similar to one another, which suggests that these four IPV-attitudes items provide similar precision across the two samples.
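How these curves relate to the item parameters can be sketched for a 2PL model as follows. The parameter values are hypothetical and are not the estimates in Table 6. Under the 2PL model, the response probability equals 0.5 at θ = b, item information is a²·P·(1 − P) and peaks at θ = b, and total information is the sum of the item information functions.

```python
import math

def icc(theta, a, b):
    """2PL item characteristic curve: probability of the keyed response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """2PL item information; it peaks at theta = b with height a**2 / 4."""
    p = icc(theta, a, b)
    return a * a * p * (1.0 - p)

def total_information(theta, items):
    """Total (test) information is the sum of the item informations."""
    return sum(item_information(theta, a, b) for a, b in items)

def standard_error(theta, items):
    """Standard error of the trait estimate implied by total information."""
    return 1.0 / math.sqrt(total_information(theta, items))

# Hypothetical (discrimination a, difficulty b) values for four binary items.
items = [(1.5, -0.5), (2.0, 0.0), (1.2, 0.4), (1.8, 0.2)]

for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    info = total_information(theta, items)
    se = standard_error(theta, items)
    print(f"theta={theta:+.1f}  information={info:.2f}  SE={se:.2f}")
```

With difficulties clustered near zero, as in this sketch, information is concentrated around the mean of the latent trait and the standard error grows toward both extremes, which is the shape described for Figure 4.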

Assessing Measurement Equivalence
Because model assumptions (of uni-dimensionality, local independence, and model fit) were met in both samples only for the four IPV-attitudes items, we limited our assessment of cross-sample measurement equivalence to this item set. We investigated whether any of the four items displayed differential item functioning (DIF), that is, whether estimates of the discriminations (a parameters) and the difficulties (b parameters) differed across the two samples, holding constant the level of the latent-agency trait (see Cheong et al., nd, for more detail). We detected two items with DIF across TRAIN and BRB: a husband is justified to beat his wife if 'she argues with him' and if 'she refuses to have sex with him.' However, the impact of DIF on the mean difference in the agency scores was small, .06, such that the four items were considered for practical purposes to have measurement equivalence across the two samples.
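One way to gauge the practical impact of DIF can be sketched as follows. This is an illustration of the general idea only, not necessarily the procedure of Tay et al. (2015) or the one used here: it assumes 2PL items, a standard-normal latent trait, and hypothetical group-specific parameters. At each trait level, the expected item-set score is computed under each group's parameters, and the difference is averaged over the trait distribution.

```python
import math

def p_2pl(theta, a, b):
    """Probability of the keyed response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score(theta, items):
    """Expected sum score on the item set at a given trait level."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def dif_impact(items_ref, items_focal, n_points=81, lo=-4.0, hi=4.0):
    """Average focal-minus-reference difference in expected scores at equal
    trait levels, weighted by a standard-normal density on a grid."""
    step = (hi - lo) / (n_points - 1)
    num = 0.0
    den = 0.0
    for i in range(n_points):
        theta = lo + i * step
        w = math.exp(-0.5 * theta * theta)
        diff = expected_score(theta, items_focal) - expected_score(theta, items_ref)
        num += w * diff
        den += w
    return num / den

# Hypothetical parameters: two of four items differ across groups.
reference = [(1.5, -0.5), (2.0, 0.0), (1.2, 0.4), (1.8, 0.2)]
focal = [(1.5, -0.5), (2.0, 0.3), (1.2, 0.4), (1.6, 0.2)]

print(round(dif_impact(reference, focal), 3))
```

A value near zero indicates that, even when individual items function differently, women at the same trait level are expected to obtain nearly the same score in both groups.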

Considerations for Interpretation of Other Item-Sets in the pro-WEAI Survey
As discussed, IRT estimates for other item sets in the pro-WEAI survey showed that the assumption of local independence was untenable and model-data fit was poor. As a result, the model parameter estimates and their standard errors were likely distorted. Here, we present some graphical results of the functional forms of the nominal response models for the item sets designed to capture instrumental agency in livelihood activities (10 items), instrumental agency in use of outputs/income generated from livelihoods activities (14 items), and intrinsic agency in livelihood activities (10 items). The graphical displays are illustrative only and offer tentative reasons on how the items behaved.
First, we examined category characteristic curves (CCCs) for items tapping instrumental agency in livelihood activities. In Figure 5, three distinct patterns in the CCCs can be observed that correspond to different levels of women's participation in livelihood activities. First, respondents tended to report non-participation over a small range of the latent trait of instrumental agency in livelihood activities for grain farming (shown in Figure 5), poultry, and routine household purchases. Second, respondents tended to report non-participation over a moderate range of the instrumental agency latent trait for herding large livestock. Third, respondents tended to report non-participation over a full range of the latent trait for wage employment (shown in Figure 5), horticulture, fishpond, small livestock and large household purchases.
This patterning in the response options may be illustrative of the non-ordered nature of the response options and the heterogeneous latent agency of women who report non-participation in some livelihoods activities.
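The category characteristic curves of a nominal response model can be sketched as follows. The category labels, slopes, and intercepts are hypothetical and are chosen only to show how a category such as non-participation can be the most likely response over part of the latent-trait range. Under this model, each response option has its own slope and intercept, and the category probabilities are a softmax of the resulting linear terms, so no ordering of the options is imposed.

```python
import math

def category_probs(theta, slopes, intercepts):
    """Nominal response model: softmax of a_k * theta + c_k over categories."""
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical response options and parameters; the first category's
# slope and intercept are fixed at zero for identification.
labels = ["did not participate", "few decisions", "some decisions", "most/all decisions"]
slopes = [0.0, -1.0, 0.5, 1.5]
intercepts = [0.0, -0.5, 0.3, 0.0]

for theta in (-4.0, -2.0, 0.0, 2.0, 4.0):
    probs = category_probs(theta, slopes, intercepts)
    modal = labels[probs.index(max(probs))]
    shown = ", ".join(f"{p:.2f}" for p in probs)
    print(f"theta={theta:+.1f}  [{shown}]  modal: {modal}")
```

Plotting each category's probability against theta yields the kind of curves shown in Figure 5; the range of theta over which a category is modal depends on its slope and intercept relative to those of the other categories.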
[ Figure 5] Next, we examined the item information curves of the 14 items for use of output/income generated from livelihoods activities. Figure 6 illustrates the IICs for grain farming in TRAIN. As in Figure 6, for items that asked respondents to report the keeping-not-selling and income use decisions on the same activities, their item information curves were very similar (full set of IICs available on request). This result may suggest that either item set on use of output or income may be dropped with little loss of precision. It might also indicate that the participants did not differentiate the two sets of decisions for the same livelihoods activities in their responses.
[ Figure 6] Finally, we examined the test information functions for each of the three item sets that captured intrinsic agency in livelihoods activities, instrumental agency in agricultural activities, and instrumental agency in use/sale of outputs and income generated from livelihoods activities ( Figure 7). As Figure 7 shows, the precision of each item set along the latent-agency-trait continuum is similar to that of the other two sets.
Thus, item sets that are used to construct the same pro-WEAI indicator (Appendix 2) could be dropped without a loss of precision.

DISCUSSION
This analysis is the first to use IRT methods to assess the measurement properties of a women's empowerment scale in development studies. It also is the first to assess the measurement properties of item sets that form the basis of indicators in pro-WEAI, an instrument in the WEAI series designed to assess women's empowerment in agriculture and more generally. The methodological innovations applied here provide a guide for development researchers to design, test, and refine questionnaires that include item sets aiming to capture women's intrinsic, instrumental, and collective agency in agricultural and other development programs designed to empower women.

Findings and Implications
A relevant descriptive finding from this analysis was that the participation of women in livelihoods activities, financial services, and community-based groups varied across activities, services, and groups as well as across agricultural development projects and contexts. In the case of items designed to capture women's felt influence in community groups, high levels of non-participation (and the reported absence or lack of knowledge about the presence of community groups) precluded estimation of IRT models. Notably, the BRB program in Burkina Faso was designed to intervene via women's groups; however, the baseline survey occurred before the project was implemented, so women would not have reported participation in project-related groups. Moreover, the TRAIN project did not involve a group-based intervention, and women in the TRAIN sample were relatively young, and perhaps less likely to participate in community groups. Low reported participation in non-project related groups also may have resulted from interview burden-if interviewers and/or respondents were overwhelmed by the assessment length, not reporting groups or reporting non-participation in groups would have reduced interview time. Alternatively, women may have understood these questions contrary to their intent-and did not report on all groups or all groups in which they were participating or limited their responses only to formal groups in their community, even though informal groups were listed in the questionnaire. Cognitive interviewing would allow us to assess the salience of these considerations for revisions to this module. Also, pro-WEAI might consider questions about forms of collective agency that do not require group membership but instead reflect non-institutional collective action. Candidates for consideration may include survey questions from early versions of the WEAI about engaging in community projects and helping other families in the community.
Similarly, women's non-participation was high for some livelihoods activities and financial services. The agency of these women was not measured directly. Some reported non-participation may be related to the Thus, for non-agricultural and wage-based economic activity, high rates of non-participation may, in part, have resulted from using single-key word questions.
Moreover, in other analyses (available on request), women's non-participation in specific livelihoods activities was differentially associated (positively and negatively) with scores for women's human, economic, and social resources for empowerment. In other words, the relationships of women's resources for empowerment to non-participation in specific livelihoods activities varied by the type of resource and livelihoods activity. This finding suggests that non-participants in specific livelihoods activities are heterogeneous with respect to their resources for empowerment. Consistently, boundary characteristic curves from IRT models showed that respondent or household non-participation in specific livelihoods-, income-, or borrowing-related activities was systematically related to the latent-agency trait, and that this relationship differed across items within item sets. Therefore, making assumptions at the indicator-level that systematic non-participants are 'inadequately' empowered may warrant further study to rule out misclassification of women who report non-participation in listed activities because they have other resources at their disposal (or because single-key word questions were used for non-agricultural and wage-based activities).
Given these descriptive results, a major finding from this analysis was that one item set, capturing intrinsic agency in the right to bodily integrity, met the IRT assumptions of uni-dimensionality, local independence, model-data fit, and measurement equivalence (also implied in the construction, interpretation, and cross-group comparison of pro-WEAI indicators). This finding confirms a prior validation of these and other IPV-attitudes items (Yount, VanderEnde, Zureick-Brown, Anh, et al., 2014). One caveat of the pro-WEAI item set is that it provides limited precision at the lower and higher ends of the latent intrinsic agency trait, and thus, may have limited capacity to assess change over time. To ensure precise measurement at the extremes of this latent trait, four other validated IPV-attitudes items might be added to the pro-WEAI item set (Yount, VanderEnde, Zureick-Brown, Anh, et al., 2014). Alternatively, response options for current IPV-attitudes items might be expanded to be ordinal, allowing each item to have higher precision across a wider range of the latent-agency trait.
A second major finding was that the remaining item sets did not meet the assumptions of uni-dimensionality or local independence (LI). For autonomy in income, weak evidence of uni-dimensionality may have resulted from having items designed to capture different theoretical constructs included in the same set.
Consistent with Deci and Ryan (1985, 1995, 2000), the items 'uses her income how her family or community' 'tell her she must' and 'expect because she wants them to approve of her' likely capture external motivations in her use of income, whereas 'chooses to use her income how she wants to and thinks is best for herself and her family' likely captures internal motivation, or autonomy. If so, then substituting items that capture external motivation in use of income with items that capture internal motivation in use of income may better reflect the intended uni-dimensional construct. For the other item sets, strong evidence of local dependence (LD) may be problematic for interpretation of derived indicator values. Again, LI means that the latent trait variable being measured is the only influence on a woman's response to an agency item; thus, for a given woman with a known agency score, her response to one item should be independent of her response to any other item. Empirically, evidence of local dependence means that model estimates, model fit statistics, and derived scores and associated standard errors may be distorted and may differ from the theoretical construct being measured (Toland, 2014). LD can occur when items or questions within a set sound similar to respondents, who then repeatedly provide the same answer. The many matrices in the pro-WEAI, in which multiple questions are asked of lists of items, corroborate this interpretation. In practice, women who do not participate in an activity are not asked questions about that activity in pro-WEAI, and so are not asked the full matrix. However, respondents may not distinguish similar questions asked of the same activity, resulting in similar answers to questions designed to tap different theoretical constructs. Cognitive interviewing of these matrices may identify clearer wording to minimize this possible source of LD.
A third major finding was the high overlap in total information curves for item sets designed to capture distinct agency constructs. The TICs for instrumental agency in livelihoods activities and use of income were illustrative. These findings suggest that one or the other item sets could be dropped from pro-WEAI without a substantial loss of precision in measuring the latent-agency-trait continuum. Alternatively, pro-WEAI modules could be revised to enhance the distinctiveness of item-sets for respondents. Modules could begin with a more detailed introduction clarifying the purpose of new question sets. Item sets could begin with a warm-up to ensure correct interpretation. Questions in multiple-question matrices could be revised to ensure distinctiveness for respondents. However, if respondents do not, in practice, make fine distinctions between types of agency, then avoiding question sets that seek nuance between types of agency may be advised.

Limitations and Strengths of the Analysis and pro-WEAI
Some caveats of the analysis are notable. First, TRAIN and BRB implemented slightly different versions of the pro-WEAI questionnaire, and not all modules were asked in both countries. Second, interview duration varied across projects, on average, requiring one hour in TRAIN and one hour forty minutes in BRB (which included the health and nutrition add on). Despite variations in average interview duration, results of the analysis were broadly consistent. Third, although we aimed to validate the collective agency item set, the IRT estimation was not possible because the item set focused on felt influence in group decisions among active members. As discussed, most women reported that groups were not known or not present in their community (especially in TRAIN) or that they were not members (especially in BRB). As such, information on felt influence was limited to the few women who reported being active members.
Fourth, we were unable to estimate IRT models for instrumental agency in land use because too few items measured this construct. Fifth, we were unable to assess the measurement equivalence of most item sets across contexts because model assumptions of uni-dimensionality, local independence, and model-data fit were not met within contexts. After refining the pro-WEAI instrument to address these issues, we suggest that this validation be reapplied to the revised pro-WEAI long-form to confirm that item sets align with their intended theoretical constructs (or indicators) within contexts. Then, the measurement equivalence of pro-WEAI item sets across contexts, genders, and time can be assessed, and a valid short-form version can be identified.
A sixth caveat of the analysis was its application to 2 of the 13 GAAP2 projects, so findings are generalizable only to the TRAIN and BRB samples. Validation of the revised pro-WEAI ideally will occur across more projects, contexts, and genders. Finally, the analysis focuses on the measurement properties of survey items in pro-WEAI, and does not fully account for the aggregation methodology used for constructing pro-WEAI indicators (e.g., setting adequacy thresholds and censoring headcounts); thus, we cannot comment definitively about the implications of the findings for the overall calculation of pro-WEAI.
These caveats notwithstanding, the many strengths of this analysis are notable. IRT methods are powerful techniques to validate instruments, like pro-WEAI, within and across settings. Results can help researchers to target questionnaire refinements, such as dropping redundant questions or revising poorly functioning questions to improve clarity. IRT methods also are useful to identify precise (and theoretically sound) item subsets for use in short-form versions of validated long forms. Nominal item-response models can test assumptions about the ordering of polytomous response options. Ordered polytomous response options provide additional information on a respondent's quantity of the latent trait; however, binary response options could provide similar information with less complexity. These uses can improve instrument quality, reduce respondent burden, and improve the data collected. Finally, this IRT analysis is the first to outline a clear process for researchers and evaluators to assess the measurement properties of any major instrument to measure women's empowerment. We urge all researchers to use these methods in the first phase of instrument development to ensure that tools recommended for monitoring and evaluation of development programs are empirically sound and consistent with theory. This analytic approach sets the standard for developing and validating measures of women's empowerment going forward. Notably, the software required to assess the dimensionality of nominal IRTs and to estimate multidimensional IRTs is evolving.
The utility of different IRT software packages is presented elsewhere (Cheong et al., nd).
The strengths of pro-WEAI also warrant emphasis. Pro-WEAI is the first instrument designed to measure comprehensively women's empowerment in agricultural development projects. Its design was based on well-defined theoretical constructs and local knowledge from a diverse set of projects across contexts. The design of pro-WEAI also incorporated learning from efforts to develop the WEAI (Alkire et al., 2013) and other versions (Malapit et al., 2017). Important modifications in pro-WEAI include a more explicit theoretical emphasis on intrinsic, instrumental, and collective agency as well as the creation of a broader set of indicators that allow for a more refined decomposition of changes in women's agency over a project's timeline. Tying these strengths with our proposed refinements will improve our capacity to design projects with pro-WEAI in mind and to assess the impacts of agricultural development programs on women's empowerment.

Recommendations for Projects
Major takeaways from this analysis are twofold. First, program evaluation would benefit from strategic refinements and shortening of the long-form pro-WEAI and a revalidation following the steps outlined here.
Second, program monitoring would benefit from a short-form version of the revised pro-WEAI long-form.
Creating a short-form pro-WEAI for monitoring was outside our scope, given our findings that questionnaire refinements are recommended. A short-form pro-WEAI for program monitoring would include simpler question-item sets totaling a 10-minute interview to maximize respondent attentiveness and focus. With a validated long-form and systematically derived short-form, researchers and program managers would be fully equipped to monitor progress and to assess the impacts of agricultural development projects designed to empower women.

TABLES
Table 1. General Steps in Item Response Theory (IRT) Analysis of Measurement Properties of Women's Empowerment Scales

Step    Description    Procedures for Analysis

1 Clarify Purpose of Study
Assess the measurement properties of item sets used to construct pro-WEAI indicators before using the indicators and overall index for impact evaluation of GAAP2 projects. The analysis is designed to ensure that item sets assessed are as precise as possible across a desired score range or suitably matched to latent trait levels of the intended population.

1. Dimensionality (in our case, uni-dimensionality) before IRT estimation using exploratory factor analysis (EFA) or confirmatory factor analysis (CFA), depending upon the stage of development and prior validation of the scales
2. Local independence (LI) within item sets using standardized LD χ² statistic for item pairs
LD χ² < |5|: likely local independence
LD χ² > |5|: questionable LD
LD χ² > |10|: likely LD
Note: If assumptions 1 and 2 are not met, IRT model parameter estimates are not presented, as the parameter estimates and scores may be distorted
3. Functional form of response options using visual or graphical inspection
a. Assess model-data fit at item-level using standardized χ² statistic at item-level
b. Assess model-data fit at model-level by comparing BIC (Bayesian information criterion) and AIC (Akaike information criterion), both relative information criteria, of base and competing model; smaller values for BIC and AIC indicate better model fit
c. Assess functional form of response options with graphical displays
4. Normality of distribution of latent variable in the population (assumed with use of IRT methods)

Assess item properties with item characteristic curves (ICCs) and item information curves (IICs)
Assess scale properties with total information functions (TIFs)
Produce IRT score estimates

6 Perform Measurement Equivalence Analysis
1. Assess measurement equivalence of item sets across projects/social groups (in our case TRAIN and BRB)
2. Estimate the effect size of the differential item functioning, if detected

Note. Adapted from Toland (2014) and Tay and colleagues (2015)

MODULE G. WOMEN'S EMPOWERMENT IN AGRICULTURE INDEX -Pilot Pro-WEAI Version
Note to survey designers: The information in module G1 can be captured in different ways; however, there must be a way to: (a) identify the proper individual within the household to be asked the survey, (b) link this individual from the module to the household roster, (c) code the outcome of the interview, especially if the individual is not available, to distinguish this from missing data, and (d) record who else in the household was present during the interview. This instrument must be adapted for country context including adding relevant examples and translations into local languages when appropriate.

Note to enumerators:
This questionnaire should be administered separately to the primary and secondary respondents identified in the household roster of the household level questionnaire. You should complete this coversheet for each individual identified in the "selection section" even if the individual is not available to be interviewed for reporting purposes. For some surveys (such as those focusing on nutrition outcomes), the female respondent may be the beneficiary woman or mother or primary caregiver of the index child (also the respondent for the pro-WEAI nutrition module). Please make sure that she is also the person interviewed for this questionnaire and that the male respondent is her spouse/partner (if applicable).
Please double-check to ensure:
• You have completed the roster section of the household questionnaire to identify the correct primary and/or secondary respondent(s);
• You have noted the household ID and individual ID correctly for the person you are about to interview;
• You have gained informed consent from the individual in the household questionnaire;
• You have sought to interview the individual in private or where other members of the household cannot overhear or contribute answers.
• Do not attempt to make responses between the primary and secondary respondents the same; it is okay for them to be different.

HOUSEHOLD IDENTIFICATION (IN DATA FILE, EACH SUB-MODULE (G2-G8) MUST BE LINKED WITH A HH AND RESPONDENT ID)
HOUSEHOLD ID RESPONDENT ID

MODULE G3(B): ACCESS TO FINANCIAL SERVICES
Next I'd like to ask about your household's experience with borrowing money or other items (in-kind) in the past 12 months.
Would you or anyone in your household be able to take a loan or borrow cash/in-kind from [SOURCE] if you wanted to?
Has anyone in your household taken any loans or borrowed cash/in-kind from [SOURCE] in the past 12 months?

G3.13
An account can be used to save money, to make or receive payments, or to receive wages or financial help. Do you, either by yourself or together with someone else, currently have an account at any of the following places: a bank or other formal institution (e.g., post office)?

MODULE G8(C): LIFE SATISFACTION
The following questions ask how satisfied you feel with your life as a whole, on a scale from 1 to 5, where 1 means you feel "very dissatisfied" and 5 means you feel "very satisfied."

MODULE G9. Attitudes about Domestic Violence
Now I would like to ask about your opinion on the following issues. Please keep in mind that I am not asking about your personal experience or whether the following scenarios have happened to you. I would only like to know whether you think the following issues are acceptable.
In your opinion, is a husband justified in hitting or beating his wife in the following situations?