Measuring Women's Decisionmaking: Indicator Choice and Survey Design Experiments from Cash and Food Transfer Evaluations in Ecuador, Uganda, and Yemen

Despite wide use of women’s decisionmaking indicators, both as a direct measure of intrahousehold decisionmaking and as a proxy for women’s empowerment or bargaining power, little has been done to explore what such indicators capture and how effectively they measure program impacts on empowerment. We review theoretical and operational evidence from recent literature on women’s decisionmaking and analyze survey experiments undertaken in cash and food transfer programs in Ecuador, Yemen, and Uganda from 2010 to 2012. We find large variations in how women are ranked in terms of decisionmaking depending on how indicators are constructed. In addition, we find that across countries, composite decisionmaking indicators are not consistently associated with other proxy measures of women’s empowerment or household welfare, such as women’s education levels or household food consumption. We also find mixed evidence across countries related to the impact of transfer programs on women’s decisionmaking indicators. We conclude with implications of our findings for future research and use of decisionmaking indicators for program evaluation in developing countries.


INTRODUCTION
Women's empowerment is increasingly seen as a crosscutting issue in development programming: not only is the field motivated by an intrinsic concern for gender equality, but women's empowerment is also perceived as a means to achieve development goals. The latter dimension stems from research demonstrating that more resource control by mothers (relative to fathers) is associated with a larger share of resources allocated to improving children's well-being, including through schooling and nutrition investments (for example, see Hoddinott and Haddad 1995 in Cote d'Ivoire; Thomas 1997 in Brazil; and Quisumbing and Maluccio 2000 in Bangladesh, Indonesia, Ethiopia, and South Africa). Indeed, many program designs and development outcomes have been shown to depend on women's ability to negotiate favorable allocations of resources within the household (Doss 2013). Thus, women's empowerment is recognized as a potential pathway for (or, alternatively, a constraint to) interventions such as agricultural, microfinance, or social cash transfer schemes, with potential to improve a broader range of social welfare indicators such as health and nutrition. Despite recognition that women's empowerment is important both as an "end" and as a "means" for economic development, there is limited evidence on how to increase it through policy, due at least in part to ambiguity on how best to measure it. One challenge is that there is a range of potentially important dimensions of empowerment. These include sociocultural, economic, familial, legal, political, and psychological domains. 1 A second challenge is that, even within a given domain, little consensus exists on what indicators are most meaningful and can show responsiveness to intervention.
For example, an indicator can be captured at differing levels of aggregation (for example, individual, household, community, institutional); can be measured more or less directly (for example, measuring economic empowerment through direct questions on control over economic decisions or through indirect proxy measures like education level); and can be a strong predictor but not change in response to new intervention (for example, predetermined characteristics like assets brought to marriage or completed education). A final challenge is that dimensions and indicators are culturally and setting specific, further hampering the ability to make generalized and consistent conclusions across studies and geographic regions.
In the context of socioeconomic and health data collection, a commonly used quantitative measure of women's individual empowerment uses women's self-reported ability to participate in major decisions within the household. Although variations exist, these questions typically ask "Who in your household usually has the final say?" on decisions ranging from child health and education to household purchases and use of earned income, with possible responses including the respondent herself, her partner, the respondent and partner jointly, or others in the household. 2 Having a voice in intrahousehold decisionmaking can be considered an inherently meaningful dimension of empowerment, since it may be desirable in its own right and also determine directly how resources are allocated within the household. 3 The questions are also appealing in that they aim to measure the decisionmaking process in the household directly rather than by drawing on proxies, provide measures at the disaggregated level of the household, and can in principle immediately capture changes in decisionmaking dynamics due to an intervention.
1 In a review commissioned by the World Bank, Malhotra, Schuler, and Boender characterize the literature on women's empowerment as "vast and interconnected" and assert that "neither the World Bank, nor any other major development agency has developed a rigorous method for measuring and tracking changes in levels of empowerment" (2002, 3). They present indicators for a variety of empowerment domains (sociocultural, economic, familial, legal, political, psychological) at different levels of aggregation (individual, household, community, institutional). The most commonly used definition of women's individual empowerment, some version of Kabeer's (2001) description of "the expansion in people's ability to make strategic life choices in a context where this ability was previously denied to them," builds on a combination of themes rather than a single focus, including the dynamic expansion of options, choice, control, or power. Although some progress has been made toward harmonization in the last 10 years, particularly within specific domains of empowerment and for aggregate country-level indicators, considerable ambiguity remains around the diversity of indicators and associated policy prescriptions.
However, in practice, considerable ambiguity exists regarding how to construct indicators based on such questions, what the indicators can capture, and how useful they are for assessing effects of an intervention. 4 At the most basic level, it may not be straightforward to characterize which responses to the decisionmaking questions indicate a meaningful voice in the decisionmaking process; the issue arises particularly in deciding how to classify jointly made decisions. The interpretation of a given response may also differ based on the scenario in which a decision was made. For instance, being the sole decisionmaker in a case where there was actual disagreement with one's partner may reflect quite a different dynamic than being the sole decisionmaker when there was no disagreement. Additionally, if the goal is to assess women's underlying empowerment, it may be important to capture how the decisionmaking arrangement that women themselves ideally prefer aligns with actual decisionmaking; however, specifying what decisionmaking role women consider ideal is not straightforward. For example, a woman may prefer to make a decision solely (for example, have the independence to decide without outside consultation), prefer to make a decision jointly (for example, possibly experience inherent utility from cooperation with a spouse, as Ashraf et al. 2014 suggest), or prefer not to be involved at all in a particular decision, and this may differ by domain. Moreover, because responses are self-reported, they could be subject to a range of biases, including social desirability bias that could skew the indicators constructed. In addition, once an indicator has been constructed over various decision domains, it may not be clear what that indicator actually captures. 5 If some decision domains included are irrelevant to women (for example, because they are not perceived as important decisions to contribute to), the overall indicator may be sufficiently noisy that it does not pick up meaningful measures of what might represent empowerment for the woman. Relatedly, in the context of measuring empowerment effects of interventions, it is not clear how relevant or responsive decisionmaking indicators might be. For instance, even if an intervention changes the dynamics of decisionmaking, it may not change "who" makes a particular decision, but instead cause a more nuanced change in the relative voices of individuals in the decisionmaking process that cannot be picked up by the indicators.
Despite the widespread use of decisionmaking indicators, there has been very little exploration of these issues. In this paper, we aim to shed light on them using a multicountry study conducted by the International Food Policy Research Institute (IFPRI) from 2010 to 2012. Our analysis explores the standard women's decisionmaking measures in the context of household survey data from impact evaluations of transfer programs in three different countries: Ecuador, Uganda, and Yemen. We conduct four sets of analyses, each focused on exploring a particular question:
1. Do relative rankings of decisionmaking change based on variations in indicator construction? Specifically, those variations include (a) whether indicator construction includes joint decisionmaking as opposed to only sole decisionmaking; (b) whether indicator construction focuses on a scenario of threat points (who makes the final decision in the case of a dispute or disagreement); and (c) whether indicator construction takes into account women's own preferences on who makes decisions in particular domains.
2. Does the framing of decisionmaking modules affect the responses given? This is assessed by randomizing framing of the decisionmaking module to either (a) start with an introduction highlighting that many women have decisionmaking power (positive); (b) start with an introduction highlighting that many women lack decisionmaking power (negative); or (c) provide no introduction.
3. How are various composite decisionmaking summary scores and indexes associated with other proxy measures of women's empowerment or household-level welfare? Those proxy measures include (a) the woman's level of education; (b) the woman's age in years; (c) the household's Dietary Diversity Index; and (d) the household's per capita value of monthly food consumption and per capita value of monthly total consumption.
4. What is the estimated impact of various types of transfer programs on the women's decisionmaking indicators? This set of analyses uses the randomized evaluation design of the transfer programs to shed light on whether transfers (a type of intervention often aimed at increasing resources in the hands of women) had a measurable impact on decisionmaking, and whether the measurement of that effect varies with the type of decisionmaking indicator used. The randomized design of the evaluations introduces external variation that makes it possible to avoid endogeneity bias. 6
Overall, the paper aims to contribute to a dialogue on how better to measure and proxy for women's empowerment and what factors may cause bias in such indicators. Findings have implications for the broader use of women's empowerment and decisionmaking measures as inputs, as well as outcomes, in development research.
The remainder of the paper is structured as follows. Section 2 reviews frameworks used to guide analysis of women's empowerment and bargaining power, as well as the use of women's decisionmaking measures in the context of cash or near-cash transfer programs in developing countries. Section 3 describes the contexts in which the studies took place, the transfer programs, and the data collection. Section 4 reviews methods and indicators used in the analysis. Section 5 reports results. Section 6 concludes with a discussion of policy and research implications.

REVIEW OF DECISIONMAKING IN DEVELOPMENT RESEARCH

Frameworks
Although we do not seek to contribute to underlying household bargaining theories, it is helpful to briefly review the theories, frameworks, and approaches that have guided analysis of intrahousehold resource allocation and women's decisionmaking. Doss (2013) reviews four different frameworks that have been used to investigate different components of intrahousehold bargaining, resource allocation, and decisionmaking in economics. The first is the early literature, which sought to test assumptions around unitary models of the household. This evidence arose originally to challenge the assumptions of the household as a single production or consumption unit, and produced ample support in diverse contexts to demonstrate that a household's decisions and outcomes were influenced by power dynamics and resource allocation within the household (Alderman et al. 1995). The second group of research aims to test the efficiency of production or consumption decisions in household allocations, using a variety of cooperative and noncooperative models. The third group of research examines determinants of household decisionmaking and resource allocation but does not focus on, or evoke, the objective of testing efficiency models. The fourth uses experimental field-based games (for example, focused on trust, communications, savings, altruism, or risk) to understand how decisions are made within the household and other social settings (for example, Ashraf 2009 in the Philippines; Iverson et al. 2006 in Uganda). The literature we draw on to inform indicator choice and dynamics with transfer programs comes mainly, but not exclusively, from applications from the second group of research. For further discussion of frameworks and evidence, including usefulness for policy prescriptions, see Doss (2013).

Indicators
Intrahousehold bargaining questions typically ask some variation of the following to the female head, female spouse, or other adult woman in the household "Who in your household usually has the final say on a given domain?" Such questions can be broad or narrow; however, the standard domains typically refer to child health, child education, large and small household purchases, woman's own health care, woman's mobility, and use of woman's earned income. In research designed to measure impacts of a certain type of program, the questions can be more specific-for example, they may relate to decisions about whether to use family planning or decisions on the nutrition or feeding of young children. Responses typically are coded either as the woman herself, her partner, joint decisions between the woman and partner, or others in the household. Answers to these questions are routinely collected in the Demographic and Health Surveys, now covering more than 40 developing countries globally and widely used in observational research and program impact evaluations to inform associations and dynamics of women's empowerment.
While there has been little evidence explicitly testing indicators and sources of bias in conventional intrahousehold decisionmaking, the literature does discuss a number of reoccurring limitations. The first is around the treatment of jointness in decisionmaking. Although questions are typically sensitive enough to identify whether a decision is made solely by the woman or jointly by the woman and someone else, how should we treat these distinctions? Whereas it is tempting to assume for all cases that an autonomous decision, relative to a joint decision, is the one in which the woman has more power, the rationale for that possible ranking must clearly be conditioned on household composition. In a household with several adult members, a woman is more likely to make joint decisions based on sharing of resources and responsibilities. In addition, in such cases, it is often difficult to understand in the presentation of indicators with whom the decision is being made jointly and how much that matters for rankings. The implications for women's empowerment may be very different if the woman is making a decision jointly with her spouse or if she is making it jointly with her father, mother-in-law, or son. Further, in western societies, we often think that in the most equitable partnerships, decisions are discussed through open communication and made jointly. Therefore, it could be claimed that joint decisions should be ranked equal to or preferred to sole decisions; however, the actual dynamic may vary case by case. The issue of jointness further interacts with the importance of the decisionmaking domain. For example, one woman may make a sole decision on a relatively less important domain (for example, daily food preparation) and another woman a joint decision on a relatively more important domain (for example, purchase of a house). In this case, how would we rank or interpret their decisionmaking power relative to each other?
The second issue centers on threat points and underlying power in decisionmaking. For example, suppose a woman reports that she jointly makes decisions about her children's education with her spouse or partner. In one scenario, if there is a dispute or a decision is made that is out of line with the preferences of her spouse or partner, the decision could be reversed. In that case, the joint decisionmaking power the woman reports would be a false representation of power dynamics in the household. Women might be more likely to report such false power dynamics for repeated small decisions in domains on which members of the household have "agreed" or have aligned preferences, such as what to cook for dinner or small daily household expenses.
The third potential issue centers on division of tasks in a household. Diverse household structures operate as consumption and production units in which each household member has certain tasks to perform and realizes a variety of benefits from taking part in activities. Consequently, division of labor, and potentially decisionmaking around those domains, must also occur. For example, if one spouse is expected to be responsible for attending to issues regarding children's education and the other is responsible for attending to issues regarding children's health, we might expect decisionmaking power in those domains to persist, even if one party becomes more economically empowered. In fact, in a world of scarce resources, including time, one might appreciate not having to be involved in decisionmaking and issues related to certain tasks. 7 When measuring traditional decisionmaking domains, this aspect is obscured. Typical measures assume that more decisions are preferable to everyone, whereas in reality, an individual may prefer to increase decisionmaking in one or two domains, and not care to be involved in others. As a result, interpreting the aggregation of decisionmaking indicators may not be straightforward.
Whereas the first three issues touch on aspects of decisionmaking amenable to survey design and thus potential measurement improvements, the last issue concerns perceptions of social desirability. Social desirability bias has been a long-researched topic, particularly around sensitive indicators ranging from reproductive health and sexual behavior to political opinions. In the case of decisionmaking, we might expect that a woman may be more likely to report that she has decisionmaking power over a domain when she thinks most other women in her community (or her peers) make decisions about that domain. In contrast, she may be more likely to report she does not make a decision in cases where she believes no women in her community (or her peers) make decisions in a given domain. Such biases may be particularly pronounced for more sensitive topics, such as decisions about family planning use, or when she views the interviewer as being distinctly different from her-in terms of age, class, race, or sex-and thus more likely to judge her decisionmaking responses. Although examination and robustness or sensitivity analysis to address these sources of indicator construction and bias have become more common in related gender topics (including asset ownership), there is less evidence on whether and how decisionmaking may be affected.

Decisionmaking and Social Cash Transfers
As the implementation and evaluation of social cash transfers (SCTs) have proliferated, gender has garnered increasing attention, particularly the potential for SCTs to empower women. Despite that potential, and the often-stated objective of transfer programs to empower women, we lack empirical evidence from direct measures of empowerment that this has historically been the case. In a recent review of women's empowerment and nutrition, van den Bold, Quisumbing, and Gillespie (2013) summarize quantitative and qualitative evidence around key interventions seeking to empower women, including SCTs. The review focuses on cash transfers rather than food or other in-kind transfers; the authors suggest that, although qualitative evidence on conditional cash transfers (CCTs) generally points to positive impacts, empirical results are mixed. In addition, due to the paucity of evidence from unconditional cash transfers and mixed results of current studies, few conclusions can be drawn regarding their ability to empower women. Although the review by van den Bold, Quisumbing, and Gillespie is inclusive in the variety of indicators included to proxy for empowerment, the review here will be limited to direct measures of empowerment, including decisionmaking indicators.
The majority of CCTs with documented impacts on direct decisionmaking come from the early government programs in Latin America in the late 1990s and 2000s. For example, qualitative evidence from Mexico's Progresa CCT, which gave cash to mothers, shows that the program contributed to women's empowerment through a number of channels, including increasing control over resources, educating women on health and nutrition, and providing opportunities for them to leave their homes. These and other dynamics resulted in positive impacts on empowerment domains including self-esteem and sense of self. However, quantitative impacts were more mixed. For example, analysis of decisionmaking indicators found impacts in only one of five domains, spending of own income (Handa et al. 2009). As in Mexico, ethnographic work examining the Red de Proteccion Social CCT in Nicaragua found that the program had positive impacts on women's self-esteem and sense of independence as well as improving intrahousehold relations (Adato and Roopnaraine 2004). Impact analysis of Brazil's CCT, Bolsa Familia, showed that out of decisionmaking indicators across eight areas, the program increased women's exclusive control over decisions regarding contraceptive use; weak but significant increases in decisionmaking on children's health and purchase of durable goods were also noted (de Brauw et al. 2013). However, these impacts were concentrated among households in urban areas, and in some cases, the transfer may have led to lower decisionmaking power in rural areas. A mixed-methods evaluation of the Kenya Hunger Safety Net Programme, where approximately 70 percent of recipients are women, found mixed results on women's social and economic empowerment (OPM and IDS 2012). Although the program increased women's status in their homes and communities, there were also reports of increased tension between spouses. Finally, an evaluation of the national Zambian Child Grant Program, an unconditional cash transfer targeted to mothers of children under the age of five, found no impact on decisionmaking indicators across nine domains after 24 months, regardless of whether the measurements were of sole or joint decisionmaking or indexes of decisionmaking (AIR 2013).
Since many transfer programs state as an objective the empowerment of women, and often impact evaluations are set up with the ability to parse out causal effects, transfer programs offer a promising tool for policymakers seeking to implement gender-transformative development programs. However, the mixed nature of results, particularly within quantitative data, is concerning. In addition, the evidence available likely suffers from publication bias, where analysis of nonimpacts may not be deemed interesting or worthy of publication, further indicating that impacts may be fewer or smaller than the reviewed body of evidence suggests. A small but growing number of unpublished recent manuscripts demonstrate that a more careful examination of the effects of gender targeting on human capital outcomes, through randomization of the recipient's sex, is needed (Akresh et al. 2012; Undurraga et al. 2014). Therefore, it is essential for researchers and program implementers to critically examine whether the programs are actually designed in a way that facilitates empowerment, and whether the measures used are sensitive and specific enough to identify impacts on direct measures the programs aim to change.

DATA AND SETTINGS
From 2010 to 2012, IFPRI conducted impact evaluations of four World Food Programme (WFP) interventions testing alternative modalities to food assistance. 8 Although the transfer programs varied in design and focus, all evaluations rigorously tested differences in impacts and cost-effectiveness of food versus cash (and in one case food vouchers) on consumption, dietary diversity, and food security of poor households. In Ecuador, Uganda, and Yemen, specific attention was given to collecting a range of women's decisionmaking indicators, allowing for a comparative analysis investigating aspects of both survey design and impact of the transfer programs. The WFP purposefully selected countries and study designs with diverse cultural, geographic, and relief-related food security settings. In Ecuador, the program was a six-month food, cash, and food voucher distribution to poor Ecuadorians and Colombian refugees in eight urban and peri-urban centers of the northern provinces of Carchi and Sucumbíos (Figure A.1). The transfers were US$40 equivalent and given monthly, targeted to the female head of the household, conditional on attending a monthly nutrition training. 9 The food transfers consisted of rice (24 kilograms), vegetable oil (4 liters), lentils (8 kilograms), and sardines (eight cans of 0.425 kilogram). In Uganda, the program was a 12-month food and cash distribution to caregivers of preschool-aged children attending UNICEF-sponsored early childhood development (ECD) centers in the northeastern districts of Kaabong, Kotido, and Napak in the Karamoja subregion (Figure A.2). The transfers were unconditional, equivalent to $12, and distributed approximately every six weeks. The food transfers consisted of multiple-micronutrient-fortified corn-soy blend, vitamin A-fortified oil, and sugar, equivalent to 1,200 calories per day per child, including 99 percent of daily recommended iron requirements.
In Yemen, the program was an approximately six-month emergency seasonal safety net to food-insecure households across 27 districts, with the evaluation taking place in Hajjah and Ibb governorates (Figure A.3). The transfers were targeted to people meeting poverty and vulnerability criteria, were unconditional, were equivalent to $49 per household on average, and were distributed bimonthly. Transfers were given primarily to men due to concerns about women's mobility and handling of cash in program areas. Food transfers consisted of 50 kilograms of wheat flour and 5 liters of vegetable oil. Further details on program operations, timing, and contexts can be found in the main impact evaluations for each country (Hidrobo et al. 2014; Gilligan and Roy 2013; Schwab 2013).
Program evaluations were designed to measure the impact and cost-effectiveness of alternative modalities to food transfers, with variations by country. The Ecuador study design was a randomized controlled trial with randomization of 145 clusters to four arms (food, cash, food vouchers, and control) targeted to poor Ecuadorians and Colombian refugees in peri-urban centers along the northern provinces of Carchi and Sucumbíos. The baseline surveyed 2,357 households in March to April 2011 and the follow-up resurveyed 2,122 households in October to November 2011 (attrition rate of 10 percent). The Ecuador evaluation demonstrated that although all three modalities led to increases in diet quality and quantity, food transfers led to significantly larger increases in calories consumed, while vouchers led to significantly larger increases in dietary diversity (Hidrobo et al. 2014). The Uganda study design was a randomized controlled trial, which randomized 98 ECD centers to cash, food, or control. The baseline was conducted in September to October 2010 and surveyed 2,568 households with a child aged three to five, and the endline resurveyed 2,461 households in March to April of 2012 (attrition rate of 4 percent). The Uganda evaluation found that although food transfers had no impact on children's cognitive measures, cash significantly increased cognition, as well as intermediary outcomes including reductions in anemia, ECD attendance, and diet quality. The Yemen study randomized 136 food distribution points in the governorates of Hajjah and Ibb to cash or food. Based on conversations with the WFP country office, it was decided that randomization of a pure control group would be infeasible due to implementation concerns. The baseline was conducted in August 2011 and surveyed 3,536 households, of which 1,983 received transfers, and the endline was conducted among 3,510 households in March 2012 (attrition rate of 1 percent).
The Yemeni evaluation found that on average, compared with food transfer households, cash transfer households had larger increases in dietary diversity, driven by higher consumption of protein-rich foods such as meat and fish (Schwab 2013). However, food transfer households had larger increases in daily caloric consumption, largely driven by higher consumption of the food basket items (wheat flour and vegetable oil).
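The attrition rates reported for the three evaluations can be reproduced from the baseline and follow-up sample sizes given above. The following is a minimal arithmetic check, assuming attrition is defined as the share of baseline households not resurveyed; the function and variable names are illustrative and do not come from the original studies.

```python
# Baseline and follow-up household counts as reported in the text.
samples = {
    "Ecuador": (2357, 2122),
    "Uganda": (2568, 2461),
    "Yemen": (3536, 3510),
}


def attrition_rate(baseline: int, followup: int) -> float:
    """Share of baseline households that were not resurveyed (assumed definition)."""
    return (baseline - followup) / baseline


# Attrition in percent, rounded to the nearest whole number as in the text.
rates = {
    country: round(100 * attrition_rate(baseline, followup))
    for country, (baseline, followup) in samples.items()
}

for country, rate in rates.items():
    print(f"{country}: {rate} percent")
```

Under this definition, the rounded values match the 10, 4, and 1 percent figures reported for Ecuador, Uganda, and Yemen.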
Each setting represents a markedly different development and cultural context, particularly in relation to gender dynamics. According to the United Nations' most recent Human Development Index, from 2012, Ecuador, Uganda, and Yemen rank 89, 161, and 160, respectively, out of a total of 186 rankings (UN 2013). These rankings stay relatively consistent at 83 (Ecuador), 110 (Uganda), and 148 (Yemen) when examining the Gender Inequality Index out of a total of 148 rankings. Although Ecuador ranks high on the Human Development Index and has an equitable legal framework to support inheritance, asset ownership, and equal opportunities, many women in Ecuador face gender discrimination (Gallardo and Ñopo 2009). Recent national statistics show that gender-based violence is high across the country, with the lifetime prevalence of intimate partner violence (IPV) estimated at 35 percent for physical violence, 14.5 percent for sexual violence, and 43.4 percent for psychological violence (INEC 2011). The population of Colombian refugees is thought to be particularly vulnerable to violence, discrimination, and trafficking. Although Uganda has made significant progress in terms of legal provisions stipulating women's property ownership, division of assets in the case of divorce or widowhood, and laws on gender-based violence, gaps remain in terms of actual implementation of laws and contradictions with customary law. The most recent Demographic and Health Survey shows that the median age at marriage among women aged 25 to 49 is 17.9 years, the total fertility rate is 6.2, and 56 percent of women aged 15 to 49 have experienced IPV since age 15 (UBOS and ICF International 2012). Yemen consistently ranks low or last on a range of gender-related indexes, reflecting widespread and entrenched gender discrimination.
10 National law still does not set limits on age at marriage for girls, has no provisions criminalizing IPV, including spousal rape or trafficking, and restricts women's freedom of movement without male guardians (SIGI 2011). Although data providing national statistics are scarce, particularly on issues such as IPV, there is a perception that the incidence of gender-related discrimination, including child marriage and honor killings in cases where a woman is suspected of being unfaithful, is high. National data from 2006 found that approximately 14 percent of girls were married before age 15 and 52 percent were married before age 18 in Yemen (Yemen, MoHP and UNICEF 2008).

METHODS
We conduct four sets of analyses, using country-level data and the randomized study design. The core sets of analyses all use endline surveys. This choice is made because the majority of questionnaire design variations regarding decisionmaking indicators were included only in endline surveys. In addition, since all three evaluations were successfully randomized at baseline, and none of the evaluations suffered from differential attrition, single-difference estimators based on endline surveys can be considered unbiased measures of the impact of the transfer programs on decisionmaking indicators. The strength of this estimation strategy relies on balancing of background socioeconomic and demographic characteristics at baseline, as well as non-differential attrition, which we demonstrate in all three country studies. 11

Exploring How Decisionmaking Rankings Are Affected by Variations in Indicator Construction
The first set of analyses explores the following question: "Do relative rankings of decisionmaking change based on variations in indicator construction?" Specifically, such variations include (a) whether indicator construction includes joint decisionmaking as opposed to only sole decisionmaking; 12 (b) whether indicator construction focuses on a scenario of threat points (who makes the final decision in the case of a dispute or disagreement); and (c) whether indicator construction takes into account women's own preferences on who makes decisions in particular domains. Exploration (a) uses data from all three countries, whereas (b) and (c) use data from only Ecuador and Yemen.
Questionnaires in the three countries vary in their wording and inclusion of decisionmaking domains. Relevant modules from each country are included in Appendix Tables A.1 through A.3. In all three countries, a roster of decisionmaking questions is used to construct the various indicators, administered to one woman per household (typically the female head or spouse) aged 15 and above. The core question is "Who in your household usually has the final say in the following decision?" and is asked across the following domains: (1) woman's own work for pay; (2) children's education; (3) children's health; (4) woman's own health; (5) small daily food purchases; (6) large purchases of items like furniture, cattle, TV, or other assets; (7) bulk food purchases; (8) use of family planning; and (9) opening of bank accounts or borrowing money. In Ecuador, all question domains were asked, while in Uganda and Yemen, questions 7 and 9 were not asked. Further, because the percentage of women in Yemen and Uganda responding "not applicable" to the family planning domain was large, we omit question 8 in the composite analysis of both countries. Possible responses to these questions include (a) woman herself; (b) spouse or partner; (c) woman and spouse/partner jointly; (d) someone else in the household; (e) woman and someone else jointly; and (f) decision not taken/question does not apply. This core question is followed by several additional questions asking whether a disagreement or dispute had occurred about the decision during a specified recall period (corresponding to the intervention period in that country: 6 months for Ecuador, 3 months for Yemen, 12 months for Uganda), and if so, when the decision was resolved, who the ultimate decisionmaker was. Responses to this question based on "threat point" are restricted to (a) woman herself; (b) spouse/partner; or (c) someone else in the household. 
Only in Ecuador and Yemen, if no dispute took place in the recall period, the question is followed by a hypothetical question asking if there were ever to be a disagreement, who would have the ultimate decisionmaking power. Finally, only in Ecuador and Yemen, the question "In an ideal situation, who in your household would make the decision?" is asked for each decisionmaking domain above.
To explore how differences in indicator design affect rankings of decisionmaking, we create several composite measures. First, we construct four different measures, using simple summations over the number of domains in which (1) the woman is the sole decisionmaker; (2) the woman is a sole or joint decisionmaker; (3) the woman is the ultimate sole decisionmaker after a dispute/disagreement, actual or hypothetical; and (4) the woman's ideal decisionmaker aligns with the actual decisionmaker. We note that only numbers 1 and 2 can be constructed in Uganda; number 3 cannot be constructed because hypothetical disagreement was not asked about in Uganda and actual reported disagreement was very low, while number 4 cannot be constructed because the question on the ideal decisionmaker was not asked.
Second, we create analogous indexes to these four count measures using the first factor from factor analysis rather than simple summation. Factor analysis may yield a more nuanced construction of the composite indicators, since it accounts for joint variations across the responses by domain (Kline 1993).
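As a rough illustration of a first-factor style index, the sketch below scores each woman on the first principal component of her 0/1 domain indicators. This is a simplified stand-in and not the factor-analysis procedure used in the paper, whose estimation details are not given here; the function name, the use of power iteration, and the toy data are all assumptions made for illustration.

```python
import math
from typing import List


def first_component_scores(rows: List[List[float]], iterations: int = 200) -> List[float]:
    """Score each respondent on the first principal component of the items.

    `rows` holds one list of 0/1 domain indicators per woman. Every column
    must vary across women, otherwise it cannot be standardized.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(k)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

    # Correlation matrix of the standardized items.
    corr = [[sum(zr[a] * zr[b] for zr in z) / n for b in range(k)] for a in range(k)]

    # Power iteration converges to the leading eigenvector of `corr`.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Fix the sign so that higher scores mean more domains decided by the woman.
    if sum(v) < 0:
        v = [-x for x in v]
    return [sum(zr[j] * v[j] for j in range(k)) for zr in z]


# Hypothetical indicators for six women across three domains (not survey data).
rows = [[1, 1, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
scores = first_component_scores(rows)
```

Because the items are standardized before scoring, the resulting index has mean zero across the sample, which matches the property of the indexes noted in the results.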
Because each of these composite indicators requires responses for all domains included, they are constructed only for the sample of women that report applicable decisionmaking occurring across all domains. Women who report "decision not made/not applicable" for any of the domains are excluded from this analysis. 13
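The count measures and the complete-case restriction can be sketched as follows. The letter codes mirror the response options listed above, but the coding itself, the function name, and the choice to treat both joint categories as "joint" are assumptions made for this illustration rather than details confirmed by the surveys.

```python
from typing import Dict, Optional

# Response codes follow the options (a) to (f) in the text; this coding is an
# assumption for the sketch, not the surveys' actual variable coding.
SELF, SPOUSE, JOINT_SPOUSE, OTHER, JOINT_OTHER, NOT_APPLICABLE = "a", "b", "c", "d", "e", "f"

SOLE = {SELF}
# Assumption: both joint categories count as a joint decision.
SOLE_OR_JOINT = {SELF, JOINT_SPOUSE, JOINT_OTHER}


def count_measures(responses: Dict[str, str]) -> Optional[Dict[str, int]]:
    """Return count measures for one woman, or None if she is excluded.

    `responses` maps a domain name to a response code. A woman with any
    "not applicable" domain is dropped, following the complete-case rule.
    """
    if any(code == NOT_APPLICABLE for code in responses.values()):
        return None
    return {
        "sole": sum(code in SOLE for code in responses.values()),
        "sole_or_joint": sum(code in SOLE_OR_JOINT for code in responses.values()),
    }


# Hypothetical respondent with four domains (not survey data).
example = {"own_work": "a", "child_education": "c", "own_health": "a", "large_purchases": "b"}
print(count_measures(example))  # {'sole': 2, 'sole_or_joint': 3}
```

The threat-point and ideal-decisionmaker counts would follow the same pattern, with the set of qualifying responses swapped for the relevant question.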

Exploring How Differences in Framing Affect Responses to Decisionmaking Questions
The second set of analyses explores the following: Does the framing of decisionmaking modules affect the responses given? In Uganda only, the framing of the decisionmaking module was randomized to either (a) start with an introduction highlighting that many women have decisionmaking power (positive); (b) start with an introduction highlighting that many women lack decisionmaking power (negative); or (c) provide no introduction. In particular, the positive script includes the sentence "There are many women in Uganda who are able to exert control over decisions in their household and can influence important aspects of their lives," while the negative script includes the sentence "There are many women in Uganda who are not able to exert control over decisions in their household and cannot influence important aspects of their lives." This introduction is meant to frame a reference point for the respondent that she may assume is expected or plays out in other households similar to her own. We assess whether the different framing results in any measurable difference in responses, as captured by the composite decisionmaking indicators.

Exploring Associations between Decisionmaking Indicators and Other Proxy Measures of Women's Empowerment or Household Welfare
The third set of analyses examines the following: How are various composite decisionmaking summary scores and indexes associated with other proxy measures of women's empowerment or household-level welfare? The other proxy measures include (a) the woman's level of education; (b) the woman's age in years; (c) the household's Dietary Diversity Index (DDI); and (d) the household's per capita value of monthly food consumption and per capita value of monthly total consumption. Means for the analysis sample of these proxy measures are reported for each country in Appendix Table A.13.
Existing studies that correlate proxy measures of women's status with decisionmaking hypothesize and show that in many contexts "more empowered" women tend to be older, more educated, and reside in wealthier households (Kishor and Subaiya 2008). It is therefore interesting to assess whether various constructions of the decisionmaking indicators show the same pattern of associations with age, education, and household consumption in our data. The analysis of associations is conducted using unadjusted ordinary least squares regression with standard errors clustered at the community or village level, corresponding to the survey design in each country.
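A bivariate regression with cluster-robust standard errors can be sketched in a few lines. The version below uses the basic sandwich formula with no small-sample correction, which is an assumption: the paper does not state which correction, if any, was applied, and the function and variable names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def ols_clustered(y: List[float], x: List[float], cluster: List[str]) -> Tuple[float, float]:
    """Return the slope of y on x (with an intercept) and its cluster-robust SE."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx

    # Sum the contributions (x_i - mean of x) * residual_i within each cluster.
    cluster_scores: Dict[str, float] = defaultdict(float)
    for xi, yi, g in zip(x, y, cluster):
        residual = yi - intercept - slope * xi
        cluster_scores[g] += (xi - mx) * residual

    # Sandwich variance for the slope, without any small-sample correction.
    variance = sum(s ** 2 for s in cluster_scores.values()) / sxx ** 2
    return slope, math.sqrt(variance)


# Hypothetical data: decisionmaking count regressed on age, clustered by village.
age = [22.0, 35.0, 41.0, 28.0, 50.0, 33.0]
count = [1.0, 3.0, 4.0, 2.0, 4.0, 2.0]
village = ["v1", "v1", "v2", "v2", "v3", "v3"]
slope, se = ols_clustered(count, age, village)
```

Summing the score contributions within each village before squaring is what allows residuals of women in the same community to be correlated.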
Measures of women's education and age come from the household roster administered at the start of each household interview in all three countries. The DDI measure sums the number of distinct food items consumed by the household in the previous seven days (Kennedy, Ballard, and Dop 2011). 14 The household food consumption aggregates are constructed from the value of food eaten inside and outside the home in the last seven days. Food eaten in the home is composed of different food items consumed from not only food purchased in the marketplace but also food produced at home, food received as gifts or remittances from other households or institutions, and food received as payments for in-kind services. Median prices from food purchased are used to calculate the total value of food consumed from home production or received as gifts or in-kind payments. Weekly household values of food consumed are converted to monthly values, which are then converted to household per capita values by dividing by the number of household members. Nonfood consumption is calculated from the value of items purchased or acquired in the last month or three months from categories ranging from household and kitchenware to clothes and shoes. All values are again converted to monthly per capita values. Total consumption is constructed from a household's nonfood and food consumption. Further information on these household welfare aggregates is found in Hidrobo et al. (2014), Gilligan and Roy (2013), and Schwab (2013).
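The conversion from weekly household food values to monthly per capita values can be sketched as below. The 30/7 scaling factor is an assumption, since the text does not state the conversion used, and the function names, arguments, and example figures are hypothetical.

```python
from typing import Dict, Set

DAYS_PER_WEEK = 7
DAYS_PER_MONTH = 30  # assumption: the text does not state the conversion factor


def monthly_per_capita_food(purchased_value: float,
                            nonpurchased_quantities: Dict[str, float],
                            median_prices: Dict[str, float],
                            household_size: int) -> float:
    """Monthly per capita food value from one week of consumption.

    Purchased food enters at its reported value; home-produced, gifted, and
    in-kind food is valued at the median purchase price of each item.
    """
    imputed = sum(qty * median_prices[item] for item, qty in nonpurchased_quantities.items())
    weekly_value = purchased_value + imputed
    monthly_value = weekly_value * DAYS_PER_MONTH / DAYS_PER_WEEK
    return monthly_value / household_size


def dietary_diversity(items_consumed: Set[str]) -> int:
    """DDI as described: the number of distinct food items eaten in seven days."""
    return len(items_consumed)


# Hypothetical household of five: 70 in purchases plus 2 units of home-grown maize.
value = monthly_per_capita_food(70.0, {"maize": 2.0}, {"maize": 3.5}, 5)
print(value)  # 66.0
```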

Exploring Impacts of Transfer Programs on Decisionmaking Indicators
The final set of analyses uses the evaluation study designs of the transfer programs to answer the following: What is the estimated impact of various types of transfers on these women's decisionmaking indicators? This analysis will shed light on whether the various decisionmaking indicators are responsive to receipt of the transfer programs, which in some cases were designed to increase resources in the hands of women. It is worth noting that if no statistically significant impact is found, that could reflect either that the transfers truly did not cause any meaningful change in the decisionmaking process or alternatively that the indicators we construct are not appropriate for capturing these changes. 15 Estimations are conducted using ordinary least squares regression and controlling for geographic-level fixed effects at the level of stratification (province in Ecuador, district in Uganda, and district in Yemen). Standard errors are clustered at the neighborhood, ECD center, or food distribution point level, corresponding to the level at which transfer modalities were randomized in each country. In Yemen, because there was no dedicated randomized control group, estimates also include a set of covariates to control for unobserved targeting and other factors related to difference in eligibility cutoffs. Those covariates include age, education and marital status of female respondent, household size, household demographic indicators, wealth quintiles, and district fixed effects measured at the time of the follow-up.
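One way to write the single-difference specification described here is shown below. The notation is an assumption introduced for illustration and does not appear in the paper: Y is the endline decisionmaking indicator for woman i in randomization cluster c and stratum s, T is the treatment indicator assigned at the cluster level, delta is a stratum fixed effect, and X is the covariate vector that, per the text, is included only in the Yemen estimates.

```latex
Y_{ics} = \alpha + \beta \, T_{c} + \delta_{s} + X_{ics}'\gamma + \varepsilon_{ics},
\qquad \text{with standard errors clustered on } c .
```

Under successful randomization and non-differential attrition, the coefficient on T can be read as the impact of the transfer (or, where two modalities are compared, the relative impact of one modality over the other).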

How Are Decisionmaking Rankings Affected by Variations in Indicator Construction?
We begin by describing statistics on the distinct decisionmaking domains used to construct the composite measures, shown in Appendix Tables A.4 through A.12. The analysis sample is made up of all households in the endline surveys across countries with at least one woman eligible to answer the decisionmaking module and with applicable decisions across domains, resulting in approximately 1,174 women in Ecuador, 921 women in Yemen,16 and 1,860 women in Uganda. As previously noted, statistics are very similar for the analysis sample (women reporting applicable decisionmaking in all domains) and the full sample for each question (women reporting applicable decisionmaking in at least that domain), minimizing concerns that the restriction biases the analysis sample. 17 In Ecuador, the decisionmaking domains in which the most women have sole decisionmaking power are those related to their own health (74 percent) and small daily food purchases (61 percent), whereas the fewest women report sole decisionmaking related to large asset purchases (34 percent) and ability to open bank accounts and borrow money (38 percent). Similarly, in Yemen and Uganda, the most women report sole decisionmaking for their own health (47 and 66 percent respectively) and the fewest for large asset purchases (37 and 19 percent respectively).
We then turn to the composite summary scores and indexes described in Section 4 and reported in Table 5.1. In Ecuador, on average women make sole decisions on approximately 4.5 out of 9 domains, while they make sole or joint decisions on approximately 7.5 out of 9 domains. When focusing on threat points in the case of disagreement, women make the final decision on an average of 6 decisionmaking domains. In approximately 5.3 out of 9 domains, women's ideal decisionmaker is the actual decisionmaker. In Yemen, on average women make sole decisions in approximately 2.5 out of 6 domains and either sole or joint decisions in 3.1 domains. Focusing on threat points, the number of domains in which women make final decisions decreases slightly to 2.4, and women's ideal decisionmaker and the actual decisionmaker align in only 2.3 domains. In Uganda, women report making sole decisions in an average of 2.5 domains out of 6, and sole or joint decisions in about 4.1 domains. As described above, indicators related to decisionmakers under threat points and ideal decisionmakers cannot be constructed in Uganda.
In all three countries, the analogous "index" for each of the count measures is constructed based on the first factor from factor analysis. By construction, these are mean 0; Table 5.1 shows their standard errors.
16 Sample sizes are comparatively smaller in Yemen because only female enumerators (approximately half the enumeration team) collected data on women's indicators, as it was infeasible for male enumerators to interview women alone due to cultural acceptability.
17 The sample size difference between the analysis sample and the sample answering any given decisionmaking question varies. For example, the analysis sample makes up 77 percent, 100 percent, and 83 percent of the sample answering the question on children's education in Ecuador, Yemen, and Uganda respectively. Alternatively, the analysis sample makes up 61 percent, 91 percent, and 85 percent of the sample answering the question on work for pay in Ecuador, Yemen, and Uganda respectively. The ratio of the total sample for any given question will depend on, among other things, the demographic composition of the sample, as well as their consumption patterns for daily and larger asset purchases.
Finally, we present simple correlations between the indexes to assess whether implied rankings of women's decisionmaking differ across the different constructions of indicators. Table 5.2 shows these correlations for all three countries. In Ecuador, correlations between the indexes are fairly low. The correlation between the index using only sole decisionmaking and the index using sole or joint decisionmaking is only 0.31, reflecting that there are considerable shares of women reporting joint decisions in domains (Appendix Tables A.4 to A.9), and choosing whether to include these joint decisions as a meaningful voice in the decisionmaking indicator substantially changes how women's decisionmaking is ranked across households. 
The correlation between the index using sole decisionmaking (without focus on a threat point) and the index using sole decisionmaking after a disagreement is 0.65, which while fairly high is notably less than 1. This discrepancy (taken together with the descriptives in Table 5.1) reflects that there are cases in which women report that they do not make sole decisions within a domain, yet when asked to consider the case of a disagreement, report that they would ultimately be the sole decisionmaker. Given that the two constructions do not perfectly correlate, the implication is that the choice of whether to focus on a threat point in indicator construction may indeed matter for rankings. Moreover, given that the latter construction may better capture the ability to make sole decisions (since a joint decision with no disagreement is nearly equivalent to a sole decision), the finding indicates it may be worth using indicators that focus on the case of threat points if the key interest is in fact on sole decisionmaking. Finally, the correlation between the index based on "ideal" decisionmaker and the other indexes ranges from 0.43 to 0.52, again notably less than 1. This implies that women's notion of the ideal decisionmaker does not perfectly align with any of these other categories. For example, the correlation between the index using sole decisionmaking and the index on ideal decisionmaking is only 0.52; if women's ideal decisionmaking arrangement were to always be the sole decisionmaker, the correlation would be 1, and similarly for the other categories. This finding highlights that any indicator construction that uniformly chooses a particular response as constituting a voice in decisionmaking may not necessarily align with every woman's desired voice in decisionmaking, and rankings from constructions based on researcher specifications as opposed to respondent preferences may again meaningfully differ.
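To make the link between correlations and rankings concrete, the sketch below computes a Pearson correlation between two alternative measures for the same women. The data are invented for illustration and are not drawn from the surveys; the function name is hypothetical.

```python
import math
from typing import List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical counts for five women (not survey data). The second woman makes
# few sole decisions but participates jointly in several more domains.
sole = [4.0, 1.0, 0.0, 3.0, 2.0]
sole_or_joint = [5.0, 4.0, 3.0, 3.0, 6.0]
r = pearson(sole, sole_or_joint)
```

A correlation well below 1 in a case like this arises because women who make few sole decisions may still take part in many joint ones, so the two measures order the same women differently.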

Table 5.2 Correlations between composite decisionmaking indexes by country
Source: Authors' calculations based on impact evaluation surveys. Notes: Correlations for indexes created using the first factor from factor analysis, with alpha statistics of fit, reported in column below indicator. Correlations using summations of indicators result in similar bounds on correlations and thus are not reported.
In Yemen, in contrast to Ecuador, correlations are quite high, ranging from 0.74 to 0.89. This is consistent with the observation that in Appendix Tables A.4 to A.9, the shares reporting decisionmaking according to different definitions within a given domain are very similar. For example, there are relatively low reports of joint decisionmaking in most domains, such that whether joint decisionmaking is included or excluded has little effect on shares; there are also very similar shares reported of sole decisionmaking with and without mention of threat points, such that the issue of threat points appears not to be as relevant as in Ecuador. In fact, the descriptives suggest that joint decisionmaking between males and females may be somewhat uncommon in Yemen, and decisions may typically be made by one party or another without much discussion. The implication may be that, in such cases where decisionmaking processes appear more segregated across spheres, the nuances in indicator construction may also matter less, and implied rankings across the various indicators may be quite similar.
Finally, in Uganda, where the main relationship to be assessed is between the index using sole decisionmaking only and the index using sole or joint decisionmaking, the correlation is 0.38. This finding is very similar to the finding in Ecuador. Appendix Tables A.4 to A.9 show that, as in Ecuador, considerable shares of women in Uganda report joint decisions in domains, such that choosing whether to include these joint decisions as a meaningful voice in the decisionmaking indicator substantially changes how women's decisionmaking is ranked across households.
Overall, the implication of these findings is that choices made in indicator construction can substantially change how households are ranked with respect to the corresponding indexes of women's decisionmaking. As might be expected, the extent to which differences in indicator construction matter appears to depend on how nuanced the underlying decision process is. In Ecuador and Uganda, where substantial shares of women report joint decisions in many domains, whether sole decisions only or both sole and joint decisions are counted in the indicator as a voice in decisionmaking results in a substantial difference in the implied ranking; meanwhile, in Yemen, where joint decisions are uncommon, including or excluding joint decisions naturally has less effect on the rankings. In Ecuador, where higher shares of women tend to report taking sole decisions after disagreement than taking sole decisions in general, whether the threat point is considered substantially affects the ranking; in Yemen, where the shares reported of sole decisionmaking do not differ meaningfully by whether the threat point is raised, the resulting rankings are also very similar. Finally, in Ecuador, rankings based on whether women's ideal decisionmaker is the actual decisionmaker do not align closely with rankings based on any of the other categories, indicating that women in Ecuador may not prefer one uniform type of decisionmaker across all domains; in Yemen, the rankings are more similar between the "ideal decisionmaker" index and all other categories, indicating there may be more uniformity in women's preferred decisionmaker across domains. A takeaway is that, when choosing indicator construction, the researcher should consider carefully what he or she intends to measure, because depending on the complexity of the decisionmaking process in the study context, such choices may substantially change the rankings. For example, does the researcher want to consider joint decisions a meaningful voice in decisionmaking? If the researcher's focus is on sole decisionmaking, does that need to be distinguished from joint decisionmaking without disagreement? Does the researcher want to specify which role for the female respondent constitutes a meaningful voice in decisionmaking or simply allow her to state whether the decisionmaker in each domain is her ideal one?

Do Variations in Questionnaire Framing Affect Responses to Decisionmaking Questions?
We then consider our evidence for whether the way decisionmaking questions themselves are asked affects the responses given. For this analysis, we draw on results from Uganda, where the framing of the decisionmaking module was randomized into three groups: (1) women receiving a "positive" introduction; (2) women receiving a "negative" introduction; and (3) women receiving no introduction. Table 5.3 shows no significant differences in any of the composite indicators by randomized introduction type. With the caveat that we have only one country's data for this analysis and only one type of framing variation, we find no evidence for effects of questionnaire framing on resulting decisionmaking indicators.

How Are Decisionmaking Indicators Associated with Other Proxy Measures of Women's Empowerment or Household Welfare?
We next turn to associations between the composite decisionmaking measures and other proxy measures of women's empowerment or household welfare. Tables 5.4, 5.5, and 5.6 present summaries of these associations in the Ecuador data, Yemen data, and Uganda data respectively. In Ecuador, we find that women's older age and households with higher per capita monthly value of food and total consumption are significantly associated with higher decisionmaking. However, women's education is not associated with decisionmaking, and results for household DDI are mixed. In Yemen, women's older age is again consistently correlated with higher decisionmaking. Results for women's education are similarly mixed; we see no significant associations with household per capita value of food or total consumption, and associations with DDI are perhaps the opposite of what we would expect. In particular, we might anticipate from existing evidence that households where women's decisionmaking is higher would be those where dietary diversity is also higher; the opposite holds. In Uganda, women's older age is the only consistent proxy measure associated with higher decisionmaking. All household-level indicators show no or weak associations. Women's education is significantly associated with higher decisionmaking for only the measures related to sole or joint decisionmaking, but not for the measures related to sole decisionmaking only; this reflects the possibility that having at least a primary education may allow women to participate at least jointly in household decisions, even if not make them solely.
With the caveat that these are associations, and do not represent any causal relationship or control for other characteristics, our evidence overall shows no consistent patterns between the decisionmaking indicators and other proxies for women's empowerment or household welfare, other than a strong positive correlation with women's older age.
Notes: Each cell represents a separate regression. Standard errors in parentheses. * p < 0.1; ** p < 0.05; *** p < 0.01. Sums and indexes computed through factor analysis represent nine domains of decisionmaking asked to female head of household or spouse aged 15 and above. Decisionmaking after disagreement is hypothetical or actual disagreement. Analysis sample is restricted to women who have answers across all domains of decisionmaking. All regressions control for province of residence.

What Is the Impact of Transfer Programs on Women's Decisionmaking Indicators?
We turn finally to assessing impacts of transfer programs on the women's decisionmaking indicators. As described above, because we compare endline indicators in studies where randomization was successful in balancing pre-program characteristics across treatment arms, we can reasonably interpret any differences found as causal impacts of the transfers. For Ecuador, Table 5.7 shows the absolute impacts of the pooled treatment (food, cash, or voucher) on women's decisionmaking indicators. Table 5.8 shows impacts by transfer modality. Results indicate that the transfer program had no measurable absolute impact on increasing these decisionmaking indicators, regardless of which composite score is used and regardless of transfer modality (food transfer, cash transfer, or voucher). Table 5.8 also presents Wald tests of significant differences between treatment arms, showing whether there are significant relative impacts between each pair of transfer modalities (that is, food versus voucher, cash versus voucher, food versus cash) on the composite indicators. Between nearly all pairs of transfer modalities, there are no significant differences in treatment impact; the exception is a small difference between the impact of food and the impact of cash on the count of sole or joint decisions, which is likely to be negligible given its borderline significance (p-value = 0.09). We conclude that in Ecuador there were no measurable causal impacts on these women's decisionmaking indicators due to the transfer program.
Source: Authors' calculations based on impact evaluation surveys. Notes: Each cell represents a separate regression. Standard errors in parentheses. * p < 0.1; ** p < 0.05; *** p < 0.01. Sums and indexes computed through factor analysis represent nine domains of decisionmaking asked to female head of household or spouse aged 15 and above. Decisionmaking after disagreement is hypothetical or actual disagreement. Analysis sample is restricted to women who have answers across all domains of decisionmaking. All regressions include a constant and control for province of residence.
Electronic copy available at: https://ssrn.com/abstract=2685232
For Yemen, we cannot assess absolute impacts of transfers, because the study contained no randomized control group. However, because food and cash transfer modalities were randomized within the treatment group, the relative impacts of food versus cash can be estimated. Table 5.9 reports the relative impact of food as compared with cash on the composite decisionmaking indicators. Results indicate that, relative to cash, food has a significantly larger impact on the two measures of ideal decisionmaking. A caveat to this finding is that we cannot establish whether the food transfer has any absolute impact on these measures relative to no transfer (for example, it is possible that the cash transfer could in fact have a negative absolute impact, such that even no absolute impact from food would still be a significantly larger impact than cash). However, one possibility for why the food transfer might change decisionmaking to align more with women's ideal pattern than cash is that in Yemen food preparation is traditionally a female role, whereas cash is traditionally less in women's domain of control.
Notes: Each cell represents a separate regression. Standard errors in parentheses. * p < 0.1; ** p < 0.05; *** p < 0.01. Sums and indexes computed through factor analysis represent six domains of decisionmaking asked to female head of household or spouse aged 15 and above. Decisionmaking after disagreement is hypothetical or actual disagreement. Analysis sample is restricted to women who have answers across all domains of decisionmaking. All regressions include a constant and control for the following: age in years, education level and marital status of female head, household size and demographics, wealth quintiles, and district of residence.
For Uganda, Table 5.10 shows the absolute impacts of the pooled treatment (food or cash) on women's decisionmaking indicators. Table 5.11 shows impacts by transfer modality. Results indicate that at the pooled level, the transfer program has a weakly significant absolute impact on the sole and joint decisionmaking indicators. Distinguishing by modality, the cash transfer appears to be driving this increase in the measures of sole or joint decisionmaking. Given that cash has no significant impact on the measures of only sole decisionmaking, this reflects that the cash transfers increase joint decisionmaking by approximately 0.5 of a decision (out of 6 possible decisions). Analogous to the Ecuador results, Table 5.11 presents Wald tests of significant differences between treatment arms. We see that cash has a significantly larger impact than food on the measures related to sole or joint decisions (p-value = 0.03), but not those related to only sole decisions.
Source: Authors' calculations based on impact evaluation surveys. Notes: Each cell represents a separate regression. Standard errors in parentheses. * p < 0.1; ** p < 0.05; *** p < 0.01. Sums and indexes computed through factor analysis represent six domains of decisionmaking asked to female head of household or spouse aged 15 and above. Analysis sample is restricted to women who have answers across all domains of decisionmaking. All regressions include a constant and control for district of residence.

DISCUSSION AND CONCLUSIONS
Our results across three countries show a number of notable findings related to decisionmaking indicators. First, we find that small variations in how indicators are constructed using decisionmaking questions can result in substantially different rankings of women's decisionmaking across households, and that the sensitivity to indicator construction is itself heterogeneous across the country studies. Variations in indicator construction include whether to consider joint decisions (as opposed to only sole decisions) as a meaningful voice in decisionmaking, whether to focus on decisionmaking in a "threat point" scenario where there is disagreement, and whether to account for a woman's own preference on the ideal decisionmaker in each domain (which may not include herself). Such compositional differences cause more changes in rankings of decisionmaking power in contexts where the decisionmaking process itself seems more nuanced. For example, in Ecuador and Uganda, where large shares of women report joint decisions across many domains, inclusion of joint decisionmaking changes rankings more than in Yemen, where women rarely report joint decisionmaking. Second, with the caveat that this analysis draws on only one example from one country study (Uganda), we find no evidence that framing decisionmaking questions with a positive or negative social desirability bias affects responses in a way that significantly changes composite decisionmaking indicators. Third, we find that the composite decisionmaking indicators we construct show few consistent associations with other proxies for women's empowerment or household welfare, other than being positively correlated with women's older age; other associations vary considerably across countries. Finally, we find a mixed pattern of impacts from transfer programs on the composite women's decisionmaking indicators. In Ecuador, the transfer program has no measurable impact on the decisionmaking indicators. 
In Yemen, where only the relative impact of food versus cash can be estimated, food has a significantly larger impact than cash on the measures related to women's ideal decisionmaking. The modality difference in Yemen could plausibly be due to food preparation falling more within women's traditional roles than handling of cash. In Uganda, the pooled treatment (food or cash) shows weakly significant impacts driven by the cash treatment specifically causing increases in the measures related to sole or joint decisionmaking; given that cash does not show any significant impact on measures related to only sole decisionmaking, this likely reflects that the cash transfers in Uganda increase women's role in joint decisionmaking.
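To make the first finding concrete, the following is a minimal sketch of how two count-based composite indicators (sole decisions only versus sole or joint decisions) can rank the same women differently. The response codes, the four-domain layout, the toy data, and all function names are hypothetical and chosen for illustration only; they are not the survey instruments or the estimation code used in this paper.

```python
from math import sqrt

# Hypothetical data: who decides in each of four domains, per woman.
# The codes "self", "joint", and "other" are illustrative assumptions.
responses = {
    "w1": ["self", "joint", "joint", "other"],
    "w2": ["joint", "joint", "joint", "joint"],
    "w3": ["self", "self", "other", "other"],
    "w4": ["other", "other", "joint", "other"],
}


def count_indicator(answers, counted):
    """Count the domains whose answer falls in the set `counted`."""
    return sum(1 for a in answers if a in counted)


def average_ranks(scores):
    """Rank from highest score (rank 1) down, averaging ranks over ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every unit tied with the unit at position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg
        i = j + 1
    return ranks


def spearman(ranks_a, ranks_b):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    keys = list(ranks_a)
    mean_a = sum(ranks_a[k] for k in keys) / len(keys)
    mean_b = sum(ranks_b[k] for k in keys) / len(keys)
    cov = sum((ranks_a[k] - mean_a) * (ranks_b[k] - mean_b) for k in keys)
    var_a = sum((ranks_a[k] - mean_a) ** 2 for k in keys)
    var_b = sum((ranks_b[k] - mean_b) ** 2 for k in keys)
    return cov / sqrt(var_a * var_b)


# Two alternative constructions of the composite indicator.
sole = {w: count_indicator(a, {"self"}) for w, a in responses.items()}
sole_or_joint = {w: count_indicator(a, {"self", "joint"}) for w, a in responses.items()}

rank_sole = average_ranks(sole)
rank_joint = average_ranks(sole_or_joint)
rho = spearman(rank_sole, rank_joint)

print("sole only:    ", sole, rank_sole)
print("sole or joint:", sole_or_joint, rank_joint)
print("Spearman rank correlation:", round(rho, 3))
```

In this toy example, the woman who reports joint decisions in every domain is tied for last under the sole-only construction and ranked first under the sole-or-joint construction, which is the kind of reordering that becomes more likely where joint decisionmaking is commonly reported.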
These findings potentially raise as many questions as they answer. Although our results show that choices in indicator construction matter for implied rankings, they cannot provide clear guidance on which choices should be made. Specifically, whether to consider participation in joint decisions a meaningful voice, whether to focus on decisionmaking only in the case where there is actual disagreement, and whether to base indicators on women's own ideal decisionmaking arrangements are all choices that require greater insight into the actual dynamics of the decisionmaking process in a given study context. The variation across countries in the sensitivity of the indicators to construction suggests that, similar to some psychometric instruments, country- or region-specific validation might be required for meaningful interpretation. By first understanding the relationship of these indicators to a much more detailed assessment of empowerment and decisionmaking in a small sample, researchers may then be able to extract meaning from variation in these coarser decisionmaking variables among a larger sample from a comparable population. In addition, it is notable that the only proxy characteristic of women or households that we find to be consistently associated with the composite decisionmaking indicators is women's older age; however, it is unclear what to make of the mixed pattern of other associations across countries. For example, women's education may in fact allow greater participation in joint decisions in Uganda, while no similar dynamic exists in Ecuador or Yemen; alternatively, it is possible that the lack of consistent associations across countries reflects that the composite indicators do not robustly capture the important dimensions of empowerment. 
Similarly, the mixed findings across countries on the impacts of transfer programs on decisionmaking indicators reflect the possibility that transfers could have very different effects on decisionmaking in different circumstances; however, it is less clear which particular circumstances underlie such differences.
Taken together, the results suggest that considerable room exists for further research on how decisionmaking is understood and measured using household surveys in diverse contexts, including interrogation of its relationship with other direct and indirect measures for empowerment, with an important role for qualitative work. Indeed, similar conclusions are made by Carter and colleagues (2014), who use quantitative and qualitative methods to consider four dimensions of empowerment, namely (1) power over, (2) power to, (3) power within, and (4) power with, in three projects implemented by HELVETAS Swiss Intercooperation. In these projects, in very different settings and contexts (a vocational training in Nepal, value chain and income generation in Bangladesh, and a local governance project in Kosovo), the main finding is that quantitative methods capture "power to," or individual agency, while qualitative methods are necessary to engage in the more holistic spectrum of impacts and outcomes. For example, the authors argue that dimensions such as "power within," or the psychological strength felt by individuals, do not lend themselves to be captured by quantitative survey methodology. In fact, some development practitioners suggest that empowerment must be measured qualitatively, since it cannot be externally defined, and instead should be judged by the individual who is undergoing the empowerment him- or herself (Carter et al. 2014; Bishop and Bowman 2014). Yet such a reaction mistakes the inadequacy of commonly used empowerment indicators for the futility of all quantitative approaches to measurement of empowerment and intrahousehold issues more generally. Even if one were to accept the premise that such issues defy objective assessment, quantitative tools can readily be applied to issues of subjectivity (see, for example, work on subjective poverty lines: Goedhart et al. 1977; Ravallion 2012).
Rather than a simple either/or approach, a more thoughtful integration of quantitative and qualitative methods could address a substantial weakness in current methods identified here: that we do not know what the typical decisionmaking indicator is actually measuring. Carefully conducted qualitative work among smaller samples in populations of interest could be used to validate and facilitate the interpretation of quantitative instruments used on a large scale at the household level. Such a model leverages the ability of methods that permit the exploration and identification of deeper elements of identification to inform the collection of data on a larger, more standardized scale.
Results also highlight the challenges of measuring impacts of programs (including transfer programs) on women's decisionmaking using traditional indicators. A clear complication is in distinguishing no significant impacts on specific indicators from no significant impacts on a broader notion of empowerment. The Ecuador study provides a useful example. Using the same intervention and dataset, but using additional quantitative indicators as well as qualitative methods, Hidrobo, Peterman, and Heise (2015) find that all three transfer modalities in Ecuador significantly reduced dimensions of IPV experienced by women. In particular, transfers reduced controlling behaviors, moderate physical violence, and indicators of any physical or sexual violence, with possible mechanisms including improvements in women's bargaining power as well as decreases in poverty-related conflict. These findings would indicate that a meaningful dimension of women's empowerment did improve, although it is not reflected by impacts in the composite decisionmaking indicators studied in this paper. The combination of results from the analysis in Hidrobo, Peterman, and Heise (2015) and in this paper provides a useful caution: impacts (or lack thereof) on the decisionmaking indicators should not be overinterpreted to conclusively state that a program had (or did not have) effects on "empowerment" broadly.
These findings are consistent with recent reviews of literature on SCTs globally. This lack of impact could be due to both challenges in measurement and the sensitivity of design features within transfer programs to actually make sustainable changes in women's decisionmaking, empowerment, and bargaining power. Transfers may simply increase total resources within the household without changing power dynamics. The aforementioned study by Carter and colleagues (2014) also suggests that emphasis on women's empowerment by donors and their focus on showing impact on these indicators can create problems for nongovernmental organizations if the routinely collected indicators are not specific or sensitive enough to actually show changes. Thus, despite a number of specific program components designed to allow women to engage in and benefit from programs, including child care, female staff, microcredit access, and women-specific trainings, program staff assert that capturing causal impacts between their intervention and women's empowerment indicators was far beyond the realm of their monitoring and evaluation systems.
Such arguments are somewhat less critical if, as in this paper, we focus on decisionmaking as processes that may inform relative power positions within the household, rather than claim this measure represents a holistic measure of empowerment. However, even if we restrict our analysis to decisionmaking, by implementing a "standard"-type module such as in the Demographic and Health Surveys, we are making the implicit assumptions that these domains are equally (or similarly) important to women within or across different cultural contexts and that such domains are relevant to measure the impact of the specific intervention. As an example, after implementing the decisionmaking module in Uganda, we ask women which decisionmaking domain they would most like to increase their control over, and get varied responses. Discrimination and gender norms play out in vastly different ways in different cultural, economic, and political contexts. In addition, a set of standard questions that have the benefit of some degree of applicability to all households may simply be too general for a program evaluation context. For example, if we use child health and schooling decisions to measure the impact of a cash transfer for food security given to women, we may miss the more specific changes in decisionmaking about where she goes to buy the food or about who gets served first at mealtime. The approach of increasing the specificity of domains is taken by the Women's Empowerment in Agriculture Index, which asks questions specific to agricultural production, such as about decisions on crop choice and sale of crops, to understand specific domains that agricultural interventions may affect (Alkire et al. 2013). It should also be recognized that empowerment is not always a linear process. 
Interventions intended to have an empowering effect may have an initial backlash, where gender relations worsen in the short term due to women's challenging of traditional norms (see, for example, discussion in Hidrobo, Peterman and Heise, 2015). Therefore, evaluations that measure decisionmaking in the short term (for example, one to three years) and at one point in time may in fact conclude that there is no program impact if change is slow or nonlinear.
Several limitations to the current study merit mentioning, in addition to those previously described regarding the decisionmaking indicators themselves. First, the households in the studies cannot necessarily be taken as representative of populations in the study countries, and therefore the impacts (or lack thereof) of interventions may also not be generalizable. In addition, the interventions were not designed specifically with the goal of empowering women. Although the WFP often prioritizes gender considerations in its programs, these particular interventions (with the possible exception of Ecuador) were largely designed to directly alleviate household food insecurity. It is important to note that we found no negative impacts on women's decisionmaking as measured across all constructs in all countries. Finally, these interventions were all short-term transfers (one year or less), and thus it is possible that not enough time had elapsed to identify effects on complex and potentially slow-moving targets like women's decisionmaking.
There are a number of additional important issues surrounding the measurement of direct indicators of women's empowerment that we do not attempt to investigate but that merit further exploration. For example, as Doss (2013) highlights, one important issue is who in the household is participating in bargaining. For the purposes of simplicity and comparison, many analyses restrict the analysis to a woman and her spouse, when in reality diverse household structures exist (for example, women living with their spouses as well as mothers-in-law, women living with other wives in polygamous households, and so forth), and these may play a critical role in terms of which bargaining parties determine how resources are allocated and which factors affect that bargaining. To our knowledge, no analyses have examined experimentally how indicators change if they are asked in relation to couples versus entire households. A second important issue is that although nearly all women's bargaining measures are collected by asking women decisionmaking questions, a literature from a variety of topics (including health behavior, consumption, and others) demonstrates that spouses often have very different perceptions of answers even to very basic questions. In addition, qualitative methodology soliciting the range of domains a woman finds most important in her life and would ideally seek to influence may allow more nuance in how and why decisions are made in the household than when domains are assumed from the standard questions. It is likely that at least in some cases, the general decisionmaking questions asked do not reflect domains that empowered women seek to change or challenge with increasing autonomy. Finally, a variety of potential response biases around interviewer characteristics, including sex, age, class, and race differences, may be important in the measurement and analysis of decisionmaking indicators. 
Moving forward, these are the types of questions researchers should address in an effort to understand which programs and policies can contribute most significantly to the empowerment of women within development contexts, and the perceived success (or failure) of certain programs to empower women should not rest simply on a set of standard questions on general decisionmaking domains.

Figure A.2 Location of study sites in Karamoja (Uganda)
Source: Gilligan and Roy (2013).

Figure A.3 Map of food distribution sites in Hajjah and Ibb governorates (Yemen)
Source: WFP CO-Yemen (2011).
Electronic copy available at: https://ssrn.com/abstract=2685232
Notes: Standard errors in parentheses. Questions asked to female head of household or spouse aged 15 and above. Decisionmaking after disagreement is hypothetical or actual disagreement. Analysis sample is restricted to women who have answers across all domains of decisionmaking. In assessing disagreement in the domain, recall periods used were 6 months in Ecuador, 3 months in Yemen, and 12 months in Uganda, to correspond to respective intervention periods. NA = not applicable.
Decisionmaking after disagreement is hypothetical or actual disagreement. Analysis sample is restricted to women who have answers across all domains of decisionmaking. In assessing disagreement in the domain, the recall period used was 6 months, to correspond to the intervention period.