Economic Self-Help Group Programs for Improving Women's Empowerment: A Systematic Review

Figure: Study selection flow diagram (records screened, n = 3498; records excluded, n = 3136; full-text articles assessed for eligibility, n = 362; full-text articles excluded with reasons, n = 257; refined screening of remaining 107 full-text articles, 74 excluded with reasons; studies included in qualitative synthesis, n = 11; studies included in quantitative synthesis, n = 23)


Plain language summary
Motivation: Self-help groups (SHGs) are implemented around the world to empower women, supported by many developing country governments and agencies. A relatively large number of studies purport to demonstrate the effectiveness of SHGs. This is the first systematic review of that evidence.
Approach: We conducted a systematic review of the effectiveness of women's economic SHG programs, incorporating evidence from quantitative and qualitative studies. We systematically searched for published and unpublished literature, and applied inclusion criteria based on the study protocol. We critically appraised all included studies and used a combination of statistical meta-analysis and metaethnography to synthesize the findings based on a theory of change.
Findings from quantitative synthesis: Our review suggests that economic SHGs have positive effects on various dimensions of women's empowerment, including economic, social, and political empowerment. However, we did not find evidence for positive effects of SHGs on psychological empowerment. Our findings further suggest there are important variations in the impacts of SHGs on empowerment that are associated with program design and contextual characteristics.
Findings from qualitative synthesis: Women's perspectives on factors determining their participation in, and benefits from, SHGs suggest various pathways through which SHGs could achieve the identified positive impacts. Evidence suggested that the positive effects of SHGs on economic, social, and political empowerment run through the channels of familiarity with handling money and independence in financial decision making, solidarity, improved social networks, and respect from the household and other community members. In contrast to the quantitative evidence, the qualitative synthesis suggests that women participating in SHGs perceive themselves to be psychologically empowered. Women also perceive low participation of the poorest of the poor in SHGs due to various barriers, which could potentially limit the benefits the poorest could gain from SHG membership.
Findings from integrated synthesis: Our integration of the quantitative and qualitative evidence suggests there is no evidence for adverse effects of women's SHGs on the likelihood of domestic violence. Women's perspectives in the qualitative research indicate that even if domestic violence occurs in the short term, in the long term the benefits from SHG membership may mitigate the initial adverse consequences of SHGs on domestic violence.

Executive summary

BACKGROUND
Women bear an unequal share of the burden of poverty globally due to societal and structural barriers. One way that governments, development agencies, and grassroots women's groups have tried to address these inequalities is through women's SHGs. This review focuses on the impacts of SHGs with a broad range of collective finance, enterprise, and livelihood components on women's political, economic, social, and psychological empowerment.

OBJECTIVES
The primary objective of this review was to examine the impact of women's economic SHGs on women's individual-level empowerment in low- and middle-income countries using evidence from rigorous quantitative evaluations. The secondary objective was to examine the perspectives of female participants on their experiences of empowerment as a result of participation in economic SHGs in low- and middle-income countries using evidence from high-quality qualitative evaluations. We conducted an integrated mixed-methods systematic review that examined data generated through both quantitative and qualitative research methods.

SEARCH METHODS
We searched electronic databases, grey literature, relevant journals and organization websites, performed keyword hand searches, and requested recommendations from key personnel. The search was conducted from March 2013 to February 2014.

SELECTION CRITERIA
We included studies conducted between 1980 and January 2014 that examined the impact of SHGs on the empowerment and perspectives of women of all ages in low- and middle-income countries, as defined by the World Bank, who participated in SHGs in which female participants physically came together and received a collective finance and enterprise and/or livelihoods group intervention. To be included in the review, quantitative studies had to measure economic empowerment, political empowerment, psychological empowerment or social empowerment. We also examined adverse outcomes including intimate partner violence, stigma, disappointment, and reduced subjective well-being. We included quantitative studies with experimental designs using random assignment to the intervention and quasi-experimental designs with non-random assignment (such as regression discontinuity designs, "natural experiments," and studies in which participants self-select into the program). In addition, we included qualitative studies that explored empowerment from the perspectives of women participants in SHGs using in-depth interviews, ethnography/participant observation, and focus groups.

DATA COLLECTION AND ANALYSIS
We systematically coded information from the included studies and critically appraised them. We conducted a statistical meta-analysis of the data extracted from quantitative experimental and quasi-experimental studies, and used metaethnographic methods to synthesize the textual data extracted from the women's quotes in the qualitative studies. We then integrated the findings from the qualitative synthesis with those from the quantitative studies to develop a framework for assessing how economic SHGs might impact women's empowerment.
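The pooling step of a statistical meta-analysis can be illustrated with a minimal sketch. This section does not state which pooling model the review used; the code below assumes an inverse-variance random-effects model with the DerSimonian-Laird estimator of between-study variance, a common choice in meta-analysis, and assumes each study's effect size and sampling variance have already been extracted. The function name `pool_random_effects` and all example values are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Pool study effect sizes with an inverse-variance random-effects model.

    Hypothetical helper: assumes DerSimonian-Laird estimation of the
    between-study variance (tau^2); not necessarily the review's exact method.
    Returns (pooled_effect, standard_error, tau_squared).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted dispersion of study effects around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # DerSimonian-Laird moment estimator of tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return pooled, se, tau2
```

For two hypothetical studies with effects of 0.0 and 0.4 SD and sampling variances of 0.01 each, this sketch returns a pooled effect of about 0.2 SD, a between-study variance of about 0.07, and a standard error of about 0.2. The random-effects model is used here because SHG programs differ in design and context, so true effects are not assumed to be identical across studies.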

RESULTS
We included a total of 23 quantitative and 11 qualitative studies in the final analysis. Initially, we reviewed 3,536 abstracts from electronic database searches and 351 abstracts from the grey literature searches. We found that women's economic SHGs have positive, statistically significant effects on various dimensions of women's empowerment, including economic, social and political empowerment, ranging from 0.06 to 0.41 standard deviations (SD). We did not find evidence for statistically significant effects of SHGs on psychological empowerment. We also did not find statistical evidence of adverse effects of women's SHGs. Our integration of the quantitative and qualitative evidence indicates that SHGs do not have adverse consequences for domestic violence. Our synthesis of women's perspectives on factors determining their participation in, and benefits from, SHGs suggests various pathways through which SHGs could achieve the identified positive impacts on empowerment. Women's experiences suggested that the positive effects of SHGs on economic, social, and political empowerment run through several channels, including: familiarity with handling money and independence in financial decision making; solidarity; improved social networks; and respect from the household and other community members. Our synthesis of the qualitative evidence (key informant interviews and focus groups) also indicates that women perceive there to be low participation of the poorest of the poor in SHGs, as compared to less poor women.
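Effects expressed in SD units are standardized mean differences: the difference between SHG members and comparison women divided by a pooled standard deviation. This section does not say which estimator the review used; as an assumption, the sketch below computes Hedges' g from group means, standard deviations, and sample sizes, with the usual small-sample correction and large-sample variance formula. The function name `hedges_g` and the example values are hypothetical.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with Hedges' small-sample correction.

    Hypothetical helper for illustration; returns (g, variance_of_g).
    mean_t/sd_t/n_t describe SHG members, mean_c/sd_c/n_c the comparison group.
    """
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled
    # Correction factor J slightly shrinks d in small samples.
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    # Approximate sampling variance of d, rescaled by J^2 for g.
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2.0 * (n_t + n_c))
    return j * d, j ** 2 * var_d
```

For a hypothetical study in which SHG members score 0.2 points higher on an empowerment index than comparison women, with a within-group SD of 1.0 and 100 women per arm, the sketch gives g of about 0.199, an effect of roughly 0.2 SD. The pair (g, variance) is the form of input that an inverse-variance meta-analysis would pool.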

IMPLICATIONS FOR POLICY, PRACTICE AND RESEARCH
For Policy: SHGs can have positive effects on women's economic, social, and political empowerment. However, we did not find evidence for positive effects on psychological empowerment. These findings indicate that donors can consider funding women's SHGs in order to stimulate women's economic, social, and political empowerment, but the effects of SHGs on psychological empowerment are less clear. Women SHG members perceive that the poorest of the poor participate less than other women. In part, this might be because the poorest of the poor are too financially and/or socially constrained to join SHGs or to benefit from the financial services most often provided through SHGs. Other barriers such as class or caste discrimination might also be present. Poorer or marginalized women may not feel accepted by groups that are made up of wealthier or more well-connected community members. It is important for policy makers to identify ways to build in support and reduce barriers for individual women who want to participate in SHGs but who do not have the financial resources or freedoms to join.
For Practice: Based on the integration of the quantitative and the qualitative evidence, we do not find evidence for adverse effects of women's SHGs on domestic violence. Although there may be adverse consequences in the short term, analysis of women's reports suggests that SHGs do not contribute to increases in domestic violence in the long term. Furthermore, participation of the poorest of the poor in SHGs may be stimulated by incentives. These incentives could be financial, for example, giving the poorest of the poor the opportunity to participate without a savings requirement, or non-financial, for example, encouraging the husbands or mothers-in-law of the poorest women to let their wives and daughters-in-law participate in SHGs, or conducting outreach activities to marginalized groups. As new programs are implemented in different contexts, it is also important that program designs are tailored to the local settings in ways that allow them to evolve over time. This review has shown that one size does not fit all: while it is important to draw on best practices across programs, flexibility is required to adapt programs successfully for the greatest impact on women's lives.
For Research: There is a need for more rigorous quantitative studies that can correct for selection bias and spillovers and address the difficulties of measuring empowerment. There is also a need for more research focused on examining possible factors that mediate and/or moderate the impact of SHGs on women's empowerment, to further understand the pathways or mechanisms through which SHGs impact empowerment. For the latter, it is crucial to conduct rigorous qualitative research in addition to rigorous quantitative research. Whereas quantitative research is useful in understanding certain aspects of the impact of SHGs on empowerment, qualitative studies could offer more nuanced ideas about how to measure empowerment. Importantly, both quantitative and qualitative studies need to describe more fully the various components of the SHGs being studied. Greater detail in the description of the program design will help in identifying moderating factors in the design of SHGs.

DESCRIPTION OF THE PROBLEM
Women bear an unequal share of the burden of poverty globally due to societal and structural barriers. According to economist and Nobel laureate Amartya Sen (2001), women worldwide have less access to "substantive freedoms" such as education, employment, health care, and democratic freedoms. First, girls are enrolled in school at lower rates than boys, resulting in women making up more than two-thirds of the world's illiterate adults (UNESCO, 2013). Second, women experience unequal access to health care starting from birth and throughout their reproductive years (WHO, 2007). Third, women are missing from all levels of government: local, regional, and national (Lopez-Claros, 2005). Women also have fewer economic freedoms. In sub-Saharan Africa, only 16 to 18 per cent of loans issued to small and medium businesses are issued to women owners; in South Asia, only 6 per cent (IFC, 2014). In addition, in many countries, women cannot own land. In South and Southeast Asia, women comprise more than 60 per cent of the agricultural labor force. However, in India, Nepal, and Thailand less than 10 per cent of women farmers own land (FAO, 2008). These facts describe what economists call the feminization of poverty. This phrase is meant to capture women's unequal share of poverty, in terms of both wealth and choices and opportunities (Sen, 2001).
One way that governments, development agencies, and grassroots women's groups have tried to address these inequalities is through women's economic self-help groups (SHGs). The basic assumption undergirding these income-generating group programs is that giving women access to working capital can increase their ability to "generate choices and exercise bargaining power as well as develop a sense of self-worth, a belief in one's ability to secure desired changes, and the right to control one's life" (UN, 2000). Women's SHGs could facilitate these goals through the development of social capital and the mobilization of women (IFAD, 2003).

DESCRIPTION OF THE INTERVENTION
SHGs, also known as mutual aid or support groups, are small voluntary groups formed for a specific purpose by people who share an affinity and who provide support for each other. They are created with the underlying assumption that when individuals join together to take action toward overcoming obstacles and attaining social change, the result can be individual and/or collective empowerment. SHG members typically use strategies such as savings, credit, or social involvement as instruments of empowerment. The types of SHGs that exist in developing countries are numerous and can include economic, legal, health, and cultural objectives.
The canonical economic SHG model starts with an initial period of collective savings in the name of the group to facilitate intragroup lending. The basic idea underlying this model is that groups then gradually take larger loans, for example, from banks. In addition, SHGs often provide support in the form of training, which can take multiple forms. Training can, for example, focus on entrepreneurial skills, women's rights, political participation, basic education, and justice (Van Kempen, 2009). SHGs can be linked directly with banks or can function through non-governmental organizations (NGOs), and they tend to be more fundamentally grassroots in nature than the many microfinance institutions (MFIs) that now exist worldwide. Although SHGs share some important characteristics, there are major differences across SHGs as well. For example, Thorp et al. (2005) suggest that some SHGs focus on resolving market failures, such as saving and credit constraints, while others put a stronger emphasis on rights, for example group members' rights to access resources or political participation.
India and other countries in South and Southeast Asia have a long history of SHG activity. South Asia's largest and perhaps most well-known program is the Self-Help Group-Bank Linkage Program (SBLP). This Indian program was started in 1992 and has rapidly expanded since then. In 2009, the SBLP covered approximately 86 million poor households in 6.1 million savings-linked SHGs and 4.2 million credit-linked SHGs. The SBLP is best known for its expansive outreach and high repayment rates of over 95 per cent. The literature suggests that the program has been effective at targeting poor women and is associated with improvements in household income, livestock ownership, savings, and households' ability to withstand economic shocks (Sinha, 2008). In addition, the program might have contributed to improvements in women's decision-making power, control over household resources, and participation in the public sphere (Sinha, 2008). In other parts of the world, such as sub-Saharan Africa and Latin America, the South Asian model has been adapted to match the cultural and social context in those specific settings. For example, SHGs in sub-Saharan Africa, such as Jeunes sans Frontières in Burkina Faso, have a stronger emphasis on HIV/AIDS than SHGs in Asia. African SHGs may thus have contributed to overcoming the stigma surrounding HIV/AIDS in sub-Saharan Africa (Nguyen, 2005).
The majority of SHGs target women with the explicit goal of empowering them. For example, in India the SHG model "was introduced as a core strategy to achieve empowerment in the Ninth Plan (1997-2002) with the objective to 'organize women into Self-help group [sic] and thus mark the beginning of a major process of empowering women'" (Planning Commission, 1997). Jakimow and Kilby (2006) argue, however, that in practice the South Asian SHG model is often focused on solving market failures, by emphasizing credit and saving, rather than on empowering women.
This review focuses on SHGs that offer women a collective finance, enterprise, and/or livelihood component. Collective finance and enterprise can include savings and loans, group credit, collective income-generation, and micro-insurance. Livelihood interventions can include life skills training, business training, financial education, and labor and trade group organizing.

HOW THE INTERVENTION MIGHT WORK
Many different perspectives, definitions, measures, and outcomes have been associated with women's empowerment. The growing literature presents different definitions of empowerment, and no single definition seems to be universally accepted. For example, women's empowerment is used interchangeably with other terms such as women's autonomy, status, and agency. These terms have subsequently been measured in different ways. For example, women's autonomy has been measured by assessing the degree to which women participate in decision-making in their households (Upadhyay, 2005) or by determining women's mobility (Malhotra, 2002). Additional challenges in defining and measuring women's empowerment include variations in the cultural context that affect how empowerment may occur. For example, women's mobility may be a central issue to women's empowerment in one setting and a peripheral issue in another. Differences in the approaches used to measure empowerment, together with contextual differences, make it harder to determine whether different measures of empowerment can be considered part of the same construct in this systematic review. We discuss this issue in detail later in this review.
Nonetheless, much of the research agrees that empowerment is a process and an outcome that can occur at multiple levels and within different dimensions. After the International Conference on Population and Development (United Nations Population Information Network & United Nations Population Fund, 1996), the UN delineated five major components of empowerment:
1. Women's sense of self-worth
2. Women's right to have and to determine choices
3. Women's right to have access to opportunities and resources
4. Women's right to have the power to control their own lives, both within and outside the home
5. Women's ability to influence the direction of social change to create a more just social and economic order, nationally and internationally.
One of the more comprehensive and broadly cited definitions of empowerment comes from a study by Kabeer (1999, p. 437), who states that empowerment is "the expansion in people's ability to make strategic life choices in a context where this ability was previously denied to them; a process that entails thinking outside the system and challenging the status quo, where people can make choices from the vantage point of real alternatives without punishingly high costs." This definition is reflected in our theory of change underlying economic SHGs, which includes resources (for example, increased income, savings, and loan repayments), agency (for example, increased autonomy, self-confidence, or self-efficacy), and achievements (for example, the ability to transform choices into desired actions and opportunities) (Kabeer, 1999). We based our review on the theory of change underlying economic SHGs as depicted in Figure 1.1. Based on the literature, we hypothesized that women's participation in economic and livelihood SHGs would enable women to gain access to resources in the form of credit, training, loans, or capital. As a result, women SHG members might experience an increase in income, savings, and/or loan repayments. In addition, participants would be exposed to group support, through which women SHG members might experience increased feelings of autonomy, self-confidence, and self-efficacy. Following increased financial stability and self-confidence, women SHG members might then be able to make meaningful life choices, and their patterns of spending and savings might change. As a result of these changes, women SHG members might experience an increased ability to transform their choices into desired actions, which would lead to the emergence of economic, political, social, and psychological empowerment (Eyben, Kabeer & Cornwall, 2008). The potential for these changes to occur depends upon "context, commitment and capacity" (Kabeer, 2005).
Empowerment studies have lent credence to the concept that women can and should be central actors in social and economic development, but empowerment of an individual or a small group alone might provoke negative reactions when familial, community, and structural factors have not yet adjusted to women's changing roles. Intimate partner violence, for example, has been shown to increase when women's economic empowerment is not complemented with additional interventions that focus on mitigating the potential adverse consequences at the household and community level (Ahmed, 2005; Dalal, 2011). Thus, several studies recommend complementing interventions that aim to empower women with interventions that focus on changing men's gender norms (for example, Barker & Schulte, 2010; Dworkin et al., 2011; Dworkin, Forthcoming).
Studies also suggest that increasing women's monetary contributions to the family without also taking into account the upheaval this might cause with respect to expected gender and domestic responsibilities can lead to increased household and community tensions and decreased emotional well-being for women (Ahmed, 2005; Ahmed & Chowdhury, 2001). Short- and long-term backlash tendencies are, therefore, important to consider when examining the impacts of SHGs on empowerment. Numerous factors can modify the pathways described here. For example, the literature highlights that empowerment can occur at the individual and collective levels (Eyben, Kabeer & Cornwall, 2008). Individual empowerment refers to changes that occur within an individual. Collective empowerment refers to structural changes at the societal level in terms of how relationships and institutions impact households and individuals. Although SHG participation might improve an individual's self-efficacy (individual empowerment), the systematic marginalization of the group might remain unchanged (no collective empowerment). Hence, individual empowerment does not necessarily result in collective empowerment. The economic climate, program fidelity, role of the facilitator, and underlying race, ethnicity, class, and/or caste issues can also affect how program benefits are realized.

WHY IT IS IMPORTANT TO DO THIS REVIEW
Today, women's empowerment is considered an essential component of international development and poverty reduction. The concept of women's empowerment has gained increased attention over the past two decades. It first gained international prominence at the International Conference on Population and Development in Cairo in 1994 and then again at the Fourth World Conference on Women in Beijing in 1995. But the central role of women in development originated in grassroots movements that had begun years earlier.
The international conferences at Cairo and Beijing announced the shift from thinking of women as targets for fertility control policies to acknowledging women as autonomous agents with rights. As a result of these conferences, a broad assessment of women's empowerment throughout the United Nations (UN) system was undertaken. By 2000, when 189 UN member states created eight poverty reduction targets called the Millennium Development Goals, they agreed that "promoting gender equality and empowering women" deserved to be included as a stand-alone goal in addition to the other health and education-related targets (UNDP, 2010). In addition, the UN now assesses the different implications of development planning for women and men and integrates poverty eradication strategies into programs for women (African Women's Development and Communication Network, 2010).
The international conferences at Cairo and Beijing helped shift resources and ideologies toward women's role in development, but the emergence of women's empowerment as a central concept in development was the result of earlier grassroots movements aimed at empowering disenfranchised communities with women playing a central role. Grassroots organizing included the formation of SHGs, which became a central ground for women's activism and participation and helped to shape the changing development landscape in South Asia. Nowadays SHGs are among the most popular programs that aim to stimulate the empowerment of women in South Asia (Jakimow & Kilby, 2006). Although SHGs have a less prominent history in low- and middle-income countries outside South Asia, the formation of SHGs has also diffused to countries in other parts of Asia, sub-Saharan Africa, and Latin America.
The concept of the SHG as a catalyst for change in developing countries was based on the self-help approach pioneered in India in the early 1980s. It emphasized high levels of group ownership, control, and management concerning goals, processes, and outcomes. It has been argued that the very process of making decisions within the group is empowering and can lead to broader development outcomes such as the greater participation of women in local governance and community structures (Mayoux, 1998). For example, in case studies of women's cooperatives in rural Nigeria and rural India, women who were engaged in cooperative activities appeared to be more productive and had higher levels of economic well-being than non-members (Amaza, Kwagbe & Amos, 1999; Datta & Gailey, 2012).
As these smaller SHGs became successful, larger umbrella organizations emerged with the goal of harnessing the energy of smaller groups and advocating for the rights of the poor and of women on the global stage. One example of an umbrella organization is the Self-Employed Women's Association (SEWA), which was launched in the state of Gujarat, India, by female garment workers who first met in a park to discuss their working conditions and eventually organized into a trade union. This project, which was launched in 1972, has included thousands of women and their families (Narayan et al., 2000).
Following the global recognition of the critical role of women in poverty reduction strategies, a wave of microfinance programs and other livelihood support interventions was implemented worldwide, specifically targeting rural women and women's SHGs. As discussed above, a large majority of these programs focus explicitly on empowerment, although the emphasis is sometimes on resolving market failures.
We based our review on the understanding that a great deal of evidence about women's SHGs has already been generated from quantitative and qualitative research, much of which might be useful in informing policy and practice.
Several systematic reviews focus on the impact of microfinance on economic well-being. First, Duvendack and colleagues (2011) reviewed the evidence of the impact of microfinance on the well-being of poor people. The authors found only limited evidence that microfinance improves economic well-being, but felt limited by the lack of rigorous impact evaluations of microfinance. Second, a systematic review by Stewart and colleagues (2010) on the impact of microfinance on poor people in sub-Saharan Africa came to similar conclusions with respect to microcredit. The authors concluded, however, that based on the evidence they included in their review, microsavings appeared to be more effective in improving the well-being of poor people. Following this conclusion, the authors called for more rigorous evidence on the impact of microsavings programs. Third, Stewart and colleagues (2011) reviewed whether microcredit, microsavings, and microleasing serve as effective financial inclusion interventions enabling poor people, and especially women, to engage in meaningful economic opportunities in low- and middle-income countries. The authors once again found mixed results. In some cases, microcredit and microsavings reduced poverty, but not in all circumstances or for all clients. The authors also showed that there was not enough evidence to say that microfinance interventions targeting women exclusively were more successful at reducing poverty than those targeting both men and women.
The findings of these reviews stand in stark contrast to the prevailing positive view about the impact of microfinance on poverty reduction before these reviews and a number of randomized controlled trials (RCTs) were conducted. The prevailing positive view was mostly based on anecdotal evidence and studies that were vulnerable to selection bias (Roodman, 2011). Both donor and nongovernmental organizations promoted microfinance on the basis of an understanding that it reduced poverty and empowered women (White & Waddington, 2012). However, new rigorous evidence from RCTs on the impacts of microcredit on poverty reduction and women's empowerment suggests that the effectiveness of microcredit is at best modest (Attanasio et al., 2015;Augsburg et al., 2015;Banerjee et al., 2015;Crepon et al., 2015).
A recent systematic review on the impact of microcredit on women's bargaining power also suggests that the prevailing positive view on the effects of microcredit on women's empowerment might be overstated (Vaessen et al., 2014). The evidence from the most rigorous studies in that review, including those based on RCTs and credible quasi-experiments, suggested there was no evidence for a causal link between microcredit and women's control over household spending.
There are, however, several mechanisms through which SHGs can improve women's empowerment. Apart from the economic channel, it is also important to focus on the potential effects that group-support and training might have on women's empowerment. We focused on both of these mechanisms in the theory of change described above.
The reviews cited previously were restricted to microcredit and microsavings interventions and did not comprehensively review and synthesize the evidence on the impact of SHGs that included collective finance, enterprise, and/or livelihoods components. In addition, the reviews did not comprehensively cover a range of key empowerment outcomes such as decision making within households, feelings of self-confidence or autonomy, or the ability to exercise control over family planning. Although Vaessen et al.'s review is the only one with an explicit focus on women's empowerment, it does not focus exclusively on SHGs, covers only microcredit interventions, and does not synthesize empowerment outcomes other than women's control over household resources.
The current review focuses on quantitative studies evaluating the impact of SHGs with a broad range of collective finance, enterprise, and livelihood components on political, economic, social, and psychological empowerment in addition to women's control over household resources. This systematic review thus goes beyond determining the effects of microcredit on women's empowerment to ensure that we learn about the credit, saving, group support, and training components of women's SHGs.
In order to identify some of the pathways and moderators, we also included qualitative studies of women's perceptions of the barriers and facilitators to women's empowerment within SHGs. We recognize that heterogeneity in the design and implementation of SHGs makes it difficult to interpret the existing evidence on the impact of SHGs on women's empowerment. Our systematic review assesses the effects of women's SHGs and the pathways and moderators to explain these effects by using a mixed-methods evidence synthesis as in the systematic review on the effects of farmer field schools (Waddington et al., 2014).
The protocol of this study is available through the Campbell Collaboration Library of Systematic Reviews (Brody et al., 2014).

Objectives of the review
The primary objective of this review is to examine the impact of women's economic SHGs on individual-level empowerment for women in low- and middle-income countries, using evidence from rigorous quantitative impact evaluations (review objective 1).
The secondary objective of this review is to examine the perspectives of female participants on factors determining their participation in, and benefits from, economic SHGs in low- and middle-income countries, using evidence from high-quality qualitative evaluations (review objective 2).
Finally, this review aims to refine the theory of change introduced in section 1, which describes how women's economically oriented SHGs lead to women's empowerment, using evidence drawn from both rigorous quantitative impact evaluations and qualitative studies of the perspectives of women who are SHG participants.

CRITERIA FOR INCLUDING STUDIES IN THE REVIEW
We conducted an integrated mixed-methods review that examines data generated through both quantitative and qualitative research methods. We believe this study design will enhance the review's utility and impact for practitioners and policymakers. This approach allowed us to capture a broader range of evidence than a review of quantitative studies alone so that we could answer relevant policy questions more comprehensively.
We included studies in the review that fulfilled the following criteria.

Participants
SHG participants included women of all ages in low- and middle-income countries, as defined by the World Bank classification at the time the data were collected. We included women's SHGs, in which participation was limited exclusively to women, as well as SHGs with mixed membership in which impacts on women were assessed separately from impacts on men. We excluded studies in which impacts were not disaggregated by gender and studies of self-help groups that consisted exclusively of men.

Interventions: type of women's self-help group programs
We included studies on SHGs in which female participants physically came together and received a collective finance and enterprise and/or livelihoods group intervention:
- We defined SHGs, also known as mutual aid or support groups, as groups that involved people who provide support for each other and/or that are created with the underlying assumption that when individuals join together to take action toward overcoming obstacles and attaining social change, individual and/or collective empowerment can result.
- We planned to examine groups that were initiated by an external agency (that is, a development organization or research group) as well as groups that had come into existence without any direct external involvement. In practice, however, all included studies focused on groups that were initiated by an external agency.
- SHGs needed to receive an economic intervention that included the following components: collective finance and enterprise 1 (such as savings and loans, group credit, collective income generation, micro-insurance) and/or livelihoods interventions (such as life skills, capacity building, business training, financial education, labor and trade group organizing). 2

We excluded studies evaluating individual self-help programs, or group programs that were not explicitly designed as self-help programs or that did not have a collective finance, enterprise, or livelihoods intervention component.

Primary outcomes
To be included in the review, studies had to measure at least one of the following empowerment outcomes. 3
Economic empowerment: We defined women's economic empowerment as the ability of women to access, own, and control resources. It could be measured in a variety of ways, using outcome indicators such as income generation by women, female ownership of assets and land, expenditure patterns, women's participation in paid employment, the division of domestic labor between men and women, and women's control over financial decision making.
Political empowerment: We defined political empowerment as the ability to participate in decision making focused on access to resources, rights, and entitlements within communities. It could be measured using indicators such as awareness of rights or laws, political participation such as voting, the ability to own land legally, the ability to inherit property legally, and the ability to gain leadership positions in the government.
Social empowerment: We defined social empowerment as the ability to exert control over decision making within the household. Measures included women's mobility or freedom of movement, freedom from violence, negotiations and discussion around sex, women's control over choosing a spouse, women's control over age at marriage, women's control over family size decision making, and women's access to education.
Psychological empowerment: We defined psychological empowerment as the ability to make choices and act on them. It could be measured using outcome indicators such as self-efficacy or agency; feelings of autonomy; and sense of self-worth, self-confidence, or self-esteem.
The definition of the outcome measures shows that empowerment is a broad concept, even when we divide it into four empowerment constructs. Furthermore, authors of primary studies use a large number of different operational definitions to measure economic, social, political, and psychological empowerment. This variety is not surprising, since the concept is difficult to define, but it meant that we had to be careful in grouping outcome variables when we were not certain whether they measured the same construct. At the same time, the literature on measuring empowerment suggests that empowerment should be considered a latent construct that cannot be measured using one specific outcome variable. Thus, several researchers use an index to measure empowerment (Pitt, Khandker & Cartwright, 2006; Bali Swain & Wallentin, 2009). These indices suggest that different operational definitions of empowerment can be considered part of the same construct. For example, several studies construct indices from variables that measure different elements of women's bargaining power, mobility, family-size decision making, and political as well as psychological empowerment (Pitt et al., 2006; Bali Swain & Wallentin, 2009). Nonetheless, we took seriously the concern that different operational definitions of empowerment cannot always be considered part of the same construct. We therefore used an iterative approach to define our outcome measures. First, we grouped outcome variables under economic, social, psychological, and political empowerment. Second, we synthesized the evidence on the effects of women's SHGs on these four constructs of women's empowerment under the assumption that it is appropriate to group the outcome variables under the same construct.
Third, we analyzed the robustness of the results to excluding studies with outcome measures that might not measure the same construct as the other outcome variables.

Secondary outcomes
We also examined spillover effects from women's SHG participants to nonparticipating women in the same communities on the same outcomes.
In addition, we examined adverse outcomes, including:
- Reduced subjective well-being.

Study types
To answer our review questions, we included studies with study designs and methods of analysis appropriate to each review objective.

Review objective 1: quantitative studies
We included the following study designs: 1) experimental designs using random assignment to the intervention and 2) quasi-experimental designs with non-random assignment (such as regression discontinuity designs, "natural experiments," and studies in which participants self-select into the program). To be included, studies needed to 1) collect longitudinal data (at baseline and endline) and/or cross-sectional (endline) data from treatment and comparison groups; and 2) use propensity score or other matching, difference-in-differences estimation, instrumental variables regression, multivariate cross-sectional regression analysis, or other forms of multivariate analysis (such as the Heckman selection model or multivariate ordinary least squares (OLS) regression analysis) that can correct for selection bias under specific circumstances. We included studies in which data were collected at the individual and/or group level. For studies that used interrupted time series, at least three data points needed to be collected both before and after the intervention. Eligible comparison conditions were no intervention, pipeline, or "business as usual." We also included studies that used the outcomes of women who had been SHG members for only a short time, as defined by the researchers, as the comparison condition, and/or that used the duration of SHG participation as the treatment variable. However, we were not able to include in the meta-analysis three studies that used membership duration as a continuous explanatory variable, because these studies did not allow us to estimate the average impact of SHGs irrespective of how long women had been members. We did, however, analyze the results of these studies narratively. Studies without any type of control or comparison group as outlined were excluded, including single-group pre-post studies, which are likely to provide estimates biased by confounding.

Review objective 2: qualitative studies
We included qualitative studies that explored empowerment from the perspectives of women participants in SHGs using the following methodologies: in-depth interviews, ethnography, participant observation, and focus groups. These studies needed to state an underlying analytical methodology, such as phenomenological analysis or grounded theory; report women's own narratives as direct quotations; and discuss factors that determined women's participation in, and benefits from, economic SHGs. We excluded qualitative studies that did not employ the methodologies listed above or that did not draw on direct quotations from female SHG participants.

Other study characteristics
To ensure that we captured all studies since the emergence of SHGs in the early 1980s, studies reported in any language and conducted between 1980 and February 2014 were eligible. We excluded studies conducted outside this time frame, with the exception of later published versions of working papers that we had already included (Banerjee et al., 2015; Deininger & Liu, 2013).

Electronic searches
To guide this search, we consulted an information retrieval specialist, the Cochrane specialist of a research group at a large university. She gave us guidance on both search sources and search terms and built our PubMed search strategy (below), which we used to develop all subsequent search strategies. The strategy was used to search for both qualitative and quantitative studies.
The literature searches for the quantitative and qualitative studies were conducted together and occurred in two phases. conducting supplemental keyword searches using identified program names and locations, and contacting key experts through an online survey for additional information.
In the second phase of the search, we also conducted supplemental keyword searches in Google.com and Google Scholar based on leads generated by the search described above. For example, if the search identified an article that mentioned (but did not evaluate) a self-help group program run by a microfinance institution (MFI) in the Philippines called Tulay sa Pag-unlad, Inc. (TSPI), we searched Google.com and Google Scholar for "Tulay sa Pag-unlad Inc" combined with several keywords to determine whether additional information on the program, possibly including evaluation results relevant to the analysis, was available.
When we encountered studies that were not in English, we reviewed the English translation of abstracts that were available. We did not encounter any studies that did not have abstracts available in English. No non-English studies that had English abstracts met the inclusion criteria and therefore no further translation was needed.
We also searched the gray literature for dissertations, theses, government reports, nongovernmental organization reports, and funder reports using the following search engines and dissertations and theses. For these additional search engines and the dissertation and thesis sources, we reviewed the first 100 hits ordered by relevance, since scanning titles beyond this point yielded no relevant studies.

Other searches
We electronically searched the collections from UC Berkeley Library and Touro University California.

Search terms
The search strategy was used to search databases and was adjusted to fit the search options available in each database. After discussion and consultation with content experts and search strategists, we included general keywords for the "exposure" and the "outcome" in our search strategy. The labeling of self-help group participation as empowering had to come from the primary researchers. We believe this strategy more accurately represented the evidence base on the impact of self-help groups on empowerment and reduced misclassification bias of our outcomes, because it excluded studies in which the outcome indicators did not reflect empowerment according to the group and participants under study. In practice, this decision excluded studies that did not include the terms "empowerment," "power," or "control" anywhere in their text. Our hand searches and key informant contributions did not produce any additional studies that lacked all of these terms. Thus, we are confident that this restriction did not cause us to miss any major studies that would otherwise have been included. The search strategy was based on several consultations and discussions with our information retrieval specialist. Truncated terms and word stems were also used where appropriate, as shown in the example below.
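The term-based restriction described above amounts to a simple text filter applied to each candidate study. The following is a minimal sketch, assuming the full text is available as a plain string; the function name and regular expression are hypothetical and illustrate only the "empowerment," "power," or "control" rule, not the database search strategy itself. Truncation is emulated by allowing any word-character suffix after each stem.

```python
import re

# Hypothetical helper: the three screening terms named in the text, with
# database-style truncation ("power*") emulated by a trailing \w*.
REQUIRED_STEMS = ("empower", "power", "control")
_PATTERN = re.compile(
    r"\b(?:" + "|".join(REQUIRED_STEMS) + r")\w*", flags=re.IGNORECASE
)


def mentions_required_terms(text: str) -> bool:
    """Return True if the text contains at least one required term or a truncated variant."""
    return _PATTERN.search(text) is not None


if __name__ == "__main__":
    print(mentions_required_terms("Self-help groups and women's empowerment"))
    print(mentions_required_terms("Savings groups and household welfare"))
```

Note that a stem such as "control" also matches "controlled," so in practice a filter like this is permissive; it only removes texts that contain none of the three stems.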
An example of our search strategy that was used to search the PubMed database is as follows:

Selection of studies
In the first stage, two team members independently reviewed titles and abstracts or executive summaries (where available) and excluded all references that were not relevant. Disagreements about inclusion were resolved through discussion; where the two reviewers still disagreed, a third independent team member resolved the disagreement.
In the second stage, two team members worked independently to apply the specified inclusion criteria to the remaining full-text studies to determine whether the study should be included for analysis. Discrepancies between the two reviewers' assessments were reviewed by a senior team member for a decision.
Each study was first assessed preliminarily for full-text review. The studies selected in this way were retrieved, read in detail, and screened again by four different reviewers.

Data extraction and management
Two team members working independently extracted information from each quantitative or qualitative study included in the review. Both team members used a pre-piloted data extraction form and the data were summarized in a table. Disagreements in coding were resolved through discussion. Study-, group-, outcome-, and effect-level data extraction and coding forms guided the data extraction (Appendix 1: Data extraction form).

Review objective 1: quantitative studies
Two independent reviewers assessed the quantitative studies for rigor using an adaptation of a set of criteria developed by 3ie to assess risk of bias in experimental and quasi-experimental studies (Hombrados & Waddington, 2012). The critical appraisal tool assessed the likely risk of the following biases:
1. Selection bias and confounding, based on the quality of attribution methods (mechanisms of assignment/identification) and the assessment of group equivalence
2. Performance bias, based on the extent of spillovers to women in comparison groups
3. Outcome and analysis reporting biases
4. Other biases, including:
   a. Detection bias and placebo effects
   b. Motivation and courtesy biases (Hawthorne effect and John Henry effect)
   c. Coherence of results
   d. Retrospective baseline data collection
   e. Other biases, such as strong researcher involvement in the implementation of the intervention and the use of cash transfers as a compensating mechanism for participating in an intervention

The risk of bias assessment tool can be found in Appendix 6. For each of these categories, we judged whether a study was at high, medium, or low risk of bias.
We reread studies several times if something was unclear and made maximal use of all the information available in the studies. We based our assessments on the reporting in individual papers, erring on the side of caution. For example, in cases in which the selection of participants was not clear, we classified the study as being at high risk of selection bias. In general, wherever the risk of bias was unclear, we treated this as an indication of high risk of bias.
We reported risk of bias assessment for each included study, conducting sensitivity analyses in the meta-analysis by each risk of bias domain. For example, we conducted meta-regressions to assess whether there were either substantive or statistically significant differences between low, medium, and high risks of selection bias and confounding, performance bias, outcome and analysis reporting bias, and other biases. Based on these analyses, we then determined our preferred specification for the meta-analysis. An overview of risk of bias assessment of included effectiveness studies by risk of bias category and by category of bias can be found in Appendix 7 and Appendix 8.
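With a single binary moderator (for example, high versus low or medium risk of selection bias), an inverse-variance weighted meta-regression reduces to comparing the weighted mean effects of the two subgroups: the slope equals their difference and its standard error is the square root of the summed inverse total weights. The sketch below assumes known within-study standard errors and an optional between-study variance `tau2` supplied by the user; the function name and inputs are hypothetical, and the review's own meta-regressions may have used other covariates or estimators.

```python
import math
from statistics import NormalDist


def binary_moderator_test(effects, ses, is_high_risk, tau2=0.0):
    """Weighted meta-regression of effect sizes on one 0/1 moderator.

    Returns (slope, se_slope, two-sided p-value). With a single dummy
    regressor, the inverse-variance WLS slope equals the difference
    between the weighted mean effects of the two groups.
    """
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # group -> [sum of w*y, sum of w]
    for y, se, flag in zip(effects, ses, is_high_risk):
        w = 1.0 / (se ** 2 + tau2)  # random-effects weight if tau2 > 0
        g = 1 if flag else 0
        sums[g][0] += w * y
        sums[g][1] += w
    if sums[0][1] == 0 or sums[1][1] == 0:
        raise ValueError("both moderator groups need at least one study")
    mean0 = sums[0][0] / sums[0][1]
    mean1 = sums[1][0] / sums[1][1]
    slope = mean1 - mean0
    se_slope = math.sqrt(1.0 / sums[0][1] + 1.0 / sums[1][1])
    z = slope / se_slope
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return slope, se_slope, p_value
```

A non-significant slope (together with a small substantive difference) would support pooling studies across risk of bias levels; a large one would argue for a restricted preferred specification.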

Review objective 2: qualitative studies
We assessed the quality of included studies using the 9-item Critical Appraisal Skills Programme Qualitative Research Checklist (CASP, 2013), making judgments on the adequacy of the stated aims, the data collection methods, the analysis, the ethical considerations, and the conclusions drawn. The full checklist can be found in Appendix 5. For each item, two researchers determined whether the study had adequately met the item and gave a "yes," "no," or "can't tell" response. If the researchers disagreed, they discussed the item until they reached consensus. Studies with 0-2 "no" or "can't tell" responses were considered at low risk of bias, studies with 3-5 such responses at medium risk of bias, and studies with 6-9 such responses at high risk of bias. An overview of the risk of bias assessment of included qualitative studies by item can be found in Appendix 9.
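The classification rule for the CASP checklist can be expressed as a short function that counts the items not clearly met. This is a minimal sketch; the function name and the string labels for responses are assumptions for illustration, while the 0-2 / 3-5 / 6-9 thresholds follow the text.

```python
from collections import Counter

VALID_RESPONSES = {"yes", "no", "can't tell"}


def casp_risk_of_bias(responses):
    """Classify a qualitative study from its 9 CASP item responses."""
    if len(responses) != 9:
        raise ValueError("expected one response per CASP item (9 items)")
    counts = Counter(r.strip().lower() for r in responses)
    unknown = set(counts) - VALID_RESPONSES
    if unknown:
        raise ValueError(f"unexpected responses: {sorted(unknown)}")
    # Items not clearly met: "no" plus "can't tell"
    not_met = counts["no"] + counts["can't tell"]
    if not_met <= 2:
        return "low"
    if not_met <= 5:
        return "medium"
    return "high"
```

For example, a study with seven "yes" responses, one "no," and one "can't tell" has two items not clearly met and is classified as low risk of bias.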

Measures of treatment effects
We extracted information from each quantitative study to allow for the estimation of standardized effect sizes across studies to the extent possible. In addition, we calculated standard errors and 95 per cent confidence intervals if the information from the studies allowed for this. We conducted the sample size calculations in a consistent way to ensure comparability across studies.
The quantitative studies in our review showed substantial variation in the way they measured empowerment, even in those cases in which the studies measured the same construct. This variation was not surprising as there is no consensus as to how to measure economic, psychological, social and/or political empowerment. As discussed in our section on outcome measures, we used an iterative approach to determine whether outcome measures should be considered part of the same measurement construct. First, we grouped outcome variables under economic, social, psychological, and political empowerment. Then we synthesized the evidence based on this grouping. Finally, we conducted additional analyses to determine whether the results are robust to excluding studies with outcome measures that might not measure the same construct as the other outcome variables.
Because the studies measured empowerment in different ways, they also used different measurement scales. Several studies used dichotomous variables to measure empowerment, whereas other studies used continuous variables or indexes to measure empowerment.
Because of the different measurement scales, we report two types of effect sizes:
1. Standardized mean differences (Hedges' g)
2. Odds ratios
First, we calculated the Hedges' g sample-size-corrected standardized mean differences (SMDs) for continuous outcome variables, which measure the effect size in units of standard deviation of the outcome variable. Second, we calculated odds ratios (ORs) for dichotomous outcome variables. The odds ratio is the ratio of the odds of an event occurring in the group of beneficiaries to the odds of the same event occurring in the comparison group (Bland & Altman, 2000). We converted the odds ratios to log odds ratios and the log odds ratios to standardized mean differences in order to make the effect sizes for continuous and dichotomous outcome variables comparable to each other. We describe the procedure for calculating the effect sizes in more detail in Appendix 10.
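The two effect size metrics and the conversion between them can be sketched as follows. Hedges' g uses the pooled standard deviation and the small-sample correction factor J = 1 - 3 / (4(n1 + n2) - 9). For the log odds ratio to SMD step, the sketch assumes the widely used logistic-distribution conversion, d = ln(OR) × √3 / π; this is an assumption, and the exact procedure in Appendix 10 may differ in detail. Function names are hypothetical.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Small-sample-corrected standardized mean difference and its SE."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # correction factor J
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2.0 * (n_t + n_c))
    return j * d, j * math.sqrt(var_d)


def log_odds_ratio(events_t, nonevents_t, events_c, nonevents_c):
    """Log odds ratio and its SE from a 2x2 table (all cells assumed > 0)."""
    log_or = math.log((events_t * nonevents_c) / (nonevents_t * events_c))
    se = math.sqrt(1 / events_t + 1 / nonevents_t + 1 / events_c + 1 / nonevents_c)
    return log_or, se


def log_or_to_smd(log_or, se_log_or):
    """Assumed logistic conversion: d = ln(OR) * sqrt(3) / pi (SE scaled alike)."""
    factor = math.sqrt(3.0) / math.pi
    return log_or * factor, se_log_or * factor
```

For instance, with 20 of 50 treated women and 10 of 50 comparison women reporting an outcome, `log_odds_ratio(20, 30, 10, 40)` gives ln(8/3), which `log_or_to_smd` then places on the same scale as Hedges' g.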
We converted all effect sizes to standardized mean differences to ensure we could use studies with different measurement scales in the same analysis. We found it appropriate to use dichotomous variables and continuous variables in the same meta-analysis because, in our case, variables with different measurement scales measured the same construct.

Methods for handling dependent effect sizes
We included only one effect size per study in any single meta-analysis. In one case, two different studies presented information about the effectiveness of the same program in South Africa; there, we extracted effect sizes from the study that presented the most recent information (Kim et al., 2009). A study from Ethiopia presented two impact estimates for two different regions (Desai & Tarozzi, 2011). For this study, we pooled the two regional estimates into a single summary effect size using a random-effects meta-analysis, to prevent bias from dependency between the two estimates. We used a random-effects model because the two regions in Ethiopia can be regarded as two different contexts. We included this summary effect size in the final meta-analysis.
Where studies reported more than one effect size based on different statistical methods, we selected the effect size with the lowest risk of bias. We applied this rule to a study in India in which the authors used both propensity score matching and instrumental variable regression to estimate the impact of the program . A priori, it was not clear which method had the lowest risk of bias. However, the effect size calculation showed that the instrumental variable regression did not yield valid effect sizes, because predicted empowerment values for dichotomous variables fell outside the range of 0 to 1. Although the instrumental variable estimates might have offered qualitatively interesting findings, the instrumental variable linear probability model did not produce unbiased impact estimates, so we rated the risk of bias of that effect size as high. We therefore used the impact estimates from the propensity score matching model, which we rated as medium risk of bias.
Other studies presented several impact estimates for different variables that could be argued to measure the same construct. In those cases, we used either the variable that we considered the best approximation of the construct or a sample-size weighted average to obtain a "synthetic effect size." For example, for the study by Kim et al. (2009), we constructed sample-size weighted averages of the impacts on self-confidence and financial confidence for psychological empowerment, and of the impacts on challenging gender norms and autonomy in decision making for social empowerment. In these cases, we used the average of the standard errors (without weighting for sample size) to estimate the pooled standard deviation. Similarly, for the study by , we calculated a sample-size weighted average for social empowerment by averaging the effects on women's autonomy to go to the market without their husbands' permission and women's autonomy to go to the doctor without their husbands' permission.
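The synthetic effect size described above combines a sample-size weighted mean of the point estimates with a simple, unweighted mean of the standard errors. A minimal sketch follows; the function name and input layout are hypothetical. Averaging standard errors in this way is a simplification that ignores the correlation between the combined outcomes.

```python
def synthetic_effect_size(effects, ses, sample_sizes):
    """Combine several effect sizes for one construct within a single study.

    Point estimate: sample-size weighted mean of the effects.
    Standard error: unweighted mean of the SEs, as described in the text.
    """
    if not effects or not (len(effects) == len(ses) == len(sample_sizes)):
        raise ValueError("need equal-length, non-empty inputs")
    total_n = sum(sample_sizes)
    es = sum(e * n for e, n in zip(effects, sample_sizes)) / total_n
    se = sum(ses) / len(ses)
    return es, se
```

For two outcomes with effects 0.2 (n = 100) and 0.4 (n = 300) and standard errors 0.1 and 0.3, this yields a synthetic effect of 0.35 with a standard error of 0.2.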

Unit of analysis issues
Where the reported standard errors did not account for clustering (that is, where the outcome variables were likely to be clustered at a higher level of aggregation than the individual or household level, but this was not taken into account in estimating the standard errors and confidence intervals), we used adjusted standard errors. For these studies at risk of unit of analysis error, we corrected the standard errors and confidence intervals using the variance inflation factor (Higgins & Green, 2011):
$$\mathrm{SE}_{\text{adjusted}} = \mathrm{SE} \times \sqrt{1 + (m - 1) \times \mathrm{ICC}}$$
Here, m is the number of observations per cluster and ICC is the intracluster correlation coefficient.
For the ICC, we used estimated ICCs for empowerment outcomes from a primary study in Odisha, India, that was also included in the systematic review . These ICCs were likely to be similar to ICCs in other studies, given the large number of studies from India included in our systematic review. We were able to obtain the original data from the study in Odisha because one of our co-authors was also an author of this primary study . From the study in Odisha, we estimated an average ICC of 0.057 for empowerment outcomes (0.053 for social empowerment, 0.068 for psychological empowerment, 0.017 for economic empowerment, and 0.088 for measures of intimate partner violence). We used the average ICC of 0.057 to correct the standard errors of political empowerment outcomes, for which we did not have an ICC estimate, and the outcome-specific ICCs to correct the standard errors for intimate partner violence and for social, psychological, and economic empowerment, respectively. If cluster size was not reported, we estimated it by dividing the number of participants in each analysis (or, if that was not available, the total number of study participants) by the number of clusters. We applied this method to correct standard errors for 9 included studies (Ahmed, 2005; Mahmud, 1994; Nessa et al., 2012; Osmani, 2007; Rosenberg et al., 2011; Sherman et al., 2010; Steel et al., 1998; Swendeman et al., 2009; Kim et al., 2009).
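The clustering correction can be sketched by combining the design effect 1 + (m - 1) × ICC with the ICC values and the cluster-size rule given above. The ICC figures come from the text; the dictionary keys and function names are hypothetical labels chosen for illustration.

```python
import math

# ICC values reported in the text (Odisha study); keys are hypothetical labels.
ICC_BY_OUTCOME = {
    "social": 0.053,
    "psychological": 0.068,
    "economic": 0.017,
    "intimate_partner_violence": 0.088,
}
DEFAULT_ICC = 0.057  # average ICC, used for political empowerment outcomes


def average_cluster_size(n_participants, n_clusters):
    """Estimate m as participants divided by clusters when m is not reported."""
    return n_participants / n_clusters


def cluster_adjusted_se(se, m, icc):
    """Inflate an SE by the square root of the design effect 1 + (m - 1) * ICC."""
    design_effect = 1.0 + (m - 1.0) * icc
    return se * math.sqrt(design_effect)


def adjust_for_outcome(se, outcome, n_participants, n_clusters):
    """Apply the outcome-specific ICC, falling back on the average ICC."""
    icc = ICC_BY_OUTCOME.get(outcome, DEFAULT_ICC)
    m = average_cluster_size(n_participants, n_clusters)
    return cluster_adjusted_se(se, m, icc)
```

For example, a political empowerment estimate with an unadjusted SE of 0.1 from 600 women in 30 groups (m = 20) would have its SE inflated by a factor of √(1 + 19 × 0.057), or roughly 1.44.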

Dealing with missing data
If the data needed to calculate effect sizes were not available in the included studies, we attempted to contact the authors of the studies. In cases in which we were not able to retrieve the missing data, we extracted or imputed effect sizes and associated standard errors from commonly reported statistics, such as t or F statistics, p-values, or z-values, using David Wilson's practical meta-analysis effect size calculator. Where studies did not report sample sizes for the treatment and the control or comparison group, we assumed equal sample sizes across the groups.
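Two of these back-calculations can be sketched with standard formulas: an SMD from an independent-groups t statistic, d = t × √(1/n1 + 1/n2), splitting the total sample equally when group sizes are missing; and a standard error recovered from a two-sided p-value, assuming a normally distributed test statistic. This sketch is not a reproduction of the calculator named above, and the function names are hypothetical. When a study reports only significance stars, passing the star threshold (for example p = 0.05) yields the largest standard error consistent with that level, which is a conservative choice.

```python
import math
from statistics import NormalDist


def d_from_t(t_stat, n_treat=None, n_comp=None, n_total=None):
    """SMD from an independent-groups t statistic: d = t * sqrt(1/n1 + 1/n2).

    If group sizes are missing, n_total is split equally between the groups.
    """
    if n_treat is None or n_comp is None:
        if n_total is None:
            raise ValueError("need group sizes or a total sample size")
        n_treat = n_comp = n_total / 2.0
    return t_stat * math.sqrt(1.0 / n_treat + 1.0 / n_comp)


def se_from_p(estimate, p_value):
    """Back out an SE from a two-sided p-value, assuming a normal test statistic."""
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return abs(estimate) / z
```

For example, `d_from_t(2.0, n_total=400)` returns 0.2, and `se_from_p(1.96, 0.05)` returns a standard error of approximately 1.0.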
We faced several challenges with missing data in the calculation of effect sizes. First, most studies with a dichotomous dependent variable used a linear probability model rather than logit or probit regression to estimate the effectiveness of self-help groups. Fortunately, marginal effects from linear probability models typically differ little in practice from those of nonlinear logit and probit models (Angrist & Pischke, 2009), which allowed us to estimate odds ratios under the assumption of linearity when calculating the standardized effect sizes. We applied this method to calculate effect sizes from linear probability models for dichotomous outcome variables for several studies (Desai & Tarozzi, 2011; Desai & Joshi, 2012).
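Under the linearity assumption, a linear probability model coefficient can be read as a risk difference and translated into an odds ratio once the comparison-group proportion is known. The sketch below assumes that this proportion (for example, the comparison-group mean of the dichotomous outcome) is available; the function name is hypothetical. It also rejects implied probabilities outside the unit interval, the same kind of problem that made the instrumental variable estimates discussed earlier unusable.

```python
def odds_ratio_from_lpm(beta, p_comparison):
    """Odds ratio implied by a linear probability model coefficient.

    Assumes the coefficient is a risk difference, so the treated
    probability is p_comparison + beta (must lie strictly in (0, 1)).
    """
    p_treat = p_comparison + beta
    for p in (p_comparison, p_treat):
        if not 0.0 < p < 1.0:
            raise ValueError("implied probabilities must lie strictly between 0 and 1")
    odds_treat = p_treat / (1.0 - p_treat)
    odds_comp = p_comparison / (1.0 - p_comparison)
    return odds_treat / odds_comp
```

For example, a coefficient of 0.2 with a comparison-group proportion of 0.3 implies a treated proportion of 0.5 and an odds ratio of 7/3 (about 2.33), which can then be converted to a log odds ratio and a standardized mean difference.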
Furthermore, a number of studies did not report the standard deviation of a dichotomous outcome variable but did report the full distribution of the variable. In these cases, where the sample sizes and the proportions of events and non-events were available, we calculated the variance, the standard deviation, and the effect size from this information.
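For a 0/1 outcome, the standard deviation follows directly from the event proportion p as √(p(1 - p)). The sketch below shows both the population form and a sample form with an n - 1 denominator; which form the review used is not stated, and with the sample sizes typical of these studies the difference is negligible. Function names are hypothetical.

```python
import math


def binary_sd(p):
    """SD of a 0/1 variable with event proportion p: sqrt(p * (1 - p))."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion")
    return math.sqrt(p * (1.0 - p))


def binary_sd_from_counts(events, n):
    """Sample SD (n - 1 denominator) of a 0/1 variable from event counts."""
    if n < 2 or not 0 <= events <= n:
        raise ValueError("need n >= 2 and 0 <= events <= n")
    p = events / n
    return math.sqrt(p * (1.0 - p) * n / (n - 1))
```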
We also included an effect size from an ordered probit regression model, under the assumption that the effect size would be approximately the same had the authors used an ordinary least squares regression model; that is, we assumed that the point estimate from the ordered probit model would give a good estimate of the mean difference. 4 One study reported impact estimates using propensity score matching, but estimating the effect size was not feasible without information about the standard deviation (Deininger & Liu, 2013). For this study from India, we imputed the missing standard deviations of the dichotomous outcome variables with standard deviations of similar outcome variables used in other studies in India (e.g., Banerjee et al., 2015). Finally, for one study that reported neither standard errors nor exact p-values for its regression analysis, but only significance stars (Mahmud, 1994), we estimated the standard errors from the indicated significance levels.
Following all the conversions, we were able to increase the number of studies in the meta-analysis to 16 in total.
We were not able to include all studies in the meta-analysis. Two studies reported only whether results were significant, without the associated point estimates and standard errors. These studies did not report t-statistics or p-values either, so we were not able to estimate effect sizes (Husain, Mukherjee & Dutta, 2010; Mukherjee & Kundu, 2012). Another study showed separate time trends in latent outcome variables for the treatment and comparison groups (Bali Swain & Wallentin, 2009). These time trends alone did not allow us to extract effect sizes, particularly because the latent variables were constructed separately for the treatment and the comparison group, which also raises significant concerns about the validity of the results. Finally, four studies did not assess the impact of SHG membership but instead assessed the relationship between the length of time women had been members of self-help groups (for example, in months) and women's empowerment (Coleman, 2002; Garikipati, 2008; Garikipati, 2012; Holvoet, 2005). These studies did not allow us to estimate the average impact of women's self-help groups on women's empowerment. Nonetheless, we discuss their results narratively in our quantitative synthesis.
In those cases in which we were not able to calculate the effect size, we contacted the authors with a request for the necessary information to calculate the effect size.

Data synthesis
We conducted an integrated mixed-methods review in order to benefit from data generated through both quantitative and qualitative research and to enhance the review's utility and impact for policymakers. An integrated review has three stages: 1) a synthesis of quantitative effects, 2) a synthesis of relevant qualitative evidence, and 3) a synthesis of both summaries that "goes beyond" the primary studies and generates new interpretations or hypotheses (Harden, 2010;Thomas et al., 2004). We conducted a meta-analysis with the data extracted from quantitative studies, and used meta-synthesis methods to synthesize the textual data extracted from the qualitative studies. We then integrated the findings from the qualitative synthesis with those from the quantitative studies to develop a framework for assessing how economic self-help groups might impact women's empowerment.

Quantitative synthesis
For our quantitative synthesis (review objective 1), we statistically combined the effect sizes and associated standard errors from the quantitative studies that assessed the impact of self-help group programs on women's empowerment and reported sufficient information for meta-analysis (16 of the 23 included quantitative studies). We only combined studies whose empowerment indicators could be considered sufficiently similar. Hence, we conducted separate meta-analyses for studies that focused on economic, social, psychological, and political empowerment. Because we consider these empowerment dimensions to be distinct constructs, we did not consider it appropriate to combine them in a single meta-analysis.
We used inverse-variance weighted random-effects meta-analysis and established statistical techniques to analyze heterogeneity. We chose a random-effects rather than a fixed-effect model to allow for contextual and methodological heterogeneity in the effect sizes.
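As a minimal sketch of inverse-variance weighted random-effects pooling, the Python function below uses the DerSimonian-Laird moment estimator of the between-study variance tau-squared. The text does not state which tau-squared estimator was used, so this choice is an assumption (restricted maximum likelihood is a common alternative), and the function name is hypothetical. Inputs are study effect sizes on a common scale and their standard errors.

```python
import math
from typing import Sequence


def random_effects_dl(effects: Sequence[float], ses: Sequence[float]) -> tuple[float, float, float]:
    """Inverse-variance random-effects pooling with the DerSimonian-Laird tau^2.

    Returns (pooled effect, its standard error, tau^2).
    """
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    variances = [se ** 2 for se in ses]
    w_fe = [1.0 / v for v in variances]                    # fixed-effect weights
    sum_w = sum(w_fe)
    mean_fe = sum(w * y for w, y in zip(w_fe, effects)) / sum_w
    q = sum(w * (y - mean_fe) ** 2 for w, y in zip(w_fe, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in w_fe) / sum_w
    tau2 = max(0.0, (q - df) / c)                          # truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(w * y for w, y in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, se_pooled, tau2


# 95% confidence interval from the pooled estimate (normal approximation).
est, se, tau2 = random_effects_dl([0.0, 1.0], [0.1, 0.1])
ci = (est - 1.96 * se, est + 1.96 * se)
```

Each study's weight, 1/(SE² + tau²), becomes more equal across studies as tau² grows, which is why a random-effects model gives small studies relatively more weight than a fixed-effect model does.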
With respect to spillovers, we were unfortunately not able to report and synthesize effect sizes separately for women's self-help group participants and neighboring women who might indirectly benefit from the intervention. None of the included studies separately reported these effect sizes.

Assessment of heterogeneity
We explored heterogeneity across studies, with an emphasis on social and economic empowerment, using the Q statistic, I-squared, and tau-squared, as well as visual inspection of the forest plots (Borenstein et al., 2009). The results suggested considerable heterogeneity in the effect sizes, although less so for impacts on economic empowerment. This result was not surprising, since a substantial number of existing studies argue that there is significant heterogeneity in the effectiveness of community-based programs, such as women's self-help group interventions. This heterogeneity could be related to several contextual characteristics, such as diverging gender norms across contexts and differences in the capacity to implement community-based programs (for example, De Hoop, 2012; Mansuri & Rao, 2004; Woolcock, 2013).
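The heterogeneity statistics named above can be computed from the same inputs as the pooled estimate. The sketch below (function name hypothetical) returns Cochran's Q, its degrees of freedom, I-squared as a percentage of total variability attributable to between-study differences rather than chance, and the DerSimonian-Laird tau-squared on the effect-size scale. A p-value for Q would come from a chi-squared distribution with df degrees of freedom (for example, `scipy.stats.chi2.sf`), which is omitted here to keep the block dependency-free.

```python
from typing import Sequence


def heterogeneity(effects: Sequence[float], ses: Sequence[float]) -> dict:
    """Cochran's Q, degrees of freedom, I-squared (%) and DerSimonian-Laird tau^2."""
    w = [1.0 / se ** 2 for se in ses]                  # inverse-variance weights
    sum_w = sum(w)
    mean_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - mean_fe) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I^2 = (Q - df) / Q, truncated at zero and expressed as a percentage.
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    return {"Q": q, "df": df, "I2": i2, "tau2": tau2}
```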
It was not possible to explore heterogeneity in the impact of self-help groups on political and psychological empowerment. The number of studies focusing on these indicators was not sufficient for a reliable assessment of the heterogeneity in the impact estimates, either with a meta-analysis or with a narrative synthesis.

Investigation of heterogeneous effects for subgroups
We also investigated factors explaining heterogeneity by using inverse-variance weighted meta-regressions and stratified meta-analysis according to contextual and methodological moderator variables. We used two contextual moderating variables: type of intervention component; and geographic location.
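A single-moderator meta-regression of the kind described above can be sketched as weighted least squares with weights 1/(SE² + tau²). This is a simplified illustration rather than the review's software: tau² (the residual between-study variance) is supplied by the caller instead of being estimated, no Knapp-Hartung or other small-sample adjustment is applied, and the moderator coding (for example, 1 = South Asia, 0 = elsewhere) and function name are hypothetical.

```python
import math
from typing import Sequence


def meta_regression_one_moderator(
    effects: Sequence[float],
    ses: Sequence[float],
    moderator: Sequence[float],
    tau2: float = 0.0,
) -> dict:
    """Weighted least squares of effect sizes on a single moderator.

    Weights are 1 / (se^2 + tau2); tau2 is the residual between-study
    variance supplied by the caller (ASSUMPTION: not estimated here).
    """
    w = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    if sxx == 0:
        raise ValueError("the moderator does not vary across studies")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx                      # for a 0/1 moderator: difference in weighted means
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)        # known-variance weights, no residual rescaling
    return {"intercept": intercept, "slope": slope, "se_slope": se_slope, "z": slope / se_slope}
```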
We used a narrative synthesis to explore heterogeneity in the results for these subgroups because our sample of studies was relatively small. For this analysis, we integrated the findings of the qualitative analysis with those of the quantitative analysis to the extent possible. Hence, the potential catalysts of, and constraints on, the effectiveness of self-help groups that we present came from both the quantitative and the qualitative studies.

Sensitivity analysis
We performed an extensive sensitivity analysis for two methodological effect size moderators, including the risk of bias status for each risk of bias category (where sufficient studies were available).
We used an iterative approach, based on the risk of bias assessment discussed previously, to determine whether studies with different evaluation designs and different outcome measures could be combined. First, we conducted stratified meta-analyses for the randomized controlled trials and the quasi-experimental evaluations in our sample. Second, we conducted meta-analyses for experimental and quasi-experimental studies with low, medium, and high risk of bias, respectively. Third, we compared the effect sizes of the different analyses to determine whether studies could potentially be combined into a single meta-analysis. Where we were uncertain whether studies could be combined in a single meta-analysis, we conducted several meta-regressions to inform the decision. We combined studies in a single analysis when the meta-regression did not show substantively or statistically significant differences in effect sizes between studies with different risks of bias. In addition, we conducted robustness checks to determine whether studies with different outcome measures that potentially measure different empowerment constructs within the same empowerment domain (economic, social, psychological, or political empowerment) could be combined in a single analysis.
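The stratified meta-analyses described above (pooling randomized and quasi-experimental studies separately, or studies with low, medium, and high risk of bias separately) can be sketched as follows. The grouping labels, the function names, and the illustrative numbers in the usage line are hypothetical, not results from the review, and a single common tau-squared is assumed for all strata for simplicity.

```python
import math
from collections import defaultdict
from typing import Iterable, Sequence


def pool(effects: Sequence[float], ses: Sequence[float], tau2: float = 0.0) -> tuple[float, float]:
    """Inverse-variance pooled estimate and SE with weights 1 / (se^2 + tau2)."""
    w = [1.0 / (se ** 2 + tau2) for se in ses]
    est = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return est, math.sqrt(1.0 / sum(w))


def stratified(studies: Iterable[tuple[str, float, float]], tau2: float = 0.0) -> dict:
    """studies: (stratum label, effect, se) triples. Returns {label: (est, se, k)}."""
    groups = defaultdict(list)
    for label, effect, se in studies:
        groups[label].append((effect, se))
    out = {}
    for label, rows in groups.items():
        est, se = pool([y for y, _ in rows], [s for _, s in rows], tau2)
        out[label] = (est, se, len(rows))      # k = number of studies in the stratum
    return out


# Hypothetical illustration only; these are not estimates from the review.
example = stratified([("low", 0.10, 0.05), ("medium", 0.15, 0.08),
                      ("high", 0.40, 0.10), ("high", 0.35, 0.12)])
```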
We decided not to conduct meta-regressions with more than one explanatory variable because of the relatively small number of studies. Instead, we chose an iterative method in which we conducted several single-moderator meta-regressions in turn to determine whether the results from studies with different methodologies and different risks of bias were sufficiently similar to combine in one meta-analysis. We started with a meta-regression to determine whether studies with a low or a high risk of selection bias were sufficiently similar to each other. When the meta-regression showed substantively or statistically significant differences between studies with a low and a high risk of selection bias, we excluded studies with a high risk of selection bias from the analyses; when it did not, we kept them. We then ran a meta-regression comparing randomized controlled trials with quasi-experimental studies with a medium risk of selection bias (our synthesis did not include quasi-experimental studies with a low risk of selection bias or RCTs with a high risk of selection bias). Similarly, we excluded quasi-experimental studies with a medium risk of selection bias from the analyses if the meta-regression suggested that their findings differed substantively or statistically significantly from those of the RCTs, but we combined the studies in a single meta-analysis if the findings were not substantively or statistically significantly different.
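The iterative rule described above can be expressed as one screening step that is repeated for each comparison (high versus low risk of selection bias, then RCTs versus medium-risk quasi-experimental studies, and so on). The sketch compares the two groups with a z-test on the difference between their pooled estimates; with one binary moderator and a common tau-squared, this gives the same slope and standard error as an inverse-variance weighted meta-regression on that moderator. The text does not define a "substantive" difference, so `min_substantive_diff` is a hypothetical placeholder, as are the class and function names.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    name: str
    effect: float
    se: float
    higher_risk: bool  # True if the study is in the higher-risk group tested at this step


def _pool(studies: list, tau2: float) -> tuple[float, float]:
    """Inverse-variance pooled estimate and SE with weights 1 / (se^2 + tau2)."""
    w = [1.0 / (s.se ** 2 + tau2) for s in studies]
    est = sum(wi * s.effect for wi, s in zip(w, studies)) / sum(w)
    return est, math.sqrt(1.0 / sum(w))


def screening_step(studies: list, min_substantive_diff: float = 0.1,
                   alpha: float = 0.05, tau2: float = 0.0) -> list:
    """Drop the higher-risk group if its pooled effect differs from the
    lower-risk group substantively OR statistically; otherwise keep everything."""
    lower = [s for s in studies if not s.higher_risk]
    higher = [s for s in studies if s.higher_risk]
    if not lower or not higher:
        return list(studies)                        # nothing to compare at this step
    b_lo, se_lo = _pool(lower, tau2)
    b_hi, se_hi = _pool(higher, tau2)
    diff = b_hi - b_lo
    z = diff / math.sqrt(se_lo ** 2 + se_hi ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))          # two-sided normal p-value
    if p < alpha or abs(diff) >= min_substantive_diff:
        return lower
    return list(studies)
```

Running the step once per comparison, re-flagging `higher_risk` each time, reproduces the sequential logic without fitting any model with more than one moderator.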
We applied the same approach to the other risk of bias domains (performance bias, outcome reporting bias, and other biases). This led to a preferred specification that combined randomized controlled trials with a low risk of bias with those randomized controlled trials and quasi-experimental studies with a higher (medium or high) risk of bias whose effects were not substantively or statistically significantly different from those of the low-risk randomized controlled trials. Throughout, we only combined RCTs and quasi-experimental studies in one meta-analysis when they showed similar results, to account for the possibility of selection bias in quasi-experimental studies.
We used a similar approach to determine whether studies using different outcome measures for the same empowerment construct (economic, social, political, and psychological empowerment) could be combined in one meta-analysis. For this decision, we estimated the meta-analysis with and without the study or studies using a different outcome measure. If the analysis without those studies showed substantively different effects from the analysis with them, we excluded them from the meta-analysis or ran a separate meta-analysis.
In addition, we performed a sensitivity analysis to determine whether studies with outcome measures that could potentially capture different empowerment constructs could be combined in the same meta-analysis, by running the meta-analysis with and without the studies that used different outcome variables.
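The with-and-without sensitivity analysis can be sketched as pooling the full set of studies and then re-pooling after dropping the flagged studies (here, hypothetically, those using a different outcome measure). The sketch only reports both estimates; judging whether they differ substantively is left to the reviewer, as in the text. Inverse-variance pooling with a caller-supplied tau-squared and the function names are simplifying assumptions.

```python
import math
from typing import Sequence


def pool(effects: Sequence[float], ses: Sequence[float], tau2: float = 0.0) -> tuple[float, float]:
    """Inverse-variance pooled estimate and SE with weights 1 / (se^2 + tau2)."""
    w = [1.0 / (se ** 2 + tau2) for se in ses]
    est = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return est, math.sqrt(1.0 / sum(w))


def with_without(effects: Sequence[float], ses: Sequence[float],
                 flagged: Sequence[bool], tau2: float = 0.0):
    """Pool all studies, then pool again without the flagged ones.

    Returns ((est_all, se_all), (est_without_flagged, se_without_flagged)).
    """
    kept = [(y, s) for y, s, f in zip(effects, ses, flagged) if not f]
    if not kept:
        raise ValueError("every study is flagged; nothing left to pool")
    full = pool(effects, ses, tau2)
    reduced = pool([y for y, _ in kept], [s for _, s in kept], tau2)
    return full, reduced
```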
We did not conduct meta-regression analysis with more than one moderator variable in our sensitivity analysis because of the relatively small number of quantitative studies in our review.

Assessment of publication bias
We assessed the potential for publication bias using funnel plots for impact estimates on economic and social empowerment. In addition, we conducted Egger's regression test for funnel plot asymmetry. For psychological and political empowerment, our sample size was insufficient for funnel plots to be informative about the potential for publication bias. Our sample size for political and psychological empowerment was also insufficient for assessing publication bias by comparing published with unpublished studies.
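Egger's test regresses each study's standardized effect (effect divided by its standard error) on its precision (one divided by the standard error) and tests whether the intercept differs from zero; a nonzero intercept indicates funnel plot asymmetry, which may reflect publication bias or other small-study effects. A dependency-free sketch follows (function name hypothetical). The t-statistic, intercept divided by its standard error, would be compared with a t distribution with n - 2 degrees of freedom (for example, via `scipy.stats.t.sf`). With few studies the test has low power.

```python
import math
from typing import Sequence


def egger_test(effects: Sequence[float], ses: Sequence[float]) -> tuple[float, float, float]:
    """OLS of standardized effect (y / se) on precision (1 / se).

    Returns (intercept, slope, standard error of the intercept).
    """
    n = len(effects)
    if n < 3:
        raise ValueError("need at least three studies")
    x = [1.0 / s for s in ses]                          # precision
    z = [y / s for y, s in zip(effects, ses)]           # standardized effect
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = rss / (n - 2)                                  # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept
```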

Qualitative synthesis
The qualitative synthesis (review objective 2) was based on meta-ethnographic techniques. This process was drawn from Atkins et al. (2008), Noblit and Hare (1988) and Walsh and Downe (2005). Meta-ethnography is an interpretive approach for combining the findings of qualitative research in order to provide a higher level of analysis than individual studies alone.
Our qualitative synthesis provides a summary of women's explanations of empowerment outcomes as reported in the contributing studies. The manuscripts of the included studies were first read and reread, with special attention paid to themes, quotations, and authors' interpretations of the quotations. Quotations from women who discussed their experiences of empowerment were then identified and labeled with respect to the topic or concept that they represented. All labeled or coded quotations were subsequently categorized into empowerment themes. This process included rereading all labels or codes and deciding which codes were important and how they related to each other. Codes that related to similar themes were clustered together into categories that were also labeled. These categories or themes are presented in the results section with example quotations as evidence, to deepen readers' understanding of the data. We used a systematic process to select and synthesize representative quotes from women SHG members. The selection of representative quotes was an iterative process in which two researchers identified quotations, discussed emergent themes from the included studies, and determined through a compare-and-contrast exercise how the themes were related or dissonant. Authors of qualitative research typically report one or two example quotations, which may result in reporting bias due to "cherry-picking" of non-representative quotations. We therefore also provide additional quotations in Appendix 12 to improve readers' sense of the raw data and to demonstrate both the variability and the similarity between studies. Although it is not feasible to fully account for reporting bias in qualitative research, providing additional representative quotes mitigates some of this concern.
In a summary table, each category is defined, two representative quotations are given, and the confidence in the findings for each category is assessed based on three areas: 1) the risk of bias of the contributing studies, 2) the adequacy of the data, and 3) the coherence of the theme that supported the finding. The risk of bias for each contributing study is reported in the summary table based on the results of the CASP checklist, as described earlier. Adequacy relates to the thickness of the data and the number of studies. Thick data are achieved when detailed accounts of participants' experiences make the phenomenon of interest explicit, in contrast to thin description, which gives a more superficial account. Coherence relates to the strength of the theme across settings such as countries or regions. Based on an overall assessment of methodological quality through the risk of bias, as well as the adequacy and coherence of the data, two researchers rated the confidence in the evidence for each category as high, moderate, or low; where their ratings differed, they discussed them until consensus was reached. A rationale with details about each confidence area is given in the summary table. Our process for assessing confidence is aligned with the methodology used in Bohren et al. (2015).

Integrating findings from quantitative and qualitative syntheses
To integrate the findings from the quantitative and qualitative syntheses, we conducted the synthesis of effects along the causal chain of the theory of change (Figure 1.1) and used the findings of the qualitative synthesis to "interrogate" and/or complement the quantitative synthesis. The information from participants gathered through qualitative investigations was used to understand whether and where any links in the causal chain broke down. In other words, findings from the qualitative synthesis helped describe, explore, and interpret both the nature of the empowerment process and the extent to which women experienced empowerment, in line with the policies and guidelines of the Campbell Collaboration (Campbell Systematic Reviews, 2014).
The mixed methods review allowed us to gather information using different methodologies that informed, enhanced, and supplemented each other. The findings from the integrated synthesis were used to revise and improve our theory of change.
We did this by using information extracted from the included studies, and we provide insights into the nature and utility of the measures used to capture empowerment. Our aim was to synthesize the evidence produced by both bodies of research to capture the state of the evidence on the impact of self-help groups on women's empowerment.

DEVIATIONS FROM THE PROTOCOL
The review deviates from the proposed protocol in three respects.
First, we originally intended to exclude outcomes evaluating "women's control over household resources" from microcredit self-help group studies so as not to overlap with an existing Campbell review on the impact of microcredit on women's control over household resources (Vaessen et al., 2014). However, that review does not focus specifically on self-help groups, nor does it disaggregate findings for self-help group participants and non-self-help group participants. At the same time, excluding women's control over household resources would have resulted in a considerable omission of an important outcome and undermined the comprehensiveness and value added of our review. For the sake of completeness, we have, therefore, decided to include women's control over household resources as a relevant outcome measure of economic empowerment in our review.
Second, in our original study design inclusion criteria, we specified that we would only include quasi-experimental studies that used statistical matching, difference-in-differences estimation, instrumental variables regression, or other forms of multivariate analysis that correct for selection bias (such as Heckman selection models). However, we decided also to include studies that used multivariate cross-sectional regression analysis with a dummy variable for SHG participation as the treatment variable. The identification strategy of these studies is usually not considered credible for determining causal effects, which may result in a high risk of bias. Nevertheless, Pritchett and Sandefur (2013) proposed that including these types of studies in a meta-analysis can increase its relevance because it allows for the inclusion of studies from contexts in which no rigorous studies on the specific topic exist. To protect internal validity, we placed a strong focus on risk of bias assessment and conducted subgroup analyses for studies with a relatively low, medium, or high risk of bias (Campbell Systematic Reviews, 2014). In these analyses, we classified all multivariate cross-sectional regression analyses with a dummy for SHG participation as having a high risk of selection bias in a meta-regression. We then compared the estimates from studies with a high risk of selection bias with those from studies with a low or medium risk of selection bias. Section 4 of this systematic review shows that our findings are sensitive to the inclusion of multivariate cross-sectional regression analyses. We therefore emphasize the findings of studies with a low or medium risk of selection bias in the interpretation of our results.
Finally, in the protocol, we proposed to provide an overall risk of bias classification for each included study. However, to align with the most recent Campbell Collaboration best practice, we avoided using an overall quality scale and instead assessed risk of bias for specific domains: selection bias and confounding, performance bias, outcome and analysis reporting bias, and other biases. Evidence suggests that assessments of overall risk of bias that do not consider specific domains are too dependent on the type of quality scale used and can considerably influence the interpretation of meta-analysis results (Jüni et al., 1999). This arbitrariness in the assessment is likely to be less severe when risk of bias assessments focus on specific domains.

RESULTS OF THE SEARCH
The search was conducted from March 2013 to February 2014. We included a total of 23 quantitative and 11 qualitative studies in the final analysis. Figure 4.1 details the flow diagram of the filtering process used to identify the final included studies.
Initially, we reviewed 3,536 abstracts from electronic database searches and 351 abstracts from the gray literature search (see Appendix 3). Of these, we excluded 38 duplicates and 3,133 irrelevant studies. We retrieved and reviewed the full text of the remaining 365 studies using the predetermined criteria for inclusion. These studies came from database searches including library catalogues (208), hand-searches of websites (108), keyword searches (48), and author contacts (2).
Based on the full-text review of the 365 studies, we excluded 257 studies when applying the criteria. There was 93 per cent agreement among reviewers. The main reasons for exclusion were the following: the study did not meet our criteria of an empirical evaluation (145); the intervention under study did not meet our criteria of a women's economic self-help group (88); the evaluation design did not employ appropriate methodologies (12); the evaluation did not measure an empowerment outcome (7); and the study was not focused on a self-help group in a low- or middle-income country (4).

Figure 4.1: Study search
We reviewed the remaining 109 full-text studies (55 quantitative, 36 qualitative, 18 mixed methods) again, paying specific attention to the methods employed. Through this process, we excluded another 74 studies: quantitative studies because they lacked a comparison group, quantitative estimates of impacts, or empowerment outcomes, and qualitative studies because they lacked data from direct observation or did not report individual narratives. Reasons for exclusion by study are reported in Appendix 4. The remaining 23 quantitative and 12 qualitative studies were included and used as the basis of the analysis that follows. Most studies were identified through database searches and came from peer-reviewed journals.

Reasons for exclusion at this stage:
Quantitative studies: study did not measure empowerment outcomes (12); there was no comparison group (11); study did not evaluate a SHG (7); there was no quantitative estimate of impact (4).
Qualitative studies (n=46): study did not evaluate the effects of a SHG (29); study did not report any direct quotes from participants (14); study did not focus on empowerment outcomes (3).

Source of included study                     Quantitative   Qualitative
Database searches                            12             6
Keyword searches                             6              2
Hand-searching of organization websites      5              2
Library catalogue                            --             1
Key contact                                  1              --

Quantitative studies (review objective 1)
The empowerment categories extracted from the quantitative studies were handled in the following way:

Economic empowerment:
We included all studies measuring indicators of economic empowerment, but only meta-analyzed outcomes that focused on decision-making by women in the household. For the other indicators, we did not have a sufficient number of studies with outcome measures that were sufficiently conceptually similar to perform meta-analysis. We report the effect size findings narratively for these other indicators.

Political empowerment:
We included all studies measuring indicators of political empowerment and meta-analyzed outcomes that focused on political participation, with an emphasis on voting. For the other indicators, we did not find any rigorous studies.

Social empowerment:
We included all studies measuring indicators of social empowerment and meta-analyzed outcomes with an emphasis on mobility or freedom of movement and on control over family size decision-making, jointly and separately. For the other indicators, we did not find an adequate number of studies with outcome measures that were sufficiently conceptually similar to perform meta-analysis. We report the effect size findings narratively for these other indicators.

Psychological empowerment:
We included all studies measuring indicators of psychological empowerment and meta-analyzed outcomes that focused on self-confidence. For the other indicators, we did not find any studies.
All indicators were measured through household surveys, validated scales, and/or structured closed-ended questionnaires. Aggregate-level empowerment outcomes, such as women's right to vote, legislation against domestic violence, inheritance law, female literacy, and female child survival, were excluded from this review, partly because these indicators were not clearly related to women's SHGs.
We also examined spillover effects from women's self-help group participants to non-participating women in the same communities. Furthermore, we examined adverse outcomes, including intimate partner violence, stigma, disappointment, and reduced subjective well-being. Table 4.3 summarizes the SHG name, country, type of training provided, outcomes, and methods for the 23 included quantitative studies, which represent data from 21 SHGs predominantly based in South Asia. Of the evaluated self-help groups, 11 were implemented in India and 6 in Bangladesh. The remaining studies came from Thailand (1), South Africa (1), Ethiopia (1), and Haiti (1). Two self-help groups (one from India and one from South Africa) were discussed in two quantitative papers each. One study consisted of two separate analyses for samples in two different regions of Ethiopia (Desai & Tarozzi, 2011). All study findings were based on analyses of self-reported survey data from either experimental or observational designs.
Although detailed information on intervention activities was in most cases not recorded clearly, several studies presented some information about whether any training or services were offered to the SHG and the type of training offered. Ten of the self-help groups did not report any additional training or services beyond financial services (credit, loans, and savings). The remaining 11 groups offered some combination of the following: health education (4), business or entrepreneurial skills (6), awareness of women's rights (2), basic education (2), and community-development training (2). However, this list of training and supplemental activities only represents what was reported by authors.
All of the included self-help groups were initiated by local or international NGOs and community-based organizations. Four of the 20 groups were initiated by the Grameen Bank and three by the Bangladesh Rural Advancement Committee (BRAC). Only two of the groups, represented in three studies, were initiated as the intervention arm of a research study (Pronyk et al., 2006;Kim et al., 2009;Sherman et al., 2010).
The study designs and methods of analysis used in the studies were very diverse. Four studies used cluster-randomized assignment (Desai & Joshi, 2012; Desai & Tarozzi, 2011; Kim et al., 2009, incorporating Pronyk et al., 2006, and Sherman et al., 2010). The remaining studies were based on observational data, using methods of counterfactual identification such as propensity score matching (PSM), PSM combined with double differences (Deininger & Liu, 2009), and instrumental variables analysis (Osmani, 2007; Pitt et al., 2006). Methods used to estimate treatment effects ranged from ordinary least squares regression analysis (for example, Osmani, 2007) and logistic regression (for example, Ahmed, 2005) to the calculation of risk or odds ratios based on events and non-events (for example, Swendeman et al., 2009).
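Studies that report events and non-events yield risk or odds ratios, which must be put on a common scale before they can be pooled with standardized mean differences. One standard option, described by Borenstein et al. (2009), converts the log odds ratio to d by multiplying it by √3/π. The text does not say which conversion the review applied, so the sketch below is an assumption; the cell labels a, b, c, d follow the usual 2×2 convention (events and non-events in the treatment group, then in the comparison group), and the function names are hypothetical.

```python
import math


def odds_ratio(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Log odds ratio and its (Woolf) variance from a 2x2 table.

    a, b: events and non-events in the treatment group;
    c, d: events and non-events in the comparison group.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("zero cell: a continuity correction would be needed")
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def risk_ratio(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Log risk ratio and its variance from the same 2x2 table."""
    if min(a, c) <= 0:
        raise ValueError("zero events: a continuity correction would be needed")
    log_rr = math.log((a / (a + b)) / (c / (c + d)))
    return log_rr, 1 / a - 1 / (a + b) + 1 / c - 1 / (c + d)


def log_or_to_d(log_or: float, var_log_or: float) -> tuple[float, float]:
    """Logit-based conversion: d = ln(OR) * sqrt(3) / pi, Var(d) = Var(ln OR) * 3 / pi^2."""
    return log_or * math.sqrt(3) / math.pi, var_log_or * 3 / math.pi ** 2
```

Zero cells would need a continuity correction (commonly adding 0.5 to each cell), which the sketch deliberately does not apply silently.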

Qualitative studies (review objective 2)
The 12 qualitative studies were also predominantly from South Asia. Nine studies focused on SHGs in India. The remaining studies came from Nepal (1), Bolivia (1), and Tanzania (1). Table 4.4 describes the included qualitative studies, including the name of the SHG, the setting, the sample, the data collection, and the methods of analysis.
Most of the qualitative data were drawn from purposive or convenience samples of SHG participants through unstructured or semi-structured in-depth interviews. Two studies (Dahal, 2014; Ramachandar & Pelto, 2009) randomly selected participants. Two other studies (Maclean, 2012; Mercer, 2002) used a case study methodology to describe how SHGs operate within a village context. Six studies (Dahal, 2014; Knowles, 2014; Kilby, 2011; Maclean, 2012; Mercer, 2002; Sahu & Singh, 2012) used focus groups in addition to individual interviews. Most of the studies did not name the specific qualitative theory behind their analysis methodology, but descriptions of their analysis process indicated that most studies used some adaptation of grounded theory, content analysis, or thematic analysis techniques.
We present further details in the qualitative synthesis below (Chapter 4.4).

Risk of bias of quantitative studies
We relied on a risk of bias tool with 71 criteria that were related to selection bias and confounding, performance bias, outcome and analysis reporting biases, and other biases. The complete tool and a detailed assessment of the risk of bias of each individual quantitative study can be found in Appendices 7.6 and 7.8. Figure 4.2 shows that only three of the 23 quantitative studies were rated as having a low risk of selection bias. Each of these studies was a cluster-randomized controlled trial with a sufficient sample size to ensure equivalence in observable and unobservable characteristics across the treatment and the control group. RCTs with a small sample size were rated as having a medium risk of selection bias because the studies usually did not show sufficient evidence that there was equivalence in observable characteristics. In addition, quasi-experimental studies were usually not convincing in their claims that selection bias was no longer an issue after controlling for observable characteristics with statistical tools, such as propensity score matching and multivariate regression analysis. We rated studies that used propensity score matching with a large number of plausibly exogenous control variables as having a medium risk of selection bias and studies that used multivariate regression analysis as having a high risk of selection bias.
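The selection-bias ratings described above amount to a small decision rule, which the sketch below encodes. The design labels, flag names, and function name are hypothetical; "sufficient sample size" and "a large number of plausibly exogenous control variables" are reviewer judgments represented here as booleans. Combinations that the text does not describe raise an error rather than guessing a rating.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def selection_bias_risk(design: str, large_sample: bool = False,
                        many_exogenous_controls: bool = False) -> Risk:
    """Hypothetical encoding of the selection-bias rating rule described in the text.

    design: "randomized", "psm" (propensity score matching) or
            "multivariate_regression" (labels are hypothetical).
    """
    if design == "randomized":
        # Randomized trials large enough to ensure equivalence: low; small trials: medium.
        return Risk.LOW if large_sample else Risk.MEDIUM
    if design == "psm" and many_exogenous_controls:
        return Risk.MEDIUM
    if design == "multivariate_regression":
        return Risk.HIGH
    # The text does not describe other combinations, so refuse rather than guess.
    raise ValueError(f"rating rule not specified for design={design!r}")
```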
Of the 23 quantitative studies, five studies were rated as having a low risk of performance bias. These studies usually had a control or comparison group that was not in direct contact with the beneficiaries of the intervention to ensure the control or comparison group was not contaminated by the intervention or the adoption of practices by beneficiaries of the intervention as a result of their SHG membership. Studies that included a comparison group that was in direct contact with the beneficiaries but that took measures in their analysis or sampling strategy to consider this were rated as having a medium risk of performance bias. For example, Banerjee et al. (2015) acknowledged that the control group was contaminated by other microfinance services similar to the intervention they evaluated. However, the authors also demonstrated that the uptake of microcredit by beneficiaries was significantly higher in the treatment villages. Hence, performance bias could be rated as medium in this specific study. Other studies that included a comparison group that was in close contact with the beneficiaries were rated as having a high risk of performance bias.
Of the 23 included studies, six were rated as having a low risk of outcome and analysis reporting bias. These studies did not show signs of inconsistent reporting or unusual types of analyses. Several other studies were rated as having a medium risk of outcome and analysis reporting bias because of unclear explanations of the outcome variables or the use of potentially flawed analyses. For example, we rated studies that used potentially endogenous variables as explanatory variables as having a medium risk of outcome and analysis reporting bias, because in these cases the outcome equations were potentially misspecified. Finally, several studies only reported tables for outcome variables that were significantly affected by self-help groups and not for outcome variables that were not significantly affected. We rated these studies as having a high risk of outcome and analysis reporting bias because of the potential for publication bias. We also rated studies that used the length of time respondents had been SHG members as an explanatory variable as having a high risk of outcome and analysis reporting bias, because such variables do not account for potential nonlinearities in the impact of SHGs.
Finally, of the 23 included studies, eight were rated as having a low risk of other bias; these studies did not show any other potential biases. A further 38 per cent of the studies were rated as having a medium risk of other bias, for example, because they did not clearly explain whether the authors took measures to mitigate concerns about the measurement of potentially sensitive outcome variables, such as domestic violence. Studies with a high risk of other bias included those that relied extensively on recall data for outcome variables, which raised the likelihood of social desirability bias. For example, SHG members may have perceived that enumerators wanted to hear that SHG membership had improved their autonomy. Under such circumstances, respondents might have had an incentive to underestimate their level of autonomy before joining the SHG and to overestimate it afterwards.
There was almost complete agreement between the two reviewers in their assessments of the risk of selection and performance bias, but there were initially more disagreements about the risk of outcome and analysis reporting bias and other biases. At first, the reviewers disagreed about the risk of selection bias and confounding for one of the 23 included studies (Desai & Tarozzi, 2011), the risk of performance bias for two studies (Holvoet, 2005; Sherman et al., 2010), the risk of outcome and analysis reporting bias for seven studies (Ahmed, 2005; Sherman et al., 2010; Rosenberg et al., 2011; Swendeman et al., 2009; Pitt et al., 2006; Nessa et al., 2012), and other biases for 11 studies (Coleman, 2002; Banerjee et al., 2015; Desai & Joshi, 2012; Garikipati, 2008; Garikipati, 2012; Kim et al., 2009; Pronyk et al., 2006; Mukherjee & Kundu, 2012; Rosenberg et al., 2011; Osmani, 2007; Steele et al., 1998). In all cases where there was no immediate agreement, the reviewers reached agreement on the risk of bias assessment through consensus.
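To make "almost complete agreement" concrete, the initial disagreement counts above can be expressed as raw percent agreement per bias domain. This is a minimal Python sketch. Raw agreement does not correct for chance agreement the way Cohen's kappa would, and the text does not report the full rating tables that kappa requires. The dictionary and function names are illustrative, not the review's own.

```python
# Raw percent agreement between the two reviewers' initial risk-of-bias ratings,
# computed from the numbers of initial disagreements reported in the text.
N_STUDIES = 23  # number of included quantitative studies

initial_disagreements = {
    "selection bias and confounding": 1,
    "performance bias": 2,
    "outcome and analysis reporting bias": 7,
    "other biases": 11,
}


def percent_agreement(disagreements: int, n: int = N_STUDIES) -> float:
    """Share of studies (in %) on which both reviewers initially gave the same rating."""
    return 100.0 * (n - disagreements) / n


for domain, k in initial_disagreements.items():
    print(f"{domain}: {percent_agreement(k):.1f}% initial agreement")
```

This prints 95.7, 91.3, 69.6, and 52.2 per cent initial agreement. That matches the contrast drawn above between near-complete agreement on selection and performance bias and weaker initial agreement on reporting and other biases.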

Quality of qualitative studies
The appraisals of the qualitative studies are summarized in Figure 4.3, and assessments by study are included in Appendix 9. The nine-question tool (details of the quality appraisal assessment criteria are in Appendix 5) aimed to determine whether a study was valid, whether its results were reported adequately, and whether its findings would be helpful locally. The nine studies were considered valuable based on responses to two screening questions and seven assessment questions. There was almost complete agreement between the two assessors. In two cases concerning the consideration of ethical issues and one case concerning the relationship between the researcher and the participants, one assessor felt that she could not tell whether a criterion was met, whereas the other was able to identify the information needed to answer it (Pattenden, 2011; Ramachandar & Pelto, 2009; Kumari, 2011).
Studies that received a "can't tell" or "no" did so for several main reasons. With respect to the recruitment strategy, authors did not always explain how participants were selected and why this selection was the most appropriate sampling strategy for the study. Nor did they sufficiently explain the recruitment process, such as who chose to participate and who declined. With respect to data collection, authors did not adequately justify why they had chosen one method over another. Few authors described their data collection tools, such as interview guides, or their data format, such as tape recordings or handwritten notes. No author mentioned data saturation as a reason for stopping recruitment. Most authors did not report information about the researcher-participant relationship or examine the potential bias and influence they introduced during all aspects of the study. In addition, very few authors described whether and how ethical standards (such as informed consent) were maintained, and the authors did not discuss any ethical issues that their studies raised. Finally, many studies lacked an in-depth description of the data analysis process, both in terms of the methodology used and how the analysis was carried out.

SYNTHESIS OF QUANTITATIVE STUDIES
This section presents the results of meta-analyses of the effects of women's self-help groups on women's economic, social, psychological, and political empowerment and on intimate partner violence (review objective 1). In addition to the preferred specification for each outcome, we present an extensive sensitivity analysis with separate impact estimates for studies with high, medium, and low risk of bias, and for randomized controlled trials and quasi-experimental evaluations. We also analyze heterogeneity by comparing effect sizes across geographic contexts, although our sample size only permitted a narrative analysis of these differences. Further narrative analyses examine the separate effects of different components of self-help groups, such as microcredit, microsavings, and training, and the differences in effect sizes between studies within the same empowerment domain that use different outcome measures and might therefore measure different empowerment constructs.

Economic Empowerment
Of the 23 included quantitative studies, ten included an impact estimate on women's economic empowerment that we were able to include in our meta-analysis, and eight included an impact estimate on women's economic empowerment that did not allow us to determine the effect size of the intervention. Table 4.5 summarizes how economic empowerment was measured and whether each study could be included in the meta-analysis.
Study | Outcome measure | Type of variable | Included in meta-analysis
… | Normalized index score that includes variables measuring the decision-making power of the female respondent in the household | Normalized score from 0-1 | Yes
Coleman (1999) | Several variables that emphasize female ownership of assets | Several binary variables | No; not able to estimate effect size
De Hoop et al. | Dummy variable equal to 1 for women who make decisions about food expenditures | Binary | Yes
Deininger and Liu (2013) | Dummy variable equal to 1 for women who are able to save individually | Binary | Yes; after imputing the standard deviation
Desai and Joshi (2012) | Several dummy variables associated with women's decision-making power about schooling and health expenditures | … | …
… | Several dummy variables associated with women's decision-making power with respect to medical expenditures, borrowing, and housing repairs | Several binary variables | No; not able to estimate effect size
Swendeman et al. (2009) | Dummy variables related to the decision-making power of female sex workers | Several binary variables | Yes
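Several outcomes in Table 4.5 are binary (dummy) variables. These have to be converted to a standardized mean difference (SMD) before they can be pooled with continuous measures. The review does not state which conversion it used. One standard option, described in Borenstein et al. (2009), is the logit method, which rescales a log odds ratio by √3/π. The Python sketch below assumes that method, and all counts in it are hypothetical.

```python
import math


def log_odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Log odds ratio and its variance from a 2x2 table.

    a, b: SHG-member counts with outcome = 1 and outcome = 0
    c, d: comparison-group counts with outcome = 1 and outcome = 0
    """
    lor = math.log((a * d) / (b * c))
    var_lor = 1 / a + 1 / b + 1 / c + 1 / d
    return lor, var_lor


def lor_to_smd(lor: float, var_lor: float) -> tuple[float, float]:
    """Logit-method conversion of a log odds ratio to an SMD (assumed method)."""
    factor = math.sqrt(3) / math.pi
    return lor * factor, var_lor * factor ** 2


# Hypothetical counts: women who report deciding on food expenditures
a, b = 120, 80   # SHG members: yes / no
c, d = 100, 100  # comparison group: yes / no
lor, var_lor = log_odds_ratio(a, b, c, d)
smd, var_smd = lor_to_smd(lor, var_lor)
print(f"SMD = {smd:.3f} (SE = {math.sqrt(var_smd):.3f})")
```

The conversion assumes an underlying continuous latent variable with a logistic distribution. Studies reporting only regression coefficients, as several in Table 4.5 do, would instead need a standard deviation of the outcome, which is where imputation comes in.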
The table demonstrates that women's empowerment was measured in different ways across studies. With a few exceptions, however, women's economic empowerment was reflected in women's bargaining power or decision-making power. The few studies that measured a component of women's economic empowerment other than bargaining power could not be included in the meta-analysis because we were not able to calculate effect sizes for them; we discuss their results in a narrative synthesis. The measurement of women's bargaining power was mostly associated with decisions about expenditures and borrowing, but for the specific case of sex workers, bargaining power was also associated with decision-making power over the number of clients.
The measure of women's bargaining power might thus capture a different construct for sex workers. Therefore, we conducted meta-analyses with and without studies that measure the bargaining power of sex workers. We also conducted a meta-analysis without the study of Deininger and Liu (2013), which emphasizes women's ability to save individually. Although this concept might be related to women's bargaining power, women's ability to save individually could also be considered a different construct.
… evidence from 4 studies), but one that is not statistically significant at the 95 per cent level. The meta-analysis also suggests strong heterogeneity in the impact estimates of women's self-help groups on women's economic empowerment.
Observed effect sizes ranged from 0.01 to 0.45 standard deviations, and statistical tests suggest that this heterogeneity is real rather than due to random sampling error (Q=16, Tau-sq=0.04, I-sq=81%). However, I-squared cannot be interpreted as an absolute indicator of heterogeneity (Borenstein et al., 2009; Higgins, 2011), and the estimated variance component tau-squared is low, suggesting that the level of between-study heterogeneity may be limited. These results should also be interpreted with care, because these tests are not always appropriate for a small number of studies (ibid.). There are various differences in the implementation, context, and risk of bias of the RCTs. First, there was heterogeneity in the types of self-help groups that were evaluated using randomized controlled trials. For example, Banerjee et al. (2015) evaluated a self-help group intervention without a training component, and Sherman et al. (2010) assessed the impact of a women's self-help group program on the economic empowerment of female sex workers. Arguably, the included studies were not fully comparable, and this needed to be taken into account in a sensitivity analysis. We illustrate this with a meta-regression, which demonstrated that the estimated effect sizes on economic empowerment of RCTs of interventions with a training component (SMD=0.31, 95% CI=0.16, 0.45; Q=0.6, Tau-sq=0.00, I-sq=0%; evidence from 3 studies) were substantively and statistically significantly higher than the effect size of the study by Banerjee et al. (2015).
The changes in the confidence interval and the reductions in the heterogeneity indicators after excluding the study without a training component also suggest that SHG programs with training have substantively larger effects on women's bargaining power than SHG programs without training, and that the heterogeneity is mostly caused by including the study without a training component. The effect sizes of each of the studies included in Figure 4.4 were potentially subject to various biases despite the random allocation of the intervention. Sherman et al. (2010) and Kim et al. (2009) were both rated as having a medium risk of selection bias because of their small sample sizes. Banerjee et al. (2015) was rated as having a medium risk of performance bias because of contamination of the control group by various other microfinance initiatives. In addition, Sherman et al. (2010) was rated as having a high risk of performance bias because the control group lived in the same locality as the beneficiaries of the intervention, which may have resulted in spillovers. Meta-regressions did not suggest statistically significant differences in effect sizes between RCTs with different risk of bias ratings. Nonetheless, the evidence for heterogeneity meant that we could not derive strong conclusions about the effects of women's self-help groups on women's economic empowerment from these studies alone. The statistical heterogeneity in the estimates also suggested that it might be beneficial to include additional studies with a higher degree of precision.
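The heterogeneity statistics reported in this section are linked by a simple formula. I-squared is (Q − df)/Q, with df equal to the number of studies minus one, truncated at zero. It depends only on Q and the number of studies. This is why it is a relative measure, whereas tau-squared estimates the absolute between-study variance. The Python sketch below recomputes I-squared from the rounded Q values reported for the economic empowerment meta-analyses, so small discrepancies would only reflect that rounding.

```python
def i_squared(q: float, k: int) -> float:
    """Higgins' I^2 in per cent: the share of the variation in effect sizes across
    k studies attributed to between-study heterogeneity rather than sampling error."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# (Q, number of studies) as reported for the economic empowerment meta-analyses
reported = {
    "RCTs": (16, 4),
    "quasi-experimental studies": (29, 6),
    "RCTs and medium-risk quasi-experimental studies": (46, 7),
}

for label, (q, k) in reported.items():
    print(f"{label}: I^2 = {i_squared(q, k):.0f}%")
```

This prints 81, 83, and 87 per cent, matching the reported values. For the RCTs with a training component (Q=0.6, 3 studies), Q is below its degrees of freedom, so I-squared is truncated to 0%.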
We conducted a separate meta-analysis of the effects of women's self-help groups on women's economic empowerment based on quasi-experimental evaluations (Figure 4.5). It suggests that women's self-help groups have a positive effect on women's economic empowerment that is statistically significant at the 95 per cent level (SMD=0.32, 95% CI=0.14, 0.50; evidence from 6 studies). Again, the meta-analysis indicated strong heterogeneity. Effect sizes ranged between 0.03 and 1.15 standard deviations, and statistical heterogeneity tests suggested that a substantial percentage of the observed heterogeneity in effect sizes is real rather than due to random sampling error (Q=29, I-sq=83%), albeit with a small estimated variance component (Tau-sq=0.03). Additional analysis suggested that this heterogeneity could be partly explained by the inclusion of quasi-experimental studies with a high risk of selection bias. Separate meta-analyses of quasi-experimental studies with a medium and a high risk of selection bias indicated that the impact estimate of studies with a high risk of selection bias is notably higher than that of studies with a medium risk (Figures 11.1 and 11.2 in Appendix 11). The average impact estimate of quasi-experimental studies with a high risk of selection bias was 0.65 standard deviations (SMD=0.65, 95% CI=0.33, 0.98; Q=29, Tau-sq=0.04, I-sq=42%; evidence from 3 studies), approximately three times the effect size for RCTs (0.22 standard deviations). The average impact estimate of studies with a medium risk of selection bias, 0.17 standard deviations (SMD=0.17, 95% CI=0.03, 0.34; Q=9, Tau-sq=0.01, I-sq=78%; evidence from 3 studies), is much closer to the impact estimate of randomized controlled trials. These results therefore suggested that RCTs could be pooled with quasi-experimental studies with a medium risk of selection bias.
Meta-regressions provided further evidence against pooling randomized controlled trials with quasi-experimental studies with a high risk of selection bias. The estimated effect sizes on economic empowerment of RCTs were substantively and statistically significantly lower than the effect sizes of quasi-experimental studies with a high risk of selection bias (β=-0.44; 95% CI=-0.81, -0.07). At the same time, meta-regression indicated that the estimated effect sizes of RCTs were not statistically significantly different from those of quasi-experimental studies with a medium risk of selection bias (β=-0.04; 95% CI=-0.09, 0.29). Based on these meta-analyses and meta-regressions, we decided to pool only randomized controlled trials and quasi-experimental evaluations with a medium risk of selection bias.
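The meta-regressions above each use a single binary moderator, such as RCT versus high-risk quasi-experimental study. In that case, weighted least squares has a closed form. The slope equals the difference between the inverse-variance weighted means of the two subgroups, and its variance is the sum of the inverse total weights. The Python sketch below illustrates this with hypothetical data. It assumes a common between-study variance tau-squared fixed at 0.01 rather than estimated (for example by REML or method of moments) and applies no Knapp-Hartung adjustment. The review does not state how its meta-regressions were estimated.

```python
import math


def binary_moderator_metareg(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares meta-regression y_i = b0 + b1 * x_i with one 0/1
    moderator x_i and weights w_i = 1 / (v_i + tau2).

    With a single binary covariate, the WLS slope b1 equals the difference between
    the inverse-variance weighted means of the two subgroups, and its variance is
    1 / W_0 + 1 / W_1, where W_g is the total weight in subgroup g.
    Returns (b1, standard error of b1).
    """
    total_w = {0: 0.0, 1: 0.0}
    total_wy = {0: 0.0, 1: 0.0}
    for y, v, x in zip(effects, variances, moderator):
        w = 1.0 / (v + tau2)
        total_w[x] += w
        total_wy[x] += w * y
    mean0 = total_wy[0] / total_w[0]
    mean1 = total_wy[1] / total_w[1]
    beta = mean1 - mean0
    se = math.sqrt(1.0 / total_w[0] + 1.0 / total_w[1])
    return beta, se


# Hypothetical SMDs and sampling variances; x = 1 marks an RCT,
# x = 0 a quasi-experimental study with a high risk of selection bias.
effects = [0.15, 0.25, 0.30, 0.55, 0.70, 0.65]
variances = [0.010, 0.008, 0.012, 0.030, 0.025, 0.040]
is_rct = [1, 1, 1, 0, 0, 0]

beta, se = binary_moderator_metareg(effects, variances, is_rct, tau2=0.01)
print(f"beta = {beta:.2f} (95% CI {beta - 1.96 * se:.2f}, {beta + 1.96 * se:.2f})")
```

A negative slope, as in the review's estimate of -0.44, means that RCTs yield smaller effects than the reference group of high-risk quasi-experimental studies.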
Further analyses of the quasi-experimental studies with a medium risk of selection bias did not suggest differences in estimated effect sizes between self-help groups with and without a training component. A meta-regression indicated that, among these studies, the estimated effect sizes on economic empowerment were 0.06 SD higher for SHGs with a training component than for SHGs without one. However, the difference was not statistically significant at the 5 per cent level (β=0.06; 95% CI=-0.33, 0.45).
The meta-analysis of quasi-experimental evaluations with a medium risk of selection bias also indicated that studies with a high risk of spillovers might underestimate the impact of women's self-help groups on women's economic empowerment, possibly because of contamination of the comparison group. A meta-regression indicated that the estimated effect size of quasi-experimental studies with a medium risk of selection bias and a high risk of performance bias is statistically significantly lower than that of quasi-experimental studies with a medium risk of selection bias and a low or medium risk of performance bias (β=-0.17; 95% CI=-0.28, -0.06). We explore this relationship further in the pooled analysis of randomized controlled trials and quasi-experimental studies with a medium risk of selection bias.
Finally, we conducted a meta-analysis of randomized controlled trials and quasi-experimental evaluations with a medium risk of selection bias to determine the pooled effect of women's self-help groups on women's economic empowerment (Figure 4.6). The analysis suggests that women's self-help groups have a positive effect of 0.18 standard deviations on women's economic empowerment, which is statistically significant at the 95 per cent level (SMD=0.18, 95% CI=0.05, 0.31; evidence from 7 studies). The analysis also indicated strong statistical heterogeneity in the impact estimates (Q=46, Tau-sq=0.02, I-sq=87%), with effect sizes ranging from 0.01 to 0.45 standard deviations. Table 4.6 summarizes the results of all meta-analyses of economic empowerment. Interestingly, the effect size of Sherman et al. (2010), which studied the bargaining power of sex workers towards clients, was neither substantively nor statistically significantly different from the effect sizes of the other studies. The interpretation of our results thus did not change when we excluded this study. Similarly, our results did not change substantively when we excluded the study of Deininger and Liu (2013), which focuses on women's ability to save individually.
We did not find evidence for differences in effect sizes between studies with a low or medium risk of spillovers and studies with a high risk of spillovers in the pooled sample. Nor did our analyses suggest significant differences in effect sizes between studies with a low, medium, or high risk of outcome and analysis reporting bias or of other biases. A number of quasi-experimental studies could not be included in the meta-analysis. However, including these studies would not have changed the results of the meta-analysis substantively or statistically: either their results were not very different from the meta-analytic results, or their risk of bias would have been too high for inclusion in the preferred specification. Coleman (2002) found positive but small effects of women's self-help groups in Thailand on women's economic empowerment. These results could not be included in the meta-analysis because Coleman (2002) focused on the effects of time in self-help groups rather than the effects of participation. In addition, the study focused not on women's bargaining power but on women's ownership of assets, such as land, so its outcome indicators were not considered comparable to those of other studies. Garikipati (2008, 2012) assessed the impact of women's self-help groups on different components of women's empowerment, including women's bargaining power as well as other components of economic empowerment, but found no evidence of positive effects. These studies were not included because Garikipati (2008, 2012) used time in self-help groups rather than participation in self-help groups as the explanatory variable. Holvoet (2005) found that self-help groups had bigger positive effects on women's economic empowerment when they provided training in addition to financial services.
However, although the study focused on women's bargaining power, it was rated as having a high risk of selection bias. Husain et al. (2010) suggested positive effects of women's self-help groups on economic empowerment, including women's bargaining power, but did not present point estimates of the impact, and the study was rated as having a high risk of selection bias. Finally, Mukherjee and Kundu (2012) also suggested a positive effect of women's self-help groups on women's economic empowerment, again including women's bargaining power. However, they did not present quantitative point estimates, and the study was rated as having a high risk of selection bias.
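The pooled estimate and the sensitivity checks described for economic empowerment can be sketched as a random-effects meta-analysis followed by a leave-one-out loop. The latter mirrors dropping Sherman et al. (2010) or Deininger and Liu (2013) in turn. The review does not name its random-effects estimator, so this Python sketch assumes the common DerSimonian-Laird method. The seven studies and their values are hypothetical and do not reproduce Figure 4.6.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird estimator of tau^2.

    Returns (pooled SMD, 95% CI lower, 95% CI upper, Q, tau^2, I^2 in per cent).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, q, tau2, i2


def leave_one_out(labels, effects, variances):
    """Re-pool the remaining studies after dropping each study in turn."""
    for i, label in enumerate(labels):
        pooled, lower, upper, *_ = dersimonian_laird(
            effects[:i] + effects[i + 1:], variances[:i] + variances[i + 1:]
        )
        print(f"without {label}: SMD = {pooled:.2f} (95% CI {lower:.2f}, {upper:.2f})")


# Hypothetical SMDs and sampling variances for seven studies
labels = ["Study A", "Study B", "Study C", "Study D", "Study E", "Study F", "Study G"]
effects = [0.01, 0.05, 0.12, 0.18, 0.25, 0.33, 0.45]
variances = [0.002, 0.004, 0.010, 0.006, 0.015, 0.008, 0.012]

pooled, lower, upper, q, tau2, i2 = dersimonian_laird(effects, variances)
print(f"pooled SMD = {pooled:.2f} (95% CI {lower:.2f}, {upper:.2f}); "
      f"Q = {q:.1f}, tau^2 = {tau2:.3f}, I^2 = {i2:.0f}%")
leave_one_out(labels, effects, variances)
```

If no single omission moves the pooled estimate or its confidence interval much, the conclusion does not hinge on one study, such as a study with a distinct outcome construct.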

Social Empowerment
We also synthesized the effects of women's self-help groups on women's social empowerment using meta-analysis. Of the 23 included quantitative studies, ten included an impact estimate that we were able to include in the meta-analysis, and five included an impact estimate for women's social empowerment that did not allow us to determine the effect size of the intervention (Table 4.7). Analysis of the outcomes indicated that social empowerment relates to two types of outcome variables: 1) outcome variables associated with women's mobility; and 2) outcome variables related to reproductive behavior and women's bargaining power over family-size decisions. We therefore conducted both pooled meta-analyses and meta-analyses stratified by these constructs. We began with the synthesis of results from randomized controlled trials. The meta-analysis was based on three studies, two of which showed nearly identical point estimates. However, the analysis also indicated strong heterogeneity in the impact estimates (Figure 4.7). The effect sizes ranged from -0.23 to 0.45 standard deviations, and the pooled effect size was not statistically significantly different from zero (SMD=0.31, 95% CI=-0.09, 0.70; Q=3, Tau-sq=0.06, I-sq=38%; evidence from 3 studies).
There were also potentially important differences between the three studies included in the meta-analysis. First, two of the studies focused on family-size decision-making (Desai & Tarozzi, 2011; Desai & Joshi, 2012), while Kim et al. (2009) presented the impact of a self-help group on an outcome variable associated with the respondents' challenging of gender norms. This outcome variable was not well explained in the paper, but we interpret it as being associated with women's family-size decision-making because the intervention mostly focused on that aspect of women's social empowerment. Second, each study took place in a different part of the world: Desai and Tarozzi (2011) focused on Ethiopia, Kim et al. (2009) on South Africa, and Desai and Joshi (2012) on India. Third, although the interventions studied by Desai and Joshi (2012) and Kim et al. (2009) both included a training component, Desai and Tarozzi (2011) studied a self-help group intervention without a training component. Fourth, the risk of bias assessments differed across the three studies. With so many differences between only three studies, it was impossible to explain the differences in their effect sizes through quantitative analysis alone. Therefore, we refrained from undertaking a meta-regression to examine these differences.

Figure 4.7: The effect of women's self-help groups on women's social empowerment (randomized controlled trials)
We interpreted the findings of the RCTs as evidence for positive effects of SHGs on women's family-size decision-making power, not as evidence for positive effects on women's mobility, because none of the RCTs focused on women's mobility. In later stages of our analysis, we found evidence that women's family-size decision-making power and women's mobility should not be considered part of the same construct.
We also conducted a meta-analysis of quasi-experimental evaluations examining the effects of women's self-help groups on women's social empowerment (Figure 4.8). Our analyses suggested that part of the heterogeneity in the effect sizes can be explained by differences in the risk of selection bias across quasi-experimental studies. We found strong differences between the effect sizes of studies with a high and a medium risk of selection bias (Figures 11.5 and 11.6 in Appendix 11). The average effect size for quasi-experimental studies with a high risk of selection bias was estimated at 0.37 standard deviations (SMD=0.37, 95% CI=0.18, 0.56; Q=3, Tau-sq=0.01, I-sq=10%; evidence from 4 studies). This was statistically significantly different from the average of 0.13 standard deviations for quasi-experimental studies with a medium risk of selection bias (SMD=0.13, 95% CI=0.07, 0.19; Q=1, Tau-sq=0.00, I-sq=0%; evidence from 3 studies). Our analysis thus suggested that estimates from studies with a high risk of selection bias were biased and should not be pooled with those from studies with a medium risk of selection bias. Meta-regression confirmed that the estimated effect size on social empowerment of quasi-experimental studies with a high risk of selection bias was significantly higher than that of quasi-experimental studies with a medium risk (β=0.22, 95% CI=0.06, 0.39). Based on these analyses, we concluded that quasi-experimental studies with a medium risk of selection bias could be pooled with randomized controlled trials, but not with quasi-experimental studies with a high risk of selection bias.
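The comparison between the high-risk and medium-risk subgroups above can be approximated directly from the reported confidence intervals. This Python sketch applies a standard z-test for the difference between two independent pooled estimates. It assumes symmetric normal-theory 95% intervals and independent subgroups. It is not the meta-regression the review ran (β=0.22, 95% CI=0.06, 0.39), so the two results should be similar but not identical.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def subgroup_difference(est1, ci1, est2, ci2):
    """z-test for the difference between two independent pooled estimates."""
    se1, se2 = se_from_ci(*ci1), se_from_ci(*ci2)
    diff = est1 - est2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return diff, se_diff, z, p


# Pooled estimates reported above for quasi-experimental studies of social empowerment
high = (0.37, (0.18, 0.56))    # high risk of selection bias
medium = (0.13, (0.07, 0.19))  # medium risk of selection bias
diff, se_diff, z, p = subgroup_difference(high[0], high[1], medium[0], medium[1])
print(f"difference = {diff:.2f}, z = {z:.2f}, two-sided p = {p:.3f}")
```

With the reported intervals, the difference of 0.24 standard deviations gives z of about 2.4 and a two-sided p of about 0.02. This is in line with the meta-regression result.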
We also estimated stratified meta-analyses of quasi-experimental studies focusing on women's family-size decision-making and women's mobility, respectively. We found large and positive pooled effects on women's family-size decision-making for studies with a high risk of selection bias (SMD=0.…, Tau-sq=0.08, I-sq=83%; evidence from 4 studies) (Figure 11.7 in Appendix 11). However, these results are likely to be biased, because the estimates are significantly larger than the impact estimate of Pitt et al. (2006), the only quasi-experimental study with a medium risk of selection bias that focuses on family-size decision-making (SMD=0.06, 95% CI=-0.04, 0.15). Analysis of effects on women's mobility, for which only quasi-experimental studies with a medium risk of selection bias were available, suggested positive and statistically significant effects (SMD=0.18, 95% CI=0.06, 0.31; Q=7, Tau-sq=0.01, I-sq=71%; evidence from 3 studies) (Figure 4.9). Interestingly, all RCTs included in the meta-analysis focused on women's bargaining power over family-size decision-making, whereas almost all quasi-experimental studies focused only on women's mobility. Only Pitt et al. (2006) presented a weighted average estimate for women's social mobility and family-size decision-making. This difference in emphasis between RCTs and quasi-experimental studies might explain why randomized controlled trials tend to show a larger effect on social empowerment than quasi-experimental studies with a medium risk of selection bias.
We thus interpret this finding as suggesting that SHGs have a larger impact on family-size decision-making than on women's mobility. The larger effect on family-size decision-making was also illustrated by the one study that assessed within-study impacts on both women's family-size decision-making power and women's mobility, which found positive effects on family-size decision-making but no evidence for positive effects on mobility (Pitt et al., 2006).

Figure 4.12: Effects of women's self-help groups on women's family-size decision-making
Our analyses also suggested that training in SHGs might have stronger effects on women's family-size decision-making power than on women's mobility. For family-size decision-making, the effect sizes of studies of SHGs with a training element were substantially and statistically significantly higher than those of studies without a training element (β=0.38; 95% CI=0.19, 0.57). Additional meta-analyses suggested that the effect size of SHGs on family-size decision-making was positive and statistically significant at the 95 per cent level when we excluded studies without a training component (SMD=0.41, 95% CI=0.19, 0.63; Q=3, Tau-sq=0.02, I-sq=41%; evidence from 3 studies) (Figure 11.8 in Appendix 11). At the same time, the effect sizes remained heterogeneous, ranging between -0.23 and 0.49 SMD, suggesting that the type of training was important. Unfortunately, the included studies did not present much detail on the type of training, so the effects of training in SHGs on women's family-size decision-making power should be interpreted with caution.
For mobility, meta-regression suggested a counter-intuitive finding, namely that SHGs with a training component had a smaller effect on mobility than SHGs without a training component (β=-0.15; 95% CI=-0.27, -0.03). However, this finding is driven entirely by a single study: Pitt et al. (2006) is the only study of the effects on women's mobility of SHGs without a training component (SMD=0.29, 95% CI=0.19, 0.38). Given this and the counter-intuitive direction of the result, we interpret it with caution. Figure 11.9 (Appendix 11) presents the meta-analysis of the effect on women's mobility of SHGs with a training component. The results show an average effect size of 0.14 SMD that is statistically significantly different from zero at the 95 per cent confidence level (SMD=0.14, 95% CI=0.06, 0.21; Q=3, Tau-sq=0.00, I-sq=0%; evidence from 2 studies).
The findings suggested that women's mobility and women's family-size decision-making should not be considered part of the same construct. Thus, in summarizing the effects of SHGs on social empowerment, we separate women's mobility and family-size decision-making (Tables 4.8 and 4.9). Only one of the included studies with an emphasis on social empowerment was not included in the meta-analysis. This paper did not report an effect size but found positive effects on women's mobility (Husain et al., 2010), consistent with the meta-analysis.
Furthermore, the distribution of effect sizes in the meta-analysis gives some indication of a relationship between contextual characteristics and the impact of women's self-help groups on women's family-size decision-making power. Because our sample size did not allow for a stratified meta-analysis or meta-regression, we examine this distribution using a narrative analysis. The study in Ethiopia showed the least convincing evidence for positive effects on women's family-size decision-making power (Desai & Tarozzi, 2011), whereas the self-help group in Rajasthan, India, showed strong effects (Desai & Joshi, 2012). These results suggest that the effects of self-help groups on women's social empowerment may differ across regions. We explore this pattern further in the qualitative analysis.

Political Empowerment
Of the 23 included quantitative studies, three included an estimate of the effect of SHGs on women's political empowerment that we were able to include in our meta-analysis. However, we only included two of these effect sizes. Including the study of Swendeman et al. (2009) may result in bias due to its high risk of selection bias, because the evidence from the meta-analyses of economic and social empowerment indicated that studies with a high risk of selection bias are biased upward. We were unable to determine the effect size for one further study estimating the impact of SHGs on women's political empowerment (Table 4.10). To determine the effects of women's self-help groups on political empowerment, we pooled one RCT and one quasi-experimental evaluation with a medium risk of selection bias in a single meta-analysis (Pitt et al., 2006; Desai & Joshi, 2012). We pooled these studies because our analyses of the effects of SHGs on economic and social empowerment suggested that studies with a low or medium risk of selection bias can be pooled in one meta-analysis without biasing the results. Although two studies do not allow a nuanced understanding of the impacts of women's self-help groups, the results suggested that women's self-help groups have a positive effect on women's political empowerment. The average effect of women's self-help groups on political empowerment was estimated at 0.19 standard deviations (SMD=0.19, 95% CI=0.01, 0.36; Q=3, Tau-sq=0.01, I-sq=71%; evidence from 2 studies) (Figure 4.13). The limited number of studies did not allow for sensitivity analysis.

Figure 4.13: Effects of women's self-help groups on political empowerment
The study of Swendeman et al. (2009) also finds positive effects of SHGs on women's political empowerment. Although this study was not included in our meta-analysis because of its high risk of selection bias, its positive effect on women's political empowerment is consistent with the findings from our meta-analysis.
The working paper by Deininger and Liu (2009) reported positive effects on political empowerment, but the later version of the study (Deininger and Liu, 2013) did not include a focus on political empowerment. It appeared as if the political empowerment variable from the working paper included elements of social empowerment. Therefore, the results of the working paper were not included in our meta-analysis. Nonetheless, its positive effects are consistent with the findings of the meta-analysis.
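The pooled statistics reported for political empowerment (SMD, 95% CI, Q, Tau-sq and I-sq) come from a random-effects meta-analysis. This section does not name the estimator, so the following is only a minimal sketch that assumes the widely used DerSimonian-Laird method. The function name `dersimonian_laird` and the example inputs are hypothetical; they are not the effect sizes of Pitt et al. (2006) or Desai and Joshi (2012).

```python
import math


def dersimonian_laird(effects, variances, z_crit=1.96):
    """Random-effects pooling of standardized mean differences (SMDs).

    Assumes the DerSimonian-Laird estimator of the between-study variance.
    `effects` are per-study SMDs and `variances` their sampling variances.
    Returns the pooled SMD, its 95% CI, Cochran's Q, tau^2 and I^2 (percent).
    """
    k = len(effects)
    if k < 2 or len(variances) != k:
        raise ValueError("need at least two studies with one variance each")

    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Method-of-moments estimate of tau^2, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "smd": pooled,
        "ci": (pooled - z_crit * se, pooled + z_crit * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical two-study example (not data from the review)
result = dersimonian_laird([0.0, 0.5], [0.01, 0.01])
```

For these hypothetical inputs the sketch gives Q = 12.5, tau² = 0.115, I² = 92% and a pooled SMD of 0.25 (95% CI about -0.24 to 0.74). With only two studies, as in this section, tau² and I² are estimated very imprecisely.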

Psychological Empowerment
Finally, we synthesized the effects of women's self-help groups on women's psychological empowerment across studies, including adverse effects. Of the 23 included quantitative studies, three included an impact estimate on women's psychological empowerment that could potentially be included in our meta-analysis. However, we only included two studies, because including the study by Swendeman et al. (2009) could bias the results given its high risk of selection bias (Table 4.11). As in the meta-analysis for political empowerment, we pooled RCTs and quasi-experimental studies with a medium risk of selection bias in our meta-analysis for psychological empowerment (e.g., Kim et al., 2009), but we did not include studies with a high risk of selection bias (Swendeman et al., 2009). Although our meta-analysis is based on only two studies, the forest plot in Figure 4.14 shows that their point estimates differ considerably, even though the formal heterogeneity statistics, which have little power with two studies, do not detect heterogeneity (SMD=0.02, 95% CI=-0.21, 0.26; Q=1, Tau-sq=0.00, I-sq=0%; evidence from 2 studies). One study in India did not find positive effects on psychological empowerment. A second study in South Africa found a large point estimate, but the sample size was too small, and consequently the confidence interval too wide, to support strong conclusions regarding the effect of women's self-help groups on psychological empowerment (Kim et al., 2009). Arguably, the studies included in our meta-analysis provide no evidence of positive effects of SHGs on psychological empowerment.
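The heterogeneity statistics reported in this section follow directly from Cochran's Q. Below is a minimal sketch of the standard Higgins I² formula, assuming df = k - 1 (the helper name `i_squared` is hypothetical). With two studies, any Q at or below 1 yields I² = 0%, so the statistic attributes none of the observed variation to between-study heterogeneity, and with so few studies Q has very little power to detect heterogeneity in the first place. Because the reported Q values appear to be rounded, recomputing I² from them only approximates the published figures: Q = 3 with two studies gives about 67%, rather than the 71% reported for political empowerment.

```python
def i_squared(q, k):
    """Higgins' I^2 (in percent) from Cochran's Q and the number of studies k.

    I^2 = max(0, (Q - df) / Q) * 100 with df = k - 1, so any Q at or below
    its degrees of freedom is reported as 0%.
    """
    if k < 2:
        raise ValueError("I^2 needs at least two studies")
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Two studies give df = 1
print(round(i_squared(1, 2), 1))  # psychological empowerment, Q = 1: prints 0.0
print(round(i_squared(3, 2), 1))  # political empowerment, Q = 3: prints 66.7
```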

Intimate Partner Violence and Other Potential Adverse Effects
We also synthesized the adverse effects of women's self-help groups, with a strong focus on intimate partner violence. Of the 23 included quantitative studies, three included an impact estimate on intimate partner violence that could potentially be included in our meta-analysis, and one estimated the impact on partner violence but did not allow us to determine its effect size (Table 4.12). However, as in our previous meta-analyses with few studies, we did not include studies with a high risk of selection bias in the meta-analysis (Ahmed, 2005). Pronyk et al. (2006) Our theory of change suggests that women's self-help groups might have adverse consequences, in the sense that domestic violence could increase as a result of participation in women's self-help groups. However, the meta-analysis of the effects of women's self-help groups on attitudes toward domestic violence did not show evidence of adverse effects (Figure 4.15). As in our meta-analyses for political and psychological empowerment, we only pooled RCTs and studies with a medium risk of selection bias (e.g., Kim et al., 2009). Although the point estimate suggests a positive effect of SHGs on positive attitudes towards domestic violence, the relationship is not statistically significant (SMD=0.07, 95% CI=-0.06, 0.20; Q=0, Tau-sq=0.00, I-sq=0%; evidence from 2 studies). Arguably, our meta-analysis did not allow for a nuanced understanding of the effect of women's self-help groups on intimate partner violence, because we found only two studies with a low or medium risk of selection bias that could be included. More rigorous evidence about the effect of women's self-help groups on intimate partner violence is clearly needed. At present, however, our meta-analysis does not show evidence that women's self-help groups have adverse effects through increased intimate partner violence.
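Whether a pooled effect is statistically significant can be read from whether its 95% confidence interval includes zero. As a rough cross-check, the sketch below recovers an approximate standard error and two-sided p-value from a reported SMD and CI. It assumes a normal sampling distribution and a CI symmetric around the estimate, and it uses the values as rounded in the text; `p_from_ci` is a hypothetical helper, not part of the review's analysis.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value for an estimate reported with a 95% CI.

    Assumes a normal sampling distribution and a CI symmetric around the
    estimate, so that the CI width equals 2 * z_crit * SE.
    """
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    # Two-sided tail probability 2 * (1 - Phi(|z|)) written with erfc
    return math.erfc(abs(z) / math.sqrt(2))


# Rounded estimates reported in the text
print(round(p_from_ci(0.07, -0.06, 0.20), 2))  # intimate partner violence: about 0.29
print(round(p_from_ci(0.19, 0.01, 0.36), 2))   # political empowerment: about 0.03
```

The intimate partner violence estimate is far from conventional significance, while the political empowerment estimate only just clears it, which is consistent with a lower CI bound close to zero.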
In addition to the studies included in our meta-analysis, two other studies also presented impact estimates on intimate partner violence. First, Ahmed (2005) presented evidence for an adverse but non-significant effect of women's self-help group membership on the likelihood that female respondents had encountered violence. This study was not included in the meta-analysis because of its high risk of selection bias. Second, Husain et al. (2010) presented findings that suggested a negative effect of women's self-help group membership on women's tolerance of domestic violence. However, we were not able to estimate the effect size of this study because point estimates were not reported. Furthermore, the study was rated as having a high risk of selection bias.
Only one of our included studies focused on other adverse consequences of women's self-help groups. De Hoop et al. (2014) argued that, on average, women's self-help groups might not have adverse consequences for subjective well-being or happiness, but they also found strong negative effects on the happiness of women's self-help group members in relatively conservative areas. De Hoop et al. (2014) argued that these negative effects occurred because of social sanctioning of women who show autonomous behavior and because of the internal psychological struggles of women who are autonomous in a patriarchal context where this is not considered appropriate behavior for a woman. The absence of average negative effects in the full sample, combined with strong negative effects in areas with relatively conservative gender norms, indicates that adverse consequences of women's self-help groups may be masked by heterogeneity in the impact estimates. Alternatively, negative effects may be underreported, or researchers may not collect data on adverse outcomes. We have to be cautious in interpreting this result, however, because the finding was based on only one study with a medium risk of selection bias and a high risk of spillovers. Thus, the findings of the study might not be internally or externally valid, although they were supported by qualitative accounts of women's empowerment trajectories reported in the same study. Table 4.13 summarizes the results of all meta-analyses with an emphasis on political and psychological empowerment or intimate partner violence.

Publication bias assessment
We relied on funnel plots to assess the potential for publication bias among studies that focused on economic and social empowerment. As discussed above, the number of studies that focused on political and psychological empowerment was not sufficient to assess publication bias for these outcomes. For social empowerment, we decided not to test for publication bias separately for women's family-size decision-making power and mobility, even though our meta-analysis suggests that these two empowerment components can be considered different constructs, because doing so would have strongly reduced the statistical power to reject the null hypothesis of no publication bias. Figure 4.16 presents a funnel plot for studies that focused on economic empowerment with a low or medium risk of selection bias. The basic idea of a funnel plot is that, in the absence of publication bias, effect sizes should scatter symmetrically around the pooled estimate, with less precise (typically smaller) studies spreading more widely; an asymmetric plot suggests that some results, typically small studies with null or negative findings, may be missing. As can be seen in the figure, the effects on economic empowerment are not symmetrically distributed; instead, the results appear skewed to the right. Hence, the funnel plot suggests that there might be publication bias in the studies that estimated impacts on economic empowerment. For social empowerment, we find a similar pattern, with results skewed to the right, so there may also be publication bias in impact evaluations of the effects of women's self-help groups on social empowerment. Funnel plots can be interpreted in multiple ways, however, so the figure should be read with caution: we can only say there is potential for publication bias in the impact estimates on economic empowerment. We formally tested for publication bias using Egger's test. For both economic and social empowerment, this test found no formal evidence of publication bias.
For economic empowerment, the point estimate for publication bias is positive, but the result is not statistically significant (β=2.32, S.E.=1.58, p=0.20). For social empowerment, the Egger test indicated no evidence of publication bias (β=0.25, S.E.=1.33, p=0.86). Hence, although there are indications of publication bias in the studies that focused on economic empowerment, we found no formal evidence of publication bias based on the Egger test.
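Egger's test formalizes the funnel-plot idea by regressing each study's standardized effect on its precision; in the absence of small-study effects the regression line should pass close to the origin. The sketch below uses the textbook ordinary-least-squares formulation and assumes that the β reported above is the regression intercept (the bias coefficient), which the section does not state explicitly. The function name `egger_test` and the inputs are hypothetical. A two-sided p-value can then be obtained from a t distribution with k - 2 degrees of freedom, for example as `2 * scipy.stats.t.sf(abs(t), df)`.

```python
import math


def egger_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry (a sketch).

    Regresses the standardized effect (effect / SE) on precision (1 / SE) by
    ordinary least squares. The intercept is the bias coefficient: values far
    from zero suggest small-study effects such as publication bias.
    Returns (intercept, SE of intercept, t statistic, degrees of freedom).
    """
    k = len(effects)
    if k < 3 or len(ses) != k:
        raise ValueError("need at least three studies with one SE each")

    x = [1.0 / s for s in ses]                   # precision
    y = [e / s for e, s in zip(effects, ses)]    # standardized effect
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")

    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx

    # Residual variance on k - 2 degrees of freedom
    df = k - 2
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(rss / df * (1.0 / k + mx ** 2 / sxx))
    if se_intercept == 0:
        raise ValueError("perfect fit: intercept standard error is zero")

    # Compare t with a t distribution on df degrees of freedom for a p-value
    return intercept, se_intercept, intercept / se_intercept, df


# Hypothetical inputs (not the review's data)
b0, se_b0, t, df = egger_test([2.0, 1.5, 5 / 3, 1.5], [1.0, 0.5, 1 / 3, 0.25])
```

With only a handful of studies the test has low power, which is one reason a non-significant result does not rule out publication bias.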
Nevertheless, our risk of bias assessment did present some evidence for publication bias. For example, we found two studies that did not report point estimates because the results were not statistically significant (Mahmud, 1994; Steele et al., 1998). This indication of outcome and analysis reporting bias suggests that the positive impacts we found may be slightly overestimated. Similarly, only a few of our studies (Ahmed, 2005; Husain et al., 2010; Kim et al., 2009) assessed adverse consequences of SHGs, and this relatively low number may also indicate reporting bias. However, despite the potential for outcome and analysis reporting bias, we did not find that effects differed for studies with a high risk of outcome and analysis reporting bias.

SYNTHESIS OF QUALITATIVE STUDIES
The meta-ethnographic analysis of the qualitative studies focused on women's explanations of empowerment outcomes (review objective 2). The 11 studies included in the qualitative analysis came from SHGs in South Asia (Bangladesh, India and Nepal), Bolivia and Tanzania. Table 4.14 and Table 4.15 summarize the findings from the qualitative studies after relying on the meta-ethnographic approach. The following descriptions of the four major outcome categories (economic, social, political, and psychological empowerment) emerged from women's accounts of their self-help group experiences from the 11 contributing studies. A table of additional quotes for each theme is available in Appendix 12.

The Campbell Collaboration | www.campbellcollaboration.org

Dahal, 2014, Nepal

Participation in Household Decisions
Description: Women discussed the process of gaining acceptance from husbands and in-laws to participate in SHGs. Then, over time, they described gaining respect from husbands and extended family for their contributions and becoming part of household decision-making.
Quote: "After two years, they [husband and in-laws] understood the value of the women's groups and remained silent." (Ramachandar, 2009, South India)
Quote: "Being allowed to have money and decide on how to spend it has brought us development in our household and now husbands give us the freedom to do our own things." (Mercer, 2002, Tanzania)
Contributing studies: Dahal 2014, Kabeer 2011, Kumari 2011, Mercer 2002, Ramachandar 2009, Sahu 2012
Confidence: High
Explanation: Thick data from 6 studies; 5 from South Asia, 1 from Tanzania; quality was high for 2 and medium for 4.

Impact on Domestic Disputes
Description: Women reported that their participation had an effect on domestic disputes and violence, including both verbal and physical abuse. Women reported an initial increase in disputes or violence but said that they eventually gained respect from husbands and in-laws by bringing in income to the household and that they fought less with their husbands. They also reported that their SHGs took action against domestic violence in their communities.
Quote: "My husband used to beat me when I became a member of the sangha. He used to manhandle me when I returned home from the meetings. His parents instigated him to beat me. But I stood in silence and today he dare not touch me." (Ramachandar, 2009, South India)
Quote: "You cannot come drunk and batter me, my SHG will question you if you touch me, you should be prepared to answer them." (Kumari, 2011, South India)
Contributing studies: Dahal 2014, Kabeer 2011, Kilby 2011, Knowles 2014, Kumari 2011, Mathrani 2006, Ramachandar 2009, Sahu 2012
Explanation: Thick data from 8 studies; only from South Asia; quality was high for 5 and medium for 3.

Social Empowerment

Improved Networking
Description: Women SHG members had the confidence to work with local authorities, village leaders, and law enforcers to make positive changes in their communities. These experiences emboldened the women to address authorities when a social issue came up or when they needed support for a community development project. This was a profound change from being confined to the domestic sphere and speaking only to family and close neighbors.
Quote: "SHG members complain if a tap is broken or if there is stagnant water ... they bring this to the panchayat [village leader] president's attention issues in the community ... if they have other difficulties they go to government officials now." (Knowles, 2014, South India)
Quote: "The women themselves insisted on dealing with the tractor owners directly and 'held out' for three weeks before the tractor owners agreed to deal with the women directly. It was the close interaction with staff at all levels, which gave the women the confidence to deal with higher caste village people in this way." (Kilby, 2011, South India)
Contributing studies: Kabeer 2011, Kilby 2011, Knowles 2014, Kumari 2011, Mathrani 2006, Pattenden 2011, Sahu 2012
Confidence: High
Explanation: Thick data from 7 studies; only from South Asia; quality was high for 4 and medium for 3.

Solidarity
Description: Women reported feeling mutual support within their groups and feeling as though they could speak with a collective voice. A sense of solidarity enabled women to make meaningful decisions and to enact positive change in their lives and communities.
Quote: "One stick can be broken, a bundle of sticks cannot. It is not possible to achieve anything on one's own. You have no value on your own. Now if I am ill, my [SHG] members will look after me." (Kabeer, 2011, Bangladesh)
Quote: "If we disapprove of something, we are able to express our opinions to the larger community as we have a collective voice." (Mathrani, 2006, South India)
Contributing studies: Dahal 2014, Kabeer 2011, Kumari 2011, Mathrani 2006
Confidence: Moderate
Explanation: Thick data from 4 studies; only from South Asia; quality was high for 2 and medium for 2.

Community Respect
Description: Women described walking confidently through their villages and feeling respected by their peers and their leaders. They expressed feeling that they were no longer solely housewives but community actors who had influence over their village politics.
Quote: "The society's view upon being a SHG member has changed. Before it was against the social norms to go out of a house but now society praises women who are involved in SHGs." (Dahal, 2014, Nepal)
Quote: "The biggest benefit of the [SHG] is that we get prestige and honour in our community; we gain experience going to the bank and meeting with officials." (Ramachandar, 2009, South India)
Contributing studies: Dahal 2014, Kabeer 2011, Kumari 2011, Sahu 2012, Ramachandar 2009
Confidence: High
Explanation: Thick data from 5 studies; only from South Asia; quality was high for 2 and medium for 3.

Economic Empowerment
Financial Skills
Description: Women reported feeling empowered by the newness of handling money. Many of the women had never participated in the buying and selling of goods and had never been allowed to manage the household accounting. With the new access to credit, women were suddenly in the role of the money manager. Women reported that they gained a sense of self-reliance as a result of having access to money, making decisions about buying and selling, and completing transactions with that money.
Quote: "The fear of handling money is gone." (Kumari, 2011, South India)
Quote: "Being allowed to have money and decide on how to spend it has brought us development in our households and now husbands give us the freedom to do our own things." (Mercer, 2002, Tanzania)
Contributing studies: Dahal 2014, Kabeer 2011, Kumari 2011, Maclean 2012, Mercer 2002, Ramachandar 2009, Sahu 2012
Confidence: High
Explanation: Thick data from 7 studies across regions (5 from South Asia, 1 from Tanzania and 1 from Bolivia); quality was high for 2 and medium for 5.

Financial Inexperience
Description: Being in charge of finances was a new experience for most women, and the women reported feeling unsure about their financial decision-making abilities. Some of the SHGs offered training on topics such as income generation and savings. But because women were making decisions in front of their community members, they felt there was a great deal at stake in making sound choices.
Quote: "The interest rate is really high. Don Pedro-my husband-tells me off: 'Why are you just working for that [the credit]. You're just working for the bank, and the interest is really expensive!'" (Maclean, 2012, Bolivia)
Quote: "The men say, 'What kind of structure have these women constructed? They are like monkeys, if we hit their home it will collapse.'" (Mathrani, 2006, South India)
Contributing studies: Knowles 2014, Kumari 2011, Maclean 2012, Mathrani 2006, Pattenden 2011, Ramachandar 2009
Confidence: Moderate
Explanation: Thin description from 6 studies from 2 regions (5 from South Asia and 1 from Bolivia); quality was high for 4 and medium for 2.

Political Empowerment
Catalyzing Social Action
Description: Women described their participation in an SHG as a "stepping stone" toward wider social participation but not necessarily a political act in itself. Women reported that participation in SHGs exposed them to the concept of women's rights through participation in social activities and gave them political capital and the ability to speak out on political issues such as accountability. Women reported that some members of SHGs went on to become local political leaders.
Quote: "In the previous election, the MLA candidate had promised to build a road but he did not. When he came for campaigning this time, we questioned him for not keeping his promise and we didn't vote him either." (Sahu, 2012, South India)
Contributing studies: Dahal 2014, Kabeer 2011, Kilby 2011, Knowles 2014, Kumari 2011, Mathrani 2006, Sahu 2012
Confidence: High

Understanding the Political Context
Description: SHG members were able to identify the limits to their "empowerment" and described SHGs facing barriers to effecting change in their community through even small political acts. The context within which groups operated "restricted the capacity for political action." Women felt that awareness of rights was only an important first step and that they still had a long way to go before women gained property and reproductive rights. Women agreed that their domestic role was still primary.
Quote: "Empowerment? There has not been complete empowerment. More factors are needed like equal wages. I would say that only 5 to 10% of empowerment has happened." (Kumari, 2011, South India)
Quote: "Women are still tethered to domesticity and men still regarded women as below them." (Kabeer, 2011, South India)
Contributing studies: Kabeer 2011, Kumari 2011, Mathrani 2006, Pattenden 2011, Ramachandar 2009
Confidence: Moderate
Explanation: Thin description from 5 studies, all from South Asia; quality was high for 3 and medium for 2.

Adverse Outcomes
Barriers to Participation
Description: Women described barriers to participation specifically for marginalized groups such as lower castes or the very poor. This finding is likely underreported because studies focused on the narratives of participants rather than non-participants.
Quote: "Some women don't join because they feel inferior, they think that members are rich, can afford things and can be close to the Church, they are in good positions." (Mercer, 2002, Tanzania)
Quote: "The issue of selection bias can be agreed to a certain extent acknowledging to the fact that very poor people cannot afford the membership fee and enough time for group activities." (Dahal, 2014, Nepal)
Contributing studies: Dahal 2014, Knowles 2013, Mercer 2002, Mathrani 2006
Confidence: Moderate
Explanation: Thin description from 4 studies from two regions (3 from South Asia and 1 from Tanzania); quality was high for 3 and medium for 1.

Disappointment
Description: Women described a degree of disappointment when their groups did not deliver on perceived promises, such as solving social problems in their villages like alcoholism. Another source of disappointment occurred when women gained new awareness about rights but were not able to exercise them, or when their group took on new responsibilities but in the end did not have the authority or financial power to make changes.
Quote: "Other women are discouraged because it is almost four to five years since we contributed the money for the cows and up to now we haven't seen any good profit." (Mercer, 2002, Tanzania)
Quote: "SHGs operate at very low cost, have a small fund, raise little interest so we cannot accomplish bigger projects and this is our weakness." (Maclean, 2012, Bolivia)
Contributing studies: Dahal 2014, Kabeer 2011, Kumari 2011, Maclean 2012, Mercer 2002
Confidence: Moderate
Explanation: Thin description from 5 studies from 3 regions (South Asia, Bolivia and Tanzania); quality was high for 1 and medium for 4.

Mistrust
Description: … stoned for membership or SHG women were seen as trouble-makers accused of trying to take over the local council.
Quote: "The men used to make comments such as, these women are doing 'tamasha' (showing off) and they are going to close down our sangha after a few days. But we did not worry about those comments." (Ramachandar, 2009, South India)

Psychological empowerment
In contrast to the quantitative literature, much of the qualitative literature on individual-level empowerment focuses on self-confidence and self-esteem, and suggests women participating in SHGs feel psychologically empowered. The 11 contributing qualitative studies included in this review suggest specific aspects of individual-level change which were experienced by women self-help group members.
Agentic voice: One of the dominant themes from six studies is that women self-help group members reported feeling more capable of speaking in front of others. First, women experienced this by speaking in front of their peers at their group meetings.

Participation in household negotiations: Another emergent theme involved intrahousehold dynamics, which was mentioned in six studies (Dahal, 2014; Kabeer, 2011; Kumari, 2011; Mercer, 2002; Ramachandar & Pelto, 2009; Sahu & Singh, 2012). Women first described the process of gaining acceptance from husbands and in-laws to participate in SHGs. They then described gaining respect over time from husbands and extended family and becoming decision-makers within their households following their membership in SHGs.
Domestic disputes: Women in eight studies reported how their participation in SHGs had affected domestic disputes and violence, including both verbal and physical abuse (Dahal, 2014; Kabeer, 2011; Kilby, 2011; Knowles, 2014; Kumari, 2011; Mathrani & Pariodi, 2006; Ramachandar & Pelto, 2009; Sahu & Singh, 2012). Women from three studies reported an initial increase in disputes or violence but said that they eventually gained respect from husbands and in-laws by bringing in income to the household. These women also reported fighting less with their husbands (Kumari, 2011; Mathrani & Pariodi, 2006; Ramachandar & Pelto, 2009). In two other studies, women reported that they experienced a decrease in disputes and conflict between husbands and wives (Knowles, 2014; Sahu & Singh, 2012). In all eight studies, women described how SHG members put social pressure on men to stop beating their wives and would show up in groups to support women who had been beaten. The interviewed women felt these activities decreased domestic violence in their communities.

Social empowerment
The literature around empowerment talks about social capital accumulation as a result of participation in SHGs. We found three main themes that emerged within the context of social capital that explain this phenomenon in more detail.
Networking: An important theme discussed by women in seven studies was that women SHG members were not only more confident speaking in front of others, but also felt comfortable working with local authorities, village leaders, and law enforcers to make positive changes in their communities (Kabeer, 2011; Kilby, 2011; Knowles, 2014; Kumari, 2011; Mathrani & Pariodi, 2006; Pattenden, 2011; Sahu & Singh, 2012). Women's perceptions suggest that these experiences emboldened them to address authorities when a social issue came up or when they needed support for a community development project.
For these women SHG members, networking experiences represented a profound change from being confined to the domestic sphere and speaking only to family and close neighbors. In one group in India, women had to negotiate with formal banking institutions. The women reported that these institutions at first refused to give them loans, but the women went up the chain of authority to the national bank for rural development and their loans were released (Kumari, 2011).
Women suggested that this type of networking was useful in getting small projects completed, and in four studies women reported that they capitalized on these relationships and progressed from holding leadership positions within their groups to holding leadership positions within the community (Knowles, 2014; Kilby, 2011; Kumari, 2011; Mathrani & Pariodi, 2006).
Solidarity: Another important theme was the empowerment that came from group solidarity. Women's experiences suggested that knowing that their group supported them enabled women to make meaningful decisions and to enact positive change in their lives. This boldness to make change as a result of solidarity was reported with respect to situations within the household or the extended family. Four studies reported on women's perspectives about group solidarity (Dahal, 2014; Kabeer, 2011; Kumari, 2011; Mathrani & Pariodi, 2006).
This boldness was particularly strong when women talked about how their husbands treated them. Women in three studies (Kabeer, 2011; Kumari, 2011; Mathrani & Pariodi, 2006) reported feeling that they now had recourse through the group against husbands who committed acts such as domestic violence and heavy drinking.
Community respect: In five studies, women reported that, much like the sense of solidarity described above, being part of their SHG gave them clout within their communities (Dahal, 2014; Kabeer, 2011; Kumari, 2011; Sahu & Singh, 2012; Ramachandar & Pelto, 2009). Women described walking confidently through their villages and having the courage to approach authorities as a group, which they had not felt able to do before. The women felt more able to participate in community decision-making, and they felt respected by their peers and their leaders. The women were no longer solely housewives but community actors.

Economic Empowerment
Financial skills and independent decision-making: A theme across seven studies was that women reported feeling empowered by the newness of handling money. Many of the women had never participated in the buying and selling of goods and had never been allowed to manage the household accounting before their SHG membership. With the new access to credit following SHG membership, women were suddenly in the role of the money manager. Although the learning curve was steep for some, most women reported that they gained a sense of self-reliance as a result of having access to money, making decisions about buying and selling, and completing transactions with that money (Dahal, 2014; Kabeer, 2011; Kumari, 2011; Maclean, 2012; Mercer, 2002; Ramachandar & Pelto, 2009; Sahu & Singh, 2012). One interesting finding from two studies was that women stated that they were putting money aside specifically for their daughters' education (Mathrani & Pariodi, 2006; Sahu & Singh, 2012).
Financial experience and handling money: Because handling money was a new experience for most women, women in six studies reported feeling unsure about their financial decisions (Knowles, 2014; Kumari, 2011; Maclean, 2012; Mathrani & Pariodi, 2006; Pattenden, 2011; Ramachandar & Pelto, 2009). Some of the SHGs offered training on topics such as income generation and savings. But because women were making decisions in front of their community members, they felt there was a great deal at stake in making sound choices.
In three self-help groups, women reported not feeling prepared to make certain financial decisions related to their individual or group projects (Maclean, 2012; Mathrani & Pariodi, 2006; Ramachandar & Pelto, 2009). In one SHG in Bolivia, the women reported that men saw their participation in the SHG as foolish because they were not knowledgeable enough about money to benefit from microfinance services (Maclean, 2012). In another SHG in India, the community was initially discouraging and ready to scorn any misstep (Mathrani & Pariodi, 2006). In this case, however, women reported using the public embarrassment to generate greater determination to fix the construction and build a stronger structure. The women reported spending considerable time researching building materials and using their networking skills to find proper builders and building materials in order to redo their community center.

Political empowerment
Catalyzing broader social action: In seven studies, women described their participation in an SHG as a "stepping stone" (Mathrani & Pariodi, 2006) toward wider social participation but not necessarily a political act in itself. Participation in SHGs exposed the women to the concept of women's rights through participation in social activities, gave them political capital through networking (Kumari, 2011; Dahal, 2014), and encouraged them to speak out on political issues such as transparency and accountability (Knowles, 2014; Sahu & Singh, 2012). In addition, women who went on to participate in local village government indicated that participating in SHGs provided the support and grounding that enabled them to take leadership positions in government (Kilby, 2011).
Understanding political context: In three settings, women talked about understanding what they could and could not change in their communities. Women were able to identify barriers to effecting change in their community through even small political acts (Mathrani & Pariodi, 2006; Pattenden, 2011; Ramachandar & Pelto, 2009). The context within which groups operated "restricted the capacity for political action" (Pattenden, 2011, p. 483). In two other settings, women reported that the gradual acceptance by husbands and community members gave way to broader acceptance and respect, which lent strength to their political efforts (Mathrani & Pariodi, 2006; Ramachandar & Pelto, 2009).
In one case, however, women reported that changing the status of women in their society was not their priority and not on their stated agenda (Mathrani & Pariodi, 2006). In this case, women SHG members appeared to remain focused on poverty reduction through income generation and community development, rather than directly challenging gender norms or women's status in society. The author of the study reported that things like networking and household decision-making constituted micro-political processes. The author suggests that in this specific case SHG participation did not change women's station in life: the women were still "tethered to domesticity" and men still regarded women as below them (Kumari, 2011).
In one case, women talked about feeling more aware of their rights, but this awareness was only an important first step: the women still had a long way to go before gaining property and reproductive rights, and their domestic role remained primary (Kabeer, 2011).
As Kabeer (2011) observed when discussing this theme in the data from her study: "In social terms, marriage is still the only conceivable pathway to full adulthood for women, particularly in rural areas. In economic terms, it marks the necessary transition from their dependence on fathers to dependence on husbands and ultimately on sons. On both counts, women had a strong stake in shoring up rather than undermining the institution, however abusive the relationships involved" (p. 519).

Adverse Outcomes
Barriers to Participation: Three studies reported that women talked about barriers to participation related to economic and social standing (Dahal, 2014; Mercer, 2002; Mathrani & Pariodi, 2006). Specifically, lower-class women were excluded from "high class" SHGs, and lower-caste members were not allowed to join upper-caste groups because of discrimination. In Tanzania, women reported that wealthier women were more likely than poor women to join SHGs. Women's perceptions suggest that, to poor women, the SHG was a status symbol that reinforced the idea that wealthier or less poor women had more access to financial services, social capital, and community respect than poorer women (Mercer, 2002). In India, caste and religion shaped participation, and women of the same caste formed groups together to avoid conflict. But because funding was limited, some same-caste groups had to wait for funding or did not receive it (Mathrani & Pariodi, 2006).

Disappointment:
Five studies reported that some women felt a degree of disappointment when their groups did not deliver on perceived promises, such as solving social problems in their villages like alcoholism (Mercer, 2002) or challenging cultural norms (Dahal, 2014; Kabeer, 2011). Another source of disappointment arose when women gained new awareness of their rights but were not able to exercise them (Kumari, 2011), or when their group took on new responsibilities but ultimately lacked the authority or financial power to make changes (Maclean, 2012).

Mistrust and Corruption:
In three studies, women reflected on negative experiences of mistrust and corruption in their own group, or on stories of corruption in other groups, particularly stories of leaders stealing group funds (Knowles, 2014; Maclean, 2012; Dahal, 2014).
Stigma: Membership in two SHGs carried negative associations, and women reported facing public shame or discrimination, especially during the formation of the groups. Such discrimination was reported much less often than increased respect from community members. Importantly, however, women reported hearing stories of women being stoned for their membership (Pattenden, 2011), and SHG women were seen as troublemakers accused of trying to take over the local council (Mathrani & Pariodi, 2012).

INTEGRATED SYNTHESIS
The quantitative synthesis suggests that SHGs have positive effects on women's economic, social, and political empowerment, ranging from 0.06 to 0.41 standard deviations. We did not find quantitative evidence for positive effects of SHGs on psychological empowerment. However, the synthesis of the qualitative research showed that women perceive positive contributions of SHGs to their psychological empowerment. This discrepancy suggests that either the quantitative studies do not adequately measure psychological empowerment or women's perceptions are shaped by cognitive biases, such as the fundamental error of attribution, that is, the tendency to attribute changes to programs rather than to contextual characteristics (White & Phillips, 2012).
The quantitative evidence does not suggest strong adverse impacts of self-help groups on indicators such as disappointment, stigma, or domestic violence, although domestic violence was the only adverse outcome we were able to meta-analyze. Findings from the qualitative research suggest that women perceive SHGs as having the potential to reduce domestic violence through some combination of the following: 1) improved economic stability, 2) increased respect of husbands for their wives, 3) increased self-confidence of women, 4) exposure to human rights and gender training, and 5) enforcement by SHG members to reduce violence within households. These perceptions about domestic violence were among the strongest themes drawn from eight contributing qualitative studies, although these studies were all conducted in South Asia. However, the quantitative meta-analysis, which examined the effects of women's self-help groups on attitudes toward domestic violence, showed evidence neither of adverse effects nor of a reduction. We therefore need to be careful in interpreting this result. Nonetheless, our findings provide no evidence that SHG participation increases the likelihood of domestic violence.
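Effect sizes in this review are expressed in standard deviation units. As a hedged illustration of what such a number means, the sketch below computes one common standardized mean difference, Hedges' g, from group means, standard deviations, and sample sizes. The function name `hedges_g` and all inputs are hypothetical, and the review's own effect-size conversions may differ, for instance when a study reports regression coefficients rather than group means.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Return (g, var_g): Hedges' g and its approximate sampling variance.

    mean_*, sd_*, n_* are the mean, standard deviation and sample size of the
    treatment (t) and comparison (c) groups (hypothetical illustration).
    """
    if n_t + n_c <= 2:
        raise ValueError("need more than two observations in total")
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    # Cohen's d: mean difference expressed in pooled-SD units
    d = (mean_t - mean_c) / sd_pooled
    # Small-sample correction factor J
    j = 1 - 3 / (4 * (n_t + n_c) - 9)
    g = j * d
    # Large-sample variance of d, scaled by J^2 to give the variance of g
    var_g = j ** 2 * ((n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c)))
    return g, var_g
```

For example, with hypothetical treatment and comparison means of 0.55 and 0.45, a common standard deviation of 0.5, and 100 women per arm, `hedges_g(0.55, 0.45, 0.5, 0.5, 100, 100)` gives g just under 0.2: the treatment mean lies about a fifth of a standard deviation above the comparison mean.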
Furthermore, self-help groups may have stronger effects on economic empowerment and women's family-size decision-making power when they include a training component. However, this finding should be interpreted with care because both quantitative and qualitative studies present insufficient detail about the content of SHG training. In the quantitative studies, health education and training in business and entrepreneurial skills were the most prevalent, but the limited number of studies prevented us from distinguishing the effects of different types of training in a meta-analysis.
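A moderator such as a training component can be examined, in the simplest case, by comparing the pooled effects of two independent subgroups (for example, SHGs with and without training) with a z-test. The sketch below assumes that each subgroup's pooled estimate and standard error are already available; the function name `subgroup_difference` is hypothetical, and the review may have used meta-regression or another test instead.

```python
import math
from statistics import NormalDist


def subgroup_difference(est_a, se_a, est_b, se_b):
    """Z-test for the difference between two independent subgroup estimates.

    Returns (difference, standard error of difference, z, two-sided p-value).
    """
    diff = est_a - est_b
    # Independent subgroups: the variances of the two estimates add
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p-value from the standard normal distribution
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se_diff, z, p
```

With hypothetical inputs, `subgroup_difference(0.3, 0.1, 0.1, 0.1)` gives z of about 1.41 and a two-sided p of about 0.16. With only a few studies per subgroup, the standard error of the difference is large and the test has little power, consistent with the caution expressed above.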
Although the quantitative analysis did not allow for a rigorous identification of contextual moderators of heterogeneous effects, the qualitative synthesis suggests several reasons why women do not experience empowerment through self-help groups under all circumstances. The first barrier that women SHG members identified is a barrier to participation itself. Women SHG members suggest that the poorest of the poor, lower-caste members, and other marginalized groups may not always be able to participate in SHGs. This perception suggests that the theory of change we proposed for self-help group interventions should start even before women participate in economic or livelihood self-help groups, because several assumptions need to be fulfilled before women even begin participating. However, we should be careful in interpreting this result about participation because of the limited ability of qualitative studies to determine causal effects. This finding, nonetheless, reinforces the call of De Hoop and Menon (2014) to analyze participation in development programs more systematically. They argue that:
…while implementing a women's self-help group programme, it would be important to consider the possibility that information about the existence of the programme may not reach potential participants. It is also likely that the women may consider attendance in meetings to have a big opportunity cost. They may have to give up several hours of work on their farm to attend a self-help group meeting. Assumptions about participation may also run counter to what a woman is able to do in her community. Women may have to break gender related social norms to attend this meeting unaccompanied by their spouse or male relative. (De Hoop & Menon, 2014)
The latter argument relates to another conclusion from the qualitative research.
Here, it is interesting that women's perspectives suggest that women in Bolivia and Tanzania encounter more resistance from the community when they participate in self-help groups than women in South Asia do. Women's perspectives from South Asia suggest that the initial resistance of other community residents to women's participation in self-help groups, and to the resulting empowerment process, fades as women remain members for longer. This finding suggests that the maturity of self-help groups might be an important additional moderator of effects on women's empowerment. With respect to social empowerment, the quantitative evidence suggests this may be true: we found stronger effects of women's self-help groups on women's family-size decision-making power in India, where self-help groups are well established, than in Ethiopia, where they are less well established. However, we have to exercise caution in interpreting this finding because factors such as reporting bias and cross-cultural misinterpretation may confound it. In addition, the number of studies that discuss backlash from the community is relatively small.
In general, qualitative studies do not give sufficient attention to the identification of causal effects, and quantitative studies do not sufficiently emphasize potential moderators in the design of SHG programs. Too often, qualitative studies present information about the empowering experience of women in self-help groups without addressing issues such as self-selection into self-help groups. Yet our meta-analysis suggests that self-selection into self-help groups seriously complicates counterfactual analysis: studies with a high risk of selection bias overestimate the impact of self-help groups relative to studies with a medium or low risk of bias. Furthermore, quantitative studies do not present sufficient detail about the specific designs of SHGs, which greatly complicates the analysis of moderating effects in the meta-analysis. The lack of attention to causal identification in the qualitative studies and the lack of program detail in the quantitative studies together complicate the integration of quantitative and qualitative evidence.
Based on the findings discussed above and the relatively small number of quantitative studies that adequately account for selection bias, we argue that, although we can estimate the average pooled effect of self-help groups on empowerment across studies with reasonable precision, there are not yet enough rigorous quantitative studies that present sufficient detail about program design to answer second-generation questions about their effectiveness. One such question is whether self-help groups with a specific training component are more effective than self-help groups that merely provide financial services and group support. It is still unclear how to organize self-help groups to achieve maximum impacts on women's empowerment; for example, more evidence is needed to understand which types of training result in women's empowerment.
Nevertheless, for other findings, the strength of an integrated mixed-methods review is clearly visible. For example, the quantitative evidence suggests positive effects on various dimensions of empowerment, and the qualitative evidence reinforces this with its emphasis on the mechanisms underlying those effects. Here, the quantitative evidence addresses the attribution question and shows that women's self-help groups have positive causal effects on women's empowerment, while the qualitative evidence offers a more nuanced understanding of how these empowerment processes might work. First, women's perspectives suggest that economic empowerment may be stimulated by giving women the opportunity to handle money. Second, women's perspectives indicate that social empowerment may be stimulated by improvements in social networks, community respect, and solidarity among women self-help group members. Third, the integration of the quantitative and qualitative evidence suggests that, although women's self-help groups may stimulate political empowerment, changing the status of women in society is not the main goal of women SHG members. Fourth, women's experiences suggest that SHG members become able to speak freely in front of others, which they could not do before joining. These four mechanisms indicate that the original theory of change may miss several intermediate steps in the causal chain. Figure 4.18 depicts a revised theory of change based on our findings.

SUMMARY OF MAIN RESULTS
Our results suggest that self-help groups can have positive effects on various dimensions of women's empowerment. We found positive effects, ranging from 0.06 to 0.41 standard deviations, on economic and political empowerment, as well as on women's family-size decision-making power and mobility, both of which can be classed as social empowerment. However, we did not find evidence for positive effects on psychological empowerment. These findings are based on the results of RCTs and higher-quality quasi-experimental studies.
The qualitative synthesis also indicates that women perceive self-help groups as contributing positively to their empowerment, and it offers a more nuanced understanding of how women experience empowerment after they join self-help groups. Women's experiences suggested that the positive effects of self-help groups on economic, social, and political empowerment may run through the channels of familiarity with handling money and independence in financial decision-making, solidarity, improved social networks, and respect from the household and other community members. In contrast to the quantitative evidence, the qualitative synthesis of women's perceptions indicates that SHGs may contribute to psychological empowerment.
Our synthesis of women's experiences in SHGs also suggests that, while participation in self-help groups can initially create tension within households, especially between husbands and wives, in the long term participation in SHGs does not contribute to domestic violence. This finding is consistent with the absence of a statistically significant effect of SHGs on the likelihood of domestic violence in our meta-analysis.
The findings on community push-back were mixed and may be context-specific. For example, De  demonstrated that push-back from conservative community members in India had negative consequences for the happiness or subjective well-being of women SHG members, but only in communities with relatively conservative gender norms. Women's perspectives from the qualitative synthesis also present evidence of occasional backlash from other community members. However, our synthesis also suggests that backlash is more prevalent in contexts where SHGs are less well established. Although we have to be cautious in interpreting this result because of the difficulty of establishing causal effects with qualitative research, some of the quantitative studies also presented suggestive evidence that spillovers from self-help groups may benefit the social empowerment of women in the community who were not themselves members of self-help groups. These spillovers may be more likely in settings where SHGs are more established.

OVERALL COMPLETENESS AND APPLICABILITY OF EVIDENCE
A secondary goal of this research was to develop a new theory of change for self-help groups by triangulating research findings from the quantitative meta-analysis and the qualitative narrative synthesis. As discussed in section 4, the positive effects of self-help groups on various dimensions of women's empowerment indicate that the theory of change we presented at the beginning of this review is valid at least to some degree but misses several important steps. Triangulating the quantitative and qualitative findings further indicates that group-based support in the form of training might contribute more to women's economic empowerment than the microfinance services of self-help groups. However, we have to be careful in interpreting this result because the quantitative studies present few details about the content of training in SHGs.
More fundamentally, our findings also indicate that the original theory of change we presented was incomplete. First, it started only at the stage where women already participate in self-help groups. But, as White (2014) argued, many development programs fail because participation in the program, or take-up, is too low. Our qualitative synthesis suggested that women perceive low participation of the poorest of the poor in self-help group programs, so these programs may currently bring more benefits to women who are not among the poorest of the poor. We therefore propose to start the theory of change with the encouragements that might be necessary to stimulate the poorest of the poor to participate in self-help groups. These incentives might be financial, for example, offering the opportunity to participate without a savings requirement, or nonfinancial, for example, encouraging the husbands or mothers-in-law of the poorest of the poor to let their wives and daughters-in-law participate in self-help group programs.
In addition, our qualitative synthesis suggests that various intermediate outcomes were missing from the original theory of change. First, women's perspectives indicate that SHGs may contribute to psychological empowerment only if women are able to gain a public voice. Second, women's perspectives indicate that women may first need to gain the skill to handle money before they can achieve economic empowerment. Third, women's perspectives suggest that SHGs contribute to social empowerment after women gain respect from community members, which potentially increases the quality of their social networks and improves solidarity among group members. Fourth, women's perspectives about their participation in SHGs suggested that they need to go through various stages of political empowerment, only some of which can be achieved through SHG membership. Women SHG members' perceptions suggest that they achieved only the first stage, in which they became knowledgeable about their rights but did not directly challenge women's status in society. Importantly, however, none of the quantitative studies was able to test these mechanisms directly. We therefore need to remain careful in interpreting these results from the qualitative analysis because of potential biases.
With respect to adverse outcomes, our integration of the quantitative and qualitative syntheses suggests that participation in women's self-help groups is unlikely to have strong adverse effects on domestic violence. We found no evidence that SHGs increase the likelihood of domestic violence. Furthermore, women's perspectives indicate that even if SHGs do contribute to domestic violence, this adverse consequence is likely to disappear in the long term.
Finally, the strong heterogeneity in the impact estimates on social empowerment and the wide range of potential mechanisms from the qualitative research indicate that the theory of change needs to represent the social and political context within which women make decisions. For example, women might choose not to become autonomous because this might result in community disapproval, or they might choose not to participate in a self-help group because they would then no longer have time for agricultural labor. We argue that these considerations need to be reflected in the theory of change by adding assumptions along the causal chain from inputs to intermediate and final outcomes. First, women may need support in introducing the purpose of SHG participation to other household members before they start participating in self-help groups. Second, women need to show demand for the financial and nonfinancial services the self-help group provides, and have sufficient time to participate in its activities.

QUALITY OF THE EVIDENCE
The findings of every systematic review depend on the quality of the primary studies on which it relies. In our case, we believe that both the quantitative and the qualitative studies on women's self-help groups had substantial limitations with respect to their quality. However, we also believe that our risk of bias assessment for the quantitative research allowed us to distinguish clearly between the findings of studies with high, medium, and low risk of bias. The meta-analysis indeed showed that studies with a high risk of selection bias were likely to present biased estimates of the impact of women's self-help groups on women's empowerment. For this reason, we were able to present a meta-analysis for only a small number of studies. We were not able to show strong evidence for heterogeneous effects in a large sample of studies, even though our analysis presented clear indications of strong heterogeneity in the effect sizes.
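Pooled estimates and heterogeneity statistics underlie the points above. As an illustration only, the sketch below implements the DerSimonian-Laird random-effects estimator, one widely used method, returning the pooled effect, its standard error, Cochran's Q, the between-study variance τ², and I². We do not claim this is the exact estimator or software used in this review; the function name `dersimonian_laird` is hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects meta-analysis with the DerSimonian-Laird tau^2 estimator.

    effects   -- per-study effect sizes (e.g. standardized mean differences)
    variances -- their sampling variances (all > 0)
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Method-of-moments estimate of between-study variance, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each study's sampling variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    # I^2: share of total variability attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "Q": q, "tau2": tau2, "I2": i2}
```

An I² close to 1 means that most of the observed variation in effect sizes is attributed to between-study differences rather than sampling error, the kind of heterogeneity that makes subgroup conclusions from few studies fragile.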
In addition, we were not able to present a convincing meta-analysis of the effects of self-help groups on women's psychological and political empowerment. Nonetheless, the results of the meta-analysis for studies with low and medium risk of selection bias presented important evidence of the effects of self-help groups on women's empowerment.
The qualitative evidence also presented important findings with respect to the possible mediators of the effectiveness of women's self-help groups. But the lack of qualitative studies that report the empowering experiences of women in self-help groups directly from women's narratives limited our ability to more fully understand such mediators. Furthermore, several of the qualitative studies suffered from a medium risk of bias.

LIMITATIONS AND POTENTIAL BIASES IN THE REVIEW PROCESS
Some limitations of this review are specific to each of the two types of analysis, and others arose in the synthesis process used to triangulate the quantitative and qualitative results. In particular, we were not able to test all the findings of the qualitative synthesis in the quantitative meta-analysis. This was partly because of the small number of studies in the quantitative meta-analysis that could be considered rigorous. More importantly, however, most of the potential moderators identified in the qualitative research were either not reported in the quantitative research or reported in insufficient detail. Hence, we were not able to estimate the moderating effects of these potential moderators. Furthermore, although we were able to assess the moderating effect of training in the quantitative analysis, which suggested that training has positive effects on empowerment, neither the quantitative nor the qualitative studies presented sufficient detail about the content of training in SHGs. Thus, we need to remain very careful in interpreting this result.

Limitations of quantitative data analysis
Publication bias: The results of our meta-analysis may be vulnerable to publication bias. We tested for potential publication bias by presenting funnel plots for the effects on women's social empowerment (family-size decision-making power and mobility) and economic empowerment, and by reporting the results of the Egger test. From these funnel plots, we concluded that there might be scope for publication bias in the impact estimates of women's self-help groups on women's economic and social empowerment. However, the Egger test did not show formal statistical evidence of publication bias in these estimates. Based on the funnel plots, we can therefore say only that there was potential for publication bias in the studies that focused on women's economic or social empowerment.
In addition, the results of both our quantitative and qualitative syntheses are heavily based on studies from India and Bangladesh. The external validity of the review may thus be limited to the context of South Asia. At the same time, our results may also be most relevant to South Asia because self-help groups are a more popular intervention to stimulate women's empowerment in this region than in other regions of the world.
Missing information: Unfortunately, we were not able to distinguish among the effects of different self-help group models because studies often did not report sufficient information about the specific model they examined, and a wide variety of self-help group models exist across regions. The Indian model is quite different from the model in Bangladesh, and even within India a wide range of self-help group models exist. The differences between self-help group models in South Asia and those in the rest of the world are even larger.
The results of our quantitative synthesis might also be biased by the exclusion from the meta-analysis of studies for which we were not able to estimate effect sizes. We believe this risk was minimal, however. In general, the findings of the studies we could not include in our meta-analysis were consistent with the findings of those we could include. Further, the study findings that were not in line with the findings of the meta-analysis generally came from studies with a high risk of selection bias.
Data analysis: Unfortunately, the number of studies with only a low or medium risk of selection bias was limited. Therefore, we were able to convincingly demonstrate the effectiveness of women's self-help groups only with respect to social and economic empowerment. For the effects on psychological and political empowerment, we had to rely mostly on a narrative synthesis. In addition, we were not able to convincingly demonstrate the effects of women's self-help groups on women's economic and social empowerment for subgroups using a meta-analysis; for this purpose, we again had to rely on a narrative synthesis. Finally, many studies used different outcome measures for the same empowerment domain, so the outcome variables in our meta-analysis may not always measure the same construct. We took this concern seriously, however, as shown by our decision to analyze impact estimates on women's family-size decision-making power and women's mobility separately after our evidence suggested that these two empowerment components cannot be considered part of the same construct. Notwithstanding these limitations, we believe our systematic review presents important evidence on the pooled impact estimate, and on the heterogeneity in that estimate, of women's self-help groups on women's economic and social empowerment and, less convincingly, on their political and psychological empowerment.
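Egger's test regresses each study's standardized effect (effect divided by its standard error) on its precision (the inverse of the standard error); an intercept far from zero indicates funnel-plot asymmetry. The sketch below is a minimal standard-library version, assuming at least three studies with effect sizes and standard errors; the function name `egger_test` is hypothetical. It returns the t-statistic and degrees of freedom rather than a p-value, which can be read from a t distribution with k - 2 degrees of freedom (for example, `2 * scipy.stats.t.sf(abs(t), df)`).

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses effect/SE on 1/SE by ordinary least squares and returns the
    intercept, its standard error, the t-statistic and degrees of freedom.
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies with matching SEs")
    x = [1.0 / se for se in std_errors]                    # precision
    y = [e / se for e, se in zip(effects, std_errors)]     # standardized effect
    x_bar = sum(x) / k
    y_bar = sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance of the OLS fit with k - 2 degrees of freedom
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    df = k - 2
    sigma2 = sum(r ** 2 for r in residuals) / df
    # Standard error of the intercept in simple linear regression
    se_intercept = math.sqrt(sigma2 * (1.0 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept
    return {"intercept": intercept, "se": se_intercept, "t": t_stat, "df": df}
```

Because the test has low power when there are few studies, a non-significant Egger result, as reported above, does not rule out publication bias; this is why the funnel plots still warrant caution.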

Limitations of qualitative data analysis
Searches: Given the large scope of this review, we may have missed some relevant articles. We made a concerted attempt to find all relevant qualitative studies, but we noticed that fewer qualitative evaluations exist and even fewer make it into peer-reviewed publications. Therefore, our search strategy also emphasized the gray literature, including dissertations and unpublished reports. One of the most comprehensive qualitative studies we included was a dissertation. Its value was enhanced by the absence of page limits, which allowed the author to include full quotations from female SHG participants and cover many different themes. What appeared in the peer-reviewed literature was less comprehensive, with fewer quotations and less-developed theoretical frameworks. It was unclear whether this reflects a general lack of strong qualitative studies or a bias in what ends up being published compared with the research actually conducted. Finally, because the review questions span public health, psychology, economics, law, and human rights, it is possible that relevant psychology reports or legal documentation did not meet the inclusion criteria of this review.
Underreporting of adverse outcomes: The included qualitative studies set out to examine changes in empowerment outcomes, as stated in their research questions. As a result, qualitative researchers spoke with group members who were willing to talk about their experiences, not with women who did not want to be interviewed, who dropped out, or who did not join SHGs. In addition, researchers did not talk to men or other community members who may have different perspectives on the SHGs. Adverse effects of SHGs may therefore have been underreported in the qualitative research.
Missing information: Although the authors conducted a thorough quality assessment of each study, descriptions of important methodological processes were missing from many of the qualitative studies. For example, although the data analysis of a study might have appeared rigorous as judged by the results presented, the description of the analytic process was weak in most studies. In addition, the researcher's relationship with the study participants and ethical considerations were either unreported or not examined. These are important parts of any qualitative research and should be reported in any dissemination of the findings. The risk of bias summary table (4.2) offers a way for readers to assess completeness for themselves.

Data analysis:
The meta-ethnographic process uses the included studies much as one would use transcripts in a qualitative analysis, so the quality and completeness of these "transcripts" affect the analysis process and the results. The direct quotations from women in the studies were as close to the raw data as we could get, similar to having access to a dataset in quantitative research. Unfortunately, some studies provided more direct quotations from women than others, so the analysis was biased toward studies that included more quotations.

Limitations of the synthesis process
The premise of an integrated mixed-method review is that the two parts of the analysis can inform each other during the analysis process, not just in the conclusions. Therefore, the researchers working on the two parts of the study discussed findings during the data extraction and analysis phase, but the exchange of information could influence each analysis only to a limited extent. For example, because of missing data, very few concepts that emerged from the qualitative studies could be used in the subgroup analysis of the quantitative studies.
We believe that integrated mixed-method reviews that include both quantitative and qualitative research have potential. However, to optimize the learning from integrated mixed-methods reviews, it is important that quantitative researchers integrate the findings of qualitative researchers in their research design and vice versa. Hence, maximizing the potential of integrated mixed-methods reviews would require a more interdisciplinary attitude from both quantitative and qualitative researchers.

AGREEMENTS AND DISAGREEMENTS WITH OTHER STUDIES OR REVIEWS
This systematic review found significant positive impacts of self-help groups on empowerment, whereas systematic reviews that focused on microcredit and microsavings (e.g. Stewart et al., 2012; Vaessen et al., 2014) found only limited evidence for positive effects on economic outcomes. In addition, our quantitative synthesis suggested that self-help group interventions that include a training component may have stronger effects on women's empowerment, particularly economic empowerment and women's family-size decision-making power, than self-help groups that do not. So although our results were more positive than those of other systematic reviews with an emphasis on microfinance, we do not believe the results of the different reviews necessarily contradict each other, since the stronger effects we found may partly reflect components, such as training, that go beyond financial services. However, we need to remain very careful in interpreting this result because the quantitative studies presented insufficient detail about the training components of SHGs and their content.

IMPLICATIONS FOR PRACTICE AND POLICY
Our review highlights several important implications for practice and policy related to the rollout and potential impact of SHGs. First, our quantitative evidence suggested positive effects on women's empowerment, indicating that self-help groups have the potential to strengthen development outcomes. These findings have important implications for program designers and managers. Thorough program planning and implementation are essential to ensure that an optimal number of participants meet frequently. In addition, staff and institutions may consider structures that ensure the same staff and institutions remain accountable to their clients.
The greatest quantitative impacts were found among SHGs in which health education, life skills training, and/or other types of information were shared and supported. Additional benefits accrued via group training, such as group sharing, learning, and support. Furthermore, programs should consider that SHGs offer an important venue for delivering additional services and training. SHGs that are facilitated externally are also more likely to have the resources to provide additional components, such as training. The finding on training might also reflect the success of programs that provide more holistic programming, as indicated by the qualitative research. Unfortunately, however, the quantitative studies do not present details about the contents of the training, so we must remain careful in interpreting this finding.
One area that has particularly important implications for programs and policy is the qualitative finding that women SHG members perceive low participation of the poorest of the poor in self-help groups. In part, this might be because the poorest of the poor are too financially and/or socially constrained to join self-help groups or to benefit from the financial services most often provided through them. But other barriers, such as class or caste discrimination, might also be at work. Poorer or marginalized women may not feel accepted by groups made up of wealthier or better-connected community members. It is important for program and policy makers, as well as researchers, to identify ways to build in support and reduce barriers for individual women who want to participate in such groups but who lack the financial resources or freedom to join. One enhancement that we have made based on the findings from this review is to start the theory of change for SHGs with incentives that encourage the poorest of the poor to participate in self-help groups. These incentives could be financial, for example, allowing the poorest of the poor to participate without a savings requirement, or nonfinancial, for example, encouraging husbands or mothers-in-law to let their wives and daughters-in-law participate in self-help group programs, or conducting outreach activities to marginalized groups.
It is important to note that although SHGs overall showed positive impacts, both the quantitative and qualitative evidence showed considerable heterogeneity in program designs and in program effectiveness. This finding indicates that context matters. The types of specific program components, and their likely impacts, depend on the overarching social, cultural, political, and economic context, from the national level down to the very local level. As new programs are implemented in different contexts, and as nascent groups become established, it is critical that program designs are tailored to local settings in ways that allow them to evolve over time. Such considerations may include conducting community readiness activities, performing more comprehensive outreach to marginalized groups even within small communities, and including some form of advocacy training so that women can pursue change beyond the individual level and work toward overcoming structural barriers to empowerment. This review has shown that one size does not fit all: while best practices should be drawn from across programs, they need to be applied flexibly so that programs can be adapted for the greatest impact in women's lives.

IMPLICATIONS FOR RESEARCH
This review has several implications for research. First, the synthesis of the quantitative evidence suggests there is a need for more rigorous quantitative studies that can account for selection bias and spillovers and address the difficulties of measuring empowerment. The quantitative synthesis indicated that studies that did not adequately account for selection bias overestimated the impact of self-help groups on empowerment. Furthermore, the qualitative synthesis suggested that the current measurements of empowerment in the quantitative studies might not reliably capture all dimensions of empowerment. Whereas the quantitative measures are useful in understanding certain aspects of the impact of self-help groups on empowerment, the qualitative studies offer more nuanced ideas about how to measure the lived experience of empowerment. In both cases (quantitative and qualitative studies), researchers need to describe more fully the various components of the interventions/programs being studied, so that outcomes and findings can be understood and interpreted against the specifics of the program components.
Greater detail in the description of the program design will help in determining moderating factors in the design of SHGs. In addition, future research could draw on mixed-method strategies to develop and test new rigorous measures of empowerment.
Second, there is a need for more research examining the impact of economic self-help groups on women's empowerment using mediator and moderator analyses to further understand the pathways or mechanisms through which SHGs affect empowerment. In addition, there may be other pathways to empowerment not examined in this review that can be rigorously measured (or for which measures can be developed) for inclusion in future studies of the impact of women's SHGs on empowerment. Potential mediators/moderators of interest include indicators of mental health, relationship power, community-level respect, social capital, and social solidarity. Other important mediators may include whether women who participate in SHGs have male partners who experience shifts in gender-related attitudes toward greater gender equality, as measured by the gender equitable man scale (Pulerwitz et al., 2008). Future research can examine if and how men's attitudinal shifts affect women's empowerment, positively or negatively. Furthermore, the effects of these complex interventions take time to influence both mediators and outcomes; thus, longer follow-up periods are needed in future research to fully understand both the long-term impacts of SHGs and the factors that support the maintenance of empowerment.
Because women's self-help programs are implemented across many different regions of the world, it is also critical for researchers not to assume that an intervention that works in one place should be replicated elsewhere. In short, as noted above, nuanced modifications of programs and sensitivity to local cultural norms are needed in future program design and in the evaluation of program impacts.
Another interesting dimension of our review, on which we were not able to draw definitive conclusions, is the effectiveness of SHGs that integrate components other than economic ones (skills-building, reduced family size, reproductive health) and whether these "integrated programs" result in more social, psychological, political, or economic empowerment for women.

Acknowledgments
We would like to thank 3ie for funding this study. We would also like to acknowledge our advisory group, including Reema Nanavaty of SEWA and Shahid Vaziralli from the Center for Microfinance, search strategist Tara
(2) qualitative
Source of information (outsource): (1) survey; (2) records; (3) interviews; (4) focus groups
Measure/Indicator of outcome (measure): Were there any differences in measurement of this outcome between the group participants and the comparison? (1) yes; (2)

Vijayanthi, 2002 | No quantitative estimate of impact

Qualitative | Reason
Ahmed, 2011 | This is not an evaluation of a self-help group program
Apantaku, 2008 | This is not an evaluation of a self-help group program
| This study does not focus on empowerment outcomes.
Salway, 2005 | This is not an evaluation of a self-help group program
Sarojani, 2009 | This study does not report direct quotations from participants
Sharma, 2014 | This is not an evaluation of a self-help group program
Shylendra, 1999 | This study does not report direct quotations from participants
Somé, 2013 | This is not an evaluation of a self-help group program
Sotshongaye, 2000 | This is not an evaluation of a self-help group program
Ssewamala, 2009 | This is not an evaluation of a self-help group program
Torri, 2011 | This is not an evaluation of a self-help group program

Comment: Open answer
If baseline characteristics are not available, does the study show characteristics of beneficiaries and non-beneficiaries that are not likely to be affected by the intervention?
Are the mean values or the distributions of the covariates at baseline statistically different for beneficiaries and non-beneficiaries (p<0.05)?
If there are statistically significant differences in plausibly exogenous characteristics between beneficiaries and non-beneficiaries, are these differences controlled for using covariate analysis in the impact evaluation?
If baseline characteristics are not available, does the study qualitatively assess why beneficiaries are likely/unlikely to be a random draw of the population at baseline?

Confounding and selection bias (ask questions for all quantitative studies)
Does the study use a comparison/control group of students/households without access to the program? 1 = Yes; 2 = No; 9 = Unclear; 99 = Not applicable. Comment: Open answer
Does the study use a comparison/control group of students/households with access to the program but without participation in the program?
Does the study include data at baseline and endline (before and after the intervention)?
Are the data on covariates collected at baseline?
Is difference-in-differences estimation (i.e., using statistical inference) used?
If the study is quasi-experimental and uses difference-in-differences estimation, do the authors assess the parallel trends assumption?
If the study does not use difference-in-differences, does the study control for baseline values of the outcome of interest?
If the study does not use difference-in-differences and does not control for baseline values of the outcome variable, does the study control for other covariates at baseline?
If the study does not use difference-in-differences estimation, is there any assessment of likely risk of bias from time-invariant characteristics driving both participation and outcome?
If the study does not use difference-in-differences estimation but does assess likely risk of bias from time-invariant characteristics, are these time-invariant characteristics likely to bias the impact estimates?
Does the study report the
Comment: Open answer
Is the attrition rate below 10%?
Does the study assess whether drop-outs are random draws from the sample (e.g., by examining correlation with determinants of outcomes, in both treatment and comparison groups)?

Spillovers and contamination (ask questions for all quantitative studies)
Spillovers: Are comparisons sufficiently isolated from the intervention (e.g., participants and non-participants are sufficiently geographically or socially separated), or are spillovers estimated by comparing non-beneficiaries with access to the intervention to non-beneficiaries without access to the intervention and/or through social network analysis? 1 = Yes; 2 = No; 9 = Unclear; 99 = Not applicable. Comment: Open answer
Spillovers: If spillovers are not estimated, is the study likely to bias the impact of the program?
Contamination: Does the study assess whether the control group receives the intervention?
Contamination: If the control group receives the intervention but for a shorter amount of time, does the study assess the likelihood that the control group has received equal benefits as the treatment group?
Contamination: If the control group receives the intervention, have they received it for long enough to argue that they have benefited from the intervention?
Contamination: Does the study describe and control for other interventions that might explain changes in outcomes?

Other threats to validity (ask questions for all quantitative studies)
Does the evidence suggest analysis reporting biases are a serious concern? Analysis reporting biases include failure to report important treatment effects (possibly relating to intermediate outcomes), or justification for (uncommon) estimation methods, especially multivariate analysis for outcomes equations.
Are there concerns about baseline data collected retrospectively?
Are there concerns about courtesy bias, social acceptability bias, political correctness bias, self-serving bias, self-importance bias, and biases in reporting of sensitive information from outcomes collected through self-reporting?

Construct Validity (ask questions for all quantitative studies)
Was the survey suitable for the local context? 1 = Yes; 2 = No; 9 = Unclear; 99 = Not applicable. Comment: Open answer
Does the study describe the implementation of the program in sufficient detail?
Does the study take into consideration potential implementation failures?
Does the study use a proper theory of change, logframe, and/or other proper conceptual or theoretical framework?
Does the study analyze the outcome measures put forward in the theory of change or logframe?
Was the implementation of the intervention influenced by the research?
Did the researchers have perfect control over the intervention?
Was the implementing agency representative of the agencies that usually implement self-help group programs?

External Validity (ask questions for all quantitative studies)
Is the study sample representative of the population of interest? 1 = Yes; 2 = No; 9 = Unclear; 99 = Not applicable. Comment: Open answer
Was the effectiveness of the intervention harmed by implementation failures that would not have happened in the absence of the research?
Does the study assess the replicability of the intervention?
Is the intervention replicable?
Does the study assess the scalability of the intervention?
Is the intervention scalable?
Do the authors clearly distinguish between the intention-to-treat effect and the treatment effect on the treated?
Do the authors highlight the intention-to-treat effect?

Hawthorne and John Henry Effects (ask questions for all quantitative studies)
Do the authors argue convincingly that being monitored is unlikely to influence the behavior of beneficiaries and non-beneficiaries in different ways? Comment: Open answer
Does the study use a unit of allocation with a sufficiently large sample size to ensure equivalence between the treatment and the control group?
Ask questions below only for studies that apply regression discontinuity designs
Is the allocation of the program based on a pre-determined discontinuity on a continuous variable and blinded to the beneficiaries or, if not blinded, can individuals not reasonably affect the assignment variable in response to knowledge of the participation rule? 1 = Yes; 2 = No; 9 = Unclear; 99 = Not applicable. Comment: Open answer
Is the sample size immediately on both sides of the cut-off point sufficiently large to equate groups on average?
Is the mean of the covariates of individuals immediately on both sides of the cut-off point statistically significantly different for beneficiaries and non-beneficiaries?
If there are statistically significant differences between beneficiaries and non-beneficiaries, are these differences controlled for using covariate analysis?

Ask questions below only for studies that apply matching
Quality of matching (PSM, covariate matching)
Are beneficiaries and non-beneficiaries matched on all relevant characteristics? 1 = Yes; 2 = No; 9 = Unclear; 99 = Not applicable. Comment: Open answer
Does the study report the results of the matching function (e.g., for PSM, the logit function)?
Does the study report the matching method?
Does the study exclude observations outside the common support?
Does the study use variables at follow-up that can be affected by the intervention in the matching equation?
Are matches found for the majority of participants (>90%)?
If ≥10% of participants failed to be matched, is sensitivity analysis used to re-estimate results using different matching methods?
For nearest-neighbor PSM, does the study report the mean or distribution of the propensity scores in the treatment and control groups after matching?
For nearest-neighbor PSM, are propensity scores similar, based on tests for statistical differences at the means or other quantiles of the distribution?
Does the study report the mean or distribution for the covariates of the treatment and control groups after matching?
Are these characteristics similar, based on tests for statistically significant differences (p>0.5)?
Do the authors use bootstrapped standard errors?

Sensitivity analysis (only for studies that apply PSM)
For PSM, where propensity score distributions and/or covariates of the treatment and control groups are not reported, or they are reported but there are differences in means or distributions of the covariates or propensity scores (usually only applicable to methods which do not exclude treatment observations, such as nearest neighbor), is robustness assessed using an additional matching technique? Comment: Open answer
Are the results of the participation equation reported?
Are the instruments jointly significant at the level of F ≥ 10?
If an F test is not reported, does the author report and assess whether the R-squared of the instrumenting equation is large enough for appropriate identification (R-sq > 0.5)?
Are the instruments individually significant (p≤0.05)?
For IV, if more than one instrument is used in the procedure, does the study include and report an overidentifying test (p≤0.05 is required to reject the null hypothesis)?
Does the study qualitatively assess the exogeneity of the instrument/identifier (both externality as well as why the variable should not enter by itself in the outcome equation)?

Ask questions below only for studies with censored outcome variables
Do the authors use appropriate methods (e.g., Heckman selection models, tobit models, duration models) to account for the censoring of the data?

The study does not adequately control for selection bias in the analysis.
The study does not take into consideration that the comparison group may also have been contaminated by the intervention.
The study assesses the impact of several components of the intervention without taking selection bias into consideration. This is an uncommon estimation method, which suggests that the analysis is vulnerable to analysis reporting biases. The study also uses covariates in the model that may be endogenous.
The answers to the questions about domestic violence are vulnerable to social desirability bias.

Bali Swain and Wallentin, 2009
High risk of bias | High risk of bias | High risk of bias | High risk of bias
The study uses analysis to separately determine the trend of the outcome measures among the beneficiaries and the non-beneficiaries. This does not allow for estimating the impact of the intervention. Hence, the study does not use a valid identification strategy.
The study selects the comparison group from the same location as the beneficiaries, so there is a potential for spillovers biasing the findings.
The study uses an unusual type of analysis (separately determining the trend for the beneficiaries and the nonbeneficiaries). This could bias the research findings.
The use of recall data could bias the impact estimates. Moreover, it is not well explained why the use of these data can be considered valid for this study.

Banerjee et al., 2015, 2010
Low risk of bias | Medium risk of bias | Low risk of bias | Low risk of bias
The study uses a matched-pair cluster-randomized controlled trial. Baseline and follow-up data were collected, but panel data were not available (i.e., the respondents at follow-up are not necessarily the same as the respondents at baseline due to resampling). The study assesses equivalence of
The authors estimate a combined intention-to-treat effect for women who decide to self-select into SHGs and women who decide not to self-select into SHGs. This minimizes the risk of spillovers.
The two versions of this paper report slightly different results, which may indicate outcome reporting bias. Standard deviations are not reported in either version of the paper, and the authors did not respond to requests for information. As a result, the standard deviations had to be imputed, increasing the potential risk of bias in the effect size.
The use of recall data could result in social desirability bias in the measurement of empowerment.

Desai and Joshi, 2012
Low risk of bias | Low risk of bias | Low risk of bias | Low risk of bias
It appears that the randomization resulted in balance across observable and unobservable characteristics.
The authors estimate a combined intention-to-treat effect for women who decide to self-select into SHGs and women who decide not to self-select into SHGs. This minimizes the risk of spillovers.
There do not appear to be serious outcome or analysis reporting biases.
There do not appear to be other serious biases.

Desai and Tarozzi, 2011
Low risk of bias
There do not appear to be other serious biases.

Holvoet, 2005
High risk of bias | Medium risk of bias | Low risk of bias | Medium risk of bias
This is an ex post multivariate multinomial logistic regression study without a valid identification strategy. The study does not collect baseline data and elicits baseline characteristics using recall over long periods. The study attempts to "match" the programs that deliver
There was potential for spillover effects, but the study reports that the authors attempted to minimize these by not sampling non-beneficiaries with close connections to the beneficiaries.
It does not appear that there are serious outcome or analysis reporting biases.
The study relies on retrospective baseline data collection to a considerable extent.
the treatments under study, strategy. The authors do not use baseline data and do not report the results of the analysis.
In the earlier version of the paper, the study uses panel data and difference-in-differences analysis. However, the intervention had already started before the baseline survey, which invalidates the parallel trends assumption. The authors also do not take clustering into consideration in the estimation of the standard errors and do not assess the potential biases in outcome measurement.
in their analysis without separately analyzing these, which could result in a bias due to spillovers. Furthermore, several members from the comparison group were members of self-help groups during the baseline survey, suggesting that they may also have been affected by the intervention.
In the earlier version of the paper, the study is vulnerable to analysis reporting biases because of the start of the intervention before the collection of baseline data.
recall information from two years ago.

Mahmud, 1994
High risk of bias
The study uses a cross-sectional study design without data collection at baseline and does not have a valid identification strategy. The study only controls for a small number of potential confounding variables but also includes annual income, a variable likely affected by the program, as a control variable in the regression analysis.
The nonbeneficiaries come from the same locality as the beneficiaries, which could invalidate the results due to spillovers.
The outcome variables are not well explained. It controls for a small number of potential confounding variables but also includes annual income, a variable likely affected by the program, as a control variable in the regression analysis.
There do not appear to be serious concerns about other biases.

Osmani, 2007
High risk of bias | High risk of bias | High risk of bias | Low risk of bias
The study uses an instrumental variable approach to address the problem of selection bias. However, the validity of the instruments depends on the inclusion of household income, an intermediate outcome, as an independent variable.
The beneficiaries and the comparison group were drawn from the same villages. Hence, the estimates may be biased due to spillovers.
It appears that the authors do not apply instrumental variables correctly. This suggests that the findings are vulnerable to analysis reporting biases.
There do not appear to be serious other biases.

Pitt et al., 2006
Medium risk of bias | High risk of bias | Medium risk of bias | Low risk of bias
The study uses a fixed-effects instrumental variable regression approach (and compares the results to ordinary least squares with village-level variables and fixed-effects estimation). The authors identify a set of instrumental variables and control for village-level fixed unobserved characteristics.
The authors do not discuss the potential bias from spillover effects, even though the comparison women come from the same communities.
The study does not report the participation equation; it is unclear whether the instruments were jointly or individually significant; and the authors do not report a test for overidentification.
There do not appear to be serious outcome and analysis reporting biases.

APPENDIX 10: PROCEDURES FOR CALCULATING EFFECT SIZES
This appendix describes the procedure for calculating the effect sizes of the included quantitative studies.
First, we calculated standardized mean differences (Cohen's d) by dividing the mean difference by the pooled standard deviation, applying the formula in Equation 10.1. Equation 10.2 was used for regression studies with a continuous dependent variable for which we had information about the point estimate for the treatment variable and the associated standard deviation. In this formula, SDy refers to the standard deviation for the point estimate from the regression, nt to the sample size for the treatment group, nc to the sample size for the control group, and β to the point estimate. Equation 10.3 was applied when there was information about the standard deviation for both the treatment group and the control group. In this formula, st refers to the standard deviation for the treatment group and sc to the standard deviation for the control group. When a paper reported the standard deviation only for the full sample, the treatment group, or the control or comparison group, we assumed the same standard deviation for the treatment and the control or comparison group.
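The mean-difference calculation can be sketched as below. This is a minimal illustration, assuming that Equation 10.1 is the usual mean difference over the pooled SD and that Equation 10.3 is the standard pooled-SD formula built from st, sc, nt, and nc; the exact form of Equation 10.2 is not reproduced here, so the regression case is left out. The input values are hypothetical.

```python
import math

def pooled_sd(s_t: float, s_c: float, n_t: int, n_c: int) -> float:
    """Pooled SD of treatment and comparison groups (standard formula,
    assumed here to correspond to Equation 10.3)."""
    return math.sqrt(((n_t - 1) * s_t**2 + (n_c - 1) * s_c**2) / (n_t + n_c - 2))

def cohens_d(mean_t: float, mean_c: float, s_t: float, s_c: float,
             n_t: int, n_c: int) -> float:
    """Standardized mean difference: mean difference divided by the pooled SD."""
    return (mean_t - mean_c) / pooled_sd(s_t, s_c, n_t, n_c)

# Hypothetical values, not taken from any included study.
d = cohens_d(mean_t=3.4, mean_c=3.0, s_t=1.0, s_c=1.0, n_t=120, n_c=100)
print(round(d, 3))

# When only one SD is reported, using it for both groups leaves it unchanged.
print(pooled_sd(2.0, 2.0, 10, 20))
```

The second call illustrates the equal-SD assumption described above: feeding the same SD in for both groups returns that SD as the pooled value.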
Then we corrected the standardized mean difference for potential bias from a small sample size using the formula to transform Cohen's d to Hedges' g. For dichotomous variables, we used odds ratios and log odds ratios rather than risk ratios because methods are available to convert the natural logarithm of odds ratios to the standardized mean difference and vice versa, as illustrated in Equation 10.6 (Borenstein et al., 2009):

(10.6) $d = \mathrm{LogOddsRatio} \times \frac{\sqrt{3}}{\pi}$
This transformation required several statistical assumptions, but it allowed us to conduct a single meta-analysis combining dichotomous and continuous variables for the same construct. Conducting one meta-analysis for dichotomous and continuous variables was preferable because it substantially increased the number of studies we could include in each meta-analysis.
It was also appropriate because the included studies that analyzed continuous variables shared goals with the included studies that analyzed dichotomous variables. Borenstein et al. (2009) suggest that the transformation of log odds ratios to standardized mean differences improves the meta-analysis as long as the outcome variables measure the same construct; whether the outcome variables use different measurement scales is less important. Nonetheless, the transformation from log odds ratios to standardized mean differences requires several statistical assumptions (Borenstein et al., 2009).
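A minimal sketch of the logistic-scaling conversion between log odds ratios and standardized mean differences follows. The variance conversion (V_d = V_lnOR × 3/π²) is the standard companion formula from Borenstein et al. (2009); it is an assumption here that it matches the review's Equation 10.7, which is not reproduced in this text. The odds ratio and standard error in the example are hypothetical.

```python
import math

# Scaling factor implied by the logistic distribution (variance pi^2 / 3).
SCALE = math.sqrt(3) / math.pi

def log_or_to_d(log_or: float) -> float:
    """Equation 10.6: standardized mean difference implied by a log odds ratio."""
    return log_or * SCALE

def d_to_log_or(d: float) -> float:
    """Inverse conversion, from a standardized mean difference to a log odds ratio."""
    return d / SCALE

def var_log_or_to_var_d(var_log_or: float) -> float:
    """Companion variance conversion: V_d = V_lnOR * 3 / pi^2."""
    return var_log_or * SCALE**2

# Hypothetical dichotomous result: OR = 1.5 with SE(ln OR) = 0.2.
d = log_or_to_d(math.log(1.5))
se_d = math.sqrt(var_log_or_to_var_d(0.2**2))
print(round(d, 3), round(se_d, 3))
```

Because the same factor scales both the estimate and its standard error, the conversion leaves the estimate-to-SE ratio (and hence the significance test) unchanged.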
Following the correction of the effect size, we estimated the corrected standard error by applying the formula in Equation 10.7 for standardized mean differences that were estimated from odds ratios.
To derive odds ratios from studies that applied linear probability models, we assumed linearity in the estimation of standardized effect sizes from the linear probability model. In practice, this meant that if we observed a mean baseline value for the comparison group of 0.067 and an effect size of 3.1 percentage points, then we assumed that the follow-up value for the treatment group would be 0.067 + 0.031 = 0.098 and that the follow-up value for the comparison group would be 0.067. Using this information, we were able to estimate odds ratios using a 2 by 2 contingency table (Lipsey & Wilson, 2001), as described in Figure 10.1. We then calculated the standard error of the natural logarithm of the odds ratio by calculating the number of cases in which women in the treatment group could be considered empowered and the number of cases in which women in the control or comparison group could be considered empowered. We did this by using the percentage of empowered women in the treatment and control or comparison groups, together with the sample sizes of the treatment and control or comparison groups. This allowed us to estimate the standard error of the natural logarithm of the odds ratio using the formula in Equation 10.9, where n11 is the number of empowered women in the treatment group, n10 is the number of empowered women in the control group, n01 is the number of nonempowered women in the treatment group, and n00 is the number of nonempowered women in the control group.
(10.9) $SE(\ln OR) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{10}} + \frac{1}{n_{01}} + \frac{1}{n_{00}}}$

Then we converted the log odds ratios and their 95 per cent confidence intervals back to odds ratios, as well as to standardized mean differences, using the formula to transform log odds ratios to standardized mean differences. Following this conversion, we converted the standardized mean difference (Cohen's d) to Hedges' g to correct for potential bias from small samples, using the formula in Equation 10.10:

(10.10) $SMD_{corrected} = SMD_{uncorrected} \times \left(1 - \frac{3}{4(n_t + n_c - 2) - 1}\right)$

We were also able to estimate the variance and standard deviation of outcome variables for which the standard deviation was not reported but the full distribution was. For this purpose, we used the formula in Equation 10.11:

(10.11) $SD(x) = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}}$

Here, µ is the mean value of x and n is the number of observations.
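The steps for linear probability model estimates (2 by 2 table, standard error of the log odds ratio, 95 per cent interval, conversion to a standardized mean difference, and the small-sample correction), plus the standard deviation from a reported distribution, can be sketched as follows. This is an illustration under stated assumptions: it reuses the 0.067 baseline and 3.1 percentage-point effect from the worked example above, but the sample size of 500 women per arm and the distribution of values are hypothetical, and cell counts are treated as expected (non-integer) counts.

```python
import math
import statistics

def lpm_to_log_odds_ratio(baseline_c, effect_pp, n_t, n_c):
    """Return (ln OR, SE of ln OR) from a linear probability model estimate.

    Assumes linearity: treatment follow-up share empowered = baseline + effect,
    comparison follow-up share = baseline. Cells are expected counts.
    """
    p_t = baseline_c + effect_pp
    p_c = baseline_c
    n11, n01 = p_t * n_t, (1 - p_t) * n_t  # empowered / not empowered, treatment
    n10, n00 = p_c * n_c, (1 - p_c) * n_c  # empowered / not empowered, comparison
    log_or = math.log((n11 * n00) / (n01 * n10))
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)  # Equation 10.9
    return log_or, se

def hedges_g(d, n_t, n_c):
    """Small-sample correction of Equation 10.10."""
    return d * (1 - 3 / (4 * (n_t + n_c - 2) - 1))

def sd_from_distribution(values):
    """Sample standard deviation with an n - 1 denominator (Equation 10.11)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / (len(values) - 1))

# Worked example from the text; 500 women per arm is a hypothetical sample size.
n_t, n_c = 500, 500
log_or, se = lpm_to_log_odds_ratio(0.067, 0.031, n_t, n_c)
ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))  # 95% CI for the OR
d = log_or * math.sqrt(3) / math.pi  # log odds ratio -> standardized mean difference
g = hedges_g(d, n_t, n_c)
print(f"OR = {math.exp(log_or):.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), g = {g:.3f}")

# The n - 1 formula agrees with the standard library's sample standard deviation.
values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical reported distribution
print(sd_from_distribution(values), statistics.stdev(values))
```

With samples of this size the Hedges correction factor is very close to 1. It matters mainly for small studies, where 4(n_t + n_c − 2) − 1 is small.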
In the absence of standard errors for the regression analysis, we estimated the standard error of the mean effect size by dividing the point estimate by the critical t-value associated with the reported significance level (the 90, 95, or 99 per cent level, respectively). Because a significant estimate has a t-statistic at least as large as this critical value, this procedure yields an upper bound on the standard error and therefore ensured the estimation of conservative pooled standard deviations.
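The conservative standard error can be sketched as below. This assumes a large-sample normal critical value stands in for the exact t-value. The `pooled_sd_from_se` helper is hypothetical: it assumes the standard error of a two-group difference equals the pooled SD times √(1/n_t + 1/n_c), which may not match how the review converted standard errors. The coefficient and sample sizes are also hypothetical.

```python
import math
from statistics import NormalDist

def conservative_se(beta: float, confidence: float) -> float:
    """Upper bound on the SE of an estimate reported only as significant at the
    given two-sided level (e.g. 0.90, 0.95, 0.99). Uses a large-sample normal
    critical value in place of the exact t-value (an assumption)."""
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return abs(beta) / critical

def pooled_sd_from_se(se: float, n_t: int, n_c: int) -> float:
    """Hypothetical helper: pooled SD implied by the SE of a two-group difference,
    assuming SE = SD_pooled * sqrt(1/n_t + 1/n_c)."""
    return se / math.sqrt(1 / n_t + 1 / n_c)

# Hypothetical coefficient of 0.15 reported as significant at the 5 per cent level.
se = conservative_se(0.15, confidence=0.95)
print(round(se, 4), round(pooled_sd_from_se(se, n_t=300, n_c=300), 3))
```

A stricter reported level gives a larger critical value and therefore a smaller upper bound. In every case the bound is the largest standard error consistent with the reported significance, which is why the resulting effect sizes lean conservative.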

Solidarity
Author | Quotation
Dahal, 2014, Nepal | "Our strength is that we have some common problems which we have to solve together. We were deprived of our rights and respect for years and this agony has helped us to move together and form a unity."
Kabeer, 2011, Bangladesh | "One stick can be broken, a bundle of sticks cannot. It is not possible to achieve anything on one's own.

Community Respect
Author | Quotation
Dahal, 2014, Nepal | "The society's view upon being a SHG member has changed. Before it was against the social norms to go out of a house but now society praises women who are involved in SHGs"
Kabeer, 2011, Bangladesh | "There is no proper treatment or medicine in hospitals. We have demonstrated in [our] town, demanding our rights and protesting against the corruption of doctors and theft of public medicine. So now when they hear at the hospital that someone is from our [SHG], they give them a bit more respect."
Kumari, 2011, South India | "When people know we are from GSGSK, we are given special consideration. They give us a chair to sit wherever we go."
Ramachandar & Pelto, 2009, South India | "The biggest benefit of the [SHG] is that we get prestige and honour in our community; we gain experience going to the bank and meeting with officials."

Financial Skills
Author | Quotation
Kumari, 2011, South India | "The fear of handling money is gone."
Mercer, 2002, Tanzania | "Being allowed to have money and decide on how to spend it has brought us development in our household and now husbands give us the freedom to do our own things."
Knowles, 2014 |
Sahu & Singh, 2012, South India | "In the previous election, the MLA candidate had promised to build a road but he did not. When he came for campaigning this time, we questioned him for not keeping his promise and we didn't vote him either."

Author | Quotation
Pattenden, 2011, South India | "A group from another village who had approached the GP [Gram Panchayat, the local government] building to request the disbursement of anti-poverty resources had been stoned."
Kumari, 2011, South India | "Empowerment? There has not been complete empowerment. More factors are needed like equal wages. I would say that only 5 to 10% of empowerment has happened."

Barriers to Participation
Author | Quotation
Dahal, 2014, Nepal | "The issue of selection bias can be agreed to a certain extent acknowledging to the fact that very poor people cannot afford the membership fee and enough time for group activities."
Mercer, 2002, Tanzania | "Some women don't join because they feel inferior, they think that members are rich, can afford things and can be close to the Church, they are in good positions."
Mathrani & Pariodi, 2006, South India | "The larger community was of the view that sangha formation is relevant only for the lower castes and that women from the upper castes were demeaning themselves by getting involved in this work."

Author | Quotation
Maclean, 2012, Bolivia | "SHGs operate at very low cost, have a small fund, raise little interest so we cannot accomplish bigger projects and this is our weakness."
Mercer, 2002, Tanzania | "Other women are discouraged because it is almost four to five years since we contributed the money for the cows and up to now we haven't seen any good profit."

Mistrust and Corruption
Author | Quotation
Maclean, 2012 |

Stigma
Author | Quotation
Ramachandar & Pelto, 2009, South India | "The men used to make comments such as, these women are doing 'tamasha' (showing off) and they are going to close down our sangha after a few days. But we did not worry about those comments."
Mathrani & Pariodi, 2006, South India | "They think women are attending meetings to get money and take control of the village council." "Men say that women are being overly ambitious." "Upper castes say, 'These women attend meetings and visit the panchayat to get money. They are trying to usurp the position of the gowda and take control of the village.'"

Contribution of authors
The study was led by Carinne Brody (CB). This report was written by CB and Thomas de Hoop (TH). CB, Megan Dunbar (MD), Padmini Murthy (PM) and Shari Dworkin (SD) authored the study protocol. The search strategy was developed by CB, MD and Tara Horvath, a search specialist from the University of California, San Francisco. CB and RW, along with several research assistants, conducted the search. The meta-analysis was undertaken by TH and Martina Vojtkova. The qualitative synthesis was undertaken by CB and RW. RW edited the report. CB and TH will be responsible for updating this review as additional evidence accumulates and as funding becomes available.

Declarations of interest
The authors are not aware of any conflicts of interest. Thomas de Hoop was involved in a primary study on the effects of women's self-help groups on women's empowerment in Odisha, India. In the risk of bias assessment for this study, we gave precedence to the judgement of Martina Vojtkova, the other reviewer for the quantitative risk of bias assessment, rather than relying on the judgement of Thomas de Hoop.

Sources of support
External sources
Funder: International Initiative for Impact Evaluation