Mainstreaming as rhetoric or reality? Gender and global health at the World Bank

Background: Over the past decade gender mainstreaming has gained visibility at global health organisations. The World Bank, one of the largest funders of global health activities, released two World Development Reports showcasing its gender policies, and recently announced a $1 billion initiative for women’s entrepreneurship. We summarise the development of the Bank’s gender policies and analyse its financing of gender projects in the health sector. This article is intended to provide background for future research on the Bank’s gender and global health portfolio. Methods: First, we constructed a timeline of the Bank’s gender policy development, through a review of published articles, grey literature, and Bank documents and reports. Second, we performed a health-focused analysis of publicly available Bank gender project databases, to track its financing of health sector projects with a gender ‘theme’ from 1985-2017. Results: The Bank’s gender policy developed through four major phases from 1972-2017: ‘women in development’ (WID), institutionalisation of WID, gender mainstreaming, and gender equality through ‘smart economics’. In the more inclusive Bank project database, projects with a gender theme comprised between 1.3% (1985-1989) and 6.2% (2010-2016) of all Bank commitments. Most funding targeted middle-income countries and particular health themes, including communicable diseases and health systems. Major gender-related trust funds were absent from both databases. The Bank reports that 98% of its lending is ‘gender informed’, which indicates that the gender theme used in its publicly available project databases is poorly aligned with its criteria for gender informed projects. Conclusion: The Bank focused most of its health sector gender projects on women’s and girls’ issues. It is increasingly embracing private sector financing of its gender activities, which may impact its poverty alleviation agenda. 
Measuring the success of gender mainstreaming in global health will require the Bank to release more information about its gender indicators and projects.


Amendments from Version 1
This version includes an analysis of a third World Bank gender database, Monitoring Gender Mainstreaming in World Bank Lending Operations. It provides a more in-depth discussion about the differences between the World Bank's gender 'theme' and its 'gender informed indicator', and highlights issues that researchers face when using these indicators to study mainstreaming in the health sector. Finally, it provides specific recommendations for future research on the Bank's gender portfolio, in the context of global health.
In detail:
• We amended the Introduction to clarify that our paper is a scoping paper, rather than a case study, and added a new figure (Figure 2) to better link our financial analysis with our Bank policy literature review section
• We added a basic analysis of the Monitoring Gender Mainstreaming database; we explain our analysis strategy in the Methods and added a paragraph to our Results section
• We improved the congruence between our literature review and financial analysis by re-structuring the Discussion section; we now have 2 sub-headings to pull out crosscutting themes

Such private investments in gender programming have come into vogue since the late 2000s, the newest of which is the World Bank's Women Entrepreneurs Finance Initiative (We-Fi). In 2017, G20 leaders pledged approximately $1 billion to this trust fund (a financing vehicle for voluntary contributions, of which the Bank serves as trustee) 15 , which will be implemented jointly by the traditional World Bank (the International Bank for Reconstruction and Development and International Development Association) and its private financing arm, the International Finance Corporation (IFC). The World Bank is arguably the most influential institution in global health, both ideologically and financially 16 . Although its historical links to neoliberalism 17 have raised questions about its commitment to poverty alleviation and women's empowerment 18 , the Bank publicly advocated for gender equity and mainstreaming through its heavily cited 2002 and 2012 World Development Reports. Indeed, the Bank has pointed to its high level of success in mainstreaming gender, especially in the health sector. From 1988-1999, it estimated that 89% of Health, Nutrition and Population (HNP) projects contained gender considerations 19,20 , compared to 38% of Bank projects across all sectors, and in 2013, 97% of all Bank projects were deemed 'gender informed'.
However, does the rhetoric of the Bank's work in gender match the reality of its operations and lending portfolio, in the context of global health? This paper provides a scoping review of the Bank's gender framework since the 1970s and its corresponding financial flows to health sector projects with a gender theme since 1985, using publicly available sources. It is intended as a roadmap for future research on gender, mainstreaming indicators, and global health at the Bank. Using Bank reports and secondary literature, we first explore the Bank's conceptualisation of and policies for gender over time. In doing so, we identify four phases of the Bank's gender approach (Figure 1): the launch of 'women in development' (WID) (1972-1984), the institutionalisation of WID at the Bank (1985-1994), gender mainstreaming (1995-2004), and gender equality through 'smart economics' (2005-2017). We illustrate Bank commitments to projects with a gender theme during three of these phases (Figure 2). We then position global health financing within this gender framework, using two major Bank project databases and the Bank's Monitoring Gender Mainstreaming financial database. By tracking Bank health sector projects with a gender theme in these databases, we provide a snapshot of trends in the Bank's support of gender in global health and highlight significant transparency issues.

Methods
This paper relies on two major data sources. First, we used published articles and grey literature reports to construct a timeline of the framing and operationalising of gender at the World Bank. Second, we extracted financial data from publicly available gender project databases, and analysed this data for the health sector.

Introduction
Over the last decade, particularly since the launch of the sustainable development goals (SDGs) in 2015, gender has become increasingly visible within the global health community. The Bill and Melinda Gates Foundation selected gender as a 'grand challenge' for the first time in 2014 1 and pledged $80 million to close gender data gaps in 2016. This announcement was followed by a call to action from the 'Women in Global Health' initiative 2 , which in turn was instrumental in lobbying World Health Organization (WHO) Director General Tedros Adhanom Ghebreyesus to appoint 60% women to the WHO's leadership team for the first time in its history, and in fact in the history of any UN institution 2 .
Yet, such widespread attention has not come without controversy. The Ebola and Zika epidemics, in particular, brought issues of 'gender blindness' to the forefront, as women's voices were often underrepresented in planning and response activities, in spite of the fact that they were disproportionately affected by the outbreaks 3,4 . Researchers and policymakers have questioned how major international development organisations define and frame gender 5,6 . A dominant critique is that gender is often seen through the lens of women's and girls' empowerment, particularly through education (i.e. Millennium Development Goal 3) 7,8 . This may exclude men and members of the LGBTI community who have the highest burden of disease in some health contexts 5,9,10 . Others have argued that the gender equality rhetoric does not match reality, as evidenced by few women in leadership and decision-making positions at global health organisations 2,11 . Finally, scholars worry that the rising involvement of the private sector in health (e.g. through global public-private partnerships) could link corporate profit with gender equality 12-14 .
and policies at the World Bank and were published in English in a peer-reviewed journal. A total of 307 search results were reviewed for the aforementioned criteria, and 20 were included in this analysis. Additional peer-reviewed publications were identified through the reference lists of these 20 articles. Finally, we identified and analysed key publications on gender by the World Bank, its Operations Evaluation Department, and its Independent Evaluation Group.
Data on World Bank financing of projects with a gender component are available publicly through the Bank's 'Projects & Operations' (PO) and 'Development Topics' (DT) databases. Both databases include projects with gender focuses from 1985-2017 and allow projects to be searched by sector and theme, but they do not include identical projects. In order to understand the Bank's reported funding for gender projects in the health sector, we therefore exported project data from both databases (as of July 1, 2017). Figure 3 provides a summary of the inclusion/exclusion criteria and analysis framework. We classified projects as 'gender projects' if they had a gender theme listed (see definition in Table 1), regardless of the percentage given for this theme. The Bank lists up to five themes for each project, and the gender theme percentage for projects varied from 5% to 100%. Theoretically, therefore, the gender theme should capture projects with even a minor gender component.
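The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' actual procedure: the `Project` record, its field names, and the sample values are hypothetical, and the only rule taken from the text is that a project counts as a gender project whenever a gender theme is listed, whatever its percentage.

```python
from dataclasses import dataclass


@dataclass
class Project:
    project_id: str         # hypothetical identifier, not a real Bank project number
    themes: dict            # theme name -> percentage assigned to that theme
    commitment_musd: float  # Bank commitment in millions of US dollars (made-up values)


def is_gender_project(project: Project) -> bool:
    """Return True if a gender theme is listed with any non-zero percentage."""
    return project.themes.get("Gender", 0) > 0


# Illustrative records only; none of these values come from the Bank databases.
projects = [
    Project("A1", {"Gender": 5, "Child health": 95}, 40.0),
    Project("A2", {"Health system performance": 100}, 120.0),
    Project("A3", {"Gender": 100}, 10.0),
]

gender_projects = [p for p in projects if is_gender_project(p)]
gender_ids = [p.project_id for p in gender_projects]
```

Under this rule, a project with a 5% gender theme is treated the same as one with a 100% gender theme, which is why the theme should in principle capture even minor gender components.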
For each database, projects with a health sector classification were selected for further analysis, and the absolute Bank commitments to gender projects in the health sector were calculated. Although some projects took place over multiple years, the Bank releases funding data by the project's approval date (PO database) or starting year (DT database), and all commitments were assigned to this year. We then disaggregated all health sector project commitments by theme (health and gender themes are defined in Table 1), to determine the relative World Bank funding for health themes each year. To facilitate comparison of health themes, some Bank themes were combined (see Table 2). Finally, we determined the total commitments for gender projects in the health sector for each recipient country and geographical region from 1990-2017. We also compared the scope of both databases by identifying the number of identical projects that they included.

Theme: Definition
Child health: Activities aimed to improve the health status of children and to reduce child morbidity and mortality.
Gender: For the purposes of coding, the theme encompasses World Bank Group activities that, irrespective of sector, address and/or close gaps between males and females and other gaps that may be identified at the Country Partnership Framework at the country level.
Communicable diseases: HIV/AIDS - Programmes that increase access to HIV/AIDS prevention, treatment, care and support services. Tuberculosis - Activities aimed at the prevention, diagnosis and/or treatment of tuberculosis. Malaria - Activities aimed at the prevention, diagnosis, control and/or treatment of malaria.
Health system performance: Programmes and policies which aim to bring about improvements in the management, financing and overall functioning of health systems.
Injuries & noncommunicable diseases: Activities aimed to reduce morbidity and premature mortality from cardiovascular disease, hypertension, cerebrovascular disease, peripheral vascular disease, cancer, chronic obstructive pulmonary disease, asthma, diabetes, mental illness (including depression, post-traumatic stress disorder, suicide, psychosis, alcohol and drug abuse), and other non-infectious, chronic conditions such as arthritis and osteoporosis. This theme also includes preventable injuries (excluding road/traffic accidents).
Nutrition & food safety: Programmes that include objectives and specific activities related to improving nutritional status or food security at the household level.
Population & reproductive health: Activities to improve reproductive health and reduce maternal morbidity and mortality.

(Table 3) 16 .
In both databases, the health sector represented a high percentage of the Bank's total commitments to gender projects (Figure 4). The percentage of the health sector within all gender projects peaked at 56.8% in 2005-2009 for the PO dataset, and at 96.0% in 1995-1999 for the DT dataset. However, while many gender projects were in the health sector, they consistently formed only a small part of the Bank's total funding for health sector projects (Figure 5). The PO database's commitments to gender projects in the health sector formed a maximum of 23.0% of the Bank's commitments to health sector projects (in 1995-2009), and only approximately 11% of health sector commitments since 2005 (Table 3). Despite the fact that both databases included projects with a gender theme, they only had four identical projects. This indicates that even the more inclusive PO database may be missing a significant number of health sector projects with a gender theme.
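The comparison of database scope reported above amounts to counting project identifiers that appear in both exports. The following Python sketch shows one way to do this; the function name and the sample identifiers are hypothetical, and it assumes that both databases label the same project with the same identifier.

```python
def shared_projects(ids_a, ids_b):
    """Return the set of project identifiers present in both collections."""
    return set(ids_a) & set(ids_b)


# Made-up identifiers for illustration; not real Bank project numbers.
po_ids = {"P001", "P002", "P003", "P004"}
dt_ids = {"P003", "P004", "P900"}

overlap = shared_projects(po_ids, dt_ids)
# Share of the smaller (DT-like) export that also appears in the larger one.
share_of_dt = len(overlap) / len(dt_ids)
```

A very small overlap between two exports that are meant to cover similar projects, as found here, suggests that at least one of them is incomplete.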
The Monitoring Gender Mainstreaming database included 89 HNP projects totalling $8.61 billion in World Bank commitments, from FY2014 to FY2017 (2013-2016). The Bank reports that its commitments to all new HNP projects from 2013-2016 were $9.42 billion, meaning that approximately 91.4% of HNP lending during this period was gender informed. The vast majority, 85.4%, of gender informed HNP projects in the database had a gender informed indicator score of 3 (i.e. gender was considered at the planning phases of project analysis, actions, and monitoring and evaluation), while only one project had a score of 1 (i.e. gender was considered at the planning phase of only one of these project dimensions). However, the gender informed indicator was poorly correlated with the gender theme. Only three of the 89 gender informed projects were classified as having a gender theme; thirteen projects with a highly gender informed score of 3 were classified as having no gender theme; and 39 gender informed projects had no marking (neither a yes nor a no) for the gender theme category. The database did not include a category for project sector, and more than half of all of the 2645 projects listed did not include a global practice classification. It was therefore not possible to easily track HNP gender informed projects before 2013, and it is possible that our analysis missed some wider health and social services sector projects after 2013.
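The headline shares in this paragraph follow directly from the reported totals. A short Python check of the arithmetic is given below; the figures are those quoted in the text, and the variable names are ours.

```python
# Figures quoted in the text (billions of US dollars and project counts).
gender_informed_hnp_busd = 8.61   # gender informed HNP commitments, 2013-2016
all_new_hnp_busd = 9.42           # all new HNP commitments, 2013-2016
gender_informed_projects = 89     # gender informed HNP projects in the database
with_gender_theme = 3             # of those, projects also carrying a gender theme

# Share of HNP lending that was gender informed.
share_informed_pct = gender_informed_hnp_busd / all_new_hnp_busd * 100

# Share of gender informed HNP projects that also carried a gender theme.
share_theme_pct = with_gender_theme / gender_informed_projects * 100

print(f"{share_informed_pct:.1f}% of HNP lending was gender informed")
print(f"{share_theme_pct:.1f}% of gender informed HNP projects had a gender theme")
```

The contrast between the two percentages illustrates how weakly the gender informed indicator and the gender theme track one another.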

Health themes & recipient countries for gender projects
For projects with a gender theme in the health sector, particular health themes were emphasized over others (Figure 6 and Figure 7). Definitions of the major health themes are given in Table 2. In the PO database, communicable disease and health system performance themes received the highest commitments. Funding for gender projects with a communicable disease theme peaked in 2000-2004 and has since declined, while funding for health system performance remained relatively steady from 1990-2015 and peaked in 2010-2014. Population and reproductive health and child health projects with a gender theme received relatively less funding from 1990-2015, and only one project targeted injuries and non-communicable diseases during this period. The DT database similarly included only one project for injuries and non-communicable diseases from 1985-2017. However, the relative importance of population and reproductive health, child health, communicable disease, and health system performance differed between the DT and PO databases (Figure 6 and Figure 7). The DT database lacked many health projects included in the PO database, and particularly omitted communicable disease projects.

Figure 6. Total World Bank funding for projects with a gender theme in the health sector for the PO database, by theme. The World Bank's commitments to gender projects in the health sector included many themes, and the relative financing for communicable diseases, health system performance, population and reproductive health, and child health varied over time. This data was obtained from the more comprehensive Projects & Operations (PO) database. NCD = non-communicable disease.
Bank commitments to gender projects in the health sector were given inconsistently to countries and geographic regions over time (Figure 8). For instance, in the PO dataset, low-income countries in Sub-Saharan Africa received funding for gender projects in the health sector most years from 1990-2017. However, this funding was typically in small commitments to many different countries. In contrast, the large commitments for health sector projects were given to five countries: Brazil, India, Argentina, Pakistan, and Egypt. These five lower- and upper-middle-income countries collectively received nearly half of all Bank commitments to gender projects in the health sector from 1990-2017 (Figure 9). The less inclusive DT database revealed an even sharper preference for funding middle-income countries, with Argentina receiving over half (52%) of all health sector and gender funding. Europe and Central Asia and North America received no or extremely little funding in both databases, and East Asia and the Pacific and the MENA regions collectively received between 7% and 21% of the commitments from 1990-2017.
The DT database also included some projects financed by trust funds, to which donors (but not the Bank) made contributions. Trust funds at the Bank are sometimes called 'multi-bi' or 'extra-budgetary' aid, because they use voluntarily contributed funds from specific donors to finance activities 15 . From 2002-2017, $89.3 million was invested in gender projects by donors through recipient-executed trust funds (Bank trust funds that are executed directly by a country or organisation), of which $17.6 million was for health projects. These health projects were primarily for maternal and child health programmes in low income countries, through the Japan Social Development Fund. Some donor commitments ($5.7 million) to the Bank's Gender Trust Funds (GENTF) programme were included in this dataset, but they were not for the health sector.

Wider OECD financing for gender projects in the health sector.
To contextualize the Bank's gender commitments within the larger aid landscape, we tracked OECD donor development assistance commitments to the health sector, using the 'gender equality policy marker', from 2002-2015. This policy marker is based on a gender mainstreaming checklist, which requires donors to state whether a project has a gender dimension or impact, and whether this is at a principal or significant level. OECD donor contributions to all gender projects increased from $6.5 billion in 2002 to $39.3 billion in 2015. Health sector projects averaged about 10% of all OECD commitments to gender projects from 2002-2015 (Figure 10). Based on our PO database commitments, the World Bank therefore contributed approximately 10% of the total development assistance for gender projects from 2002-2013, but the Bank's relative commitments dropped after 2013.
The Bank emphasized gender projects in the health sector relatively more than OECD donors; according to the PO database, Bank health sector commitments averaged 37.3% of all gender commitments from 2000-2017 (Table 3), while those of OECD donors were 11.0% from 2002-2015. Within these health sector commitments, OECD donors prioritized reproductive policies and population health projects more than the Bank (Figure 11), while the Bank prioritized health systems and communicable disease gender projects more than OECD donors (Figure 5 and Figure 11). Unlike the Bank, a high proportion (58%) of OECD donor commitments to gender projects in the health sector from 2002-2015 targeted low-income countries in Sub-Saharan Africa (Figure 12).
Using the gender informed indicator from the Monitoring Gender Mainstreaming database yields dramatically different results. Comprehensively, all gender informed HNP projects correspond to 88.8% ($6.2 billion) of the total OECD donor commitments to health sector projects with a gender policy marker from 2013-2015.

Discussion
Improved transparency is required to study the Bank's mainstreaming success in global health
Our analysis of World Bank gender datasets reveals significant discrepancies between its mainstreaming rhetoric and publicly released data on project financing. The World Bank recently claimed that 98% of its total lending (or 97% of its operations) is gender informed. Earlier WID ratings for gender inclusion, which were based on a random sample conducted by the Bank's Operations Evaluation Department of 112 Bank projects, also found that 38% of all Bank (and 89% of HNP) projects addressed gender from 1988-1999 19 . Such data would appear to show a positive trend in gender considerations for global health and development projects. Indeed, at first glance, data from the Monitoring Gender Mainstreaming database seems largely congruent with this Bank rhetoric. This database shows that over 91% of HNP project lending was 'gender informed' in recent years (2013-2016), and that a majority of these projects consider gender at three project dimensions (analysis, actions, and monitoring and evaluation).
However, this information about the gender informed indicator is not included in the PO database, which is the primary resource that external researchers use to identify Bank projects, obtain financial and evaluative information about these projects, and download relevant project documents 8,16 . Projects with gender themes within the PO dataset comprised only 1.3% (1985-1989) to 6.2% (2010-2016) of all Bank commitments. Furthermore, there was a concerning lack of congruence between projects included in each project database. The PO and DT databases, which should theoretically contain similar development projects with a gender theme, only had four projects in common, and only one project was listed in all three gender databases. Many projects with a high gender inclusion indicator score (3) did not include a gender theme and were therefore not identifiable in the PO database, but the Bank did not provide any details about how this classification decision was made.
Ultimately, the discrepancies identified through our financial analyses raise a key question: how much can we rely on the datasets that we have analysed? Are critical gender projects missing from the PO and DT databases, and could the trends that we identified in health sector financing simply be inaccurate?
The answer is that external researchers have to rely on these databases, because they are the only data publicly released by the Bank on its financing of gender projects. As a 2016 study by the Center for Global Development noted, the Bank's lack of description of its application of the gender theme (and allocation of its percentage) hampers researchers' ability to study outcomes and evaluations of gender projects 19 . The PO and DT databases provide the only way, to the best of our knowledge, to track projects with gender components before 2009, and to track the ways in which gender is included within the health sector portfolio, using Bank-assigned health themes.
Even if the gender theme and gender informed indicator are applied consistently across the project databases in the future, analyses of the Bank's gender portfolio in the health sector will still be limited by the quality of these metrics themselves. The Bank first began scoring projects in its WID portfolio in 1988, using a 0 to 2 rating system (i.e. 0 for no gender inclusion, 1 for gender addressed but no specific actions, and 2 for concrete, specific activities addressing gender or WID issues). However, in 2005, the Bank's Operations Evaluation Department pointed to a lack of framework for staff accountability and quantitative targets to assess gender projects' implementation 20 . Based on its recommendations, the '3' rating was added to the gender indicator, for projects that made recommendations based on gender analysis. In 2010, the Bank's Independent Evaluation Group underscored the continued absence of a results framework for the indicator 52 . The gender criteria were adapted, so that gender informed projects were recorded as any with at least a rating of 1, meaning that they take gender into account in either the analysis, actions, or monitoring and evaluation dimensions of a project. The rating of each project is determined internally by the Bank, using staff estimates based on a review of project appraisal documents 19 , and little information is available publicly about the specific criteria for these estimates. Major critiques levelled at the Bank by outside researchers reinforce the Independent Evaluation Group's finding that this gender informed rating system prioritises gender consideration at the planning project stage 19,21 . In particular, based on its analysis of the Monitoring Gender Mainstreaming database and a subset of projects with a gender theme, the Center for Global Development argued that most gender projects do not have gender-specific outcome objectives 19 .
The Bank has responded with a new gender strategy (2016-2023), which outlines goals for improved data, staff capacity, results frameworks, and monitoring 50 . The impact that this new strategy will have on data transparency and gender project classification remains to be seen.
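To make the rating logic discussed in this section concrete, the Python sketch below scores a project by the number of dimensions in which gender is considered and applies a threshold. It is an illustration under stated assumptions, not the Bank's procedure: the text describes scores of 1 and 3 in these terms, and we assume that a score of 2 means two dimensions; the dimension labels and function names are ours.

```python
# Three project dimensions named in the text; the labels are our own shorthand.
DIMENSIONS = ("analysis", "actions", "monitoring_and_evaluation")


def gender_informed_score(considered):
    """Count the dimensions in which gender is considered (0 to 3).

    Assumption: a score of 2 corresponds to exactly two dimensions.
    """
    return sum(1 for dimension in DIMENSIONS if dimension in considered)


def is_gender_informed(score, min_score=1):
    """Apply a threshold; min_score=1 mirrors the 'at least a rating of 1' rule."""
    return score >= min_score


planning_only = gender_informed_score({"analysis"})
all_three = gender_informed_score(set(DIMENSIONS))
```

With the default threshold, a project that considers gender in a single dimension counts as gender informed; passing `min_score=3` shows how a stricter reading would exclude it.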
Health trends from the gender projects databases: framing gender as a women's issue, targeting middle-income countries, & increasingly turning to 'innovative financing'
Our analysis of the PO database shows that the Bank invested relatively more than OECD donors in health systems and communicable disease control than in reproductive health themes from 1985-2017. This would appear to show a move beyond McNamara and Clausen's focus on women's reproductive roles for development goals. It also falls in line with the Bank's HNP focus on universal health coverage and disease control 53 . However, the PO dataset also indicates that the Bank may have struggled to operationalise its gender as smart economics and mainstreaming frameworks in three ways. These results must be interpreted with caution due to the data transparency issues, and apply only to projects with a distinct gender theme.
First, although the 2012 World Development Report emphasized including men in gender projects, the Bank and other multilateral health organisations have faced challenges in doing so, as they risk losing their focus on women's subordination 5,54,55 .
Only one project with a gender theme in the PO database had a non-communicable diseases and injuries health theme, and none specifically targeted transgender populations. Yet, the global burden of disease for non-communicable diseases and road injuries is higher for men than women 9,52,56 , the top ten contributors to DALYs (including alcohol and tobacco use) have a greater burden on men than women 5 , and transgender populations may experience health inequities 10 . This mirrors a wider problem in global health priority-setting; extremely little emphasis is given to non-communicable diseases, road accidents, and the needs of non-female populations in global public-private partnerships for health ( Associating gender with the private sector and market-based activities could adversely affect the Bank's gender equality and poverty alleviation goals, so financial sources and channels for gender projects in the health sector should be monitored 8,12,45,51 . However, the PO database does not include trust funds, and, while the DT database does include some small gender trust funds in the health sector, it does not include any of these large gender trust funds. Trust funds are not assigned a Bank project number, meaning that it is difficult to systematically obtain project documents and financial data, particularly for closed projects. They are also missing from the Monitoring Gender Mainstreaming database, which only includes projects to which the World Bank contributed directly. This difficulty in tracking gender trust funds is part of a larger issue; researchers have flagged the Bank's lack of transparency in its use of health sector trust funds and recommended methods to improve data availability 15 . The Center for Global Development's analysis using the Monitoring Gender Mainstreaming database demonstrated that, in 2013-2014, the average rating for gender informed projects in the health sector was 2.56, compared to 0.082 in the finance and private sector. 
This indicates that Bank-financed health projects typically consider gender more than its projects in the private sector. It also underscores that gauging the success of gender mainstreaming in the health sector will require improved trust fund data, including trust fund classification by donor, gender inclusion indicator, and gender theme. In the case of the World Bank, we specifically recommend future research on project outcomes and evaluations for health sector projects with a gender theme. For the time being, this may be best accomplished by selecting all HNP projects in the Monitoring Gender Mainstreaming database from 2013-2016 (i.e. projects with a gender informed indicator of 1-3), locating their project records in the PO database, and looking for gender-specific outcome objectives and results within the Project Appraisal Documents and Project Information Documents. This would update the Center for Global Development's valuable 2016 study on gender mainstreaming, which suggested that 'mainstreaming is at best a somewhat paper-based activity at the moment' 19 , and would extend its analysis more deeply into the health sector. For a more historical, but less HNP-comprehensive, study, we recommend performing similar document searches for gender-specific indicators and outcome objectives for all health sector projects with a gender theme in the PO database. Such a study would allow for improved understanding of how the Bank has operationalised the gender policies outlined in our timeline within the health sector (i.e. whether indicators and their targets have been for women-oriented quotas, gender-specific data disaggregation requirements, gender-equality project outcomes, etc.).
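The first recommended workflow (select gender informed HNP projects from 2013-2016, then locate their records in the PO database) could be sketched as follows. This is a hypothetical outline only: the field names (`project_id`, `global_practice`, `year`, `gender_informed_score`, `documents`) and the sample records are assumptions made for illustration and do not reflect the actual export formats of either database.

```python
def select_and_link(mgm_records, po_records):
    """Link gender informed HNP projects (2013-2016) to their PO records.

    Returns (linked, unmatched_ids). All field names are hypothetical.
    """
    po_by_id = {record["project_id"]: record for record in po_records}
    linked, unmatched_ids = [], []
    for record in mgm_records:
        is_hnp = record.get("global_practice") == "HNP"
        in_window = 2013 <= record.get("year", 0) <= 2016
        informed = 1 <= record.get("gender_informed_score", 0) <= 3
        if not (is_hnp and in_window and informed):
            continue
        po_record = po_by_id.get(record["project_id"])
        if po_record is None:
            # Project cannot be located in the PO export.
            unmatched_ids.append(record["project_id"])
        else:
            linked.append({
                "project_id": record["project_id"],
                "score": record["gender_informed_score"],
                # Documents to be read for gender-specific outcome objectives.
                "documents": po_record.get("documents", []),
            })
    return linked, unmatched_ids


# Made-up records for illustration only.
mgm = [
    {"project_id": "X1", "global_practice": "HNP", "year": 2014, "gender_informed_score": 3},
    {"project_id": "X2", "global_practice": "HNP", "year": 2015, "gender_informed_score": 1},
    {"project_id": "X3", "global_practice": "Energy", "year": 2014, "gender_informed_score": 2},
    {"project_id": "X4", "global_practice": "HNP", "year": 2012, "gender_informed_score": 3},
]
po = [{"project_id": "X1", "documents": ["PAD", "PID"]}]

linked, unmatched_ids = select_and_link(mgm, po)
```

Records without a global practice classification would be skipped by this filter, so in practice they would need to be classified by hand before linking; reading the linked documents for outcome objectives remains a manual step.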

Conclusion
As the World Bank implements its 2016-2023 gender strategy, we recommend two major changes in transparency. First, all past and present Bank projects (including trust funds) should be included in the PO database, and classified by gender theme. The allocation process for this theme should be clearly described. Second, the Bank should revisit the gender informed indicator itself, so that it only includes projects with gender considerations in all three dimensions (design, implementation, and evaluation). More information should be released about the criteria used by Bank staff to make these gender inclusion designations for each project and these ratings should be included in the publicly available project databases. The Monitoring Gender Mainstreaming database should either be replaced entirely by a more comprehensive PO database, or should more directly support the Bank's new gender strategy for 2016-2023, by including more information on gender outcome objectives, and projects before 2009.
Such improved data will foster examination of the impact of gender policies and private investments on poverty reduction and health goals. Ultimately, until gender indicators are revisited and further independent research is conducted, the success of gender mainstreaming in the Bank's health sector will remain rhetoric rather than reality.

Competing interests
A senior member of the World Bank is on our project's advisory board.

Grant information
This work was supported by the Wellcome Trust [106635].
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Open Peer Review
Thanks for the opportunity to review the revised version of this paper.
Overall, the authors are to be congratulated as the revised paper has addressed the issues raised in my review of the initial version, and is considerably clearer in structure and in the description of methodology. I also note that the authors have taken the opportunity to undertake some additional analysis, which has further strengthened the paper.
An important change from the original version is the increased focus on the issues of transparency and the accuracy and reliability of the data provided publicly by the World Bank. The additional analysis using the Monitoring Gender Mainstreaming database, and search for duplicates across the different databases, identified further discrepancies and inconsistencies in the data, and the conclusions and recommendations have an increased focus on this aspect.
As a further comment, the paper now provides an interesting example of a methodology to measure transparency and reliability in reporting from project databases, through a comparison of consistency of identification across different databases. This resembles the 'capture-recapture' methodology used by epidemiologists to measure the coverage of surveillance systems. It is perhaps something that could be adapted to measure other aspects of information captured in multiple databases.
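To illustrate the analogy, a two-source capture-recapture estimate can be sketched as below. This is only an illustration of the general technique, not an analysis performed in the paper: the database sizes are hypothetical placeholders, and the estimators assume that the two sources list projects independently of each other, which may not hold for databases maintained by the same institution.

```python
# Two-source capture-recapture sketch. The counts passed in below are
# hypothetical placeholders, not figures taken from the Bank's databases.

def lincoln_petersen(n_first, n_second, n_both):
    """Estimate total population size from two overlapping lists."""
    if n_both <= 0:
        raise ValueError("The estimator is undefined when the lists do not overlap.")
    return n_first * n_second / n_both


def chapman(n_first, n_second, n_both):
    """Chapman's variant, which is less biased when the overlap is small."""
    return (n_first + 1) * (n_second + 1) / (n_both + 1) - 1


def coverage(n_source, estimated_total):
    """Share of the estimated population captured by one source."""
    return n_source / estimated_total


# Hypothetical example: 60 projects in one database, 40 in another, 4 in both.
total_lp = lincoln_petersen(60, 40, 4)
total_ch = chapman(60, 40, 4)
print(total_lp, total_ch, coverage(60, total_lp))
```

A small overlap relative to the size of each list yields a large estimated population and low coverage for either source, which is the pattern the comparison across databases points to.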
I have no further significant concerns with the paper.

Responses to recommendations:
Inconsistencies in purpose / objectives - resolved. The revised paper refers to a scoping review in the introduction and identifies the purpose as a roadmap for future research. This is a much clearer explanation of purpose and contribution.
Clarifying gender theme / gender informed - addressed. The revisions in the methods section explain the classification of gender projects. Table 1 provides useful definitions of terms. Figure 2, comparing finance in total / health sector / gender theme, is a useful addition and addresses concerns about the interpretation of Table 3.
Governance of gender replaced with conceptualisation of gender - resolved.
Absence of methodological framework - resolved. Such a framework would be needed for a case study, but the current content is adequate for the revised purpose and objective. The shift in focus from policy implementation to indicators and publicly available data - appropriate.
Order of analysis and check for duplicates - addressed. Addition of Monitoring Gender Mainstreaming (MGM) database gender indicator scores and comparison with PO and DT databases - valuable additions that strengthen the paper. The comparison of gender theme / gender informed projects is very illuminating.
Results - Figure 2 addresses the link between financing and phases. The identification of only 4 common projects between PO and DT databases is very illuminating and provides further evidence of the lack of consistency in the application of project definitions.
Discussion & conclusions - now better structured, and addresses the potential limitations up front. There is much more acknowledgement of the issues of transparency and inconsistencies in data sources in responding to WB claims on gender financing. Lack of congruence emerges as a key issue, not previously raised.
New discussion on gender theme and indicator addresses the concerns about measurement of mainstreaming using these project definitions, and the greater sensitivity of gender measurement of projects does respond to this question. Much more focused conclusions and recommendations for further study.
Competing Interests: No competing interests were disclosed.
Referee Expertise: Health systems governance and performance, with a focus on Asia Pacific
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Many thanks for the opportunity to review the revised version of this important study. We very much appreciate the revisions, including the clarification of this study as a scoping review, the additional analysis of the Monitoring Gender Mainstreaming database, the expanded discussion of challenges with data availability, and the excellent new Figure 2. The authors have greatly strengthened the paper, which is an important addition to the literature.
Competing Interests: When I first peer reviewed the paper, I had never worked with any of the authors. Since then, I have started a collaboration with Professor Devi Sridhar (we will be jointly guest editing a series of articles).
Referee Expertise: Global health policy
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Thanks for the opportunity to review and comment on this paper. The paper is addressing an important question, and, by providing an external analysis of World Bank data, is contributing to independent scrutiny of development policy and funding. While the paper is only partly successful in its aim of comparing World Bank gender policies with financing, it does draw attention to issues in the availability and usability of World Bank data that are deserving of attention by the Bank and other development agencies.
There are inconsistencies in the presentation, particularly in terms of purpose and objectives.
(1) Inconsistencies in how the study is described - its purpose and objectives. The abstract refers to 'a case study of how global health organisations frame their gender policies and measure their success', while page 3 column 2 refers to a comparison of 'this policy.. with financing for gender in the health sector'. The paper describes the evolution of gender policies at the WB and seeks to interpret this in terms of framing; but much of the emphasis is on measurement of the proportion of financing for gender projects. The issue of measurement is raised, but in terms of the transparency of databases, rather than how success is measured.
(2) The paper compares the proportion of Bank funding on gender projects with the proportion of projects that are 'gender informed' in three places (abstract, page 3 and page 15). There is an inconsistency in the percentage quoted as gender informed (98% in the abstract and discussion (page 15), and 97% on the last line column 1 of page 3). But more significantly, this comparison itself seems misleading, as one assumes that there are projects that are not 'gender projects' but that are informed by gender. The lack of precision on definition of the terms 'gender informed' and 'gender projects' (as raised in the discussion) raises further questions about the comparison.
(3) The references in the text to Tables 2 and 3 appear to be incorrect on page 8 column 1, or the tables are incorrectly labelled. The percentage of WB funding in the health sector is provided in Table 3 (not Table 2); while the definition of themes is provided in Table 2 (not Table 3).

Study design and methods
In terms of methods, the study is described as a 'case study'. However, a conceptual framework for the study, and in particular the conceptual basis for the proposed comparison between policy and programs, is not presented.
There is an assumption that policy should be reflected in financing. But, noting that four phases of policy are described over several time periods, it is not clear which policy is being compared to which funding, over what time period.
The expected linkage between policy statements and project financing is not described. Here more explanation of the process by which policy statements are 'translated' into project financing would assist in understanding what might be expected in terms of project financing and how it might reflect policy changes.
Some key terms are not defined - for example, what is meant by 'governance of gender'? (It appears to refer mainly to a description of policy formulation, rather than governance in the sense of institutional roles and relationships.)

Details of methods and analysis
Some details of the methods and analysis are also missing.
The criteria for inclusion / exclusion of the events and policies listed in the Timeline of Figure 1 are not provided. The process of selection of the events and policies for the timeline could be clarified.
The process for construction of Figure 2 could also be further clarified. It appears that the initial step in analysis of both databases was to identify projects with gender themes, and then to exclude non-health projects. Would it have made a difference if the process was reversed, and health projects selected first, then non-gender removed? Was there a rationale for the start with gender projects? It is also not clear whether there is a potential for projects to be included in both databases, and whether a check for duplicates was undertaken. The labelling for Figure 2 should note that the dollar amounts are in USD million.

Results
It is difficult to interpret the graphs and figures on funding by theme and period without some information on the overall funding envelope in health and gender over the period. This is provided in Table 3, but there is little text description of the information in Table 3, except for Page 8 para 2 (incorrectly referred to as Table 2); and in the discussion, page 15 (where figures are provided from Table 3, but without the reference to the table). There does not appear to be any effort to relate the changes in project financing with the policy phases described in Figure 1.

Discussion and conclusions
The key finding from the paper is a disjunction claimed between policies on gender mainstreaming and the proportion of funds allocated to gender projects. This is contrasted with the Bank's claim on the proportion of 'gender informed' lending.
The discussion includes an examination of the limitations encountered in obtaining and interpreting the data. These limitations suggest, however, more caution in some of the authors' claims, given the questions on the completeness and definitions used in the project databases, and lack of clarity on the measurement of 'gender informed'.
More fundamentally, the issue of the measurement of 'gender informed' projects raises an inconsistency which lies at the heart of what the study aims to do. Is it possible to measure the success of a gender mainstreaming policy by the proportion of funds allocated to 'gender projects'? Surely the success of gender mainstreaming would be seen in a decrease in gender specific projects, and an increase in the incorporation of gender into all other projects. The paper addresses the issue of the meaning of 'gender informed', but has not questioned the fundamental assumption of whether measurement of allocation of funding to 'gender projects' is an appropriate way to measure the implementation of gender policies, particularly if the policy direction is towards 'mainstreaming'.
Perhaps the most important finding from the study is the difficulty faced by independent observers in assessing and measuring the extent to which a key global organisation translates its policies into program actions. The proportion of funding allocated to projects satisfying specific gender criteria may be a crude measure of the application of gender inclusion, but, in the absence of other data, they become the default measure. As the authors conclude, how we measure our achievements determines whether they will be classed as successful or not. Their recommendations for improved reporting and a clearer 'gender informed project' indicator would go some way to address the deficiencies found.

If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.
Referee Expertise: Health systems governance and performance, with a focus on Asia Pacific
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 10 Aug 2018
Janelle Winters, University of Edinburgh, UK

We have submitted a new version (version 2) of our article, which should be released in August 2018. Our responses to Dr. Krishna Hort's valuable feedback are below (in bold).
Thanks for the opportunity to review and comment on this paper. The paper is addressing an important question, and, by providing an external analysis of World Bank data, is contributing to independent scrutiny of development policy and funding. While the paper is only partly successful in its aim of comparing World Bank gender policies with financing, it does draw attention to issues in the availability and usability of World Bank data that are deserving of attention by the Bank and other development agencies.
Thank you for your detailed feedback. We agree that the paper could benefit from tightening our aim. In the abstract and introduction, we have clarified that our research study is intended as a scoping review of general gender policy development at the Bank and of how projects with gender themes at the Bank have been financed in the health sector, rather than a case study of gender policy at the Bank. To our knowledge, this is the first external analysis to produce a timeline of gender policies at the Bank, and to explore gender project financing in the health sector. We hope that future research will be able to use this paper as a base to compare gender policies and financing for gender-incorporating projects in global health in more depth, such as through case studies. As you rightly point out, one of our take-home points is that comparing policy and financing for gender projects, particularly in specific sectors like health, is hampered by the limits of publicly available data. Ideally we would have been able to track financing for all health projects with a gender theme across all of our phases (1972-2017), but we were limited by data availability. We reinforce the data availability angle in our conclusion, through an adapted final paragraph on transparency, and in our discussion, through a more concrete discussion of how the gender informed indicator evolved at the Bank and the limitations of each database in capturing this indicator.
There are inconsistencies in the presentation, particularly in terms of purpose and objectives.
(1) Inconsistencies in how the study is described - its purpose and objectives. The abstract refers to 'a case study of how global health organisations frame their gender policies and measure their success', while page 3 column 2 refers to a comparison of 'this policy.. with financing for gender in the health sector'. The paper describes the evolution of gender policies at the WB and seeks to interpret this in terms of framing; but much of the emphasis is on measurement of the proportion of financing for gender projects. The issue of measurement is raised, but in terms of the transparency of databases, rather than how success is measured.
You are correct to point out that our use of the term 'case study' is confusing. We have refined the abstract to indicate that we are summarizing the development of the Bank's gender conceptualization and framework, and analysing its financing for gender projects in the health sector since 1985, rather than conducting a case study.
We also realise that we may have introduced confusion by not clearly defining the different ways that the Bank defines (measures) its gender projects. We have updated our methodology and our use of the term 'gender project' throughout the manuscript to address this issue. In short, in the project databases that we use (Development Topics [DT] and Projects & Operations [PO]), the gender 'theme' (as described in Table 1) is what we used to classify a project as a 'gender project'. This theme was assigned by Bank staff, and little information is available about how projects are classified by theme and why the datasets would include so many different projects for each theme. This theme classification does not mean that a project was a gender project. Instead, it primarily means that the project was assigned a gender theme percentage (between 5-100%), among up to four other themes (such as health system strengthening, etc., as shown in Tables 1 and 2). We considered any project with any gender theme percentage a 'gender project', although very few were exclusively gender focused. Additionally, the Bank has used a 'gender informed' indicator to report the percentage of its overall operations and the percentage of its financing that are gender informed (this is described in our discussion). Again, Bank staff assign this indicator, and it is given to a project if it considers gender at any of three levels of the project cycle. Unfortunately, it is not possible to search the DT and PO databases using the gender informed indicator (which is an issue, as we point out in the revised discussion). We now suggest in the conclusion specific ways that the Bank could improve its transparency, using the gender theme and indicator, and productive future research for external researchers.
(2) The paper compares the proportion of Bank funding on gender projects with the proportion of projects that are 'gender informed' in three places (abstract, page 3 and page 15). There is an inconsistency in the percentage quoted as gender informed (98% in the abstract and discussion (page 15), and 97% on the last line column 1 of page 3). But more significantly, this comparison itself seems misleading, as one assumes that there are projects that are not 'gender projects' but that are informed by gender. The lack of precision on definition of the terms 'gender informed' and 'gender projects' (as raised in the discussion) raises further questions about the comparison.
Actually, there is not an inconsistency in our reporting of the percentage of gender informed projects, although we understand why this would appear to be the case. The Bank reported the percentage of FY13 Bank projects that were gender informed in two ways: as a percentage of all operations (98%, based on the number of projects), and as a percentage of total funding (97%, based on the aggregate commitments to these projects). We have inserted parentheses to make this distinction more evident, and clearly cited each figure.
We also see why it would seem unfair to compare 'gender informed' and 'gender projects', particularly if gender projects had a primary gender focus and gender informed projects merely had to have some sort of smaller-scale gender consideration. Based on our above description of the gender theme and gender informed indicator, we recommend that the Bank be more clear in its application of and inclusion/exclusion criteria for the 'gender theme' to projects, especially because some projects only have a small percentage for their gender theme (5-10%). When we wrote our first draft of the paper, we were unaware that the World Bank had a publicly searchable database, through its World Bank Finances portal (rather than its Projects & Operations or Development Topics project portals). This Monitoring Gender Mainstreaming database includes a list of projects, only some of which are classified by sector, with their gender indicator score and sometimes whether they have a gender theme. We have analysed the gender indicator scores and presence of a gender theme for each project in the HNP sector, and included this analysis in our methodology, results, and discussion. We find that the gender informed indicator is poorly correlated with a gender theme, and that it is unclear how designations of indicator and theme are made internally at the Bank.
As we now suggest in the conclusion, the Bank should make its gender informed indicator searchable in the PO database, so that scholars could use this indicator instead of project themes to track gender financing. They should also release more historic data, since the Monitoring Gender Mainstreaming database only includes partial data for the years 2009-2016. Based on your comment, we have replaced the term 'gender projects' throughout the manuscript with the phrase 'projects with a gender theme', and have more deliberately defined the difference between themes and informed indicators. We hope that this makes our comparisons between the gender informed indicator, gender theme, and wider dialogue about mainstreaming clearer.
(3) The references in the text to Tables 2 and 3 appear to be incorrect on page 8 column 1, or the tables are incorrectly labelled. The percentage of WB funding in the health sector is provided in Table 3 (not Table 2); while the definition of themes is provided in Table 2 (not Table 3).
Thank you for catching this mistake; this was our error. We have corrected it, and also added a graph (Figure 2, based on Table 3 data) to more visually compare projects with a gender theme to the phases in our timeline.

Study design and methods
In terms of methods, the study is described as a 'case study'. However, a conceptual framework for the study, and in particular the conceptual basis for the proposed comparison between policy and programs, is not presented.
There is an assumption that policy should be reflected in financing. But, noting that four phases of policy are described over several time periods, it is not clear which policy is being compared to which funding, over what time period.
The expected linkage between policy statements and project financing is not described. Here more explanation of the process by which policy statements are 'translated' into project financing would assist in understanding what might be expected in terms of project financing and how it might reflect policy changes.
Some key terms are not defined - for example, what is meant by 'governance of gender'? (It appears to refer mainly to a description of policy formulation, rather than governance in the sense of institutional roles and relationships.)
As addressed above, we have removed the word 'case study' to avoid confusion. We have also removed the word 'governance', because you are correct that we are not considering policy through the lens of institutional structures or decision-making processes. We have replaced it with 'conceptualisation' of gender, which more clearly captures our interest in how the Bank's framing of gender has evolved over time and been generally operationalised.
We see value in your point that we could describe the link between policy statements and project financing (i.e. provide a methodological framework) in more depth. However, tracking financing as a proxy for understanding institutional priorities is widely accepted ([...] Development Policy Review [...] 653-671). We therefore feel that it is beyond the scope of the paper to delve into methodological frameworks, particularly because our emphasis is more on transparency issues with indicators. We hope that readers will see our two results sections (the gender conceptualization timeline and finance tracking) as largely independent analyses, and have therefore scaled-down our discussion of rhetoric and reality in the context of policy implementation (and scaled-up our discussion of rhetoric and reality in the context of indicators and publicly available data) in both the introduction and discussion.

Details of methods and analysis
Some details of the methods and analysis are also missing.
The criteria for inclusion / exclusion of the events and policies listed in the Timeline of Figure 1 is not provided. The process of selection of the events and policies for the timeline could be clarified.
We have added a sentence in the methodology to more explicitly explain our inclusion/exclusion criteria for our events and policies timeline.
The process for construction of Figure 2 could also be further clarified. It appears that the initial step in analysis of both databases was to identify projects with gender themes, and then to exclude non-health projects. Would it have made a difference if the process was reversed, and health projects selected first, then non-gender removed? Was there a rationale for the start with gender projects? It is also not clear whether there is a potential for projects to be included in both databases, and whether a check for duplicates was undertaken. The labelling for Figure 2 should note that the dollar amounts are in USD million.
Thank you for noting that we are missing units in (what was then) Figure 2; we have corrected this omission (see Figure 3). We are grateful for your useful suggestion that we check for duplicates in both datasets; we were surprised to find that only four projects overlapped between the two project databases (i.e. very few projects were included in both datasets) and that only one project overlapped between all three databases. This reinforces the poor consistency of the Bank's use of the gender theme. Because gender theme percentages are assigned to all projects that we included in the PO dataset (and all projects in the DT dataset had a gender theme), theoretically the number of projects with a gender theme should be identical in both datasets. We have added a sentence in the methodology about checking for duplicates, and raised the issue in the results.
It does not make a difference to reverse the analysis steps, by sorting first for gender themes and then excluding health projects in the PO database. Each project is tagged in the databases by sector (i.e. health) and theme (i.e. gender, and other health themes) in the PO database. In the DT database, the Bank has already sorted projects into a gender category, so it is not possible to perform a reverse analysis. Future research could, instead of using pre-assigned themes, go through each project document from the PO dataset manually, and search for the word gender; this could shed light on the accuracy of the theme designation. As we suggest in the revised conclusion, future research could also attempt to bridge the Monitoring Gender Mainstreaming (which does not include project documents) and PO (which includes project documents) databases for recent years, to track the association of the gender theme and gender inclusion indicator with gender-specific outcomes measures. This would go a long way to bridging the gap in policy dialogue with financing and project-level data, but would take months of data analysis and goes beyond the scope of our study.

Results
It is difficult to interpret the graphs and figures on funding by theme and period without some information on the overall funding envelope in health and gender over the period. This is provided in Table 3, but there is little text description of the information in Table 3, except for Page 8 para 2 (incorrectly referred to as Table 2); and in the discussion, page 15 (where figures are provided from Table 3, but without the reference to the table). There does not appear to be any effort to relate the changes in project financing with the policy phases described in Figure 1.
Thank you for raising this point, which was also highlighted by our other reviewer. We have added a graph (Figure 2), with more description, which combines data from Table 3 across three of our timeline's phases (phases II-IV). We would like to be able to compare project financing with gender themes before 1985, but no data has been released for this earlier phase. As much as we would have also liked to be able to focus explicitly on gender conceptualisation and policy for health in our timeline section (which would have facilitated direct comparison with our gender and health project financing analysis), we found extremely little literature on gender and health policy at the World Bank. We hope that this paper, and potentially improved transparency at the Bank, provides background for further research on gender policy and health.

Discussion and conclusions
The key finding from the paper is a disjunction claimed between policies on gender mainstreaming and the proportion of funds allocated to gender projects. This is contrasted with the Bank's claim on the proportion of 'gender informed' lending.
The discussion includes an examination of the limitations encountered in obtaining and interpreting the data. These limitations suggest, however, more caution in some of the authors' claims, given the questions on the completeness and definitions used in the project databases, and lack of clarity on the measurement of 'gender informed'.
You are correct that it is important to use caution in our claim. We had a paragraph in the discussion reinforcing this fact; that one could argue that all of our data analysis on projects with a gender theme does not capture the Bank's true gender inclusion in global health, but that this limitation merely highlights the poor data available to external researchers for understanding the Bank's gender portfolio. We have moved this paragraph earlier into the discussion, and qualified some of our findings from our financing analysis. As described earlier in our response, we have added recommendations to address these limitations in the discussion and conclusion.
More fundamentally, the issue of the measurement of 'gender informed' projects raises an inconsistency which lies at the heart of what the study aims to do. Is it possible to measure the success of a gender mainstreaming policy by the proportion of funds allocated to 'gender projects'? Surely the success of gender mainstreaming would be seen in a decrease in gender specific projects, and an increase in the incorporation of gender into all other projects. The paper addresses the issue of the meaning of 'gender informed', but has not questioned the fundamental assumption of whether measurement of allocation of funding to 'gender projects' is an appropriate way to measure the implementation of gender policies, particularly if the policy direction is towards 'mainstreaming'.
We believe that our above discussion of the gender theme and gender indicator responds to this concern that tracking 'gender projects' is not a fair way to measure implementation of mainstreaming.
Perhaps the most important finding from the study is the difficulty faced by independent observers in assessing and measuring the extent to which a key global organisation translates its policies into program actions. The proportion of funding allocated to projects satisfying specific gender criteria may be a crude measure of the application of gender inclusion, but, in the absence of other data, they become the default measure. As the authors conclude, how we measure our achievements determines whether they will be classed as successful or not. Their recommendations for improved reporting and a clearer 'gender informed project' indicator would go some way to address the deficiencies found.
We agree that data availability and transparency in use of indicators may be our most important take-home messages. Thank you again for your detailed feedback.

Competing Interests: None.

Janelle Winters and colleagues have conducted an analysis of the World Bank's gender policies and the Bank's financing of gender programs. As discussed further below, there have been several previous analyses of the World Bank's gender projects, but Winters and colleagues' study applies a specific global health lens, which appears to be novel.
There are two parts to the study, which are somewhat disconnected. The first part is a literature review, reviewing both the peer-reviewed and grey literature, to construct a "timeline" of the evolution of the Bank's gender focus. The second part is a quantitative financial analysis of the Bank's spending on gender projects. The timeline resulting from part 1 is shown as a figure (Figure 1) and also described through 4 key phases (women in development [WID]; institutionalization of WID; gender mainstreaming; gender equality through 'smart economics'). This timeline is very valuable. We think it will provide a helpful "roadmap" for others who wish to conduct research related to the World Bank's gender portfolio in the future. The results of Part 2 suggest that from 1985-2016, the Bank committed only between 1.2-6.2% of IBRD and IDA commitments to "gender projects." Below we comment on the importance, originality, validity, presentation, and interpretation of the research.

Importance of research question
The role of gender in development has gained increasing momentum and has moved higher up the agenda with SDG 5, "achieve gender equality and empower all women and girls." Many development and health agencies, including the World Bank, make claims about how they are mainstreaming gender in their work. An external assessment such as this one is an important way to keep the World Bank accountable and provides a foundation to advocate for changes if results deem it necessary.
This study is also very timely indeed given that the Swedish Institute for Global Health Transformation has just launched a new Lancet Commission exploring the links between SDGs 3 (health), 5 (women and girls), and 16 (institutions) (one of us, GY, is a Commissioner). Understanding how a major development institution approaches gender is helpful for that Commission's work.

Originality of the research
We would like to make two points on originality. First, one of us (GY) used to be a journal editor, at the BMJ and PLOS, and at both publishers we would not let authors claim that there has "never" been a similar study. What we asked authors to say, out of caution, is something like "to the best of our knowledge, there have been no previous studies that examined X and we believe ours is the first." This terminology allows for the possibility that you may have missed a study (e.g. in a non-English language, in the grey literature, in a consulting report, by the Bank itself, etc.).
Second, there have been several studies of the World Bank's gender policies in recent years and we think it would be helpful for the authors to more explicitly review and summarize what these found. Perhaps the most high profile was Kenny and O'Donnell's study "Do the Results Match the Rhetoric? An Examination of World Bank Gender Projects," published by CGD. That study used a Bank dataset, "Monitoring Gender Mainstreaming in World Bank Lending Operations," for its analysis (with 1666 projects that date from July 2009 to June 2014). We also note that the Bank itself conducted a study of its gender policy that included health sector projects (Evaluating a Decade of World Bank Gender Policy: 1990-99, World Bank Operations Evaluation Department, 2005).

Validity of the research
Overall, the methods seem appropriate and the mix of a more qualitative literature review with a quantitative financial analysis is a strength. However, as mentioned, it would be helpful to better connect these two distinct parts of the study (one easy way to do this, for example, would be to display key measurements/commitments in each phase of World Bank gender policies alongside financial data from the years during that phase). It was excellent, and highly valuable, to see a comparison of the World Bank's financial data from the PO and DT databases with a broader analysis of development assistance for gender projects using the CRS database. The breakdown of gender funding in the health sector by theme and also by recipient country are also interesting and helpful.
Below, we make a few specific comments about the overall approach and we note some minor inconsistencies or possible errors.
First, it is heartening to see in the introduction a discussion of men and the LGBTI community. As the authors note, in many parts of the world, men's health outcomes are much worse (e.g. the IHME's GBD2010 study found that women in the Russian Federation were outliving men by an average of 11.6 years). In a paper that one of us co-authored (reference 9 in Winters et al's paper), we note that, "In many societies, men generally enjoy more opportunities, privileges and power than women, yet these multiple advantages do not translate into better health outcomes." This is likely to be due to a combination of factors, including risk-taking behavior, occupational exposure to risk factors, gendered norms of male behavior, etc. While the introduction makes this point, it is not clear whether the authors specifically examined whether the World Bank has any specific policies on or funding for men's health or LGBTI health. EMRO (WHO Europe) is developing its first men's health strategy (due to be published in September 2018) and PAHO is also working on this issue, so it would be good to know where the Bank is.
Second, while it is clearly highly appropriate to use the gender policy marker for analyzing projects in the CRS database, it would be helpful to know if there are any data on how well this marker captures gender-specific project financing. Might some projects be missed? Would there be value, for example, in taking a sample of projects that were not captured by the marker to see if gender was included?
Third, in Figure 2, the numbers from the Development Topics Database don't seem right: it starts with n = 90, then 8 were dropped, but then the figure still says 90 (after dropping 8, it should say 82). In addition, we think it would be good to mention the date restriction in figure 2 itself and not just in the methods section. Looking at figure 2 right now, readers may think the search was from January 1, 1985 through December 31, 2017 (the figure says 1985-2017, and this would give 92 projects); the methods section states a narrower date range, i.e. to July 1, 2017 (which yields 90 projects).
Fourth, it would be good to update the projects, since the World Bank now lists 102 gender projects: http://www.worldbank.org/en/topic/gender/projects/all

Presentation
The presentation is generally clear.

Interpretation
The conclusions focus primarily on the shortcomings of the reporting mechanisms and indicators for gender mainstreaming in general. Gender reporting shortcomings were also a key theme in the CGD study on World Bank gender rhetoric. It would be valuable, we think, to discuss the concrete concerns with the gender project evaluation criteria moving forward and the conclusions could align more with the specific research question on gender-focused health programs.
There is one comparison that the authors make that we think may not be a valid one. The authors note "the Bank's recent claim that 98% of its total lending is gender informed" and then they compare this 98% figure with the proportion of IBRD/IDA financing that is specifically for gender projects. Is this really an apples to apples comparison? Being "gender informed" is not the same as financing a gender project.

Is the work clearly and accurately presented and does it cite the current literature? Partly
Is the study design appropriate and is the work technically sound? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: One of us (GY) personally knows one of the authors (DS); he has been at meetings with DS, but they have not collaborated or co-authored work.
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.
Author Response 10 Aug 2018
Janelle Winters, University of Edinburgh, UK

We have submitted a new version (version 2) of our article, which should be released in August 2018. Our responses to Dr. Gavin Yamey and Dr. Kaci Kennedy's feedback are below (in bold).
Thank you for giving us the opportunity to peer review this interesting and potentially important study.
Thank you for all of your valuable comments; they are very appreciated, and we have done our best to refine our analysis to incorporate them.
Janelle Winters and colleagues have conducted an analysis of the World Bank's gender policies and the Bank's financing of gender programs. As discussed further below, there have been several previous analyses of the World Bank's gender projects, but Winters and colleagues' study applies a specific global health lens, which appears to be novel.
There are two parts to the study, which are somewhat disconnected.
The first part is a literature review, reviewing both the peer-reviewed and grey literature, to construct a "timeline" of the evolution of the Bank's gender focus. The second part is a quantitative financial analysis of the Bank's spending on gender projects. The timeline resulting from part 1 is shown as a figure (Figure 1) and also described through 4 key phases (women in development [WID]; institutionalization of WID; gender mainstreaming; gender equality through 'smart economics'). This timeline is very valuable. We think it will provide a helpful "roadmap" for others who wish to conduct research related to the World Bank's gender portfolio in the future. The results of Part 2 suggest that from 1985-2016, the Bank committed only between 1.2-6.2% of IBRD and IDA commitments to "gender projects."
We appreciate this feedback, and also felt that it was challenging to insert global health frameworks and policies explicitly into our 'roadmap' for gender at the Bank (although we like this phrase and have borrowed it in the manuscript). In the abstract and introduction, we have clarified that our research study is intended as a scoping review of general gender policy development at the Bank and of how projects with gender themes at the Bank have been financed in the health sector, rather than a case study of gender policy at the Bank. We hope that the literature review provides a strong context for readers to understand and discuss the financial results.
Specifically, we intend this paper to act as a base for future research to compare gender policies and financing for gender-incorporating projects in global health in more depth, as suggested in the revised conclusion. Ideally we would have been able to track financing for all health projects with a gender theme across all of our phases (1972-2017), but we were limited by data availability. We reinforce the data availability angle in our conclusion, through an adapted paragraph on transparency, and in our discussion, through a more concrete discussion of how the gender informed indicator evolved at the Bank and the limitations of each database in capturing this indicator.
Below we comment on the importance, originality, validity, presentation, and interpretation of the research.

Importance of research question
The role of gender in development has gained increasing momentum and has moved higher up the agenda with SDG 5, "achieve gender equality and empower all women and girls." Many development and health agencies, including the World Bank, make claims about how they are mainstreaming gender in their work. An external assessment such as this one is an important way to keep the World Bank accountable and provides a foundation to advocate for changes if results deem it necessary.
This study is also very timely indeed given that the Swedish Institute for Global Health Transformation has just launched a new Lancet Commission exploring the links between SDGs 3 (health), 5 (women and girls), and 16 (institutions) (one of us, GY, is a Commissioner). Understanding how a major development institution approaches gender is helpful for that Commission's work.
Thank you for this point; we agree that considering how gender is incorporated in health sector projects is particularly timely, and encourage future research using the three publicly available databases that we describe.

Originality of the research
We would like to make two points on originality. First, one of us (GY) used to be a journal editor, at the BMJ and PLOS, and at both publishers we would not let authors claim that there has "never" been a similar study. What we asked authors to say, out of caution, is something like "to the best of our knowledge, there have been no previous studies that examined X and we believe ours is the first." This terminology allows for the possibility that you may have missed a study (e.g. in a non-English language, in the grey literature, in a consulting report, by the Bank itself, etc.).
We agree and have revised our claims throughout the paper. We very much appreciate you raising this constructive point.
Second, there have been several studies of the World Bank's gender policies in recent years and we think it would be helpful for the authors to more explicitly review and summarize what these found. Perhaps the most high profile was Kenny and O'Donnell's study "Do the Results Match the Rhetoric? An Examination of World Bank Gender Projects," published by CGD. That study used a Bank dataset, "Monitoring Gender Mainstreaming in World Bank Lending Operations," for its analysis (with 1666 projects that date from July 2009 to June 2014). We also note that the Bank itself conducted a study of its gender policy that included health sector projects (Evaluating a Decade of World Bank Gender Policy: 1990-99, World Bank Operations Evaluation Department, 2005).
We had referenced both papers, but you are correct that we did not engage with them optimally. Kenny and O'Donnell's (Center for Global Development, CGD) study provides a base for many of our points raised in the discussion about the Bank's gender and health financial portfolio (which we have now restructured, to highlight them more explicitly). Upon taking a fresh look at this study, we realised that the database it used, Monitoring Gender Mainstreaming, is available through the Bank's Finances portal. It has also been updated to include two years of projects since the CGD study (now 2060 projects). We therefore performed a basic analysis of this third database for all projects with a HNP global practice designation. This database is especially helpful for comparing overlaps between the gender informed indicator and the gender theme, but is not searchable by health theme and only includes HNP data from 2013-2016. Details of this analysis are in the methods and results.

Validity of the research
Overall, the methods seem appropriate and the mix of a more qualitative literature review with a quantitative financial analysis is a strength. However, as mentioned, it would be helpful to better connect these two distinct parts of the study (one easy way to do this, for example, would be to display key measurements/commitments in each phase of World Bank gender policies alongside financial data from the years during that phase). It was excellent, and highly valuable, to see a comparison of the World Bank's financial data from the PO and DT databases with a broader analysis of development assistance for gender projects using the CRS database. The breakdown of gender funding in the health sector by theme and also by recipient country are also interesting and helpful.
This is another helpful suggestion. We have added Figure 2, which compares Bank health sector commitments and commitments to projects with a gender theme during the final three phases (1985-2016, PO database). Data is not available to track health commitments for projects with a gender theme or gender inclusion indicator before these dates.
Below, we make a few specific comments about the overall approach and we note some minor inconsistencies or possible errors.
First, it is heartening to see in the introduction a discussion of men and the LGBTI community. As the authors note, in many parts of the world, men's health outcomes are much worse (e.g. the IHME's GBD2010 study found that women in the Russian Federation were outliving men by an average of 11.6 years). In a paper that one of us co-authored (reference 9 in Winters et al's paper), we note that, "In many societies, men generally enjoy more opportunities, privileges and power than women, yet these multiple advantages do not translate into better health outcomes." This is likely to be due to a combination of factors, including risk-taking behavior, occupational exposure to risk factors, gendered norms of male behavior, etc. While the introduction makes this point, it is not clear whether the authors specifically examined whether the World Bank has any specific policies on or funding for men's health or LGBTI health. EMRO (WHO Europe) is developing its first men's health strategy (due to be published in September 2018) and PAHO is also working on this issue, so it would be good to know where the Bank is.
This is certainly an interesting comment, and we agree that it would be helpful to know where the Bank stands on issues related to men and the LGBTI community. However, we feel that this goes beyond the scope of our paper; we did not encounter specific Bank policies in the published papers that we reviewed (although we did not complete a systematic review and performed the review last August, so it is possible that we missed some references) and did not find health themes related to LGBTI or men's health issues in the PO database. We would certainly recommend that future research looks into the Bank's current position. This might be better accomplished by performing targeted keyword searches of project documents with a gender theme in the PO database, or through interviews with Bank staff, particularly those involved in its 2016-2023 gender strategy.
Second, while it is clearly highly appropriate to use the gender policy marker for analyzing projects in the CRS database, it would be helpful to know if there are any data on how well this marker captures gender-specific project financing. Might some projects be missed? Would there be value, for example, in taking a sample of projects that were not captured by the marker to see if gender was included?
We have added a few sentences in the discussion about the fact that all three gender databases that we analyse (the PO, DT, and Monitoring Gender Mainstreaming databases) seem to be missing projects. The PO and DT databases have very different projects listed (and only four overlapping projects), despite the fact that they are sorted by gender theme; the Monitoring Gender Mainstreaming database only includes very recent HNP projects; and only one project is listed in all three databases. We also point out that many projects with a '3' (highest) gender informed indicator rating are not classified as having even a low percentage gender theme, such that there is considerable ambiguity. We suggest in our conclusion that a very valuable follow-up research project will be to take a sample of projects that have a gender informed rating of 1-3, and search their project documents to see if and in what ways gender objectives and outcomes indicators are addressed. Similarly, it would be very valuable to take all health sector projects with a gender theme in the PO database, and look for references to gender in their project information and appraisal documents (including any gender informed indicator scores), to see how well they match with data reported in the Monitoring Gender Mainstreaming database. Again, however, these valuable projects go beyond the scope of our paper.
Because our analysis is primarily focused on the Bank's gender project financing and indicators, we decided only to venture into the OECD gender marker as a basic comparison. We have included a sentence that references criticisms of the OECD gender marker (which, like the Bank gender themes and gender informed indicator, is largely self-reported based on largely non-transparent staff appraisals of projects).
Third, in Figure 2, the numbers from the Development Topics Database don't seem right: it starts with n = 90, then 8 were dropped, but then the figure still says 90 (after dropping 8, it should say 82). In addition, we think it would be good to mention the date restriction in figure 2 itself and not just in the methods section. Looking at figure 2 right now, readers may think the search was from January 1, 1985 through December 31, 2017 (the figure says 1985-2017, and this would give 92 projects); the methods section states a narrower date range, i.e. to July 1, 2017 (which yields 90 projects).
This was entirely our error, and we are grateful to you for catching it. The project numbers are updated in Figure 2 (which is now Figure 3). We have also more clearly added the date restriction in the figure.
Fourth, it would be good to update the projects, since the World Bank now lists 102 gender projects: http://www.worldbank.org/en/topic/gender/projects/all
We respectfully feel that this update is unnecessary, given that we are following the standard practice of clearly noting when we pulled our data from the database prior to beginning analysis. While it is always ideal to have the most up-to-date data, this would require us to re-do most of our data files, calculations, and graphs. If we do a follow-up study, as suggested in the conclusion, we will certainly use the most up-to-date projects listed.

Presentation
The presentation is generally clear.

Interpretation
The conclusions focus primarily on the shortcomings of the reporting mechanisms and indicators for gender mainstreaming in general. Gender reporting shortcomings were also a key theme in the CGD study on World Bank gender rhetoric. It would be valuable, we think, to discuss the concrete concerns with the gender project evaluation criteria moving forward and the conclusions could align more with the specific research question on gender-focused health programs.
You are absolutely right that we could have done a better job discussing tangible recommendations for gender project (or projects considered gender informed) evaluation criteria. We have restructured our discussion to focus more explicitly on three take-home messages about the Bank's health portfolio, based on our financial analyses (i.e. transparency and poor correlation between gender themes and indicator scores for the health sector; the focus on middle-income countries and implications for the Bank's poverty agenda; and poor tracking of health trust funds that may have gender components). We have also added significantly to our conclusion, to discuss specific recommendations about how to improve transparency of gender-focused health programmes and suggested future research about the Bank's health portfolio.
There is one comparison that the authors make that we think may not be a valid one. The authors note "the Bank's recent claim that 98% of its total lending is gender informed" and then they compare this 98% figure with the proportion of IBRD/IDA financing that is specifically for gender projects. Is this really an apples to apples comparison? Being "gender informed" is not the same as financing a gender project.
Our paper's other reviewer made a similar point, and it is fair. In our new manuscript version, we have addressed this issue by (1) more clearly stating that we considered projects with a distinct gender theme percentage listed to be gender projects, and more clearly defining what is included in the gender informed indicator rating; (2) clearly distinguishing, throughout the methods and results, the gender projects that we tracked in the PO and DT databases from gender informed projects that we tracked in the Monitoring Gender Mainstreaming database; and (3) comparing the Bank's 98% gender informed financing claim with both the gender project theme commitments and the gender informed indicator scores. When combined with our discussion about transparency and missing projects in each database, as well as our qualifications about our findings due to data quality, we believe that the comparison has become much more valid.
Thank you again for taking the time to provide us with such constructive feedback.