Development aid contracts database: World Bank, Inter-American Development Bank, and EuropeAid

This article presents a global database of government contracts funded by the World Bank, Inter-American Development Bank and EuropeAid, principally from the years 2000-2017. The contract-level data were collected directly from the official contract publication sites of these organisations using web scraping methods. While the source publication formats are diverse both over time and across publishers, we standardized and harmonized the datasets so that they can be analysed jointly. The datasets contain key information on the contracting parties (e.g. buyer and supplier names), the contract's content (e.g. contract value and product description), and details of the contracting process (e.g. contract award date or the procedure followed). In addition, they contain information on the development aid projects to which the contracts belong (e.g. project title and value). The data have wide reuse potential for researchers looking for detailed micro-level information on how major development aid spending takes place and what impacts it has. This database underlies the research article “Anti-corruption in aid-funded procurement: Is corruption reduced or merely displaced?” [1], which develops corruption risk indicators using the dataset presented.

Data were scraped and downloaded from the official websites of the multilateral institutions.
Data format: Raw and analyzed
Description of data collection: Scraping and downloading data involved the collection of all publicly available information related to all development aid projects, as well as the corresponding procurement processes, from the organisations' official publication websites (as of 2019). All relevant fields available on the sources were collected automatically and verified manually. Project-level data were linked to contract-level information through a unique project identifier. The list of variables was standardized and harmonized among the three multilateral development agencies. Whenever possible, values were also standardized; for example, contract values were converted to Intl. USD. Based on the combined and standardized dataset, a list of tendering risk indicators was developed.
Data source location: Primary data sources: the raw data on the development aid projects and corresponding procurement processes are available on the organizations' official publication websites.

Value of the Data
• The exceptionally broad scope of the dataset makes it valuable for a wide set of researchers and policy analysts. It offers detailed and accurate insights into where and how development aid is spent.
• Academics, national governments, and donor agencies can use the data to monitor and assess public procurement and project performance across the world, including tracking corruption risks.
• Aid contract and project data can be combined with further datasets, such as company registry data or sectoral performance indicators, in order to gain a more comprehensive assessment of development aid effectiveness.
• This dataset adds value to existing macro-level datasets on development aid flows by providing rich micro-level information covering 3 large donors active across the globe. Micro-level data on the process and outputs of aid projects provide much-needed detail for understanding the mechanisms and constraints of facilitating development in Low and Middle Income Countries.

Data Description
The financial monitoring of the distribution of development aid has become increasingly challenging in the field of development economics and public policy. In order to move towards increased accountability and higher effectiveness of development aid, donors seek opportunities to strengthen the evidence base and apply risk assessment models. The data presented here combine information on development aid projects of the world's largest multilateral development agencies: World Bank (WB), Inter-American Development Bank (IADB), and EuropeAid (EC). The project data are also linked to the procurement contracts related to the implementation of these projects, together with a set of contract-level corruption risk indicators. The data provide a comprehensive overview of development aid spending along with a range of process and output features, including corruption risk indicators. However, it is necessary to highlight that the datasets do not represent the full amount of development aid provided by the three donors. Due to the country-specific regulations of the development aid agencies, contracts below a certain threshold are not published on a donor's website.
Aid-funded public procurement processes start with a call for tenders or a request for quotations. This is when the buyer approaches the market or potential suppliers directly. Then, interested bidders submit their bids, which are assessed by the buyer's assessment committee. The decision is published in a contract award notification, and then contract implementation commences. The procurement process ends either with delivery according to the contract or with termination of the contract before completion. Each procurement tender is part of a development aid project, which is approved by both the donor and the recipient government. Typically, one project leads to a number of procurement tenders and contracts. While these processes are complex (multi-stage, multi-level), our database contains information on major steps and features for both projects and contracts. The level of observation in the dataset is a public procurement contract, which is the lowest unit of observation in the project and procurement cycles. By implication, features characterising higher-level observations such as projects are repeated for all corresponding contracts (rows).
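The contract-level layout described above can be illustrated with a minimal sketch. The field names (`project_id`, `project_title`, `project_value`, `contract_id`, `contract_value`) and all records are hypothetical and are not taken from the database; the sketch only shows how project attributes end up repeated on every contract row when projects are joined to contracts through a project identifier.

```python
# Hypothetical project records keyed by project ID (illustrative values only).
projects = {
    "P001": {"project_title": "Rural roads upgrade", "project_value": 5_000_000},
    "P002": {"project_title": "Primary health clinics", "project_value": 2_000_000},
}

# Hypothetical contract records; C4 points to a project that cannot be found.
contracts = [
    {"contract_id": "C1", "project_id": "P001", "contract_value": 120_000},
    {"contract_id": "C2", "project_id": "P001", "contract_value": 340_000},
    {"contract_id": "C3", "project_id": "P002", "contract_value": 75_000},
    {"contract_id": "C4", "project_id": "P999", "contract_value": 10_000},
]


def attach_project_fields(contracts, projects):
    """Left-join project attributes onto each contract row via project_id."""
    rows = []
    for contract in contracts:
        project = projects.get(contract["project_id"], {})
        row = dict(contract)  # copy so the input records stay untouched
        # Project-level fields are repeated on every contract of the project;
        # contracts without a matching project keep empty (None) fields.
        row["project_title"] = project.get("project_title")
        row["project_value"] = project.get("project_value")
        rows.append(row)
    return rows


rows = attach_project_fields(contracts, projects)
for row in rows:
    print(row["contract_id"], row["project_title"], row["project_value"])
```

Because project values are repeated across rows, summing a project-level field over contracts would count it several times; in this layout, project totals would need to be computed on rows de-duplicated by project identifier.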
The data description below reports parameters for an unfiltered dataset, which includes all available procurement information on contracts that were successfully concluded as well as those that failed or were cancelled. It is also possible to select contracts from completed procedures using the condition filter_ok = 1 (since there is no definitive cancellation flag in the official publications, we denote contracts as cancelled if they lack a winning supplier name).
The combined dataset represents more than 15,000 projects and 400,000 contracts (Table 1), covering nearly all countries of the world (Fig. 2) (there is no project information available for the EuropeAid dataset). While the IADB data goes back to 1961, the bulk of the dataset covers 2000-2017 (Fig. 1). Nevertheless, there is a notable difference between the datasets, with the World Bank data being the most comprehensive. The combined dataset is also highly diverse in terms of the types of products purchased, ranging from social services to major construction projects (Fig. 3).

Data collection, cleaning, and standardization
The data collection process consists of a series of steps. First, we scraped and downloaded all the relevant information available on the online publication pages of the 3 agencies (World Bank, IADB, and EuropeAid). Second, we parsed, cleaned, and merged all the acquired data for the three agencies separately. Finally, we standardized and harmonized variable content, format, and measurement units across contracts coming from the 3 organisations which allowed us to construct a combined database. To provide greater detail, we discuss each step of the data collection process below. Given the heterogeneity of the 3 data sources, we report the parsing and processing steps for each multilateral agency separately.
For the World Bank dataset, the main source was the organization's website. There, the World Bank reports information on development aid projects as well as related contract notices, contract awards, and concluded contracts (Table 2, Panel A). We parsed the data on both projects and the associated procurement documents. The datasets were linked through the unique project identification number that is assigned to each development aid project and specified in all related procurement documents. In addition to the project ID number, procurement documents have unique identifiers that allow us to link information at the contract level (Table 2, Panel B). Unfortunately, not all procurement records could be mapped to projects due to errors and inconsistencies in the source data.
The IADB data were scraped from the organization's website. The data inputs were development aid project information and the associated contract notices and awards (Table 3, Panel A). In the process of matching the three inputs, we had to exclude contract notices since they lack a unique identifier that could link them to contract awards. The merged dataset included project data and related contract awards that were linked through a unique project ID (Table 3).
In the case of the EuropeAid data, the organization's website contained limited information on projects and related procurement procedures; therefore, we used an alternative official source for data scraping, Tenders Electronic Daily (TED) (Table 4, Panel A). TED is the European public platform dedicated to public procurement, which publishes documentation on opportunities for public procurement as well as concluded public procurement contracts in the European Union and European Economic Area. While TED functions as the EU-wide platform, for this database compilation we narrowed down the search of procurement records to external aid programmes, and further to the European Development Fund and External aid.
From the TED website, we scraped all relevant contract notices and contract awards. Contract notices were matched to contract awards using a combination of unique identifiers: tender ID, record ID, and lot title, which together enabled the identification of each lot within a tender (since there can be several lots in one contract notice). There was no project information available on this source.
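The lot-level matching of notices to awards can be sketched as a lookup on a composite key. This is an illustration under stated assumptions, not the authors' code: the field names (`tender_id`, `record_id`, `lot_title`), the sample records, and the normalisation of lot titles (lower-casing and collapsing whitespace) are all hypothetical.

```python
def lot_key(record):
    """Composite key identifying one lot within a tender.

    Normalising the lot title (case, whitespace) is an assumption made here
    so that trivially different spellings still match.
    """
    title = " ".join(record["lot_title"].lower().split())
    return (record["tender_id"], record["record_id"], title)


def match_awards_to_notices(notices, awards):
    """Pair each award with the notice sharing its composite key."""
    notice_index = {lot_key(notice): notice for notice in notices}
    matched, unmatched = [], []
    for award in awards:
        notice = notice_index.get(lot_key(award))
        if notice is None:
            unmatched.append(award)
        else:
            matched.append((notice, award))
    return matched, unmatched


# Hypothetical records: one tender with two lots, plus an unmatched award.
notices = [
    {"tender_id": "T1", "record_id": "R1", "lot_title": "Lot 1: Vehicles"},
    {"tender_id": "T1", "record_id": "R1", "lot_title": "Lot 2: Spare parts"},
]
awards = [
    {"tender_id": "T1", "record_id": "R1", "lot_title": "lot 1:  vehicles", "supplier": "A"},
    {"tender_id": "T1", "record_id": "R1", "lot_title": "Lot 2: Spare parts", "supplier": "B"},
    {"tender_id": "T2", "record_id": "R9", "lot_title": "Lot 1: Consulting", "supplier": "C"},
]

matched, unmatched = match_awards_to_notices(notices, awards)
print(len(matched), "matched;", len(unmatched), "unmatched")
```

Using the lot title as part of the key is what separates several lots published under the same notice; a key built from the tender and record identifiers alone would collapse them into one entry.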
Once the individual organizations' data sources were scraped and merged, we standardized variable names and formats to compile the 3 datasets into a single database. Table 5 presents the list of the project- and procurement-related variables that are present in the combined dataset. As the 3 sources contain a wide set of often idiosyncratic variables, we selected for the combined dataset those which fulfilled the following criteria:
• high value-added to the understanding of development aid projects and procurement processes,
• high quality of the data, and
• presence in at least two out of the three data sources.
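Of the three criteria, only the last one is mechanical; the first two rest on judgement. The sketch below checks the "present in at least two of three sources" rule for a set of hypothetical variable names. The variable lists are invented for illustration and do not reproduce Table 5.

```python
from collections import Counter

# Hypothetical variable inventories per source (not the actual variable lists).
source_variables = {
    "WB": {"buyer_name", "supplier_name", "contract_value", "borrower_rating"},
    "IADB": {"buyer_name", "supplier_name", "contract_value", "supplier_address"},
    "EC": {"buyer_name", "supplier_name", "lot_title"},
}


def shortlist(source_variables, min_sources=2):
    """Return variables that appear in at least `min_sources` sources."""
    counts = Counter(
        variable
        for variables in source_variables.values()
        for variable in variables
    )
    return sorted(v for v, n in counts.items() if n >= min_sources)


shortlisted = shortlist(source_variables)
print(shortlisted)
```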
Overall, the shortlisted variables comprehensively describe development aid projects and the procurement processes associated with their implementation. The share of missing observations for each variable is presented in the Appendix, Table A1.
Following the harmonization of the variables' names and formats, we performed cleaning and standardization steps to ensure the consistency of the combined data. Firstly, we created a filter (filter_ok) that narrows down the sample of procurement processes to successfully completed procedures. Due to data complexity, no criterion directly shows whether an observation represents an awarded contract. Therefore, we assumed that an awarded contract has a non-missing winning supplier name; conversely, a procurement record without a supplier name was treated as not awarded. In this paper, the reported numbers describe the whole sample, which includes both failed and completed procurement procedures. The filter variable is included in the combined dataset, making filtering options easily accessible to data users.
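The rule behind filter_ok can be written down in a few lines. In this minimal sketch, `filter_ok` is the variable named in the text, while the `supplier_name` field name, the treatment of whitespace-only names as missing, and the sample records are assumptions made for illustration.

```python
def compute_filter_ok(record):
    """Return 1 if the record has a non-missing winning supplier name, else 0.

    Treating empty or whitespace-only strings as missing is an assumption.
    """
    name = record.get("supplier_name")
    return 1 if name is not None and name.strip() != "" else 0


# Hypothetical contract records.
records = [
    {"contract_id": "C1", "supplier_name": "Acme Construction Ltd"},
    {"contract_id": "C2", "supplier_name": ""},
    {"contract_id": "C3", "supplier_name": None},
    {"contract_id": "C4"},
    {"contract_id": "C5", "supplier_name": "   "},
]

for record in records:
    record["filter_ok"] = compute_filter_ok(record)

# Selecting completed procedures mirrors the condition filter_ok = 1.
completed = [r for r in records if r["filter_ok"] == 1]
print([r["contract_id"] for r in completed])
```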
Locations are of crucial value for a range of uses of this dataset; hence, we implemented a series of data enrichment procedures. We used the Here Maps API to enhance the unstructured supplier address data in the IADB dataset. We also used the "kountry" Stata module [2] to standardize all country names for projects, buyers and suppliers. As for contract values, we provide the user with the directly reported prices and the purchasing power parity adjusted prices, along with the World Bank's purchasing power parity (PPP) conversion rates. Furthermore, we enhanced the product classification for contracts without product codes. We applied a token-based string matching technique to match contracts without product codes to the Common Procurement Vocabulary (CPV 2008) based on tender/lot descriptions. Additionally, we supplemented entries with missing contract sectors using the CPV divisions from the product codes. Finally, to ensure completeness, we merged both the borrowing body and the procuring entity to generate the buyer name for the World Bank source, while for the IADB source the buyer name is generated using only the procuring entity name.
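As an illustration of token-based matching to CPV, the sketch below scores each description against CPV labels by Jaccard overlap of word tokens and derives the CPV division from the first two digits of the code. The article does not specify the similarity measure or threshold used, so Jaccard similarity, the `threshold` value, the function names, and the three-entry CPV excerpt are all assumptions.

```python
import re

# Tiny illustrative excerpt of CPV division-level entries, not the full vocabulary.
CPV_EXCERPT = {
    "45000000": "Construction work",
    "33000000": "Medical equipments, pharmaceuticals and personal care products",
    "72000000": "IT services: consulting, software development, Internet and support",
}


def tokenize(text):
    """Lower-case the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_cpv(description, cpv=CPV_EXCERPT, threshold=0.1):
    """Return the best-scoring CPV code, or None if no score reaches the threshold."""
    tokens = tokenize(description)
    best_code, best_score = None, 0.0
    for code, label in cpv.items():
        score = jaccard(tokens, tokenize(label))
        if score > best_score:
            best_code, best_score = code, score
    return best_code if best_score >= threshold else None


def cpv_division(code):
    """The CPV division is given by the first two digits of the code."""
    return code[:2]


code = match_cpv("Construction work for rural school buildings")
print(code, cpv_division(code) if code else None)
```

A threshold guards against weak matches driven by stop words such as "and"; in practice, removing stop words or weighting tokens by rarity would be natural refinements.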

Calculated risk indicators
Given that risk assessment is a major use case for the dataset, a set of risk indicators has been calculated based on the available project and procurement data. These corruption risk indicators capture restricted and unfair access to public resources that benefits connected bidders in public procurement [3]. Risk indicator development and validation are based on established methodologies [4]. Some of these risk indicators are also used in the publication linked to this article [1]. All risk indicators are calculated at the contract level; their summary and availability by data source are presented in Table 6, while Fig. 4 presents the composite risk indicator (CRI) for each source.
While this article presents an extensive description of the available data and of the constructed individual and composite corruption risk scores, it does not present an exhaustive list of potential data applications. In addition to monitoring and assessing corruption risks in public procurement related to development aid projects, the dataset offers the opportunity to measure transparency in development project documentation by inspecting what kind of data is available and which crucial pieces of information are missing. Missing information in tender documentation could be the result of deliberate action aimed at limiting public access to crucial facts such as bidder name, title, contract value, or procurement method. Furthermore, the compiled database offers great potential for further competition and collusion research, given the wide pool of contracts and the high level of data granularity. With the available data, it is possible to shift the level of observation from a single contract to a more aggregated level such as bidder, buyer, product market, or country, in order to observe participants' behavior and the dynamics of market structure.
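Shifting the level of observation from contracts to buyers can be sketched as a simple aggregation. The field names (`buyer_name`, `supplier_name`, `contract_value`), the sample records, and the choice of the top supplier's share of contract value as a market structure measure are illustrative assumptions, not indicators defined in this article.

```python
from collections import defaultdict


def aggregate_by_buyer(contracts):
    """Summarise contract rows per buyer: supplier count, total value, top supplier share."""
    value_by_buyer = defaultdict(lambda: defaultdict(float))
    for contract in contracts:
        buyer = contract["buyer_name"]
        supplier = contract["supplier_name"]
        value_by_buyer[buyer][supplier] += contract["contract_value"]

    summary = {}
    for buyer, by_supplier in value_by_buyer.items():
        total = sum(by_supplier.values())
        top_supplier, top_value = max(by_supplier.items(), key=lambda kv: kv[1])
        summary[buyer] = {
            "n_suppliers": len(by_supplier),
            "total_value": total,
            "top_supplier": top_supplier,
            # Guard against buyers whose contracts all have zero value.
            "top_supplier_share": top_value / total if total > 0 else None,
        }
    return summary


# Hypothetical awarded contracts.
contracts = [
    {"buyer_name": "Ministry A", "supplier_name": "S1", "contract_value": 100.0},
    {"buyer_name": "Ministry A", "supplier_name": "S1", "contract_value": 300.0},
    {"buyer_name": "Ministry A", "supplier_name": "S2", "contract_value": 100.0},
    {"buyer_name": "Ministry B", "supplier_name": "S3", "contract_value": 50.0},
]

summary = aggregate_by_buyer(contracts)
for buyer, stats in summary.items():
    print(buyer, stats)
```

The same pattern applies to other aggregation levels by swapping the grouping field, for example grouping by supplier, product market, or country instead of buyer.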

Ethics Statement
The data were obtained from the official websites of the World Bank, Inter-American Development Bank, and EuropeAid, which publish the data with the aim of providing transparency and supporting accountability of their operations and spending. The data include information on organisations and formal tenders and contracts; hence, they do not fall under personal data protection regulations in Europe or elsewhere (i.e. no personal information is processed).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.

Data Availability
Development Aid Contracts Database (Original data) (Mendeley Data).

Table A1
Report on missing values.

Table A2
Validation results.