PEST-CHEMGRIDS, global gridded maps of the top 20 crop-specific pesticide application rates from 2015 to 2025

Available georeferenced environmental layers are facilitating new insights into global environmental assets and their vulnerability to anthropogenic inputs. Geographically gridded data of agricultural pesticides are crucial to assess human and ecosystem exposure to potential and recognised toxicants. However, pesticides inventories are often sparse over time and by region, mostly report aggregated classes of active ingredients, and are generally fragmented across local or government authorities, thus hampering an integrated global analysis of pesticide risk. Here, we introduce PEST-CHEMGRIDS, a comprehensive database of the 20 most used pesticide active ingredients on 6 dominant crops and 4 aggregated crop classes at 5 arc-min resolution (about 10 km at the equator) projected from 2015 to 2025. To estimate the global application rates of specific active ingredients we use spatial statistical methods to re-analyse the USGS/PNSP and FAOSTAT pesticide databases along with other public inventories including global gridded data of soil physical properties, hydroclimatic variables, agricultural quantities, and socio-economic indices. PEST-CHEMGRIDS can be used in global environmental modelling, assessment of agrichemical contamination, and risk analysis.


Background & Summary
The amount and diversity of pesticides used in agriculture and horticulture have enormously increased since the Green Revolution to protect or increase yield, and enhance harvest and processing efficiency by the agro-food industry. The FAO reports 4.1 million tonnes of substances applied globally in 2015, that is, 35% greater than in 2000 1 . For the projected 9.77 billion people in 2050 2 and the expected land conversion into arable production, global pesticide applications are likely to increase. Currently, the USGS and the European Commission list about 500 approved active ingredients, which may differ from country to country and can span across 18 pesticide classes. Regulatory authorities approve pesticides that do not pose toxicological health risk or are considered tolerable or non-persistent at the application rates recommended by manufacturers; however, there is evidence of pesticides residues in the environment 3,4 . Some pesticides can pose negative impacts on terrestrial and aquatic biodiversity even at sublethal doses such as triazines herbicides (e.g., atrazine) on the sexual development of amphibians 5 , the pre-emergence herbicide trifluralin on aquatic life 6 , or more recently, neonicotinoid insecticides (e.g., chlothianidin) on the immune system of bees 7 . Other organophosphate pesticides have been argued to potentially alter the biodiversity and shape ecosystem functions either when intensively used over large surface areas such as the non-selective herbicide glyphosate 8 or when their toxicity persists for long periods such as the glyphosate degradation byproduct AMPA 9 , or other legacy pesticides such as DDT. Mixtures of common pesticides were shown to decrease the microbial species richness in laboratory samples by 15 to 30% 10 , while global scale analyses suggest an overall biodiversity decline due to loss in pollinators 11 and land use-associated perturbations including the release of pollutants 12 .
There is therefore a pressing need, as recently highlighted by the Lancet Commission on Pollution and Health 13 , to specifically broaden our understanding of the global-scale use of pesticides and associated impacts on human

Methods
Development of the PEST-CHEMGRIDS data release required the use of multiple publicly available data sources referenced in Table 1, and the development of several computational scripts to intersect and perform calculations on those data along various sequential and parallel steps. The overall workflow to generate the global maps of pesticides annual application rates and data quality is described in the following sections and is schematically represented in Fig. 1.

Acquisition of seeding databases and re-analysis (
Step 1). The globally-gridded pesticide application rates in PEST-CHEMGRIDS were estimated based on the USGS Pesticide National Synthesis Project (USGS/ PNSP) within NAWQA 22 . The "high" and "low" annual application mass compiled for each state of the USA from 1992 to 2016 and for a total of 512 active ingredients relied on surveys used in conjunction with the USDA National Agricultural Statistics Service (USDA/NASS) 23 for various years from 2007 to 2012, and interpolation and extrapolation methods originally described in 24 when data were not available. The USGS/PNSP data are explicit for 6 dominant crops (i.e., corn, soyabean, wheat, cotton, rice, and alfalfa) and 4 aggregated crop classes (i.e., vegetables and fruit, orchards and grapes, pastures and hays, and others), which include the crops listed in Table 2 column 2. We did not reconstruct missing data in the original USGS/PNSP database except when either the "high" or the "low" estimate was available; in those cases, the yearly-averaged "high"-to-"low" or "low"-to-"high" mass ratio for a specific crop and active ingredient was used as a factor for the available datum to retrieve the missing datum. The year 2015 was used as the reference throughout for PEST-CHEMGRIDS; hence, a linear interpolation of data in 2015 was implemented when data in 2014 and 2016 were available to retrieve the crop-specific application mass of an active ingredient in 2015. Next, the maximum value of the "high" estimate and the average of the "low" estimate for the mass of each active ingredient in each year were calculated for the 48 available USA states of the USGS/ PNSP to represent the range of applied mass in each of the 25 years assuming that the actual lower bound is always zero when no pesticides are applied (Fig. 1, step 1). The states of Alaska and Hawaii, and other minor USA territories were not included in the original USGS/PNSP database. Along with the range of applied mass, the median mass M was calculated and used in the next steps described below.

Selection of active ingredients samples (Step 2).
Using the reanalysis of the USGS/PNSP data, we ranked the active ingredients applied in the USA for each dominant and aggregated crop in decreasing order of mass (Fig. 1, step 2). Ranking was conducted by first integrating the applied median mass M over each state in the reference year 2015. Next, the top 20 most used crop-specific active ingredients were selected against satisfaction of the following conditions: (i) no more than 10 missing years over the 25-year records, and (ii) no more than 5 missing years in the last 10 years. In total, 95 different active ingredients were found, which cumulative mass represents 84.2% of the pesticide mass used in the USA in 2015 (see Fig. 2 for the top 5 out of 20 selected in each crop class, and 25,26 for the full set of panels). This figure was validated against the FAOSTAT pesticide database (see "Technical Validation") and the correction factor F M = 0.842 was defined and later used in "Global conditioning against FAOSTAT pesticide records" to account for the mass deficit after selection of active ingredients. Using the pesticide databases of the European Commission, which details 18 pesticide classes for more than 1300 substances as of 2019 27 , we determined that the selected active ingredients cumulatively include 7 pesticide classes distributed as 60 herbicides, 22 fungicides, 20 insecticides, 8 acaricides, 6 nematicides, 5 plant growth regulators and 1 repellent. Of these, 41 active ingredients belong to more than one pesticide class, and 1 was not classified. Seed treatments were included in our reanalysis as per the USGS/PNSP but we note that these have not been reported any longer from 2016 and 2017. The FAOSTAT database does not have data for seed treatments in the USA.
where t represents time in year (Fig. 1, step 3). The dimensional polynomial parameters (a, b) i,j specific to the selected active ingredient i on each dominant and aggregated crop j were recorded, totalling 200 couples. The projections in pesticide mass used in the USA were next calculated as M p (i, j, t p ) = M r (i, j, t = t p ) in years t p ranging between 2015 and 2025 using Eq. (1) (Fig. 1, step 4) and are represented in Fig. 2 for selected active ingredients.  Each pesticide-and crop-specific time-varying prefactor F calculated for the USA is a proportionality scalar used to condition the projections of pesticide application rates globally as described later. All polynomials  www.nature.com/scientificdata www.nature.com/scientificdata/ coefficients along with goodness-of-fit metrics are distributed in 25,26 , while different methods to achieve time projections are discussed in "Validation of historical trends and projection prefactors".
Application rates in the USA (Steps 6 to 7). The most important step at this stage was to convert the application mass of selected active ingredients into application rates (APR) expressed in mass per unit area (kg/ ha per year). To this end, the global crop distribution database in 28 was used (here called MRF as per the authors' initials); this includes globally gridded harvested area (ha) and yield (kg/ha) for 175 crops estimated in year 2000. To match the MRF and UGSG/PNSP databases we rearranged the crops in the four aggregated crop classes (Fig. 1, step 6) using the aggregation list in Table 2, column 3. Crop aggregation using the MRF in 28 29 , which reports the surface area fraction of grid cells used for general pastures in 2000 (Fig. 1, step 7). This layer was converted to area using the georeferencing projection details. For this crop class, we also included some crops from MRF that were explicitly tagged to forage, hence quantification of matching was not directly possible for the Pasture and Hay crop class used in the USGS/PNSP database. Corrections for the unmatched list of crop surface area at global scales were not implemented at this stage but as described in "Global conditioning against FAOSTAT pesticide records". The fractions of matched crops were tracked and used as one of the data quality factors in the calculation of the quality index for this PEST-CHEMGRIDS release (see "Technical Validation").
During assemblage of the dominant and aggregated crop classes we tested that the total area of all crops included in a grid cell did not exceed the grid cell area and we corrected where needed by introducing a maximum crop saturation of 0.95 in those cells. The rebuilt dominant and aggregated crops maps are therefore distributed with PEST-CHEMGRIDS in 25 Table 2. List of dominant and aggregated crops classes and matching. The crop list in the USGS/PNSP database is reconstructed with the crops listed under column "PEST-CHEMGRIDS" and originally sourced in 28 . The matched, partial match, and unmatched fractions are calculated and tracked in the quality factor QF SA described in "Technical Validation". FOR and NES stand for "forage" and "not elsewhere specified". www.nature.com/scientificdata/ in 28 and because they may be needed for further processing of PEST-CHEMGRIDS by a third party. Table 3 reports the surface fraction of each dominant and aggregated crop after our manipulation. The calculation of the historical APR relative to the USA states was ultimately accomplished by dividing the smoothed median application mass M s by the area of each dominant and aggregated crops in each state for the selected active ingredients and each of the available 25 years of data records (Fig. 1, step 7). The georeferenced USA map from 30 was used as a mask to identify the cumulative area of each crop class in individual USA states. Because we calculated the range and median application mass M s earlier, the APR for the USA states was also expressed by a range and median.   Table 3. List of crop maps in the PEST-CHEMGRIDS release. Dominant and aggregated crops used in PEST-CHEMGRIDS defined by the crops in Table 2, column "PEST-CHEMGRIDS", are corrected by the total surface area available in a grid cell. The original disaggregates crop layers are available in 28 . The PEST-CHEMGRIDS crop maps listed here are distributed in files equally stored in Portable Network Graphics (.PNG), Tagged Image File Format (.TIFF/.TIF), and NetCDF4 (.NC) formats in 25,26 . www.nature.com/scientificdata www.nature.com/scientificdata/ Global spatial inference of application rates (Steps 8 to 10). The inference of pesticide application rates from the USA to global scales was conducted by means of a polynomial extrapolation from the 2015 APR in the USA states (determined in step 7) using 20 globally-gridded environmental quantities that included soil physical properties, hydroclimatic and agricultural variables, and socio-economic indices ( Table 1). The procedures are implemented in Fig. 1, step 8 to 10, and are detailed below.
Soil physical properties were sourced from SoilGrids 31 , which consists of globally-gridded soil profiles through 6 layers from the surface to 2 m depth. For this work, we accessed the three soil textural fractions (sand, silt, and clay), the soil organic carbon content, and the soil porosity of the top layer. In addition to these, we also used the global soil thickness data of the Distributed Active Archive Centre for Biogeochemical Dynamics of the Oak Ridge National Laboratory (ORNL/DAAC) 32 and the global equilibrium water table depth (WTD) by Fan et al. 33 .
Global hydroclimatic variables included daily precipitation from the CPC Global Unified Precipitation data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA 34 , atmospheric temperature from the Global Historical Climatology Network -Daily (GHCN-Daily) dataset 35,36 , and the 8-day net solar radiation 37 and the 8-day net primary productivity 38 from the NASA Earth Observations (NASA/NEO), the monthly actual evapotranspiration available from the Commonwealth Scientific and Industrial Research Organization (CSIRO) 39 , and the thermal climate region maps of the FAO/GeoNetwork 40 .
Agricultural variables included the global annual application rate of nitrogen 41 and phosphorous 42 from NASA/ SEDAC 43 , the yield of dominant and aggregated crops obtained from reanalysis of the USGS/PNSP and MRF data in step 6 at their original resolution (with the exception of the yield of pastures and hays sourced from NASA/ SEDAC inventory) 29 , and the global crop water security (GFSAD) layer from NASA Land Processes Distributed Active Archive Center (NASA/LPDAAC) 44 .
Finally, socio-economic indices included the global population density map estimated in 2015 by 45 , and the gross domestic product (GDP) and human development index (HDI) maps in 2015 developed and distributed in 46 (here called KTG from the authors' initials).
Data integrated in step 8 have diverse grid resolutions (Table 1) and were first harmonized to the same resolution as of the dominant and aggregated crop layers of the USGS/PNSP-MRF data in step 6; remapping of grid cells values were conducted by various interpolation methods (depending on the variable to be resized) implemented using the Matlab function resizem. Thus, the resulting 20 homogeneous layers (one for each environmental variable) were used with the APR in 2015 relative to the 48 USA states obtained in step 7; specifically, APR values were scattered against the average value of each environmental variable X within each USA state for each crop type and active ingredient (Fig. 1, step 9). These scatters served to determine the "natural" correlation strength R x (i, j) of APR of active ingredient i on crop j against the environmental variable X, and the corresponding linear regression through the points. The correlation strength R x (i, j) was determined using the Matlab corrcoeff function (see natural correlation mosaics in Fig. 3 of 25,26 ), while the polynomials for linear regression and inference (APR r ) were defined as where X represents the generic environmental variable spatially averaged within a specific USA state and (α, β) i,j are the dimensional parameters retrieved by least-squares fitting against the median APR for each pesticide i and crop j in 2015. The 95% confidence intervals CI(i, j, X) around each regression polynomial APR r were calculated to www.nature.com/scientificdata/ determine the upper and lower APR bounds H and L, respectively. Equation (3) was assigned no value when used with the missing yield of the pastures and hays aggregated crop. We extrapolated the APR values from the USA to the global grid using Eq. (3) with X in place of X , where X is the actual value of the generic environmental variable in a specific grid cell of the remapped global grids (Fig. 1,  step 10). Only grid cells in which a specific crop class existed were used. This step resulted in the raw estimate of global annual application rates APR g (i, j, X) for a specific active ingredient i, crop j, and environmental variable X in 2015. The overall high (H) and low (L) raw estimates APR i j * ( , ) were then calculated by weight-averaging the X-specific estimates APR g (i, j, X) as are relative to the extrapolation from each environmental variable X and their sum ∑ W i j ( , ) k X k equals 1, and n(i, j) is the number of points used for polynomials fitting relative to the total number of available points (i.e., the 48 USA states). This weighting method implies that the environmental variables that were highly correlated with APR across the USA states and resulted from more available points in the USGS/PNSN database had greater weight on the global estimate as compared to the other environmental variables. In particular, we used Eq. (4) only with the 5 environmental variables with the greatest strength |R x (i, j)|, while the remaining variables were neglected. The n(i, j) and R x (i, j) values used in Eq. (4) were tracked to assess data quality (see "Data quality tracking and gridded quality index maps" in "Technical Validation"). The overall method described above for spatial inference of application rates was tested for robustness and statistical quality as detailed in "Validation of spatial inference" of "Technical Validation", while full list of polynomials coefficients and goodness-of-fit metrics are distributed in 25,26 . (4) do not yet include specific conditions imposed by national authorities and regulations such as active ingredients that are not approved for use or banned. The raw H and L estimates were therefore conditioned (along path C1 in Fig. 1) to regulations enacted locally by combining the European Commission database on pesticides, which reports bans on active ingredients within the EU28 as of September 2016 and May 2019 27 , and the global database maintained by the Pesticide Action Network that reports bans on more than 700 active ingredients for more than 80 countries as of April 2017 and May 2019 47 . The two databases were first harmonized (Fig. 1, step  11) and were next used with a georeferenced map of countries from 48 to revise the H and L APR i j * ( , ) g estimates for specific active ingredients banned on any of the dominant or aggregated crops (Fig. 1, step 12). Here, APR i j * ( , ) g was set to null in countries that apply a ban; the most recent known ban was assumed to last until 2025. Note that the European Community has multiple approval levels. Active ingredients approved by the European Commission also require approval by the member country before they can be used in that country, while those not approved by the Commission can be used by EU28 member countries under particular circumstances. Approval can be periodically reviewed, and hence the status of an active ingredient can change when it is not banned. For this reason, we aggregated banned and not approved (and therefore not used) substances into one class called and shown in our PEST-CHEMGRIDS release as B/NA.

Biotechnology conditioning of global estimates (Step 13). Biotechnologies implemented to
induce resistance such as in Genetically Modified (GM) crops against specific active ingredients are explicitly accounted for in PEST-CHEMGRIDS. We retrieved the database of the International Service for the Acquisition of Agri-Biotech Applications (ISAAA), which lists 44 countries that approve pesticide-resistant GM crops as of 2018. A total of 29 crops are listed as GM in the ISAAA database, including five of the six dominant crops (corn, soyabean, wheat, cotton and alfalfa) and some of the aggregated crops used in PEST-CHEMGRIDS. We excluded the aggregated GM crops from further analysis because they cumulatively accounted for only a minor fraction of the 175 crops available to us from the MRF database. For the selected dominant GM crops, we tagged the most used active ingredients (glyphosate, glufosinate, 2,4-D, dicamba, isoxaflutole, and mesotrione) and the countries that allow both cultivation of pesticide-resistant GM crops and the use of those active ingredients (Table 4). While agronomic practices may differ for the specific ingredient used on any of the dominant GM crops, we assumed that the application rate of GM crop-specific active ingredients in the USA is the upper bound, while the APR on the corresponding non-GM crops in other countries was taken as 30% of the one in the USA. Use of the APR in the USA as the upper bound was justified by the fact that the USA does not apply restrictions or bans to the selected GM crop-specific active ingredients. In addition, countries that do not allow for GM crops but do not have a ban on GM crop-specific active ingredients can, by a matter of fact, use that active ingredient, but that likely occurs at lower application rates. For example, glyphosate-resistant GM corn in the EU is not allowed for feeding purpose but there is no ban on glyphosate, which can be used with no or minor restrictions; hence, the amount used was presumed to be substantially less than in countries where glyphosate-resistant GM corn is allowed. This approach is corroborated by a reanalysis of data in 49 showing a generally higher applied glyphosate mass on glyphosate-resistant cotton, corn and soyabean as compared to the corresponding traditional crops in the USA over the period 1998 to 2009. Finally, active ingredients used with GM crops were excluded if a ban was lifted in a specific country even if GM crops are permitted. These conditioning in our estimates were implemented in Fig. 1, step 13  www.nature.com/scientificdata www.nature.com/scientificdata/ Global conditioning against FAOSTAT pesticide records (Step 14). The last step to estimate the global pesticide application rates was conducted to correct biases introduced by the methods described above using the FAOSTAT pesticide database 1 , which reports the cumulative pesticides mass and the mass of pesticides grouped by herbicides, insecticides, and the lumped fungicides and bactericides applied country-wide from 1990 to 2016 (Fig. 1, step 14). To correct our raw estimates in 2015 from step 13, the mass in each country was calculated by integrating the median APR i j * ( , ) g for ingredient i on crop j in 2015. This country-specific cumulative mass, M c , was next compared to the values M c,FAO in the FAOSTAT and the ratio R c = M c,FAO /M c was calculated. The closer is R c to 1, the closer our estimate to the FAOSTAT data in a specific country. Several estimates were fairly close to the FAOSTAT data even before implementing this conditioning procedure, but a number of corrections were to be implemented. Hence, APR c in all countries other than the USA was conditioned to M c,FAO in 2015 as (Fig. 1, step  where F M = 0.842 is the correction factor introduced in Section "Selection of active ingredients samples" to account for the mass deficit relative to the total mass of pesticides in the USA in 2015, while the second term within square parenthesis relative to the USA was used as a limiting factor to prevent outliers of a particular ingredient when the correction to be implemented was substantial. This conditioning was not applied to the application rates in the USA because they were determined from the USGS/PNSN-MCR database and matched relatively well the FAOSTAT. During this conditioning, we tracked the number of countries for which an estimate of the total pesticide mass was available in the FAOSTAT database; it was noted that the FAOSTAT reports data for about 160 countries. Hence, the ratio R c for missing countries in 2015 was determined as the average R c of the neighbouring countries based on the assumption that geographic proximity is a measure of similarities in agricultural, environmental, and socio-economic variables 50 . Figure 3a shows the corrected high (H) and low (L) APR estimates for individual countries as compared to the FAOSTAT data in 2015, while Fig. 3b shows the resulting global APR projections (for the median) as compared to the FAOSTAT historical data from 1992 to 2016 calculated for all countries and the countries reported in the FAOSTAT database for comparison.
The conditioning implemented to APR in this step is the result of comparison with the total pesticide use in a country; however, the FAOSTAT also provides disaggregated data for herbicides, insecticides, and bactericides and fungicides, which were used for validation and data quality tracking described in "Technical Validation".  from Eq. (5). These global estimates ultimately describe the subnational distribution of the annual application rate of the 20 most used pesticides active ingredients on 6 dominant and 4 aggregated crops from 2015 to 2025. The PEST-CHEMGRIDS release also includes gridded maps of the quality index QI describing the reliability of estimates in each grid cell for all ingredients and crops, which is described in "Technical Validation". Examples of maps for estimates relative to the most used active ingredients on corn (glyphosate) and its corresponding quality index QI map are provided in Fig. 4. The full PEST-CHEMGRIDS data set is accessible in 25,26 , while the full list of crop-specific active ingredients maps is provided in Table 5.

Projections of global estimates (Step 14).
A global outlook. We summarise the estimated mass and application rates of 50 of the 95 selected active ingredients used globally (Fig. 5). With reference to the mass, the most used herbicides resulted to be glyphosate and metam potassium (about 700,000 tonnes per year), metam and dichloropropene (about 450,000 tonnes per year) and 2,4-D (about 150,000 tonnes per year). The most used insecticides are metam potassium and metam, calcium polysulfide (about 50,000 tonnes per year) and chlorpyrifos (about 20,000 tonnes per year). Finally, the most used fungicides are metam potassium, petroleum oil (about 150,000 tonnes per year), and chlorothalonil (about 120,000 tonnes per year).

Data Records
The PEST-CHEMGRIDS data release has a global extent with bounding box 180°E-180°W; 56°S-84°N at a resolution of 5 arc-min by 5 arc-min (about 10 km by 10 km at the equator) in standard WGS84 coordinates, corresponding to matrices of 1681 (S-N) by 4306 (E-W) pixels. PEST-CHEMGRIDS stored in 25,26 is organized in three compressed folders, each collecting files of the same maps in a different format, namely Portable Network Graphics (.PNG), Tagged Image File Format (.TIFF/.TIF), and NetCDF4 (.NC) to facilitate distribution and usability. The full list of maps is provided in Table 5 with the corresponding active ingredient and crop class, and with some details on formats. Intermediate data such as those prior to estimate conditioning as per workflow in Fig. 1 are available upon request. All source data are detailed in Table 1 and are accessible from the original repositories.

Technical Validation
The technical validation of the PEST-CHEMGRIDS data release is structured into three levels: (i) benchmarking of source data, (ii) estimates conditioning and validation against independent data, and (iii) data quality tracking across key implementation steps and data reliability calculations. The validation levels are described in detail below.

Fig. 4
Examples of global gridded application rate and quality index maps. The top two panels show the high (HIGH) estimate in 2015 for the annual application rate of glyphosate on corn globally gridded and the corresponding quality index QI map. Panes in the second row show regional application rates.
www.nature.com/scientificdata www.nature.com/scientificdata/ Source data benchmarking. The USGS/PNSP data were reanalysed via step 1 to 2 to retrieve statistical information of the top 20 crop-specific active ingredients and to detect patterns that could bias our estimates. Selected active ingredients ranked by the mass applied in 2015 (Fig. 6) show that they represented the greatest fraction (up to about 84.2%) of the total applied in the USA, but also show that these have had less relevance in previous years such as in the evident case of corn and soybean. For other crops, the most used 20 ingredients in 2015 retained a relevant presence also in previous years such as in the case of rice and alfalfa. Hence, some patterns in pesticide use have undergone substantial changes over the past 25 years; this justifies the choice to calculate APR projections based on the most recent pesticide use distributions, bearing in mind that this may be subject to changes over the 2015-2025 period.
All and the top 20 crop-specific active ingredients in the USA were classified in step 2 according to the EU28 pesticide classification database 27 . Note that the 20 crop-specific ingredients cumulatively resulted in 95 most used ingredients across all dominant and aggregated crops. These were benchmarked against the FAOSTAT pesticide database (Fig. 1,B1). That is, the class-specific cumulative "high" and "low" USGS/PNSP estimates matched the FAOSTAT data well with only a minor overestimate for all pesticides (Fig. 7a) and pesticides classes (Fig. 7b-d). "Insecticides" and "fungicides and bactericides" were likely overestimated (blue lines) because we included seed treatments, which were not included in the FAOSTAT database. The similar masses in the last two pesticides classes may be due to the classification counting. For example, mass of active ingredients falling in multiple classes were counted multiple times and divided by the number of belonging classes. Overall, the mismatch in the "insecticides" and in the "bactericides and fungicides" classes was more than one order of magnitude smaller than the total pesticide mass and, hence, was considered to introduce only minor errors. These errors were not corrected because source data are from two independent authorities. www.nature.com/scientificdata/ in better fit to historical data of individual active ingredients on specific crops but projections suffered from unrealistic, steep changes in a few years ahead of 2015; upon appreciation that first-order polynomials captured relatively well the historical trends over 25 years in the USA for most ingredients and crops (see dashed lines in Fig. 2 and 25,26 for comparisons with higher-order polynomials) and with partial consideration of the actual goodness-of-fit of higher-order polynomials, first-order polynomials were used for projections. Note that while the projection of one active ingredient on one crop was the result of only one linear approximation, the projected global use of pesticides or pesticide classes combine multiple linear approximations and can result in segmented (piecewise) trends.

Validation of historical trends and projection prefactors.
Validation of spatial inference methods. We tested monovariate and multivariate polynomials ranging from order 1 to 3 and from order 1 to 2, respectively, for spatial inference of APR values from the USA to globally. We used a linear combination of monovariate polynomials (function polyfit in Matlab) with weights as described in Section "Global spatial inference of application rates", that is, weights were proportional to the "natural" correlation between application rates and individual independent variables. We tested multivariate polynomials with and without interactions (function fitlm in Matlab), that is, including and excluding combinatorial products of independent variables, respectively, but we did not use linear combinations because this method only returns one polynomial. Spatial inference tests were conducted on gridded maps of applied atrazine on corn at sub-county level within the USA in year 1997 available from 22 , which includes 35,000 grid cells. Bin averages were first calculated to reduce data dispersion; we then divided the resulting points for the application rates and the 20 environmental variables into randomized calibration and validation sets. We tested different calibration and validation set sizes. We then applied the above methods and we calculated the goodness-of-fit (R and NRMSE) on the validation sets. Excluding the pure quadratic multivariate polynomial fitting (Table 6), all other methods were nearly equivalent in terms of goodness-of-fit. Focusing only on the 0.07 fraction of points for calibration (this is the case most similar to our global spatial inference), the rank in the right most column of Table 6 ranged between about 30 and 40 (minimum is 0 for best, above 100 is poor). Although the weighted linear combination of monovariate second-order polynomials performed relatively well, we ultimately used monovariate first-order polynomials after considering that: (i) high-order polynomials are known to produce better fit on tests points and introduce distortions to extrapolation points far from the calibration points, which could be the case in a number of combinations of active ingredients and crops in PEST-CHEMGRIDS beyond these tests; and (ii) the chosen method is simple to implement even in computational environment different than the one we used (Mathworks Matlab) by following Eqs (3 and 4). The full list of polynomials coefficients and quality of regression is provided in 25,26 . Validation of conditioned estimates against FAOSTAT aggregated pesticide classes. Statistical inference first (Fig. 1, step 10) and the following conditioning of global estimates to country-specific governances, biotechnologies, and records of total pesticide application mass (Fig. 1, steps 12 to 14) do not ensure alone that the www.nature.com/scientificdata www.nature.com/scientificdata/ estimates are of enough high quality to be usable by third parties. The ultimate validation of estimates was conducted on independent data from the FAOSTAT pesticide database. As mentioned earlier, the FAOSTAT includes the country-specific cumulative mass of "herbicides", "insecticides", and lumped "bactericides and fungicides" from 1990 to 2016. These aggregated pesticide classes do not specify the exact active ingredients; yet, comparison with our selection of the top 20 most used crop-specific ingredients in the USA in 2015 (i.e., 84.2% of the total pesticide mass) showed that the ingredients were distributed nearly as in the FAOSTAT pesticide aggregated classes over time (Fig. 6). Validation, therefore, was conducted along pathway B2 in Fig. 1 by integrating the H and L APR www.nature.com/scientificdata/ estimates in all countries using the surface area of dominant and aggregated crops, and tracking the class of each active ingredient. The mass of all ingredients within the same class as in the FAOSTAT was therefore aggregated correspondingly. To calculate country integrals, we used the global country borders map available in 48 .
Our estimates of the herbicide mass used in each country available in the FAOSTAT in 2015 matched the FAOSTAT data well (Fig. 8a), with the country-specific average error of the median estimate relative to FAOSTAT being about 2.5. The relative error for a country was calculated as |X EST − X FAOSTAT |/X FOASTAT , where X EST is the estimated value for a country and X FAOSTAT is the FAOSTAT value for the same country, which was next averaged over the FAOSTAT countries to retrieve the average error. The global herbicide estimate was also close to the FAOSTAT data (Fig. 8b, red line). Note, however, that the number of countries in the FAOSTAT varied over the years and showed a sensitive decline since 2010 down to 70 countries in 2015, meaning that the FAOSTAT underestimates the actual total global herbicide mass. Our estimate for all countries (including the UN193 countries as of 2019) suggests that the global herbicide mass is nearly 3 times that of the FAOSTAT (Fig. 8b, green line).
We used a similar approach to validate our estimates against the country-specific and global insecticides mass in 2015. Relative to the countries of the FAOSTAT, the validation error of the median estimate was 0.07 averaged across the countries (Fig. 8c), with the total insecticide mass slightly underestimating that of the 67 countries in the FAOSTAT (Fig. 8d). Our global estimate over all countries suggests larger values but we could not conclude if these are in line with the FAOSTAT records because only 9 out of 67 countries are associated with data on insecticides used for seed treatment in 2015.
Finally, an analogous validation of country-specific and global "bactericides and fungicides" applications in 2015 shows a relative average error of 0.64 (Fig. 8e) and a slightly underestimated global mass, respectively (Fig. 8f). In contrast to the reported 59 countries in the FAOSTAT, we overall estimate a global mass about 3 times greater than in the FAOSTAT, but we recall that only 13 out of 71 countries were associated with data on "bactericides and fungicides" used for seed treatments in 2015.
Validations against independent pesticide databases by active ingredient. Along with the validation against FAOSTAT, we compared the PEST-CHEMGRIDS estimates against independent pesticide databases or literature reporting the mass of individual active ingredients applied in a specific country in a given year. We  Fig. 9 shows generally a good match between PEST-CHEMGRIDS and independent sources, though some estimates have discrepancies. The relative errors averaged over the active ingredients are 9.8 (UK), 0.98 (AU), 37.6 (SK) and 0.009 (SA). PEST-CHEMGRIDS overestimated the applied masses of 5 active ingredients in the United Kingdom by about an order of magnitude (e.g., 2,4-d, metam, dicamba, Fig. 9a) and underestimated the use of some active ingredients in Australia (e.g., trifluralin, simazine, fluometuron, Fig. 9b). Specific to South Korea, PEST-CHEMGRIDS matched relatively well the masses reported in 53 but underestimated alachlor, terbufos, and trifluralin (Fig. 9c,d). We conclude that variability against independent records is present but overall trends are well captured.
Validation of conditioned estimates against manufacturers' recommendations. In addition to the above validations, we conducted a separate quality control on estimates by identifying values of application rates for active ingredients that were particularly high as compared to other ingredients, and we compared our estimates to those recommended by the manufacturers or regulatory bodies. For example, values of the fumigant www.nature.com/scientificdata www.nature.com/scientificdata/ metam potassium up to about 160 kg/ha-crop were found particularly high in the crop class "VegFru" but are in the range of or lower than the recommended 40 to 360 kg/ha-crop in the US EPA 55 . Similarly, chloropicrin in the "VegFru" crop class was estimated up to about 22 kg/ha-crop, which meets recommendations or is below the maximum application rates of about 350 kg/ha-crop 56 . Dichloropropene applications up to about 40 kg/ha in the "VegFru" class meet the recommended maximum application rates of about 370 kg/ha 57 , while chlorothalonil estimates below 7 kg/ha-crop in "VegFru" are substantially lower than the maximum application rate of 1400 kg/ ha-crop reported in the product factsheet. In contrast, calcium polysulfide (lime sulphur) estimates up to about 25 kg/ha-crop in "OrcGra" crop class is overestimated as compared to recommended 1 to 1.2 kg/ha-crop 58 . Data quality tracking and gridded quality index maps. Throughout the workflow depicted in Fig. 1, we have identified crucial steps to measure the quality of our estimates. Five specific quality factors QF were therefore defined to cover three levels of specific data quality metrics.
QF SA (Fig. 1 steps 1 to 6) quantifies the quality of the aggregation matrix for the surface area of dominant and aggregated crops used for the raw APR estimates. QF SA was defined by a vector of 1 s for the dominant crops used in the USGS/PNSP database that have equivalent representation in the MRF database used for our estimates, that is, for corn, soyabean, wheat, cotton, rice and alfalfa. For all aggregated crops, QF SA is calculated from the average fraction of matched crops within the two USGS/PNSP and the MRF databases. For example, the vegetables and fruits aggregated crop in the USGS/PNAS database (VegFru , Table 1) consists of a pool of 58 crops with 19 unmatched crops, hence the matched fraction is 0.673; correspondingly, the crops pool used in PEST-CHEMGRIDS consists of 12 partially matched crops out of 58 crops, hence the matched fraction is 0.793. The resulting quality factor QF SA = (0.673 + 0.793)/2 = 0.733 is used for the VegFru aggregated crop. A similar procedure is used for aggregated orchards and grapes (OrcGra), and other crops (Other). For aggregated pastures and hays (PasHay), QF SA is not defined as detailed earlier in section "Application rate in the USA", hence this factor is not further accounted for on this crop class.
QF HT (Fig. 1 steps 1 to 3) quantifies the quality of regression on historical trends of the USGS/PNSP database in the USA and is therefore a specific measure of our estimates as a function of active ingredients, crops, and environmental variables used for global scale inference. QF HT is defined as HT k X k where the relative number of existing points n(i,j) and the correlation strength R x (i,j) of APR of active ingredient i on crop j against the environmental variable X in the USGS/PNSP are defined in Eq. (4). QF HT is a matrix of scalars between 0 and 1 and complements QF SA .  www.nature.com/scientificdata/ Designing systematic criteria to quantify goodness of estimates of individual ingredients beyond the validation tests proposed here is generally difficult; first, aggregated crops used in PEST-CHEMGRIDS integrate a number of individual crops types that generally undergo a different number of applications and, second, the different crops can receive different application rates per treatment. In contrast, PEST-CHEMGRIDS provides the integral annual application rates in this first release. Future improvements may include constraints on individual ingredients such as number of applications and rates per treatment in order to improve estimate quality.

Order of polynomials
We have conditioned our estimates based on near-current (2015) agricultural practices, but we have not included an accounting of special farming practices such as organic, biodynamic and similar. Recent studies have brought evidence that an increasing number of farmers in high-income countries are progressively transitioning to agriculture with limited use of synthetic agrochemicals. It is therefore possible that some active ingredients will see a decline in the coming decade in some regions.
There are a number of other factors that we did not include in the generation of projections such as potential climatic shifts, changes in population habits and attitudes to foods, and changes in diets, which are not expanded here. However, all of our estimates provide an expected range in application rates (denoted by high "HIGH" and low "LOW" rates) and we assumed that aleatory fluctuations in application rates can be contained within those ranges.

Usage Notes
The PEST-CHEMGRIDS data release is the first of its kind and we plan to continue its maintenance as well as expand on active ingredients, increase resolution and metadata, and validations. For the purpose of data reuse, we distribute a basic customizable script written in Matlab 2018a that helps the user to read and convert data in other preferred formats. However, data can be read in any computational environment that is compatible for .TIFF/. TIF and .NC formats of georeferenced maps. Software for further processing PEST-CHEMGRIDS maps includes licenced Mathworks Matlab and Arcgis and freeware software such as QGIS. We are willing to provide guidance with software that we are familiar with upon reasonable request. For questions, collaborations or suggestions, please contact the corresponding author.

Code Availability
All data elaborated from original sources and newly produced in this work were the result of custom-built codes written in Mathworks Matlab2018a on Windows PC. These consist of several independent and dependent scripts and functions to read and reorganize variables, perform calculations, and save intermediate and final data. Custom scripts were also developed to generate .PNG images, georeferenced .TIF maps, and georeferenced NetCDF4 .NC maps of both application rates and data quality. Due to the complexity and size of source data, which use several storage formats and occupy approximately 43 GB, scripts are not directly distributed but are available along with all source data upon request. The only custom code that is distributed with PEST-CHEMGRIDS is the script written in Matlab2018a to read any of the two data storage formats and redraw figures of application rates and their quality indices.