Background & Summary

Driven by the growing global demand for raw materials1, mineral extraction has expanded particularly into biodiversity-rich ecosystems in the past two decades2, and demand is projected to increase further3,4. Mining can cause a wide range of adverse impacts during operation and after closure, e.g. fragmenting the landscape and polluting soils and water, with effects on human settlements, agricultural plantations, and natural ecosystems5. Mapping the global mining areas is increasingly important for quantifying pressures of mineral extraction on biodiversity6,7,8,9, land-use modelling10, estimating the impacts of global supply chains and sustainable resource use11,12,13, assessing the risks of major environmental disasters in mining areas14,15, and planning and reinforcing mine reclamation16.

The increasing availability of high-resolution Earth observation data and new machine learning approaches has allowed mapping and monitoring of mining land use and its related environmental impacts on a local or regional scale17,18. However, automatically mapping mining areas on a global scale is challenging because they are composed of a set of heterogeneous land cover types17. Mining areas are used for various purposes, including the mine itself (e.g. open cuts where the minerals are extracted), waste dumps (e.g. tailings dams, waste rock piles), water ponds, and industrial processing facilities. Additionally, different minerals (e.g. coal, copper, or gold), extraction and processing methods, and landscape characteristics also increase intraclass variability, challenging automated mapping approaches using Earth observation data on a large scale.

Visual interpretation of high-resolution satellite images has been used as an alternative, producing three global mining land use datasets. The first dataset mapped 295 major mine sites worldwide, adding up to a total area of 3,633 km2 (ref. 19). The second mapped a total area of 31,396 km2, including active and inactive mining sites20, and the third, described in our previous article21, covered 6,201 active mining sites that add up to 57,277 km2. These three datasets are not comparable because they were derived using different satellite data sources acquired at different times and with distinct spatial resolutions. In addition, each dataset covered a different subset of mining locations, which can lead to underestimating the global mining land use because subnational mining activities are usually underreported compared to national accounts2,22.

Here we present a new dataset that improves global mining land use accounting by significantly expanding our previous global-scale dataset of mining sites21,23. The data update includes 44,929 polygon features covering 101,583 km2 of large-scale mining (LSM) as well as artisanal and small-scale mining (ASM). We followed a similar methodology based on visual interpretation to map all 34,820 mining coordinates reported in the SNL Metals & Mining database24. This is a substantial expansion compared to the first version, which covered only 6,201 coordinates of mines reported as active in the SNL database. As in the previous version, we mapped all land cover types related to mining without distinguishing them within the polygons. Although significantly expanded, our dataset still does not cover all existing mines worldwide, as we only inspected areas within a 10 km buffer around the coordinates from SNL24. However, to date, our updated dataset provides the most comprehensive information on global mining land use, including openly available georeferenced mining locations.

Methods

Version 2 of the global-scale mining area dataset builds on the polygons from the first data release23 and follows a similar methodology. We updated the areas in the first version using satellite images from 2019 and added new areas not included in the previous version. We inspected all 34,820 coordinates reported in the SNL database, substantially expanding the coverage compared to Version 1, which covered only 6,201 coordinates of mines reported with the status “active” or having any reported production between 2000 and 2017 by SNL21. We inspected all SNL coordinates in the second version because several SNL locations with “inactive” status and no reported production have clear ongoing mining activities visible in satellite images. Therefore, inspecting all SNL coordinates independently from their reported status was critical to provide a more comprehensive overview of the global mining land use. This data update also improved the coverage of ASM areas, which were almost absent from the first version because most ASM activities do not report production or activity in the SNL database, although their approximate coordinates are reported.

Study area

To make the visual interpretation of images viable on a global scale, we limited the area of inspection to a 10 km buffer around the coordinates in the SNL database. Based on our previous experience21, this buffer size is sufficient to cover large mining sites expanding over several kilometres and also takes into account the imprecision in the SNL coordinates that can be up to 3 km distant from the actual mining sites7,8. We mapped all mines identified inside or intersecting the buffers’ borders, including areas that start inside the buffer and extend beyond its limits. This protocol was adopted to make sure mines that extend over long distances would be well captured, e.g. ASM mining following deposits on rivers and streams.

Mining areas

We defined mining areas as all land used by the mining sector at any step in extraction and processing at the mining site. Our mining areas also cover all 111 different commodities reported in the SNL database, including primary and companion commodities (see the complete list of commodities in Table 1). This definition includes different ground features, such as open cuts, tailings dams, waste rock dumps, water ponds, processing plants, and other infrastructure used in LSM and ASM activities. We mapped all underground and above-ground mining infrastructure visible on the satellite images. We did not distinguish between the different infrastructure types, i.e. we aggregated them into a single mining land-use class that includes all the above-mentioned ground features. Following this approach, we produced a global dataset with the georeferenced extent of mining land use that can be used as a starting point to distinguish LSM and ASM and their different infrastructure types in future work.

Table 1 List of all commodities reported in the SNL database.

Delineation of mining areas

The new version of the dataset significantly improved temporal consistency. In the previous version, we used Google Earth imagery, Microsoft Bing Imagery and Sentinel-2 cloudless25. However, Google Satellite and Microsoft Bing Imagery provide heterogeneous spatial resolution across the globe, and in many areas, their images are outdated by several years26. For the update, we delineated all areas using the 2019 Sentinel-2 cloudless mosaic, which provides homogeneous 10 m spatial resolution and a well-defined time frame for the entire globe25. We only consulted Google Earth and Microsoft Bing for additional information in case of doubt about a ground feature but did not use these images to delineate the mines.

All three satellite data sources were visually inspected using our open-source web application27 developed for this specific purpose. The web interface systematically displays buffers and markers with information about the mines, which were used to limit the study area and to provide additional information about mining types and commodities. After visually inspecting all satellite data sources, the interpreter delineated the mining areas using Sentinel-2 cloudless25 as the background layer. Note that we did not map mining features in regions where the quality of the images did not allow proper interpretation. However, only a few of the inspected locations were unclear because the Sentinel-2 cloudless layer by EOX mosaics all acquisitions from one year to produce yearly composites with significantly reduced cloud cover and atmospheric interference25.

The mining polygons can also contain isolated patches of forest or other land cover that are not necessarily related to mining activity. We included these isolated patches in the mining polygons because they usually have no other use and have a reduced ecological function, as landscape fragmentation reduces the ability of the ecosystem to provide ecosystem services28.

It is important to note that we could not keep the relation between the SNL coordinates and the delineated polygons. In most cases, SNL provides several coordinates clustered around a number of mining ground features identified in the satellite images. However, the information from satellite images is not sufficient to link these features with the SNL coordinates without additional fieldwork. Besides that, some mines displace waste dumps and other infrastructure several kilometres from the main mining site, making it difficult to confidently link them to the coordinates using only information from satellite images. Therefore, our methodology uses the SNL coordinates only to gather information on the locations where mining might occur, but our final data product does not include information or links to the SNL database such as coordinates, commodities or production volumes.

Geoprocessing of data records

The delineated mining areas produced a raw data collection of polygons, which were checked and corrected by geoprocessing operations in R using the packages sf29 and s230. We removed the double-counting of mining areas by uniting overlapping polygons and corrected all invalid geometries, for example, those with crossing edges accidentally created during the digitisation of the polygons. After that, we removed sliver polygons (unwanted small polygons) and polygons with persistent invalid geometries, finally producing a consistent set of simple feature polygons29.

We then calculated the area of each feature and added information on the country in which each polygon is located. We calculated the area in square kilometres using spherical geometry30. After that, a spatial join query acquired country names and ISO 3166-1 alpha-3 codes from the geometries of the countries’ administrative units available from EUROSTAT31. The final set of polygons thus includes the geometries (polygons) covering the mining areas, their respective areas in square kilometres, country name, and ISO 3166-1 alpha-3 code of the corresponding country.
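To illustrate what computing an area "using spherical geometry" involves, the following minimal sketch approximates the area of a longitude/latitude ring on a sphere. It is not the s2 implementation used for the dataset: the function name `ring_area_km2`, the mean Earth radius, and the line-integral approximation are assumptions made for illustration, and the sketch does not handle rings that cross the antimeridian or enclose a pole.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius, not taken from the paper


def ring_area_km2(ring):
    """Approximate area (km2) of a closed ring on a sphere.

    `ring` is a list of (lon, lat) pairs in degrees, without a repeated
    closing vertex. Uses the line-integral approximation
    A = R^2 / 2 * |sum((lon[i+1] - lon[i-1]) * sin(lat[i]))|.
    """
    n = len(ring)
    total = 0.0
    for i in range(n):
        lon_prev = math.radians(ring[i - 1][0])
        lon_next = math.radians(ring[(i + 1) % n][0])
        lat = math.radians(ring[i][1])
        total += (lon_next - lon_prev) * math.sin(lat)
    return abs(total) * EARTH_RADIUS_KM ** 2 / 2.0


# Hypothetical test ring: a 1 x 1 degree box touching the equator.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
area = ring_area_km2(square)
```

For real analyses, a library that implements geodesic edges (such as s2 via sf in R) is preferable to this approximation.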

Similarly to Version 1, we also derived global grid datasets with the mining area at 30 arcsecond, 5 arcminute and 30 arcminute spatial resolution (approximately 1 × 1 km, 10 × 10 km and 50 × 50 km at the equator). This is useful as many modelling applications require regular grid data32. The 30 arcsecond grid was derived from the percentage of the polygons’ area intersecting each cell. The percentages were rounded to zero decimal digits to reduce the size of the dataset. Therefore, the percentage of mining area covering a cell should be greater than 0.5% to be considered, i.e., approximately 0.5 ha at the equator. To obtain the gridded mining area, we estimated the area of each cell in square kilometres and multiplied it with the percentage of mining cover per cell, resulting in a 30 arcsecond global grid indicating the mining area within each cell. The other two grid levels, 5 arcminute and 30 arcminute, were resampled from the 30 arcsecond grid. The scripts used in the geoprocessing of data records are available with our open-source web application tool27.
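The gridding step described above can be sketched as follows. This is an illustrative reading of the text, not the authors' script: the function names, the spherical cell-area formula, and the Earth radius are assumptions. The cover percentage is rounded to zero decimal digits before being multiplied by the cell area, so cells with a cover of 0.5% or less contribute no mining area.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def cell_area_km2(lat_south, lat_north, dlon_deg):
    """Area (km2) of a grid cell bounded by two parallels and two meridians."""
    dlon = math.radians(dlon_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * dlon * band


def gridded_mining_area_km2(cover_percent, lat_south, lat_north, dlon_deg):
    """Mining area (km2) in a cell from its percentage of mining cover."""
    # round() uses round-half-to-even, so exactly 0.5% becomes 0,
    # consistent with "greater than 0.5% to be considered".
    whole_percent = round(cover_percent)
    return whole_percent / 100.0 * cell_area_km2(lat_south, lat_north, dlon_deg)


RES = 1.0 / 120.0  # 30 arcseconds in degrees
equator_cell = cell_area_km2(0.0, RES, RES)  # slightly under 1 km2
dropped = gridded_mining_area_km2(0.4, 0.0, RES, RES)  # rounds to 0%
kept = gridded_mining_area_km2(12.3, 0.0, RES, RES)  # rounds to 12%
```

Under these assumptions, a 30 arcsecond cell at the equator covers about 0.86 km2, so the 0.5% threshold corresponds to roughly 0.4 ha, in line with the approximate 0.5 ha stated in the text.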

Data Records

The new dataset consists of 44,929 polygon features covering 101,583 km2 of mining areas worldwide33. It more than doubles the number of polygons compared to Version 1 (21,060 polygons) and nearly doubles the mapped area, previously 57,277 km2 (ref. 21). The number of countries covered also increased from 121 to 145. Besides the polygons, grid data provides a ready-to-use dataset for modelling with the mining area in square kilometres per grid cell provided at 30 arcsecond, 5 arcminute, and 30 arcminute spatial resolution. All data records were deposited to PANGAEA (Data Publisher for Earth & Environmental Science) and are available from https://doi.org/10.1594/PANGAEA.942325. The data is also available for visualisation from our platform www.fineprint.global/viewer. In what follows, we present a few examples to illustrate the data and provide an overview of the global mining land use compared to the first version of the data.

Examples of mapped areas

The maps in Fig. 1 show examples of LSM and ASM. The map in the top right of Fig. 1 illustrates the spatial pattern of ASM gold mining in the Brazilian Amazon. In this region, mining activities can spread over hundreds of kilometres, usually following water streams34. The same spatial pattern can be found in other areas worldwide, such as in Ghana35. In the bottom right of Fig. 1 we illustrate LSM areas with an example of the Toquepala copper mine in Peru. We invite the reader to explore other regions in our web platform at www.fineprint.global/viewer.

Fig. 1

Mapped small- and large-scale mining in South America. (a) Small-scale gold mining in the Brazilian Amazon on both sides of the Tapajós River in the Brazilian state of Pará. (b) Toquepala copper mine in Tacna Province, Peru.

Global mining land use

Figure 2 shows the geographical distribution of the mining area across the globe. The map uses the equal-area Interrupted Goode Homolosine projection, and the mining areas are resampled to a 50 × 50 km grid to facilitate visualisation. Except for Antarctica, mining spreads across all continents with some hot-spot regions, for example, in northern Chile mainly due to copper extraction, northeastern Australia and East Kalimantan in Indonesia because of coal mining, and in the Amazon rain forest primarily due to small-scale gold mining.

Fig. 2

Global overview of mining areas mapped in Version 2 aggregated to 50 × 50 km grid cells and projected to Interrupted Goode Homolosine. The maps at the bottom are zoomed to South America (left), and Australia and parts of South-East Asia (right).

A summary of our data aggregated by country shows that 52% of the mapped mining area is concentrated in only six countries: Russia, China, Australia, the United States, Indonesia, and Brazil. Another 21 countries account for 39%, and the remaining 118 countries add up to only 9% of the total mapped mining area (see Fig. 3). These results show that mining areas are highly concentrated in only a few countries.

Fig. 3

Mining land use per country in square kilometres. The dashed bars indicate the areas mapped in Version 1 of the dataset.

Compared to the area mapped in Version 1 of the dataset23 (dashed bars in Fig. 3), we see that the ranking of countries has changed. Russia, for instance, held the fourth position in the first version, but is the country with the largest mining land use in Version 2. The large difference is due to the substantial increase in the number of regions visually inspected, including the buffer around all coordinates reported in the SNL database independently of their activity status or reported production. This allowed us to identify ongoing mining activities from the satellite images in many regions with no reported production and to significantly improve the coverage of global mining land use. The substantially larger area mapped in Version 2 (nearly double the area mapped in Version 1) also indicates that mineral extraction amounts are underreported in the SNL database. This can have implications for studies that rely on SNL’s production data and calls for more transparency on the quantities of material extracted in mines worldwide.

Figure 4 highlights the spatial distribution of the difference in the area mapped in Version 2 compared to Version 1 within a 50 × 50 km grid. In most grid cells, the mapped area increased by between three and five square kilometres. In some regions, the mapped mining area decreased from Version 1 to Version 2. However, this decrease was not caused by abandoned mine sites or rehabilitation; it is an artefact of the more accurate delineation of the borders of the polygons in Version 2. In the map, we can also note a few hotspots with a substantial increase in the mining area, e.g. Brazil, Guyana, Suriname, Ghana, and Indonesia, mostly due to the better coverage of ASM along rivers and streams in Version 2.

Fig. 4

Global overview of additional mining area mapped in Version 2 compared to Version 1, aggregated to 50 × 50 km grid cells and projected to Interrupted Goode Homolosine.

Table 2 presents a summary of the area and number of polygons per country, illustrating different profiles of countries regarding the spatial distribution of the mines. For example, Russia and China have comparable total mapped mining areas, 11,770.93 km2 and 10,364.57 km2, respectively. However, the number of identified polygons in China was significantly higher than in Russia, 8,795 against 2,825. This indicates structural differences in the mining sectors, i.e. a larger number of mining areas of smaller size in China compared to Russia, highlighting the known presence of a small-scale mining industry in China36,37.

Table 2 Mining area in km2 and the number of polygons (n) mapped per country. The countries are indicated by their respective ISO 3166-1 alpha-3 code.

Technical Validation

The mapping work was performed by trained interpreters exclusively using satellite images. Most mining areas are identifiable in the satellite images to the human eye. However, some areas can be challenging to interpret, creating a source of commission errors (no-mine areas mapped as mines) and omission errors (mine areas not mapped as mines). Besides that, the borders of the mines are not always evident in the images, creating another source of uncertainty.

We performed an independent classification of random points to assess these mapping errors. We followed the best practices on map accuracy assessment and sample design for overall accuracy, user’s accuracy (or commission error), and producer’s accuracy (or omission error)38. We drew a set of 1,220 random points stratified between the area mapped as mine and the area not mapped as mine (no-mine) within the region of interest (10 km buffer from the geographical coordinates). These validation points were inspected independently by experts who did not participate in the delineation of the mines. They classified these validation points as mine or no-mine based on the satellite data without information on whether the points were mapped as part of a mining area. The validation points are also available from the data record33.

Based on these control points, we provide a range of assessment metrics. The overall accuracy shows that 88.3% of the control points were correctly classified, and the high F1 score of 0.87 indicates a low penalisation for false negatives39. The Kappa index was 0.77 and the Matthews correlation coefficient (MCC) was 0.78. Kappa and MCC range from −1 to 1 (ref. 40): negative values imply that the agreement is worse than random, 0 is the expected value for a random classification, and 1 represents complete agreement. Our dataset also had an 89.7% probability of correctly distinguishing mining from non-mining areas according to the area under the curve (AUC) of the Receiver Operating Characteristic (ROC) curve41. We also derived the user’s and producer’s accuracy along with the error matrix (see Table 3) as recommended in map accuracy assessment38,42. The user’s accuracy tells how well the classes in the map represent the reality on the ground, while the producer’s accuracy points to how well a class has been mapped38. Our map reached a 78.9% producer’s accuracy, indicating that we missed some mining areas (the omission of mines was around 21.2% in our validation samples). However, the mapped mining areas had 97.2% user’s accuracy, i.e. the mapped mining areas have a high probability of being correctly mapped as mining (less than 3% incorrectly mapped as mining).
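As a reminder of how these metrics relate to a two-class error matrix, the sketch below computes them from raw counts using the standard textbook definitions. The counts are hypothetical and chosen only for illustration; they are not the values in Table 3, and the function name `accuracy_metrics` is an assumption. The sketch also does not reproduce any weighting that a stratified sample design may require.

```python
import math


def accuracy_metrics(tp, fp, fn, tn):
    """Accuracy metrics for the class "mine" from a 2 x 2 error matrix.

    tp: mapped mine, reference mine      fp: mapped mine, reference no-mine
    fn: mapped no-mine, reference mine   tn: mapped no-mine, reference no-mine
    """
    n = tp + fp + fn + tn
    overall = (tp + tn) / n
    users = tp / (tp + fp)      # 1 - commission error
    producers = tp / (tp + fn)  # 1 - omission error
    f1 = 2 * users * producers / (users + producers)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "overall": overall,
        "users": users,
        "producers": producers,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from Table 3.
metrics = accuracy_metrics(tp=90, fp=10, fn=20, tn=80)
```

With these hypothetical counts, the user's accuracy (0.90) exceeds the producer's accuracy (about 0.82), the same qualitative pattern as reported above: few commission errors, more omission errors.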

Table 3 Error matrix and accuracy statistics derived from 1,220 random points equally allocated between the mapped classes Mine and No-mine.

We also investigated whether proximity to the borders of the mines affected the accuracy. We found that 54.5% of the control points with disagreement are located less than 50 m from the borders of the delineated polygons. On the other hand, only 16% of the points with agreement are located closer than 50 m to the polygons’ borders. These results indicate that uncertainty is higher close to the borders of the mapped areas. Additionally, they indicate high confidence in the existence of mines within the mapped polygons.

Usage Notes

The global mining dataset described here is available from https://doi.org/10.1594/PANGAEA.942325 under the Creative Commons Attribution-ShareAlike 4.0 International (CC-BY-SA) license. The data records include the same resources as the previous data release23: the mining polygons, validation points, mining area grid, and a summary of the mining area per country.

  1. The mining polygons and validation points are encoded in GeoPackage geographic data structures43, such that:

    a. the mining_polygons layer has five attributes:

      • ISO3_CODE: A string with the country’s ISO 3166-1 alpha-3 code
      • COUNTRY_NAME: A string with the country name in English
      • AREA: A number with the area of the feature in square kilometres
      • geom: A polygon geometry in geographical coordinates WGS84
      • fid: An integer with feature ID

    b. the validation_points layer has four attributes:

      • MAPPED: A string with the class derived from the mining polygons (“mine” or “no-mine”)
      • REFERENCE: A string with the validation class (“mine” or “no-mine”)
      • geom: A point geometry in geographical coordinates WGS84
      • fid: An integer with feature ID

  2. The mining grids include a single layer each (one band raster) encoded in Geographic Tagged Image File Format (GeoTIFF)44. Each grid cell over land has a float number (data type Float32) greater than or equal to zero representing the mining area in square kilometres; grid cells over water have no-data values. The grid is available in three spatial resolutions, 30 arcsecond, 5 arcminute, and 30 arcminute, extending from the longitude −180 to 180 degrees and from the latitude −90 to 90 degrees in the geographical reference system WGS84.

  3. The summary of the mapped mining area per country derived from the mining polygons is available in Comma-separated values (CSV)45 format, including four attributes:

    • COUNTRY_NAME: A string with the country name in English
    • ISO3_CODE: A string with the country ISO3 code
    • AREA: A number with the area of the feature in square kilometres
    • N_FEATURES: An integer with the number of features per country

The datasets can easily be overlaid with other geospatial variables for further spatial analysis using software with Geographic Information System (GIS) support (e.g. QGIS46, R47, and Python48). In addition, we provide a tool for visual analysis of the geographical data records at www.fineprint.global/viewer and a Web Map Service (WMS)49 accessible from www.fineprint.global/geoserver/wms.
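As a minimal usage sketch, the per-country CSV summary can be read with the Python standard library to compute how concentrated the mapped area is. The column names follow the attribute list above. The function name `top_share` is hypothetical, and the two-row sample is only an excerpt built from the figures for Russia and China quoted in the text, with the country names spelled as assumed here, so it stands in for the real file.

```python
import csv
import io


def top_share(csv_file, n):
    """Share of the total mapped area held by the n countries with the largest area."""
    rows = list(csv.DictReader(csv_file))
    areas = sorted((float(row["AREA"]) for row in rows), reverse=True)
    return sum(areas[:n]) / sum(areas)


# Two-row stand-in for the country summary file (areas in km2).
sample = io.StringIO(
    "COUNTRY_NAME,ISO3_CODE,AREA,N_FEATURES\n"
    "Russia,RUS,11770.93,2825\n"
    "China,CHN,10364.57,8795\n"
)
share = top_share(sample, 1)
```

Applied to the full summary file with `n=6`, the same function would return the share of the six countries with the largest mapped mining area.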