Is Amazon deforestation decreasing the number of thunderstorms over South America?

Lightning activity has been predicted to increase with global warming, though estimates of lightning sensitivity to a change of temperature vary widely. Since lightning is a small‐scale process, it must be represented by parametrizations in climate models. This article uses large‐scale meteorological parameters tied to thunderstorm generation to improve existing empirical models that simulate regional thunderstorm behaviour. The response of the number of thunderstorms (as presented here) to climate change is rarely analysed in global studies. This study focuses on Tropical America, and uses the ERA5 higher‐resolution reanalysis data (ERA5) to develop our empirical model. Thunderstorm data were taken from the World Wide Lightning Location Network (WWLLN) and processed using the clustering algorithm developed by Mezuman et al. (2014). The two meteorological parameters that correlated best with thunderstorm clusters in Tropical America were specific humidity (SH) and convective available potential energy (CAPE). The resulting empirical model was run from 1979 to 2019 using ERA5 reanalysis data as input. This approach approximates the long‐term trends in the behaviour of thunderstorms in the regions, in the absence of a complete historical lightning record. To our surprise, Tropical American thunderstorms exhibited a negative trend over this period, with an 8% decrease in thunderstorm clusters since the 1980s even with a rise of 1 K in temperature over the same period. The regions of largest decreases in thunderstorm activity align well with estimates of deforestation. We estimate that for every 1 Tg C lost due to deforestation, there is a 10% decrease in thunderstorm number.

Africa's lightning trends were previously researched using similar methods to here (Price and Asfur, 2006;Harel and Price, 2020), the trends over the other chimneys have not yet been studied.
Lightning strikes are a worldwide cause of injury, death, damage to property and infrastructure, as well as forest fires. While many industries, such as aviation and power companies, are sensitive to this hazard (Price, 2013), rural populations are often the most vulnerable to the immediate dangers of direct strikes. In South Africa, annual lightning deaths are estimated at nine deaths per million people for rural populations, compared with 1.5 deaths per million in urban populations (Cooray et al., 2007). There have been troubling indications from local governments in Africa, South America and Southeast Asia that incidents of death and injury from lightning may be on the rise, under the influence of both population growth and climate change (Khadka, 2014;Spring, 2021).
Lightning activity is generally expected to rise in the coming decades as a result of surface warming (Price and Rind, 1994;Trapp et al., 2009;Seeley and Romps, 2015). However, these predictions suffer from a great deal of uncertainty. The small horizontal scale (∼25 km 2 ) of lightning generation presents a challenge for global models, because of their much larger grid spacing (Brooks, 2013). Predictions of flash rate density's sensitivity to a rise of 1 • C in surface temperature vary widely, falling between 5 and 100% (Romps et al., 2014). And recently there are several studies that even show a possible decrease in lightning as the climate warms (Jacobson and Streets, 2009;Clark et al., 2017;Finney et al., 2018Finney et al., , 2020. The degree to which lightning regimes are sensitive to surface temperature is a subject of debate. Questions include: Will the number of storms grow larger or smaller? Will storms that do form be more or less intense? How will different regions be affected? We point out that in this study we are not simulating lightning flashes, but thunderstorm clusters, and climate change may have a different impact on the number of thunderstorms compared with the number of lightning flashes (Williams et al., 2000).
Until relatively recently, there was no global system that monitored thunderstorms continuously. The study of these and other severe weather events relied on data from individual meteorological stations (for example, counting the days when thunder was heard), and local lightning networks (Brooks et al., 2003).
Given the lack of long-term global lightning data, we need to find proxies to estimate the long-term changes in lightning and thunderstorm activity in different parts of the world. To simulate the microphysical processes leading up to the production of lightning we need to simulate the charge build-up in storms through the inductive and non-inductive processes (MacGorman and Rust, 1998).
However, without cloud-resolving models we need to find the parameters in the climate models that are best related to lightning and thunderstorm activity. However, climate models supply only large-scale parameters in their simulations, with no cloud microphysical details. Hence, this study follows Harel and Price (2020), investigating the best large-scale meteorological parameters to use to build an empirical model of thunderstorms over Tropical America, in the interest of understanding how thunderstorm behaviour has changed over time (not lightning).
Water content and instability of the vertical temperature profile are prerequisites to the formation of convective clouds and thunderstorms. Water content can be represented directly by parameters such as relative humidity, specific humidity and precipitable water. Precipitable water refers to the water content of the entire atmospheric column, whereas relative and specific humidity refer to a particular layer. The humidity content of rising air parcels plays a significant role in buoyancy through the release of latent heat with condensation. It has been suggested that warming and humidification of the lower levels of the atmosphere enhance convective instability and may be tied to future thunderstorm severity (Trapp et al., 2007(Trapp et al., , 2009Seeley and Romps, 2015). Parameters such as wet-bulb temperature and wet-bulb potential temperature include information about both heat and humidity and have also been used to study lightning activity (Williams, 1992).
Lightning flash rate is often approximated using parametrizations based on cloud characteristics. The most common parametrization uses cloud top height as the cloud variable related to lightning frequency (Price and Rind, 1992;Michalon et al., 1999;Banerjee et al., 2014;Krause et al., 2014;Romps et al., 2014). The cloud base height (CBH) hypothesis  describes updraught regimes through the low-altitude bases of oceanic clouds (500 m) compared to continental clouds. Low CBH is indicative of smaller convective bubbles, narrower updraught zones and low flash rates. These clouds are very effective in producing precipitation. Clouds with high bases (2,600-3,000 m) are characteristic of strong continental convection and exhibit the highest flash rates . The disadvantage of using cloud and precipitation parameters as proxies for lightning is that convective clouds are also a subgrid-scale process, and they too are represented by parametrizations in climate models (Price, 2013).
Two commonly used variables that signal convective instability are Lifted Index (LI) and Convective Available Potential Energy (CAPE) (Harel and Price, 2020;Etten-Bohm et al., 2021). The two are related, and both are used by the National Oceanic and Atmospheric Administration (NOAA)'s Storm Prediction Center as indicators for severe thunderstorm favourable environments. Studies in the continental United States (CONUS) have used CAPE in combination with other variables such as wind shear to identify severe thunderstorm and cyclone favourable environments (Brooks et al., 2003;Seeley and Romps, 2015). Romps et al. (2014) used CAPE times precipitation as a parametrization. CAPE has been shown to be directly related to vertical velocity or maximum vertical velocity in the CONUS (Brooks, 2013). Though thunderstorm regimes in North America differ significantly from those in the tropics, equatorial Africa and the United States both suffer from some of the most severe thunderstorms on Earth (Brooks, 2013).
LI and CAPE are based on parcel theory, which presumes there is little or no heat and water exchange between the parcel and its environment. In reality, thunderstorms are characterized by turbulent mixing, and this description is quite inaccurate. However, parcel theory and the CAPE hypothesis specifically, predict significantly greater updraught speeds than those observed for all but the largest supercell storms (Williams, 2005;Williams et al., 2005). Though LI and CAPE differ in their units (temperature vs. energy), the most significant difference between them is their vertical structure. LI is the stability index based on a single pressure level in the atmosphere, while CAPE incorporates the instability over many pressure levels. Because of CAPE's 3rd dimension, Blanchard (1998) suggested a measure called "normalized CAPE", where CAPE is divided by the level of free convection (depth) giving units of acceleration. Blanchard found moderate correlations between CAPE and LI; however, normalized-CAPE's correlations showed a highly linear relationship with LI (Blanchard, 1998).
A statistical study conducted in South Africa by Gijben et al. (2017) employed the Unified Model to identify parameters related to lightning strokes and used them to build a lightning probability index. The study was conducted with numerical weather forecasting data, and the findings are highly relevant to the current work. The variables used to construct the index were temperature gradient, equivalent potential temperature, precipitable water, relative humidity, CAPE and lifted index. The variables were ranked in order of importance, and two seasons were studied. In both seasons temperature gradient was found to be most important. In December-February LI was more important than CAPE. In September-November CAPE was not found to be significant at all and was left out of the index (Gijben et al., 2017).
Another statistical study conducted by Etten-Bohm et al. (2021) used CAPE, normalized CAPE (nCAPE), lifting condensation level (LCL), column saturation fraction (r), 700 hPa omega, low-level wind shear (LS) from 900 to 700 hPa and deep wind shear (DS) from 900 to 300 hPa for parametrization of lightning. All environmental variables show a significant shift towards larger values when lightning is present except for shear. They also used geographical variables such as the difference between land and sea and the difference between tropics and subtropics and topography (Etten-Bohm et al., 2021). Gijben et al. (2017) and Etten-Bohm et al. (2021) did not rank the variables, but they did mention that the most important variables were CAPE and nCAPE, with little to no difference between the two (Etten-Bohm et al., 2021).
In the current work, several large-scale parameters used in the literature described above were tested against thunderstorm observations. The parameters include lifted index, CAPE, specific humidity, relative humidity, omega and temperature gradients. The overall goal is to develop an empirical model that can use ERA5 data to estimate the number and area of thunderstorms over South America, and particularly the Amazon basin. Using this model, we wish to investigate long-term trends in thunderstorm activity over this region.

ERA5 reanalysis
The European Centre for Medium-Range Weather Forecasts (ECMWF) produces the reanalysis (RA) product, of which the fifth version was used in this study (ERA5). Reanalysis products combine global models with historical data to force the models to agree with the observations where possible. The observation stations in South America included in the ERA5 reanalysis are found here (https://www.ecmwf.int/en/about/media-centre/news/ 2020/wmo-and-ecmwf-launch-new-web-tool-monitorquality-observations), and it is clear that there are many surface observation stations in South America, mostly south of the Amazon but also within the Amazon region.
In fact, there are many more stations in South America than in Africa, but obviously many fewer than in Europe and Japan. Hersbach et al. (2020) has shown that over South America there is a good agreement between observations and ERA5 precipitation rate (figure 24 in their paper) with a correlation coefficient of ∼0.75. Wang et al. (2020) show that the ERA5 reanalysis has the lowest root-mean-square error (RMSE) from all reanalysis products when compared with the Global Navigation Satellite System (GNSS) precipitable water vapour product. Hassler and Lauer (2021) showed good correlation between ERA5 and observations of rain, with less than 10% bias for most of the South American continent, while the Andes and the western coastal regions showed higher biases. Comparisons with other parts of the world show good agreement with observations in South America relative to Africa, Southeast Asia, and even North America (figures 3 and 10 in Hassler and Lauer (2021)).
Hence, the ERA5 is likely the best representation of the global atmosphere that we have, going back to 1950. This product provides a detailed description of the atmosphere at many vertical levels, and at a horizontal spatial resolution of 0.25 • , ∼25 km, compared with the previous ERA-Interim product that had 0.8 • spatial resolution (Hersbach et al., 2020). The temporal resolution of the ERA5 product is hourly, together with an estimate of the uncertainty of the ensemble model runs. The reanalysis for 1950-1978 is still a preliminary version, so currently only data from 1979 is being used in this study. Hersbach et al. (2020) present some of the improvements in the ERA5 analysis product. When comparing the ERA5 to the previous ERA-Interim product, there is an improvement in skill of 1 day. In other words, the quality of forecasts after 3 days in the ERA-Interim is the same as the quality of forecasts after 4 days in the ERA5 product. There are improved agreements with temperature, wind and humidity in the troposphere, but not in the stratosphere. Detailed weather system evolution is possible with the enhanced spatial and temporal resolution in ERA5. For global mean monthly mean precipitation, the agreement between ERA5 and observations increases from 67% to 77%.
While we have used the ERA5 reanalysis product as our best estimate of the real-world meteorology, we need to emphasize that the ERA5 reanalysis product, like all reanalysis products, is basically a model forced to agree with observations, when they exist and are used in the data assimilation process. If there are no data available, the output is simply based on the ERA5 model, although observations in other locations will constrain the physics and dynamics of the model. In addition, the quantity and quality of observations assimilated into reanalysis products can vary over time, meaning that long-term trends may not be well represented.

Worldwide lightning location network (WWLLN)
The World Wide Lightning Location Network (WWLLN) supplied the basis for the thunderstorm observations. The WWLLN network is made up of more than 70 ground stations around the world, run by the University of Washington, Seattle. The network started in 2004 with 18 stations and has grown to around 70 stations (Holzworth, 2021).
The WWLLN network detects the electric pulses produced by lightning strokes in the very low frequency (VLF) range (3-30 kHz). The detection software calculates the time of group arrival (TOGA) for determining the exact time of detection of each pulse. A discharge needs to be detected by four ground stations in order to obtain an accurate geolocation of the lightning (Rodger et al., 2006;Virts et al., 2013). The network detects primarily the strong (>30 kA) cloud-to-ground flashes, which make up around 30% of global lightning discharges (Mezuman et al., 2014;Holzworth, 2021). In recent years, more ground stations and better detection algorithms have allowed WWLLN to also detect weaker strokes (Rodger et al., 2006;Rodger, 2008;Hutchins et al., 2012). Over the Amazon region the detection efficiency for strokes is around 80% (Hutchins et al., 2012). Since that study, additional ground stations have been installed in South America, increasing the detection efficiency of WWLLN in this region of the globe.

Thunderstorm clusters and pre-processing
The typical size of a thunderstorm is roughly 25 km in diameter (Seeley and Romps, 2015), and clustering lightning flash detections is a well-known method for assessing storm behaviour and scale from individual strokes/flashes. This technique has been used by satellite sensors at different resolutions: the Optical Transient Detector (OTD) used a 22 km boundary to cluster lightning flashes while the Lightning Imaging Sensor (LIS) used a 16 km scale similarly (Williams et al., 2000). This work utilized Mezuman et al. (2014) clustering algorithm to represent thunderstorm parameters. Mezuman et al. (2014) developed an algorithm that translates the raw dataset of WWLLN lightning coordinates and arrival times into thunderstorm clusters at a time step of 1 hr, appropriate to the duration of thunderstorm cells. To capture smaller storms the grid resolution is defined as 0.15 • (∼15 km). A cluster is defined at the cell within which at least one stroke has been detected. Adjacent cells (top, bottom, left, right) in which strokes are detected within the same hour are combined to create larger clusters. The algorithm is able to reproduce spatial and temporal patterns of thunderstorms successfully.
To scale the datasets appropriately, the ERA5 grid was interpolated to 0.9 • to allow 6 × 6 0.15 • cluster-grid pixels for each larger ERA5 pixel. To create the cluster dataset of a matching size, the clusters were counted for each ERA5 pixel, to produce two distinct cluster parameters: the number of individual clusters (henceforth "cluster number"), and the area cover of the clusters (number of cluster pixels relative to the total area of a single ERA5 pixel, henceforth "cluster area"), see Figure 1. The hourly cluster number and area parameters were integrated over the course of a whole day (0000-2300 UTC). Cluster number units were defined as the number of clusters per day, in a standardized 1 • box. Cluster area units were defined as a daily mean percentage of area covered by clusters, per hour (due to the relatively short lifetime of most thunderstorms, a measure of cluster area on a daily scale would have been counterintuitive).

Initial analysis
The region of interest was defined over Tropical America as latitude 30 • N-30 • S, and longitude 98 • W-33.2 • W. In this region the geographic means of the cluster parameters were compared with several ERA5 fields ( unstable conditions, with potentially higher updraught velocities in thunderstorms, and hence potentially more lightning activity Pawar et al., 2012;Siingh et al., 2013). Generally, CAPE of less than 1,000 J⋅kg −1 represents weak instability, and values greater than 4,000 J⋅kg −1 imply extreme instability with severe weather (NOAA, 2021). In the current work, this field was taken directly from ERA5 reanalysis as the hourly value and was later averaged to a daily value. Specific humidity (SH) is the water content of an air parcel expressed in terms of grams of water vapour per 1 kg air parcel (similar to mixing ratio). In a study of global circulation models' forcing of thunderstorms over the United States, surface specific humidity was found to correlate positively with CAPE (Trapp et al., 2007). This field was taken directly from ERA5 reanalysis as hourly values and was later averaged to daily values.
CAPE and specific humidity are also not independent, of course. The moist, buoyant air tends to rise, and causes instability that makes large values of CAPE. In the ERA5 reanalysis over Tropical America the linear correlation coefficient of the two is r = .61; however, when fitting a power law, the correlation is r = .81 (Figure 2). The highest values of CAPE occur on days with the highest SH.

The empirical model
The empirical model follows Harel and Price (2020). It uses 1 year of daily mean data (2018)

The mean forecasting scheme
The mean scheme calculates a mean value of the cluster observations <Znumber> and <Zarea> for each particular CAPE and SH set in the base year. If there are missing cluster values, they are interpolated to create a smoother distribution. The resulting surface presented in Figure 3 is used as the reference for the forecast. The algorithm runs on the ERA5 map in the forecast year, checks each pixel's CAPE and SH values and inserts the mean cluster value from the reference to create the forecast. This forecasting scheme gives a suggested mean of thunderstorm activity on the daily scale, rather than an actual forecasted value. A similar calculation was done for standard error and maximum cluster values. On a daily time-scale, the map exhibits low variability between pixels, and is therefore very unlike real storm data. However, the mean scheme improves significantly when it is aggregated to a monthly mean as it develops more variability and can better represent thunderstorm distributions (Figure 4).

3.3.2
The random selector forecasting scheme The random selector forecasting scheme takes each pixel's CAPE and SH values in the forecast year and references the sets of observed values Znumber{} and Zarea{} from the appropriate variables set in the base year. The scheme then randomly selects one of the cluster values that was observed for the given set and inserts it into the forecast.
For example, for a particular set we might have n = 8 observations in the base year. A random value of Znumber is selected using MATLAB's datasample function. The matching cluster area value is selected using the same index. In the example below, units for the number parameter are given as the number of individual clusters in one 0.9 • pixel. Area parameter is given in number of 0.15 • cluster pixels within one 0.9 • pixel. Cluster number and area are selected together to avoid an impossible forecast (for example, a pixel that contains 2 clusters with an area of zero). This selected set of cluster values is inserted into the forecast maps. If a particular variable set has no observations in the basis year, the cluster values in the forecast year are set to zero.
Unlike the mean scheme, where there is one mean value that is referenced by all pixels with the same variable set, the random selection process occurs for every pixel daily (Figure 4). The random selector scheme captures more variability and extreme values in comparison with the mean scheme. However, because of its unpredictable nature, on a daily time-scale it is quite erratic and does not reflect a realistic synoptic weather pattern. Like the mean scheme, the random selector benefits greatly from aggregation to a monthly mean as it becomes much more consistent and representative of real thunderstorm distributions.
Because of the differences in the regimes governing thunderstorm formation for continental vs. oceanic environments, in both forecasting schemes land and sea are catalogued and treated separately according to ERA's provided land-sea mask. The separate maps are then stitched together to produce the complete forecast map.
To test the skill of the empirical model, it was run with the ERA5 data for 2019, and the monthly results of both schemes were compared with WWLLN's observations, clustered and translated to the number and area parameters. An error analysis was performed to determine the relative success of the forecasts. To compare the forecast abilities with each other, the normalized root-mean-square error (NRMSE) was used. NRMSE was also calculated for each month separately to observe the influence of the seasonal cycle on the forecasts. In addition to NRMSE, R value was also calculated for the monthly number of clusters in the model vs. the monthly number of clusters in the WWLLN's data. The Cluster Number and Area estimate for 2019 with the random selector had an NRMSE of 14.5% and 11.9% respectively. At the same time the correlation coefficient between the forecast clusters and the observed clusters in 2019 with the random selector was r = .9 for cluster number and area.
The empirical model performs worst over the Andes, especially the central portion. The schemes do not account for topographic factors, that probably are very important there for the large number of storms (Etten-Bohm et al., 2021). In addition, there is a great deficit in the places with the maximum number of storms, since the empirical model does not have the ability to extrapolate and predict values more extreme than those of the basis year. This is a common problem with regression models that are built using existing data. However, in this article we are looking at past trends, and not future trends. We agree that using 1 year of training data (2019) may introduce biases if trying to predict lightning for a specific year, but here we are looking at decadal changes in thunderstorm activity.

Long-term trends of thunderstorm clusters over tropical America
Using ERA's CAPE and SH for Tropical America from the years 1979-2019 as input, the forecasting scheme that was deemed more successful in the error analysis (random selector in both cases) was run for these years. This run constructed a 41-year forecasted time series of daily thunderstorms clusters over Tropical America. The daily forecasts were aggregated to produce annual and decadal geographic means of these years.
Over Tropical America we can see a decreasing trend in the number of thunderstorm activity beginning in the 1980s, but especially since the turn of the century ( Figure 5). This trend is opposite to the rising trend in surface temperatures we see over Tropical America from the ERA5 data. If we compare the annual mean surface temperature with the annual mean thunderstorm cluster number (using the random selector scheme), we can determine the sensitivity of Tropical American thunderstorm activity to changes in surface temperatures.
In Figure 5, each data-point is a geographic-annual mean over the entire Tropical American region. The correlation coefficients were R = −0.17, and p = .29, indicating that temperature is not strongly connected with the number of clusters. The p-value is much higher than 0.05, which means that the connection is weak, and the temperature is not actually affecting the number of storms in Tropical America over this period. This decrease in thunderstorm activity surprised us given the observations and modelling that show more lightning as the world warms, and the findings by us and Harel and Price (2020) that over Africa the thunderstorm clusters do increase with warming temperature over the last 70 years.

Links to deforestation
Deforestation is related to the destruction of natural forests for human exploitation. This can be for timber, agriculture, mining, infrastructures and urban development. The Amazon rainforest has suffered from deforestation for decades, and since the 1980s it is estimated that the rate of deforestation is ∼14.5 Mha⋅year −1 (Butler, 2020). As a result of evapotranspiration from the forests, the Amazon is a major source of atmospheric moisture, influencing the regional water cycle (Zemp et al., 2017). Approximately 70% of the rainfall over the Amazon is from recycled water vapour from the forests. Hence, there is a positive feedback between the amount of biomass, the evapotranspiration from the forests, and the rainfall over the Amazon. Deforestation results in the reduction of biomass, followed by a reduction in evapotranspiration from the forests. This leads to a decrease in rainfall, intensifying the drying of the region, which can lead to more fires and deforestation (Staal et al., 2020). The decrease of tree cover can therefore decrease the convection, and hence thunderstorm clusters over the southern Amazon rainforest. It should be emphasized that deforestation and land use changes are not specified in the ERA5 reanalysis. However, the changes in sensible and latent heating, the specific humidity and relative humidity, instability indices are all impacted by the ongoing deforestation, and these meteorological parameters are included in the data assimilation process of ERA5. Hence, deforestation is indirectly included in the reanalysis through the changing synoptic meteorology, and hence this impacts the thunderstorm clusters in our empirical model.
In order to check whether or not deforestation is a possible suspect for the decrease in the number of clusters over Tropical America, we calculated the average number of storms over each pixel in Tropical America in the 1980s and in the 2010s and then we looked at the differences. By doing so we could see where we have, on average, a significant decrease in the number of storms ( Figure 6).
We clearly see a decrease of lightning clusters to the south of the Amazon, that appears to be similar to the deforestation maps presented by Staal et al. (2020), Mu et al. (2021) and Xu et al. (2021), and presented in Figure 7. The greatest decrease in the number of storms is just south of the Amazon River, where the worst deforestation occurs. It appears that there exists a strong connection between the deforestation rate and the decrease in the number of storm clusters that occurs in the region. The negative effect of deforestation appears to be stronger than that of the global warming.
Note that the deforestation is more concentrated than the clusters area of reduction. That is possibly because the effect of deforestation is affecting an area greater than just the area of deforestation itself (Staal et al., 2020). In order to quantify the effect of deforestation we made a pixel-by-pixel comparison between Figure 7 and a plot that is similar to Figure 6, but for the change since the 1990s and not 1980s, in order to better match the data in Figure 7 for 2001-2019. After the pixel-by-pixel comparison, we binned the data into 14 bins. In each bin we averaged the cluster change data and the standard deviation in each bin for cluster number change (Figure 8).
In the figure above we can see a highly linear, strong connection between deforestation and the change in cluster number over Tropical America. Statistically, the deforestation can explain about 72% of the cluster decrease over time. That connection is statistically significant at the 1% level. We can see an average decrease of ∼0.86 clusters⋅deg −2 ⋅day −1 for a carbon loss of 1 Tg C, which is significant when comparing to an average cluster number of no more than 7-8 clusters⋅deg −2 ⋅day −1 . Note that this is only an upper limit because deforestation also occurred during the 1990s.
According to Figure 5, the decreases in thunderstorm clusters (excluding the strong El Niño of 1997/1998) started around 1993 and continued to 2011. While deforestation in the 1980s was around 2% per year, until 1993 the deforestation area was possibly below a threshold to impact regional convection and thunderstorms. After a threshold was crossed in the 1990s, we hypothesize that the rapid decrease in convection and thunderstorms started, continuing until 2011. As deforestation increased so did the deficit in surface moisture, until the area was dry enough for it to impact the regional specific humidity, which also decreases the CAPE values, and hence the thunderstorm clusters. The reduction in the deforestation rate after 2005 is also shown by the reduced rate of decrease in thunderstorm clusters, with a levelling off before the cluster number starts rising again in the last decade.
F I G U R E 8 Sensitivity of thunderstorm clusters to carbon loss from deforestation in Tropical America. The x-axis is the total carbon loss from deforestation since 2001, the y-axis is binned decadal cluster change between the 1990s and 2010s. Each bin represents 0.05 Tg C loss per pixel. The red trendline is a fitted curve for the binned ΔNumber of clusters vs. carbon loss from deforestation. [Colour figure can be viewed at wileyonlinelibrary.com] Both temperature and moisture impact thunderstorm frequency, and in the Amazon these two trends have been pushing thunderstorms in different directions. Warmer surface temperatures increase instability and CAPE, while increased deforestation decreases surface moisture, latent heat release, and eventually CAPE as well. Disentangling these two factors will need modelling studies that are outside the scope of this article. However, we should point out that over Africa where there is less deforestation, a similar reanalysis-based study (Harel and Price, 2020) estimated that there has been an increase in thunderstorm clusters over Africa during the last 70 years. And in the Amazon since 2010 ( Figure 5) we may be witnessing again a stronger impact from temperature trends compared with deforestation trends.

CONCLUSIONS AND DISCUSSION
In this article we present an empirical model that uses large-scale parameters to construct an observation-based forecasting tool of thunderstorm clusters over Tropical America. The model is based on large-scale meteorological variables that are not dependent on cloud parametrizations commonly used to create lightning proxies, such as maximum cloud height or precipitation estimates (Price and Rind, 1992;Romps et al., 2014). Our approach can be applied to coarse-grid models and reanalysis products without the use of a second parametrization. The proposed empirical model uses CAPE and SH to predict thunderstorm number and area with considerable skill on a monthly scale. The resulting forecasts show good spatial and temporal agreement with observations. Of the two schemes developed, the most successful forecast is the random selector's forecast of cluster area, which had a mean error of 14.8% but with a correlation coefficient of r > 0.9 between the forecast and observed clusters.
The empirical model is by definition based on observations, and hence the forecasts have difficulties predicting extreme values. On a daily scale, both forecasting schemes can only predict the number and area of clusters that were observed in the base year. The aggregation to monthly and longer time-scales is beneficial, but not without limits. There often appears to be an underestimation of thunderstorms over high topography, and this parameter will need to be considered in future modelling. According to Etten-Bohm et al. (2021), topography is an important factor in observing storms, with high mountains increasing the probability of storms. We did not quantify that factor, but as expected in the higher regions, there is more lightning than in the lower regions with the same conditions. Coastal errors are probably connected to the fact that on land the same conditions will generate more lightning than on the ocean, and the land/sea mask is not perfect. We do have pixels of "land" that are partly sea covered and vice versa. Therefore, if we see more lightning over sea it is possible that this pixel in the model was calculated as land, and more lightning was forecasted there.
The empirical model was also able to produce a long-term historical proxy of thunderstorm activity over South America, but to our surprise we detected a clear decreasing trend in thunderstorms in this region, with major differences between the 1980s and the 2010s, with much less thunderstorm activity over the region in the last decade. This is consistent with Saravia (2020) and Brazilian weather stations in this region, which show a dramatic decrease of rainfall over the southern Amazon basin.
Deforestation is known to be a major force in the water cycle over the Amazon rainforest (Staal et al., 2020;Mu et al., 2021). The increasing deforestation in the southern Amazon basin has caused droughts (Staal et al., 2020), and a significant decrease of precipitation over the past 40 years (Saravia, 2020). According to our analysis, deforestation probably also affected the number of thunderstorms over Tropical America in general and the southern Amazon in particular. We can see a clear decreasing trend of thunderstorm number since the 1980s, and if we take a look at the decadal pictures, we can see that most of that trend is in the southern Amazon region, suggesting a connection to deforestation and the water cycle. We estimate that for every 1 Tg C lost due to deforestation, there is a 10% decrease in thunderstorm number.
The continuing deforestation over the Amazon and also over other areas in southeast Asia, and Africa will probably be a major factor in thunderstorm behaviour in the future and is also something to take into account when discussing that issue in the context of global warming. These results are remarkable, and need further examination in other, maybe smaller, areas in order to validate the actual impact of deforestation on thunderstorm parameters and frequency.

AUTHOR CONTRIBUTIONS Raam Bekenstein:
Conceptualization; funding acquisition; methodology; project administration; supervision; writing -review and editing. Colin Price: Formal analysis; software; visualization; writing -original draft. Eugene Mareev: Investigation; methodology; validation; writing -review and editing. research possible. This work was partially supported by a grant from the Government of the Russian Federation (contract no. 075-15-2019-1892).

DATA AVAILABILITY STATEMENT
The ERA5 data are available from the ECMWF https://www.ecmwf.int/en/forecasts/datasets/reanalysisdatasets/era5 and are freely available to all. The WWLLN lightning data are available from http://wwlln.net or from the author.