An improved global vegetation health index dataset in detecting vegetation drought

Due to global warming, drought events have become more frequent, aggravating crop failures and food shortages, fueling larger and more energetic wildfires, and seriously affecting socio-economic development and agricultural production. In this study, a global long-term (1981–2021), high-resolution (4 km) improved vegetation health index (VHI) dataset integrating climate, vegetation and soil moisture was developed. Based on drought records from the Emergency Events Database, we compared the efficiency of the VHI before and after its improvement in detecting the occurrence and extent of observed drought events. The global drought detection efficiency of the improved high-resolution VHI dataset reached 85%, about 14 percentage points higher than that of the original VHI dataset. The improved VHI dataset was also more sensitive to mild droughts and more accurate regarding the extent of droughts. This improved dataset can not only play an important role in long-term drought monitoring but also has the potential to assess the impact of drought on the agricultural, forestry, ecological and environmental sectors.

assumed to hold one inch of moisture. The amount of moisture that can be held by the rest of the underlying soil is a location-dependent value, which must be provided as an input parameter to the program. The sc-PDSI automatically calibrates the behavior of the index at any location by replacing empirical constants in the index computation with dynamically calculated values 21. Compared with other meteorological drought indices, such as the standardized precipitation index (SPI) and the standardized precipitation-evapotranspiration index (SPEI), the sc-PDSI has a higher correlation with vegetation drought 23. However, these indices still face problems such as inaccurate station observations and delayed data collection 7,24, and they either ignore the impact of water and heat stress on vegetation growth or do not fully consider the land cover and vegetation of the underlying surface 25. Therefore, traditional meteorological drought indices are not suitable for direct use in vegetation drought research, although using them as auxiliary information in combination with other techniques can help us better understand and study vegetation drought. Currently, satellite data are widely used for drought assessment because of their ability to identify drought conditions on different underlying surfaces 26. Remotely sensed drought monitoring is suitable for large-scale monitoring 27, can be updated in near real time 28,29, and offers high accuracy and cost-effectiveness compared with other methods 30,31. Therefore, combining satellite remote sensing with traditional drought indices is a very promising approach to drought assessment.
The vegetation health index (VHI) is one of the most popular remote sensing drought monitoring indices [32][33][34][35]. The VHI is composed of two terms: the vegetation condition index (VCI) and the thermal condition index (TCI). The VHI considers local biophysical and climatic conditions and can be used for plant drought monitoring in various agrometeorological regions 36. The basic principles of the VHI are as follows: (1) a low normalized difference vegetation index (NDVI) and a high land surface temperature (LST) suggest poor vegetation health 37,38; and (2) the contributions of VCI and TCI to VHI are assumed to be equal, since there are no data on the relative contributions of other conditions to vegetation health 35. However, the contributions of TCI and VCI to VHI depend on climatic and other environmental factors 39. Because environmental conditions usually differ between regions, giving equal weights to VCI and TCI may limit the applicability of the VHI and increase the uncertainty in drought detection 33. Therefore, evaluating the contributions of VCI and TCI to VHI while considering the underlying surface and hydrothermal conditions is a key issue that needs to be addressed. Bento, et al. 39 used the SPEI combined with Pearson correlation analysis to evaluate the contributions of TCI and VCI to VHI in arid regions, while Zeng, et al. 23 compared the SPEI and the sc-PDSI as control drought indices for evaluating the contributions of TCI and VCI to VHI.
The purpose of this study is to produce a global, long-term (1981-2021), high-spatial-resolution, improved vegetation drought detection dataset and a reference parameter dataset for calculating the improved VHI for different regions. We supplemented the original VHI algorithm with meteorological and soil moisture information by using the sc-PDSI as the control drought index, combined with Pearson correlation analysis, to evaluate the best contributions of VCI and TCI to VHI in different regions. We then generated a global long-term dataset for detecting vegetation drought. We also used drought event records from the Emergency Events Database (EM-DAT) to assess the drought detection ability of our improved VHI dataset. The developed global vegetation drought dataset has the potential to monitor and assess drought and its impact on the agricultural, forestry, ecological, and environmental sectors.

Data and Methods
Dataset coverage. The dataset covers approximately 50°S to 70°N and 180°W to 180°E. Antarctica, the high-latitude regions of the Northern Hemisphere (due to the lack of VCI and TCI data), and the Sahara Desert region in Africa (due to the lack of sc-PDSI data) were excluded from the study area. These areas have little or no vegetation cover and are usually excluded when studying vegetation drought [40][41][42][43], so they are not considered in this dataset. The colors in Fig. 1 represent the number of drought events since 1900. Fig. 1 also shows the change in global annual mean temperature (Fig. 1b), the change in global annual precipitation (Fig. 1c), and the change in the global annual number of droughts (Fig. 1d). Since the start of the 20th century, global temperatures have increased substantially, and precipitation has also shown a positive trend, but with high fluctuations and large uncertainties. Significantly higher temperatures and greater uncertainty in precipitation could dramatically increase drought risk. The number of recorded drought events has been rising since 1900 (p < 0.001), especially since the 1980s.

Data sources. Global VCI and TCI data were downloaded from the National Oceanic and Atmospheric Administration (NOAA) Center for Satellite Applications and Research (STAR) (https://www.star.nesdis.noaa.gov/smcd/emb/vci/VH/vh_ftp.php), with a spatial resolution of 4 km, a weekly temporal resolution, and a time span ranging from 1981 to 2021. We aggregated the weekly data to annual values using the arithmetic mean and masked the background value (−9999) as null to facilitate subsequent spatial and statistical analyses.
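The annual aggregation described above can be sketched for a single pixel as follows. This is a minimal illustration rather than the processing code used for the dataset: the function name `annual_mean` is hypothetical, and skipping missing weeks (instead of discarding the whole year) is an assumption.

```python
import math

NODATA = -9999.0  # background value reported in the text


def annual_mean(weekly_values, nodata=NODATA):
    """Arithmetic mean of one pixel's weekly values for one year.

    Background values are treated as null and skipped (assumption).
    Returns NaN when no valid week exists, so the pixel stays masked.
    """
    valid = [v for v in weekly_values if v != nodata and not math.isnan(v)]
    if not valid:
        return math.nan
    return sum(valid) / len(valid)


# Toy example: four weekly VCI values, one of them background.
weekly_vci = [42.0, 38.5, NODATA, 51.5]
print(annual_mean(weekly_vci))          # mean of the three valid weeks
print(annual_mean([NODATA, NODATA]))    # nan: nothing valid to average
```

Returning NaN for fully masked pixels keeps the null value from leaking into later statistics as a large negative number.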
Global sc-PDSI data were downloaded from the Climate Research Unit (CRU), with a spatial resolution of 0.5°, a monthly temporal resolution, and a time span ranging from 1901 to 2020 (https://crudata.uea.ac.uk/cru/data//drought/#global) 44,45. We aggregated the monthly data to annual values using the arithmetic mean and then resampled them to 4 km with the nearest neighbor method to match the VCI and TCI data. Resampling the 0.5° sc-PDSI to 4 km does not discard any of the information it contains, so the results remain acceptable. However, an sc-PDSI product with a higher native spatial resolution may provide more information on hydrothermal and soil characteristics and may reduce the uncertainty of the results.
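Nearest neighbor resampling to a finer grid simply copies each coarse value into the fine cells it covers, which is why no information is lost. The sketch below illustrates this on a toy grid; the function name is hypothetical, both grids are assumed to share the same origin, and a refinement factor of 2 is used for readability instead of the actual 0.5° to 4 km ratio.

```python
def nearest_neighbor_resample(coarse, coarse_res, fine_res):
    """Resample a 2-D grid (list of rows) to a finer resolution.

    Each fine cell takes the value of the coarse cell containing its
    centre, which is the nearest coarse centre when both grids share
    the same origin (assumption).
    """
    n_rows = round(len(coarse) * coarse_res / fine_res)
    n_cols = round(len(coarse[0]) * coarse_res / fine_res)
    fine = []
    for r in range(n_rows):
        src_r = min(int((r + 0.5) * fine_res / coarse_res), len(coarse) - 1)
        row = []
        for c in range(n_cols):
            src_c = min(int((c + 0.5) * fine_res / coarse_res), len(coarse[0]) - 1)
            row.append(coarse[src_r][src_c])
        fine.append(row)
    return fine


# Toy 2 x 2 sc-PDSI grid at 0.5 degrees, refined to 0.25 degrees.
coarse = [[-1.5, 0.2],
          [2.0, -3.1]]
fine = nearest_neighbor_resample(coarse, coarse_res=0.5, fine_res=0.25)
for row in fine:
    print(row)
```

The output contains only values already present in the coarse grid; no new values are interpolated, so the finer grid adds no spatial detail either.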
Data on global drought events from 1900 to 2021 were obtained from the EM-DAT (https://public.emdat.be/) 46.

Annual VHI calculation. According to Kogan 35, the VCI for each pixel and period in a given year was calculated as follows:

$$\mathrm{VCI} = 100 \times \frac{\mathrm{NDVI} - \mathrm{NDVI}_{MIN}}{\mathrm{NDVI}_{MAX} - \mathrm{NDVI}_{MIN}} \tag{1}$$

where NDVI is the value of a given pixel and period, and NDVI_MIN and NDVI_MAX are the minimum and maximum values of NDVI for all pixels and periods, respectively. Equation (2) was used to calculate the TCI:

$$\mathrm{TCI} = 100 \times \frac{\mathrm{LST}_{MAX} - \mathrm{LST}}{\mathrm{LST}_{MAX} - \mathrm{LST}_{MIN}} \tag{2}$$

where LST is the value of a given pixel and period, and LST_MIN and LST_MAX are the minimum and maximum values of LST for all pixels and periods, respectively. The VHI represents the overall health of the vegetation and is used to identify drought 48. It is calculated by combining the VCI and the TCI as follows:

$$\mathrm{VHI} = a \times \mathrm{VCI} + (1 - a) \times \mathrm{TCI} \tag{3}$$

where a determines the contributions of VCI and TCI to VHI and varies depending on the environment of the study area 33. The original VHI (VHI_ori) assumes equal contributions from moisture conditions (represented here by NDVI) and temperature during plant growth, so the coefficient a is assigned the value of 0.5. Following other studies 35,49,50, we classified drought levels on the basis of the VHI (Table 1).
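The index calculation and classification described above can be sketched as follows, assuming the standard Kogan min-max scaling for VCI and TCI. The function names are hypothetical, and only the classes up to "Normal" are encoded; values above 50 are returned as `None` in this sketch.

```python
def vci(ndvi, ndvi_min, ndvi_max):
    """Vegetation condition index: NDVI scaled to 0-100."""
    return 100.0 * (ndvi - ndvi_min) / (ndvi_max - ndvi_min)


def tci(lst, lst_min, lst_max):
    """Thermal condition index: high LST gives a low (stressed) value."""
    return 100.0 * (lst_max - lst) / (lst_max - lst_min)


def vhi(vci_value, tci_value, a=0.5):
    """Weighted combination; a = 0.5 reproduces the original VHI."""
    return a * vci_value + (1.0 - a) * tci_value


# Upper bounds (inclusive) of the classes listed in Table 1.
CLASSES = [
    (10.0, "Extremely dry"),
    (20.0, "Severely dry"),
    (30.0, "Moderately dry"),
    (40.0, "Mild dry"),
    (50.0, "Normal"),
]


def vhi_class(vhi_value):
    """Return the Table 1 category, or None above the 'Normal' range."""
    for upper, name in CLASSES:
        if vhi_value <= upper:
            return name
    return None


v = vci(0.5, 0.25, 0.75)        # NDVI halfway between its extremes
t = tci(310.0, 280.0, 320.0)    # LST close to its maximum
h = vhi(v, t)                   # equal weights
print(v, t, h, vhi_class(h))
```

In this toy case a healthy greenness signal is offset by heat stress, and the equal-weight VHI falls in the mild drought class.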

Improvement of the VHI algorithm.
We chose the meteorological drought index sc-PDSI, which considers hydrothermal conditions and soil moisture, as the control drought index, and combined it with Pearson correlation analysis to evaluate the contributions of VCI and TCI on a grid-by-grid basis, thereby obtaining an improved VHI. To improve the VHI, the following steps were taken:
(1) The parameter a was varied in steps of 0.02, starting from 0.02 and gradually increasing to 0.98 (49 values in total). The VHI was then calculated for each value of a and for each year from 1981 to 2021. The specific formula is as follows:

$$\mathrm{VHI}_{i,t,a} = a \times \mathrm{VCI}_{i,t} + (1 - a) \times \mathrm{TCI}_{i,t} \tag{4}$$

where $\mathrm{VHI}_{i,t,a}$ represents the VHI when the VCI contribution of the ith pixel is a at time t, $\mathrm{VCI}_{i,t}$ represents the VCI of pixel i at time t, and $\mathrm{TCI}_{i,t}$ represents the TCI of pixel i at time t.
(2) The Pearson correlation was then used, pixel by pixel, to evaluate the correlation between the sc-PDSI and the VHI calculated for each contribution value a. The sc-PDSI, which is based on water balance theory, fully considers meteorological factors and soil moisture conditions and is therefore suitable for improving the VHI 23. The Pearson correlation coefficient is calculated as follows:

$$R_i = \frac{\sum_{t} (x_{i,t} - \bar{x}_i)(y_{i,t} - \bar{y}_i)}{\sqrt{\sum_{t} (x_{i,t} - \bar{x}_i)^2 \sum_{t} (y_{i,t} - \bar{y}_i)^2}} \tag{5}$$

where $R_i$ represents the correlation coefficient between the VHI and the sc-PDSI of pixel i, $x_{i,t}$ represents the VHI of pixel i in year t, $\bar{x}_i$ represents the average VHI of pixel i from 1981 to 2021, $y_{i,t}$ represents the sc-PDSI of pixel i in year t, and $\bar{y}_i$ denotes the mean sc-PDSI of pixel i from 1981 to 2021. The range of |R| is 0 to 1.
(3) After obtaining the spatial correlation coefficient maps of the sc-PDSI and the VHI calculated with different a values, the optimal contribution value a was estimated pixel by pixel according to the following formula:

$$a_{i,opt} = \underset{a}{\arg\max}\; R\left(\mathrm{VHI}_{i,a}, \mathrm{scPDSI}_i\right) \tag{6}$$

where $a_{i,opt}$ represents the best contribution value a of VCI to VHI at pixel i, and $(1 - a_{i,opt})$ represents the best contribution value of TCI to VHI. R is the correlation coefficient between the VHI and the sc-PDSI, $\mathrm{VHI}_{i,a}$ is the VHI of pixel i when the contribution value of VCI is a, and $\mathrm{scPDSI}_i$ is the sc-PDSI of the ith pixel. By comparing, pixel by pixel, the correlation coefficients between the sc-PDSI and the VHI calculated for the 49 values of a, the a value with the largest correlation coefficient was selected as the best contribution value of the VCI to the VHI of this pixel.
(4) Finally, according to the best contribution value a and the calculation formula of the VHI, the improved VHI, namely VHI_opt, is obtained. The specific formula is as follows:

$$\mathrm{VHI}_{i,t,opt} = a_{i,opt} \times \mathrm{VCI}_{i,t} + (1 - a_{i,opt}) \times \mathrm{TCI}_{i,t} \tag{7}$$

where $\mathrm{VHI}_{i,t,opt}$ represents the VHI obtained with the best contribution value a of the ith pixel at time t, and $a_{i,opt}$ represents the best contribution value of the VCI to the VHI of the ith pixel.

The algorithms used to improve the VHI have been uploaded to GitHub and can be obtained from https://github.com/BNUJingyuZeng/A-new-global-VHI-dataset-code. A general overview of the working scheme is given in Fig. 2. We compared the scatterplots and linear regression fits of the sc-PDSI and the VHI before and after the improvement, with and without detrending. The method of detrending is as follows:

$$\mathrm{VHIde}_{i,t} = \mathrm{VHI}_{i,t} - \mathrm{VHI}_{i,t-1} \tag{8}$$

where $\mathrm{VHIde}_{i,t}$ represents the VHI of the ith pixel at time t after detrending, $\mathrm{VHI}_{i,t}$ represents the VHI of the ith pixel at time t before detrending, and $\mathrm{VHI}_{i,t-1}$ represents the VHI of the ith pixel at time t−1 before detrending.
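Steps (1) to (4) can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions, not the code from the repository above: the function names are hypothetical, the signed correlation coefficient is maximized (the text does not say whether R or |R| is used), ties are resolved in favour of the smallest a, and detrending is implemented as a first difference, as inferred from the variable definitions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series; NaN if one is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


# Step (1): candidate VCI weights 0.02, 0.04, ..., 0.98 (49 values).
A_VALUES = [round(0.02 * k, 2) for k in range(1, 50)]


def optimal_weight(vci_series, tci_series, scpdsi_series):
    """Steps (2)-(3): return (a_opt, R) maximizing corr(VHI_a, sc-PDSI)."""
    best_a, best_r = None, -math.inf
    for a in A_VALUES:
        vhi_series = [a * v + (1 - a) * t for v, t in zip(vci_series, tci_series)]
        r = pearson(vhi_series, scpdsi_series)
        if not math.isnan(r) and r > best_r:
            best_a, best_r = a, r
    return best_a, best_r


def vhi_opt(vci_series, tci_series, a_opt):
    """Step (4): VHI computed with the pixel's optimal weight."""
    return [a_opt * v + (1 - a_opt) * t for v, t in zip(vci_series, tci_series)]


def first_difference(series):
    """Detrending by differencing consecutive years (assumed form)."""
    return [cur - prev for prev, cur in zip(series, series[1:])]


# Toy pixel whose sc-PDSI follows TCI exactly, so a small VCI weight wins.
tci_px = [20.0, 60.0, 40.0, 80.0, 30.0]
vci_px = [50.0, 40.0, 60.0, 45.0, 55.0]
scpdsi_px = [-3.0, 1.0, -1.0, 3.0, -2.0]
a_best, r_best = optimal_weight(vci_px, tci_px, scpdsi_px)
print(a_best, round(r_best, 4))
print(first_difference(vhi_opt(vci_px, tci_px, a_best)))
```

Applying `optimal_weight` to every pixel would yield a map of optimal weights; a vectorized array implementation would be preferable at the dataset's resolution.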
Evaluation of the drought detection efficiency. The drought detection efficiency of the VHI dataset was evaluated before and after the improvement, based on drought event records in the EM-DAT. The specific formula is as follows:

$$\mathrm{DTE} = \frac{S}{\mathrm{TDE}} \times 100\% \tag{9}$$

where DTE is the drought detection efficiency, S is the score accumulated over all drought events, and TDE is the total number of drought events. The score is evaluated according to the following principles. The VHI threshold for vegetation drought is set to 40: a VHI lower than 40 indicates dryness and is counted as drought, whereas a VHI higher than 40 indicates normal or wet conditions (Table 1). Given the occurrence time and location of each drought event, the VHIs before and after the improvement are compared one by one. When the pixels detected as drought account for more than 80% of the pixels in the area where the drought event occurred, the score is 1; when they account for more than 40% (but no more than 80%), the score is 0.5; and when they account for less than 40%, the score is 0.

Category          VHI
Extremely dry     [0, 10]
Severely dry      (10, 20]
Moderately dry    (20, 30]
Mild dry          (30, 40]
Normal            (40, 50]
Good              (50,

Statistical methods. The Mann-Kendall (MK) method is widely used in meteorology, ecology, and environmental research [51][52][53][54]. It is a nonparametric test method 55,56. We used the Theil-Sen trend analysis and the MK trend detection methods to study the temporal and spatial trends of global vegetation drought.
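The two statistical methods named above can be sketched for one annual series as follows. This is a minimal illustration with hypothetical function names: the MK variance formula used here assumes no tied values, and the p-value relies on the usual normal approximation.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_slope(years, values):
    """Median of the slopes between all pairs of points."""
    slopes = [
        (values[j] - values[i]) / (years[j] - years[i])
        for i, j in combinations(range(len(years)), 2)
    ]
    return median(slopes)


def mann_kendall(values):
    """Return (S, Z, two-sided p) for a series without tied values."""
    n = len(values)
    # S counts increasing pairs minus decreasing pairs.
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i, j in combinations(range(n), 2)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p


# Toy series: VHI declining by 0.3 per year over ten years.
years = list(range(1981, 1991))
values = [50.0 - 0.3 * k for k in range(10)]
print(theil_sen_slope(years, values))
print(mann_kendall(values))
```

The Theil-Sen slope gives the magnitude of the trend, and the MK statistic indicates whether its direction is statistically significant.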

Data records
Extent, projection, resolution and data format. The VHI_opt dataset covers most of the world's land areas except Antarctica, with an approximate range of 50°S to 70°N and 180°W to 180°E. The data projection is GCS_WGS_1984. This dataset is available as a 4-km GeoTIFF accessible from a data repository on figshare (https://doi.org/10.6084/m9.figshare.19811854.v5) 47. For each year's VHI_opt image, the globe is divided into 10000 × 3616 grid cells of 4 km × 4 km. In addition to annual data, we also provide monthly data, which can be used to analyze vegetation drought at the seasonal scale, together with a list explaining which months are missing due to gaps in the source data and how weekly data were combined into monthly data. We also provide a global best contribution parameter dataset based on the 1981-2021 VHI dataset, i.e., the a_opt parameter file. The file's extent, projection, spatial resolution and data format are consistent with the VHI_opt dataset. The annual VHI_opt dataset does not exceed 12 GB (about 3 GB for the compressed download). To preserve as much information as possible, we kept the value of each cell to six decimal places. Missing data are represented by "Nodata".
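As a rough consistency check on the grid dimensions quoted above, the cell size can be derived from the number of columns. The sketch assumes that the 10000 columns span all 360° of longitude and that cells are square in degrees; neither is stated explicitly in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0        # mean Earth radius
N_COLS, N_ROWS = 10000, 3616    # grid dimensions quoted in the text

cell_deg = 360.0 / N_COLS                        # assumes full longitude coverage
km_per_degree = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0
cell_km_at_equator = cell_deg * km_per_degree    # east-west size at the Equator
lat_span_deg = N_ROWS * cell_deg                 # assumes square cells in degrees

print(f"{cell_deg:.3f} degrees per cell")
print(f"{cell_km_at_equator:.2f} km per cell at the Equator")
print(f"{lat_span_deg:.1f} degrees of latitude covered")
```

Under these assumptions the cell size comes out close to 4 km at the Equator, in line with the stated spatial resolution of the dataset.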
Data naming and availability. The global 1981-2021 VHI_opt dataset is named "VHIopt_year.tif", where year corresponds to the year of the data, with a total of 41 files. The VHI_opt data of a given year are calculated from the annual VCI and TCI data of that year and the best contribution parameter map in the a_opt file. The parameter file of the global best contribution value a is named a_opt and can be used to calculate VHI data from small to macro scales and from daily to seasonal scales.

Technical Validation
Drought detection efficiency of the VHI_opt dataset. The best contribution value a_opt is less than 0.5 in most regions of the world. The proportion of regions dominated by TCI reaches 70%, whereas the proportion dominated by VCI is ~28% (Fig. 3). VHI_opt is affected by TCI and VCI differently in different regions. Most of Africa is dominated by VCI, while South America, Australia, and regions north of 30° latitude are dominated by TCI. This is especially visible in Europe, where TCI-dominated regions are concentrated and contiguous, with a_opt values generally lower than 0.3. Overall, an abnormally high surface temperature is the main driving factor affecting vegetation drought.
The coefficients of determination between the sc-PDSI and the VHI before and after the improvement were compared, both with and without detrending (Fig. 4). VHI_opt agrees better with the sc-PDSI (Fig. 4a,c) than VHI_ori does (Fig. 4b,d). After excluding time-dependent trends, the correlation between VHI_opt and sc-PDSI was 0.13 higher than that between VHI_ori and sc-PDSI (i.e., 0.51 and 0.38, respectively; Fig. 4a,b), which implies an improvement of 34% in correlation. Without detrending, the correlation between VHI_opt and sc-PDSI was 0.04 higher than that between VHI_ori and sc-PDSI (i.e., 0.41 and 0.37, respectively; Fig. 4c,d), i.e., an improvement of 10%. The improved VHI dataset can better capture soil moisture anomalies and heat stress levels than the original VHI dataset, thus improving the detection of vegetation drought.
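The relative improvements quoted above follow from dividing the gain in correlation by the original value. A minimal sketch of that arithmetic, using the rounded correlations given in the text (the function name is hypothetical):

```python
def relative_improvement(new, old):
    """Gain of `new` over `old`, expressed as a percentage of `old`."""
    return 100.0 * (new - old) / old


detrended = relative_improvement(0.51, 0.38)   # VHI_opt vs VHI_ori, detrended
raw = relative_improvement(0.41, 0.37)         # VHI_opt vs VHI_ori, not detrended
print(f"{detrended:.1f}% and {raw:.1f}%")
```

Because the inputs are already rounded to two decimals, the percentages obtained this way can differ slightly from values computed on unrounded correlations.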
Based on global drought events recorded in the EM-DAT, we compared the ability of VHI_opt and VHI_ori to detect drought globally on a year-by-year basis (Fig. 5). The ability of VHI_opt to detect drought was higher than that of VHI_ori, with drought detection efficiencies of about 84.97% and 70.69%, respectively. The period from 1997 to 2002 was a period of frequent drought events worldwide. The year 2001 experienced the largest number of drought events, i.e., 27. Drought events have declined in recent years.
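The detection efficiencies reported above result from the per-event scoring rule described in the Methods. The sketch below illustrates that rule on toy data; the function names are hypothetical, expressing the summed score over the number of events as a percentage is an assumption, and a drought fraction of exactly 40% is scored as 0 here because the text leaves that boundary case open.

```python
DROUGHT_THRESHOLD = 40.0  # VHI below this value is counted as drought


def event_score(vhi_pixels, threshold=DROUGHT_THRESHOLD):
    """Score one recorded drought event from the VHI pixels in its area."""
    valid = [v for v in vhi_pixels if v is not None]  # None marks missing data
    if not valid:
        return 0.0
    fraction = sum(1 for v in valid if v < threshold) / len(valid)
    if fraction > 0.8:
        return 1.0
    if fraction > 0.4:
        return 0.5
    return 0.0


def detection_efficiency(events):
    """Summed score divided by the number of events, in percent."""
    scores = [event_score(pixels) for pixels in events]
    return 100.0 * sum(scores) / len(scores)


# Three toy events: fully detected, not detected, partially detected.
events = [
    [35, 38, 20, 39, 30],   # 100% of pixels below 40 -> score 1
    [35, 45, 38, 50, 60],   # 40% below 40            -> score 0
    [35, 38, 30, 50, 60],   # 60% below 40            -> score 0.5
]
print(detection_efficiency(events))
```

Running the same function on the original and the improved VHI for the same set of events gives two efficiencies that can be compared directly.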
The same analysis was repeated by continent (Fig. 6). The drought detection efficiency of VHI_opt was higher than that of VHI_ori in all continents except Oceania. The numbers of drought events, from high to low, were 251, 143, 84, 56, 42, and 16 for Africa, Asia, North America, South America, Europe, and Oceania, respectively. The VHI_opt drought detection efficiency was highest in North America at 90% and lowest in Oceania at 81%. The VHI_ori drought detection efficiency was highest in Oceania at 84% and lowest in Asia at 62%. The drought detection efficiencies of VHI_opt and VHI_ori in Oceania did not differ much, which is related to the small number of drought events occurring there. In addition, in many areas of Oceania, drought occurs on small islands, where detection is limited by the resolution of the data. In Australia, the drought detection efficiency of VHI_opt was still higher than that of VHI_ori.
In general, compared with the original VHI_ori, our improved VHI_opt dataset has an improved drought detection efficiency on all continents except Oceania. It has broad application prospects for the detection and long-term monitoring of vegetation drought.

Spatial differences between VHI_opt and VHI_ori. Based on the estimated global best contribution value a_opt, we analyzed global annual VHI_opt and VHI_ori from 1981 to 2021 and compared their spatial patterns (Fig. 7). A relatively low degree of global vegetation drought is seen, with normal to good levels in most areas over the past 40 years, especially near the Equator, in eastern North America, and in southeastern South America. Clear differences between the two indices are seen (Fig. 7c). VHI_opt effectively detected mild vegetation drought, while VHI_ori underestimated the occurrence and spatial extent of mild vegetation drought. VHI_ori was significantly higher than VHI_opt along the western coasts of North America, Europe, and South America, and in southwest Asia.
We also compared VHI_opt and VHI_ori globally and by continent (Fig. 7b,d). VHI_ori is higher than VHI_opt in 66.5% of the cases. Oceania was the region with the lowest level of vegetation health in the world, as well as the region with the greatest variability, the highest uncertainty, and a higher risk to vegetation health. The vegetation health level in North America was the highest in the world, with average values of 49.02 and 47.98 for VHI_ori and VHI_opt, respectively, which are significantly different from those of Oceania (p < 0.05).
We further zoomed in on the above regions to compare spatial differences between VHI_opt and VHI_ori (Fig. 8). In South America and Europe, VHI_opt values below 40 indicate that this index effectively detected vegetation drought, especially in specific years, whereas VHI_ori values above 40 indicate normal conditions. VHI_opt was more sensitive to mild vegetation drought than VHI_ori, illustrating its better ability to detect mild drought. VHI_opt also improved the ability to assess the spatial extent of vegetation drought compared with VHI_ori. For example, in the western United States and southwestern Asia, the area with VHI_ori values below 40 was smaller than that with VHI_opt values below 40, so VHI_ori gave an overly optimistic estimate of vegetation health. Furthermore, VHI_opt also more effectively assessed the extent of vegetation drought in specific years. In general, VHI_opt has clear advantages over VHI_ori in both drought detection efficiency and the identification of drought extent.
Trends seen in the global VHI_opt dataset. We analyzed global and continental trends in VHI_opt and VHI_ori from 1981 to 2021 (Fig. 9). Results show that over the past 40 years, the VHI_opt in Europe, South America, and Oceania has decreased significantly by about 0.14, 0.31, and 0.35 per year, respectively. There was no significant change in Asia, North America, and Africa. Globally, VHI_opt showed an overall downward trend, with an average annual decrease of 0.16. VHI_ori showed a significant increase in Europe, with an annual increase of about 0.2, and a significant decline in Africa, with a decrease of about 0.24 per year. A further analysis showed that South America was the region with the most obvious downward trend in VHI_opt (R² = 0.54). From 1981 to 2005, inter-annual differences in Asia were relatively large, and from 2005 to 2021, changes in VHI_opt and VHI_ori gradually stabilized. This suggests that the future uncertainty of vegetation health in Asia may be less than in other regions.

Fig. 7 Global distributions of (a) VHI_opt and (c) differences between VHI_opt and VHI_ori. Box plots of (b) VHI_opt and (d) VHI_ori globally and by continent. The letters above the boxes indicate significant differences at the p = 0.05 level.

Since VHI_opt has noticeable advantages over VHI_ori in both drought detection efficiency and the identification of drought extent, we used the VHI_opt dataset to evaluate the changing trend of global vegetation drought and to identify hot spots where vegetation drought may be further aggravated in the future, as well as areas with improved vegetation health (Fig. 9). Results show that the spatial heterogeneity of global vegetation drought changes may be high in the future. The western United States, South America, central and southern Australia, and southwestern Asia are regions where VHI_opt will decrease. On the other hand, central Africa, India, southern China, North America, and high-latitude parts of Asia are areas where vegetation droughts will be alleviated and vegetation health levels will improve. These changes passed the significance assessment in the MK trend test. In the context of climate change, the changes in vegetation drought in areas where VHI_opt has declined are subject to large uncertainties and require more attention.

Usage Notes
We produced a long-term (1981-2021) high-spatial-resolution (4 km) improved vegetation drought detection dataset and the best contribution parameter dataset that can be used to calculate improved VHIs in different regions. We also evaluated the efficiency of the VHI_opt dataset in detecting drought, as well as the spatial pattern and changing trends of global vegetation drought, to improve our techniques for long-term monitoring of vegetation drought. Based on the best global contribution parameter dataset, VHI_opt data in different regions can be calculated for performing drought assessment and studying food production and vegetation carbon sinks.
The dataset can help people carry out drought assessment more conveniently and efficiently, but there are areas where improvements are needed. First, the spatial resolution of the data may have an impact on the findings 57. Using higher-spatial-resolution remote sensing products and sc-PDSI data will help further improve the detection efficiency of vegetation drought. Second, increased human activities may complicate changes in vegetation aridity 25. Human interventions may have more complex impacts on vegetation health, requiring further research 58.
Our research was conducted on a global scale, providing a tool aimed at understanding global and regional vegetation drought characteristics. This dataset greatly improves the ability of VHI to detect vegetation drought, and can help people better understand the impact of temperature on vegetation drought in different regions. This research is also conducive to the effective implementation of ecological engineering for vegetation drought resistance, and helps local governments and farmers reduce the losses caused by vegetation drought. The global vegetation drought dataset developed here also has the potential to be used to monitor and assess drought and its impact on the agricultural, forestry, ecological, and environmental sectors.

Fig. 9 Time series of (a-g) VHI_opt and (h-n) VHI_ori in Asia, North America, Europe, Africa, South America, Oceania, and globally, respectively. Global (o) trends in VHI_opt (unit: per year) and (p) p-values. The relations, coefficients of determination, and p-values are given in each panel (a-n).