A global historical twice-daily (daytime and nighttime) land surface temperature dataset produced by Advanced Very High Resolution Radiometer observations from 1981 to 2021

. Land surface temperature (LST) is a key variable for monitoring and evaluating global long-term climate change. However, existing satellite-based twice-daily LST products only date back to 2000, which makes it difﬁcult to obtain robust long-term temperature variations. In this study, we developed the ﬁrst global historical twice-daily LST dataset (GT-LST), with a spatial resolution of 0.05 ◦ , using Advanced Very High Resolution Radiometer (AVHRR) Level-1b Global Area Coverage (GAC) data from 1981 to 2021. The GT-LST product was generated using four main processes: (1) GAC data reading, calibration, and preprocessing using open-source Python libraries; (2) cloud detection using the AVHRR-Phase I algorithm; (3) land surface emissivity estimation using an improved method considering annual land cover changes; (4) LST retrieval based on a non-linear generalized split-window algorithm


Introduction
Land surface temperature (LST) is one of the key physical variables of land surface processes (Li et al., 2013).As an indicator of the regional and global surface energy and water balance (Duan et al., 2019;Liu et al., 2019;Ma et al., 2020;Zhang et al., 2022), LST has been used to detect climate change (Bright et al., 2017;Hansen et al., 2010;Jin and Dickinson, 2002;Li et al., 2015), estimate surface soil moisture (Bai et al., 2019;Song et al., 2022;Zhao et al., 2021), monitor vegetation (Duveiller et al., 2018;Sims et al., 2008;Weng et al., 2004), assess drought (Sánchez et al., 2018;Zhang et al., 2017), and study the urban thermal environment (Phan and Kappas, 2018;Si et al., 2022).Many of these applications require long-term observations made at regular temporal revisit intervals over large spatial scales (Hong et al., 2022).Compared to traditional ground observations, which are sparse, unevenly distributed, and able to obtain LST only at a specific point, satellite observations offer a valid opportunity to obtain LST data with a large and continuous spatial coverage.
LST cannot be measured directly by satellite but can be estimated from satellite-based thermal infrared (TIR) data (Li et al., 2013).To date, several methods for LST retrieval have been developed in accordance with TIR data, such as the mono-window algorithm (Qin et al., 2001), split-window algorithms (Becker and Li, 1990;Wan and Dozier, 1996), the temperature-emissivity separation algorithm (Gillespie et al., 1998), and the physical day and night algorithm (Wan and Li, 1997).Currently, a number of publicly available LST products exist that are based on various TIR instruments aboard satellite platforms and derived from different LST retrieval algorithms (Z.-L.Li et al., 2023).These LST products can be divided into three approximate categories according to their spatial-temporal resolutions and time periods: (1) global LST products with low temporal resolution but high spatial resolution, such as the Landsat LST product (16 d and 30 m) and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) LST product (16 d and 90 m) (Gillespie et al., 1998;Malakar et al., 2018), (2) global LST products with medium spatial resolution (1 km) and medium temporal resolution (twice daily), such as the Advanced Very High Resolution Radiometer (AVHRR) LST product, the (Advanced) Along-Track Scanning Radiometer LST product, the Moderate Resolution Imaging Spectroradiometer (MODIS) LST product, and the Visible and Infrared Imagery Radiometer Suite LST product (Hulley and Hook, 2018a, b;Prata, 2002;Trigo et al., 2011;Wan, 2006;), and (3) regional LST products with relatively low spatial resolution but high temporal resolution, such as the Advanced Baseline Imager LST product (America, 1 h and 2 km), the Spinning Enhanced Visible and InfraRed Imager LST product (Africa, 15 min and 3 km), the Advanced Geosynchronous Radiation Imager LST product (China, 1 h and 4 km), and the Advanced Himawari Imagers LST prod-uct (Japan, 1 h and 2 km) (Trigo et at., 2008;Yamamoto et al., 2018;Yang et al., 2017;Yu et al., 2008).In summary, the number of regional and global LST products derived from TIR data has increased, but global daily satellite-derived LST products with medium and high spatial resolution only date back to the year 2000.However, many application fields, including climate change, environmental monitoring, and meteorology, urgently require global LST products with twicedaily observations that include more than 40 years of available data (IPCC, 2014;Liu et al., 2019;Ma et al., 2020).Notably, AVHRR is the only sensor that has the advantages of frequent revisits (twice per day), relatively high spatial resolution (4 km at the nadir), global coverage, and easy access prior to 2000.
Several LST products were generated from AVHRR TIR measurements before 2000 (Table 1).These products can be broadly classified into two categories.The first includes regional products with relatively high spatial or temporal resolution.For example, the European Space Agency produced the World Land Surface Temperature Atlas dataset, which provides monthly LST data over Europe at 1 km and 0.5 • spatial resolution from 1992 to 1993 (Kerr et al., 1998).Moreover, Pinheiro et al. ( 2006) developed a regional daily, 8 km resolution, daytime and nighttime LST dataset over Africa for the NOAA-14 AVHRR from 1995 to 2000 (denoted as RT-LST).Khorchani et al. (2018) generated a longterm AVHRR LST dataset with a spatial resolution of 1 km for peninsular Spain at annual and seasonal timescales for 1981-2015.Furthermore, a long-term study by the TIME-LINE (Time Series Processing of Medium Resolution Earth Observation Data Assessing Long-Term Dynamics in our Natural Environment) project of the Earth Observation Center at the German Aerospace Center provided a long time series of almost 40 years of daily AVHRR LST at 1 km spatial resolution over Europe and northern Africa (Frey et al., 2012(Frey et al., , 2017;;Holzwarth et al., 2021;Reiners et al., 2021).The second category includes global products with low temporal resolution.For example, Ouaidrari et al. (2002) generated a global monthly average LST dataset at 8 km spatial resolution for January and July 1989, based on the AVHRR Land Pathfinder II project framework.Moreover, Jin (2004) provided a monthly global 8 km, 0.5 and 5 • resolution LST dataset based on the diurnal temperature cycle model, which spans a 17-year period (i.e., 1981-1998).A more recent study by Ma et al. (2020) generated a global historical daytime 0.05 • × 0.05 • LST product (denoted as GD-LST) by reprocessing the daytime AVHRR dataset (including reflectance data and brightness temperature data) provided by the Land Long Term Data Record (LTDR) for 1981-2000.In summary, these efforts are limited by covering only certain regions (e.g., Europe or Africa) or their coarse temporal resolutions (e.g., daytime or monthly).To develop a long-term (>40-year) satellite-derived LST product, it is necessary to generate a twice-daily AVHRR LST product that covers a period of more than 40 years or that can be combined with the existing satellite-derived twice-daily LST product (e.g., MODIS) after 2000.Moreover, global long-term meteorology and climatology-related applications also demand global and instantaneous AVHRR LST data with two observations each day, which can be used to estimate relatively accurate climate change indices such as the mean LST, extreme LST, and diurnal LST range.
In this study, we aim to fill this research gap by developing a standard global historical twice-daily (daytime and nighttime) LST product (GT-LST) at 0.05 • spatial resolution.GT-LST is derived from original long time series AVHRR Level-1b Global Area Coverage (GAC) data spanning a 41-year period .Section 2 introduces the data used in this study, including data for LST generation and validation.Section 3 describes the methodology for GT-LST generation and validation.Section 4 presents and discusses the results.Section 5 summarizes the main conclusions.

AVHRR datasets
The GT-LST product is derived from AVHRR sensors installed aboard the NOAA series of Polar Operational Environmental Satellites (POES) (Cracknell, 1997).According to the operational times of different POES satellites, NOAA-7/9/11/14/16/18/19 were selected to generate a global longterm LST from 1981 to 2021 (Fig. 1).The orbital period was about 102 min, producing 14 orbits per day (Kidwell, 1998).The AVHRR sensor has six spectral bands with a spatial resolution of 1.1 km at the nadir and scan angles of approximately ±55 • off the nadir (Table 2).Although the AVHRR sensors measure the same infrared bands, their spectral responses are not completely identical.Figure 2 shows the spectral responses of the two infrared bands of NOAA-7/9/11/14/16/18/19.
The commonly used AVHRR Level-1b GAC data are reduced-resolution data, which take the first scan line out of every three, average four of each five consecutive samples along the scan line, and are processed aboard the satellite in real time.Therefore, AVHRR Level-1b GAC data are generally treated as having a coarse resolution of 4 km at the nadir, and the pixel size increases with the satellite zenith angle (VZA).Furthermore, as the VZA increases, the geolocation accuracy of the AVHRR GAC scene becomes lower, particularly when VZAs are larger than 40 • (Wu et al., 2020).However, the AVHRR Level-1b GAC dataset is the only dataset in which every place on Earth has been sampled at least twice per day (daytime and nighttime) since 1981 (Kidwell, 1998).Thus, AVHRR Level-1b GAC data are available for generating global daytime and nighttime LST data from 1981 to 2021.AVHRR GAC data were archived in Level-1b format with 10 bit precision.Then the data were assembled into discrete datasets using full orbits with quality control.Each file contains video data for the six channels and time codes, quality indicators, Earth location, calibration information, and solar zenith angles (SZAs).AVHRR Level-1b GAC data were obtained from the NOAA Comprehensive Large Array-Data Stewardship System (https://www.avl.class.noaa.gov/saa/products/search?datatype_family=AVHRR, last access: 5 April 2023).

Datasets for generating simulations
To obtain the nonlinear generalized split-window (GSW) algorithm coefficients, it is necessary to establish a comprehensive simulation dataset.In this study, we used the latest version of the Thermodynamic Initial Guess Retrieval 2000 dataset, which is a reliable atmospheric profile dataset, and the ASTER spectral library, which is a collection of the Jet Propulsion Laboratory spectral library, Johns Hopkins University spectral library, and United States Geological Survey spectral library.
The Thermodynamic Initial Guess Retrieval 2000 dataset (V1.2) contains 2311 representative atmospheric situations that were carefully selected from 8000 global radiosonde reports (Chedin et al., 1985).Each situation consists of temperature, ozone concentrations, and water vapor values at a given pressure level from the surface to the top of the atmosphere.Finally, we obtained 946 globally representative and clear-sky atmospheric conditions by removing cloudy atmospheric conditions, i.e., removing the relative humidity at any pressure level exceeding 90 % or two adjacent pressure levels exceeding 85 %.The ranges of water vapor content (WVC) and near-surface air temperature values are 0.06-6.5 g cm −2 and 230-310 K under these atmospheric conditions.
The ASTER spectral library version 2.0 includes over 2300 spectra of natural and artificial materials covering the wavelength range from 0.4 to 15.4 µm.In this study, we used 54 land surface emissivity spectra to represent different land surface types, including 41 soil types, 4 vegetation types, 4 water body types, and 5 ice/snow types.The emissivity values of the AVHRR TIR channels were estimated by convolving the emissivity spectra with the relative spectral response functions of AVHRR bands 4 and 5.

Datasets for emissivity estimation
For nonlinear GSW, emissivity is an essential parameter in LST retrieval, and its accuracy directly affects LST accuracy.Four datasets were used for emissivity estimation, except for the Level-1b reflectance dataset of the GT-LST product: ASTER Global Emissivity Dataset (GED), Global Soil Regions Map (GSRM), and global yearly land cover dynamics of the Global Land Surface Satellite (GLASS-GLC) and MODIS land cover product (MCD12C1).
The ASTER GED product, which provides the global mean land surface emissivity in five ASTER TIR spectral bands with spatial resolutions of 100 m and 1 km on 1 Administration's Jet Propulsion Laboratory (Hulley et al., 2015).The emissivity of the ASTER GED was developed from all clear-sky ASTER data acquired over 2000-2008 using temperature-emissivity separation algorithms and the water vapor scaling atmospheric correction algorithm.This product also provides the mean LST, mean normalized difference vegetation index (NDVI), global digital elevation model, land-water mask, and other data.In this study, we used the ASTER GED mean emissivity and mean NDVI at 1 km spatial resolution.
The GSRM product provides the global distribution of 12 major soil types with an approximately 0.03 • spatial resolution.It was generated by the United States Department of Agriculture using a reclassification of the FAO-UNESCO Soil Map of the World combined with a soil climate map (https://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/use/?cid=nrcs142p2_054013, last access: 5 April 2023).
The GLASS-GLC product provides the first record of the 1982-2015 global yearly land cover dynamics with a spatial resolution of 5 km (Liu et al., 2020).It forms part of the global land surface satellite products and is generated using the Google Earth Engine platform.This land cover product contains seven types of land cover: barren land, tundra, cropland, grassland, shrubland, forest, and snow/ice.The average overall accuracy of each land cover type from 1982 to 2015 according to 2431 test sample units is 82.81 %.The GLASS-GLC product from 1982 to 2005 was used in this study.The MCD12C1 product provides global maps of land cover at a spatial resolution of 0.05 • and an annual time step, start-  and Friedl, 2018).In this study, the Collection-6 MCD12C1 product from 2006 to 2020 was employed.In the absence of available global land cover datasets for 1981 and 2021, the land cover data for 1982 and 2020 were used as a substitute in this study.
To match the GT-LST pixels, these global surface datasets were mosaicked and resampled to 0.05 • spatial resolution in terms of their geographic longitude and latitude.

Atmospheric water vapor content dataset
The ancillary dataset used for LST retrieval was the Modern-Era Retrospective Analysis for Research and Applications Version 2 Reanalysis dataset, tavg1_2d_slv_Nx, which provides an hourly time-averaged WVC (the variable name is TQV in this dataset) at 0.5 • × 0.625 • spatial resolution (https://disc.gsfc.nasa.gov/datasets?project=MERRA-2, last access: 5 April 2023).The TQV dataset was corrected to match the spatial resolution and overpass time of AVHRR prior to LST retrieval.

Validation datasets
Validation of product accuracy is necessary before applying a new LST product.In this study, ground-based validation, satellite product intercomparison, and comparison with existing AVHRR LST data were used to assess the accuracy of the retrieved product.
In situ measurements from the Surface Radiation Budget (SURFRAD) network and the Baseline Surface Radiation Network (BSRN) were used to validate GT-LST.(Augustine et al., 2000).In this study, we selected seven stations of the SURFRAD network representing various land cover types and providing in situ data between 1994 and 2005 (Table 3).SURFRAD sites provide quality-controlled measurements of solar/infrared upwelling/downwelling radiation.Upwelling and downwelling TIR radiances are the primary measurements used to retrieve in situ LST.The instrumental error of the SURFRAD station gives rise to uncertainty in the retrieved LST value of less than 1 K (Guillevic et al., 2012).Therefore, the LST from SURFRAD has been widely used to evaluate ASTER, MODIS, and VIIRS LST products (Wang et al., 2008;Wang and Liang, 2009).The BSRN has 76 stations that have detected important changes in the Earth's radiation field at the Earth's surface since 1992.These stations provide highquality surface and upper-air meteorological observations, which are important in supporting the validation and confirmation of satellites.We selected four sites with measurements of upwelling and downwelling TIR radiances before 2000 (Table 3).In situ LST measurements were estimated using the Stefan-Boltzmann law as follows: where LST s is the in situ LST, σ is the Stefan-Boltzmann constant, R ↑ and R ↓ are the upwelling and downwelling longwave radiation, respectively, and ε b is the broadband emissivity, which was derived from Duan et al. (2019).
The MODIS LST products (MYD11A1 and MYD21A1) were used to evaluate the accuracy of GT-LST.MYD11A1 LST is a daily level-3 LST product, which is a typical operational and standard LST product with a 1 km spatial resolution from 2002 to the present.MYD11A1 observations were obtained by the MODIS sensor aboard the Aqua satellites and pass through the Equator at approximately 13:30/01:30 local solar time.Every pixel has quality flags containing cloud contamination, emissivity, input data, and calibration.In this study, Collection-6.1 MYD11A1 of 2004 was selected for sensor-to-sensor comparison.The MYD21A1 LST product, which uses the same observations with MYD11A1 but uses the temperature-emissivity separation method to dynamically retrieve LST and emissivity, was also selected to make an intercomparison with GT-LST in this study.This intercomparison was conducted in four months in 2004 (January, April, July, and October), which cover different seasons.
Globally and regionally historical AVHRR LST products, GD-LST and RT-LST, were used to compare to GT-LST.GD-LST especially is the only currently available global daytime AVHRR LST, with a spatial resolution of 0.05 • × 0.05 • from 1981 to 2000.Compared to GT-LST, GD-LST is not derived from the original AVHRR Level-1b GAC datasets but from LTDR datasets that reprocess daytime AVHRR data such as the reflectance, top-of-atmosphere brightness temperature of TIR bands, and NDVI.RT-LST is a twice-daily LST product at 8 km resolution over continental Africa from 1995 to 2000, which is based on GAC data.Auxiliary data of RT-LST only include cloud mask and observation time without VZAs.

LST generation
This study developed an AVHRR LST processing system to produce a global historical twice-daily (daytime and nighttime) LST dataset with a 0.05 • spatial resolution from 1981 to 2021 (Fig. 3).The system includes four steps: (1) data reading, calibration, and preprocessing; (2) cloud detection; (3) land surface emissivity estimation; (4) LST retrieval.In the following subsections, we describe each major component of the processing system.

Data reading, calibration, and preprocessing
The first step in our framework includes reading, decoding, performing quality control, and calibrating packed 10 bit AVHRR Level-1b GAC data (Fig. 3).In this study, we used an open-source and community-driven package, Pygac, to process the 41-year AVHRR Level-1b GAC data record.Pygac is a Python package used for reading, calibrating, and navigating data from the AVHRR instrument in GAC and Local Area Coverage (LAC) formats (Devasthale et al., 2017).Many studies have processed AVHRR GAC/LAC data using this package (Karlsson et al., 2017;Pareeth et al., 2016;Wu et al., 2021).By inputting the AVHRR Level-1b GAC data and two-line elements of a satellite into the Pygac program, we can obtain calibrated quality-control (QC) flags, sun-satellite position, reflectance and brightness temperature data.The complete details of the package are provided at https://github.com/pytroll/pygac(last access: 5 April 2023).
We then remapped and rebinned the data into the World Geodetic System 1984 projection with 0.05 • grid cells.Owing to the wider scan angles of NOAA satellites, panoramic bow-tie effects were apparent at the edges of the images (Pareeth et al., 2016).Thus, we used the Pyresample package to resample the AVHRR Level-1b GAC data and correct for bow-tie effects.Further details of the package are explained at https://github.com/pytroll/pyresample(last access: 5 April 2023).In areas where multiple AVHRR observations were available for a given grid cell, especially at polar latitudes, we selected and stored only one observation per grid cell with the maximum brightness temperature from channel 4 (Pinheiro et al., 2006;El Saleous et al., 2000).We assumed that this observation had a lower possibility of including cloud.Then, we distinguished daytime and nighttime observations using SZAs to ensure compatibility with the cloud-detection Earth Syst.Sci.Data, 15, 2189Data, 15, -2212Data, 15, , 2023 https://doi.org/10.5194/essd-15-2189-2023algorithm (Stowe et al., 1999).If the SZA of a pixel was less than 85 • , the pixel and its observations were assigned to the daytime class; otherwise, they were assigned to the nighttime class.

Cloud detection
Currently, no global daytime and nighttime cloud mask datasets are available for AVHRR Level-1b GAC data be- Further details of the algorithm are provided by Stowe et al. (1999).In this study, we used the day-land and night-land algorithms of CLAVR-1 to identify clear and cloudy pixels and create a cloud mask dataset (Fig. 3).

Land surface emissivity estimation
To retrieve LST using nonlinear GSW, the land surface emissivity must be known a priori.The NDVI threshold method is an operationally simplified emissivity estimation method that is widely used to estimate emissivity from AVHRR observations (Liu et al., 2019;Ma et al., 2020;Sobrino et al., 2008).However, previous studies have combined this method with a fixed land cover dataset to determine the long-term emissivity (Frey et al., 2017;Ma et al., 2020;Reiners et al., 2021).
As an intrinsic property of the surface, land surface emissivity predominantly depends on the land cover type, which is highly temporally dynamic because of phenological changes and human activities.Therefore, to obtain relatively accurate emissivity values, we developed an improved method that considers annual changes in land cover from the GLASS-GLC dataset and combines ASTER GED data with the NDVI threshold method to estimate the emissivity (Fig. 3).First, we assumed that the emissivity of an AVHRR pixel can be described as the weighted ensemble of bare soil emissivity and vegetation emissivity, where the weights are determined by the vegetation cover fraction: (2) Here, ε i is the emissivity in channel i, ε i,v is the vegetation emissivity in channel i, ε i,s is the bare soil emissivity in channel i, and P v is the fraction of vegetation cover, calculated as follows: where NDVI max and NDVI min are the thresholds for pure vegetation and pure bare soil pixels, respectively.According to Sobrino et al. (2001), NDVI max and NDVI min were set to 0.5 and 0.2, respectively.When NDVI is no more than 0.2, the pixel is assumed to be pure bare soil with no vegetation cover; when NDVI is no less than 0.5, the pixel is assumed to be pure dense vegetation.Following Eq. ( 2), the bare soil component emissivity of ASTER channels 10-14 can be calculated as follows: where ε AST i,s is the bare soil emissivity in ASTER channel i (i = 10, . . ., 14), and ε AST i,v is the emissivity of dense vegetation in ASTER channel i.Because the emissivity spectra of dense vegetation are similar and vary slightly in the TIR region, we used the dense vegetation emissivity of ASTER channel i provided by Meng et al. (2016).ε AST i is the emissivity of the ASTER GED product in channel i. P v is calculated from the NDVI of the ASTER GED product according to Eq. (3).For long-term cloud-cover pixels and densevegetation pixels (P v = 1), the bare-soil emissivities of these ASTER pixels are null values.To generate a global gap-free bare-soil emissivity map of ASTER, we used the average emissivity of the same soil type within 5 × 5 neighborhood pixels to fill these null values.Because of some pixels with no valid neighbor pixels for averaging, we needed to enlarge the neighborhood until all null values are filled.Soil-type data are described in Sect.2.3.
Figure 2 shows that the spectral range of ASTER channels 10-14 covers AVHRR channels 4 and 5.A linear regression relationship was used to convert the bare soil emissivity values from ASTER channels to AVHRR channels.
where ε AVH j,s is the bare soil emissivity in the AVHRR channel j (j = 4, 5), and b 0 to b 5 are the coefficients provided by Ma et al. (2020).
The emissivity of each vegetation type in the GLASS-GLC dataset was obtained from Ma et al. (2020).Specifically, the vegetation type of a pixel was determined from the annual global land cover dataset (see Sect. 2.3).NDVI values were derived from the reflectance data of AVHRR channels 1 and 2 (see Sect. 3.1).In addition, the emissivity values of water pixels and ice/snow pixels were used to distinguish nonvegetated pixels.We then produced a daily dynamic global emissivity map for AVHRR channels 4 and 5. Further details can be found in Ma et al. (2020).

LST retrieval
To obtain the LST, we adopted the nonlinear GSW algorithm proposed by Wan (2014) because of its simplicity, efficiency, and high accuracy.The algorithm can be formulated as follows: with ε = (ε 4 + ε 5 ) /2 and ε = ε 4 − ε 5 , where T 4 and T 5 are the brightness temperatures measured in AVHRR channels 4 and 5, ε 4 and ε 5 are the land surface emissivity values in channels 4 and 5, ε is the average emissivity for these two channels, ε is the emissivity difference between these two channels, and a n (n = 0, 1, . . ., 7) are coefficients related to the WVC and satellite zenith angles.The coefficient simulation for the nonlinear GSW algorithm is based on the radiative transfer theory in a cloud-free atmosphere (Fig. 3).The channel radiance received at the top of the atmosphere in the TIR channel of the sensor can be described using the radiative transfer theory: where L i is the top-of-atmosphere radiance in channel i, ε i is the emissivity in channel i, B i is the Planck function, T s is the LST, τ i is the total atmospheric transmittance in channel i, and R atm↑ i and R atm↓ i are the thermal-path atmospheric upwelling and downwelling radiances in channel i, respectively.
To estimate the coefficients, the VZA sensor was set to 0, 33.56, 44.42, 51.32, 56.25, and 60 • .A moderate spectral resolution atmospheric transmittance algorithm and a computer model (MODTRAN, version = 5.2) were run using 946 clear-sky atmospheric profile data to simulate the atmospheric parameters.By convolving these parameters with the spectral response functions of the two AVHRR TIR channels, we obtained the channel atmospheric parameters of each VZA, including the total atmospheric transmittance and thermal-path atmospheric upwelling and downwelling radiances.To ensure that the simulation experiments were representative, the bottom air temperature (T bat ) of the profiles was adopted as the LST.Specifically, LST varies from T bat − 5 to T bat +15 K in 5 K intervals for T bat ≥ 290 K and from T bat −5 to T bat + 5 K in 5 K intervals for T bat <290 K (Tang, 2018).In a subsequent step, we converted the LST, channel atmospheric parameters (τ i , R atm↑ i and R atm↓ i ), and channel emissivity mentioned earlier to brightness temperature using the radiative transfer theory (Eq.7).The brightness temperatures and LST were then used for coefficient estimation according to Eq. (6).To improve the fitting accuracy for each VZA mentioned above, the averaged emissivity values, WVC, and LST were divided into two, six, and five subranges, respectively.More details can be found in Tang et al. (2008) and Liu et al. (2018).The coefficients a 0 to a 7 in Eq. ( 6) were obtained using the least-squares method for each subrange.
Finally, the LST product was retrieved in two steps.In the known subranges of emissivity and WVC, the initial LST was estimated with coefficients derived for the entire range of LST, whereas the ultimate LST was estimated using coefficients for a suitable LST subrange determined by the initial LST (Tang, 2018).

LST validation
To assess the quality of the GT-LST product, two classical LST validation approaches were used in this study: groundbased validation (Göttsche et al., 2016;Ouyang et al., 2017;Wang and Liang, 2009) and satellite product intercomparison (Guillevic et al., 2014;Trigo et al., 2008).To further demonstrate the preponderance of this product, we also compared GT-LST with historical AVHRR LST products (i.e., GD-LST and RT-LST).
Ground-based validation was performed between in situ LST obtained from seven stations within the SURFRAD network and four stations within the BSRN network and GT-LST from 1994 to 2005.Four criteria were used to guarantee the validation results.(1) The two LST datasets were accurately matched under the condition of geolocation.(2) Time differences between in situ LST and GT-LST acquisition of less than 3 min were permitted, as measurements were provided by the SURFRAD network every 3 min.(3) We only used high-quality data of GT-LST (QC = 0) and in situ data with the quality flag corresponding to high-quality data.(4) To further minimize the effect of cloud contamination, a popular method, "3σ -Hampel identifier", was employed to further remove cloudy samples (Duan et al., 2019).
Here, x k are the differences between GT-LST and in situ LST, and x m is the median of the dataset {x k }.Matchups with differences of less than x m −3σ or greater than x m +3σ were regarded as cloudy contamination.
In this study, satellite product intercomparison was performed between GT-LST and the MODIS LST products (MYD11A1 and MYD21A1).Because these two MODIS LST products have provided daily LST since 2002, the comparisons were limited to data in 2004 (see Sect. 2.1).Five criteria were used to guarantee the validation results.
(1) MODIS LST matched GT-LST in space.(2) Because MODIS LST has a finer spatial resolution than GT-LST, MODIS LST was spatially aggregated to the GT-LST pixel scale with a simple arithmetic mean and a rigorous standard that all MODIS pixels within a GT-LST pixel must be valid.
(3) Differences in the acquisition time between MODIS LST and GT-LST of less than 15 min were permitted.(4) Differences in VZAs between MODIS and GT-LST were not more than 15 • .( 5) We only use high-quality LST values of MODIS (QC = 0, i.e., good-quality data with no need to examine more details) and GT-LST (QC = 0).
In contrast to the ground-based validation and satellite product intercomparison mentioned above, the comparisons for AVHRR LST products were performed using different strategies.Concretely, GT-LST during daytime was compared with that of GD-LST using a strategy that compares GT-LST and GD-LST with the same SURFRAD measurements concurrently with the satellite overpass to evaluate the difference in the absolute accuracy of these two products.GT-LST was compared with RT-LST using two strategies.
(1) Two days, 15 January and 15 July 1997, were selected to implement the comparison over continental Africa because they represent the median time of different seasons (winter and summer, respectively).( 2 We first compared GT-LST data with in situ LST data at the BND, DRA, FPK, GWN, PSU, SXF, and TBL sites from the SURFRAD network for 1994-2005 (Fig. 4).Each scatterplot shows the overall validation count, RMSE, bias, standard deviation, and coefficient of determination (R 2 ).First, the GWN site had the most data points matching the GT-LST, which meant that more data passed the validation criteria shown in Sect.3.2 at this site than at other sites.The stations with the next highest number of matching data points were BND, DRA, FPK, and TBL.The stations PSU and SXF had the least valid points because the time period for these two sites was smaller (1998-2005 and 2003-2005).The overall RMSE range was approximately 1.6-4.0K (Fig. 4), 1.8-4.8K for daytime observations, and 1.0-3.3K for nighttime observations (Table 4).The RMSEs of all the sites except PSU for nighttime observations were less than 3.0 K. Compared to daytime observations, nighttime observations of all sites except GWN and PSU had better accuracy with lower RMSEs.This is because in situ LST measurements during the daytime do not necessarily have good spatial representativeness for the satellite sensor footprint (Duan et al., 2019;Göttsche et al., 2016).In contrast, the LST was more spatially homogeneous at night.The BND site exhibited low accuracy with the largest RMSE and bias values; this result was also confirmed by previous studies (Liu et al., 2019;Ma et al., 2020;Reiners et al., 2021).A positive bias (GT-LST-in situ LST) was found for all SURFRAD sites except for daytime observations at the GWN stations.Furthermore, R 2 values between the retrieved LST and in situ LST ranged from 0.94 to 0.99, indicating a high correlation between these data.We further compared GT-LST data with in situ LST data at the BAR, NYA, PYA, and TAT sites from the BSRN network for 1995-2005.Figure 5 shows the scatterplots between GT-LST and in situ LST at four BSRN sites.The accuracy of the GT-LST product at BSRN sites is relatively worse than that at SURFRAD sites, with RMSE (bias) ranges from 3.1 K (−2.7 K) to 4.0 K (2.5 K).It should be noted that the relatively poor accuracy at BSRN sites is possible due to the large spatial heterogeneity of LST at these sites.Many studies have obtained similar results.For example, Duan et al. (2019) evaluated the accuracy of the Collection-6 MODIS LST data based on in situ LST observations and obtained large RMSE values (>2 K) during the daytime.Moreover, Martin et al. (2019) evaluated the accuracies of several LST products (AATSR, GOES, MODIS, and SEVIRI) based on multiple years of in situ LST observations and concluded that the average daytime and nighttime accuracies over the entire time span were within ±4 and ±2 K, respectively.Furthermore, Ma et al. (2020) and Liu et al. (2019) compared AVHRR LST with in situ LST during the daytime and re- vealed RMSE variations of 2.3-3.9 and 2.2-4.1 K, respectively.Therefore, the accuracy of GT-LST is encouraging.

Comparison with MODIS LST
An intercomparison between GT-LST and MYD11A1 LST was performed on a global scale for 2004 (see Sect. 3.2).Specifically, Fig. 6 shows the daytime and nighttime RMSD values of 3.4 and 3.1 K and that of positive bias of 1.4 and 2.4 K between GT-LST and MYD11A1 LST for 2004, respectively.This result is similar to that of Reiners et al. (2021), who compared a regional 1 km AVHRR LST product of the TIMELINE project with MODIS LST for Earth Syst.Sci.Data, 15, 2189Data, 15, -2212Data, 15, , 2023 https://doi.org/10.5194/essd-15-2189-20232003-2014 and reported RMSD and bias values of approximately 2.7 and 2.2 K, respectively.However, as can be seen in the red box of Fig. 6, there are some considerable scattered samples (111 samples) which perform large LST differences (more than 20 K). Figure A1 shows that all scattered samples are barren land cover type and arid climate type.About twothirds of all the samples (77 samples) happened in Haiya, Sudan, on 31 March 2004.The samples of the rest (34 samples) happened in Taif, Saudi Arabia, on 2 April 2004.For these samples, we double-checked variables that are essential in GT-LST retrieval.The result showed that values of all the variables are reasonable, except BTs of TIR bands.Abnormally high BTs at these nighttime samples were found on 31 March and 2 April 2004 (Fig. A2), which led to extremely high LSTs.The possible reasons for abnormally high BTs may be instrument failure on these two days.
Figure 7 shows the RMSD and bias between GT-LST and MYD11A1 LST for 2004 over various land cover types.The RMSD varied from 2.1 to 4.2 K, and the bias varied from approximately 0.6 to 3.3 K. Specifically, savannas and cropland/natural vegetation mosaics had an RMSD of larger than 4 K.The permanent snow and ice and the water bodies' land cover types had an RMSD of less than 2.5 K, with the water bodies exhibiting the lowest RMSD of 2.1 K.We further analyzed the land cover types of different groups.Forests except deciduous broadleaf forests, including evergreen needleleaf forests, evergreen broadleaf forests, deciduous needleleaf forests, and mixed forests, had an RMSD of less than 3 K.Shrublands, including open shrublands and closed shrublands, had a similar RMSD of 3.3 K. Savannas and croplands, including woody savannas and savannas, croplands, and cropland/natural vegetation mosaics, respectively, had the largest RMSDs.The possible reason is that the fraction of the vegetation cover of savannas and croplands varies greatly due to the influence of natural and human factors, which leads to the underestimation of emissivity compared with the fixed emissivity of MYD11A1, resulting in an overestimation of LST.Snow and ice and water bodies had the smallest RMSDs.
Spring (March-May), summer (June-August), fall (September-November), and winter (December-February) of 2004 were used to perform a seasonal intercomparison at a global scale.Figure 8 shows the GT-LST versus MYD11A1 LST during different seasons.The plot shows a strong correlation, with R 2 values greater than 0.97 and a positive bias between GT-LST and MYD11A1 LST in each season.The RMSDs of each season varied from approximately 3.0 to 3.5 K.Moreover, we observed a seasonal pattern, with a higher RMSD and bias in spring and summer and a lower RMSD and bias in fall and winter.As noted above, these validation results are encouraging.However, GT-LST was overestimated when compared with MYD11A1 LST.A reasonable explanation could be that the emissivity used for the retrieval of AVHRR LST was lower than that of MYD11A1 LST.Specifically, the emissivity of MYD11A1 LST was derived from the classification-based method, whereas that of GT-LST was derived from the NDVI threshold method, which considers annual changes in land cover and dynamically retrieves daily emissivity.As a result, the dynamic emissivity of GT-LST is typically lower than that of MYD11A1, which leads to overestimation of the LST (Hulley et al., 2016;Guillevic et al., 2014;Reiners et al., 2021;Ren et al., 2011).Figure A3 shows that the mean biases (GT-LST -MYD11A1) for LSTs calculated with emissivity differences less than −0.05, between −0.05 and −0.03, between −0.03 and −0.01, between −0.01 and 0.01, and more than 0.01 are 7.0, 4.3, 2.3, 0.8, and 0.7 K, respectively.To further demonstrate this point, we compared GT-LST with MYD21A1 LST. Figure 9 shows the daytime and nighttime RMSD values of 3.2 and 2.5 K and that of bias of 0.1 and 1.3 K between GT-LST and MYD21A1 LST for 4 months in 2004.Compared to the result of MYD11A1, the significantly smaller bias was obtained for MYD21A1.The possible reason is attributed to the fact that the MYD21A1 LST uses the same observations with MYD11A1 but uses a physics-based method to dynamically retrieve emissivity.

Comparison with existing AVHRR LST data
A recent study by Ma et al. (2020) generated a global historical daytime 0.05 • × 0.05 • LST product from NOAA AVHRR data for 1981-2000 (see Sect. 2.5).To further validate the GT-LST product, we compared these two LST products at the selected SURFRAD sites (see Sect. 3.2).The results of the daytime comparison, shown in Fig. 10, were as follows.First, comparing these two AVHRR LST products to the same in situ LSTs showed that both GT-LST and GD-LST obtained approximately similar accuracies, with an overall RMSE of 3.0 K. Except for the BND and FPK stations, GT-LST showed higher accuracy for all sites, especially the GWN and PSU stations, which had RMSE values of less than 2 K.All the sites showed positive biases for GT-LST other than GWN, whereas only BND and FPK had positive biases for GD-LST.
However, GD-LST data are limited in that they are only obtained during the daytime, which somewhat limits their practical applications.Meteorology-and climatology-related applications require at least two instantaneous LSTs (i.e., one daytime LST and one nighttime LST) to estimate temperature-based climate change indices such as the mean LST, extreme LST, and LST ranges for different temporal  scales.In contrast, the GT-LST product significantly improved the generation of the two instantaneous LSTs per day (Fig. 11).Furthermore, many studies have shown that two satellite observations that are separated by approximately 12 h can be used to estimate a relatively accurate daily and monthly mean LST (i.e., DMLST and MMLST) (Chen et al., 2017;Liu et al., 2023;Xing et al., 2021).Therefore, it was possible to derive an estimate of the global accurate DMLST and MMLST based on the average value of daytime and nighttime overpasses of the AVHRR sensors (Fig. 12).To estimate MMLST, first obtain the mean instantaneous clear-sky LST at daytime and nighttime and then use these mean values to estimate MMLST according to the simple linear regression method (see Appendix B).In order to validate the accuracy of MMLST results, we compared MMLST based on GT-LST with that of in situ LST observations from SURFRAD sites for 1994-2005.All in situ LST measurements are all-sky and complete in a certain month, which means that the in situ MMLST is the true MMLST.Figure 13 showed that MML-STs derived from GT-LST are related to the true MMLST, with an R 2 value of 0.94 and an RMSE value of 2.7 K. This result is similar to that of Chen et al. (2017)  reported RMSE values of approximately 2.7 K. However, a positive bias of 1.3 K between GT-LST MMLST and in situ MMLST should be noted.One possible reason is that the in situ MMLST of some sites does not represent the MMLST over the 0.05 • × 0.05 • pixel.Moreover, a comparison between GT-LST and RT-LST was performed during daytime and nighttime over continental Africa on 15 January and 15 July 1997 (Fig. 14).As can be seen, GT-LST and RT-LST had an RMSD of more than 2.1 K and a bias of more than 1.1 K.A likely explanation is that the emissivity of GT-LST is lower than that of RT-LST, which leads to overestimation of the LST.Compared to daytime LST, nighttime LST had an improvement with lower RMSD due to the comparatively spatially homogeneous LST during night.Furthermore, the RMSD of 15 July is distinctly higher than 15 January because the atmospheric condition is hot and wet on 15 July and cool and dry on 15 January.

Benefits, limitations, and future prospects
To the best of our knowledge, a global historical twice-daily LST dataset for the period 1981-2021 has never before been generated because of the limitations of large amounts of original Level-1b data handling (i.e., approximately 14 TB), huge amounts of process variable data generation (i.e., approximately 10 TB), and complicated data-processing flow design.Based on the experience of other research institutions and scholars, we generated the GT-LST product based on AVHRR observations, which showed advantages in spatial coverage and temporal resolution compared to existing studies.Moreover, to obtain a relatively accurate emissivity, we used an improved method that considers annual changes in land cover to estimate the emissivity.The GT-LST product, with two observations every day, can provide daily, monthly, and yearly mean LST datasets.This can reduce the number of gaps and uncertainty in instantaneous LST data.Furthermore, the mean LST is more valuable than the instantaneous LST for global climate change.Although many LST products can provide global twice-daily LST after 2000, we still extend the time span of GT-LST to 2021.In this way, users can obtain a relatively homogeneous twice-daily LST product over a long time series.Furthermore, the overlapping observations between the GT-LST product and other LST datasets during the extension period can be used to calibrate the bias when combining these datasets.In conclusion, the GT-LST product is suitable for detecting climate changes over the past 40 years, such as global extreme LST changes and trends of global mean LST.
However, it should be noted that observations of equatorial crossing time for NOAA afternoon satellites become progressively later after launch (Fig. 15).As the orbit drifts, the AVHRR sensors change the illumination conditions and local solar times of observations.Users are therefore urged to be cautious when using the AVHRR LST product, especially in the LST range.The timings of the occurrences of maximum and minimum LSTs are approximately 13:30 and 01:30 local solar time, respectively, which correspond to the initially observed time of NOAA afternoon satellites.However, the overpass time of these satellites gradually drifts backward because of drift in the satellite orbits over time.For example, the initial NOAA-14 overpass (equatorial crossing) time was 13:30 local solar time (descending) in 1994 but had shifted to 16:30 local solar time by the end of 2000.Although several studies have proposed correction methods for this problem, the accuracy of the AVHRR LST after orbital drift correction is lower than that without orbital drift correction (Liu et al., 2019).Although the GT-LST product extends the time span of LST data, it has a number of missing values (Fig. 11).For MMLST, it still has a few gaps (Fig. 12b).A variety of factors such as cloud cover, orbital gaps, and instrument failure   To assess the accuracy of this product, we employed three evaluation methods.Ground-based validation, which involved a comparison between the GT-LST product and multiyear SURFRAD and BSRN in situ measurements from 1994 to 2005, showed that the R 2 values of all selected data were greater than 0.92, and the overall RMSE range was approximately 1.6-4.0K: 1.8-4.8K for daytime observa-Earth Syst.Sci.Data, 15, 2189Data, 15, -2212Data, 15, , 2023 https://doi.org/10.5194/essd-15-2189-2023tions and 1.0-4.2K for nighttime observations.These results suggested competitive accuracy with other satellite-derived LST products.Intercomparison with the satellite products MYD11A1 and MYD21A1 LST showed that, (1) in 2004, the overall RMSD was 3.2 K and the bias was 1.8 K between GT-LST and MYD11A1 LST, (2) according to RMSD values between GT-LST and MYD11A1 LST, nighttime data were more accurate than daytime, as LST is more spatially homogeneous at night, (3) a higher RMSD and bias between GT-LST and MYD11A1 LST were observed in spring and sum-mer, whereas a lower RMSD and bias were observed in fall and winter, and (4) compared to the result of MYD11A1, the significantly smaller bias was obtained for MYD21A1.Comparisons with existing AVHRR LST products (i.e., GD-LST and RT-LST) showed that (1) GT-LST and GD-LST products at the selected measurements of SURFRAD sites exhibited similar accuracies, with an overall RMSE of 3.0 K, (2) GT-LST showed a substantial improvement from GD-LST that is only obtained during the daytime, because it generates two instantaneous LST values (daytime and nighttime) every day https://doi.org/10.5194/essd-15-2189-2023Earth Syst.Sci.Data, 15, 2189Data, 15, -2212Data, 15, , 2023 and then can estimate the global DMLST and MMLST, (3) daytime and nighttime observations of GT-LST can provide relatively accurate MMLST under all-sky conditions, with a RMSE of 2.7 K, and (4) compared with RT-LST over continental Africa in different seasons, the results showed that the RMSD range was 2.1-4.1 K and the bias range was 1.1-3.4K.
Appendix A: Supplementary tables and figures

Figure 3 .
Figure 3. Schematic of the workflow used to generate the GT-LST product.

Figure 12 .
Figure 12.Daily and monthly mean LST based on GT-LST: (a) daily mean LST for 27 July 1997; (b) monthly mean LST in August 2000.

Figure 13 .
Figure 13.Monthly mean LST based on GT-LST versus monthly mean LST based on in situ LST from 1994 to 2005.

Figure A1 .
Figure A1.Distribution of the 111 scattered samples.

Figure A4 .
Figure A4.Monthly mean LST based on GT-LST using a simple average method versus monthly mean LST based on in situ LST from 1994 to 2005.

Table 1 .
Characteristics of LST products generated with AVHRR data.

Table 3 .
Details of the validation sites used in this study.

Table 4 .
GT-LST versus in situ LST during the daytime and nighttime.

Table A1 .
Statistics for the relationship between the regressions of the eight combinations and actual monthly mean LST.