Estimation of surface air temperature over central and eastern Eurasia from MODIS land surface temperature

Surface air temperature (Ta) is a critical variable in the energy and water cycle of the Earth–atmosphere system and is a key input element for hydrology and land surface models. This is a preliminary study to evaluate estimation of Ta from satellite remotely sensed land surface temperature (Ts) by using MODIS-Terra data over two Eurasia regions: northern China and fUSSR. High correlations are observed in both regions between station-measured Ta and MODIS Ts. The relationships between the maximum Ta and daytime Ts depend significantly on land cover types, but the minimum Ta and nighttime Ts have little dependence on the land cover types. The largest difference between maximum Ta and daytime Ts appears over the barren and sparsely vegetated area during the summer time. Using a linear regression method, the daily maximum Ta were estimated from 1 km resolution MODIS Ts under clear-sky conditions with coefficients calculated based on land cover types, while the minimum Ta were estimated without considering land cover types. The uncertainty, mean absolute error (MAE), of the estimated maximum Ta varies from 2.4 °C over closed shrublands to 3.2 °C over grasslands, and the MAE of the estimated minimum Ta is about 3.0 °C.


Introduction
Surface air temperature (T a ) is a critical variable in the energy and water cycle of the Earth-atmosphere system and is a key input element for hydrology and land surface models. It is a very important variable in agricultural application and climate change studies. For example, T a is used to monitor regional evapotranspiration in the biophysical processes and model crop growth for food security assessment [1], local climate and diseases [2,3] and climate change [4], etc.
Traditionally, T a is obtained from meteorological stations at ∼2 m height above the ground. In general, meteorological stations are sparsely distributed, especially in rural areas. Information about spatial variation of T a is often limited. Satellite data have synoptic spatial coverage compared to the discrete spatial distribution of weather stations, thus attracting researchers to use satellite data to fill in the gaps inherent in weather station data. A different variable, the land surface temperature (T s ) or surface skin temperature, is measured in situ or retrieved from satellite measurements. Observations have shown that T a and T s are significantly different in terms of magnitude and diurnal cycle due to the different heat capacities between air and land, and weather conditions [5]. On the other hand, T a and T s are correlated due to heat exchange between the land and the air. Recent studies have shown that estimated minimum T a (T a min ) values from nighttime MODIS land surface temperature (T s night ) values are statistically meaningful over Africa, Mississippi State (USA) and in Alpine areas, but the estimated maximum T a (T a max ) values from daytime MODIS T s day values have large errors in some regions [6][7][8]. Studying the MODIS T s and T a relationship over Africa, Vancutsem et al [6] found that the differences between daytime T s and maximum T a (T s day − T a max ) vary significantly by region and season. The paper indicated that no good relation was observed between (T s day − T a max ) and normalized difference vegetation index (NDVI) or solar zenith angle (SZA) that could be used operationally over Africa.
Over northern Eurasia, especially within the central and eastern Asia region, dramatic climatic, environmental and socioeconomic changes have occurred during the past century. These include significant changes in land cover and land use, floods and droughts, dust storms, desiccation of the Aral Sea, disintegration of the Soviet Union and formation of independent countries in Central Asia. International research programs, such as Northern Eurasian Earth Science Partnership Initiative (NEESPI) and Monsoon Asia Integrated Regional Study (MAIRS), have been created to study the climate, land use and environment changes, and human activities. In supporting the research of the NEESPI [9,10] and MAIRS programs, the NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) has collected relevant satellite remotely sensed and modeled atmosphere, land and ocean products. More recently, 1 km resolution MODIS land products, such as eight-day T s , monthly vegetation index and yearly land cover types over the Asian Monsoon region have been added into their collections. It is extremely useful to have higher resolution, such as 1 km, T a for local climate and environmental change studies in the central and eastern Asia region to account for significant change in land cover type in recent years.
The purpose of this study is to develop a method for estimating T a from satellite retrieved T s over two Eurasian regions: northern China and the former Union of Soviet Socialist Republics (fUSSR). The accuracy or statistical errors of the estimated T a have been investigated by validating the results against the data from ground measurements.

Station data
In this study, T a come from ground meteorological stations over two northern

MODIS data
The study used MODIS daily T s , monthly vegetation index (VI), and yearly land cover type (LCT) from USGS Land The MODIS T s data have a large amount of missing data due to cloud, heavy aerosols and gaps between satellite swath, etc. Some previous similar studies have used MODIS eight-day T s products [6,8]. Considering that the value of the eight-day T s was calculated by averaging all valid points under clear-sky only and the number of data points participating in the average could vary from 1 to 8, it may introduce uncertainty due to sampling if comparing eight-day T s (clear-sky) and eight days averaged T a (all-sky). Therefore, we have decided to use daily T s product for comparing clear-sky days only instead of using time averaged T s . There are no year 2000 LCT data available, therefore the 2001 LCT data were used here, assuming no significant LCT change from 2000 to 2001.
MODIS daily T s at 1 km resolution of Collection-5 from Terra (MOD11A1.005) is a Level-3 global product. The MODIS T s is derived from two thermal infrared bands: 31 (10.78-11.28 µm) and 32 (11.77-12.27 µm) using a splitwindow algorithm that corrects for atmospheric effects and emissivity using a look-up table based on global land surface emissivity in the thermal infrared bands [11]. Some validation studies have shown that the accuracy of MODIS T s is less than 1 K at 1 km resolution under clear-sky conditions with known emissivity, higher errors may occur at large view angles and in semiarid regions [12,13]. Another validation study indicated that nighttime MODIS T s are underestimated by 2-3 K at some locations [14]. Higher errors in arid regions may exist due to effects of heavy dust aerosols and greater uncertainties in classification-based surface emissivities. The error in T s contaminated by clouds and heavy aerosols can be very large (4-11 K or even larger). The T s that are severely contaminated by clouds and heavy aerosols were removed from MODIS Collection-5 products using empirical constraints on temporal variations in clear-sky T s . However, it is very difficult for the MODIS cloud-mask to discriminate all pixels affected by clouds and heavy aerosols from clear-sky pixels, particularly cirrus, near cloud edges, and/or with sub-pixel clouds [12]. The accuracy of T s is affected by the wind speed and at highly reflective surfaces in the visible/near-infrared range, such as snow/ice cover, as well [11].
The MODIS-Terra (MOD13A3.005) monthly vegetation index product at the same spatial resolution as T s is used. The product contains both the normalized difference vegetation index (NDVI) and the enhanced vegetation index (EVI), retrieved from blue, red, and near-infrared reflectance, centered at 469 nm, 645 nm, and 858 nm channels, respectively [15]. The MODIS NDVI and EVI products are computed from atmospheric corrected bi-directional surface reflectance values that have been masked for water, clouds, heavy aerosols, and cloud shadows. The monthly NDVI are used in this study.
MODIS yearly LCT at 500 m resolution (MCD12Q1.005) is produced by combining the inputs from both MODIS-Terra and -Aqua. The product includes five different land cover classification schemes, derived through a supervised decisiontree classification method. This study uses the primary land cover scheme that identifies 17 land cover classes defined by the International Geosphere Biosphere Programme (IGBP), including 11 natural vegetation classes, three developed and mosaicked land classes, and three non-vegetated land classes [16].
The standard MODIS 1km resolution T s , VI and 500 m resolution LCT data files are 10 • × 10 • tile-based and gridded in the sinusoidal projection. In this study, for easy use, the MODIS data were stitched for the study region and reprojected with the nearest point method onto cylindrical equidistant projection by using the MODIS Reprojection Tool (MRT) [17] provided by LP DAAC. As shown in figure 1, stations over northern China and fUSSR are located in different land surface environments. Most stations over northern China are near barren and sparsely vegetated areas, grasslands, and croplands, while most stations over fUSSR are close to crop and natural mosaic, mixed forest, and croplands.

Time series comparison
For each station, T s values are extracted from the 1 km daily files at the nearest grid point for the daytime and nighttime, respectively. The data have a significant amount of missing points due to cloud, heavy aerosols, gaps between satellite swath, and failed retrieval under certain conditions. On average, the daytime T s has only about 44% valid points over northern China, and about 38% valid points over fUSSR; the nighttime T s has about 56% valid points over north China, and about 40% over fUSSR.
Time series of T a max and T a min (black lines) are plotted together with MODIS T s day and T s night (blue dots) for each station. For most stations, T s day is higher than T a max during the summer time, but is opposite during the winter time. Similarly, T s night is higher than T a min during the summer time and is reversed during the winter time. The amount of difference (T s day −T a max ) varies from one station to another. It is noticed that (T s night − T a min ) is smaller than (T s day − T a max ) over most stations. Figure 2 shows plots of sample time series at three stations over different LCTs; two stations are in northern China and one is in fUSSR. It is clear that (T s day − T a max ) are very different at the three stations. Station A (ID = 51644 at 83.07 • E, 41.71 • N) is located over the barren or sparsely vegetated land, with maximum monthly NDVI of 0.14. T s day at station A is significantly higher than T a max during the summer and the largest difference exceeds 15 • C. The time series of station B (ID = 54027 at 119.40 • E, 43.98 • N) looks different from that over station A. T s day and T a max have no significant differences in the summer at this station. Station B is over the urban area surrounded by grasslands and croplands, with maximum monthly NDVI of 0.69. The third time series is at station C (ID = 22200028264 at 65.40 • E, 58.10 • N), which is located in the evergreen needleleaf forest, with maximum monthly NDVI of 0.73. T s day at this station is similar to T a max during the summer, while it is colder than T a max during the winter. There are no T s data in the first two months at station C as the MODIS daily data are available since 5 March 2000. In all three cases, the difference between T s night and T a min seems smaller than the difference between T s day and T a max and is not sensitive to the season.

Statistical relations between T a and MODIS T s
Statistical relations between T a and MODIS T s , such as correlation coefficient (r ), mean difference (mean), standard deviation (STD), mean absolute error (MAE), as well as linear regression coefficients (slope, offset) have been calculated for each station. It was found that (a) the correlation coefficients are high, ranging from 0.83 to 0.99 at 0.001 significance level; (b) the linear regression coefficients vary significantly from one station to another between T s day and T a max , but not between T s night and T a min .
The MODIS LCT data for each station location were used to figure out any connection between the slope change and LCT. Classes of nine grid points centered at the station were extracted from the IGBP classification. The LCT of a station is defined as the class at the nearest grid point to the station location and the other eight points are used to check its surrounding environment.
Analyses have indicated that the slope of the linear regression between T s day and T a max is closely associated with the LCT where a station is located. The value of slope is relatively low if a station is near barren land, while the value is higher if a station is near forest or croplands. Table 1 lists the statistical values for stations grouped by LCT. Over northern China, among 75 stations, about a half of the stations are located in urban and built-up areas (urban). The other stations are mainly located in croplands, grasslands, barren land, and closed shrublands. There are only two or fewer stations over other LCTs, and, therefore, their statistical values ware not computed. The differences of the statistics among groups are significant for the daytime, but not for the nighttime. The averaged (T s day − T a max ) varies from 6.1 • C over the barren land to 0.6 • C over the croplands. The MAEs vary from 6.7 • C over barren land to 3.6 • C over closed shrublands.
Over urban stations, the slope varies widely from 0.61 to 0.96. The urban stations over northern China are surrounded by different LCTs. It is noticed that the slope is relatively smaller at urban stations near dryer and less vegetated areas. To find if there is a good relation between the slope and vegetation condition, NDVI values at the nearest point are extracted from MODIS 1 km monthly data. It is found that, over urban stations, the slopes and the maximum monthly mean NDVI have positive correlations. Similar relationships are found between slope value and yearly accumulated NDVI. Thus, the urban stations over northern China are divided further as urban-1 (NDVI < 0.4) and urban-2 (NDVI 0.4).
Over fUSSR, 58 stations are mainly distributed over six LCTs: evergreen needleleaf forest, mixed forest, woody savannas, croplands, urban and cropland/natural vegetation mosaic. There are only two or fewer stations over other types and no statistics were computed. Differently from northern China, in general, T s day is lower than T a max over fUSSR. The  figure 2. This may be due to strong energy exchange between the land surface and the near surface air while T s and T a depend highly on the seasonal change of the solar irradiance. The T a is also influenced by weather conditions, such as winds, humidity, and cloud, etc. What is the correlation if the annual cycle is removed? To examine it, we have selected two stations over different LCTs. Station #1 (ID = 50949 at 124.87 • E, 45.08 • N, croplands), has a higher slope value (0.95) between T s day and T a max . Station #2 (ID = 51644 at 83.07 • E, 41.72 • N, barren), has a lower slope value (0.78). The lower slope value indicates that the difference between T s day and T a max depends more on the season. The daily T a climatology was calculated by using 30 years of data from 1980 to 2009. The correlation coefficient at station #1 is 0.78 with 154 degrees of freedom (DF) for T s day and T a max anomalies, and 0.61 with 205 DF for T s night and T a min anomalies, respectively. For station #2, the correlation coefficient is 0.31 with 190 DF for T s day and T a max anomalies, and 0.24 with 237 DF for T s night and T a min anomalies. At both stations, the correlations between T s anomaly and T a anomaly pass the 0.001 significance level, although the values are smaller compared to the original data (0.95-0.98), especially for stations at barren land. When performing the correlation significance test, T s and T a anomalies are assumed in normal distribution and made null hypothesis. This indicates that the high correlations between T s and T a are composed by the low frequency annual cycle and high frequency signals. To better estimate T a from T s , the input data sample group should be considered to have both an annual cycle and high frequency signals. Thus, the statistical relationships between T s and T a should be evaluated by using data from an entire year, not their anomaly or from a season only. Figure 3 shows scatter plots of T s day and T a max for three LCT groups. The slope values are (a) smallest for the barren group, (b) medium for the croplands group, and (c) largest for the mixed forest group. The correlation between T s and T a is high for all groups, varying from 0.93 for the barren group to 0.97 for the croplands and the mixed forest group. Over the barren land, the T s values are higher than the T a values when T a is more than about 10 • C. The T s values are much higher than the T a values in the summer over the barren land, while T s is slightly lower than T a in the winter. The mixed forest group is very different from the barren group. The T s is lower than T a for most data points in the mixed forest group, and the T s is higher than T a only when the temperature reaches about 40 • C.

Estimation of surface air temperature
As documented in section 4, the linear correlations between T a and MODIS T s are significant. Therefore, it is reasonable to estimate T a with a linear regression equation: where T a is the estimated T a , a and b are coefficients. For estimating T a max , a and b were calculated from all stations in each LCT group as shown in table 1. For estimating T a min , a and b were calculated from all stations in each geographic region as they are independent of LCT within the geographic region.
Comparing regions of northern China and fUSSR, besides those of class 'urban', 'croplands' is the only class type that appears in both regions. The regression coefficients of this class differ slightly for the two regions. To have a clear picture of the difference in T a if using linear relationships trained from two different regions, a set of T a is calculated from T s . In general, the differences between the T a values calculated by using coefficients from the two regions are less than 10%. T a is lower than T s during the summer time by about 4.0 • C when T s is 40.0 • C, but is higher than T s during the winter time by 7.0 • C when T s is −30.0 • C.
T a was calculated daily for each station where the LCT is listed in table 1. If the LCT at a station is not listed in table 1, T a at this station was not computed and was set as undefined. Validation of estimated T a was carried out for each LCT group in the two regions, as listed in table 2. The mean (T a − T a ) is zero. Over northern China, the STD of (T a max − T a max ) varies from 3.1 • C over closed shrublands to 4.1 • C over grassland; the corresponding MAE varies from 2.4 to 3.2 • C. The STD of (T a max −T a max ) over fUSSR varies from 3.4 • C over evergreen needleleaf forest to 3.7 • C over crop and natural mosaic lands, and the corresponding MAE is from 2.5 to 2.9 • C. The STDs of (T a min − T a min ) are 3. The yearly averaged uncertainties of the estimated maximum T a and minimum T a are similar. The uncertainty is slightly smaller in the winter than in the summer for the estimated maximum T a , while it has no seasonal difference of uncertainty for the estimated minimum T a .

Conclusions and discussion
The statistical relationships between station-measured T a and MODIS remotely sensed T s were examined over two different geographic regions in central and east Eurasia: dry and semiarid northern China, and boreal fUSSR. 75 stations over northern China and 58 stations over fUSSR were used in this study. The results are summarized as follows.
(a) MODIS-Terra measured T s day and T s night have high correlations with T a max and T a min , respectively. The correlation between T s night and T a min is slightly higher than that between T s day and T a max . (b) Linear regression between T a max and T s day shows that it depends significantly on LCT. The slope varies from 0.71 over barren land to 0.87 over forest. A lower slope value indicates that the difference between T s day and T a max is more season dependent. For example, at 40.0 • C T s day , The question is why does the difference between MODIS T s day and T a max vary with LCT? T s and T a are different in principle [5]. The satellite remotely sensed T s is a measure of the surface radiation. T s was calculated from surface emissivities, which are sensitive to LCT, especially during daytime. The heat capacity or specific heat of air is about 1.0 kJ kg −1 K −1 , and varies only slightly with air temperature. However, the specific heat varies significantly from one LCT to another. For example, the specific heat is 0.8 kJ kg −1 K −1 for dry soil, 1.48 kJ kg −1 K −1 for wet soil, 0.84 kJ kg −1 K −1 for stone, and 4.19 kJ kg −1 K −1 for fresh water. The variation of the difference between T s day and T a max may due to the different heat capacities or specific heats of LCTs. For the extreme cases, the barren land has much lower heat capacity than the air, while the forest may have higher heat capacity than the air. The heat capacity changes with temperature, which may result in different relations in the winter and summer over the same LCT.
With the method in this study, it is possible to estimate daily 1 km resolution T a by using MODIS T s data under clearsky conditions. The data are useful for application studies at local scales, such as temperature related diseases, crop growth, etc. The uncertainty of the estimated T a is about 3.8 • C (STD) or 3.0 • C (MAE), averaged from all stations, which may be considered large for some applications. More research on reducing uncertainty is needed. The following two paragraphs address some sources of estimated T a uncertainty. The calculation of T s takes into account the surface-atmosphere interactions and energy fluxes between the atmosphere and the ground [18,19]. The accuracy of calculating T s depends on a number of conditions, such as solar zenith angle, accuracy of atmospheric correction, surfaces of high reflectance, such as snow/ice and desert, surface wind, and humidity, etc. In addition, the observation times at a location have a window of about 2-3 h that may contribute to the uncertainty of (T s − T a ) due to diurnal variations of the temperature. The Terra's nadir passes the Equator at about 10:30 am and 10:30 pm local solar time. For a location that is away from the Equator, the observation time varies. For example, at the location (59.5 • N, 70 • E) the MODIS-Terra observation times are between 10:10 am-1:10 pm and 8:20 pm-11:20 pm. In general, the temperature changes rather quickly during these time periods, especially in the morning. The T s could increase by up to about 13 K over a desert area from 10:00 am to 1:00 pm [20].
Another important source of the uncertainty may come from the cloud and aerosol contaminated data points. The error in T s contaminated by clouds and heavy aerosols can be very large (4-11 K or even larger). The T s values that are severely contaminated by clouds and heavy aerosols were removed from Collection-5 Level-3 MODIS T s products using empirical constraints on temporal variations in clear-sky T s . However, it is very difficult for the MODIS cloud-mask to discriminate all pixels affected by clouds and heavy aerosols from clearsky pixels, particularly near cloud edges, cirrus and/or with sub-pixel clouds. Higher errors in some arid regions are apparent due to the effects of heavy dust aerosols and greater uncertainties in classification-based surface emissivities [12].
One limitation of estimating T a from MODIS T s is that the MODIS T s product is retrieved by using thermal infrared data that are available in clear-sky conditions only. The large gaps can be partly filled by using data from multiple sensors, such as MODIS-Aqua with clear understanding of the Terra-Aqua time difference contributing to T a uncertainty. The microwave techniques work for the all-sky condition. However, it is not possible to retrieve global land surface temperature at an accuracy of 1-2 K by using microwave techniques alone due to the much higher variations in the surface emissivities of land surfaces in the microwave range and the dependence of microwave brightness temperatures on surface roughness and structures [11]. An alternative method may be to use MODIS T s as an input to an assimilation model for areas under clearsky conditions and the model forecast T s values for areas under cloudy atmospheric conditions.