Evaluation and Modeling of Urban Heat Island Intensity in Basel, Switzerland

: An increasing number of people living in urban environments and the expected increase in long lasting heat waves makes the study of temperature distribution one of the major tasks in urban climatology, especially considering human health and heat stress. This excess heat is often underestimated because stations from national meteorological services are limited in numbers and are not representing the entire urban area with typically higher nocturnal temperatures, especially in densely built-up environments. For a majority of the population, heat stress is consequently monitored insufﬁciently. In this study, the factors inﬂuencing the nocturnal urban heat island have been evaluated in detail and have been tested using different spatial resolutions. A multiple linear regression model has been developed with predictors resulting from different data sources to model the urban air temperature distribution continuously. Results show that various datasets can be used for the prediction of the heat island distribution with comparable results, ideally run on a 200 m grid. Validation using random sampling indicated a RMSE clearly below the standard deviation of the measurements with an average around ~0.15 ◦ C. The regression coefﬁcients are varying within the nocturnal runs with best results around 22:00 CET (R 2 > 0.9).


Introduction
Though only a very limited proportion of the global land surface is covered by urban areas, the majority of the world's population is spending most of its time in such environments. Currently, over 75% of the people in Central Europe are city dwellers, with proportionally increasing tendencies [1,2].
Cities are known to have a special microclimate with higher nocturnal air temperatures (T a ) compared to their rural surroundings, which has been frequently discussed in literature and described with the well-established term "urban heat island" (UHI) [3][4][5][6][7][8][9][10][11][12]. This effect may be pleasant for urban dwellers during transition periods or in winter, because it reduces cold stress and space heating is less required due to the additional warmth [5]. During certain conditions, i.e., synoptically stable and long lasting anticyclonic weather situations in summer, this extra heat load can be threatening for elderly and vulnerable people [13]. The maximum temperatures are thereby not necessarily the main problem, but rather the lack of relief when the minimum temperatures are not declining below a certain level. Clarke and Bach (1971) focused their early studies on the influence of urban climate on human health. Based on a small data record, they found an increase in heat related excess mortality due to high nocturnal temperatures, either as a primary effect or as a contributing factor in heart disease, strokes or pulmonary disorders [14]. Scherer (2014) stated that extreme heat was responsible for 5% of all deaths from 2001 to 2010 in Berlin, affecting mainly people exceeding the age of 65 [15]. During extreme heat waves, the linear relationship between minimum temperature and mortality becomes an exponential δQ S = Net heat storage, δQ A = Net heat advection. The net radiation can be measured with net radiometers and/or computed with remote sensing data: with: SW ↓ = Solar irradiance, SW ↑ = Shortwave reflection, LW ↓ = Atmospheric counter radiation, LW ↑ = Terrestrial emission. The input terms SW ↓ and LW ↓ can be measured or modelled using topographic features combined with the geographical position and the actual solar altitude or boundary layer air temperature [28].
The longwave radiative loss, i.e., LW ↑ , is a function of the land surface temperature (LST) and emissivity: with σ as the Stefan-Boltzmann constant (5.67·10 −8 W m −2 K −4 ), ε as the emissivity and T as the LST in K. Since EO-derived LST is strongly dependent on adequate atmospheric correction, this term is a possible source of errors. Even small variations have significant effects on the energy balance [29,30]. A crucial part in Equation (2) is SW ↑ . It is calculated by the multiplication of SW ↓ with the broadband surface albedo (α), computed using the VIS and NIR channels from satellite data or measured by a pyranometer (Section 2.3.1, Equation (8)) [31,32]: The term Q F is currently under investigation and so far is not adequately described by remote sensing data [33]. It can be assumed, that in moderate climate cities like Basel, the summer Q F input is relatively small, because cooling systems producing a lot of anthropogenic heat in summer are uncommon in Switzerland. The largest amount of Q F in Basel is most likely generated by heating, which is limited to winter situations. Christen and Vogt (2004) found an annual average amount of Q F in the range of 5 to 20 W m −2 in suburban and dense urban environments, respectively [26].
Urban fabrics comprise of special thermal properties, which makes the urban surface an ideal heat saver. The ratio δQ s /Q*, indicating the amount of available energy used for storage, may reach ratios up to 0.58, 0.48 and 0.31 for measurement sites in Mexico City, Vancouver and Los Angeles, respectively [34]. During the BUBBLE measurement campaign in the city of Basel, values of 0.3 to 0.4 resulted at different urban sites [35]. This means, that only 60 to 70% of the net radiation is available for partitioning into the turbulent fluxes of sensible and latent heat. In case of extremely hot urban surfaces, this value might be even lower and if a minimum evapotranspiration is assumed on rural sites, which is likely to be the case during extreme heat waves and/or dry conditions, the sensible heat flux in an urban environment is even lower compared to its rural counterpart.
The turbulent fluxes Q H and Q E are related through the Bowen ratio (β = Q H /Q E ). It is defined for specific surfaces and derived from flux tower measurements.  reported values for β of 0.6 to 1.0 in suburbs and 1.8 in the city core [24]. During the REKLIP project, Parlow (1996) found values of 1.5 in high-density urban to 0.8 in low-density urban areas and a maximum β of 1.8 in industrial areas in and around Basel [36]. On forests and natural surfaces, the values ranged between 0.4 above deciduous forests to 0.7 on arable land [24,36]. During one month of intensive measurements in Basel, Christen (2004) found daily variations of β, most pronounced at the rural site (0.5 to 0.2) and less pronounced at the urban site (~2) [26]. Taking these values (β U = 1.5, β R = 0.6) and comparing them to the differences in available energy due to lower Q S on rural sites (Q S_U = 40%, Q S_R = 10%), Q H is in the same range on grassland compared to dense urban areas.
Net advection (δQ A ) is neglected here, because heat wave events are normally coupled to synoptically calm conditions with very weak winds, especially during the night (ignoring micro-scale advection [24]).
In this study, the theoretical knowledge about the physical processes happening at the transition layer between the urban surface and the overlying air is used to investigate the development of the frequently evolving UHI in Basel, Switzerland. Thereby, various data sources are tested using a multiple linear regression (MLR) approach within different resolutions and at different time steps during the development of the nocturnal UHI. The investigation period was during a minor heat wave at the end of August 2016. The investigation area, the meteorological measurement stations and the complete setup of the method are described in Section 2. Results-including statistics, validation and exemplary maps-are presented in Section 3, followed by a discussion and concluding remarks in Sections 4 and 5.

Site Description
Basel is located at the southern end of the Upper Rhine Valley between the bordering hills of the Vosges (France), the Black Forest (Germany), and the Jura Mountains (Switzerland) at about 250 m a.s.l. (Figure 1).
The climate is considered oceanic, with mild winters and warm and sunny summers (Köppen Cfb climate) [37]. The annual mean temperature is 10.5 • C with an annual precipitation rate of 842 mm for the reference period 1981 to 2010 [38]. The warmest month is July with average maximum temperatures above 25 • C, though record-high temperatures during heat waves may also occur in June or August. Cold stress situations are less pronounced in the city, occasionally such phenomena may occur due to the outbreak of cold air masses from continental Russia.
Long-term measurements and several measurement campaigns in the region describe a distinct UHI, which establishes typically during the night with maximum temperature differences of up to 8 • C between rural and urban reference stations [12,39,40].
The investigation area contains the metropolitan area of Basel with approximately 730,000 inhabitants in total, including the bordering suburbs located in the Canton of Basel-Land (BL), the German cities Weil-am-Rhein and Lörrach, the French city Saint-Louis, and the city of Basel [41,42]. From north to south and west to east the area covers 21 by 20 km 2 , respectively, spanning from the upper left 47 • 38 35" N 7 • 29 17" E to the lower right 47 • 27 25" N 7 • 45 32" E.  Table 1.

Air Temperature Measurements
Below a blending height z*, the urban turbulence field must be considered three-dimensional and small-scale fluctuations may occur [43]. In this roughness sublayer (RSL), the measured temperature signal is directly influenced by the underlying surface, with a variable footprint due to different wind directions. Ideally, comparative measurements should be carried out at the same relative position in the RSL, depending on the anticipated model resolution. Only stations located at the transition level between the UCL and the RSL at about roof level are considered in this study. At this height, the sensors are not located directly in the wakes of the individual surface elements, but a local mixture of influences is still captured [6]. Note that the lower boundary of the RSL used here is defined after  as the top of the UCL, i.e., roughly the mean building height (z H ). Christen and Vogt (2004) indicate that the measurement heights (z) of 0.8 < z/z H < 1.4 are representative for the local urban climate [26]. Stations located in large open spaces are used even if they are only located at 2 m measurement height, because the UCL may be entirely absent in such environments [4]. The main criterion for exclusion for those stations is cold air accumulation (Table 1).
All data are adjusted to the lowest station height adding a lapse rate of 0.6 • C per 100 m. To test the approach and to minimize the influence of temperature fluctuations, a mean nocturnal course of 6 consecutive nights during a heat wave starting on 23 August 2016 is used. During that period, the daytime maximum temperatures were mostly above 30 • C and several tropical nights (i.e., T min > 20 • C) were measured at the urban reference station (i.e., BKLI, Section 4.1). Table 1. List of all stations used for this study with the specific code as an abbreviation. The geographic location of the stations can be found in Figure 1. The local climate zones (LCZ) definition is based on previous work and expert knowledge after the guidance of Stewart and Oke (2012) [44,45].

Urban Surface Properties
Different types of surface properties are calculated, collected and tested separately to find reasonable input parameters for a MLR model. Three groups of surface descriptors emerged, which are described in the following sections. Atmospheric effects on the TIR signal are removed using a correction parameter accessed through the web-based atmospheric correction tool provided by NASA, which is based on NCEP reanalysis data and MODTRAN code. The conversion of top-of-atmosphere radiance (L TOA ) to surface-leaving radiance (L λ ) is done solving Equation (5): with L u as the upwelling or atmospheric path radiance and L d as the downwelling or sky radiance, which is reflected on the surface. Attenuating effects of the atmosphere are incorporated by the atmospheric transmission (τ) [46]. The emissivity map (ε) is based on a modified NDVI approach by Sobrino et al. (2008) [47]. To convert L λ to LST, the Landsat-specific estimate of the Planck curve including the calibration constants k 1 and k 2 , derived from Landsat metadata, is used: The complete procedure is described and discussed in detail in Wicki and Parlow (2017) and reveals effectively corrected LST based on in situ measurements and models of atmospheric conditions [48]. The LST represents the surface UHI (SUHI), which has different mechanisms compared to the canopy UHI [12,49]. LST is used as a predictor, because the principles of energy fluxes and the expected UHI effect are based on different surface properties and their interaction with the incoming solar radiation. Dark impervious surfaces normally have a large heat capacity and high thermal conductivity rates, which enables the surface to absorb the incoming radiant energy more effectively [49]. The LST is the best estimate for this interaction available from remote sensing data. The energy stored in the urban fabric during the day and released during the night is one of the main causes of the nocturnal UHI, especially during heat wave conditions and large solar irradiance on the surface. Even though LST is not a perfect representation for the energy stored in the urban fabric, because thermal properties are only partly represented by LST, it is certainly connected to the soil energy flux, especially if used in combination with other surface properties [50].
The NDVI represents the different reflection properties of vegetated surfaces in the visible (RED) and near infrared (NIR) range. It can be written as: A water mask based on a land cover classification is used to exclude water bodies from the analysis.
Surface broadband albedo (α short ) is calculated using calibrated reflectance of five channels from the VIS to the SWIR range (ρ 2-7 ) as a weighted average after Liang (2001) and adapted to Landsat 8 data [31]: The calibration applied on the Landsat data for α estimation includes the sun elevation, but no correction for distracting effects due to aerosols, atmospheric gases or topography was applied.
For this analysis, Landsat 8 data from a single acquisition was used. Previous studies have shown that during comparable weather situations, the LST distribution follows a certain pattern and does not essentially change [48]. The NDVI and α are even more stable, beside variation due to different solar angle, in a near-term period and do not change fundamentally within 6 days.

Urban Morphology
The sky view factor (ψ svf ), i.e., the proportion of the observed sky divided by the total upper hemisphere, is calculated using the Urban Multi-scale Environmental Predictor (UMEP) and a digital surface model (DSM) with 3 m resolution [51]. During the resampling (averaging) to coarser resolutions, roof pixels are omitted to represent for dense urban areas with narrow streets, and therefore low ψ svf . A digital object model (DOM), the product of subtracting a digital elevation model from the DSM, was used for the computation of z H . The datasets are representative for the years 2008 to 2016. The averaging is done omitting the ground pixels, but including trees.
The plan area fraction (λ P ) is derived using the raster DOM or high-resolution remote sensing data. This fraction is both a morphological variable (density) and part of the land cover fraction (surface cover), therefore, it is computed in two different ways and added to both groups (i.e., Morphology and Land Cover), separately. Since the land cover fraction is derived from remote sensing data, the resulting coefficients differ. The abbreviation BSF (building to surface fraction) is used for λ P in the Land Cover group.
The frontal area index (λ F ) or roughness density [52] combines the mean height, breadth and density of the roughness elements in per unit ground area and describes how much of the area reached by a fictional air parcel is covered by walls (or vegetation) for a specific wind direction [53]. In this case, the isotropic λ F -which is not depending on the wind direction-is calculated using UMEP [51].
The two parameters λ P and λ F are used to determine the roughness length (z 0 ) and zero displacement height (z d ). These roughness parameters are calculated using a method described by Kanda (2013) [54].
The tree fraction (λ T ), i.e., the number of trees per hectare, is estimated using high-resolution (10 −2 m) aerial LIDAR data from 2010 [55]. The processed vector dataset consists of point data including position, height and crown diameter of every single tree of a major part within the investigation area (i.e., the Canton Basel-Stadt). Areas that are not covered by this dataset are derived from the DOM with a discrimination between vegetation and buildings based on a high-resolution land cover classification, described in Section 2.3.3.

Land Cover
A land cover classification is used to compute the BSF, the impervious surface fraction (ISF) and pervious surface fraction (PSF) by summing up the respective surfaces. The land cover classification is part of the Copernicus contributing datasets allocated with the Horizon2020 project URBANFLUXES [56]. The classification technique is based on a neural network approach. It is applied on SPOT data from 26 May 2012 and supplemented by WorldView imagery and seasonally variable Landsat data (2013-2016). The map with an original ground resolution of 2.5 m can be found in Figure 1 (left).

State of the Art
Urban measurement networks are usually not dense enough to cover intra-urban differences effectively and denser networks are very cost intensive and impossible to maintain. Mobile measurements with devices mounted on cars [4,[57][58][59], bicycles, different public transport vehicles [60,61] and/or by hand/foot are alternatives to cover small-scale differences despite some limitations (spatial, temporal and synoptic representativeness [57], duration and costs).
The link between the surface and the above T a based on remote sensing or GIS data for comprehensive T a estimations has been intensively investigated since the launch of the first satellites and Rao's (1972) analysis using ITOS [62]. Karl (1988) used urban population as a surrogate for urbanization and tried to remove the "urban bias" in T a measurements using 1200 urban and rural stations around the USA. Gallo (1993 and investigated the linear correlations between T a and LST, but also between T a and NDVI, which provided better results [63,64]. The same objective was pursued by Epperson (1995), who added nighttime brightness data from the Defense Meteorological Satellite Program to the NDVI analysis, which led to an MLR analysis [65].
Better correlation is usually found between minimum T a (T min ) and nighttime LST compared to maximum T a (T max ) and daytime LST [8,59,[66][67][68][69][70]. The different behavior of surface and air temperature in urban environments, i.e., the shift in maximum heat island of the surface (day) to the maximum heat island of the air temperature (night), is crucial and mentioned explicitly in different investigations [8,57,67,71].
Regarding the application of such methods in highly heterogeneous urban areas, the coarse resolution and increased viewing angle of Meteosat SEVIRI, NOAA AVHRR or Terra/Aqua MODIS conceal important details, especially in medium-sized and small cities. Ma (2016) demonstrated the possibility of downscaling MODIS LST to 250 m resolution using a TsHARP algorithm and found better results compared to the original 1-km resolution data [72]. The possibility of using helicopter or aerial imagery is very cost intensive, difficult to conduct due to flight restrictions and often unrepeatable and thus unique. ASTER and Landsat data are most suitable for applications in urban areas regarding the spatial resolution [59,69,73,74], but ASTER has no constant revisit rate and Landsat is normally only acquiring daytime images with a revisit rate of maximum twice within 16 days (due to overlapping paths, which is the case for Basel). Nevertheless, multi-temporal LST analysis has shown that the general intra-urban LST patterns in an urban area are not varying substantially during comparable weather conditions [48].
However, LST or the built environment should not be used solely to explain the UHI effect [74]. Cresswell (1999) found an improvement of 50% less errors and higher correlation coefficients from predicted to actual temperature (0.80 to 0.84) if an additional proxy to LST versus T a analysis was added [75]. Cristobal (2008) added remote sensing variables such as α, LST and NDVI to a multiple regression model based on altitude, latitude, continentality, solar radiation and cloudiness [76]. The use of multiple predictors was previously mentioned in the outlook section of several publications, e.g., Bechtel (2017) and Vogt (1997), and was applied before in other studies [76][77][78][79].

Method Description
The GIS dataset in this study is based on factors proposed by Stewart and Oke (2012) for the classification of local climate zones (LCZ) and accounts for the causes of the nocturnal UHI as discussed in many previous studies [5,6,26,27,35,44,80]. Cities are a system of different surface units-or morphological forms-that are repeated and arranged differently throughout the urban area [27]. These units define and control the urban climate as an integration of several influences, which can be disaggregated using multiple regression approaches. Urban meteorological measurement stations represent different types of urban composition that control the nocturnal temperature. The MLR approach decomposes, or unmixes, the specific portion of different land use types to the temperature signal. Thereby, the controlling units-hereafter referred to as the predictors-are treated as separate GIS layers. The number of dependent variables, i.e., measured T a , must exceed the number of predictors (Equation (10)). Ideally, the measurement stations should represent different types of urban controlling units and a certain amount of variance is required.
The model finally enables the production of two-dimensional UHI maps based on the resulting coefficients applied on GIS and remote sensing data. Several runs with different combinations of surface properties and different spatial resolutions, which was done by resampling based on the average of every grid cell, were tested and evaluated in order to find the ideal configuration.
MLR is preferred instead of other methods (e.g., machine learning), because the individual behavior of the predictors is of major interest and not only calculations resulting from a black box.

Suitability Tests and Predictor Selection
A major problem of MLR is multi-collinearity, i.e., the explanation of one predictor by another, which can increase the estimates variance. The result is an adversely better statistic. If the data is to a large amount explained by one of the already existing variables, this excess data does not help to better explain the independent variable [81].
To build the final MLR model, two suitability tests have been applied: A multi-collinearity analysis for each group including a variance inflation factor (VIF) test and a multi-temporal simple linear correlation (SLC) analysis between the potential predictors and the measured T a .
The SLC is used to reduce the number of predictors in advance. Potential predictors providing no explanation for the variance in the multi-temporal analysis are excluded before the collinearity analysis. In the final decision process, the behavior of the predictors in the SLC analysis was a criterion for exclusion if multi-collinearity was detected. The VIF is used to estimate how much of the variance is inflated due to the presence of correlation within the predictors of the model. This indicates how much of the estimated variance of the specific regression coefficient is increased, compared to the case that R 2 would equal zero. For every predictor i in the model, the VIF formula can be written as: For the interpretation of the VIF values, several rules of thumb and the respective thresholds are described in literature. Most commonly the rule of >10 is applied to detect harmful multi-collinearity within the dataset. Since this must be considered dataset specific, models with a VIF > 10 could still include useful information and the thresholds need to be applied with caution. Nevertheless, we decided that a VIF of 10 can be used as a threshold, since this value was frequently applied [81,82].
A further method to reduce useless predictors would be a stepwise MLR (SMLR). This was also applied for testing, but not for the final model, because the model setup is intended to be stable over time and using SMLR changes the predictor composition with every run. However, the results are used to verify the decisions about inclusion or exclusion of predicting parameters.

Multiple Linear Regression
For every regression model, the LST is added as an additional predictor to account for the heat released from storage [12,24,27,71]. The variables that passed the suitability test are used for a MLR analysis by solving a set of linear equations (Equation (10)): The dependent variables (T 1−m ) and the input predictors (X 11−nm ) are used to determine the coefficients (β 1−n ) for the best fit. The set can be solved, if more equations (i.e., T a measurements, m) than unknown variables (i.e., input predictors, n) exist (m > n).
The MLR returns a vector of coefficient estimates ( β 1−n ) for each predictor ( X 1−n ) to solve a multiple linear regression equation (Equation (11)) and to find the responses (T pre ) for every grid point in a two-dimensional raster environment. The intercept value ( β 0 ) is the expected value if all predictors are set to zero and close to the average of the measurements [48].
The model is tested iteratively with varying conditions considering time, resolution and predictor combination to find the best results for the three groups defined as Land Cover, Landsat Data and Morphology.

Validation
The model output is validated using in situ measurements, dividing the T a measurements in two clusters (i.e., 2/3 input and 1/3 validation). To ensure a certain variation within the model, the stations are classified into four groups (similar to Figure 1, but combining Park with Suburban) and one station per group is used for the validation cluster. The group Urban supplied two stations for the validation process, since it is the most populated class. The model is run five times with different input predictors, chosen by randomly generated numbers (MATLAB function randperm), and the resulting predictions are compared to the independently measured T a .

Suitability Test and Predictor Selection
The suitability test reveals important information about the significance of individual predictors and allows assembling of the different model groups. Figure 2 describes the results for the SLC as a surface plot including all correlation coefficients for six resolutions (subplots) at different steps in time (rows) for every predictor (columns). Above the columns and on the right side of the rows, mean values indicate the average behavior for every predictor and time step, respectively (displacement due to space limitation). The total average of all predictors for all time steps for every model resolution is printed as a bold number at the upper right of every subplot. For the first group (Morphology), z d , λ P and z H reveal the best results in the runs with a resolution below 400 m. With increasing cell size, z 0 is gaining importance. The lowest correlations for almost every run are found for ψ svf . Especially low correlations are found at resolutions below 400 m. Weak correlations between ψ svf and urban T a are similarly described in Nichol (2005). They found the same cooling rates on parking lots and between high-rise buildings at 21:40 local time in Hong Kong, which was explained by the early time of comparison and the intense heating of the surface [69]. The second group (Landsat Data) showed a high abundance of nocturnal T a to LST and NDVI. Albedo revealed very low correlation in the 100 m run and performed better at 200 and 300 m resolution. For the Land Cover group, the pattern changes with different resolutions. In most cases, the PSF and BSF are explaining the variance better than the ISF. Only for a resolution of 400 m, the ISF outperforms the BSF. The PSF revealed very high correlation coefficients and, especially for a resolution of 200 and 300 m, explains already a lot of the variance within the nocturnal T a distribution. As a total, the multi-temporal analysis showed variations of regression results during the night. The best results are usually found between 20 UTC+1 and midnight. The best overall results are found at 200 and 300 m resolutions. The resulting coefficients are in good accordance to comparable SLC analyses [74].
The evaluation of multi-collinearity showed high collinearity for z H , z d , λ F and λ P . The effects on the microclimate described by these parameters are therefore already explained to a large amount by one of these parameters. High VIF values up to 72 resulted from the multi-collinearity analysis. Reducing these parameters to only one, where the SLC regression factors are used as a criterion, helped to lower the VIF value to a maximum of 3.9 (at 200 m cell size). The resulting model includes the parameter z d , z 0 and λ T for the group Morphology ( Figure A1).
No problems with collinearity are found for the Landsat Data group. Therefore, all three predictors (LST, NDVI and α) are included ( Figure A2).
The third group Land Cover reached values close to the exclusion criterion, which was still a considered valuable. Increasing the cell size also increases the VIF, which needs to be taken into account during the interpretation of the results. Therefore, better regression results at higher resolutions might be the effect of collinearity for this group in special ( Figure A3).

Multiple Linear Regression
Three groups, six different resolutions (100 to 600 m grid cells), and 12 different points in time (19:00 to 06:00 CET) are tested. Figure 3 shows the complete statistics of this analysis including correlation, significance and error variance-i.e., total deviation from prediction to in situ measured temperatures-of the model runs. The two significance levels (0.05 and 0.01, as a red and black line, respectively) indicate the performance considering the specific criteria. A run falling below the 0.05 threshold is considered significant and valuable, a run falling below the 0.01 criteria is considered excellent. Runs above the 0.05 significance line should be treated with caution. According to the statistics, best results are found for the Land Cover and Morphology (slightly lower) model with a R 2 of 0.93 (significant on the 0.01-level) at 22:00 CET with 200 m resolution. The 200 m resolution showed the highest correlation factors also for the Landsat Data group (0.84) with decreasing tendencies at increasing resolutions. The significance is also reduced with coarser resolutions (except 100 m), with the same tendencies for the error variance. Highest correlations are found around 22:00 with a minimum at 04:00 CET. The same drop for significance is found at around 03:00 to 06:00 CET. The error variance typically decreases during the night since the temperature decreases, but the lower model success at around 04:00 CET is captured as well. This might be due to the loss of 'fuel', i.e., heat stored in the urban fabric, and the resulting decrease of the urban heat island and/or katabatic wind systems generated after substantial cooling on the surrounding hillsides. In Section 4.1, meteorological measurements during the investigation period are described and they demonstrate the higher UHI intensity at the same time the correlation reaches the highest numbers, or vice versa. As the UHI intensity decreases, the variance within the response is reduced, which depreciates the model statistics. The analysis of the wind systems indicates slightly stronger winds after midnight, but not with a significant trend. However, the wind direction-an indicator for cooling slope wind systems-does not change dramatically throughout the night.
Average statistics for the entire model run (i.e., 19:00 to 06:00 CET) are shown in Table 2. The R 2 -values and mean error variance are temporally averaged and the two significance levels (0.05 and 0.01) indicate how much percent of the overall runs have fulfilled the specific criteria for the different model resolutions. Again, the 200 m runs show highest correlation with almost 100% significance for all runs. More than three out of four runs show very high significance. The scales of 200 and 300 m seem to capture the temperature influence properly, because the error variance throughout all runs is the lowest for this resolution with errors ranging from 0.11 to 0.20 • C. Lowest correlations, accompanied by low significance and highest error variances, are found using 500 m grid cells. Figure 4 shows the evolution of T a throughout the night with a resolution of 200 m. This predicted T a results from the regression formula (Equation (11)) applied with the resulting coefficients and the specific predictor maps. Significant height differences are taken into account by a lapse rate of 0.6 • C per 100 m-the same cooling rate used to normalize the measured T a . Therefore, a temperature bias can be addressed to elevated areas in the south and northeast of the city. However, the intra-urban differences are not resulting from height differences, because the urban area is relatively flat (Figure 1, Section 2.1). In this area, land cover is the main driver for regulating the T a distribution and thus the UHI evolution. The old town core in the center of the image is clearly defined and remains always one of the warmest regions until the distribution is getting flatter around 03:00 CET. This is a typical feature of the nocturnal UHI, which often shows largest urban to rural differences in the first half of the night. Urban parks can clearly be detected as cool spots close to the city center. Figure 5 includes the difference of every raster cell to the modelled temperature at BBIN (i.e., an official weather station of MeteoSwiss). It indicates that the official temperature measurement for the city underestimates the heat load for most of the urban dwellers by 2-3 degrees. The temperature of this station would only be valid for the urban park sites, which have an influence of less than the park extent [83]. Large residential areas like Bachletten or St. Alban reveal a lower temperature difference to BBIN, and, therefore, less heat stress for the population. Highly densified areas like the old town (i.e., Altstadt) are clearly on top of the heat scale. The different railway stations located in the Rosental, St. Johann and Gundeldingen neighborhoods also show high positive deviations. Additionally, the industrial areas in St. Johann (at station UFB1) and at the eastern city limits are about 2 degrees warmer than BBIN in this model run.

Validation
The RMSEs of the validation runs are clearly below the standard deviation of the measurements at the validation sites for all input predictor groups with the highest errors for the Morphology group and the best results using the Land Cover group, on average. The latter performed with RMSE values ranging from 0.07 to 0.27 • C in the exemplary run. The validation shows that some stations are specifically important for the model, especially if they are located in a unique environment. For a successful model run, a certain variation of the input predictor is needed. RMSEs are varying from one run to the other, because the different predictor setup creates slightly different coefficients for the regression formula. Nevertheless, the tendencies seem to remain constant and the results are still reasonable, even though a third of the input predictors are removed. Figure 6 shows deviation plots for five random sample sets on the 200 m grid, which is the resolution that performed best with the lowest RMSE. The largest errors are typically found at the beginning of the night, when the UHI is not yet fully developed. The error often decreases with decreasing temperature and increasing UHI intensity. Figure 6. Validation of the model output using five different random samples on a 200 m grid for three groups. For every step in time (x-axis), the RMSE resulting from the difference between predicted T a to measured T a is indicated for each group. The resulting average is shown as numbers referring to the specific groups according to the color (blue = Land Cover, grey = Landsat Data and red = Morphology) and the stations used for validation are listed at the right side of the plots. The black dotted line depicts the standard deviation of the measured T a used for validation.

Synoptic Conditions and Meteorological Measurements
The synoptic conditions during the investigation period were very calm with a persistent high-pressure system centering around 1025 hPa around northeast Germany. Due to the orientation of the high-pressure system, winds were expected to be very weak and coming from the eastern direction. The end of the period resulted from an eastward movement of the high-pressure system and the afterwards approaching cold front on 29 August 2016. During this small heat wave, a strong upper-atmospheric ridge developed with a stable omega-like pattern providing Central Europe with warm and dry air. The meteorological conditions measured at the urban reference station above roof-level delivered the same results as expected from the weather maps. The nocturnal wind was usually blowing from the eastern and east-southeastern sector with a mean velocity of 1.7 m s −1 , almost never exceeding 4 m s −1 . A wind system providing the city with fresh air regularly established in the afternoon, which is also one reason for the almost absent heat island during the day and the homogeneous temperature distribution. The development of the observed heat island in Basel started shortly after T max at around 18:00 CET (Figure 7) and generally peaked at around 22:00 CET. A maximum heat island intensity of 5.3 • C was found in this period, with a total average nocturnal heat island intensity of about 2 • C. During 15% of the time, the temperature difference between the urban and the rural station was larger than 3 • C between 19:00 and 06:50 CET. Most of the nights, the UHI intensity decreased steadily until it almost turned into a cooling island after sunrise in the morning hours. With increasing turbulence in the afternoon and the above-described wind system establishing at about noon, no significant difference between the urban and the rural measurement stations are apparent. On average, a slight cooling island with −0.2 • C was measured between 07:00 and 18:50 CET.

Model Output
The model output depicts a typical UHI development during the night with some colder areas in the suburbs and hot spots in the center and at industrial sites, including railway stations. T a distribution seems to be rather a "heat archipelago of a city", as mentioned by Kuttler (2012) [84]. Other studies using coarser datasets, such as MODIS or AVHRR, found one single UHI [85]. Within this urban archipelago, urban parks are well distinguished as cool ponds. This shows the strength of the model, because measurement sites in urban parks are very limited and the cooling effect of urban parks (i.e., park cooling island) is well described in previous urban climate studies [43,83,86]. Due to the small size of the parks in Basel, i.e., below 10 ha, these features only become apparent with increasing resolution. Since also the R 2 /p-value decrease/increase with coarser resolution, a minimum cell size of 200 m for medium-sized European cities, i.e., cities with a population <500,000 inhabitants, is proclaimed. Depending on the predictor used and the measurement height of T a , a narrow resolution of 100 m-representing only the specific environment (i.e., the specific street canyon)-might be a good solution. For comprehensive maps with T a measurements at about roof level, 100 m cell size is not ideal. Brazel and Stoll (1992) found a resolution of around 250 m reasonable, which was identified for North American cities with larger city structures [67]. Therefore, a 200 m resolution in this model seems to be realistic. Eliasson and Upmanis (2000) found the greatest difference between urban parks and the surrounding urban areas between 2 and 6 h after sunset, which is exactly the time where the MLR model performed the best, considering the statistical analysis [86]. Park breezes, as mentioned by Eliasson & Upmanis (1999) and , and cold air drainage flows in general, are difficult to include in the model since they are allochthonous and have, besides barrier and channeling effects, no relation to the surface properties [43,86]. This might be a future improvement in the model.

Statistical Measures
It is difficult to contextualize the resulting statistical measures since most studies focused on different methods, periods, resolutions and/or datasets. If larger datasets (in number and spatial) are used, a higher error variance is expected, which cannot be explained with a better or worse model performance. On the other hand, if seasonal averages are used, less fluctuation and higher regression results are expected. This is also responsible for the usually lower RSME in our investigation (~0.1 tõ 0.3 • C) compared to other studies, even though the regression coefficient is mediocre (between 0.71 and 0.82 with 200-m grid size on average), compared to other 'successful' studies. Nevertheless, as important as high regression coefficients are the significance is expressed through low p-values. Thereby, the null hypothesis states that there is no relationship between the measurements and the predictors. A low p-value indicates that this null hypothesis is wrong and the model is significant. In this analysis, 0.05 was defined as the upper benchmark. If this benchmark is fulfilled, the specific predictors are useful for predicting T a . If this threshold is not met, the model output should not be considered, even if the error variance is low or the R 2 is very high. The analysis of multi-collinearity showed high VIF for the class Land Cover at higher resolution. The improving statistics at a resolution of 600 m might be explained by mapping unit problems and not be related to a better description of the model in this case.
Other studies also included different types of MLR approaches. Stoll and Brazel (1992) found correlation coefficients >0.9 with a stepwise multiple regression using helicopter and in situ TIR data [67]. Epperson (1995) has found R 2 up to 0.92 using NDVI and nighttime brightness with an error of 1.84 • C, which can be explained through the large variance using 1000 stations [65]. Blennow (1998) achieved comparable results (R 2 = 0.87, RMSE = 1.08 • C) in a rural surrounding and Cresswell (1999) found R 2 between 0.80 and 0.84, but with a substantial error of 3.3 to 4.2 • C, which can also be explained with the large variance (modelled T a for entire Africa) [75,87]. Ninyerola (2000), and later Christobal (2008) with a comparable approach, reached correlation coefficients up to 0.97 and 0.86, respectively [76,79]. They applied monthly and annual minimum, mean and maximum T a and used different climatological factors such as altitude, latitude, continentality and solar radiation. Cristobal (2008) added remote sensing variables and called his model a "remote sensing and GIS hybrid", which is comparable to our idea [76]. Many studies are concentrated on monthly or daily mean, minimum or maximum values and often found best correlation predicting daily minimum (i.e., nocturnal) T a (e.g., [64,65,68,76,[88][89][90][91] etc.). Modelling diurnal (or nocturnal) courses are seldom done. Bechtel (2014 and 2017) has applied MLR based on multi-temporal Meteosat SEVIRI data to predict short term T a with very high correlation (R 2 between 0.97 to 0.98) [77,92]. Nichol (2005 and used ASTER nighttime data in Hong Kong and compared the remote sensing data to T a -measured during vehicle traverses and at measurement stations with R 2 between 0.80 and 0.94 [59,69]. Ma (2015) noted that the results of single predictor studies could be improved using multiple predictors and found better correlations using higher resolution data [72]. This is confirmed by our study, but with a lower limit in spatial resolution of 200 m.
Even though a comparison of our regression approach with other studies is difficult, the statistical parameters are very promising and allow rejecting a model output in case of low significance and/or low correlation. The good performance is particularly confirmed by the random sampling validation, which indicates that for the tested grid cells T a is predicted with high accuracy.

Potentials
The analysis of different model setups to evaluate the best method for extrapolating measured temperature into a two-dimensional environment has shown that the type of input predictor is not the most critical decision, but that it is essential to have multiple predictors. The selection of the responses are very critical, because a certain variation in the surrounding land cover of the measurement stations is crucial for the success of the model. Linear correlations between LST and T a must be handled carefully, because the amount of thermal infrared radiation does not include the thermal properties of the specific surface (e.g., thermal inertia) and the interaction with the overlying air. For example, surfaces with a high thermal admittance readily accept energy during the day, transmit it to the substrate and release it at night with similar capability [24]. If a surface has a low thermal admittance, it reacts very fast to energy input and is relatively warm during the day, but it loses this energy similarly very quickly and is therefore unable to maintain positive energy fluxes to the surface during the night [6]. The ability of storing this energy might be explained by other parameters, such as the building density (λ P ).
An important application of the resulting dataset is the assessment of heat stress due to high nocturnal T min . Small increase in urban T min during strong heat waves increases the mortality and the morbidity substantially [14,16,19]. During the night, most people are located inside their building and therefore exposed to the respective indoor climate, which does not necessarily need to be the same as the outdoor climate [15]. Nevertheless, the outdoor temperature is the best measure of nocturnal heat stress that is widely available since not every household can be monitored. The resulting maps offer a high-resolution view on the heat distribution throughout a large region and they can be used for risk maps or intervention guidelines. For example, during heat waves, nocturnal heat stress in every grid cell can be summarized after every night, and areas that transcend a certain threshold can be considered for special care or even evacuation of vulnerable people.
The findings are also suitable for surface flux estimation, where the difference between T a and LST is an important component required to compute the sensible heat flux [93].
The model is evaluated using average temperatures of six consecutive days to exclude singularity effects and compare different setups for testing. The model is also tested as a heat stress predictor with individual hourly means during the evaluation period. During the night with the strongest heat island intensity (25 to 26 August 2016), the correlation coefficients reach values of 0.97 with very high significance. This proves that the model is also applicable on single day events with hourly temperature measurements as the response. As a measure for difference in heat stress, maps with the number of tropical nights during the specific heat wave are developed. Despite the higher elevated areas, in almost every grid cell at least one tropical night happened. In some areas, the number of tropical nights reached three during this period. The official measurement station (i.e., BBIN) is located in an area where only one tropical night was measured and this might lead to underestimation of heat stress in other areas (Figure 8).

Conclusions
The evaluation of multiple predictors influencing the development of the urban heat island (UHI) is shown and described in detail. Based on an empirical approach using in situ measurements, different groups of input predictors are tested within different grid resolutions to find the best parameters, resolution and time for a multiple linear regression model (MLR) to predict the UHI distribution and intensity.
Average correlation coefficients are ranging from 0.71 to 0.82 for the nocturnal model run and for the model resolution with a 200 m grid size, which emerged as the optimum spatial model resolution. The highest correlation coefficients are found at 22:00 CET, indicating that the nocturnal UHI phenomenon is most pronounced during that time under typical heat wave conditions, with 0.93 for the best model run during the entire measurement period (average diurnal course) and 0.97 for single nights. All model runs show high significance if applied on the 200 m resolution during the entire night.
Other studies based on multiple regression often predict monthly mean or climatological data with less fluctuations, larger measurement differences and less outliers, which provides higher correlations, but also larger error bias. In this study, the aim of the evaluation is a model applicable on near real-time air temperatures with a resolution representing different urban structures. The resulting two-dimensional maps are an innovation considering temporal and spatial resolution and they are useful for medical staff or, if used for statistical analysis of heat waves, as guidelines for urban planners and politics. Based on the above presented results, the implementation of a spatially coherent UHI surveillance can be further investigated.
An important finding of the study is that different datasets can be used with comparable results, which means the model is not tied to specific data and the user can decide which model setup provides the best results for his location. In our case, the use of land cover fraction including the distribution of sealed and built-up surfaces plus the LST offered best results.
The recently intensified discussion about the classification of different thermal zones throughout a city, i.e., the local climate zone scheme [44], improves the scientific knowledge about the UHI distribution and supported the discussion beyond urban to rural differences. The findings within this study are adding important insights to this discussion and they offer a quantification of theoretical considerations. Since an increasing number of people are living in the urban area compared to its rural counterpart, it is crucial to account for inner-urban differentiations.
Future improvement will be the testing of the method with less air temperature input data and the application in other cities.
Based on this study, tests and evaluations during the next summer are planned to further improve the model and make it a valuable tool for urban planners, the local administration and medical care.
Author Contributions: A.W. conceived and designed the experiments, conducted the analysis and is responsible for writing; E.P. helped developing the idea, supervised the analyses and the writing process and he is responsible for parts of the text; C.F. is responsible for parts of the measurement network, helped developing ideas and is responsible for the review of the text.

Funding:
The project leading to this application has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 637519 (URBANFLUXES).
Acknowledgments: Many thanks go to Dieter Scherer for the initial idea and Fred Meier for a lot of discussion at the beginning of the project, and Stavros Stagakis providing ideas to improve the study. The whole process always includes many persons responsible for the database, maintenance and fruitful talks during lunch breaks. Therefore, I would like to thank all members of the MCR research group at the University Basel.

Conflicts of Interest:
The authors declare no conflict of interest.