Quality and sensitivity of high-resolution numerical simulation of urban heat islands

High-resolution numerical simulations of the urban heat island (UHI) effect with the widely-used Weather Research and Forecasting (WRF) model are assessed. Both the sensitivity of the results to the simulation setup, and the quality of the simulated fields as representations of the real world, are investigated. Results indicate that the WRF-simulated surface temperatures are more sensitive to the planetary boundary layer (PBL) scheme choice during nighttime, and more sensitive to the surface thermal roughness length parameterization during daytime. The urban surface temperatures simulated by WRF are also highly sensitive to the urban canopy model (UCM) used. The implementation in this study of an improved UCM (the Princeton UCM or PUCM) that allows the simulation of heterogeneous urban facets and of key hydrological processes, together with the so-called CZ09 parameterization for the thermal roughness length, significantly reduce the bias (<1.5 °C) in the surface temperature fields as compared to satellite observations during daytime. The boundary layer potential temperature profiles are captured by WRF reasonable well at both urban and rural sites; the biases in these profiles relative to aircraft-mounted senor measurements are on the order of 1.5 °C. Changing UCMs and PBL schemes does not alter the performance of WRF in reproducing bulk boundary layer temperature profiles significantly. The results illustrate the wide range of urban environmental conditions that various configurations of WRF can produce, and the significant biases that should be assessed before inferences are made based on WRF outputs. The optimal set-up of WRF-PUCM developed in this paper also paves the way for a confident exploration of the city-scale impacts of UHI mitigation strategies in the companion paper (Li et al ).


Introduction
The 'urban heat island' (UHI) is probably the most wellknown environmental impact of urbanization (Oke 1982), and the most well-documented example of anthropogenic climate modification through land use change (Arnfield 2003). It arises from a variety of factors: the extensive use of manmade materials that have substantially different thermal and hydrological properties compared to natural materials, the reduction of evapotranspiration due to limited water bodies and green surfaces, and the anthropogenic heat sources (Grimmond 2007;Oke 1982), to name a few. Recent years have witnessed a growing interest from the scientific community, the public, and policy makers in understanding and mitigating UHIs, particularly due to the added pressure of increasing global urbanization and climate change. Currently, over 50% of the world population is living in cities, and this percentage continues to rise rapidly. By 2030, the urban population is expected to exceed 60% of the global population; 95% of the net future increase in the global population will occur in cities (Grimm et al 2008). The combined effects of UHIs, global climate change, and soaring urban populations pose significant challenges to energy and water sustainability and to human health in urban environments. In particular, a recent study has shown that heat waves, which are projected to become more frequent and last longer under a warming climate (Lau and Nath 2012;Meehl and Tebaldi 2004), interact nonlinearly with UHIs to produce extremely high heat stress for urban residents (Li and Bou-Zeid 2013).
The scientific research on UHIs has traditionally focused on their energetic basis and characteristics (see e.g., Christen and Vogt 2004;Cleugh and Oke 1986;Dabberdt and Davis 1978;Grimmond et al 1993;Oke 1995 1999;Loridan and Grimmond 2012;Oke 1982;Oke et al 1999;Peterson and Stoffel 1980;Ryu and Baik 2012), as well as on their impacts on regional hydrometeorology and climate (e.g. Bornstein and Lin 2000;Dixon and Mote 2003;Li et al 2013a;Miao et al 2011;Shephard 2005;Zhang et al 2009). More recently, high-resolution numerical simulations of UHIs under real atmospheric conditions are increasingly being used to investigate the dynamics of these events and their potential mitigation (see e.g., Grossman-Clarke et al 2010; Li and Bou-Zeid 2013), partly due to the urgency of providing a decision framework for policy makers (Chow et al 2012). However, the capability of high-resolution numerical models to faithfully capture all the spatial and temporal dynamics of UHIs is far from being a settled matter. Two related open questions are: what aspects of a numerical model are most critical to the simulated UHI and can model-improvements focusing on these aspects improve the quality and reliability of UHI simulations?
This study aims to assess the quality and sensitivity of UHI simulations, and to improve them by coupling a newgeneration urban canopy model (the Princeton UCM or PUCM) with a regional climate model, the Weather Research and Forecasting model (WRF). The sensitivity of the model's performance to the choice of physical parameterizations in WRF is investigated based on high-resolution simulations over the Baltimore-Washington metropolitan area. The new UCM is a simple, yet more realistic, representation of urban environments and surface energy budgets compared to the default UCM in WRF. More importantly, it allows the investigation of realistic mitigation strategies such as the use of green roofs and white roofs , which are the focus of the companion paper of this study (Li et al 2014).
The scenario that we selected to study is the hottest day during a heat wave period that has been investigated previously by the authors (Li and Bou-Zeid 2013). A heat wave is recognized as a sustained period (longer than 72 h) during which the temperatures (usually the daily maximum temperatures or the nighttime minimum temperatures) exceed a certain threshold percentile (95th or 97.5th for example) of the climatic temperature distributions (Robinson 2001;Meehl and Tebaldi 2004;Lau and Nath 2012). Due to the UHI effect, cities are already hotter than rural areas (Grimmond 2007). Heat waves worsen the conditions in urban areas not only by boosting the temperature of both urban and rural areas, but also because urban temperatures are increased more intensely than rural temperatures during heat waves (Li and Bou-Zeid 2013). Under such conditions, mitigation strategies are critically needed to reduce the health risks in urban areas.
The paper is organized as follows: section 2 presents the WRF set up and experimental data sets; section 3 presents the findings and results. Section 4 concludes the study and discusses its implications.

WRF model description and setup
In this study, the WRF version 3.3 is used. WRF is a nonhydrostatic, mesoscale numerical weather prediction model that solves the conservation equations of mass, momentum and energy on terrain-following coordinates (Skamarock and Klemp 2008). It has multiple parameterization schemes for each of its five physical packages: cumulus clouds, microphysics, radiation, planetary boundary layer (PBL), and surface (Skamarock and Klemp 2008). WRF has been widely used to study urban meteorology (see Chen et al 2011 for a review). This is largely attributable to the nesting capability of WRF that allows high-resolution simulations, and to the UCMs coupled into the WRF that allow a better representation of complex urban environments. For example, the default single-layer UCM that is coupled with the Noah land surface model in WRF can represent three types of urban facets: roof, wall and ground. In addition, the WRF-UCM framework can distinguish between three urban categories: low density residential, high density residential, and industrial/commercial.
The UCM has been shown to be critical for reproducing the correct air/surface temperature patterns in the urban areas (Lee et al 2011;Li and Bou-Zeid 2013;Zhang et al 2011). Although the default single-layer UCM can include three types of urban facets, it represents these urban facets as homogeneous surfaces and thus cannot simulate fractional white (high-reflectivity) roof coverage for example. These homogeneous surfaces are also impervious and are assumed not to contribute to evapotranspiration except when rainfall occurs (they have zero surface water storage capacity). As such, an improved UCM was implemented into WRF (hereafter PUCM due to its development at Princeton University as an offline urban model). There are many advantages of PUCM over the default UCM such as its ability to simulate heterogeneous facets, with subfacets consisting of different materials. For example, roof surface can be a combination of conventional roofs and green/white roofs; ground surface can be a combination of asphalt, concrete, and urban grass. For realistic representation of these green roofs and urban grass in PUCM, detailed models of hydrologic processes in the urban canyon were included. Realistic water storage and flux representation was shown to be crucial to the performance of urban models (Grimmond et al , 2011. In addition, the use of urban material properties calibrated for the northeastern United States makes PUCM a particularly powerful tool for diagnosing UHI effects and mitigation in that region. The full details and validation of the new UCM, and the material properties calibration, can be found in Wang et al (2011aWang et al ( , 2011bWang et al ( , 2013; note however that some aspects of the offline PUCM, such as analytical solution of the heat conduction equation and the independent treatment of incanyon grass that are detailed and tested in Wang et al (2011aWang et al ( , 2011bWang et al ( , 2013, were not implemented in WRF since their impact on the analyses we perform here is deemed minor. The offline PUCM allows a detailed analysis of the building-scale impact of mitigation strategies such as the use of green roof and white roof (Wang et al 2013, see also Sun et al 2013 where a more sophisticated but computationally expensive green roof component of PUCM was validated in great detail; this component is not used here). When coupled into WRF, PUCM can be also used to evaluate the city-scale impacts of these mitigation strategies, i.e. the feedback of these strategies on atmospheric properties over the whole city, which then feedback to influence building-scale impacts.
Different combinations of PBL schemes and thermal roughness length (z 0T ) parameterizations are tested (details in section 2.3) due to the significant sensitivities of WRFsimulated surface and air temperatures to these schemes/ parameterizations, as will be illustrated later in section 3.1. Some physical parameterization schemes that were selected and not changed include: (1) the rapid radiative transfer model scheme for longwave radiation; (2) the Dudhia scheme for shortwave radiation; (3) the 2D Smagorinsky scheme for horizontal mixing; (4) the Noah land surface model for nonurban surfaces.
The WRF simulations are performed over the Baltimore-Washington metropolitan area using three nested domains with horizontal grid resolutions of 9 km, 3 km and 1 km. Cumulus parameterization was not used for any of the domains since even the largest grid size is less than 10 km and there is no rainfall during the simulation period. One-way nesting is used since all the analyses are conducted using the highest-resolution results from the innermost domain (d03, see figure 1). The domain configuration and the land-use map are shown in figure 1. The largest domain (d01) covers most of the northeastern US; d02 includes Delaware, most of Maryland and parts of West Virginia and Virginia; d03 covers the Baltimore-Washington metropolitan area. The Baltimore-Washington metropolitan area is treated as a whole because previous studies have found that the upwind UHI effect in Washington area can have a significant impact on the downwind meteorology in the Baltimore area (Zhang et al 2011). The three domains have 100, 100 and 121 horizontal grid cells, respectively, in both x (East-West) and y (North-South) directions. In the vertical direction, 109 levels are used to resolve the bulk boundary layer structure. All simulations start from 0000UTC on 9 June and end at 0000UTC on 10 June 2008, with a duration of 24 h and an output frequency of 10 min. We also conducted one simulation (case 6 in table 1) that is initialized at 0000UTC on 6 June, and the validation results shown in section 3 are not very different between the two simulations with different initial conditions. Hence for validation purposes in this paper, the simulation period is chosen to be from 0000UTC 9 June to 0000UTC 10 June 2008. However, we note that in the companion paper that examines the effectiveness of different mitigation strategies, the simulations span from 0000UTC 6 June to 1200UTC 10 June 2008 (Li et al 2014) since there we introduce cool and green roofs that alter the urban microclimate and a warm-up period is thus needed to reduce the effect of initialization from data where such roofs are absent. The initial and boundary conditions for the WRF simulations are taken from the North American Regional Reanalysis (NARR; details can be found on http://emc.ncep.noaa.gov/ mmb/rreanl/ and on http://rda.ucar.edu/datasets/ds608.0/). The land-use map is taken from the National Land Cover Data (NLCD) 2006 (Fry et al 2011). As can be seen in figure 1, the three urban categories (i.e., low density residential, high density residential, and industrial/commercial) that are needed by the UCM can be distinguished in NLCD2006.

WRF computations of urban and rural temperatures
In our study, the UHI is defined as the difference between urban and rural temperatures. Two different UHI indices are usually used in the literature, one based on surface temperature, and the other based on near-surface air temperature at 2 m (Voogt and Oke 2003). Air temperature at 2 m can directly influence human comfort, while surface temperature contributes to the radiative component of thermal comfort and to the surface heat fluxes that produce higher air temperatures. As such, both the surface temperature and 2 m air temperature affect human comfort, and higher surface and 2 m air temperatures are often associated with higher mortality risks during heat waves (Anderson and Bell 2011). Both can also significantly affect urban energy and water consumptions (Akbari et al 2001). As such, both UHI indices are considered in this study and in the companion study that examines the city-scale impacts of UHI mitigation strategies. Here, the detailed calculations of surface temperature and 2 m air temperature in WRF-UCM are presented since they impact the validation and interpretation of the results.
2.2.1. Non-urban grid cells. Surface temperature (T s ) in WRF is a prognostic variable that is calculated to close the surface energy balance: n where R n is the net radiation, H is the sensible heat flux that heats the urban air, LE is the latent heat flux resulting from evapotranspiration, and G is the ground heat flux. All variables are functions of surface temperature (Chen and Dudhia 2001) and are in units of W m −2 . In particular, the sensible heat flux H is calculated through: h s a where ρ is the air density (kg m −3 ); C h is the transfer coefficient that corresponds to the first level of the atmospheric model (Brutsaert 2005); and U is the wind speed at the first level of the atmospheric model. T s is the surface temperature and T a is the air temperature at the first level of the atmospheric model. 2 m air temperature (T 2 ) is a diagnostic variable that is calculated based on the alternative expression of the sensible heat flux that uses atmospheric variables at 2 m: where C h2 is the transfer coefficient at 2 m; U 2 is the wind speed at 2 m; and T 2 is the air temperature at 2 m. As such, T 2 can be calculated as: where H is the same flux computed by WRF through equation (2a). The transfer coefficients C h and C h2 are calculated using the Monin-Obukhov Similarity Theory (Monin and Obukhov 1954): where κ (=0.40) is the von Kármán constant; z is the height of the first grid level of the atmospheric model (m); z 0 is the momentum roughness length (m); z 0T is the thermal roughness length (m); and L is the Obukhov length scale (m). ψ m and ψ h are the stability correction functions for momentum and heat, respectively (Brutsaert 1982). From equations (2)-(4), it is evident that the momentum roughness length (z 0 ) and the thermal roughness length (z 0T ) can have a significant impact on the calculation of surface temperature and 2 m air temperature. The momentum roughness length (z 0 ) in WRF is a function of land use categories only; these categories are specified before the WRF simulations start and remain unchanged throughout the simulation. Starting with version 3.2 of WRF, the momentum roughness length (z 0 ) can change with the seasonal changes in vegetation, but for short simulations like the ones conducted in our study, the changes in the momentum roughness length are negligible. The thermal roughness length (z 0T ) in WRF, on the other hand, depends on a variety of physical factors that change significantly with time. Some of the parameterizations for the thermal roughness length that are available in WRF are listed in table 1.
The Yonsei University (YSU) PBL and surface scheme has a default parameterization for the thermal roughness length that is based on Carlson and Boland (1978, hereafter MM5 due to its use in the Penn State-NCAR fifth-generation Mesoscale Model). The thermal roughness length for that where k a is the molecular thermal diffusivity of air (taken in WRF to be 2.4 × 10 −5 m 2 s -1 , i.e. at an air temperature of about 0°C); u * is the friction velocity; and κu * / k a represents the thermal roughness length of a hydrodynamically smooth surface. z l (=0.01 m) is conceived as the height above which only the turbulent heat transfer mechanism is important, while molecular/radiative heat transfer mechanisms are not important (Carlson and Boland 1978). z l is intended to represent the increase in thermal exchanges and in thermal roughness length when the surface is hydrodynamically rough, but given the default values used in WRF, its contribution is very minor.
The Mellor-Yamada-Janjic (MYJ) PBL and surface scheme has a default parameterization for the thermal roughness length that is based on Zilitinkevich's (1995, hereafter 'original Zilitinkevich') formulation: where Re = z 0 u * /ν is the roughness Reynolds number, ν the kinematic viscosity of air, and C zil is an empirical coefficient that, in the original formulation, is set to be 0.1 based on field measurements over grassland (Chen et al 1997).
Later, Chen and Zhang (2009) provided a parameterization for C zil in the Zilitinkevich relationship, as follows: where h is the height of the canopy (m). As such, C zil is no longer treated as a constant. Chen and Zhang (2009) have shown that for tall canopies (h > 2.5 m), C zil is smaller than the 0.1 value used in the original Zilitinkevich relationship. As a result, z 0T is enhanced and the transfer coefficient C h is increased in comparison to the original Zilitinkevich relationship for tall canopies, which can result in a reduction in the modeled surface temperature. The opposite situation is true for short canopies (h < 2.5 m). This 'modified Zilitinkevich' relationship (combining equations (5b) and (5c), hereafter CZ09) can also be used with the YSU PBL scheme in lieu of equation (5a) if the user explicitly specifies these choices (recall that the default z 0T for YSU PBL scheme is given by equation (5a)).

Urban grid cells.
Any grid cell whose dominant land use category is one of the three urban categories (i.e., low density residential, high density residential, and industrial/ commercial) will be treated as an urban grid cell (WRF uses only the dominant land-use in each cell). When no UCM is used, the surface temperature and 2 m air temperature in urban grid cells are calculated in the same way as described in section 2.2.1 for non-urban surfaces, but using urban-specific values of z 0 and hence of z 0T . When a UCM is used, any urban grid cell is first divided into two parts: an impervious part and a vegetated part consisting of grass-covered soils (see figure 1 of the companion paper). The assigned fractions of the urban and vegetated parts depend on which of the three urban categories is the dominant land use category in this grid cell. For an urban grid cell that is dominated by low density residential, 50% of the grid cell is composed of impervious surfaces. For urban grid cells that are dominated by high density residential and industrial/commercial, 90% and 95% of the grid cell surface will be treated as impervious, respectively . The remainder is the vegetated surface fraction. These are the default fractions used in WRF that we do not alter here. The Noah land surface model will be called first to calculate the surface temperature for the vegetated part, and then an UCM will be called to calculated the surface temperature for the impervious part.
The surface temperature for an urban grid cell in this study is computed as a weighted average that depends on the surface temperatures of the impervious part (T s(impervious) ) and the vegetated part (T s(vegetated) ): where f impervious is the impervious surface fraction. The vegetated surface temperature is calculated following the methods detailed in section 2.2.1 by the Noah land surface model using the properties of grasslands. Then, the impervious surface temperature in the default WRF implementation is generated as a diagnostic variable by the UCM following: where f roof is the roof fraction of the impervious surface, H r and H c are sensible heat fluxes from the roof and canyon, respectively). Equation (7) for calculating the impervious surface temperature is correct only if the turbulent transfer coefficient C h is using representative momentum and thermal roughness lengths for the impervious surfaces. Nonetheless, this is not the case in WRF. In the WRF version used in this study, the turbulent transfer coefficient C h for the whole urban grid is inaccurately calculated using the momentum and thermal roughness lengths of the vegetated grassland urban surfaces. This inconsistency will lead to large biases in simulated urban surface temperatures that we will depict, and remove, later in the paper (see figure 5 and discussions afterwards). The 2 m temperature for an urban grid cell is then calculated as in equation (3), but using the grid cell-averaged surface temperature and the grid-cell averaged sensible heat flux following: where T s is given by equation (6); C h2 is the value of the Environ. Res. Lett. 9 (2014) 055001 D Li and E Bou-Zeid turbulent transfer coefficient over grass at 2 m. While one could use the transfer coefficient over the impervious patch or an effective/average one for this computation, we will continue to use this default T 2 computation method included in WRF for urban areas (this is especially relevant for the companion paper dealing with mitigation strategies). The reasons for this choice is that the 2 m air temperature is much less sensitive to the value of C h2 than the surface temperature, and in WRF this is simply a diagnostic representative nearsurface urban air temperature that does not truly represent the air temperature at an elevation of 2 m given the complexity of the urban surface. It is thus a good diagnostic variable that would capture the bulk influence of mitigation strategies on air temperatures in cities.

Numerical experiments design
The surface and 2 m temperatures are strongly affected by the PBL schemes and the thermal roughness length parameterizations adopted in WRF. In this study, as mentioned before, different combinations of PBL schemes (i.e., the YSU and MYJ schemes) and thermal roughness length parameterizations (MM5, Zilitinkevich, and CZ09) are tested. The WRF-simulated urban surface temperature and 2 m air temperature are also sensitive to the adopted UCM. As such, the experiments are designed to intercompare the sensitivity of WRF-simulated UHIs to different combinations of PBL schemes, thermal roughness length parameterizations, as well as the use of different UCMs. The two different PBL schemes described above are both tested, each with its default z 0T parameterization (cases 1 and 2). We also use the modified Zilitinkevich relationship (CZ09) together with both PBL schemes (cases 3 and 4) in order to assess the relative influence of PBL schemes and the z 0T parameterizations on the simulated UHI. The default UCM is used for these four simulations. Three additional simulations are conducted to assess the performance of different UCMs in simulating the urban temperatures. Case 5 does not use a UCM and case 6 uses the PUCM, which was described in section 2.1. In this study, the roof component of the PUCM is composed exclusively of conventional roofs. The ground component includes asphalt, concrete, and urban grass whose fractions are 50%, 30% and 20%, respectively . Case 7 uses the default UCM but with the calibrated surface properties that are used in the PUCM . Note that the default UCM only has one facet over the ground-level canyon surface, which is treated as an impervious surface. As such, the surface properties used in case 7 are weightedaverages of the properties of concrete and asphalt (the two impervious surface materials) used in case 6, assuming the same ratio of concrete to asphalt surface fractions as in case 6 (i.e., 5 : 3). The roof properties used in case 7 are equivalent to the properties for roofs in case 6 since both are assumed to be conventional roofs. The difference between case 6 and case 7 lies in the fact that case 6 simulates the surface heterogeneity over the ground in urban environments (including concrete and asphalt as well as in-canyon urban grass which is not included in simulation 7), and in the use of an equivalent homogeneous impervious ground facet in case 7.

Experimental data sets
In order to assess the quality and reliability of WRF simulations of the UHI, observational data sets that span a large area are needed. The Moderate Resolution Imaging Spectroradiometer (MODIS) satellite observations provide a useful tool for diagnosing the biases in the WRF simulated surface UHI. The MODIS product used in our study is the MYD11A1 version-5 level-3 Land Surface Temperature (LST) product, which is available twice a day, once during daytime and once during nighttime, at a spatial resolution of 1 km. Note that the resolution of MODIS matches the resolution of our WRF simulations in domain 3. In this study, we use the daytime surface temperature measured around 1255 local standard time due to the good data quality (low cloudiness) and to the insensitivity of this parameter to model initial conditions. The WRF-simulated boundary layer temperature profiles are also validated by comparing to measurements through commercial aircraft observations from the Aircraft Communications Addressing and Reporting System (ACARS). The ACARS data is available at the Dulles International Airport (IAD, see figure 1) and the Baltimore/Washington International Airport (BWI, see figure 1). The ACARS date set has multiple observations each day but the frequency is determined by the number of flights with installed meteorological instruments. To compare the WRF simulations to the ACARS observations, both WRF outputs and ACARS observations are interpolated to hourly intervals and 100 m vertical intervals. Figure 2 depicts the maps of surface temperatures from MODIS and from WRF simulations 1 to 5. As can be seen from figure 2(a), the remotely-sensed LSTs are evidently higher in the two urban areas (i.e., Baltimore and Washington D.C.) and their suburbs compared to rural surface temperatures. The surface UHI effect ranges from 5°C to 15°C and varies spatially. Without a UCM ( figure 2(b)), the WRFsimulated LST cannot capture the distinct urban-suburban-rural contrasts that are observed in MODIS maps. With a UCM (figures 2(c)-(f)), the simulated LST patterns show a strong UHI effect along the Baltimore-Washington Corridor that is in better agreement with MODIS observations. It is clear that the simulated UHI depends significantly on the PBL schemes and the thermal roughness length parameterizations, which are the only two parameters that vary in figures 2(c)-(f). The YSU PBL scheme (figure 2(c)), with the default parameterization for thermal roughness length, produces the largest warm bias compared to MODIS observations ( figure 2(a)). The bias is reduced by switching to the MYJ PBL scheme with its original Zilitinkevich Environ. Res. Lett. 9 (2014) 055001 D Li and E Bou-Zeid parameterization for thermal roughness length (figure 2(d)). When the modified Zilitinkevich relationship is used with the two PBL schemes (figures 2(e), (f)), the simulated LSTs are very similar, implying that the LST at this particular time is controlled by the parameterizations for thermal roughness length, rather than by the PBL scheme. One can also clearly note that the modified Zilitinkevich relationship yields the closest agreement with MODIS, but large warm biases persist in suburban areas.

Surface UHI effect using existing WRF parameterizations
To further examine the biases in the LSTs, the differences between the WRF simulations and the MODIS observations were analyzed as a function of major land use categories within the domain (see figure 3; the categories are ordered by increasing fractions in the domains from left to right in the figure). It is again evident that the YSU PBL scheme with the MM5 parameterization for thermal roughness length yields the largest biases across all land cover types, which is consistent with figure 2(c). With the MYJ scheme and the Zilintinkevich relationship for parameterizing thermal roughness length, the biases over non-forest land cover types are significantly reduced (as can be also seen from figure 2(d)). Nevertheless, over evergreen broadleaf forest, herbaceous wetland, wooded wetland and deciduous broadleaf forest (tall canopies), the LSTs from WRF are still significantly higher than in the observations. The larger surface temperatures over these categories are caused by the fact that the original Zilintinkevich relationship underestimates the turbulent transfer coefficient C h and thus provides an insufficient land-atmosphere coupling (and hence insufficient surface cooling) for tall canopy. As a result, the LSTs over these land use categories are significantly biased towards higher values when the original Zilintinkevich relationship is used. This is in agreement with the study of Chen and Zhang (2009), which compared the modeled transfer coefficient C h to the calculated transfer coefficient using measurements of sensible heat flux, wind speed, air temperature and surface temperature inferred from outgoing longwave radiation. Chen and Zhang (2009) also pointed out that the transfer coefficient over tall canopy vegetation is significantly underestimated when the original Zilintinkevich relationship is used and thus provides insufficient coupling between the land surface and the atmosphere.
To resolve the problem of insufficient coupling over tall canopies, the empirical coefficient ( = C 0.1 zil , see equation (5b)) used in the original Zilintinkevich relationship is modified by Chen and Zhang (2009) (see equation (5c)). This modified Zilintinkevich relationship strengthens land-atmosphere coupling over tall canopies such as evergreen broadleaf forest, herbaceous wetland, wooded wetland and deciduous broadleaf forest; this reduces the modeled surface temperatures, as well as the biases (compare figures 2(e) and (f) to (d) and compare figures 3(c) and (d) to (b)). Over land cover types whose canopy heights are close to 2.5 m (such as dryland cropland or dryland/irrigated cropland whose canopy heights are about 2.1 m, see the blue in figure 3), the modeled surface temperatures are not significantly altered. Therefore, the total bias observed in the surface temperature field is reduced when using the modified Zilintinkevich relationship (i.e., the CZ09 parameterization, see equation (5c)), as compared to using the original Zilitinkevich relationship. One notable exception to the success of the modified Zilintinkevich relationship is that in urban areas, which could also be viewed as tall canopies, the biases are increased (see the green in figure 3); this is not consistent with the reasoning and results for tall vegetated canopies (cf figures 3(b)-(d)). Close scrutiny however reveals that this bias is linked to the use of the vegetated transfer coefficient to infer/compute the diagnostic averaged urban surface temperatures as detailed after equation (7). When a UCM is used, each urban grid will be assigned a certain fraction of vegetated surface (grassland), with the remainder being the imperious surface. The Noah land surface model will be called first to calculate the surface temperature for the vegetated part, and then the UCM will be called to calculated the surface temperature for the impervious part. After the Noah land surface model calculates the vegetated surface temperature, the transfer coefficient C h calculated over the vegetated surface is used by WRF to compute an impervious surface temperature using equation (7) (i.e., the turbulent transfer coefficient C h is computed using the momentum and thermal roughness lengths of grasslands rather than buildings). Over short canopy vegetation types like grassland, the turbulent transfer coefficient C h is reduced when the modified Zilitinkevich relationship is used, which increases the modeled surface temperature (compared to the original Zilitinkevich). This explains the larger warm biases in the modeled LST over urban areas when the modified Zilitinkevich relationship is used (cf figures 3(b)-(d)). This large bias in the urban surface temperature field can be corrected by the use of a more accurate computation of the impervious surface temperature, and an improved UCM, as will be discussed in section 3.2.
To further investigate the sensitivity of LSTs to PBL schemes and to parameterizations for the thermal roughness length, figure 4 depicts the diurnal cycles of surface temperature averaged over all rural (figure 4(a)) and urban (figure 4(b)) grids produced with different PBL schemes and thermal roughness length parameterizations. The urban grids include those whose dominant land use category is one of the three urban categories; all other grid cells are considered rural (except those dominated by open water bodies). During daytime, the surface temperature is clearly more sensitive to the parameterization of the thermal roughness length than to the PBL scheme, as one can see from both panels in figure 4 where the daytime results with the same parameterization for z 0T (the CZ09) match very well, regardless of the PBL scheme. This is consistent with the comparison in figure 2 that showed snapshots of surface temperature around 1255PM. During nighttime, however, the surface temperature is more sensitive to the PBL scheme compared to the parameterization of z 0T . Changing parameterizations for thermal roughness length does not alter the surface temperature significantly during the night, when the turbulent fluxes are weak and the surface energy budget is dominated by the radiative terms.
The results with the same PBL scheme thus match well, while those with different PBL schemes diverge. The larger sensitivity of WRF-simulated surface temperature to z 0T during daytime is in agreement with many previous studies that also used WRF or other numerical models. For example, Zeng et al (2012) show that changing the coefficients in computing the thermal roughness length (similar to the coefficient in the Zilitinkevich relationship that is examined in our study) significantly alters the daytime surface temperature but has a negligible effect on nighttime surface temperature over arid areas. Zheng et al (2012) also demonstrate that the simulated daytime surface temperature can be improved by improving the thermal roughness length parameterization. Shin and Hong (2011) found that when the PBL scheme is fixed in WRF, switching the surface-layer formulations (and thus changing parameterizations for the thermal roughness length) has an important impact on the simulated daytime surface temperature; nevertheless, it does not significantly alter the nighttime surface temperature (see their figure 9(a)). Figure 4(a) also illustrates that when the CZ09 relationship is used in the parameterization for thermal roughness length (comparing MYJ + CZ09 to MYJ), the daytime rural surface temperature is reduced and hence the biases (compared to MODIS) are reduced. This is again due to the fact that most of the rural land-use categories in our domain have tall canopies. In addition, the only two categories that are not classified as tall canopy land-uses (dryland cropland and dryland/irrigated cropland) also have canopy heights that are fairly close to 2.5 m, which is the criterion to separate tall canopies from short canopies. On the other hand, comparing MYJ + CZ09 to MYJ over urban terrain ( figure 4(b)), one notes that the daytime urban surface temperature is increased and hence the biases are increased, which, as discussed before, is caused by the erroneous use of C h calculated over grassland for impervious surfaces.

Improving the WRF-simulated surface UHI effect
In order to reduce the biases observed in urban surface temperatures when MYJ + CZ09 are used, and to overcome the inconsistency of using the turbulent transfer coefficient (C h ) calculated over grassland for impervious surfaces, the calculation of impervious surface temperature in the UCM is revised. Given that the UCM combines fluxes from the roof and the canyon (see discussion after equation (7)), in the following analyses, the prognostically-computed temperatures of the roof surface and the canyon are aggregated to yield an average surface temperature over the impervious surface, following This is similar to the 'complete urban surface temperature' concept proposed by Voogt and Oke (1997), but the wall and ground temperatures are incorporated into the 'complete urban surface temperature' through the canyon temperature T c . Note that the T c given by the UCM is not the air temperature in the canyon. It is rather an equivalent aerodynamic surface temperature aggregated for canyon surfaces (walls and ground). This is because where h is the building height and Rd is the road width; c . As such, the canyon temperature can be calculated as: Note that the subscripts 'r', 'w', 'g', 'c', 'a' denote roof, wall, ground, canyon and air, respectively. This justifies the use of the canyon temperature as an effective surface temperature for the canyon. The impervious surface temperature calculated using equation (9) is then substituted into equation (6) in order to obtain a surface temperature that represents the whole urban grid cell. Using this approach, the calculated surface temperature patterns from equation (6) for the default UCM, the default UCM with calibrated properties, and the PUCM are compared to the MODIS satellite observations in figure 5. As can be seen in figure 5(b), the surface temperatures calculated using this method and the default UCM are substantially larger than the MODIS observations, implying that this default UCM (including the surface properties used in this default UCM) are very deficient in simulating surface UHI strengths. The simulation with the PUCM (figure 5(d)) on the other hand clearly gives the best estimate of land surface temperatures for the whole region when compared to the MODIS observations. The PUCM includes 'urban grass' on the ground and the properties of each facet were calibrated at a site in the Northeastern US (Princeton, NJ) using a wireless sensor network . Note that the green roof fraction is set to zero so that, effectively, the roof still has only one surface type. To separate the impact of introducing urban grass into the UCM and the impact of modified facet properties, as mentioned earlier, we also modified the properties of the default UCM so that it uses the properties of the non-vegetated facets that are used in the PUCM (figure 5(c)). In this case, the roof, wall, and ground material properties are modified, but the ground surfaces are still completely impervious (consisting of asphalt and concrete pavements). Using the default UCM with calibrated properties reduces the bias considerably compared to the default UCM with default properties, but it still yields a relatively larger bias compared to the PUCM, which implies that inclusion of urban grassland (inside the canyon) is crucial for reproducing the correct urban surface features. The difference is particularly significant in suburban areas where the presence of in-canyon vegetation, is important but not captured by the default UCM.
We again quantify the biases in the LSTs as a function of the three urban categories for these runs in table 2. Here, the No-UCM simulation (case 5) is added as a reference. It is interesting to see that the No-UCM case does not produce the largest biases in the urban surface temperatures, despite its erroneous simulation of the spatial patterns of the UHI ( figure 2(b)). However, since the simulated LSTs for the three urban categories are not significantly different with this option for the three urban categories (i.e., the three urban categories have similar simulated surface temperatures), as can be seen from figure 2(b), the biases for the three urban categories are quite different. In other words, the variability in Environ. Res. Lett. 9 (2014) (d) is with the PUCM and calibrated properties. All of these simulations use the MYJ scheme and the CZ09 parameterization for z 0T . In urban grids, the impervious surface temperature is a weighted average of roof temperature and canyon temperature (equation (9) figure 7), implying that our conclusion that WRF-PUCM can realistically represent the surface UHI effect over this area is robust.

Boundary-layer temperature profiles
In addition to the comparison of the modeled surface temperatures to MODIS observations, an important parameter to validate is air temperature, which is also a main determinant of thermal comfort in cities. Due to the use of C h calculated over grassland for impervious surfaces, the 2 m air temperature calculated in urban grids might not be the most suitable temperature to validate in urban terrain (it is not clear what elevation it would actually correspond to over complex canopies), despite the fact that in the companion paper we do use this 2 m temperature as a representative air temperature for assessing the impact of mitigation options. For validation purposes, a better test is the ability of WRF to reproduce the full atmospheric boundary layer temperature fields. To verify such ability, the WRF-simulated potential temperature profiles in the lower atmosphere (from 0.1 km to 3 km) at the IAD and the BWI are compared to ACARS observations, which were introduced earlier.
The IAD site is identified as an urban site due to a large fraction of urban land within the grid cell where IAD is located. The grid cell has 81%, 14% and 5% of low-density residential urban land, high-density residential urban land, and industrial/commercial urban land, respectively. While the BWI site has 35% low-density residential urban land, it is primarily dominated by deciduous broadleaf forest (40%) and has 10% deciduous needleleaf forest. Figure 6 illustrates the evolution of potential temperature in the lower atmosphere (up to 3 km above ground level) at IAD (left panel) and BWI (right panel). As can be seen in the figure, WRF simulations with different physical parameterizations (including PBL schemes and UCMs) display subtle differences, indicating that atmospheric temperature fields are less sensitive to these parameterizations than surface temperature fields. This is not surprising given the constraint imposed by the surface energy balance on surface-atmosphere fluxes, which reduces the sensitivity of these fluxes to PBL and surface schemes. One can note however that the afternoon temperatures obtained with PUCM at the end of the run match the observations better (cooler near surface temperatures than other models), which is related to the cooler surface temperatures produced by PUCM.
The mean potential temperature profiles over the diurnal cycle from observations and WRF simulations are shown in figure 7. Overall, the composite potential temperature profiles from the WRF simulations match the ACARS measurements better within the atmospheric boundary layer than above the atmospheric boundary layer. The biases inside the atmospheric boundary layer are on the order of 1~1.5°C, implying that WRF simulations capture the boundary layer temperature profiles reasonably well. Changing PBL schemes and UCMs alters the temperature profiles within the atmospheric boundary layer at both IAD and BWI; and the differences between WRF runs with two PBL schemes are comparable to those between WRF runs with different UCMs. Interestingly, at both locations, the case without a UCM seems to produce the largest differences with the other cases. The case without a UCM seems to better reproduce the temperature profiles while the case with PUCM seems to yield the largest biases at BWI; however, at IAD, slightly larger biases are associated with the case without a UCM and relatively smaller biases are observed for the case with PUCM. This, along with the vertical variability of the biases, prevent us from making general conclusions about the most suitable UCM for reproducing the boundary layer temperature profiles in WRF. But what we can conclude is that WRF is able to capture the bulk structure of potential temperature in the atmospheric boundary layer realistically.

Conclusions and discussions
In this study, numerical simulation of the UHI effect at the city scale is investigated using a regional climate model (WRF) at high spatial resolutions. The simulated UHI with WRF is validated against remotely-sensed and in situ observations from aircraft mounted sensors. The results indicate    surface temperature fields over natural surfaces when compared to satellite measurements during daytime. The urban surface temperatures simulated by WRF are also sensitive to the UCMs used, as well as to the imposed urban thermal surface properties (e.g., albedo, thermal capacity, and thermal conductivity). A more consistent method for calculating urban surface temperature is proposed and validated in this study. Compared to not using a UCM or using the default UCM, an improved UCM (the PUCM), which we implemented into WRF, yields the smallest bias (<1.5°C) in the urban surface temperature fields, though part of this improved performance is attributed to the use of more accurate surface thermal properties.
WRF-simulated potential temperature profiles in the atmospheric boundary layer are then compared to aircraft observations. Results show that WRF captures the boundary layer potential temperature profiles reasonably well at both the urban and the rural sites considered here. The biases are about 1.5°C in the atmospheric boundary layer. The performance of WRF in reproducing the boundary layer structure is not very sensitive to the PBL schemes and the surface physical parameterizations such as the UCMs.
The implementation and use of PUCM not only improve the performance of WRF in reproducing the LST patterns and the associated surface UHI intensity, but also allow a realistic investigation of UHI mitigation strategies such as the green roof and white roof strategies. These mitigation strategies have been studied at building-scales (see e.g., Li  However, a problem of these previous global studies is the use of homogeneous urban properties (all roofs had to be white for example); in addition, less is known about the neighborhood and city-scale impacts of these mitigation strategies. Given that mitigation actions are usually organized and implemented at city scales and in stages, the WRF-PUCM modeling system implemented and validated in this study is extremely useful in providing a framework to answer questions related to the effectiveness of UHI mitigation strategies. It can thus bridge the gap between building-scale experimental/modeling work and global-scale modeling work. In the companion paper (Li et al 2014), we use this WRF-PUCM modeling system to study the city-scale impacts of green roof and cool (white) roof strategies, and we examine how these impacts scale as the penetration rate of their associated mitigation approaches increases.