A comprehensive assessment of land surface-atmosphere interactions in a WRF/Urban modeling system for Indianapolis, IN

As part of the Indianapolis Flux (INFLUX) experiment, the accuracy and biases of simulated meteorological fields were assessed for the city of Indianapolis, IN. The INFLUX project allows for a unique opportunity to conduct an extensive observation-to-model comparison in order to assess model errors for the following meteorological variables: latent heat and sensible heat fluxes, air temperature near the surface and in the planetary boundary layer (PBL), wind speed and direction, and PBL height. In order to test the sensitivity of meteorological simulations to different model packages, a set of simulations was performed by implementing different PBL schemes, urban canopy models (UCMs), and a model subroutine that was created in order to reduce an inherent model overestimation of urban land cover. It was found that accurately representing the amount of urban cover in the simulations reduced the biases in most cases during the summertime (SUMMER) simulations. The simulations that used the BEP urban canopy model and the Bougeault & Lacarrere (BouLac) PBL scheme had the smallest biases in the wintertime (WINTER) simulations for most meteorological variables, with the exception being wind direction. The model configuration chosen had a larger impact on model errors during the WINTER simulations, whereas the differences between most of the model configurations during the SUMMER simulations were not statistically significant. By learning the behaviors of different PBL schemes and urban canopy models, researchers can start to understand the expected biases in certain model configurations for their own simulations and have a hypothesis as to the potential errors and biases that might occur when using a multi-physics ensemble based modeling approach.


Introduction
The world's landscape is being changed as cities expand with economic development and with emigration of people to cities from more rural areas. These urban areas are unique environments with unique effects on the local meteorology. Capturing these effects on local meteorology using numerical models has been an area of extensive study. Current mesoscale weather models have many modules that work together to produce weather forecasts; some of these modules include the following: 1) Land surface models (LSMs) that try to represent the physics of the surface and its interaction with the atmosphere.
2) Planetary boundary layer (PBL) schemes that try to parameterize the atmospheric turbulence and the heat, momentum, and moisture exchange between the land surface and the atmosphere. 3) Urban canopy models (UCMs) that aim to parameterize the unique interactions of urban environments on the atmosphere. There are many different modules and schemes that are available to use; therefore, it is important to understand the tendencies, interactions, and shortcomings of these modules in order to produce the most accurate representation of weather phenomena in mesoscale models.

Surface energy flux partitioning
The surface energy balance describes the interaction of a surface with the atmosphere. The available energy (Rnet) can be quantified by balancing the incoming and outgoing shortwave (K) and longwave radiation (L) (Eq. 1) (Allen et al., 1998).
The amount of energy absorbed by a surface is controlled by the albedo and emissivity of the surface. Urban surfaces tend to have a lower albedo and higher emissivity than rural areas (e.g. Taha 1997), which lead to higher absorption of radiation in urban environments. This available energy can then be partitioned into surface energy fluxes (Eq. 2) (e.g. Vourlitis et al., 2008).

Rnet H LE G S
Rnet is net radiation or total available energy, H and LE are sensible and latent heat fluxes, and G is the surface (ground) heat flux. These four terms represent the major components in the surface energy flux balance. The fifth term, ΔS, represents the storage term, which accounts for the heating of other objects that are not the ground, which can be high in urban areas due to large structures (i.e. buildings) that are present (Moriwaki et al., 2004). This term is difficult to accurately measure and is usually calculated by subtracting the sensible, latent, and ground heat fluxes from the net radiation. Additional terms can be added to the surface energy flux balance equation to represent other processes that are occurring in the atmosphere and at the surface. In order to create a more robust energy balance, differential horizontal energy advection (ΔA) can also be included in the surface energy balance equation, however, most of the time this term is ignored. The heat contribution by anthropogenic sources (F) has also been found to be a significant part of the energy balance in urban areas (e.g. Offerle et al., 2005;Steinecke 1999;Ichinose et al., 1999). Accounting for these new energy terms results in the following energy balance equation: Accurate simulation of the urban surface energy balance may require representation of all of these terms. Many urban canopy models have the capability to model anthropogenic heat and moisture fluxes in order to more accurately represent the surface energy flux processes that are present in urban environments, however, accurately representing the magnitude of anthropogenic fluxes is still difficult to accomplish.

Noah land surface model
One of the important modules in numerical weather models is the land surface model (LSM). Land surface models parameterize the land surface interaction with the atmosphere (i.e. the surface thermodynamics, the surface hydrology, and the exchange of momentum, heat, and water vapor between the surface and the atmosphere, etc.) Each LSM has unique land surface categories that have unique sets of parameters that try to represent the interactions that different environments can exhibit (ex. forest versus desert versus tundra). In the Noah LSM, the surface thermodynamics subroutine is responsible for modeling the surface skin temperature. The incoming solar radiation that is absorbed by the surface is controlled by the albedo of the surface, which is pre-defined for varying land surface types (such as forest, desert, marshland, etc.) The thermal conductivity, heat capacity, and soil moisture content of the surface also vary by land surface type (Chen and Dudhia, 2001). The soil moisture drives the surface hydrology in the Noah LSM. Soil parameters, such as soil water diffusivity, hydraulic conductivity, and soil moisture capacity, are defined for 16 types of soils. The model also takes precipitation, evaporation, transpiration, and runoff as the main sources and sinks of moisture in the surface hydrology subroutine. The Noah LSM also accounts for any snow that might fall and cover the surface (Chen and Dudhia, 2001). In order to simulate the processes over urban surfaces, the Noah LSM includes a land surface category with parameter values designed to enable the model to simulate the thermodynamics and surface hydrology of an urban environment. This method, often referred to as the bulk urban parameterization method, represents the first iteration of a continuing effort to enable mesoscale models to accurately simulate the urban surface -atmosphere interactions at the meso and microscale . For the purposes of this study, the Weather Research and Forecasting (WRF) model (Skamarock et al. 2008) will be used as the mesoscale model.
Accounting for sub-grid land surface heterogeneity is also an important factor when using land surface models. Giorgi and Avissar (1997) summarized that sub-grid aggregation, which was accomplished using various methodologies, can affect surface energy fluxes, soil dynamics, and snow pack dynamics performance in models. Li et al. (2013) developed a mosaic/tiling approach that integrated sub-grid land surface type data into a WRF simulation for the Washington D.C. -Baltimore corridor. The integration of sub-grid data either improved or had a negligible effect on many meteorological variables (such as surface energy fluxes, surface air temperatures, and rainfall patterns); for example, the sensible heat fluxes and latent heat fluxes were changed by ~2 0 W m -2 and ~1 00 W m -2 respectively when using a mosaic/tiling approach. Although this study was for a small temporal scale (7 day period, one clear-sky day and a six day rainy period in July), the study demonstrated the need to have better land surface definitions in the mesoscale model and how accounting for 'sub-grid land surface distributions' can improve the simulated meteorology of a system.

Urban canopy model parameterization
The urban canopy models available in WRF try to parameterize the interactions between urban environments and the atmosphere using a more complex set of equations and parameterizations. These UCMs utilize parameters that represent urban landscapes through the use of urban canyons, which is a simple geometry of a road with a row of buildings on either side. This urban canyon geometry allows unique interactions (such as trapping of radiation and shading by buildings) to be parameterized, which is not represented in the bulk urban parameterization method. By having the ability to define the urban parameters (such as building height and width, the emissivity and albedo of roofs, walls, and roads, etc.), a more realistic representation of the urban environment can be achieved. The urban canopy models provide a level of flexibility and adaptation that the bulk urban parameterization scheme is unable to match.
The simplest of these urban surface schemes is the Single Layer Urban Canopy Model (SLUCM) (Kusaka and Kimura, 2004;Kusaka et al., 2001). This model was developed to work with all planetary boundary layer (PBL) and surface layer schemes that are available in WRF. The surface energy fluxes, turbulence, and radiation are all parameterized using equations that take the urban canyon geometry into account. More complex interactions, such as shadowing effects and reflection of radiation are also considered, but the ability to specify a particular canyon orientation is not included in these UCMs . The SLUCM is also limited by requiring that the first vertical model level be higher than the building height parameter, which limits the ability to represent the processes that occur within the urban canopy.
The Building Effect Parameterization (BEP) model (Martilli et al., 2002) uses similar methods to those employed in the SLUCM model; however, the BEP model allows for multiple vertical atmospheric levels within the urban canopy. Multiple vertical levels within the urban canopy allows for additional interaction of buildings (both the vertical and horizontal surfaces) with winds, temperature, turbulent kinetic energy (TKE), as well as the absorption, reflection, emission, and shading of radiation by buildings and roads, all of which are parameterized and not explicitly simulated . In addition, the parameterization equations are more complex in the BEP model than the SLUCM. For example, sensible heat fluxes in the SLUCM are represented by a bulk parameterization methodology while the BEP represents these processes with a more complex formulation.

PBL and LSM scheme sensitivity and improving simulations over urban environments
With the vast amount of modules available in numerical weather prediction models, studies have tried to document the behaviors and effects of using different combinations of schemes on the accuracy of these numerical weather prediction models. Previous studies have tried to quantify the model sensitivity to different PBL schemes (e.g. Cohen et al., 2015;Hu et al., 2010;Sharma et al., 2016;Shin and Hong, 2011); however, these studies have not reached a consensus on which scheme combination produces the most accurate model results. A study by Feng at al. (2016) showed that modeling the boundary layer winds and PBL development for the Los Angeles basin was greatly improved when using a combination of the Mellor-Yamada-Nakanishi-Niino (MYNN; Nakanishi and Niino, 2004) PBL scheme and the SLUCM. Liao et al. (2014) showed that using no urban canopy model and using the BEP UCM both resulted in smaller model errors than SLUCM runs when coupled with the Mellor-Yamada-Janjic (MYJ; Janjić, 1994) PBL scheme in simulations of the Yangtze River Delta. Salamanca et al. (2011) also had a study in Houston that produced results that showed that the bulk parameterization scheme for urban surfaces worked well in the prediction of 2-m air temperature. There are also multiple studies that focus on land surface models (e.g. Borge et al., 2008;Chen et al., 2014;Jin et al., 2010;Zeng et al., 2015) to characterize the behaviors of LSMs. The aggregation of the different studies shows that there is not a scheme combination that works well across different landscapes and different model setups.
Despite the development of LSMs and UCMs to deal with urban environments, these models still struggle to accurately model the surface energy fluxes and meteorology of urban environments. There have been studies that have used modeling frameworks other than WRF (e.g. Loridan et al., 2013;Bohnenstengel et al., 2011;Kondo et al., 2005;Masson et al., 2000) that aimed to more accurately represent the complex processes that occur in these environments. The inaccuracies have been well documented (Best and Grimmond, 2015) and there have been studies done that have utilized parameterization techniques and different model schemes to try and get better agreement between the model and the observations (e.g. Loridan and Grimmond, 2012;Loridan et al., 2010). Loridan and Grimmond (2012) conducted a study using an offline Noah/SLUCM model to assess errors across 27 observational datasets, which represented 15 different urban sites at different times of the year. In addition, five sets of parameter values and parameter configurations were used to try to reduce the errors across the 27 datasets. This extensive study showed the difficulties of getting an optimal set of parameters that works for all urban environments for all times of year. Although a set of parameter values was recommended to use for modeling any urban environment, the variability of model accuracy across the different observation sites illustrates the value of investigating different configurations and model parameters for an individual's experiment.

Assessment of UCM, PBL scheme, LSM interactions and behaviors
This study aims to create a set of WRF simulations to cover a variety of PBL schemes and UCMs over various seasons. By covering a wide array of potential options, we hope to learn the behaviors of each scheme and how they interact with one another. This study will also look at the potential impact of creating a more accurate representation of the sub-grid land surface definitions. A variety of meteorological observations are used and these observations include: surface energy fluxes, surface wind speed and direction, surface air temperatures, air temperatures within the boundary layer, boundary layer wind speed, and PBL height. By combining the observations and the different model configurations, a detailed assessment is performed to assess the behavior and performance of different model configurations, the seasonal influence on meteorological errors, and the impact of improving the land surface definitions.
There are two overall goals for this study: 1) Use a set of different combinations of land surface and PBL parameterizations within WRF to assess the ability of these parameterizations to simulate the meteorology over urban surfaces. 2) Assess the impact of implementing a high-resolution land cover data and a sub-grid urban land cover algorithm in the modeling system. interactions in a WRF/Urban modeling system for Indianapolis, IN Art. 23, page 4 of 22

Observational Network
The urban area chosen for the evaluation is Indianapolis, Indiana (39.7684°N, 86.1581°W). This study uses a wide variety of observations in order to perform an extensive model-to-observation comparison. Most of the observations used in this study were provided by the Indianapolis Flux (INFLUX) project (Davis et al., in review).
The observational network used in this paper includes urban, tower-based eddy covariance flux data. The flux tower, located east of downtown Indianapolis (Figure 1a), utilizes a sonic anemometer (Campbell Scientific; CSAT-3), and a high-frequency open-path infrared water vapor and CO 2 sensor (LI-COR Environmental; LI-7500). The flux instrumentation is mounted on a communications tower at a height of 30 meters. area surrounding the flux tower site could be classified as suburban, with the interstate (I-70) to the north and neighborhoods with single-family detached homes all around the flux tower. Since the area around the flux tower is not a homogenous landscape, a flux footprint (Kljun et al., 2015;2004) for the tower under typical conditions was calculated in order to see what surfaces were influencing the observations. Typical conditions were defined as a range of meteorological conditions found during the day for the time period of the study. The Monin-Obukhov lengths, which were used to define the atmospheric stability, ranged from -100 meters to 200 meters. The algorithm of Kljun et al. (2015) was used to estimate the flux footprint. With our setup and location, the far edge of the 90%-level flux footprint area extended about 300 ± 50 meters upwind of the flux tower site (Figure 1b). This radius is an average distance that was typical for most days during the daytime; however, this value would vary greatly under different meteorological conditions. The urban surface surrounding the flux tower (Figure 1b) is roughly 45% mixed vegetation and 55% impervious surface.
In addition to these flux tower measurements, there are about twenty surface stations in the Indianapolis area that are a part of the Meteorological Assimilation Data Ingest System (MADIS; http://madis.noaa.gov/) database that provides observations of surface temperature, wind speed, and wind direction. Data from the Aircraft Communications Addressing and Reporting System (ACARS), which are also found on the MADIS database, provide observations of temperature, water vapor, and wind profiles. PBL height can also be derived from these ACARS profiles. The ACARS data are limited to times when commercial aircraft flights are operational. Additional observations of boundary layer properties are collected from a Doppler light detection and ranging (LIDAR) system (Hogan et al., 2009), which is deployed in eastern Indianapolis (Figure 1a).
Evaluation of many simulated meteorological fields can be accomplished with this suite of instruments. This study will focus on evaluation of simulated meteorological fields within the boundary layer. These variables will includes air temperature, wind speed, and wind direction, both at the surface and throughout the boundary layer. Evaluation of surface energy fluxes and PBL height will also be performed. Most of the evaluations will focus on daytime hours when the atmospheric boundary layer is most convective because we believe that our model setup is more conducive to simulating a convective PBL rather than a stable (or nocturnal) PBL.

WRF modeling system setup
WRF version 3.5.1 (Skamarock et al. 2008) will be used for this study. There are three model domains that are nested and co-centered on Indianapolis, IN. The model domains are fixed, have a grid spacing of 9 km, 3 km, and 1 km, and consist of 100 × 100 grid points. The model physics schemes used in this study, such as the planetary boundary layer scheme, the surface layer schemes, and radiation schemes, are outlined in Table 1. Meteorological initial and boundary conditions are obtained from the North American Regional Reanalysis (NARR) product (Mesinger et al., 2006) at six-hour intervals. The simulations were completely reinitialized after every fourth simulation day using the NARR reanalysis product. The repeated initialization of the model ensured that the model remained close to the observed meteorology. The land surface definitions for the domains were created using the data made available from the National Land Cover Database (NLCD; Fry et al., 2011). The NLCD land cover product includes 40 land surface categories at 30-m resolution for the year of 2006.

Land surface definitions in WRF
Since the NLCD land cover data is at a finer resolution than the atmospheric model resolution, there is a need for an upscaling algorithm to convert high-resolution data to the coarser grid. The default land surface definition upscaling in WRF takes a coarse grid tile and sets it to the land surface category that is most abundant in the highresolution data. For example, if a 1-km 2 grid point consists of 30% water, 40% urban cover, and 30% forest, then that 1-km 2 tile will be designated as 100% urban cover because the urban cover is the most abundant land surface at that specific grid point.

Urban fraction parameter
The urban fraction parameter (f urb ) is a number from 0 to 1 that quantifies the amount of urban cover at a model grid point, where 0 and 1 correspond to 0% and 100% urban cover, respectively. The UCMs use f urb to calculate the sensible and latent heat fluxes for an urban grid tile.
Equation 4 shows how f urb influences the calculation of these fluxes.
X Tile represents the calculated flux for the urban tile, X Veg represents the calculated flux for the tile if it were 100% vegetated, X Urb represents the calculated flux for the tile if it were 100% urban, and f urb is the urban fraction. Therefore, an accurate representation of f urb is vital when trying to predict a reasonably accurate surface flux from the UCMs. In WRF v3.5.1, there are three urban land surface categories: Low-intensity residential, High-intensity residential, and Commercial/Industrial. Each category is assigned one f urb value (0.50, 0.90, and 0.95). A problem arises when using the NLCD land cover data. The NLCD has four urban categories that are binned by f urb as follows: 0.0 -0.19, 0.20 -0.49, 0.50 -0.79, 0.80 -1.00. In order to alleviate the urban classification mismatch, one can either ignore the first NLCD urban category and set it to a vegetative land cover or combine the first two NLCD urban categories and set it to the "Low-Intensity Residential" category in WRF. For this study, we have chosen the latter of the conventions (referred to as def). Figure 2a shows the f urb values for the innermost domain using the default upscaling that was outlined at the beginning of section 2.3.
In WRF, the urban fraction parameter is only applied when UCMs are being employed in conjunction with a land surface model; in those cases the urban fraction parameter is used to calculate latent heat flux and sensible heat flux for a given grid point (Equation 1). The UCM is tasked with calculating the sensible and latent heat fluxes over urban areas while the Noah LSM calculates the sensible and latent heat fluxes for the vegetated areas. The sensible and latent heat fluxes from vegetation will not change between the SLUCM and BEP UCM model configurations because both configurations use the Noah LSM. Differences between the SLUCM and the BEP UCM will occur due to differences in how urban surface fluxes are calculated and how the model introduces anthropogenic fluxes into the modeling system.

Modifying the urban fraction field
In this study, we propose and test a method to capture more realistic variability in urban cover by modifying f urb . Recall that the NLCD data has four urban categories that are binned by f urb values (0.00 -0.19, 0.20 -0.49, 0.50 -0.79, 0.80 -1.00). Each NLCD urban category is assigned a value for f urb in the midpoint of this range (0.10, 0.35, 0.65, 0.9) and is used along with fraction cover data to compute a unique average value of f urb for each 1-km 2 tile in the WRF inner domain (Figure 2) resulting in a more realistic set of f urb values (referred to hereafter as real).

Experimental setup
For all of the simulations, the only variables that change are the PBL scheme (and their respective surface layer schemes), the UCM module, and the f urb values. The PBL schemes chosen are either the Mellor-Yamada-Nakanishi-Niino (MYNN; Nakanishi and Niino, 2004) PBL scheme, the Mellor-Yamada-Janjić (MYJ; Janjić, 1994) PBL scheme, or the Bougeault-Lacarrere (BouLac; Bougeault and Lacarrere, 1989) PBL scheme.
The urban canopy options chosen are: no UCM, the SLUCM, or the BEP UCM. The UCMs contain many parameters (a subset of parameters and parameter values can be found in Table 2) and these parameters, with the exception of f urb , were held constant at their default values. It is important to note that these parameter values are not representative of the urban landscape in Indianapolis; however, changing the values to something more representative of the Indianapolis urban landscape is outside the scope of this project. Table 3 outlines the nine combinations of PBL schemes and UCMs that define the set of simulations used in this experiment. It should be noted that there is no none_MYNN_real configuration to accompany the none_MYNN_def configuration because the urban fraction parameter is only used in the UCMs; therefore, running both none_MYNN_real and none_MYNN_ def would be redundant.
This study focused on two distinct time periods. The first period, which will represent wintertime conditions (WINTER), is from February 15, 2013 to March 20, 2013. The second time period, which will be representative of summertime conditions (SUMMER), will span June 15, 2013 to July 20, 2013. These timespans allow for comparisons in both the winter and summer, which give an understanding as to how the model performance changes as the seasons change.

Results and Discussion
For the comparisons presented in this paper, the term daytime hours is defined as the hours spanning from 12:00 to 16:00 local time (17 UTC to 21 UTC in WINTER and 16 UTC to 20 UTC in SUMMER). When model-toobservation comparisons are presented, a positive error or bias value indicates an overestimation of the meteorological variable in the model and a negative value indicates an underestimate of the meteorological variable. The impact of updating the urban fraction parameter is quantified by subtracting the 'def' meteorological values from the 'real' meteorological values; therefore, posi-tive values in these analyses represent an increase in the meteorological value when the updated urban fraction algorithm is used.

Impacts of the update to the urban fraction parameter calculation
When Figure 2a to Figure 2b were compared, it was clear that there was a large overestimation in the default calculation of f urb . With the modifications, a more accurate f urb map was created. WRF was further modified so that any tile that had a f urb greater than 0.10 employed the UCM. This increased the number of tiles that utilized the UCM and most of these tiles were located on the southwest and northeast fringes of the city. After the update to f urb (Figure 2b), the major highways were clearly represented and there was additional urban coverage in both the towns that are outside of Indianapolis and the northeast and southwest fringes of Indianapolis.  All simulations were executed using the updated NLCD 2006 land surface categories for the region. Only one configuration without an UCM (none_MYNN_def ) was performed since updating the urban fractions would not change the results in an UCM-less simulation.

Impact of updated urban fraction on simulated sensible and latent heat fluxes
A comparison between the different model configurations was performed to assess the impact of the urban fraction update on the surface energy fluxes. Comparing the default (def) and updated (real) urban fraction simulations showed the impact of changing the urban fraction on latent and sensible heat fluxes. Figures 3 and 4 show the average daytime difference for the summer and winter periods, respectively. The impacts of varying the PBL scheme on these fluxes were minimal; therefore, those comparisons are not shown. The spatial pattern and the magnitude of the fluxes were highly correlated to the UCM being used. By updating the urban fraction in the configurations using the SLUCM (Figure 3a), the summer sensible heat fluxes were decreased by 25 W m -2 to 50 W m -2 over most of the urban areas in Indianapolis. The models using the BEP UCM (Figure 3c) saw a larger decrease for the summer (anywhere from 50 W m -2 to 150 W m -2 ) and the spatial distribution was more varied than the SLUCM model simulations. The latent heat fluxes in these urban areas also increased as a result of the urban fraction update. Figures 3b and 3d show that the magnitude of the latent heat flux change was the same between the SLUCM and the BEP UCM model simulations, which was to be expected because the Noah LSM simulated the latent heat fluxes generated by vegetation in all model configurations.
In the WINTER simulation (Figure 4), the changes in the surface fluxes due to changing urban fraction were much smaller than in the summer. Both the SLUCM and BEP UCM simulations showed an increase in sensible heat fluxes in the suburban areas of Indianapolis. The BEP UCM simulation also showed that decreases of sensible heat fluxes in the urban areas of Indianapolis were as high as 90 W m -2 in certain regions. Since the change in urban fraction has reduced the urban cover in Indianapolis, this decrease in the BEP UCM simulations was most likely due to a reduction of anthropogenic heat flux contribution in the urban areas. Again, the latent heat flux differences were similar between all the model configuration simulations due to the common use of the Noah LSM.
The effects during WINTER were smaller in magnitude as compared to SUMMER, and the sensible heat flux increased in areas that had a large vegetative fraction. In the model, the vegetation had a lower albedo than urban surfaces; therefore, the addition of the vegetated surfaces allowed for more absorption solar radiation. Since the potential for latent heat release in winter was small, the additional available energy was released as sensible heat flux.

Comparison of simulated solar radiation to surface weather stations
Before evaluating the simulated accuracy of surface energy fluxes when compared to the flux tower observations, it was necessary to quantify any radiation biases in the model. Any potential bias in the simulated radiation will influence the final evaluation of the surface energy flux errors. There were three weather stations that had available solar radiation data that were used to assess the errors in the SUMMER simulations. No comparison for WINTER is included due to the lack of observations during that time period.
An hourly average was constructed for the days that spanned the SUMMER time period. This overestimate was as high as 290 W m -2 for a given hourly average of the day. A similar hourly average comparison was performed on days that were determined to be sunny (Figure 5b). A sunny day was defined as a day where at least half of the solar radiation data observations, between the hours of 16 UTC and 20 UTC, were greater than 500 W m -2 . Figure 5b shows the results for sunny days and there was a reduction in the incoming solar radiation bias. The errors for incoming solar radiation on sunny days were reduced to ~130 W m -2 (hourly daytime average value). The reduction of the solar radiation bias on the 'sunny' days demonstrates that most of the bias was due to the model underestimating the cloud cover. This result was confirmed with a comparison of MODIS cloud imagery to the simulation output (not shown). Additional simulations were also performed that modified the microphysics schemes and shortwave radiation schemes and no significant reductions in the solar radiation bias were achieved with these simulations (not shown). Inconsistencies in the solar radiation are important to consider when interpreting the errors from other simulated meteorological fields. While the errors and biases in simulated meteorological fields and their sensitivities to LSMs, UCMs, and PBL schemes are still present; the magnitude of these errors will be affected by the overestimation of solar radiation.
Observations of net radiation were not available for comparison to the model. The climatological values of emissivity and albedo used in WRF at this location, however, compared well to MODIS observations. Therefore, we will assume that there is little to no surface energy flux error contribution from inaccurate surface albedo and emissivity values in the WRF model. The simulation results were compared to observational data taken from a flux tower that is located in the eastern part of the city (Figure 1a and 1b). The 1-km 2 WRF grid point encompassing the tower is 53% vegetated/47% urban cover according to the NLCD. It is important to note that there is a sampling mismatch when comparing the tower observations and the model output. The 300 m radius area around the tower, which is roughly representative of the flux footprint (Figure 1b), is less vegetated (45% vegetation cover by area) than the 1-km 2 WRF grid point. Figure 6 shows the diurnal averages of the sensible heat flux, latent heat flux, and friction velocity for the observations and the real configurations of the model during the summer and winter period. The simulated sensible heat fluxes appear to be grouped by UCM used. In other words, the PBL scheme used had little to no effect on the magnitude of the sensible heat flux. This was expected since the sensible and latent heat fluxes are linked to the land surface; therefore, these fluxes should be most sensitive to changes in the UCM or LSM.
The comparison to the flux tower observations was done using an hourly average analysis (Figure 6). During the WINTER simulation, the none_MYNN_def configuration performed the best at simulating the sensible heat fluxes at the observation site. However, this configuration still had a maximum hourly error of 68 W m -2 during the daytime hours (Figure 6a). The simulations that used the SLUCM and BEP configurations had higher sensible heat flux hourly average errors (about 100 W m -2 and 150 W m -2 respectively) when compared to the wintertime observations.
For the summer period, the none_MYNN_def configuration had sensible heat flux hourly average errors that were as high as 232 W m -2 (Figure 6d). The SLUCM had the lowest maximum hourly average error (35 W m -2 ) during the daytime and the BEP simulations had hourly average errors that peaked at about 110 W m -2 . The none_ MYNN_def simulations do not allow for vegetation to be accounted for in the urban areas, so any simulations during the summertime saw an increase in sensible heat flux and a suppression of latent heat flux. This exaggerated the errors in the simulated sensible heat fluxes, which were as high as 130% during SUMMER.
It is also important to note that all of the simulations, regardless of season, had a positive bias in simulated sensible heat flux, which was partially due to the solar radiation overestimate that was discussed in the previous subsection. Looking at the previously defined 'sunny days' and comparing the reduction in the surface energy flux errors to the entire SUMMER simulation allowed for the quantification of the impact of the solar radiation bias. While the overestimation of incoming solar radiation was not the entirely responsible for the surface energy flux errors, the 'sunny day' average hourly surface energy flux errors were reduced by 10% to 30% (not shown). The magnitude of the error reduction was also dependent on model configuration.
For all configurations of the model that used a UCM, the latent heat fluxes were essentially the same because of the common Noah LSM being used in all the model configurations (Figure 6b and 6e). During the winter period, all of the simulations that used a UCM had average hourly errors that peaked at about 25 W m -2 and the none_MYNN_def average hourly errors peaked at -45 W m -2 . For the summertime, the simulations that used a UCM had average hourly errors that peaked at 120 W m -2 and the none_MYNN_def maximum average hourly error was roughly -180 W m -2 . Recall that this 1 km 2 WRF model grid has slightly higher vegetation cover by area (53%) than the area in the flux footprint (45%) (Figure 1b). Correcting for this small mismatch would likely increase the disagreement between simulated and observed sensible and latent heat fluxes. While the none_MYNN_def also used the Noah LSM to calculate the surface energy fluxes, the lack of UCM means that there is no vegetation in urban areas; therefore, the latent heat fluxes for the none_MYNN_def were 0 W m -2 or near zero in both the summer and winter periods. By using a UCM, the model skill in predicting the latent heat fluxes increased in both the winter and summer; however, there was a positive bias in the latent heat flux in these UCM simulations.
The friction velocity (Figure 6c and 6f) for all seasons was dependent on the both the UCM and PBL scheme being used in the model. The BEP_BouLac_real differed from all other configurations that either use the SLUCM or BEP UCM scheme, showing that the MY family of PBL schemes (MYJ and MYNN) performed similarly when parameterizing the wind shear and momentum fluxes while using a UCM. Across all seasons, the friction velocity, on average, was underestimated for all hours of the day; however, the diurnal pattern seen in the observations (i.e. an increase in friction velocity during the daytime hours) was captured by all of the model configurations, except for the BEP_BouLac configuration, which had a more constant value of friction velocity throughout an average day. The none_MYNN_def configuration had the smallest average hourly errors (average daytime error of -0.17 m s -1 in WINTER and average daytime error of -0.11 m s -1 in SUMMER). Although there was a consistent negative bias present, the none_MYNN_def configuration best represented the observed friction velocities, and the model skill was degraded with the SLUCM (see SLUCM_ MYNN_def results for comparison).

Assessment of air temperature errors
3.3.1 Impact of urban fraction changes to simulated air temperatures Figure 7 shows the changes in the 2-m air temperature across the model domain due to changes in urban fraction. The more realistic representation of urban land cover has produced a slight warming effect, as high as 0.25 K in some areas of the city, during the WINTER daytime hours (Figure 7a and 7c). The opposite occurred in the summer (Figure 7b and 7d), where updating f urb created an overall cooling effect in the summer, decreasing temperatures by as much as 1 K. The 2-meter air temperature in WRF is derived from the sensible heat flux; therefore, it is logical that the spatial patterns and sign of the 2-m air temperature changes mimic those of the sensible heat flux changes (Figures 3, 4). The PBL scheme chosen did not have a large effect on the spatial patterns of the 2-m air temperature, and the BEP UCM simulations show the tendency to have a heterogeneous spatial pattern of 2-m air temperatures. Figure S1 shows that the change in urban fraction altered air temperatures throughout the atmospheric boundary layer. The cooling effect during the summer peaks at ~0 .3 K, which was a smaller magnitude to what was seen in the 2-m air temperature (Figure 7). The magnitude of boundary layer cooling was highly correlated to the UCM being used while the PBL scheme had little effect, similar to what was seen in the 2-m air temperature comparison. The vertical extent of the cooling is highly dependent on the PBL scheme used because the PBL height itself is highly correlated to the PBL scheme chosen for a simulation (as discussed in Section 3.5).

Comparison of simulated air temperature to surface weather stations
The simulated air temperatures were compared to surface stations located in urban areas in order to quantify the mean error and the range of the daytime hourly errors. The methodology for constructing the box and whisker plots (ex. Figure 8) is as follows: For every hour within the predefined daytime hour period, an error was calculated for the entire 35-day simulation (i.e. every day had five computed errors (12:00 -16:00 local time) per observation station, if the observations were present). This collection of hourly errors allowed for the construction of a distribution, in the form of box and whisker plots, which can easily indicate the potential range (difference between the most negative and most positive error) of hourly errors, the median of the hourly error distribution (indicated by the circle on the box and whisker plots), and the mean hourly error (indicated by the triangle on the box and whisker plots). The MADIS observations were also filtered by comparing them to reanalysis data as an additional form of quality control.
The mean, median, and range of hourly errors allow us to define the magnitude of biases and random errors in the different model configurations. Identifying the magnitude of the bias will show whether or not a particular combination of physics parameterizations have an inherent tendency to overestimate or underestimate a meteorological variable on average. The magnitude of the random errors (range of the hourly errors) will identify the magnitude of error that can occur on an hourly basis. If these random errors are not correlated to the physics parameterization, then these errors may originate from the meteorological initial and boundary conditions. The median will also show if the occurrence of errors above and below the mean is evenly distributed or if the error frequency is skewed. Figure 8 shows the distribution of daytime hourly errors for both the winter and summer simulations. During WINTER, the magnitude of the average error was less than 1 K for all configurations of the model. The best performing model configuration was the BEP_BouLac_ real with an average error of -0.1 K and a range of 8.1 K. The worst performing member was the SLUCM_MYJ_def with an average error of -0.9 K and a range of 8.6 K.
The PBL schemes exhibited a predictable behavior in the 2-m air temperature error bias sign and magnitude. If the models were using either the MYJ or MYNN scheme, then the models tended to have a cold bias closer to -1 K. With the BouLac PBL scheme, the cold bias is virtually eliminated. As was shown in Figure 7, applying the custom urban cover algorithm increases the 2-m air temperature over most of the urban domain in WINTER. This increase is reflected in Figure 8 and the increase in air temperature aids in rectifying the cold bias that is exhibited by all of the model configurations. For the SUMMER (Figure 8), the UCM, PBL scheme, and urban land cover algorithm all had an impact on the range and average 2-m air temperature errors. There was still a cold bias in the models; however, these biases are smaller in magnitude than in the WINTER, with the largest bias being -0.8 K for the SLUCM_MYJ_real simulation. The best performing configuration for SUMMER was the BEP_BouLac_def configuration with an average error of -0.2 K and a range of 13.5 K. The urban fraction update cooled the surface air temperatures and exacerbated the cold bias. The impact of using the realistic urban fraction algorithm increased the magnitude of the 2-m air temperature errors anywhere from ~0 .2 K to ~0 .6 K, depending on the model configuration. In addition, all the summertime simulations have a skewed distribution of 2-m air temperature errors. This skewed distribution can be seen in Figure 8 by comparing the mean errors for a particular model configuration (indicated by the triangles) to the median of the distribution (indicated by the circles); if the mean and median of the distribution are displaced from one another, then the distribution of errors is skewed. Using the BEP UCM alleviated most of the occurrences of the extreme overestimation of 2-m air temperature (the median and mean of the distribution are closest in agreement) while using the none_MYNN_ def configuration caused the highest occurrence of warm error outliers (the mean and median of the distribution of errors differed by 0.6 K). A comparison with ACARS aircraft observations ( Figure S2, Text S1) also show similar tendencies to those seen in the 2-m air temperature comparison.

Impact of urban fraction changes on model wind speed and wind direction
There was little to no difference in the mean wind speed (maximum decrease of 0.20 m s -1 ) and wind direction (maximum direction change of 4 degrees) when the urban fraction update was applied to the different model configurations (not shown). Even though the urban fraction algorithm produced differences in the surface energy fluxes and the temperature fields, the wind speed response to the urban fraction change was almost negligible; the urban fraction algorithm impact on wind speed and direction were similar across all configurations and did not show any correlation with specific parameterization schemes. Figure 9 shows the distribution of model errors for 10-m wind speed and wind direction when compared to observations taken from MADIS surface stations located in urban areas. A minimum wind speed threshold of 1.5 m s -1 was used in order to filter out observational data that might disproportionally affect the assessment of wind speed and wind direction errors since most stations report a null value for these ' calm wind' conditions. The wind speed near the surface exhibited an overestimation in the model for all configurations and for both SUMMER and WINTER (Figure 9a). The configuration that produced the lowest average wind speed error in both the wintertime (mean error of 1.5 m s -1 ) and summertime (mean error of 2.1 m s -1 ) simulation was BEP_MYJ_real; and the highest average error in winter was produced by the SLUCM_ MYJ_def simulations (mean error of 3.2 m s -1 ) while the none_MYNN_def simulations had the highest average error in SUMMER (mean error of 2.6 m s -1 ). For all model configurations, the average error during the wintertime was similar to the average error in the summertime; also, the range of the wind speed errors was fairly well distributed about the mean error (the median error was equal to the average error). The daytime hourly range of the wind speed errors for BEP_MYJ_real was about 8.7 m s -1 and 9.7 m s -1 for WINTER and SUMMER, respectively. The range of errors for the WINTER SLUCM_MYJ_def simulation was 9.7 m s -1 while the SUMMER none_MYNN_ def simulation had an error range of 9.9 m s -1 , which represented the largest error spread among the different configurations. While the PBL scheme chosen seems to have an effect on the wind speed errors, the distribution of errors (Figure 9a) shows a higher sensitivity to the UCM being used. Overall, the BEP UCM simulations performed better than the SLUCM and none simulations in both the winter and summer.

Comparison of model winds to surface weather stations
The application of updated f urb values to every simulation seemed to have a positive effect (reduction of the error bias) on the average wind speed error for all of the simulations; however, the improvements, which were on the order of tenths of a meter per second, were small compared to the overall average wind speed error.
The daytime hourly wind direction errors (Figure 9b) have a large spread across all configurations of the model. The BEP_MYJ_real configuration had the lowest average error in WINTER (mean error of 4 degrees) and SUMMER (mean error of 6 degrees); on the other hand, the SLUCM_ MYNN_def and none_MYNN_def both had the worst wintertime performance (mean errors of about 10 degrees) and SLUCM_MYNN_def had the largest summertime errors with a mean error of 15 degrees. Figure 9b shows that there was a slight clockwise bias in the wind speed (i.e., the simulated wind directions tended to be to the right (clockwise) of the observations) during both time periods. The ranges of the errors were large for all simulations (minimum ranges were over 120° in WINTER and over 210° in SUMMER) and the SLUCM_MYJ simulations consistently had the smallest range of wind direction errors.

Comparison of simulated winds to ACARS aircraft observations
ACARS data were used to assess the validity of the simulated wind speeds in the boundary layer (Figure 10). The magnitudes of the errors are within 0.5 m s -1 to the errors that were calculated using surface observations (Figure 9a); however, there are isolated points in the ACARS assessment that are uncharacteristically high. These irregularities are due to the fact that some of the data points in Figure 10 have as little as 10 observations, which caused anomalously high errors when an outlier was averaged in the final error assessment. Most of the boundary layer has a wind speed error similar to what the surface observation analysis produced.
The different model configurations produced slightly different error magnitudes, but all configurations produced similar temporal and vertical error structures (Figure 10). All model configurations also show a definitive positive wind speed bias throughout the first 500 m of the model for hours after 12 UTC. During the late afternoon and early evening hours of WINTER, the models exhibited a positive bias (i.e. produced higher wind speeds than in the observations) in the boundary layer and the errors reached as high as 4 m s -1 for a single binned point (Figure 10); the SUMMER wind biases are smaller and were as high as 2.5 m s -1 for a single binned point.
The modeled data were also compared to lidar observations ( Figure S3, Text S2). For the late afternoon and evening hours, the wind speed errors were small, and the highest average wind speed error was similar to what was seen in the 10-m wind speed errors (Figure 9a). The wind speed accuracy in the model is degraded during the morning and early afternoon regardless of the model configuration used; however, the magnitude of the errors also exhibits a strong dependence on the UCM being used.

Impact of urban fraction changes on simulated boundary layer height
The average daytime boundary layer heights for the innermost domain were calculated for all the model configurations using virtual potential temperature profiles to infer the boundary layer height. Figure 11 and 12 show the average daytime PBL heights, for all model configurations, for WINTER and SUMMER, respectively. Both figures convey the fact that the PBL heights can be affected by many different factors, including PBL scheme used, the UCM used, and the use of the updated urban fraction parameter; however, the seasonality will determine which scheme (PBL, UCM, or f urb ) will be the dominating factor that determines the overall structure and height of the simulated PBL field. For the wintertime simulations (Figure 11), the PBL scheme had the dominant effect in determining the spatial structure and magnitude of the boundary layer heights. The none_MYNN_def (Figure 11i), SLUCM_ MYNN_def, and SLUCM_MYNN_real (Figure 11c and   11d) simulations had a remarkably similar pattern and magnitude to PBL heights (around 800 meter average throughout the domain) and very little difference was seen over the urban areas of Indianapolis despite having completely different land surface models. The BEP_ BouLac simulations (Figure 11g and 11h) was radically different with PBL heights that were over 1000 m in the urban center, which were anywhere from 100 m to 200 m larger than any of the other simulations that used either MYNN or MYJ as the PBL scheme.
In the summertime simulations (Figure 12), the PBL heights in rural areas of the domain were dominated by the PBL scheme chosen. Again, this was expected because the rural areas of the domain have the same LSM (Noah), no UCM was active in those areas, and urban fraction was ignored if the UCM was inactive for a given grid point; therefore, the only difference in those rural areas was the PBL scheme that was being used. Unlike the wintertime simulations, the urban areas for the SLUCM_MYNN and none_MYNN_def were not similar, signifying the PBL height had an enhanced sensitivity to UCM in the summer that is not present in the wintertime.
The updated urban fraction algorithm slightly decreased the boundary layer heights over the city. This decrease was expected since the algorithm decreased the amount of urban cover in the city and decreased the sensible heat fluxes. However, these changes due to urban fraction were also dependent on the UCM that was being used. For the wintertime simulations that used the SLUCM scheme (Figure11a-11d), the impact of updating the urban fraction was negligible. The configurations that used the BEP UCM simulations did have a clear urban enhancement zone in the PBL height fields, and these heights were impacted by the changes in urban fraction. For the summertime period (Figure 12), the urban fraction update decreased the PBL height by 100 m to 150 m, depending on the location within the urban area. The BEP_BouLac_ def configuration produced the largest urban PBL height values while the SLUCM_MYJ_real produced the smallest heights over the urban area. It is also important to note the differences that emerged when comparing the model sensible heat fluxes to the PBL heights. There should be a significant relationship between the sensible heat flux and the height of the boundary layer, so it would be expected that model simulations that produced higher sensible heat fluxes would also have larger PBL heights. However, this is not true for all simulations. The BouLac simulation consistently had larger PBL heights (Figure 11g-11h and  12g-12h) even though the BEP_BouLac simulations had similar sensible heat fluxes to the BEP_MYJ simulations (Figure 6b and 6e). This confirmed the impact that the PBL scheme has on the PBL height and how different the BouLac simulations can be from the MYJ or MYNN simulations, even if they have similar sensible heat fluxes.

Comparing model boundary layer height to ACARS aircraft observations
The ACARS database was used in order to assess the accuracy of the simulated PBL height. PBL heights were derived from potential temperature profiles that were created from the ACARS data. These heights were compared to the model grid point that corresponded to the observation. The profiles sampled different areas around Indianapolis and the profiles were split into rural and urban categories based on land surface categorization. Figure 13 shows the daytime errors in the simulated PBL height for the urban and rural observations for both the wintertime and summertime simulations. The errors were split into urban and rural groups because it is important to disentangle these two environments. If the rural PBL height has errors, then those rural errors will impact the urban PBL height performance relative to the observations, even if the land surface-atmosphere interaction over the urban area is working perfectly. For WINTER, the two BEP_BouLac simulations clearly outperformed the other configurations of the model. On average, the simulations that used either the MYJ or MYNN PBL schemes underestimated the boundary layer height by about 200 m in both the urban and rural environments. The range of rural PBL height errors is less than the error range over urban areas. There were roughly the same number of observations for the rural and urban PBL heights (131 and 147 observation profiles during the wintertime and summertime), so the spread is likely due to the inability of the model to capture the urban heat island enhancement of PBL height and due to higher variability in the UCMs causing more variability in the sensible heat fluxes.
The simulated PBL heights are nearly unbiased over the urban environment for SUMMER. The average PBL height error for most configurations is close to zero meters, with the maximum mean error being 100 m in the BEP_BouLac_def simulations and the minimum mean error being 20 m for the BEP_MYJ_real simulations. The BEP_BouLac configurations overestimate the PBL heights over the urban areas of Indianapolis. By applying the urban fraction algorithm, the errors in the BEP_BouLac configuration were reduced. For the rural environment, however, there is still a 50 m to 100 m underestimate of the PBL heights. This suggests that the summertime urban enhancements (i.e. the land surface -atmosphere interactions in urban areas) in PBL height may be overestimated in all of the model configurations. When there is a negative bias in rural PBL height, this error should carry over into the urban environment and produce a similar negative bias if the UCM and PBL schemes are accurately representing the dynamics of the urban environment. Since the model is producing positive biases in PBL height over the urban portions of the domain, the urban environment interactions were likely exaggerated in the model.

Significance testing of hourly error distributions
Significance testing was performed on the hourly error distributions (box and whisker plot figures; Figure 8, 9, and 13) to determine if the error distribution of a model configuration was significantly different than that of the other model configurations. Learning if one model configuration was statistically "better" (i.e. had the smallest bias) than the other model configurations can aid in choosing a model configuration that best fits the needed application. A bootstrap method was performed on these sets of hourly errors and the statistical significance threshold was set at a 95% confidence interval. Table 4 shows the results of the best model configuration separated by meteorological variable and simulation time period (SUMMER or WINTER). 'No significant differences' was used to indicate when no model configuration was statistically better than the other configurations. In WINTER, the choice of model configuration had a large impact across most meteorological variables. Not only did the BEP_BouLac configuration have the lowest bias in PBL height, air temperature, and wind speed, but also its hourly error distribution was also statistically different from all other hourly error distributions. During the SUMMER, the configuration chosen only had a statistical difference in simulating the 10-m wind speed, where BEP_BouLac and BEP_MYJ configurations were statistically better. While some of the statistical significance tests resulted in 'no significant differences', it is important to note that there were still certain configurations that outperformed all other configurations. If a study only called for simulations that had the lowest PBL height bias in the summer for Indianapolis, then the BEP_MYJ_real should be used since that configuration produced the smallest bias. The significance testing allows researchers to see which configuration choices significantly improve or degrade the model simulation for a broader application.

Summary and conclusions
The overall goal of this study was to assess the impact of model parameterizations and land surface data on model skill in simulating urban PBL properties, such as surface energy fluxes, air temperatures, wind speed and direction, and boundary layer height. By using different combinations of urban canopy models and planetary boundary layer schemes, the general behaviors of different model configurations could be assessed. An overestimation of urban cover was present when using the default urban fraction algorithms in WRF. A custom f urb algorithm was added to WRF in order to generate a more accurate representation of urban cover in the simulations, which alleviated the overestimation of urban cover. Model-data comparisons were used to quantify biases and random errors in the integrated modeling system.

Solar radiation bias
There was a clear solar radiation bias present in all simulations that was not affected by either the physics parameterizations or the treatment of urban fraction. The model overestimated the solar radiation during the daytime and the average hourly errors were highest at 16 UTC. We concluded that most of the bias was due to the cloud resolving inaccuracies that are present in the model; however, there is also another component to the solar radiation bias since roughly 20% of the total bias was still present on the clear-sky days.
It was important to quantify the solar radiation bias in the model in order to determine whether or not the other meteorological errors in the model, primarily the sensible and latent heat flux errors, were due to the solar radiation bias. When comparing the surface energy flux errors between the SUMMER 'sunny' days and all of the SUMMER days, there is roughly a 10% to 30% reduction in the surface energy flux bias. Clearly a part of the surface energy flux error is directly linked to the solar radiation bias, but there is still a significant portion of the surface energy flux error that is due to poor parameterization in the LSMs and UCMs.

Model sensitivity
The improved description of urban land cover increased the vegetation fractions in the simulation and thus improved the simulated surface energy fluxes by decreasing the sensible heat fluxes and increasing latent heat fluxes. This improved surface energy balance had a significant impact on the simulation of urban meteorology in the SUMMER, but relatively little impact in the WINTER, probably due to the relatively low magnitude of surface energy fluxes in that season. In the SUMMER, the sensible heat flux, air temperature, and PBL height all decreased with the overall decrease in urban cover. Wind speed throughout the PBL was also lower with the updated f urb , but the largest change in the domain was only 0.20 m s -1 . The WINTER simulations showed that the air temperature and the sensible heat flux both increased with the more realistic (smaller) urban fraction, while the other variables (PBL height and wind speed) decreased with the smaller urban fraction, as in the SUMMER. The sensible heat fluxes in the BEP simulations were consistently higher than the sensible The model configurations that had a statistically significant reduction of model bias are shown above. If no model configuration was found to be statistically better than the other model configurations, then the 'No significant differences' label was used.
heat fluxes in the SLUCM simulations. The SLUCM model uses a bulk parameterization technique to predict sensible heat fluxes while the BEP UCM uses a more complex formulation, and it is this difference that causes the predictable behavior in simulated sensible heat fluxes. There was also predictable behavior in the simulated PBL heights and 2-m air temperatures, especially in WINTER. The BEP_BouLac simulations consistently had deeper PBLs and warmer surface air temperatures when compared to all other configurations.

Simulated meteorology comparison to observations
The BEP_BouLac_real configuration had the highest success rate in simulating various meteorological variables; however, the 'best' model configuration is highly dependent on the meteorological variable of interest and the season being simulated. In our study, the air temperatures were primarily affected by the PBL scheme but were also affected by the UCMs and urban fraction parameter. In this case, improving the land surface definitions caused the air temperature errors to increase in the summertime simulations because the reduction of urban fractions in the domain exacerbated the cold bias in air temperatures that was already present in the model. Nevertheless, the updated algorithm did prove useful in improving both the surface energy fluxes and PBL heights. Improvements in one portion of the model can propagate into other components and increase errors. Model biases also vary with season. While the BEP_BouLac_real configuration has the least biased PBL heights in winter, it has a larger (and positive) bias in summer. These are examples of how the large number of interdependencies within the modeling system makes it difficult to identify one ideal model configuration.

Impact of boundary conditions
In some cases, the rural boundary conditions, not the simulation of the urban land surface and boundary layer, had the largest impact on model performance within the city. In these cases our model-data comparison in the rural environment surrounding Indianapolis is very similar to the results within the urban environment. In these cases the representation of the city surface is less important than the quality of the meteorological simulations in the surrounding rural environment. This is most true in the WINTER, when the land-atmosphere interactions are the weakest. The urban surface has the greatest impact on our simulation results in the SUMMER.

Implications for urban greenhouse gas studies
We hypothesize that biases in PBL wind speed, wind direction, and height are the most important metrics for applications, including INFLUX, that seek to use WRF or other numerical weather models to simulate the transport of pollutants and other tracers. We found that the PBL and land surface physics parameterizations had the greatest impact in our experiment on the simulation of these variables. The magnitude of the impact varied with the time of year. The greatest impact was seen in the WINTER simulations, where the BEP_BouLac simulations performed significantly better (i.e. had a smaller bias) than all other configurations for these meteorological criteria.
In SUMMER, the BEP_BouLac simulations produced 10-m wind speeds in Indianapolis that were significantly better than other model configurations.
Errors in the hour-to-hour simulation of these meteorological variables are also important for greenhouse gas studies, since hourly or daily flaws in the meteorological reanalysis can lead to misattribution of the distribution of greenhouse gas fluxes, even if the total flux is not biased. We found that the random errors (i.e. the range of the errors) varied seasonally. The distribution of errors tended to be skewed in the SUMMER simulations indicating that on an hour-byhour or day-by-day basis, the probability of generating a higher magnitude positive error in a particular meteorological field was larger than generating a negative error. The spread of the random errors in the simulated meteorological fields could be increased or decreased depending on the physics parameterization used, however, the changes were usually small (~5%-10% change). This implies that these time-variable errors are more influenced by the meteorological boundary and initial conditions than by the description of boundary layer physics and land cover within our model domain.  show that data assimilation of lidar and aircraft PBL wind data collected in the Indianapolis region reduced random errors in PBL wind speed and direction by approximately a factor of two. This suggests that local meteorological data assimilation is more effective for reducing random errors than altering model physics parameterizations.
Future simulations of urban meteorology in support of understanding the fluxes and concentrations of pollutants, including greenhouse gases, should use more realistic land cover and explore means of improving simulated incoming solar radiation. Surface fluxes will be more realistic, improving the overall fidelity of our modeling system. Model biases should be examined across seasons; the model configurations that yield minimal biases are likely to vary as a function of time of year and, we expect, location. Attention must be paid to the rural domain surrounding the urban environment. Finally, our experiments suggest that for limited domains, model physics has relatively little impact on the magnitude of random errors in PBL meteorology.

Data Accessibility Statement
The data for the model output are available at  and the eddy covariance flux data are available at .