Designing sensor networks to resolve spatio-temporal urban temperature variations: fixed, mobile or hybrid?

The spatio-temporal variability of temperatures in cities impacts human well-being, particularly in a large metropolis. Low-cost sensors now allow the observation of urban temperatures at a much finer resolution, and, in recent years, there has been a proliferation of fixed and mobile monitoring networks. However, how to design such networks to maximize the information content of collected data remains an open challenge. In this study, we investigate the performance of different measurement networks and strategies by deploying virtual sensors to sample the temperature data set in high-resolution weather simulations in four American cities. Results show that, with proper designs and a sufficient number of sensors, fixed networks can capture the spatio-temporal variations of temperatures within the cities reasonably well. Based on the simulation study, the key to optimizing fixed sensor location is to capture the whole range of impervious fractions. Randomly moving mobile systems consistently outperform optimized fixed systems in measuring the trend of monthly mean temperatures, but they underperform in detecting mean daily maximum temperatures with errors up to 5 °C. For both networks, the grand challenge is to capture anomalous temperatures under extreme events of short duration, such as heat waves. Here, we show that hybrid networks are more robust systems under extreme events, reducing errors by more than 50%, because the time span of extreme events detected by fixed sensors and the spatial information measured by mobile sensors can complement each other. The main conclusion of this study concerns the importance of optimizing network design for enhancing the effectiveness of urban measurements.


Introduction
Observations of the urban heat island effect have been extensively reported for major cities worldwide (Stewart 2011, Peng et al 2012). The spatial and temporal variations of urban temperatures between neighborhoods are critical in determining the vulnerability of residents to future environmental changes, especially as local and global stressors interact to increase risks (Bou-Zeid 2013, Tewari et al 2019). Quantifying these variations nevertheless remains challenging because of the strong heterogeneity of land use/land cover and the multiscale temperature dynamics within cities (Grimmond 2007), as well as incomplete data coverage, sub-optimal network design, and difficulties in data management and access. Despite the increasing effort devoted to developing urban monitoring networks, including both fixed (Koskinen et al 2010) and mobile networks (Marjovi et al 2015), urban temperature information remains inadequate (National Research Council 2012).
To overcome limited data coverage, the general approach is to estimate fine-scale relations based on available temperature measurements and urban environmental characteristics, and then to apply these relations to urban areas with only regional scale observations or weather forecasts. One example is using remote sensing images from satellites to investigate the relation between land surface temperature and urban land use characteristics. Numerous studies have found that urban land surface temperature is negatively correlated with the vegetation area and positively correlated with the impervious fraction (Yuan and Bauer 2007, Li et al 2011, 2013b, Mallick et al 2013). However, the measurement frequency of two images per day from polar orbiting satellites (e.g. MODIS) is not enough for diurnal analysis. On the other hand, satellites in geostationary orbit (e.g. GOES) can retrieve sub-hourly temperatures but have a coarse resolution (Sun and Pinker 2003). Recent advancements in fusing models and downscaling methods enable the combination of different satellite observations to generate hourly temperature data sets with sub-kilometer resolution (Zakšek and Oštir 2012, Wu et al 2015), but the accuracy of such data sets is not satisfactory, with root mean square errors (RMSEs) of 2°C-3°C. In addition, surface temperatures retrieved by satellites are not fully representative of the thermal environment in a city.
Air temperature is more relevant than surface temperature as it is more closely related to building energy consumption and pedestrian thermal comfort (Zhao and Magoulès 2012, Johansson et al 2014). Knowledge of surface temperature cannot be directly transferred to air temperature due to the complexity of surface-air interactions (Song et al 2017). Ground-based weather stations are needed to collect time-series of air temperature. The primary surface weather observing network in the United States currently has more than 900 sites (Automated Surface Observing System, https://www.weather.gov/asos/asostech), and the China Meteorological Data Service Center has over 2400 national stations (http://data.cma.cn/). While these numbers appear quite large, on average, each metropolitan area has fewer than five stations. The insufficient observations significantly hinder the evaluation of the aforementioned temperature-urban surface relations for a given city, let alone the validation of sophisticated urban canopy models. Therefore, deploying denser measurement networks needs to be prioritized to capture the spatio-temporal temperature dynamics at a variety of scales in a city (Mead et al 2013, Tan et al 2014, Ramamurthy et al 2017), and to advance our understanding of the urban thermal environment (Koskinen et al 2010, Muller et al 2013), particularly given the fast cost reduction in hardware.
More recently, researchers have developed and tested the use of transit systems as mobile measurement networks to collect temperature data. By attaching sensors to moving equipment such as bicycles (Brandsma and Wolters 2012, Heusinkveld et al 2014), cars (Ivajnšič et al 2014, Leconte et al 2015) and buses (Llaguno-Munitxa et al 2018), a mobile system is able to continuously sample temperature across cities and potentially complement observations at fixed locations. However, the design and planning of these measurement networks, i.e. paths of mobile transect measurements or locations of fixed sensors, continue to rely on expert knowledge and on empirical experience. One recent study in Tokyo found that similar temperature patterns can be obtained with a 30% reduction of fixed measurement sites (Honjo et al 2015).
The key questions of this study are to what extent fixed/mobile measurement systems can capture the spatio-temporal variability of temperatures from the neighborhood to the city scale, especially under extreme conditions, and how the network design can be optimized. We aim to address these questions by simulating virtual networks of sensors embedded in high-resolution weather simulations. This simulation design of sensing networks has been used before (e.g. Pigeon et al 2006, Malings et al 2018), but to the best of our knowledge it has not been applied to mobile systems. To examine the sensitivity of the answer to climatic and geographic conditions, four cities are analyzed in this study: Chicago, New York City (NYC), Phoenix and Pittsburgh. The simulation setup is presented in section 2. We then analyze how time-averaged temperature varies spatially with surface impervious fractions, and propose fixed and mobile network designs to capture this spatial variability in section 3. In section 4, we focus on how to sense (i) the synoptic day-to-day variability, (ii) diurnal variability and (iii) temperature extremes. We then explore hybrid sensor networks as optimal systems for measuring urban temperatures in section 5, and in section 6 we provide some conclusions and recommendations.

Temperature data set from weather simulations
We utilized the temperature data set from a previous numerical study using the Weather Research and Forecasting (WRF) model (see Yang and Bou-Zeid 2019 for setup and validation details). Hourly air temperature and surface temperature are available from 0000 UTC 14 July to 0000 UTC 14 August, during a typical summer (mean daily maximum and minimum temperatures close to the 1981-2010 climate normal, different years simulated for different cities) with a grid resolution of 1 km. The WRF simulations adopted an enhanced single-layer urban canopy model (Wang et al 2013, Yang et al 2015) with the mosaic approach (Li et al 2013a, Yang and Bou-Zeid 2018) to better capture the land heterogeneity, so that the output temperature at each model grid accounts for the sub-grid land use composition. Spatial coverage of the temperature data set and associated land use land cover are shown in figure 1.
The default 2 m air temperature (T_2) from the single-layer urban canopy model is a diagnostic variable that behaves as an effective skin temperature of the canyon. To obtain more physically pertinent canyon air temperatures comparable to in situ measurements, in this study we calculated T_2 using a revised scheme that accounts for the sensible heat flux and stability correction (Theeuwes et al 2014):
T_2 = T_a + H_c r_a / (ρ C_p)    (1)
where T_a is the air temperature at the lowest atmospheric level; H_c is the sensible heat flux from the canyon to the overlying atmosphere; r_a is the aerodynamic resistance; ρ is the air density; and C_p is the specific heat capacity of dry air.
The four studied metropolitan areas span three different climate zones. Chicago and New York City are located near large water bodies, while Pittsburgh and Phoenix are inland cities. The simulated temperature data set covers a domain of 160 km by 160 km for each metropolitan area. Setting up measurement networks over such large areas is unrealistic, since networks of that extent are not comparable to actual monitoring efforts. Therefore, we confine the area of interest to the urban core and its vicinity in this study (see figure 1). For comparative analysis among different cities, grid cells with a water fraction greater than 5% or with an impervious fraction smaller than 1% are excluded. This leads to more than 1000 urban grid cells for each studied metropolitan area. The physical boundaries of the studied areas are explained in table 1.
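The correction and the grid-cell screening described above can be sketched in a few lines of Python. This is a minimal illustration, not the WRF implementation: it assumes the correction takes the standard bulk-transfer form T_2 = T_a + H_c r_a / (ρ C_p), and the function names and the default values for air density and specific heat are illustrative assumptions.

```python
def canyon_air_temperature(t_a, h_c, r_a, rho=1.2, c_p=1004.0):
    """Canyon air temperature from the assumed bulk-transfer relation.

    t_a : air temperature at the lowest atmospheric level (K)
    h_c : sensible heat flux from canyon to atmosphere (W m-2)
    r_a : aerodynamic resistance (s m-1)
    rho : air density (kg m-3), illustrative default
    c_p : specific heat capacity of dry air (J kg-1 K-1), illustrative default
    """
    return t_a + h_c * r_a / (rho * c_p)


def is_urban_cell(water_fraction, impervious_fraction):
    """Keep a grid cell unless it is excluded by the screening rule:
    water fraction greater than 5% or impervious fraction smaller than 1%."""
    return water_fraction <= 0.05 and impervious_fraction >= 0.01
```

With a positive (upward) sensible heat flux the corrected canyon temperature exceeds T_a, which is consistent with the remark that strong fluxes can raise the diagnosed air temperature.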

Monthly mean temperature
Temperature dependence on surface impervious fraction
Monthly mean temperature is an important climate variable for analyzing the overall thermal conditions in cities (Vincent et al 2012). We first look into the relation between monthly mean temperature and land surface impervious fractions. Figure 2 shows that both T_2 and surface temperature (T_s) increase with the impervious fraction, except for Phoenix. The semiarid rural areas surrounding Phoenix have low evaporative cooling and small heat capacity, yielding higher temperatures than urban areas during the daytime. This leads to the higher monthly mean T_2 than T_s and to the unique convex profile in figure 2(d).
Note that the large difference between T_2 and T_s could be partially due to the use of equation (1) for correcting air temperature, where the strong sensible heat fluxes in Phoenix may lead to slightly overestimated air temperature. Figure 2 also reveals the variation in the urban landscape configuration among the studied cities. Chicago and NYC have more high-density urban grids (more points on the right side in figures 2(a) and (b)), while Pittsburgh has more low-density ones (figure 2(c)). Within each city, the monthly mean temperatures vary significantly over areas with similar impervious fractions (vertical spread). This variability demonstrates the difficulty of measuring temperatures in complex urban environments.
To compare the relation among the studied cities, we group data into bins of similar impervious fractions with an interval of 0.05. For example, all urban grids with 0.05-0.10 impervious fractions are gathered into one group and the average temperature is plotted at an impervious fraction of 0.075 on the horizontal axis. Changes in T_2 and T_s are shown in figure 3, with the group of impervious fractions <0.05 as the reference point. Urban land surface characteristics are found to regulate the temperature differently in the studied cities. From low-density to high-density urban grids, T_s increases by up to 3°C in Pittsburgh, but changes by less than 1°C in Phoenix. The largest increase in T_2 is observed in Chicago, while for T_s it is in Pittsburgh. This reveals the different surface-air interactions in the various cities and the impact of such interactions on temperatures. We also analyzed the same increase in temperature, but focused on the hottest grid cells with the 10% highest monthly mean temperature within individual bins; the trends are similar to those over the entire studied areas (figure S1 is available online at stacks.iop.org/ERL/14/074022/mmedia).
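The binning step can be illustrated with a short Python sketch. It is an assumption-laden example rather than the authors' code: the function names are hypothetical, and a cell lying exactly on a bin edge is assigned to the upper bin, which the text does not specify.

```python
from collections import defaultdict

BIN_WIDTH = 0.05  # interval of impervious fraction used for grouping


def bin_index(impervious_fraction, width=BIN_WIDTH):
    """0.00-0.05 -> 0, 0.05-0.10 -> 1, ...; the small offset guards
    against floating-point round-off at bin edges."""
    return int(impervious_fraction / width + 1e-9)


def bin_centre(index, width=BIN_WIDTH):
    """Impervious fraction at which a bin is plotted (e.g. 0.075 for bin 1)."""
    return (index + 0.5) * width


def binned_mean(fractions, temperatures, width=BIN_WIDTH):
    """Mean temperature of all grid cells falling in each bin."""
    sums, counts = defaultdict(float), defaultdict(int)
    for fraction, temperature in zip(fractions, temperatures):
        k = bin_index(fraction, width)
        sums[k] += temperature
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


def change_from_reference(bin_means, reference_bin=0):
    """Temperature change of each bin relative to the <0.05 bin."""
    reference = bin_means[reference_bin]
    return {k: mean - reference for k, mean in bin_means.items()}
```

Applied to monthly mean T_2 or T_s per grid cell, `change_from_reference(binned_mean(...))` gives the kind of curve described for figure 3.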

Measured temperature by fixed and mobile systems
Now we examine how skillful fixed/mobile measurement networks are at capturing the mean trends. This is achieved by placing virtual sensors to collect observations within the studied areas, and then assessing the performance of different networks via comparisons against the full WRF data set (the 'truth' temperatures that use information from all grid points in the delineated boundaries of figure 1). In this study, we considered three different designs of the fixed measurement network: (1) randomly distributed fixed (RDF) sensors assuming no prior knowledge of the urban land use, (2) evenly distributed fixed (EDF) sensors with equal measurements over each bin of impervious fractions, and (3) weighted distributed fixed (WDF) sensors among different bins to adjust the sampling based on the number of urban grids within individual bins. In addition, the mobile measurement network (MMN) with sensors moving randomly within the studied area (mimicking, for example, cars) is simulated. We did not place a constraint on how far a mobile station can move, since for ten or more stations it is always possible to find a station at the previous step/iteration that was within a plausible travel distance from the new random location. With a size of 1 km by 1 km, no urban grid has an impervious fraction larger than 0.95 in the WRF simulations, leading to 19 bins. It is expected that the quality of sampling would increase with the sensor number, and hence we tested five different numbers for each measurement network (19, 38, 57, 76 and 95 sensors). The network design does not assign specific locations for individual sensors, and as a result, the temperature data measured by the same design will change if the random assignment of locations changes. To investigate the uncertainty and to estimate the overall performance, we conducted 20 independent runs/realizations for each combination of measurement network and sensor number.
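The four sampling designs can be sketched as follows. This is a hypothetical illustration using only the Python standard library: the function names are invented here, grid cells are identified by integer index, and the WDF quota is assumed to be proportional to the number of cells in each bin (with at least one sensor per bin), which is one plausible reading of the weighting and may not match the exact rule used in the study.

```python
import random
from collections import defaultdict


def group_cells_by_bin(bin_of_cell):
    """Map bin index -> list of cell ids; bin_of_cell[i] is the bin of cell i."""
    groups = defaultdict(list)
    for cell, b in enumerate(bin_of_cell):
        groups[b].append(cell)
    return groups


def place_rdf(bin_of_cell, n_sensors, rng):
    """Randomly distributed fixed sensors: ignore land use entirely."""
    return rng.sample(range(len(bin_of_cell)), n_sensors)


def place_edf(bin_of_cell, n_sensors, rng):
    """Evenly distributed fixed sensors: the same quota in every bin."""
    groups = group_cells_by_bin(bin_of_cell)
    per_bin = n_sensors // len(groups)
    sensors = []
    for b in sorted(groups):
        cells = groups[b]
        # A sparsely populated bin may hold fewer cells than its quota.
        sensors.extend(rng.sample(cells, min(per_bin, len(cells))))
    return sensors


def place_wdf(bin_of_cell, n_sensors, rng):
    """Weighted fixed sensors: quota proportional to bin population
    (assumed rule). Rounding means the total can differ slightly
    from n_sensors."""
    groups = group_cells_by_bin(bin_of_cell)
    total = len(bin_of_cell)
    sensors = []
    for b in sorted(groups):
        cells = groups[b]
        quota = max(1, round(n_sensors * len(cells) / total))
        sensors.extend(rng.sample(cells, min(quota, len(cells))))
    return sensors


def move_mmn(n_cells, n_sensors, n_steps, rng):
    """Mobile network: every sensor jumps to a random cell at each step."""
    return [[rng.randrange(n_cells) for _ in range(n_sensors)]
            for _ in range(n_steps)]
```

Passing a seeded `random.Random` instance makes each realization reproducible; repeating the placement with 20 different seeds mirrors the 20 independent realizations described above.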
The monthly mean temperatures measured by 95 RDF sensors (more than any of the four cities has right now) are shown in figure S2. Errors of up to 1°C are observed in some realizations, primarily over the bins containing a small number of urban grids, which are not well sampled by the random strategy. Measured temperature has the least deviation among different realizations in Pittsburgh, due to the small temperature variability over similar impervious surfaces (vertical scatter in figure 2(c)). Because stations are distributed randomly, it is possible that no sensors are placed over a certain range of impervious fractions. Consequently, the lines can be discontinuous (or at least jagged, figures S2 and 4(a)). RMSEs for different sensor numbers are computed; with 95 RDF sensors they range from 0.37°C to 0.41°C among the four cities, increasing up to 0.85°C for 19 sensors in Phoenix (table S1).
We then compare the performance of different network strategies, illustrated using 57 sensors in Chicago in figure 4. The deviation among realizations in figure 4 mainly reflects the spatial heterogeneity of temperature, as the temporal heterogeneity is largely removed by taking the monthly mean. By distributing fixed sensors to cover the full range of impervious fractions, EDF and WDF networks result in smoother and continuous lines compared to the jagged RDF measurements. The EDF and WDF designs also reduce the deviation among different realizations associated with RDF, but not sufficiently. The inter-realization variability of EDF and WDF remains considerably larger than that of the mobile system (figure 4(d)). Despite the random movement of sensors, the MMN is able to sample more grids within the studied area. The spatial information contained in mobile measurements leads to a small range of deviation, and the mobile system therefore produces the most precise/repeatable realizations.
To illustrate the relation between measurement errors and sensor numbers, RMSEs averaged across the four studied cities produced by different networks are shown in figure 5 (see table S1 for results in each city). For each combination of network design and sensor number, RMSEs of T_2 and T_s are found to be comparable and thus we focus only on T_2 in the subsequent analyses. The RMSE of RDF increases significantly with decreasing number of sensors, while the RMSEs of other networks increase more slowly due to their more representative sampling approach (see supplementary materials for computational details). With an equal number of sensors, the randomly moving mobile system consistently outperforms the fixed networks, even when the latter are designed with EDF or WDF strategies. For deploying fixed sensors, EDF and WDF designs are about equally effective in reducing measurement errors, with EDF coming out slightly ahead with an RMSE of 0.42°C using 19 sensors. This error is only slightly larger than the RMSE of 0.38°C obtained with a much larger network of 95 RDF sensors and is smaller than the RMSE of 0.48°C using 76 RDF sensors.
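As a rough guide to how such an error metric might be computed, the sketch below takes the RMSE between binned mean temperatures from a sensor network and from the full data set, then averages it over realizations. The exact procedure is given in the supplementary materials, so this is only an assumed form; in particular, skipping bins that received no sensors is a choice made here for illustration, and the function names are hypothetical.

```python
import math


def binned_rmse(measured, truth):
    """RMSE over the bins present in both dictionaries.

    measured, truth : dict mapping bin index -> mean temperature (deg C).
    Bins without any sensor are absent from `measured` and are skipped.
    """
    shared = [b for b in truth if b in measured]
    if not shared:
        raise ValueError("the network sampled none of the reference bins")
    squared = sum((measured[b] - truth[b]) ** 2 for b in shared)
    return math.sqrt(squared / len(shared))


def mean_rmse(realizations, truth):
    """Average RMSE over independent realizations of one network design."""
    return sum(binned_rmse(m, truth) for m in realizations) / len(realizations)
```

Here `realizations` would hold one binned-mean dictionary per random placement, for example the 20 realizations generated for each design and sensor count.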

Temporal variation of temperature
At this point, we have assessed the performance of different networks in measuring the trend of monthly mean temperature. The spatial variation of temperature at the city scale is implicitly embedded in such a trend via surface impervious fractions (there remains some variability within an impervious fraction bin). However, the usage of monthly mean value conceals the important temporal variability of urban temperatures. For any urban grid, the sub-daily temperature variability caused by the diurnal variation of incoming radiation can easily reach 10°C. Moreover, synoptic meteorological forcing changes from day to day can lead to significant inter-daily temperature variability. As MMN and EDF were found to be the most promising network design options in the previous section, in this section we examine how well these two strategies capture the sub-daily and inter-daily temporal variations of WRF-simulated temperatures in the studied cities.

Synoptic day-to-day temperature variability
High air temperatures in the summer afternoon create hazardous thermal environments and a successful measurement network should capture the distribution of these temperatures. To emphasize the temporal variability of temperatures, we compute the probability density function (PDF) of T_2 for individual bins. The results between 14:00 and 16:00 local time over urban grids with 0.50-0.55 impervious fractions are shown as an example in figure 6. By considering only this narrow range of impervious fractions over a short period, both urban fabric-related and sub-daily temporal variabilities are minimized; therefore, the range of variation depicted in figure 6 results from (i) the synoptic day-to-day variability and (ii) the residual variability within a single bin. This latter variability is less than ∼2°C (vertical scatter in figure 2) so that the wide range of PDFs in figure 6 is mostly caused by synoptic variability. Figure 6 shows that the inter-daily variability is the largest in Chicago, with temperatures spanning 15°C-40°C between 14:00 and 16:00 local time. New York City and Pittsburgh have a similar temperature range of 20°C-35°C, but their PDF profiles are very different. A single peak is found in NYC, while multiple peaks are observed in Pittsburgh. Temperature data in Phoenix during this period are strongly clustered, with the majority falling between 30°C and 40°C. Both EDF and MMN strategies are found to reproduce the PDF reasonably well in all studied cities with 95 sensors. This good performance also applies to other bins of impervious fractions (see figure S3 and table S2). Between 14:00 and 16:00 local time, the mean correlation coefficient between the PDF of T_2 from 95 EDF/MMN sensors and the PDF from the full WRF temperature data set over all bins among all four studied cities is 0.95/0.93. The strong correlation suggests the synoptic temporal variability can be captured well with 95 EDF or MMN sensors.
The performance of EDF and MMN networks decreases with fewer sensors, but the reduction is mild because the synoptic variability is tied to large-scale meteorological forcing (figure S4).
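One way to quantify the agreement between a sampled PDF and the reference PDF is to histogram both sets of temperatures on shared bin edges and take the Pearson correlation of the two densities. The sketch below assumes that approach; the histogram range, the number of temperature bins and the function names are illustrative choices, not values taken from the study.

```python
import math


def histogram_pdf(values, lo, hi, n_bins):
    """Normalized histogram (density) of `values` on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # The upper edge is folded into the last bin.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values inside the histogram range")
    return [c / (total * width) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Computing `pearson(histogram_pdf(sensor_t2, ...), histogram_pdf(full_t2, ...))` for every impervious-fraction bin and averaging the results would give a summary score comparable in spirit to the correlation coefficients quoted above.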

Diurnal temperature variability
After looking into the inter-daily temporal variation, we investigate the sub-daily variation of T_2 measured by EDF and MMN networks. Results over urban grids with 0.75-0.80 impervious fractions in NYC are shown in figure 7 as an example. The shift in PDFs between different times of the day is more prominent than the difference between different days depicted in figure 6. In the afternoon, WRF-simulated T_2 peaks near the midpoint of the temperature range of about 15°C (figures 7(c) and (d)). Shortly after midnight and in the early morning, the peak of PDFs is skewed towards high temperatures (figures 7(a) and (b)). Distinct PDF profiles at different times are nevertheless reasonably captured using 95 EDF and MMN sensors. These results again indicate that EDF and MMN are effective networks capable of capturing the temporal variation of urban temperatures.

Extreme temperatures
While both EDF and MMN networks capture the PDFs in figures 6 and 7 well overall, they still may fail to capture the tail of PDFs. Temperatures on the right end of PDFs are extreme high temperatures in the temporal dimension occurring during the hottest analyzed periods. When the meteorological synoptic forcing is relatively steady, the hottest periods will consist of the hottest time of individual days. In this case the temporal extreme can be represented by the daily maximum temperature averaged over all days (T_2max) for each grid/sensor. On the other hand, the hottest periods can be a few days of uncharacteristically high temperatures if meteorological forcing changes significantly, which is analogous to extreme events of short duration such as heat waves. This extreme can then be captured by T_2ext, computed as the mean of the 5% hottest (hourly) observations across time-series measurements of T_2. For each fixed or mobile sensor and each grid point, this yields 36 values of T_2ext and 30 values of T_2max over the one-month simulation period. Resulting data are then averaged over sensors/grids belonging to each bin of impervious fractions to yield mean T_2max and T_2ext.
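The two extreme-temperature metrics can be written compactly for a single hourly time series. The sketch below is an illustration under stated assumptions: the series is assumed to start at the beginning of a day, an incomplete trailing day is discarded, and the function names are hypothetical.

```python
def mean_daily_maximum(hourly, hours_per_day=24):
    """T_2max: daily maximum temperature averaged over all complete days."""
    days = [hourly[i:i + hours_per_day]
            for i in range(0, len(hourly), hours_per_day)]
    maxima = [max(day) for day in days if len(day) == hours_per_day]
    if not maxima:
        raise ValueError("need at least one complete day of data")
    return sum(maxima) / len(maxima)


def mean_of_hottest(hourly, fraction=0.05):
    """T_2ext: mean of the hottest `fraction` of hourly observations."""
    # The small offset guards against floating-point round-off.
    n_keep = max(1, int(len(hourly) * fraction + 1e-9))
    hottest = sorted(hourly, reverse=True)[:n_keep]
    return sum(hottest) / n_keep
```

For a 30-day hourly series (720 values), `mean_of_hottest` averages the 36 hottest hours and `mean_daily_maximum` averages 30 daily maxima, matching the counts quoted in the text.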
Figures 8(a) and (b) compare the mean T_2max by 95 EDF and MMN sensors to the full WRF data set in NYC and Pittsburgh (see figure S5 for results in Chicago and Phoenix). Note that at each measurement time, randomly moving mobile sensors record temperature data over a different urban grid within the metropolitan area. To estimate the mean T_2max from MMN sensors, the maximum temperature detected every day by a sensor was plotted at its corresponding impervious fraction, and the results are then averaged over bins. Figure 8(a) shows that fixed sensors significantly outperform mobile sensors in measuring the mean T_2max. Errors of the MMN network can be up to 5°C in NYC, while the maximum errors using EDF sensors are smaller than 2°C. The large errors of the MMN network are caused by the uncertainty related to random movement of mobile sensors. For example, a mobile sensor may measure areas of low impervious fractions during the daytime and sample hot grids at nighttime. The estimated T_2max over a diurnal cycle could therefore correspond to the morning or afternoon temperature over a hot grid, which will be significantly smaller than the T_2max from a fixed station over the same location measuring peak noon temperatures. In addition, for each diurnal cycle, the distribution of T_2max measurements among different bins of impervious fractions by MMN sensors is arbitrary. Consequently, a large deviation between different realizations is consistently found. Figures 8(c) and (d) show the results in measuring T_2ext. Both EDF and MMN networks capture the behavior of T_2ext very well in Pittsburgh, but they have an unsatisfactory performance with maximum errors of about 3°C in NYC. This illustrates that the grand challenge for urban measurement networks is to collect data during extreme events of short duration.
This measurement challenge is extremely important to address as heat waves are projected to become more frequent and of longer duration in the future (Meehl and Tebaldi 2004). Because neither fixed stations nor mobile sensors alone can capture T_2ext at the city scale, we explore the usage of a hybrid network design in the next section.

Hybrid sensor networks
Measurement errors of T_2ext from EDF sensors are caused by their underrepresentation of the spatial information, while those from MMN sensors are related to sub-optimal placement during the hottest periods. Using both EDF and MMN sensors, a hybrid network can potentially reduce the errors by combining the spatial and temporal information. A straightforward way is to treat data from fixed and mobile sensors similarly and estimate T_2ext from the mixed measurements. Alternatively, the hybrid network can be designed with a strategy to utilize the respective advantages of fixed and mobile sensors: estimating the time span of extreme events from fixed sensors and subsequently extracting temperature measurements from mobile sensors only during this period to complement the measurements by fixed stations. Here, we investigate the performance of two hybrid networks (with and without this strategy) using 38 EDF sensors and 57 MMN sensors in measuring T_2ext. The hybrid networks are tested for NYC and Chicago in figure 9 because errors are the largest in these two cities (figures 8 and S5). It is found that the hybrid network, with or without a strategy, better captures the trend of T_2ext than using only EDF or MMN sensors (figure 8(c)). By informing mobile sensors of the period of extreme events from fixed sensors, the design strategy reduces the fluctuations between different realizations using the hybrid network even further. In NYC, RMSEs of T_2ext by the hybrid network are reduced to 0.38°C, and further to 0.21°C when the strategy is implemented (compared to 0.54°C and 0.48°C by 95 EDF and MMN sensors, respectively). Note that we only analyze one feasible strategy for hybrid networks in this section. To achieve the optimal design of hybrid networks, in-depth analysis (e.g. machine learning approach) should be explored in future work, but our work unequivocally indicates that hybrid networks have a clear advantage in measuring extremes.
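The hybrid strategy can be sketched as a two-step filter. The text does not specify how the time span of extreme events is derived from the fixed sensors, so the sketch below makes an explicit assumption: the extreme period is taken to be the hours in which the fixed-network mean temperature ranks in the hottest 5%. All function and variable names are hypothetical.

```python
from collections import defaultdict


def extreme_hours(fixed_series, fraction=0.05):
    """Hours flagged as extreme by the fixed network (assumed rule:
    hottest `fraction` of hours ranked by the network-mean temperature).

    fixed_series : list of per-sensor hourly temperature lists, equal length.
    """
    n_hours = len(fixed_series[0])
    network_mean = [sum(s[t] for s in fixed_series) / len(fixed_series)
                    for t in range(n_hours)]
    n_keep = max(1, int(n_hours * fraction + 1e-9))
    ranked = sorted(range(n_hours), key=lambda t: network_mean[t],
                    reverse=True)
    return set(ranked[:n_keep])


def hybrid_extreme_by_bin(fixed_series, fixed_bins, mobile_records,
                          fraction=0.05):
    """Bin-mean extreme temperature from fixed plus filtered mobile data.

    fixed_bins     : impervious-fraction bin of each fixed sensor.
    mobile_records : iterable of (hour, bin, temperature) tuples.
    """
    hours = extreme_hours(fixed_series, fraction)
    samples = defaultdict(list)
    # Fixed sensors contribute their readings during the extreme hours.
    for series, b in zip(fixed_series, fixed_bins):
        samples[b].extend(series[t] for t in hours)
    # Mobile readings are kept only if taken during those same hours.
    for hour, b, temperature in mobile_records:
        if hour in hours:
            samples[b].append(temperature)
    return {b: sum(v) / len(v) for b, v in sorted(samples.items())}
```

The mobile records filtered this way can fill impervious-fraction bins that hold no fixed sensor, which is the complementarity the hybrid design relies on.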

Discussion and conclusion
In this study, we examined the capability of different measurement networks to resolve the spatio-temporal variability of urban temperatures from neighborhood to city scale. Our analysis relied on deploying 'virtual sensors' in simulations of an urbanized weather prediction model (Yang and Bou-Zeid 2019). While the simulated data might not capture all the fine-scale variability in a city that an equivalent physical network could, the data abundance in the simulations far exceeds that of any available physical network in the analyzed cities. Our findings related to network design strategies were consistent over all four studied cities, indicating that the conclusions are not generally sensitive to slight changes in simulation accuracy.
For each studied city, we investigated how well fixed versus mobile sensors, and various strategies and number of sensors, can capture (i) time-averaged temperatures, (ii) diurnal temporal variability, (iii) day-to-day temporal variability and (iv) temperature extremes. Compared to fixed sensors, mobile sensors are more effective in measuring the spatial variability of monthly mean temperatures due to their greater ability to collect spatial information. With 95 sensors, equivalent to 5.5%-9.2% of the number of total urban simulation grid cells in the studied cities, randomly moving mobile sensors and evenly distributed fixed sensors capture the diurnal and day-to-day temporal variability under different climates with a reasonable accuracy. However, both measurement networks still fall short in capturing extreme events of short duration. Mobile sensors have a lower performance in capturing these extremes since the cost of their better spatial coverage is a loss of temporal information. A hybrid network of fixed and mobile sensors can efficiently overcome the deficiencies of purely fixed or mobile networks, especially if information on the detected time span of extreme events from fixed sensors is used to filter the temperature measurements from mobile sensors.
One important limitation of the WRF data set is its spatial resolution of 1 km. The resolution is sufficient to identify the heterogeneity of urban temperatures at a neighborhood scale. In the built-up environment, however, sensors are often located within urban canyons or in other locations where their observational footprint is reduced to a scale of ∼10-100 m. This is a spatial scale mismatch that a simulation-based study alone cannot resolve. The full urban temperature data set for a real city is therefore much more heterogeneous, but our study illustrates how fixed, mobile and hybrid systems can be used to sample temperatures in such complex environments, and the advantages and disadvantages of each strategy. It is also noteworthy that previous studies reported a significant relation between the sky view factor and canopy air temperatures (Svensson 2004, Chen et al 2012). Urban measurement network designs based on the sky view factor can be tested in future analysis.