Machine Learning Techniques to Map the Impact of Urban Heat Island: Investigating the City of Jeddah

Abstract: Over the last decades, most agricultural land has been converted into residential colonies to accommodate rapid population expansion. Population growth and urbanization have negative consequences for the environment, and land converted in this way has experienced various environmental issues. This expansion degrades living conditions in both the near and the long term, as the population is projected to keep growing. One such issue is the urban heat island (UHI), which is computed based on land surface temperature (LST). The UHI effect has fundamental anthropogenic impacts on local areas, particularly in rapidly growing cities. This is due to unplanned shifts in land use and land cover (LUALC) at the local level, which result in variations in climate conditions. Therefore, proper planning based on concrete information is the best long-run policy to remedy these issues. In this study, we attempt to map UHI phenomena using machine learning (ML) algorithms, namely bagging and random subspace. The proposed research also supports the requirements of the Sustainable Development Goals (SDGs). We use correlation and regression methods to understand the relationship between biophysical composition and the UHI effect. Our findings indicate that in the megacity of Jeddah, Saudi Arabia, the urban area enlarged by about 80% from 2000 to 2021, while the UHI increased overall. Impervious surfaces significantly intensify the UHI effect, while vegetation and water bodies are negatively associated with it. More than 80% of the total area of Jeddah was classified as having very high UHI conditions, as determined by the bagging and random subspace models. In particular, the south, north, and central-east parts of the megacity were categorized as having very high UHI conditions.
This research is expected not only to assist in understanding the spatial patterns of the UHI in Jeddah, but also to assist planners and policymakers in spatial planning. It will help to ensure sustainable urban management and improve quality of life.


Introduction
Rapid urbanization is occurring in various parts of the world [1]. Since the mid-20th century, urbanization has been a key driver of human-induced change [2]. According to a 2018 United Nations projection, 68% of the world's population will live in cities by 2050. Massive influxes of people into urban areas and the alteration of landscapes from green space to impervious surfaces have resulted in imbalances between rural and metropolitan areas [3,4]. Such an imbalance ultimately alters energy and water balances in urban areas, affecting airflow dynamics [5]. As a result of the rapid rate of urbanization, natural and semi-natural landscapes have been transformed into impervious surfaces, resulting in significant land-use and land-cover (LUALC) alterations [6]. Changes in LUALC and deterioration of natural landscapes (e.g., wetlands, vegetation, and water cover) cause various eco-environmental issues, including ecosystem service (ES) loss, degradation of air, water, and soil quality, and the UHI effect, especially in India and China. Therefore, mapping and assessing UHIs is essential to ensure a safe and sustainable urban environment. Understanding UHI phenomena is also necessary for urban climate management and for controlling the features that affect the city climate process [7].
The UHI effect is considered an undesirable effect of urbanization, resulting from the conversion of pervious into impervious surfaces [8,9]. The surface UHI is estimated from satellite images based on the land surface temperature (LST) [10]. Due to changes in surface thermal properties, cities are experiencing the UHI effect and are becoming hotter than rural places [11,12]. Temperature differences between rural and urban areas are primarily influenced by variations in landscape composition and configuration [13,14]. LUALC alteration due to rapid urbanization has been stated as one of the dominant drivers of global climate change [15]. The authors in [16] conducted a study in Australia and observed an increase in the UHI of 1.20 ± 0.20 °C from 2001 to 2014, associated with a 14.93% increase in built-up areas. The authors in [17] stated that alteration in LUALC is a prime factor of global warming, and the UHI is expected to increase by 0.27 °C per century. Previous studies have considered social and ecological factors that significantly affect UHI phenomena [18,19]. Geographic information systems (GIS) and remote sensing are extensively utilized to determine how socio-ecological variables affect UHI occurrences [20,21]. These socio-ecological variables include vegetation indices, water indices, and impervious indices. These remote-sensing-based indices have been routinely utilized to evaluate the spatial variability of the UHI [20,22]. Although the abovementioned indices have been utilized previously, no work has mapped and assessed UHI phenomena using machine learning (ML) and deep learning (DL) algorithms. Thus, several research gaps have been identified from a review of the existing literature.
First, previous research on such mapping and modeling was carried out in cities in tropical and subtropical regions, and very few studies have considered hot desert climate areas. The authors in [23] introduce a novel technique for integrating remote sensing data with the available datasets. The method offers substantial insights that help planners and policymakers propose more sustainable planning. The authors in [24] map the spatial pattern of air temperature at a 1 km, hourly resolution. The study shows good accuracy for the UHI at nighttime and a markedly lower level in a hilly region. It also shows a fall in air temperature compared to the urban fringe in metropolitan areas. The work in [25] evaluates the effects of UHI using an artificial neural network (ANN). It develops a technique to evaluate the overall risk associated with UHI and to minimize the damage to residents' health. A comprehensive review is presented in [26], in which an integrative ML-based framework is analyzed. That research discusses urban planning and efficient management using ML techniques.
A theoretical approach is presented in [27] to minimize the effects of extreme UHI, with an ML approach used to describe the simulation results. An ML-based prediction of LUALC is used to map the surface temperature changes in Ahmedabad City, India [28]. An ANN-based cellular automata (CA) model is utilized, and the prediction of the future LUALC shows an accuracy of 89.2%. A broad review of UHI from the perspective of thermal data is provided in [29]. The authors cover datasets, concepts, applications, and methodologies, focusing on progress in multi-sensor image optimization, computational analysis, image fusion, and DL, along with the development of novel data metrics. Methods of applying ML to UHI prediction and digital analysis are provided in [30].
Cities in hot desert areas have experienced massive population growth and urbanization. Therefore, it is essential to understand the UHI effect for climate mitigation. Second, in previous studies, UHI phenomena have been modeled from the LST but have not been modeled considering independent factors affecting UHIs. Third, cities in hot desert climate regions have experienced dramatic urban expansion and have faced serious environmental problems related to heat waves and extreme temperatures. Therefore, it is crucial to comprehend and map UHI phenomena for UHI mitigation and sustainable urban management. In light of the studies reviewed above [23][24][25][26][27][28][29][30], the proposed study utilized ML algorithms to map the UHI from remote-sensing-based indices for 2021. In terms of contribution and novelty, this work shows the urban heat island pattern based on remote sensing indices and its relationship with biophysical indices.

Background
Cities and urban regions have a characteristic local climate that differs greatly from that of the surrounding area. Elevated temperatures, polluted air, and poor ventilation are typical characteristics of densely built inner cities [21][22][23]. In cities, the effect of high temperatures can be amplified by local effects: restricted wind circulation due to dense development, the lack of shade and green spaces, the absorption of incident solar radiation by the many sealed surfaces, and waste heat from industry, buildings, and transport all contribute to the UHI. The UHI increases daytime heating and significantly reduces nocturnal cooling [24][25][26][27][28].
Accordingly, the maximum temperature difference between core cities and the surrounding rural areas can reach up to 10 °C [10][11][12][13]. The so-called heat island effect can be found in most cities worldwide, including smaller cities with fewer than 100,000 inhabitants [13][14][15]. The UHI effect is one of the most significant man-made changes related to near-surface climate [29][30][31][32][33]. The heat island intensity is the size of the temperature difference, which correlates positively with the size of the city, although there are also differences between cities on individual continents. The heat island is noticeable especially when moving between the city and the surrounding area in the morning or late at night [34][35][36][37][38]. The difference in air temperature between the city and the surrounding area is not limited to the summer but holds for the whole year. The heat island has been intensifying in recent years; it is increasingly recognized by planning authorities as an important urban planning factor and integrated into their planning processes. It is also increasingly becoming the focus of discussions about climate change. The UHI was modeled for this study using two popular ML algorithms: bagging and random subspace. The details of these ML algorithms are discussed in the following.

Bagging
Bagging (bootstrap aggregating), proposed by the author in [39], is a widely used ML algorithm that belongs to the group of bootstrap sampling techniques. The previous literature documents that bagging can be successfully applied to predict various environmental issues accurately [40,41]. It works by constructing decision trees on different bootstrap subsets of the training data and combining them in the final output [40]. This approach enhances prediction accuracy by reducing variance errors [37]. The bagging algorithm has received much attention in landslide susceptibility [22,37], drought vulnerability [42], air quality modeling, land surface temperature modeling, soil erosion and moisture mapping, etc. However, the application of the bagging algorithm in UHI modeling remains limited and has not been widely explored to date.
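As a minimal illustration of the idea described above, the following sketch fits one-split regression trees (stumps) on bootstrap samples and averages their predictions. It is not the configuration used in this study: the base learner, the number of estimators, the helper names, and the synthetic [NDBI, NDVI] pixels are all assumptions made for the example.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a one-split regression tree by minimising the squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # degenerate sample (no valid split): predict the mean
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Aggregate the ensemble by averaging the individual predictions."""
    return mean(predict_stump(m, row) for m in models)


# Hypothetical pixels: [NDBI, NDVI] -> UHI value (synthetic, not study data).
X = [[i / 10 - 1.0, 0.5 - i / 40] for i in range(21)]
y = [3 * ndbi - ndvi for ndbi, ndvi in X]
models = fit_bagging(X, y)
cool = predict_bagging(models, X[0])   # low NDBI, high NDVI
hot = predict_bagging(models, X[-1])   # high NDBI, low NDVI
```

Averaging over bootstrap replicates is what reduces the variance of the individual trees; deeper trees would normally be used in practice.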

Random Subspace
Random subspace is a popular meta-classifier technique that has been widely utilized in environmental and regression problems [43,44]. According to the authors in [45], several training datasets are produced from random subspaces of the features for the base training algorithms. This meta-classifier samples the feature space, whereas other algorithms resample only the instance space [46]. Thus, it is considered effective when there are many redundant features, and it is less prone to the over-fitting problems experienced by other algorithms. This algorithm has been widely applied in various environmental modeling contexts [47,48] and in air quality prediction [11]. However, the application of this ML algorithm in modeling the UHI effect has not yet been explored.
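The contrast with bagging can be sketched as follows: every ensemble member sees all training rows but only a random subset of the features. This is an illustrative sketch only; the k-nearest-neighbour base learner, the subspace size, the function names, and the synthetic four-feature pixels are assumptions and are not taken from this study.

```python
import random
from statistics import mean


def knn_predict(X, y, row, features, k=3):
    """Average the targets of the k nearest rows, using only `features`."""
    dist = sorted(
        (sum((xi[j] - row[j]) ** 2 for j in features), yi)
        for xi, yi in zip(X, y)
    )
    return mean(v for _, v in dist[:k])


def draw_subspaces(n_features, n_estimators=10, subspace_size=2, seed=0):
    """Draw one random feature subset (without replacement) per member."""
    rng = random.Random(seed)
    return [rng.sample(range(n_features), subspace_size) for _ in range(n_estimators)]


def predict_random_subspace(X, y, subspaces, row, k=3):
    """Average the predictions of the members, one per feature subspace."""
    return mean(knn_predict(X, y, row, feats, k) for feats in subspaces)


# Hypothetical pixels: [NDBI, NDBaI, NDVI, MNDWI] -> UHI (synthetic values).
X = [[i / 20, i / 25, 1 - i / 20, 1 - i / 30] for i in range(21)]
y = [i / 2 for i in range(21)]
subspaces = draw_subspaces(n_features=4)
cool = predict_random_subspace(X, y, subspaces, X[0])
hot = predict_random_subspace(X, y, subspaces, X[-1])
```

Because each member ignores some features, redundant or noisy features affect only part of the ensemble.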

Investigation Zone
Jeddah is a populous and rapidly growing city in Saudi Arabia. It lies on the Red Sea shore and has grown quickly over the last 40 years. Jeddah was chosen for this investigation because it is a case of interest for establishing a connection between LST and urban biophysical indices. It is the country's second-largest city and is located on Saudi Arabia's western coast, along the Red Sea. The population density in Jeddah is 5400 people/km², and the city covers roughly 1600 km². It has a dry, hot desert climate, and during the summer its temperature rises to 40 °C. The mean rainfall recorded in the megacity is approximately 45 mm.

Data Gathering Centers
This research used multi-temporal satellite images to develop LUALC, socio-ecological variable (SEV), and UHI maps. The United States Geological Survey (USGS) provided the data used to develop the LUALC maps for 2000, 2010, and 2021. We used LANDSAT 5 TM images from 2000 and 2010, as well as LANDSAT 8 OLI imagery from 2021, as shown in Figure 1. The details of the methodological framework are presented in Figure 2.

Estimation of UHI
Previous research has broadly used the UHI value to assess the UHI effect quantitatively. Many studies have mapped and assessed UHI phenomena using machine learning (ML) algorithms [49][50][51][52][53][54][55][56]. The UHI value is calculated from the LST of an area, from which the UHI intensity can be estimated. A higher LST value indicates higher UHI intensity and vice versa [23]. In many previous studies, the UHI value has been used to estimate the UHI effect, but its application in hot desert climate areas remains limited. For example, Naim and Al Kafy [24] studied the association between UHI and LUALC in Chattogram City, Bangladesh. The UHI is calculated as
$$\mathrm{UHI} = \frac{T_s - T_m}{SD}$$
where $T_s$ refers to the LST, $T_m$ refers to the mean LST, and $SD$ refers to the standard deviation of the LST over the area.

We computed the urban heat island from the land surface temperature extracted from LANDSAT images. Temperatures can rise for many reasons, such as solar radiation; in this work, however, we show the urban heat island pattern based on remote sensing indices and its relationship with biophysical indices. In the city's central part, the urban heat island effect is relatively lower, most likely because water bodies moderate the temperature. Due to the limitations of this work, we could not incorporate other data, such as solar radiation.
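The standardization of LST into a UHI value described in this section can be sketched as below, assuming the common form in which each pixel's LST is centered on the mean LST and divided by the standard deviation. The use of the population standard deviation, the function name, and the sample temperatures are assumptions for illustration.

```python
from statistics import mean, pstdev


def uhi_values(lst_celsius):
    """Return (T_s - T_m) / SD for every pixel temperature T_s."""
    t_m = mean(lst_celsius)
    sd = pstdev(lst_celsius)  # assumption: population standard deviation
    return [(t_s - t_m) / sd for t_s in lst_celsius]


# Hypothetical LST values in degrees Celsius (not taken from the study).
sample_lst = [38.0, 40.0, 42.0, 44.0]
sample_uhi = uhi_values(sample_lst)
```

Pixels warmer than the scene mean receive positive values and cooler pixels receive negative values, which makes scenes from different dates easier to compare.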

SEV Affecting UHI
The LST of any area is largely determined by its biophysical composition [23,25]. In urban areas, the spatial distribution of LST is not homogeneous but heterogeneous, typically because surface properties vary across the city [26]. The previous literature has well recognized that the composition of the landscape, such as impervious surfaces, water bodies, and green spaces (vegetation cover), largely determines the spatial distribution of LST in cities [27]. Therefore, it is essential to incorporate socio-ecological variables (SEVs) as independent factors affecting the UHI in an area. This study used seven SEVs as independent factors affecting UHI phenomena. These SEVs were broadly categorized into four groups: impervious surface (NDBI, NDBaI), vegetation (NDVI, SAVI), water (MNDWI, NDWI), and heat (LST). The details of the parameters are discussed below.
The parameters related to impervious surfaces are very sensitive to UHI phenomena [22,28]. Areas with a high proportion of impervious surfaces are typically characterized by high UHI intensity [29,30]. Therefore, considering the imperviousness of surfaces is crucial to understanding the relationship between impervious surfaces and UHI intensity. This study utilized two popular indices, NDBI and NDBaI, to determine their impact on the UHI intensity. They were calculated as follows:
$$\mathrm{NDBI} = \frac{MIR - NIR}{MIR + NIR}$$
$$\mathrm{NDBaI} = \frac{MIR - TIR}{MIR + TIR}$$
where the mid-infrared band ($MIR$) refers to bands 5 and 6 for LANDSAT TM and OLI 8, respectively, $NIR$ refers to bands 4 and 5, and $TIR$ refers to the thermal band. NDBI values vary from −1 to +1: values close to 0 indicate vegetation cover, negative values indicate water bodies, and positive values indicate impervious surfaces.
Regarding the vegetation cover, NDVI and SAVI were used to determine the impact on UHI intensity. These parameters have been widely used in the previous literature to observe the influence of vegetation cover on UHI intensity. It has been shown that low UHI intensity occurs in places with high vegetation cover and vice versa [31,32]. Recent studies have also found that vegetation cover has a significant cooling effect in urban areas [33,34]. Therefore, vegetation indices can be very helpful parameters for understanding the UHI intensity in a given area [22]. NDVI and SAVI were calculated as follows:
$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R}$$
$$\mathrm{SAVI} = \frac{(NIR - R)(1 + L)}{NIR + R + L}$$
where $NIR$ and $R$ are the near-infrared and red bands of LANDSAT TM and OLI 8, respectively, and $L$ is the soil-adjustment factor.
Water bodies or blue spaces also significantly impact UHI intensity [35,36]. Previous research has shown that water bodies or blue spaces significantly reduce temperature [35]. Thus, areas with water coverage are characterized by low temperatures. In this study, we used two indices related to water bodies, NDWI and MNDWI, which are calculated as follows:
$$\mathrm{NDWI} = \frac{G - NIR}{G + NIR}$$
$$\mathrm{MNDWI} = \frac{G - MIR}{G + MIR}$$
where $G$ (the green band) corresponds to the 2nd and 3rd bands for LANDSAT 5 and OLI 8, $MIR$ corresponds to the 5th and 6th bands, and $NIR$ corresponds to the 4th and 5th bands, respectively.
LST is the strongest factor affecting UHI intensity [21]. It is largely influenced by the configuration and composition of the land surface in any area. For example, the imperviousness of the surface is positively associated with LST, while vegetation and water bodies are negatively associated with LST [22]. Thus, alteration of the biophysical composition largely affects the LST [22]. Pockets with high LST in urban areas are characterized by high UHI intensity [37]. Thus, LST has been considered a dominant factor affecting UHI intensity [37]. According to Siddiqui et al. [38], LST can be extracted from the thermal bands of LANDSAT imagery.
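The six spectral indices named in this section all follow the same band-ratio pattern, which the sketch below makes explicit. It assumes the standard published definitions of NDBI, NDBaI, NDVI, SAVI, NDWI, and MNDWI, a soil-adjustment factor of 0.5 for SAVI, and band values that are already on a common scale; the function names and the sample pixel are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 where the denominator vanishes."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def spectral_indices(green, red, nir, mir, tir, soil_factor=0.5):
    """Compute the indices for one pixel from its band values."""
    return {
        "NDBI": normalized_difference(mir, nir),    # built-up
        "NDBaI": normalized_difference(mir, tir),   # bareness
        "NDVI": normalized_difference(nir, red),    # vegetation
        # assumption: soil_factor = 0.5, a commonly used default for SAVI
        "SAVI": (nir - red) * (1 + soil_factor) / (nir + red + soil_factor),
        "NDWI": normalized_difference(green, nir),  # water
        "MNDWI": normalized_difference(green, mir), # modified water
    }


# Hypothetical vegetated pixel (illustrative values only).
pixel = spectral_indices(green=0.08, red=0.05, nir=0.45, mir=0.20, tir=0.30)
```

For this vegetated pixel the vegetation indices are positive while the built-up and water indices are negative, matching the interpretation given in the text.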
For LANDSAT 5 TM and LANDSAT 8 OLI, the 6th and 10th bands, respectively, are used to retrieve the LST from the image [23]. Some fundamental steps must be followed to extract the LST from the thermal bands. These steps are discussed successively in the following.
The first step is to convert the digital number (DN) into radiance $L_\lambda$ using the following equation:
$$L_\lambda = \frac{L_{\max\lambda} - L_{\min\lambda}}{QCal_{\max} - QCal_{\min}} \left(QCal - QCal_{\min}\right) + L_{\min\lambda}$$
where $L_\lambda$ refers to the radiance at the sensor, $L_{\max\lambda}$ and $L_{\min\lambda}$ refer to the maximum and minimum radiance of the thermal band, $QCal$ refers to the calibrated DN values, and $QCal_{\max}$ and $QCal_{\min}$ are the maximum and minimum calibrated pixel values in DN, respectively. Alternatively, the radiance is obtained from the rescaling factors:
$$L_\lambda = M_L \cdot QCal + A_L$$
where $M_L$ and $A_L$ refer to the multiplicative and additive scaling factors, respectively, obtained from the band metadata, and $QCal$ refers to the DN value.
In the next step, the satellite temperature ($T$, in degrees Celsius) is calculated by applying the equation below:
$$T = \frac{K_2}{\ln\left(\frac{K_1}{L_\lambda} + 1\right)} - 273.15$$
where $K_1$ and $K_2$ refer to the calibration constants of the thermal bands.
Finally, the LST can be retrieved by the equation provided below:
$$\mathrm{LST} = \frac{T}{1 + \left(\frac{\lambda T}{\rho}\right) \ln \varepsilon}$$
where $\lambda$ represents the wavelength of the emitted radiance, $T$ represents the satellite temperature, $\rho = hc/\sigma \approx 1.438 \times 10^{-2}$ m K (with $h$ Planck's constant, $c$ the speed of light, and $\sigma$ the Boltzmann constant), and $\varepsilon$ represents the emissivity, which is evaluated from the vegetation cover proportion $P_v$, calculated as:
$$P_v = \left(\frac{NDVI - NDVI_{\min}}{NDVI_{\max} - NDVI_{\min}}\right)^2$$
where $NDVI_{\max}$ and $NDVI_{\min}$ refer to the maximal and minimal normalized difference vegetation index values.
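The retrieval steps above can be chained as in the following sketch for a single LANDSAT 8 band 10 pixel. Several details are assumptions rather than the study's settings: the rescaling factors and thermal constants are the values commonly published in LANDSAT 8 metadata and should be read from the scene's own metadata file, the emissivity relation 0.004 P_v + 0.986 is one widely used form (this section does not state its coefficients), and the emissivity correction is applied to the temperature in kelvin before converting to degrees Celsius.

```python
import math

# Assumed LANDSAT 8 band 10 values; read the real ones from the scene metadata.
M_L = 3.342e-4          # multiplicative rescaling factor
A_L = 0.1               # additive rescaling factor
K1 = 774.8853           # thermal calibration constant
K2 = 1321.0789          # thermal calibration constant (kelvin)
WAVELENGTH_UM = 10.895  # assumed effective wavelength of band 10, micrometres
RHO_UM_K = 14388.0      # h * c / Boltzmann constant, micrometre kelvin


def dn_to_radiance(dn):
    """Convert a digital number to at-sensor spectral radiance."""
    return M_L * dn + A_L


def brightness_temperature_k(radiance):
    """Convert radiance to at-sensor brightness temperature in kelvin."""
    return K2 / math.log(K1 / radiance + 1)


def vegetation_proportion(ndvi, ndvi_min, ndvi_max):
    """Squared, rescaled NDVI used as the vegetation cover proportion."""
    return ((ndvi - ndvi_min) / (ndvi_max - ndvi_min)) ** 2


def emissivity(p_v):
    """Assumed emissivity relation; the coefficients are not from the study."""
    return 0.004 * p_v + 0.986


def lst_celsius(dn, ndvi, ndvi_min, ndvi_max):
    """Emissivity-corrected land surface temperature in degrees Celsius."""
    t_k = brightness_temperature_k(dn_to_radiance(dn))
    eps = emissivity(vegetation_proportion(ndvi, ndvi_min, ndvi_max))
    return t_k / (1 + (WAVELENGTH_UM * t_k / RHO_UM_K) * math.log(eps)) - 273.15


# Hypothetical pixel: DN 30000 with sparse vegetation.
t_sensor_c = brightness_temperature_k(dn_to_radiance(30000)) - 273.15
lst_c = lst_celsius(30000, ndvi=0.2, ndvi_min=0.0, ndvi_max=0.8)
```

Because the emissivity is below 1, the corrected LST is slightly higher than the at-sensor brightness temperature.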

Statistical Analysis
Several statistical analyses were conducted to ascertain the overall pattern of UHI in Jeddah. The Spearman correlation was applied to evaluate the relationship between UHI and the SEVs. The correlation was found to be significant at the 0.01 level. Ordinary linear regression (OLR) was conducted to examine the impact of the SEVs on the UHI. The statistical analyses were performed with the Statistical Package for the Social Sciences (SPSS), version 22. LUALC maps for Jeddah are shown in Figure 3 and Table 1 (dropping by about 12%), as shown in Figure 3.
Figure 4 displays the LST spatial maps for Jeddah. As the maps show, the northern, central-eastern, and southern areas of Jeddah in particular are characterized by very high LST.
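The study ran these analyses in SPSS; the sketch below only illustrates what the two statistics compute. It implements the Spearman coefficient as the Pearson correlation of average ranks and, as a simplification, the coefficient of determination of a single-predictor least-squares fit rather than the multi-variable regression used in the study. The function names and sample values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def r_squared(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical samples: MNDWI against UHI (illustrative values only).
mndwi = [-0.4, -0.2, 0.0, 0.3, 0.6]
uhi = [1.8, 1.1, 0.9, -0.2, -1.5]
rho = spearman(mndwi, uhi)
fit_quality = r_squared(mndwi, uhi)
```

Spearman's coefficient depends only on the ordering of the values, so it captures monotonic relationships that need not be linear.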

Impact of SEV on UHI
Spearman's correlation analysis was applied to determine the association between the UHI values given in Table 3 and the socio-ecological variables (SEVs), as shown in Figure 6a-f. The results make it straightforward to read off the correlation between the surface temperature values and the UHI. Figure 6f shows that the LST had a very strong positive correlation with the UHI, and Figure 6b,d shows that NDBI and NDBaI were positively correlated with it. In contrast, Figure 6a shows that the UHI had a strong negative correlation with MNDWI. Figure 6c,e shows that NDVI and SAVI had very weak correlations with the UHI. The regression model yielded an R² value of 0.763, indicating that the model can be accepted to explain the impact of the SEVs on the UHI.

Modeling the UHI Based on Bagging and Random Subspace
The spatial distributions of the UHI predicted by the bagging and random subspace models are presented in Figure 7. We found that high and very high UHI values characterized the south, north, and central-east parts of Jeddah, whereas the city's west and central-west parts were characterized by low UHI values. According to the bagging model, 5.11% and 82.32% of the megacity were characterized by high and very high UHI values, and 3.32% and 4.82% by low and very low UHI values, respectively. The random subspace model indicated that 5.24% and 81.68% of the area were under high and very high UHI values, while the areas with low and very low UHI values comprised 3.49% and 4.75%, respectively (Table 4).
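Area shares such as those reported above are obtained by counting the pixels assigned to each UHI class and dividing by the total pixel count. The sketch below shows that arithmetic on a hypothetical list of per-pixel class labels; the labels and counts are illustrative and do not reproduce Table 4.

```python
from collections import Counter


def class_shares(labels):
    """Return the percentage of pixels falling in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: 100.0 * n / total for cls, n in counts.items()}


# Hypothetical classified pixels (equal pixel size assumed).
pixel_classes = ["very high"] * 8 + ["high"] + ["low"]
shares = class_shares(pixel_classes)
```

With equal-area pixels the share of pixels equals the share of land area, so the same computation applies to both models' output rasters.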

Discussion
UHI impacts have been linked to the rise in population caused by fast urbanization [57]. The transformation of the natural landscape into impervious surface areas and the rapid increase in impervious surfaces have enhanced the UHI intensity in cities [57]. The previous literature documents that the increase in impervious surface area in a city is the crucial factor affecting UHI intensity [22]. In the megacity of Jeddah, impervious surface areas had high UHI intensity, while water-covered areas had low UHI intensity, showing that landscape composition and configuration largely determine UHI intensity. According to research by [57] in Shanghai, China, construction land had higher UHI intensity than water bodies and green spaces. The authors in [58] conducted research in China and reported that spatial variations in UHI are largely affected by the spatial complexity of built-up areas.
According to [18], the mean LST of impervious surfaces was 3 °C higher than that of green areas, so the LST correlated significantly with impervious surfaces. This finding was similar to that observed in this study. Studies conducted by [4,14] in Shanghai (China) found that LST was affected by landscape patterns and anthropogenic activities. Our correlation analysis found that impervious surface areas positively influenced UHI intensity, while vegetation and water indices negatively influenced it. These findings are consistent with various previous studies. The author in [57] studied Hangzhou, China, and found that impervious surfaces increase the UHI, while vegetation and water bodies decrease it. Therefore, sustainable landscape planning is necessary to mitigate the UHI intensity in cities, particularly in hot desert areas.
Cities have experienced dramatic increases in LST in the past [22,58]. Therefore, effective actions must be implemented to mitigate UHI effects in such cities. Various measures have been suggested in the previous literature to mitigate UHI phenomena, such as urban greening (e.g., development of green roofs and green facades), building design, and changes in building materials [59,60]. In previous studies, three stakeholder groups have been considered significant in mitigating the UHI effect in cities: government and political actors at the local and national levels, city planners and city administrations, and the public [61][62][63]. Furthermore, the private sector plays a significant role, as many buildings are privately built.

Conclusions
The effect of UHI was modeled and assessed in Jeddah, a megacity with a hot desert climate in Saudi Arabia, via two widely used machine learning approaches: bagging and random subspace. Several notable findings emerged from the study results. First, there was a dramatic upsurge in built-up areas in Jeddah from 2000 to 2021. Over the 22 years, the urban areas and vegetation cover increased by 80% and 32%, respectively. No previous work has been done in this regard to show the effect of UHI. Second, from 2000 to 2021, the LST in Jeddah increased by 10.22% (from 38.47 °C to 42.4 °C). Third, the UHI in the "very high" category increased in Jeddah from 2000 to 2021. The correlation results indicate that impervious surfaces positively affect the UHI, while vegetation and water cover negatively affect it. No such analysis is found in previous work. According to the bagging and random subspace models, more than 80% of the total area was identified as having very high UHI values. The findings of the study are expected to be of great assistance in formulating ML-based methods to mitigate the UHI. The statistical analysis and the complete simulation highlight the importance of the presented research in mapping and evaluating the UHI. The importance of various surface temperature parameters was studied, and the correlation of each parameter with the UHI was investigated. The presented research offers a quantitative and qualitative analysis based on a computationally intensive ML approach.

Conflicts of Interest:
The author declares no conflict of interest.