Multi-zone optimisation of high-rise buildings using artificial intelligence for sustainable metropolises. Part 1: Background, methodology, setup, and machine learning results

Abstract
Designing high-rise buildings is one of the most complex tasks in architecture because it involves interdisciplinary performance aspects in the conceptual phase. The necessity for sustainable high-rise buildings has increased owing to the demand for metropolises based on population growth and urbanisation trends. Although artificial intelligence (AI) techniques support swift decision-making when addressing multiple performance aspects related to sustainable buildings, previous studies only examined single floors because modelling and optimising the entire building requires extensive computational time. However, different floor levels require various design decisions because of the performance variances between the ground and sky levels of high-rises in dense urban districts. This paper presents a multi-zone optimisation (MUZO) methodology to support decision-making for an entire high-rise building considering multiple floor levels and performance aspects. The proposed methodology includes parametric modelling and simulations of high-rise buildings, as well as machine learning and optimisation as AI methods. The specific setup focuses on the quad-grid and diagrid shading devices using two daylight metrics of LEED: spatial daylight autonomy and annual sunlight exposure. The parametric model generated samples to develop surrogate models using an artificial neural network. The results of 40 surrogate models indicated that the machine learning part of the MUZO methodology can report very high prediction accuracies for 31 models and high accuracies for six quad-grid and three diagrid models. The findings indicate that the MUZO can be an important part of designing high-rises in metropolises while predicting multiple performance aspects related to sustainable buildings during the conceptual design phase.


Introduction
High-rise buildings began to emerge at the end of the 19th century to provide extra floor space in limited urban plots (Al-Kodmany and Ali, 2013). In the 20th century, population growth and urbanisation trends increased in the world (Cohen, 2006). According to a United Nations report (UN, 2019), 30% of the world's population lived in urban areas in 1950. This percentage increased to 55% in 2018, and the projection by 2050 was 68%. An increase in population and the percentage of those living in urban areas will add 2.5 billion people to the world's urban population by 2050. Moreover, there were 33 megacities with more than 10 million inhabitants in 2018. The projection indicates that this number will increase to 43 by 2030. Because of population growth and urbanisation trends, the number and height of completed high-rise buildings have also increased over time (CTBUH, 2020).
Owing to a rapid and global increase in floor areas, the final energy use of buildings reached approximately 128 exajoules (EJ) in 2019, up from 118 EJ in 2010 (IEA, 2020). The growing number of high-rise buildings contributes significantly to this energy use because they consume more energy, with an additional effect on CO₂ emissions, than low-rise buildings (Godoy-Shimizu et al., 2018). Another consequence of constructing more and taller high-rise buildings is the increase in building density in urban areas (Lee et al., 2017). To achieve the targets of the International Energy Agency for sustainable development scenarios, architects and engineers should consider the following challenges while designing high-rise buildings for metropolises:
• Dense urban areas cause performance variations between ground and sky levels in high-rise buildings (Samuelson et al., 2016).
• Sustainable buildings require the integration of multiple performance aspects, such as natural daylight, energy consumption, and comfort (Evins, 2013).
During the design process, the conceptual phase requires a high awareness of decisions because it affects the overall performance of the buildings (Sariyildiz, 2012). Owing to the complexity of design problems, optimisation algorithms are widely used to investigate sustainable design alternatives during the conceptual design phase (Asadi and Geem, 2015). Because the performance aspects of sustainable buildings require simulations, the optimisation process entails a significant amount of time. The common approach is to integrate machine learning (ML) techniques to predict performance aspects to support swift decision-making with optimisation algorithms in computationally expensive design problems (Westermann and Evins, 2019). Optimising high-rise buildings in dense urban districts is more challenging because various floor levels require different design decisions owing to performance variations in ground and sky levels. In addition, these decisions are based on simulations, which require expensive computational time, and optimisation processes that need to cope with an enormous number of design parameters. Therefore, new methods are required to optimise the multiple floor levels of high-rise buildings when proposing sustainable alternatives within a limited time.
This paper introduces a novel multi-zone optimisation (MUZO) methodology of optimising high-rises by considering multiple floor levels as different optimisation problems to investigate sustainable alternatives during the conceptual phase. The proposed methodology includes parametric modelling and simulations of high-rise buildings, an artificial neural network (ANN) (an ML technique based on a network of neurones) for performance prediction, and computational optimisation with a decision framework. Part 1 of the MUZO study focuses on solving computationally expensive simulations while presenting the background, methodology, and setup for case studies that contain two types of shading devices: quad-grid and diagrid. The building performance model focuses on the daylight metrics of Leadership in Energy and Environmental Design (LEED) v4.1, namely, spatial daylight autonomy (sDA) and annual sunlight exposure (ASE), for each scenario. The results present the learning scores of 40 surrogate models developed for each performance aspect using advanced ANN techniques. Part 2 of the MUZO study deals with the optimisation challenge while explaining the problem formulations to optimise the sDA and ASE using the 40 predictive models presented in this paper. Considering the near feasibility threshold adaptive penalty function, the optimisation process employs three algorithms, namely, self-adaptive differential evolution with ensemble of mutation strategies using the Optimus plug-in (Cubukcuoglu et al., 2019), covariance matrix adaptation with evolution strategy, and radial basis function optimisation using the Opossum plug-in (Wortmann, 2017). After validating the method by comparing the MUZO results with the regular high-rise scenarios, the paper discusses the advantages and disadvantages and underlines the potential and future research directions. 
In this paper, Section 2 presents the state of the art, Section 3 introduces the MUZO methodology, Section 4 explains the setup, Section 5 reports the ANN results, and Section 6 concludes the paper.

State of the art for AI in the design of sustainable high-rises
This section presents previous studies focusing on performance aspects related to sustainable high-rise buildings in three subsections: ML, computational optimisation, and ML with optimisation applications. Subsequently, the original contribution of the MUZO methodology is summarised.

Machine learning applications
Over the last two decades, ML techniques have been used to address the computational burden of simulations of high-rise buildings. An early study discussed regression models to predict energy performance (Lam et al., 1997). A decade later, Ko et al. (2008) focused on the daylight factor as part of LEED v2.2 for building shape, layout, and façade parameters. In the following years, Li and Li (2015) examined the annual ventilation rate, in addition to energy performance. Tian et al. (2020) developed models for energy-efficient heating design in office buildings considering conventional modelling processes and an innovative two-step method. Recently, researchers began to use sensitivity analyses with ML techniques to decrease the design complexity (Chen et al., 2017a; Chen et al., 2017b). Since the early years, various aspects have been used to predict the performance of high-rises. However, none of these studies focused on predicting the performance of an entire building. The general approach focused on a single floor level (or part) of the high-rise model.

Computational optimisation applications
High-rise buildings are among the most complex design tasks in architecture because various decisions are required for the shape, layout, and façade parameters considering multiple performance aspects. Therefore, different methods have been examined to address the complexity of these buildings. Considering resource production systems, Imam and Kolarevic (2016) proposed a concept to optimise energy, food, water, and land in high-rises. In addition to producing energy, two studies focused on energy performance (Chen et al., 2019a; Chen et al., 2019c), one study examined the energy demand with adaptive thermal comfort (Giouri et al., 2020), and another considered techno-economic aspects. In addition to façade parameters, Gan et al. (2019) investigated the geometric, position, and functional attributes to optimise energy efficiency. Despite the promising results and design alternatives, only two studies (Chen et al., 2019c; Li and Li, 2015) considered the surroundings of the plots being studied, only one study compared various optimisation algorithms (Chen et al., 2019a), and none of them replicated the heuristic optimisation process. Consequently, the general approach, as in ML applications, focuses on a single floor level (or part) of the high-rise model.

Machine learning and computational optimisation applications
In the high-rise domain, early examples of predictive models focused on evaluating the design performances. Some of the recent studies considered predictive models with optimisation algorithms because of the potential to determine optimal solutions in a short time. Early examples used regression models, support vector machines (SVMs), and multi-objective optimisation Yang, 2017, 2018). In addition, Chen et al. (2019b) conducted a sensitivity analysis to decrease the design complexity. Despite the fast evaluation potential of using ML with optimisation, the aforementioned studies considered specific floor levels of high-rise models.

Original contribution of the research
The MUZO methodology is proposed to optimise the entire shape of a high-rise building to investigate sustainable design alternatives while addressing the computational burden. Because dense urban areas result in performance variations between the ground and sky levels, a unique optimisation strategy is required. Therefore, the MUZO methodology suggests dividing the high-rise building into equal subdivisions (or zones), which can be considered as different design problems. In addition, this paper suggests an advanced model selection to provide high prediction accuracies, as well as a decision framework by comparing the algorithms and replicating the optimisation process. Thus, the MUZO methodology aims to determine the optimal design solution by achieving sustainable high-rise alternatives for dense urban districts.

Multi-zone optimisation methodology
Previous studies demonstrated that the optimisation of high-rise buildings can focus on multiple performance aspects that may require various digital platforms. Considering the flexibility of integrating different software, Fig. 1 shows the phases of the MUZO methodology. The parametric high-rise model focuses on generating design alternatives with performance evaluations in phase 1. ML for surrogate models addresses the computational burden of multiple performance aspects related to sustainable buildings in phase 2. Finally, the computational optimisation and decision-making phase investigates the desirable performance for the entire high-rise building.

Parametric high-rise model
The first phase of the methodology considers the parametric model, which involves generating configurations of the high-rise building using design variables. Preparing the model requires three steps: developing the parametric high-rise model to generate design alternatives, identifying zones according to the surroundings of the plot being studied, and integrating performance aspects.
Step 1 (Generating high-rise alternatives): The development of the parametric high-rise model begins with creating the context around the plot area, because surroundings with different densities may require various design strategies and parameters in the conceptual phase. When the built environment is modelled, a parametric high-rise model that involves decision variables related to the building shape, façade design, layout, and operation is generated. A few tools are available for use in this step, e.g. Generative Components (Aish, 2003; Bentley, 2003), Dynamo (Dynamo, 2011; Keough, 2011), and Grasshopper 3D (GH) (Rutten, 2015). The MUZO methodology can include all of these parameter types and available tools during form generation.
Step 2 (Identifying zones): As mentioned previously, dense urban districts result in performance variances between the ground and sky levels. Therefore, the second step of parametric modelling identifies the zones, which involves subdividing the entire building into smaller pieces to treat various floor levels as different optimisation problems. The number of zones is a predefined variable that depends on the density of the plot under study. For instance, in low-density urban areas, high-rise buildings can be divided into three zones, whereas this number may increase to five in mid-density scenarios. For high-density scenarios, more than five zones can be used for an extensive investigation of the effects of the surroundings at various levels. After determining the number of zones, the next step is to identify floor levels in each zone, because performance aspects, such as daylight and solar radiation, require floor surfaces for the simulation to be conducted. While a large number of selected floor levels requires an extensive simulation time, fewer selected floor levels may result in decision-making with limited awareness of the entire building's performance. Fig. 2 shows zoning scenarios for low-density, mid-density, and high-density urban areas and different selections of floor levels.
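The zoning step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes zones of equal height, and the function name `identify_zones` and its arguments are hypothetical. The example call uses a 60-floor building with ten zones and the second and fifth floor of each zone selected for simulation.

```python
def identify_zones(n_floors, n_zones, picks):
    """Split floors 1..n_floors into equal zones and pick floors in each zone.

    picks holds 1-based positions inside a zone, e.g. (2, 5) selects the
    second and fifth floor of every zone. Equal zone sizes are an assumption.
    """
    if n_floors % n_zones != 0:
        raise ValueError("n_floors must be divisible by n_zones")
    size = n_floors // n_zones
    if any(p < 1 or p > size for p in picks):
        raise ValueError("every pick must lie inside one zone")
    zones = {}
    for z in range(n_zones):
        first = z * size + 1  # lowest floor of this zone
        zones[f"Z{z + 1}"] = {
            "floors": list(range(first, first + size)),
            "simulated": [first + p - 1 for p in picks],
        }
    return zones


zones = identify_zones(n_floors=60, n_zones=10, picks=(2, 5))
print(zones["Z1"]["simulated"], zones["Z10"]["simulated"])
```

Selecting more floors per zone only means lengthening `picks`, which makes the trade-off between simulation time and awareness of the whole building explicit.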
Step 3 (Integrating performance aspects): The final step of the first phase in MUZO methodology involves evaluating the high-rise model using the performance aspects of sustainable buildings. The state of the art considers a limited number of performance criteria because of two reasons. First, considering multiple aspects requires extensive computational time for simulation-based evaluation. Second, the complexity of the design task increases owing to multiple performance aspects. In addition, conflicting performances introduce an additional challenge during the conceptual phase (Kirimtat et al., 2016). The proposed MUZO methodology can integrate any performance criteria to determine sustainable high-rise alternatives. Challenges on computational burden and complexity are addressed in the subsequent phases.

Machine learning for surrogate models
When the parametric model and simulations are set, the simulation results of various design alternatives provide awareness of the performance in different design scenarios. ANN models, which can swiftly evaluate the building performance, are used in the second phase of MUZO, which requires three steps:
Step 1 (Collection of samples): Sampling, which is the first step of ML in MUZO, is an essential process of surrogate modelling. When the data follow a suitable distribution, ML algorithms can learn and predict with high accuracy. Recently, Westermann and Evins (2019) presented two types of sampling: static sampling (e.g. Latin hypercube sampling (Loh, 1996)) and adaptive sampling (e.g. sequential space-filling (Crombecq and Dhaene, 2010)). The selection of the sampling method depends on the category of surrogate models, which can be either global or local. All sampling methods using a global modelling approach can be used in the MUZO methodology. On the effect of the sample size, Chatzikonstantinou and Sariyildiz (2016) discussed that the extension of the dataset is frequently beneficial. In addition, Roman et al. (2020) presented the most commonly used sampling methods in building-performance simulations. Among these sampling methods, one of the most common approaches is given in Eq. (1), where $n_s$ is the sample size and $n_i$ is the number of independent variables. Because each subdivided part corresponds to a different optimisation model, the MUZO methodology proposes a unique sample collection framework (Fig. 3). After subdividing the high-rise building into zones, each subdivision is used to generate its own samples. When the process is complete, each generated sampling file, which belongs to one zone, can be used in different surrogate models.
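The per-zone sample collection can be illustrated with a dependency-free Latin hypercube sampler. The sketch below is an assumption-laden example, not the sampling code used in the study: the function `latin_hypercube`, the variable bounds, and the zone names are hypothetical, and the sample size is kept small for readability.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points; each variable's range is cut into n_samples
    equal strata and every stratum is used exactly once per variable."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across variables
        width = (high - low) / n_samples
        columns.append([low + (s + rng.random()) * width for s in strata])
    # Transpose the per-variable columns into one row per sample.
    return [list(row) for row in zip(*columns)]


# Hypothetical bounds for three design variables; one sample file per zone.
zone_bounds = {
    "Z1": [(0.0, 1.0), (0.0, 90.0), (3.0, 5.0)],
    "Z2": [(0.0, 1.0), (0.0, 90.0), (3.0, 5.0)],
}
samples = {
    zone: latin_hypercube(10, bounds, seed=index)
    for index, (zone, bounds) in enumerate(zone_bounds.items())
}
```

Keeping one sample set per zone mirrors the idea that each zone feeds its own surrogate model.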
Step 2 (Developing ANN models): ANNs, which correspond to the second step of the ML phase, are widely used in ML domains to predict various aspects of building performance. This is because ANNs can manage large sample sizes for many variables and predict the performance with high accuracy (Westermann and Evins, 2019). Various ANN types, such as feedforward neural networks (FNNs) and radial basis function neural networks (RBFNNs), have been used to estimate building performance (Roman et al., 2020). In this paper, the development of ANN models consists of two stages:
Stage 1 (Neural net with dropout): The development of ANNs begins with reading and scaling the data, which frequently contain parameters with different units and metrics. After the reading process, scaling is performed to bring all inputs and outputs within the same boundaries using one of several scaling methods (Grus, 2019). For min-max scaling, the data are normalised as in Eq. (2), where $x'$ is the scaled value, $x$ is the original value, and $\sigma$ is its standard deviation. Before selecting the scaling method, the problem type must be identified, which can be either classification or regression. While classification problems focus on predicting a class label, regression problems consider predicting a quantity. In addition, splitting the data is crucial for identifying training and test sets using a rate, e.g. 0.2 or 0.25, according to Westermann and Evins (2019). When the ANN model is finalised, the architecture contains various layers (Fig. 4).
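A minimal sketch of the data preparation follows, assuming the textbook form of min-max scaling (a linear map of each column onto [0, 1]) and a random hold-out split. The helper names and the sample values are hypothetical and only illustrate the two operations.

```python
import random


def min_max_scale(column):
    """Map a list of numbers linearly onto [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(value - low) / (high - low) for value in column]


def split_rows(rows, test_rate=0.2, seed=0):
    """Shuffle the rows and hold out a test_rate share as the test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_rate)
    return shuffled[n_test:], shuffled[:n_test]


lengths = [0.2, 0.5, 1.1, 0.8, 1.4]  # hypothetical shading lengths (m)
scaled = min_max_scale(lengths)
train_rows, test_rows = split_rows(list(zip(scaled, lengths)), test_rate=0.2)
```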
Each neurone in the hidden layers receives the weighted sum of inputs and passes the result through an activation function. The output is

$$\mathrm{output} = f\left(b + \sum_{i} w_{ij}\, x_i\right) \qquad (3)$$

where $f$ is the activation function, $b$ is the bias, $w_{ij}$ is the $i$th layer of the $j$th weight, and $x_i$ is the input vector of the $i$th layer. When using rectified linear units (ReLU), each neurone is activated as follows:

$$f(x) = \max(0, x) \qquad (4)$$

Different functions, e.g. sigmoid, softplus, and tanh, activate the neurones with various equations that may affect learning performance. The forward process of ANNs can predict the solution using Eq. (3). To achieve high accuracy, a backward process is necessary to determine the best values for the weights and biases. Hence, backpropagation (Hecht-Nielsen, 1992) involves a loss function and an optimisation algorithm. Researchers have widely used gradient descent (GD) (Ruder, 2016), stochastic gradient descent (SGD) (Bottou, 2010), Adam (Kingma and Ba, 2014), and RMSProp (Mukkamala and Hein, 2017) algorithms for optimisation. For the loss functions of classification problems, cross-entropy (De Boer et al., 2005) and Kullback-Leibler divergence (Kullback and Leibler, 1951) can be considered. In regression problems, researchers use the mean squared error (MSE) in Eq. (5) (Mood, 1950), mean absolute error (MAE) in Eq. (6) (Willmott and Matsuura, 2005), and R-squared ($R^2$) value in Eq. (7) (Draper and Smith, 1998):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2 \qquad (5)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - y_i\right| \qquad (6)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(x_i - y_i\right)^2}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \qquad (7)$$

where $y_i$ is the predicted data, $x_i$ is the observed data, $\bar{x}$ is the mean of the observed data, and $n$ is the sample size. Various loss functions can be used to validate the accuracy of the trained model. In addition, the dropout technique (Srivastava et al., 2014), which randomly drops units from the neural network with their connections, avoids overfitting. The MUZO methodology can involve multiple loss functions and dropouts at a rate between 0 and 1.
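The three regression metrics named above can be checked on a small worked example. The sketch uses the standard definitions of MSE, MAE, and R-squared with the paper's naming (observed data versus predicted data); the four data points are hypothetical.

```python
def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((x - y) ** 2 for x, y in zip(observed, predicted)) / len(observed)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(x - y) for x, y in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual / total sum of squares."""
    mean = sum(observed) / len(observed)
    residual = sum((x - y) ** 2 for x, y in zip(observed, predicted))
    total = sum((x - mean) ** 2 for x in observed)
    return 1.0 - residual / total


observed = [0.2, 0.4, 0.6, 0.8]       # hypothetical scaled simulation results
predicted = [0.25, 0.35, 0.65, 0.75]  # hypothetical surrogate predictions
scores = {
    "MSE": mse(observed, predicted),
    "MAE": mae(observed, predicted),
    "R2": r_squared(observed, predicted),
}
```

Every prediction here is off by 0.05, so the MAE is 0.05, the MSE is 0.0025, and the R-squared value is 1 - 0.01/0.2 = 0.95.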

Stage 2 (Grid search with k-fold cross-validation (CV)): Developing a surrogate model is a black-box process. One of the reasons is that multiple hyperparameters, which are the parameters of the ANN (e.g. neurone size and batch size), are involved in the learning process. Various combinations of these factors affect learning and prediction accuracies. Therefore, a model validation technique is required to evaluate the accuracy of predictions. K-fold CV (Stone, 1974) is a well-known method for accurate estimations that randomly divides the original sample into k equal-sized subsamples. While one subsample is maintained as the test set, the remaining $k - 1$ subsamples are the training sets. The standard deviation (Std) in Eq. (8) indicates the spread of each error across the folds of the k-fold CV, where $N$ is the number of observations, $\{x_1, \ldots, x_N\}$ are the observed values, and $\bar{x}$ is the mean value of these observations. The aim is to determine satisfactory results for the MAE, MSE, and $R^2$, and achieve small Std values for each accuracy metric.
Step 3 (Selecting the best model): The final step of the ML phase involves the selection of the best model using the results of the grid search. In each zone, the selection criteria are the highest $R^2$ together with low MAE, MSE, and Std. Using the weights and biases of the final ANN, the predictive models are ready for use in the optimisation process.
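Grid search, k-fold CV, and model selection can be sketched together without external libraries. To keep the example short and runnable, a nearest-neighbour regressor stands in for the ANN, the Std is computed as the population standard deviation, and only the MAE is scored; all three choices, the function names, and the toy dataset are assumptions made for illustration.

```python
import itertools
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Randomly partition range(n) into k folds of (near) equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train, x, n_neighbours):
    """Stand-in model: mean target of the nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbours]
    return sum(y for _, y in nearest) / len(nearest)


def cross_validate(data, k, **params):
    """Return the mean and Std of the MAE over the k folds."""
    fold_scores = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        errors = [abs(data[i][1] - knn_predict(train, data[i][0], **params))
                  for i in fold]
        fold_scores.append(sum(errors) / len(errors))
    return statistics.mean(fold_scores), statistics.pstdev(fold_scores)


def grid_search(data, grid, k=10):
    """Evaluate every hyperparameter combination with k-fold CV."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        mean_mae, std_mae = cross_validate(data, k, **params)
        results.append({"params": params, "mae": mean_mae, "std": std_mae})
    return results


# Hypothetical one-dimensional dataset: y = x^2 on a regular grid.
data = [(i / 50, (i / 50) ** 2) for i in range(50)]
results = grid_search(data, {"n_neighbours": [1, 3, 5]}, k=10)
# Selection: lowest mean error first, smallest spread as tie-breaker.
best = min(results, key=lambda r: (r["mae"], r["std"]))
```

With an ANN in place of the stand-in model, the same loop would also record MSE and R-squared per fold, and the selection key would combine those scores.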

Computational optimisation and decision-making
The final phase of the MUZO methodology, which consists of three steps, involves determining the design parameters for sustainable high-rise alternatives. The first step considers the development of predictive models using the ML outputs. The second step is selecting the problem formulation. Finally, the proposed decision-making framework reveals the optimised design solution by completing the MUZO methodology.
Step 1 (Defining predictive models): The development of predictive models requires weight and bias results collected from each ANN model. Subsequently, the collected results are transformed into matrices considering the input vector and neurone sizes for each layer to initiate the first step of the optimisation phase. The definition of activation with $n$ layers is as follows:

$$y = f_n\left(w_n\, f_{n-1}\left(\cdots f_1\left(w_1 x + b_1\right) \cdots\right) + b_n\right)$$

where $y$ is the performance criterion to be predicted, $x$ is the input vector, $w_n$ is the $n$th weight, $b_n$ is the $n$th bias, and $f_n$ is the $n$th activation function. For any given $x$, the model estimates the performance results. Having weights and biases as recorded data suggests the possibility of using predictive models in various platforms, such as C#, C++, Python, and GH, during the optimisation process.
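Because the predictive model only needs the recorded weights and biases, it can be re-implemented in a few lines on any platform. The sketch below shows one way to do this in plain Python; the layer layout (one weight row per neurone), the function names, and the numeric values are hypothetical and do not come from the trained models.

```python
def relu(vector):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, value) for value in vector]


def identity(vector):
    """Linear output activation for regression."""
    return list(vector)


def dense(weights, bias, inputs):
    """One layer: weights holds one row per neurone of the layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def predict(layers, inputs):
    """Apply the layers in order: activation(weights @ inputs + bias)."""
    for weights, bias, activation in layers:
        inputs = activation(dense(weights, bias, inputs))
    return inputs


# Hypothetical two-layer network: two inputs, two hidden neurones, one output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[2.0, 1.0]], [0.1], identity),
]
y = predict(layers, [3.0, 1.0])
```

For the input (3, 1) the hidden layer yields (2, 1), so the output is 2·2 + 1·1 + 0.1 = 5.1.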
Step 2 (Selecting formulation): When the predictive models are ready, the next step is selecting the problem formulation for the optimisation process. Previous studies on building optimisation used single-objective, weighted summation, multi-objective, many-objective, and constrained optimisation problems (Ekici et al., 2019a). For $n$ parameters, the definition of the generalised problem formulation is

$$\begin{aligned} \text{minimise} \quad & F(x) = \left(f_1(x), \ldots, f_k(x)\right), \quad x = (x_1, \ldots, x_n) \in S \\ \text{subject to} \quad & g_j(x) \le 0, \quad j = 1, \ldots, p \\ & h_j(x) = 0, \quad j = p + 1, \ldots, m \end{aligned}$$

where an integer $k > 0$ is the number of objective functions, $S$ is the entire search space, $p$ is the number of inequality constraints and $m - p$ is the number of equality constraints. For the maximisation problem, the transformation of the function can be

$$\max f(x) = -\min\left(-f(x)\right)$$

The trade-off between building performance affects the selection of the formulation. For one aspect, e.g. maximising daylight (Mangkuto et al., 2018), the single-objective formulation is convenient for the optimisation process. Another scenario may have two conflicting objectives, such as maximising the sDA and minimising the ASE. If one of these aspects requires a threshold according to the building standards, the formulation can be a single-objective constrained optimisation (Vera et al., 2017). Otherwise, the multi-objective (Yi, 2019) or weighted summation (Wagdy et al., 2015) approaches can be alternatives. For more than three objectives, the options are multi-objective constrained or many-objective formulations (Pilechiha et al., 2020).
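As an illustration of the single-objective constrained case, the sketch below maximises a stand-in sDA function subject to an upper limit on a stand-in ASE function. It assumes a simple static penalty rather than any specific penalty scheme, and the linear `sda` and `ase` functions, the limit `ASE_LIMIT`, and the penalty weight are hypothetical placeholders for surrogate models and a standard-based threshold.

```python
def as_minimisation(objective):
    """Turn a maximisation objective into an equivalent minimisation one."""
    return lambda x: -objective(x)


def penalised(objective, inequality_constraints, weight=1000.0):
    """Add a static penalty for every violated constraint g(x) <= 0."""
    def wrapped(x):
        violation = sum(max(0.0, g(x)) for g in inequality_constraints)
        return objective(x) + weight * violation
    return wrapped


def sda(x):
    """Hypothetical surrogate: daylight autonomy (%) grows with x[0]."""
    return 40.0 + 50.0 * x[0]


def ase(x):
    """Hypothetical surrogate: sunlight exposure (%) also grows with x[0]."""
    return 5.0 + 30.0 * x[0]


ASE_LIMIT = 20.0  # hypothetical threshold, not taken from a standard
fitness = penalised(as_minimisation(sda), [lambda x: ase(x) - ASE_LIMIT])

candidates = [[i / 10] for i in range(11)]  # coarse scan of one variable
best = min(candidates, key=fitness)
```

The scan settles on the largest value of the variable that still satisfies the limit, which is the expected behaviour when the two metrics conflict.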
Step 3 (Optimisation): The final step of phase 3 involves exploring the optimal alternative for each zone. In the optimisation domain, heuristic algorithms are employed to solve complex problems by mimicking behavioural patterns and social phenomena observed in nature (Del Ser et al., 2019). In the domain of sustainable building design, heuristics are also widely used because they discover promising alternatives in a reasonable time frame (Evins, 2013). Despite their advantages, these algorithms do not guarantee an optimal solution. According to the no free lunch (NFL) theorem (Wolpert and Macready, 1997), a global algorithm that can determine the optimal result for all problems does not exist. In architectural design, the subject is more dynamic than the benchmark problems. Each design scenario is a specific problem owing to the variances in the surroundings. In addition, different cities have diverse climate types (e.g. Mediterranean climate in Izmir, oceanic climate in Amsterdam). Therefore, architects can propose various alternatives for the same design problem (i.e. high-rise buildings) because the concerns and the required strategies are different. Thus, we may conclude that "the global optimal of each design problem is unexplored". Therefore, the optimisation process of the MUZO methodology involves comparing various algorithms with replications for decision-making (Fig. 5). Single-objective optimisation algorithms report the best solution, which can be used as the final design alternative. In multi-objective or many-objective optimisation problems, various post-optimisation analysis methods can be used to evaluate the quality of the single best solution during the decision-making process (Si et al., 2019), e.g. the weighted summation approach (Cevizci et al., 2019), TOPSIS (Kim et al., 2013), the analytic hierarchy process (Goussous and Al-Refaie, 2014), the minimum distance to the utopic point (Riquelme et al., 2015), and auto-associative models (Chatzikonstantinou and Sariyildiz, 2017).
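The idea of comparing algorithms over replicated runs can be sketched as follows. Random search and a basic hill climber are used here purely as stand-ins for heuristic optimisers, and the `sphere` function replaces a surrogate-based fitness; all names, budgets, and the number of replications are assumptions for illustration.

```python
import random
import statistics


def random_search(f, dim, evals, rng):
    """Sample evals random points in [0, 1]^dim and keep the best one."""
    points = ([rng.random() for _ in range(dim)] for _ in range(evals))
    best = min(points, key=f)
    return best, f(best)


def hill_climb(f, dim, evals, rng, step=0.1):
    """Accept a random neighbour only when it improves the fitness."""
    x = [rng.random() for _ in range(dim)]
    fx = f(x)
    for _ in range(evals - 1):
        cand = [min(1.0, max(0.0, v + rng.uniform(-step, step))) for v in x]
        fc = f(cand)
        if fc < fx:
            x, fx = cand, fc
    return x, fx


def compare(algorithms, f, dim, evals=200, replications=5):
    """Run every algorithm with several seeds and summarise the results."""
    report = {}
    for name, algorithm in algorithms.items():
        runs = [algorithm(f, dim, evals, random.Random(seed))
                for seed in range(replications)]
        values = [fx for _, fx in runs]
        report[name] = {
            "best": min(runs, key=lambda run: run[1]),
            "mean": statistics.mean(values),
            "std": statistics.pstdev(values),
        }
    return report


def sphere(x):
    """Hypothetical stand-in for a surrogate-based fitness function."""
    return sum((v - 0.5) ** 2 for v in x)


report = compare({"random": random_search, "hill": hill_climb}, sphere, dim=3)
winner = min(report, key=lambda name: report[name]["best"][1])
```

Reporting the mean and spread next to the best run shows whether an algorithm found a good design reliably or only in one lucky replication.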

Setup of the case study
This section explains the setup for evaluating the MUZO methodology considering a hypothetical dense urban district. The first subsection describes a parametric high-rise building with variables for the two façade types. The subsequent subsection presents the selected performance aspects of the simulation setup. Finally, surrogate modelling introduces the details of the sample collection and the development of ANN models.

Parametric high-rise model and the built environment
The hypothetical district had 25 plots in GH, each with a 2500 m² footprint and randomly generated building heights between 50 and 150 m. The focus of the study was the central plot with 60 floors, 2100 m² net one-floor area, 150,000 m² gross floor area, and 50 × 50 m façade length. Fig. 6 shows the subdivisions (zones) of the building beginning from ground-level zone 1 (Z1) to sky-level zone 10 (Z10), as well as selected floor levels (second and fifth) of every zone for simulations. Table 1 presents the façade, shape, and glazing parameters. The first façade design focused on horizontal and vertical shading devices using the number, length, and rotation of the devices with four glazing types. The second design considered diagonal shading devices involving the number, length, and rotation of first and second-order diagonals with the same glazing types. The design setups in Figs. 7 and 8 were used for each orientation, i.e. north (N), south (S), east (E), and west (W). Including floor-to-floor height and rotation parameters, the search space for the quad-grid scenario in one zone had $2.893399115 \times 10^{28}$ design alternatives with 26 parameters, whereas this number was $3.054543465 \times 10^{23}$ for the diagrid scenario with 22 parameters. Floor-to-floor height and rotation parameters of the lower zones affected the height and rotation of the higher zones. Therefore, the total number of design parameters in one zone increased from Z1 to Z10 (Fig. 9). Consequently, the quad-grid design had 26 parameters in Z1 and 44 variables in Z10, while the diagrid design had 22 parameters in Z1 and 40 variables in Z10.
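The growth in parameter counts from Z1 to Z10 can be reproduced with simple arithmetic. The sketch assumes that every zone inherits two extra inputs (floor-to-floor height and rotation) from each zone below it; this increment is inferred from the reported end values (26 to 44 and 22 to 40 over ten zones), so the intermediate counts are an assumption rather than values read from Fig. 9.

```python
def parameters_in_zone(base, zone, inherited_per_zone=2):
    """Own parameters of a zone plus the inputs inherited from lower zones."""
    return base + inherited_per_zone * (zone - 1)


quad_grid = [parameters_in_zone(26, zone) for zone in range(1, 11)]
diagrid = [parameters_in_zone(22, zone) for zone in range(1, 11)]
print(quad_grid)
print(diagrid)
```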

Performance metrics and simulation setup
We investigated two of the LEED v4.1 metrics for the case buildings, namely, the sDA and ASE, introduced for the green building certification program (USGBC, 2014). Both metrics are commonly used for various building functions to achieve sustainable design solutions (Bauer et al., 2017; Korsavi et al., 2016; Nezamdoost et al., 2018; Nezamdoost and Van Den Wymelenberg, 2017; Sherif et al., 2016; Wagdy et al., 2017). Recently, the Illuminating Engineering Society (IES) presented definitions for the sDA and ASE metrics (IES, 2013). The sDA evaluates the annual efficiency of ambient daylight levels in interior spaces. The calculation method results in the percentage of an analysis area with a minimum daylight illuminance level for specific hours. In contrast, the ASE indicates the potential visual discomfort in interior work environments. The method results in the percentage of the analysis area in which direct sunlight exceeds a defined illuminance for more than the specified number of hours.
The simulation setup focused on the second and fifth floors of each zone (Fig. 10). The parametric model used the Diva plug-in v4.0.3.1 (Jakubiec and Reinhart, 2011) developed for GH to simulate the sDA and ASE metrics with an EnergyPlus weather data file for Izmir, which has a dry-summer Mediterranean climate (latitude 38.423733, longitude 27.142826). Each zone was simulated using two analysis planes with 180 sensor points each, placed 0.8 m above the finished floor. Four glazing types, listed in Table 1, were separately used as decision variables for each orientation. As suggested by IES (2013), the setup simulated $\mathrm{sDA}_{300/50\%}$ and $\mathrm{ASE}_{1000,250\mathrm{h}}$ for 10 h of occupation between 8 am and 6 pm. A single simulation task of one zone involved 360 sensors. For the radiance parameters listed in Table 2, the simulation process used values similar to those in previous studies because of the high computational cost. The setup was used to simulate the daylight performance of 7200 sensor points for the overall building evaluation of the two scenarios.
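The two metrics reduce to counting sensors once the hourly illuminance per sensor is known. The sketch below is a simplified reading of the definitions, not the Diva implementation: it takes precomputed hourly values as nested lists, treats each sensor as an equal share of the analysis area, and uses toy data with a shortened hour limit so the result can be checked by hand.

```python
def sda(illuminance, threshold=300.0, time_fraction=0.5):
    """Share (%) of sensors reaching the threshold (lux) for at least
    time_fraction of the occupied hours."""
    passed = 0
    for hours in illuminance:
        lit_hours = sum(1 for lux in hours if lux >= threshold)
        if lit_hours >= time_fraction * len(hours):
            passed += 1
    return 100.0 * passed / len(illuminance)


def ase(direct, threshold=1000.0, max_hours=250):
    """Share (%) of sensors whose direct sunlight exceeds the threshold
    (lux) for more than max_hours hours."""
    exceeded = sum(
        1 for hours in direct
        if sum(1 for lux in hours if lux > threshold) > max_hours
    )
    return 100.0 * exceeded / len(direct)


# Toy data: four sensors, ten occupied hours each (values in lux).
ambient = [[400] * 10, [400] * 5 + [100] * 5, [400] * 4 + [100] * 6, [100] * 10]
direct = [[1500] * 3 + [0] * 7, [1500] * 2 + [0] * 8, [0] * 10, [0] * 10]

sda_value = sda(ambient)               # two of four sensors pass
ase_value = ase(direct, max_hours=2)   # toy hour limit instead of 250
sensors_total = 180 * 2 * 10 * 2       # planes x zones x scenarios
```

The last line reproduces the reported sensor count: 180 points on each of two planes give 360 per zone, and ten zones in two façade scenarios give 7200.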

Surrogate modelling
The surrogate modelling began with sample collection, which considered 1000 samples for each zone using Latin hypercube sampling (Loh, 1996) and Eq. (1). One simulation required 4 min for two floors with the radiance parameters provided in Table 2. On a computer with an Intel i7 quad-core processor at 2.7 GHz and 16 GB of DDR3 memory, collecting the 20,000 samples required more than 55 days. In the next step, Python 3 (Van Rossum, 2009) was used with the additional libraries listed in Table 3 to develop ANN models with FNNs. After scaling the data for each zone with min-max scaling in Eq. (2), the SGD algorithm optimised weights and biases using Eq. (3) for all models considering 10-fold CV, three hidden layers, a dropout rate of 0.1, and the ReLU activation function in Eq. (4). The automated Python program fit the model 324 times for all hyperparameter combinations for every zone. In total, the program ran 6480 different ANN models with various complexities.
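The reported computational burden can be verified with a short calculation. The hyperparameter grid below is an assumption: its values are taken from the best settings listed in the grid search results, only two neurone counts are reported there, and the third value (300) is a hypothetical placeholder. A grid of this shape yields the 324 combinations mentioned above.

```python
import itertools

samples_per_zone, zones, scenarios = 1000, 10, 2
minutes_per_simulation = 4

total_samples = samples_per_zone * zones * scenarios
sampling_days = total_samples * minutes_per_simulation / (60 * 24)

grid = {
    "neurones": [100, 200, 300],          # 300 is a hypothetical third value
    "momentum": [0.3, 0.6, 0.9],
    "learning_rate": [0.01, 0.05, 0.1],
    "epochs": [250, 500, 750, 1000],
    "batch_size": [10, 50, 100],
}
combinations = list(itertools.product(*grid.values()))
total_fits = len(combinations) * zones * scenarios

print(total_samples, round(sampling_days, 1), len(combinations), total_fits)
```

The 20,000 simulations at 4 min each amount to about 55.6 days, and 324 combinations for ten zones in two scenarios give 6480 fitted models.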

Results
This section presents the sampling, grid search with CV, and tuned ANN results. The supplementary material provides statistics of collected samples, selected ANN models with learning scores, weights, biases, and computation time spent on the model selection for each zone.

Sampling results
The collected samples, which were published as an open-access dataset (Ekici et al., 2020), contain the simulation results for the quad-grid and diagrid. Each zone had ASE and sDA results indicated as ASE_1 and sDA_1 on the second floor and ASE_2 and sDA_2 on the fifth floor. ASE_avg and sDA_avg, which represent the average values of these floors (Fig. 11), were used to develop the surrogate models.
The sDA results of the quad-grid application ranged from 41.9% to 100%, whereas the ASE results ranged from 9.4% to 50.7%. In the diagrid scenario, the results ranged from 33.1% to 93.75% for sDA and from 16.75% to 46.2% for ASE. In terms of mean values, natural daylight availability increased from Z1 to Z10 in both scenarios; however, this was accompanied by an increase in the ASE results. In addition, the mean sDA results for the quad-grid were higher than those for the diagrid.

Grid search with cross-validation results
ANN models were trained using the developed Python program, considering grid search and 10-fold CV on the collected dataset. The average results and deviations of MSE, MAE, and R² in Eqs. (5), (6), (7) and (8) were recorded for each parameter combination during the search process. The best hyperparameters with their statistical results are shown in Fig. 12.
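As an illustration of the 10-fold CV bookkeeping, the sketch below splits 1000 sample indices into ten folds and reports the mean and standard deviation of a per-fold score. It uses only the Python standard library. The helper names are hypothetical, and a trivial mean predictor stands in for the ANN so that the example stays self-contained.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def cross_validate(n_samples, k, fit_and_score):
    """Return the mean and standard deviation of the k validation scores."""
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return statistics.mean(scores), statistics.pstdev(scores)


# Toy target already scaled to [0, 1]; a real run would use simulated sDA or ASE.
targets = [i / 999 for i in range(1000)]


def mean_predictor_mae(train, validation):
    """Placeholder model: predict the training mean, score by validation MAE."""
    prediction = statistics.mean(targets[i] for i in train)
    return statistics.mean(abs(targets[i] - prediction) for i in validation)


splits = list(k_fold_indices(len(targets), k=10))
mae_mean, mae_std = cross_validate(len(targets), 10, mean_predictor_mae)
```

A small standard deviation across folds, as reported for the selected models, indicates that the score does not depend strongly on which 100 samples are held out.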
The results indicated that 37 of the 40 ANN models achieved their best accuracy with 200 neurones; in the remaining three models, the number of neurones was 100. For the momentum parameter, 33 models had the best score with 0.9, five with 0.6, and two with 0.3. Additionally, 25 models had the highest score with a learning rate of 0.1, twelve with 0.05, and three with 0.01. For epochs, eighteen models had the highest accuracy with 500, fourteen with 750, six with 1000, and two with 250. Finally, the best batch size was 50 in twenty-one models, 10 in ten models, and 100 in nine models. The deviations of MAE, MSE, and R² indicated that all CV folds had similar results for all metrics with high accuracies. The R² values of 33 models were higher than 0.9, whereas those of the other seven models were between 0.8 and 0.9. All MAE, MSE, and Std results were less than 0.05. Consequently, the grid search results indicated promising accuracies for developing predictive models with the selected hyperparameters in the next step.
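The search space implied by these results can be enumerated as a quick consistency check. In the hedged Python sketch below, the momentum, learning-rate, epoch, and batch-size values are those reported in this section, whereas the third neurone count is an assumption: only 100 and 200 appear among the best models, and a third, unreported value is inferred here solely because 3 × 3 × 3 × 4 × 3 = 324 matches the stated number of fits per zone. The dictionary keys and the UNREPORTED placeholder are hypothetical.

```python
from itertools import product

# Placeholder for an assumed third neurone count that the text does not report.
UNREPORTED = "unreported"

grid = {
    "neurones": [UNREPORTED, 100, 200],
    "momentum": [0.3, 0.6, 0.9],
    "learning_rate": [0.01, 0.05, 0.1],
    "epochs": [250, 500, 750, 1000],
    "batch_size": [10, 50, 100],
}

# Every hyperparameter combination visited by an exhaustive grid search.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
fits_per_zone = len(combinations)     # 324
total_fits = fits_per_zone * 10 * 2   # ten zones, two scenarios: 6480

# Number of models (out of 40) for which each value gave the best score.
best_counts = {
    "neurones": {200: 37, 100: 3},
    "momentum": {0.9: 33, 0.6: 5, 0.3: 2},
    "learning_rate": {0.1: 25, 0.05: 12, 0.01: 3},
    "epochs": {500: 18, 750: 14, 1000: 6, 250: 2},
    "batch_size": {50: 21, 10: 10, 100: 9},
}
all_models_counted = all(sum(c.values()) == 40 for c in best_counts.values())
```

Both totals agree with the text: 324 combinations per zone and 6480 fits overall, with the per-parameter tallies each summing to the 40 models.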

Tuned ANN results
Using the best hyperparameter sets shown in Fig. 12, the ANN models were fit with a split ratio of 0.2, which separated the dataset into training and test sets, to demonstrate the learning behaviour and convergence. Fig. 13 shows the R² results of these models, and Appendices A1 and A2 show the convergence of MSE and MAE while fitting the ANN models.
The R² values for all training sets were higher than 0.9. For the test sets, R² was higher than 0.9 for fourteen of the twenty quad-grid models, between 0.8 and 0.9 for five models, and slightly lower than 0.8 for only one model. In the diagrid application, R² was higher than 0.9 for seventeen models and between 0.8 and 0.9 for three models. The accuracy of the predictive models in terms of MSE and MAE is also provided in Appendix A. All reported MSE results were lower than 0.05. In the quad-grid scenario, the MAE results of both the ASE and sDA models were lower than 0.05 in Z1, Z2, Z3, Z4, Z6, Z9, and Z10. In the other zones, the MAE of the ASE was slightly higher than 0.05, whereas that of the sDA was lower than 0.05. In the diagrid scenario, the ASE and sDA models had MAE results lower than 0.05 in Z1, Z2, Z3, Z4, Z5, Z6, Z7, and Z9. In the other zones, the MAE of the ASE was slightly higher than 0.05, whereas that of the sDA was lower than 0.05.
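For readers who want to reproduce such scores, the following minimal Python sketch computes MSE, MAE, and R² in their standard forms and performs a shuffled train/test split. It assumes that the standard definitions correspond to Eqs. (5) to (8) and that the 0.2 split refers to the test fraction. The function names and toy values are hypothetical.

```python
import random


def train_test_split(n_samples, test_fraction=0.2, seed=0):
    """Return shuffled (train, test) index lists; test_fraction is an assumed reading of 0.2."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train_idx, test_idx = train_test_split(1000)  # 800 training and 200 test samples

# Toy min-max-scaled targets and predictions, each off by 0.05.
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]
scores = (mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
# Approximately (0.0025, 0.05, 0.95)
```

If the targets are min-max scaled to [0, 1], an MAE of 0.05 corresponds to five percent of the observed range of the metric, which gives the 0.05 threshold used above a direct interpretation.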
To compare the accuracy results reported for different design complexities, Fig. 14 shows the R² of similar studies focusing on ML applications in daylight (Chen et al., 2017a,b; Chen and Yang, 2017; Kirimtat et al., 2019; Luo et al., 2021; Ngarambe et al., 2020; Sun et al., 2020). Papers in this domain report promising results for DA, sDA, illumination level (IL), and useful daylight illuminance (UDI). However, visual comfort metrics, such as daylight glare probability (DGP), show moderate accuracies across various ML applications. In this study, a similar result was obtained for the ASE metric because of the challenges in predicting comfort metrics. In addition, most previous studies considered between 5 and 15 design variables, which entails less design complexity than this study. Only Kirimtat et al. (2019) considered 25 variables, with R² values between 0.3 and 0.9. Consequently, the ML part of the MUZO methodology could address more complex designs while presenting high accuracies for all 40 models.

Conclusion
This paper presents the first part of the MUZO study, focusing on the background, methodology, setup, and ML results. The proposed methodology managed sampling and ANN development for 40 different models using a parametric high-rise model in a dense urban district. In addition, the developed Python program took 403 h to identify the best models for all zones of the two scenarios. Based on the reported accuracies, building zones close to the sky level were more challenging to predict than those close to the ground level because of the increasing number of design variables. The study also showed that dense urban surroundings affect the performance of high-rise buildings at various floor levels, as the sampling process produced different simulation results for different zones. Therefore, architects and engineers should consider various zones as different problems while designing sustainable high-rises in metropolises.
The ML part of the MUZO methodology achieved high prediction accuracies using different hyperparameters for batch size, epochs, number of neurones, momentum, and learning rate in each model, despite the various design complexities and multiple performance aspects considered. Future research can integrate more hyperparameters, such as the activation function, dropout rate, optimisation algorithm, number of hidden layers, and sample size. The ANN models could then provide higher accuracies, although at the cost of an exponential increase in computational time. Including all these parameters could make the methodology more applicable to real-world high-rise scenarios.
In conclusion, the parametric high-rise model and ML surrogate modelling phases of the MUZO methodology could automate form generation, performance evaluation, sampling, data processing, ANN development, and reporting of the predictive models for all zones in both high-rise scenarios. Using the ML part of the MUZO methodology, architects and engineers can address the computational burden while optimising the entirety of a high-rise building to propose sustainable alternatives in metropolises. Nevertheless, optimisation of the high-rise building, which is addressed in Part 2 of this study, remains challenging owing to the high number of parameters involved in the design process.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.