Co-simulation and validation of the performance of a highly flexible parametric model of an external shading system

The article presents a validation study of a modelling approach implemented in a numerical script for external louvred shading systems based on an experimental analysis in a full-scale test facility. The model developed to abstract the system was entirely parametric and used co-simulation to predict the indoor air temperature and illuminance levels at two points of the test cell. The calibration of the model of the test facility was carried out using a combination of two methods: automated calibration based on multi-objective optimization with a genetic algorithm and manual calibration. In total, six different configurations of the external shading system with varying complexity were investigated to validate the script. Its performance was assessed using three metrics: the root mean square error, the coefficient of variation of the root mean square error, and the normalized mean bias error. The results showed that the thermal environment was simulated with consistent accuracy for all the cases investigated, predicting air temperatures with an error well within the tolerance of building performance simulation tools and the experimental uncertainty. The daylighting model satisfactorily captured the different dynamics of illuminance peaks and dips, replicating the variations between different configurations, but with a lower degree of accuracy than for the thermal simulations.


Parametric scripting and co-simulation in building performance simulation
Parametric software allows the exploration of a larger solution space in early design stages, when changes are still relatively easy and inexpensive to implement. These tools rely on an explicit dynamic linkage between geometric definitions of the building elements, system parameters, and whole-building performance [1,2]. Because these tools can simultaneously be used as interfaces to different building simulation engines, they can help increase the interoperability of simulation tools by supporting co-simulation frameworks. Co-simulation is a method used in building performance simulation that couples different models describing parts of the building (e.g. thermal models and daylighting models), each of which is run in a different simulation tool, in such a way that they can exchange simulation data during run-time [3]. Co-simulation is especially interesting for the study of geometrically complex shading systems or façade elements that simultaneously affect multiple parameters of indoor comfort (thermal and visual) and energy use, and which still suffer from simulation deficiencies in building simulation [4]. The development of new simulation approaches is thus useful as it can help investigate advanced control strategies or complex geometries [5][6][7][8] as well as support the development of design approaches such as free form facades and shading elements [9].
Previous studies using parametric design coupled to optimization algorithms have underlined the greater amount of flexibility and control over design problems they obtained and the increased ability to manage complex interactions between micro- and macrosystems [10][11][12]. This method has also been used to define more advanced control strategies, for example in kinetic façade studies [13][14][15], by dynamically connecting shading system properties such as size and movements to daylighting strategies and occupant visual and thermal comfort. Developing performance-based design workflows and integrating them into one parametric script can further help create interdisciplinary studies that combine architectural aspects like building morphology and façade design with engineering fields looking at energy demand, renewable energy availability, microclimate effects, and carbon emissions to define optimal designs [16][17][18].

Model validation
Building performance simulation and co-simulation are powerful tools to assess and predict the quality of building designs in terms of energy use, operational costs, indoor climate and more. To ensure that this approach is viable, simulation results must also be validated to safeguard their accuracy, reliability, and robustness. When it comes to shading systems, experimental validation is important because it can help improve existing models in software [19][20][21][22][23][24] and help develop new modelling approaches for complex or novel façade and shading elements [24][25][26][27][28]. It also allows different solutions to be compared more accurately with baselines, the performance of novel components to be characterized, and the relationship between actual and simulated performance to be understood, which in turn drives product development.
Model validation of façade components is used to verify both thermal and daylighting models [29][30][31][32][33][34]. Commonly, models are calibrated before they are validated using existing measurement data to overcome limitations and uncertainties connected to input data. This can be done using global or local sensitivity analysis, manual calibration methods, and more recently automated techniques for model calibration. Different procedures and approaches for automated calibration can be found in the literature [35][36][37][38][39][40], most of which typically use mathematical and statistical key performance indicators like the Root Mean Square Error (RMSE), the Normalized Mean Bias Error (NMBE), or the Coefficient of Variation of the Root Mean Square Error (CV RMSE). The impact of the choice of the indicator or combination of indicators used in the calibration on the accuracy of the model is investigated and discussed in Ref. [41].

Innovative aspects of the study and outputs
The work in this article presents the co-simulated performance of a highly flexible parametric model of an external louvred shading system, the results of which are validated using experimental data from a full-scale laboratory to ensure its robustness. The geometric definition of the model used for this study was developed using the parametric design tool Rhinoceros v5 by McNeel & Associates [42] and its visual programming editor Grasshopper [43]. The model developed is built in an entirely parametric way and allows for a high degree of freedom in the geometric definition of the individual louvres to support free form search studies. Among other parameters, it allows defining the number of louvres in the system, their individual width, spacing, and angle as well as their appearance. The model is also set up in a way that the material and geometric input parameters can be used in optimization studies and ensures that the distribution of the louvres does not contain any geometric collisions by dynamically defining individual vertical distribution constraints based on adjacent louvre sizes and tilt angles. Because Grasshopper is compatible with a large number of plug-ins including the Ladybug tools [44] and sub-packages Ladybug, Honeybee, Honeybee [+], Butterfly, and Dragonfly [45], the model can be connected to validated building energy performance simulation engines such as EnergyPlus [46] and the backwards ray-tracing engine Radiance [47]. In this study, co-simulation was used to describe both the thermal model and the daylighting model of the system simultaneously by connecting geometric outputs to Honeybee daylighting analysis via Honeybee_context and to energy simulations by connecting to the EnergyPlus module in Honeybee.
It is important to note that the geometry of the shading system was not created using the component for integrated shading systems, nor was it implemented as a BSDF; instead, it was modelled as a "Honeybee_context" shading element in the Honeybee legacy plug-in. Special care was given to ensure that its reflectance was considered both in the daylighting and in the thermal simulations as this is not a default setting. The model and its degree of flexibility are described in the appendix, with a link to download the model from a data repository for further use.
The experimental data used to validate the model was collected in a full-scale test laboratory, which was equipped in turn with a series of different versions of the shading system and in which weather data, temperature data, and illuminance data were recorded, offering the possibility of a full characterization of the system investigated. The experimental campaign started in the second week of June 2019 and lasted until the first week of August 2019 in the location of Trondheim (Norway), in total providing two months of data and a range of varying boundary conditions. These separate studies aimed at testing key aspects of the robustness of the modelling approach, such as the effect of the density and regularity of the shading device configuration, or the architectural expression of the system.
The main output of this work is thus a modelling workflow, which can be used for the co-simulated performance of external louvred shading systems characterized by a high degree of flexibility. It aims at contributing to exploring applications of parametric design and ongoing multi-physical validation efforts of models for shading systems, specifically for systems which cannot be modelled with existing predefined modules inside whole building simulation tools. This approach is also useful to provide an assessment of the accuracy of using parametric shading device models for studies in which using a more detailed model of a fenestration system, such as a bidirectional scattering distribution function (BSDF) description, is either not possible or not convenient. For example, if one is interested in exploring free form facades or using optimization algorithms, creating a new BSDF for each simulation run may lead to too much computational overhead.
The remainder of this article is set up with the following structure: in section 3, the methodology for the study is laid out and describes the parametric modelling assumptions, the test facility used, and the procedure for the calibration and the validation. The results of both the calibration and the validation are presented in section 4, with a separation between the results obtained for the test facility without the shading system and those obtained with the shading system. Section 5 of this article contains the discussion of the validation results obtained, as well as the limitations of the study. Finally, the conclusions of the study are presented in section 6.

Description of the building performance simulation model
For the experimental assessment presented in this study, the input of the model was set up to generate an external fixed louvred shading system with 155 mm wide louvres with variable tilt angles. The system modelled and studied is based on an existing passive louvre system [48] which was modified in its set-up to accommodate a much larger degree of freedom, with a variable number of louvres that can be vertically distributed in any chosen way. Each louvre can individually be tilted using interchangeable brackets from 0° (horizontal) to 45° in 15° increments (Fig. 1). An early modelling approach of this system is also described in a previous study available in Ref. [49].
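The collision-free vertical distribution mentioned for this louvre system can be illustrated with a minimal sketch. It is not the Grasshopper definition used in the study: it assumes each louvre is an idealised thin flat slat rotating about its centre line, and it uses a deliberately conservative test (the vertical extents of adjacent slats must not overlap). The function names and the `clearance_m` margin are hypothetical.

```python
import math

LOUVRE_WIDTH_M = 0.155               # louvre width stated in the text
ALLOWED_TILTS_DEG = (0, 15, 30, 45)  # bracket positions stated in the text


def vertical_extent(width_m, tilt_deg):
    """Height spanned by a thin flat slat tilted from the horizontal."""
    return width_m * math.sin(math.radians(tilt_deg))


def min_centre_spacing(tilt_upper_deg, tilt_lower_deg,
                       width_m=LOUVRE_WIDTH_M, clearance_m=0.01):
    """Conservative minimum vertical distance between two louvre centres.

    Assumption: slats rotate about their centre line, so each one occupies
    half of its vertical extent above and below its centre.
    clearance_m is a hypothetical safety margin, not a value from the study.
    """
    half_upper = vertical_extent(width_m, tilt_upper_deg) / 2.0
    half_lower = vertical_extent(width_m, tilt_lower_deg) / 2.0
    return half_upper + half_lower + clearance_m


def has_collision(centre_heights_m, tilts_deg,
                  width_m=LOUVRE_WIDTH_M, clearance_m=0.01):
    """Return True if any pair of vertically adjacent louvres is too close."""
    # Sort louvres from top to bottom so that neighbours are adjacent.
    louvres = sorted(zip(centre_heights_m, tilts_deg), reverse=True)
    for (z_up, t_up), (z_low, t_low) in zip(louvres, louvres[1:]):
        if z_up - z_low < min_centre_spacing(t_up, t_low, width_m, clearance_m):
            return True
    return False
```

In an optimization loop, a check of this kind could be used either to reject candidate layouts or to derive the lower bound of each louvre's allowed height from the size and tilt of its neighbour.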
The modelling approach developed in this work was used to generate the studied louvred shading system in front of a test chamber identical to the one used to validate the model experimentally. The properties and characteristics of the test chamber and the surrounding guard volume are described in section 3.2. The chamber itself is a rectangular volume modelled as a single zone surrounded by three volumes which were merged into a second zone and formed the guard room around the chamber.
While the test chamber was modelled as unconditioned for reasons discussed in section 3.2, the guard room zone was conditioned with an ideal system with scheduled heating and cooling setpoint temperatures. These dynamically scheduled setpoints were defined to match the measured surface temperatures of the test cell chamber wall (measured on the side of the guard volume). The interior convection coefficients on these surfaces were increased to fictitiously high values to ensure that the temperatures of the surfaces of the test chamber (facing the guard room) were identical to the air temperature of the guard room, thus recreating the boundary conditions which were measured during the experiments.
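The effect of a very high convection coefficient can be seen in a simplified steady-state balance, sketched below. This is only an illustration, not the EnergyPlus heat balance: it ignores radiation and thermal storage, lumps the wall into a single conductance `u_wall`, and all numerical values used with it are hypothetical.

```python
def surface_temperature(t_air_c, t_other_c, h_conv, u_wall):
    """Steady-state temperature of a wall surface facing the guard room.

    The surface exchanges heat with the guard-room air (coefficient h_conv)
    and, through the wall, with the other side (conductance u_wall).
    Both coefficients are in W/(m2 K). Simplified balance:
        h_conv * (t_air - t_s) = u_wall * (t_s - t_other)
    """
    return (h_conv * t_air_c + u_wall * t_other_c) / (h_conv + u_wall)


# Hypothetical values: guard-room air at 22 degC, other side at 28 degC.
for h in (3.0, 30.0, 1000.0):
    t_s = surface_temperature(22.0, 28.0, h, 0.3)
    print(f"h = {h:7.1f} W/(m2 K) -> surface temperature {t_s:.3f} degC")
```

As `h_conv` grows, the surface temperature converges to the air temperature, which is why scheduling the guard-room air temperature to the measured surface temperature imposes that temperature as a boundary condition.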
For the daylighting model, two analysis surfaces were created inside the model of the chamber to replicate the measurement points of illuminance on a desk and on the ceiling. The simulated illuminance measured in the model was calculated as the average illuminance on a 10 cm × 10 cm surface centred around the position of the sensor. The desk sensor was placed at 1.30 m from the window (desk height 0.8 m) and the ceiling sensor was located at 3 m height in the middle of the room.
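The averaging over a small analysis surface can be sketched as follows. The grid resolution `n` and the function names are assumptions made for illustration; the study only states that the simulated value is the average illuminance on a 10 cm × 10 cm surface centred on the sensor position.

```python
def patch_points(centre_x, centre_y, side_m=0.10, n=3):
    """Return n x n grid points (cell centres) covering a square patch.

    The patch is side_m wide and centred on (centre_x, centre_y), both
    expressed in metres in the plane of the analysis surface.
    """
    step = side_m / n
    offsets = [-side_m / 2.0 + (k + 0.5) * step for k in range(n)]
    return [(centre_x + dx, centre_y + dy) for dx in offsets for dy in offsets]


def patch_average(illuminances_lux):
    """Average the illuminance values computed at the patch grid points."""
    values = list(illuminances_lux)
    if not values:
        raise ValueError("at least one illuminance value is required")
    return sum(values) / len(values)
```

The points returned by `patch_points` would be passed to the daylighting engine as sensor positions, and the resulting values reduced to one number per time step with `patch_average`.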
To characterize the effect of the shading system on the air temperature and illuminance levels in the chamber, six different cases corresponding to six different configurations were investigated (Table 1). These configurations differed from one another in the number of louvres considered in the system, the tilt angle of each louvre, the interspace between the louvres, and the colour of the louvres, which was either dark blue (colour RAL 5000) or pure white (RAL 9010). The latter was used to investigate the effect of the appearance of the shading system. It is important to note that the cases with 13 modified louvres are configurations that aim to be more complex than the previous cases by having louvres that are no longer equally spaced and have several different tilt angles. This means that the shading system no longer forms a regular patterned shadow in front of the window. These configurations are interesting to investigate to understand how well the modelling approach applied to an irregular geometry is translated into inputs for the different simulation engines. The choice of these configurations was based on previous work described in Ref. [50]. A vertical cross-section of the different shading system configurations is shown in Table 2.

Description of the experimental facility
The experimental campaign aiming to measure the effect of different configurations of the studied external louvred shading system was carried out between June and August 2019 using one of the test chambers from the ZEB TestCell facility located in Trondheim, Norway. The test chamber is a rectangular room (with internal dimensions 2. [...] brown colour at the time of the experiments. The south-facing façade element is built of insulated timber frame with a 2.2 m × 2 m triple pane window with argon gas (Fig. 2). More details about the window are provided in Table 3. A more exhaustive description of the whole test cell facility is available in Ref. [51]. For this study, although the test chamber is equipped with a full HVAC system to condition the indoor volume, the chamber was left unconditioned while the surrounding guard volume was conditioned. There were two reasons for this choice. First, keeping the volume unconditioned created larger temperature fluctuations which could be measured more accurately than a smaller temperature signal. Second, if the volume were conditioned with an HVAC system and setpoints, the measurements would have to be done on the amount of energy delivered to condition the room. This would have required many more assumptions regarding the modelling of the HVAC system itself, and would have added significant uncertainty to the results of the model. The nature of the data recorded during the experimental campaign is summarized in Table 4 and Table 5.
It is important to note that the sensors measuring the illuminance level in the chamber were set with two different ranges (see Table 5). This choice was made as a compromise between limiting the time during which the sensors would saturate and obtaining accurate readings. Because this study did not assess the risk of glare, the goal of these measurements was only to investigate whether the space with the shading system would receive a minimum threshold value of illuminance.
During the time of the experiments, weather data was collected by a weather station to create an EPW weather data file for the corresponding analysis period. Because the pyranometer of the weather station only recorded global irradiance, the Engerer2 code described in Ref. [52] was used to obtain the fractions of diffuse and direct radiation (W/m²). This algorithm was selected because it is validated, and its developers found that it performed particularly well with hourly radiation data in cold climates at latitudes close to that of Trondheim.
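Once a separation model has provided a diffuse fraction, the diffuse and direct components needed for a weather file follow from the standard closure relation GHI = DHI + DNI · cos(zenith). The sketch below shows only this last step; it does not reproduce the Engerer2 model itself, and the function name and the example numbers are hypothetical.

```python
import math


def split_global_irradiance(ghi, diffuse_fraction, zenith_deg):
    """Split global horizontal irradiance into diffuse and direct parts.

    ghi              : global horizontal irradiance, W/m2
    diffuse_fraction : DHI / GHI, as estimated by a separation model
    zenith_deg       : solar zenith angle in degrees
    Returns (dhi, dni) in W/m2.
    """
    dhi = diffuse_fraction * ghi
    cos_zenith = math.cos(math.radians(zenith_deg))
    if cos_zenith <= 0.0:
        # Sun at or below the horizon: no direct beam component.
        return dhi, 0.0
    dni = (ghi - dhi) / cos_zenith
    return dhi, dni


# Hypothetical example: 600 W/m2 global, 25% diffuse, sun 60 deg from zenith.
dhi, dni = split_global_irradiance(600.0, 0.25, 60.0)
```

Close to sunrise and sunset the cosine becomes very small, so a practical implementation would usually cap the resulting direct normal irradiance.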

Table 2
Vertical cross-sections of the facade with the different shading configurations, including individual louvre tilt angles and interspaces.

Fig. 2.
Facade of the test cell facility and a picture of the test chamber used.

Table 3
Characteristics of the window of the test cell.

Table 4
Quantities measured inside the test chamber during the experimental campaign.

Quantities measured in cells | Uncertainty on measure
Air temperature at 1 and 2 m height | ±0.5 °C
Illuminance on a surface at 0.8 m height (desk surface) and 3 m height (ceiling surface). Note that the sensor on the desk was set to have a measurement range of 0-1000 lux while the one on the ceiling was set to have a measurement range of 0-500 lux (sensor manufacturer S+S Regeltechnik) | ±5% of the maximum value in the range

Description of the procedure for the calibration of the thermal and daylighting models
The calibration period defined for case 0 spanned a two-day period during which no shading system was installed on the test cell facility. The calibration itself was done using the multi-objective optimization plug-in Octopus [53] with the following procedure. First, a small number of parameters were selected on the basis that they had the most uncertainty due to lacking documentation or because previous sensitivity studies [54] had determined them to have the most impact on the heat balance of the test chamber. These parameters were then provided as input to the algorithm with a range of values they could take. Second, the objectives of the optimization were defined by a fitness function which aimed at minimizing the root mean square error (RMSE) between the measured and the simulated air temperature in the chamber at each hour, as well as reducing the maximum hourly error for each day (peak error). To avoid overfitting the model, the range of the input parameters used by the Octopus algorithm was contained, at most, within a ±10% interval of the assumed or known value except for the g-value for which a value 15% lower than what was provided by the window manufacturer was set as the lower boundary. This assumption was based on the findings of [55], which showed that the discrepancies between measured and announced g-values of double glazed units could reach up to 23% during a summer period in the same location as the experiments described in our study.
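The two ingredients of this procedure, bounded parameter ranges and a two-objective fitness function, can be sketched in plain Python. This is not the Octopus set-up used in the study. The +10% upper bound applied to the g-value and the way daily peak errors are aggregated (their mean) are assumptions made for illustration, as are all function names.

```python
import math


def parameter_bounds(nominal, rel=0.10, lower_rel=None):
    """Search interval around a nominal value.

    rel       : relative half-width of the interval (10% in the text)
    lower_rel : optional wider lower margin (15% for the g-value in the text)
    """
    low_margin = rel if lower_rel is None else lower_rel
    return nominal * (1.0 - low_margin), nominal * (1.0 + rel)


def rmse(measured, simulated):
    """Root mean square error between hourly measured and simulated values."""
    n = len(measured)
    return math.sqrt(sum((s - m) ** 2 for m, s in zip(measured, simulated)) / n)


def daily_peak_errors(measured, simulated, steps_per_day=24):
    """Largest absolute hourly error within each day of the period."""
    errors = [abs(s - m) for m, s in zip(measured, simulated)]
    return [max(errors[i:i + steps_per_day])
            for i in range(0, len(errors), steps_per_day)]


def objectives(measured, simulated):
    """Two values to minimise: RMSE and mean daily peak error (assumed)."""
    peaks = daily_peak_errors(measured, simulated)
    return rmse(measured, simulated), sum(peaks) / len(peaks)
```

A multi-objective optimizer would evaluate `objectives` for each candidate parameter set drawn from the intervals returned by `parameter_bounds` and keep the non-dominated solutions.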
For the daylighting model of the cell without the shading system, a similar procedure was followed, but using a manual calibration on the same two days used for the automated thermal calibration. This was because the saturation point of the sensors was reached for many hours during the day when the chamber had no shading. A second daylighting calibration, this time only focusing on the reflectance values of the shading system, was carried out over another set of two days when the test chamber was set up with the case study called case 16. In this case, a light hand calibration was used on one parameter only.
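In the fitness function for the optimization, the RMSE was calculated according to Eq. (1), where N is the total number of values, m_i is the measured value and s_i is the simulated value, both at time step i:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(s_i - m_i\right)^2} \qquad (1)$$

The accuracy of the thermal and daylighting models for each case was evaluated using two additional metrics: the CV RMSE or coefficient of variation of the root mean square error in % (Eq. (2)) and the NMBE or normalized mean bias error (Eq. (3)).
The CV RMSE is calculated similarly to the RMSE but uses $\bar{m}$, the average of the measured values during the considered period, as follows:

$$\mathrm{CV\,RMSE} = \frac{100}{\bar{m}}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(s_i - m_i\right)^2} \qquad (2)$$

The NMBE is calculated according to the following formula, in which a positive value indicates that the model overestimates the measured quantity on average:

$$\mathrm{NMBE} = \frac{100}{\bar{m}}\cdot\frac{1}{N}\sum_{i=1}^{N}\left(s_i - m_i\right) \qquad (3)$$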
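A minimal Python sketch of the three metrics is given below. It assumes the common definitions, with the error taken as simulated minus measured so that a positive NMBE corresponds to an overestimation by the model (the interpretation used in the results), and with division by the number of values N. The function names are illustrative.

```python
import math


def rmse(measured, simulated):
    """Root mean square error, in the unit of the measured quantity."""
    n = len(measured)
    return math.sqrt(sum((s - m) ** 2 for m, s in zip(measured, simulated)) / n)


def cv_rmse(measured, simulated):
    """RMSE normalised by the mean measured value, in percent."""
    mean_measured = sum(measured) / len(measured)
    return 100.0 * rmse(measured, simulated) / mean_measured


def nmbe(measured, simulated):
    """Mean bias (simulated minus measured) over the mean measured value, in percent."""
    n = len(measured)
    mean_measured = sum(measured) / n
    bias = sum(s - m for m, s in zip(measured, simulated)) / n
    return 100.0 * bias / mean_measured


# Hypothetical hourly air temperatures in degC.
measured = [20.0, 22.0, 24.0, 22.0]
simulated = [21.0, 22.0, 25.0, 22.0]
print(rmse(measured, simulated), cv_rmse(measured, simulated), nmbe(measured, simulated))
```

Because CV RMSE and NMBE divide by the mean measured value, they are sensitive to the choice of unit for temperature (°C or K) and should only be compared between studies that use the same convention.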

Description of the procedure for the validation of the thermal and daylighting models
For each calibration and validation procedure, two completely independent data sets were used each time to ensure that the model results were reliable. For the cases with the shading system (case 16, 13, 13 modified A, 13 modified B, and 13 white), the model was also run with new datasets representing a simulation period of three consecutive days starting at 7 a.m. on the first day and ending at 7 a.m. on the last day. For each period considered, as much as possible, the days selected for the validation period were chosen to be a series of days containing one fully sunny day, one slightly cloudy day, and one cloudy day. The starting time of 7 a.m. was chosen for consistency. Each time the shading configurations were changed during the experiments, the switch was done in the early morning at approximately 7 or 8 a.m. and required about 40 minutes to execute. It was assumed that it would take about 24 h before the data recorded was no longer influenced by the louvre switching intervention, and so the data recorded on these switching days was discarded until the next day at 7 a.m. This method was followed for each set of measurements to avoid any dependency of the data collected on the order of the cases investigated.
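The data selection rule described above can be expressed as a short sketch. The function names and the switch time used in the example are hypothetical; only the 7 a.m. start and the three-day window come from the text.

```python
from datetime import datetime, timedelta


def first_usable_time(switch_time, start_hour=7):
    """First time step kept after a change of shading configuration.

    Data recorded from the switch until the next day at start_hour
    are discarded.
    """
    next_day = (switch_time + timedelta(days=1)).date()
    return datetime(next_day.year, next_day.month, next_day.day, start_hour)


def validation_window(start, n_days=3):
    """Window of n_days consecutive days, from 7 a.m. to 7 a.m."""
    return start, start + timedelta(days=n_days)


def in_window(timestamp, window):
    """True if the timestamp falls inside the half-open window [start, end)."""
    return window[0] <= timestamp < window[1]


# Hypothetical switch performed on a given morning at 7:30.
start = first_usable_time(datetime(2019, 6, 11, 7, 30))
window = validation_window(start)
```

Filtering the logged measurements with `in_window` then yields the data set compared against the simulation for that configuration.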
To assess whether the model could be validated or not, the same metrics used during the calibration were calculated for the new set of results (RMSE, CV RMSE and NMBE). Additionally, a graphical assessment was used to understand whether the simulated values also matched visually with the measured data. This was specifically important for the daylighting model because illuminance values can vary very rapidly and, in theory, a model giving statistically accurate values could fail to capture the dynamics of the measured data and suffer from a cancellation effect between time steps.
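The cancellation effect mentioned above can be reproduced with a small synthetic example. The illuminance values are invented for illustration: the simulated series is out of phase with the measured one, so positive and negative errors cancel in the bias while the RMSE stays large.

```python
import math

# Invented illuminance series in lux: the model peaks when the sensor dips.
measured = [100.0, 300.0, 100.0, 300.0]
simulated = [300.0, 100.0, 300.0, 100.0]

n = len(measured)
mean_measured = sum(measured) / n

# Bias-type metric: signed errors cancel out between time steps.
nmbe_percent = 100.0 * sum(s - m for m, s in zip(measured, simulated)) / (n * mean_measured)

# Distance-type metric: squared errors cannot cancel.
rmse_lux = math.sqrt(sum((s - m) ** 2 for m, s in zip(measured, simulated)) / n)

print(nmbe_percent)  # 0.0
print(rmse_lux)      # 200.0
```

A bias of 0% on its own would suggest a perfect model here, which is why the bias metric is read together with the RMSE, the CV RMSE, and a visual comparison of the time series.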

Results of the calibration and validation of the simulation model without a shading device
The results for the calibration and validation of the thermal model of the chamber without the shading device (case 0) are presented in Table 6. For the daylighting model, the hand calibration of the model yielded the values given in Table 7.
The results of the simulated values for the illuminance levels and air temperature in the chamber given by the calibrated model are compared against the measured values in Fig. 3 and the accuracy of the calibrated model is estimated in Table 8. The outdoor boundary conditions (outdoor air temperature and global horizontal irradiance) are provided below each graph to compare the results of the model and the measurements to the variations of intensity of the environmental signal. From Fig. 3 and Table 8, it is possible to see that the simulated temperature was almost always within the uncertainty interval of the measured value (±0.5 °C) and yielded an RMSE of 0.5 °C. The CV RMSE was 2% and the NMBE was 2%, which indicates that the distance between the measured and simulated data points was small, and the level of accuracy of the model is well within the acceptable error of building performance simulation tools. Overall, the evolution of the indoor air temperature in the chamber without the shading system followed the same trend as the outdoor air temperature, but had a small delay of approximately one to two hours in the peaks due to the inertial effect of the test cell.
For the daylighting, the model with the selected parameters recreated the correct shape of the signal for solar irradiation entering the chamber, and captured the small dips in daylight levels measured both on the ceiling and on the desk surfaces. Despite the boundary conditions depicting one fully sunny day and one slightly cloudier day, both illuminance sensors in the chamber saturated during the middle of the day and made it impossible to calibrate the model with peak illuminance levels. For the daylighting model the RMSE was calculated as 41 lux and 36 lux for the desk and the ceiling surface respectively, the CV RMSE as 8% and 17% and the NMBE as 0% and 10%, again for the desk and ceiling surface respectively.

Fig. 3.
Results of the calibration of the model for case 0 (no shading) during the period August 3rd 7 a.m. to August 5th 7 a.m.

The model was then tested on a new independent data set representing two days in order to be validated. The results of the validation phase of this study are reported alongside those of the calibration period in Table 8 and show that the RMSE was 0.6 °C for the thermal model. As can be seen in Fig. 4, the air temperature simulated in the chamber was also almost always within the confidence interval of the measured value, only slightly above during the first day. For the validation period, the CV RMSE of the thermal model was calculated as 2% and the NMBE as 2%. These two values indicate good agreement between the measurements and the simulation results, but the positive bias shows that the model predicted a slightly higher air temperature than what was measured in situ.
For the daylighting model, the shape of the illuminance dome received by the two surfaces in the chamber matched the measured illuminances as it did during the calibration period, but again it was not possible to compare peak illuminance levels because of the saturation points of the sensors. For the daylighting results, the RMSE was calculated as 71 lux and 35 lux for the desk and ceiling surface respectively, and the CV RMSE and NMBE were calculated as 14% and -3% for the desk, and 15% and 8% for the ceiling surface. Overall, the values given by the RMSE, CV RMSE, and NMBE for both models are considered close to the ones obtained during the calibration period given the accuracy of the sensors, and thus satisfactory in replicating the thermal and daylighting performance of the space under test.

Results of the calibration and validation of the model with the shading device
An overview of the results for all the cases is presented in Table 9 before being discussed individually in the following section. Table 9 also shows the results of the second part of the calibration associated with determining the reflectance value of the louvres using case 16. For each case, as previously, two separate graphs are plotted: one for the daylighting model and one for the thermal model with the specific corresponding measured boundary conditions reported below each graph.
The second calibration, which only changed the reflectance of the louvres, was carried out manually over two days, from June 17th 7 a.m. until June 19th 7 a.m., and provided an RMSE of 0.2 °C for the thermal model, a CV RMSE of 5%, and an NMBE of 0%. For the daylighting model, the RMSE was calculated to be 42 lux on the desk and 57 lux on the ceiling. The CV RMSEs were 18% and 35% for the desk and the ceiling respectively. Finally, the NMBEs were −2% on the desk and −24% on the ceiling. This calibration allowed determining a reflectance value of 0.07 for the blue louvres as reported earlier in Table 7, and the corresponding graphical results of the calibration are shown in Fig. 5.
To validate the model for case 16, a simulation was run on a new set of data corresponding to three days between June 12th at 7 a.m. and June 15th at 7 a.m. The results of the simulation are shown in Fig. 6. These show that the simulated temperature was always within, or very close to, the uncertainty range of the measured value. The trend formed by the simulated temperatures, although almost identical in its shape, was slightly delayed compared to the measured values and the discharge phase (i.e. the time after the temperature peak) did not seem as rapid as it did in the measurements. According to Table 9, the calculated RMSE for the thermal model of case 16 was also 0.2 °C. The CV RMSE was determined as 5% and the NMBE, once again, showed a negligible bias with a value of 0%. For the daylighting model, the model yielded an illuminance profile similar in its shape to the measured illuminance levels, this time without the sensors saturating. The shape of the peaks was respected but the intensity was underestimated, especially on the last day of the validation. The RMSE, CV RMSE, and NMBE were 58 lux, 22%, −10% and 46 lux, 27%, −17% for the desk and the ceiling surface, respectively.

Fig. 4.
Validation of the model for case 0 (no shading system) during the period August 5th 8 a.m. to August 7th 8 a.m.

E. Taveres-Cachat and F. Goia
For case 13, which had 13 louvres equally spaced and tilted at 15°, the thermal model was able to predict the air temperature inside the chamber within the uncertainty interval of the measured temperature as seen in Fig. 7. The simulation error was particularly small when the outdoor temperature and the global horizontal irradiance were lower. During this period, as shown in Table 9, the RMSE of the thermal model was 0.3 °C, the CV RMSE was 5%, and the NMBE was −1%. All these values indicate that the model for case 13 maintained the same accuracy level as it had during the validation of the model without the shading system and with the shading system in case 16. For the daylighting model, the simulated illuminance on the desk and ceiling followed quite closely the values obtained with the measurements. However, as in the previous case, the illuminance was often overestimated on the ceiling. On the third day, both simulated illuminance profiles matched the recorded global irradiance but provided a poorer match to the measured values, especially on the desk. The RMSE for the daylighting model for case 13 was calculated as 74 lux for the desk and 58 lux for the ceiling. The CV RMSE and NMBE were calculated as 25%, −5% and 41%, 27% respectively for the two analysed surfaces. This indicated that the model was on average less accurate in predicting illuminance on the ceiling in conditions where the illuminance profiles showed a large amount of variation during the day, and tended to overestimate the amount of light in the chamber.
For the 13 louvres modified cases, the louvres were set up in a way that their spacing was heterogeneous and the angles of each louvre could also be different from one another as shown previously in Table 2. The results for the first one of the modified cases, referred to as case 13 modified A, are shown in Fig. 8. For the thermal model, the predicted temperature was well within the uncertainty interval of the measured temperature during the days with lower outside temperature and weaker solar radiation. The RMSE of 0.2 °C indicates that the distance between the simulated and measured values was small (Table 9). The CV RMSE and NMBE, which were 5% and 0%, were in line with the previously determined accuracies.
For the daylighting model, the simulated values followed the trend of the measured values, but the daily profile shows a slight early dip in the illuminance a couple of hours before the measurements did on days with higher solar irradiation. The RMSE values (52 and 40 lux) for the two surfaces were similar to those of case 16 and smaller than in case 13. The CV RMSE values showed the model was consistent in its level of accuracy for the ceiling surface (16%) and slightly more accurate than previously on the desk with a CV RMSE of 29%. In terms of the NMBE, the illuminance on the desk was, on average, overestimated by 2% while the illuminance on the ceiling was overestimated on average by 18%.
For the second modified configuration (Fig. 9), referred to as case 13 modified B, the thermal model predicted the air temperature with an [...] The simulated illuminance values show that the model was able to reproduce the variations of the measured values but was less accurate when the light was more variable, as it was on the last day. The RMSE value of 72 lux on the desk is similar to the value obtained in case 16, and the RMSE of 39 lux on the ceiling is the lowest value obtained for the blue louvres across all configurations considered. The CV RMSE values were 19% on the desk and 26% on the ceiling. The NMBEs also indicate a more accurate model with 0% on the desk and 13% on the ceiling surface.
For the case with 13 white louvres, it was not possible to select three days with a large variation in the outdoor boundary conditions, and the measurements were obtained during three mostly sunny days (Fig. 10). The results of the simulated temperature in the chamber were, once again, quite close to the measured values and within the uncertainty interval. However, the simulated profile slightly underestimated the peak temperature. Overall, the RMSE was 0.2 °C, which was identical to previous validation cases. Both the CV RMSE and NMBE reported in Table 9 indicated that the model was as accurate as previously.
For the daylighting models, because the boundary conditions consisted of three very sunny days and because of the reflective nature of the louvres, both sensors saturated during the day, as they had during the first validation period. The overall shape of the illuminance profile on the ceiling was in line with the measurements, while the profile for the desk showed a flawed trend in which the illuminance level dropped prematurely at the end of the day. As a result, the RMSE was 82 lux on the desk surface and 25 lux on the ceiling. The CV RMSE and NMBE were 35% and −1% on the desk and 11% and −1% on the ceiling, respectively, which makes this model one of the least accurate of the models investigated in predicting the illuminance on the desk and the most accurate on the ceiling.

Discussion
The approach chosen in this study was to use parametric design coupled with co-simulation to run both thermal energy and backwards ray-tracing daylighting simulations. The thermal model was calibrated using automated calibration and yielded results that were within the uncertainty of the simulation engine for all cases investigated (RMSE ≤ 0.3 °C, 0 ≤ NMBE ≤ 1%). The CV RMSE was similar during validation and calibration (2%), and ranged from 1 to 5% for the cases with the shading system. This indicates that the thermal model was particularly accurate since, according to the authors of [41], calibrations with a CV RMSE below 3% provide the highest accuracy for the input parameters in energy or temperature simulations.
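The three metrics quoted above can be computed from paired measured and simulated series as in the minimal sketch below. It is an illustration, not the script used in the study: the function name, the sample temperatures, and the sign convention for the NMBE (simulated minus measured, so that a positive value means overestimation) are assumptions, and no degrees-of-freedom correction is applied to the sample size.

```python
import math


def validation_metrics(measured, simulated):
    """Return (RMSE, CV(RMSE) in %, NMBE in %) for two equal-length series.

    Assumption: residuals are taken as simulated minus measured, so a
    positive NMBE indicates that the model overestimates on average.
    """
    if len(measured) != len(simulated) or len(measured) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(measured)
    mean_measured = sum(measured) / n
    residuals = [s - m for m, s in zip(measured, simulated)]
    # Root mean square error, in the unit of the input series.
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    # RMSE normalised by the mean of the measurements, in percent.
    cv_rmse = 100.0 * rmse / mean_measured
    # Mean residual normalised by the mean of the measurements, in percent.
    nmbe = 100.0 * sum(residuals) / (n * mean_measured)
    return rmse, cv_rmse, nmbe


# Hypothetical air temperatures in degrees Celsius, for illustration only.
measured_temp = [24.0, 25.0, 26.0, 25.0]
simulated_temp = [24.2, 25.1, 25.8, 25.3]
rmse, cv_rmse, nmbe = validation_metrics(measured_temp, simulated_temp)
print(f"RMSE = {rmse:.2f}, CV(RMSE) = {cv_rmse:.2f}%, NMBE = {nmbe:.2f}%")
```

Because the CV RMSE and NMBE are normalised by the mean of the measurements, the same absolute error gives a much smaller percentage for temperatures around 25 °C than for illuminances of a few hundred lux, which is worth keeping in mind when comparing the thermal and daylighting results.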
The daylighting model estimated the illuminance at two different heights in the test chamber and was calibrated manually. The illuminance on the ceiling was predicted in all cases with an RMSE between 25 and 58 lux, but the CV RMSE and NMBE indicated that the model mostly overestimated the amount of light reaching the ceiling sensor. The illuminance predicted on the desk had an RMSE between 53 lux and 74 lux, except for the case with white louvres, where it was 82 lux. Considering that the model without a shading device had an RMSE of 71 lux during the validation, the accuracy of the simulated illuminance on the desk for the cases with the blue (low-reflectance) louvres was as good as when there was no shading system, and slightly lower when the louvres were white. The value of the CV RMSE on the desk was consistent for all the cases with shading devices, but was sometimes twice as high as when there was no shading system. This error could be due to the conditions during the calibration, where the sensors saturated during the day; this may have provided a false sense of accuracy, which was revealed when the shading system was present and the sensors no longer saturated. For the case with 16 louvres, it appeared that the model possibly underestimated the amount of light entering the room, which could be tied to how the global solar radiation was split between its direct and diffuse components, or to how the Daysim software calculates sun positions. The latter is a weakness of the software discussed in Ref. [57]. For the other cases, the main type of error in the model seemed to appear on sunny afternoons, where the simulated illuminance dropped ahead of the measured one and the models always underestimated the amount of light. This error could be due to how direct radiation was reflected into the room.
Fig. 10. Results of the validation for case 13 white louvres (equally spaced and tilted at 15°).
To provide a sense of the magnitude of the error related to using a simplified modelling approach such as the one used in Daysim, two daylighting metrics, the daylight autonomy (DA) [58] and the continuous daylight autonomy (cDA) [59], were calculated on the desk surface based on the simulation results and the measurements. The calculations used two different illuminance thresholds and a standard occupancy profile (7 a.m.-6 p.m. with all days considered weekdays). The results are shown in Fig. 11. Although these values only provide a snapshot of the expected accuracy because of the limited analysis period, it is possible to see that most of the modelled cases yielded metric values within the uncertainty range of the measured values when using an illuminance threshold of 300 lux. With the higher threshold of 500 lux, it appears that the models for the modified configurations were less accurate, but the differences reported are still within the 20% uncertainty range of climate-based daylighting metrics [60].
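As an illustration of how the two metrics discussed above are commonly defined, the sketch below computes DA as the share of occupied time steps at or above the threshold, and cDA by additionally giving partial credit (illuminance divided by threshold) to time steps below it. The function names, the hourly time step, and the sample values are assumptions made for the example and are not taken from the study.

```python
def daylight_autonomy(illuminance, occupied, threshold):
    """Percentage of occupied time steps with illuminance >= threshold."""
    values = [e for e, occ in zip(illuminance, occupied) if occ]
    if not values:
        raise ValueError("no occupied time steps")
    return 100.0 * sum(1 for e in values if e >= threshold) / len(values)


def continuous_daylight_autonomy(illuminance, occupied, threshold):
    """Like DA, but time steps below the threshold earn partial credit."""
    values = [e for e, occ in zip(illuminance, occupied) if occ]
    if not values:
        raise ValueError("no occupied time steps")
    # Credit is capped at 1.0 once the threshold is met.
    credit = sum(min(e / threshold, 1.0) for e in values)
    return 100.0 * credit / len(values)


# Assumed hourly time step: occupied from 7 a.m. up to 6 p.m. every day.
occupied_day = [7 <= hour < 18 for hour in range(24)]

# Hypothetical desk illuminance for one day, in lux, one value per hour.
desk_lux = [0, 0, 0, 0, 0, 20, 80, 150, 260, 340, 420, 480,
            510, 490, 430, 350, 240, 130, 60, 10, 0, 0, 0, 0]

for threshold in (300, 500):
    da = daylight_autonomy(desk_lux, occupied_day, threshold)
    cda = continuous_daylight_autonomy(desk_lux, occupied_day, threshold)
    print(f"threshold {threshold} lux: DA = {da:.1f}%, cDA = {cda:.1f}%")
```

Because DA is a hard count against the threshold, a small bias in the simulated illuminance near 300 or 500 lux can flip many time steps at once, whereas cDA degrades more gradually.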
Overall, the values output by the models showed that, for the thermal model, the error of the simulations was below the maximum threshold defined in ASHRAE Guideline 14 [61] and within the uncertainty of the simulation engine. However, it is important to keep in mind that the maximum values provided in the standard are for annual simulations with hourly values. The values calculated over a shorter period may therefore only reflect the accuracy of the model for the type of boundary conditions measured at that time, i.e. summer conditions with high solar altitudes. Moreover, model calibration and validation are, by nature, under-constrained problems, and many models using different parameter input values can theoretically yield similar results. To make sure that the model is accurate at other times of the year, it would be useful to verify the results during a different period with different boundary conditions, for example during the winter, spring, or fall.
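A comparison against guideline limits of this kind can be expressed as a simple check. In the sketch below, the limits are the hourly calibration criteria commonly cited from ASHRAE Guideline 14 (CV RMSE of at most 30% and NMBE within ±10%); they are stated here as an assumption, since the text above does not quote them, and the function name is hypothetical.

```python
# Assumed limits: hourly calibration criteria commonly cited from
# ASHRAE Guideline 14. They are not quoted in the article itself.
CV_RMSE_LIMIT_HOURLY = 30.0  # percent
NMBE_LIMIT_HOURLY = 10.0     # percent, applied to the absolute value


def within_hourly_limits(cv_rmse, nmbe,
                         cv_limit=CV_RMSE_LIMIT_HOURLY,
                         nmbe_limit=NMBE_LIMIT_HOURLY):
    """Return True when both metrics (in percent) respect the limits."""
    return cv_rmse <= cv_limit and abs(nmbe) <= nmbe_limit


# Largest thermal values reported for the shaded cases: CV RMSE up to 5%
# and NMBE up to 1%.
print(within_hourly_limits(cv_rmse=5.0, nmbe=1.0))
```

As noted above, these limits were defined for annual simulations with hourly values, so passing the check over a three-day period is a weaker statement than passing it over a full year.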
Finally, the current model may suffer from certain limitations due to the modelling choices. For example, to avoid concave surfaces, which may sometimes lead to instability in the thermal engine, the oval-shaped surfaces of the louvres were not modelled as such but as diamond-shaped surfaces. This could affect how the radiation impinging on the louvres was reflected into the room. Additionally, because the louvres were modelled as context elements, the thermal model does not consider their temperature or whether they radiate heat towards the glazed surface behind them. This effect was, however, considered minimal given that the glazing assembly had a low thermal transmittance with a low-e coating and the airflow around the shading system was not restricted. For the daylighting model, the accuracy of the results was less consistent than for the thermal model, even though the illuminance profiles were well replicated by the model. Inaccuracies in the results could also be due to the fact that the daylighting model proved sensitive to shading masks from surrounding buildings and to reflections. Uncertainty in these parameters may contribute to the deviations seen and possibly explains the discrepancies in the late afternoon hours of sunny days. Despite these limitations, the results altogether indicate that the simplified shading model implemented in the Honeybee legacy model has an acceptable level of accuracy for modelling louvred shading devices in early design phases, even when they start to take on non-traditional setups and resemble more free-form configurations. However, the model should not be used for glare studies, as these are highly directional and work plane illuminance may not be sufficient.

Conclusion
In this study, a full-scale test facility was used to validate a highly flexible parametric co-simulation script for different configurations of an external louvred shading device. The simulations for the validation were carried out using a combination of thermal and ray-tracing simulation engines, which allowed assessing both daylighting and air temperature results. To ensure the robustness of the validation, six different cases corresponding to six different configurations were investigated. This approach aimed to understand whether the models could provide a consistent level of accuracy when specific properties of the system were modified, such as the number of louvres and the homogeneity of the shadow created, and whether the model could accurately capture the effect of the appearance of the system. To further improve the robustness of this study, the calibration process is presented in detail, including the specific precautions that were taken to avoid overfitting the data. First, the values of the parameters used for the calibration never deviated more than 10% from the nominal values. Second, the days selected for the validations were specifically picked to cover different boundary conditions as much as possible.
Fig. 11. Evaluation of two daylighting metrics on the desk surface compared to experimental data.
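The first precaution mentioned above, keeping calibrated parameters within 10% of their nominal values, can be verified with a check like the one sketched below. The parameter names and values are hypothetical and chosen only for illustration; the actual calibration parameters are those listed in the calibration section of the article.

```python
def within_tolerance(nominal, calibrated, tolerance=0.10):
    """Map each parameter name to True if it stays within the tolerance.

    The deviation is measured relative to the nominal value.
    """
    return {
        name: abs(calibrated[name] - nominal[name]) <= tolerance * abs(nominal[name])
        for name in nominal
    }


# Hypothetical nominal and calibrated values, for illustration only.
nominal = {"wall_u_value": 0.20, "glazing_g_value": 0.50, "infiltration_ach": 0.100}
calibrated = {"wall_u_value": 0.21, "glazing_g_value": 0.56, "infiltration_ach": 0.095}

for name, ok in within_tolerance(nominal, calibrated).items():
    print(f"{name}: {'within' if ok else 'outside'} the 10% band")
```

In an optimization-based calibration, the same band can instead be imposed up front as the lower and upper bounds of each search variable, so that the algorithm never proposes values outside it.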
The results of this study showed that, for all six cases considered, the simulations were in good agreement with the measurements and it was possible to validate all the models. The thermal models were particularly reliable, with an RMSE between 0.2 and 0.3 °C when the shading system was used. The models for the illuminance were slightly less accurate and more sensitive to the surroundings (context elements) of the test chamber. Indeed, the results did not capture every peak when the incoming radiation varied abruptly, which would make it difficult to estimate glare situations, for example. However, for work plane illuminance studies, the general trends of the measurements were reproduced satisfactorily, with a maximum RMSE of 58 lux on the ceiling and 82 lux on the desk. The work presented in this article supports the idea that parametric scripting can be used in the early design phase to model, with a certain level of accuracy, complex shading elements that are not described with BSDFs, and that these models can successfully be coupled to multiple simulation engines to achieve co-simulation. Additionally, the approach proved compatible with automated calibration processes using optimization algorithms. By nature, parametric scripts allow modellers to access and control specific parameters which may not always be easy to isolate in the interface of whole-building simulation tools, and these same parameters can be used as inputs for the calibration process. The ability to perform mathematical operations directly in the canvas of the parametric script also allows calculating key metrics that can be used as fitness functions (objectives) for the optimization component.
The output of this study is a robust Grasshopper script which can be downloaded and used to model highly flexible external louvred shading systems with a variable number of louvres, individually controlled tilt angles, material properties, and sizes. The script can be connected to daylighting studies and energy simulations, and it can be implemented in optimization frameworks for more free-form search studies of shading systems. Overall, the findings regarding the validation of the model are promising, as façade design becomes increasingly complex and the effect of non-conventional shading elements must be assessed considering the full spectrum of physical domains they interact with, that is, light, air, and heat. As modern architecture evolves and façade elements gradually incorporate more functions, approaches such as the one described in this article are becoming more common for early design exploration. For this reason, validating models is of utmost importance to ensure the reliability and performance of advanced façade designs.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.