Dynamic modelling of indoor environmental conditions for future energy retrofit scenarios across the UK school building stock

UK schoolchildren spend on average 30% of their waking lives inside schools. While indoor environmental quality (IEQ) is critical for their health and attainment


School indoor environment prediction using stock modelling
From a health perspective, school-aged children are particularly vulnerable to both extreme thermal conditions and poor indoor air quality [1]. Children spend around 30% of their waking hours at school, 70% of which is spent indoors [2]. Indoor classroom conditions are thus critically important. Internal temperature and air contaminants, such as carbon dioxide (CO2), nitrogen dioxide (NO2) and particulate matter under 2.5 μm (PM2.5), are key components of Indoor Environmental Quality (IEQ). Guideline thresholds for IEQ parameters are defined by both health [3] and educational bodies [4]. In addition to health concerns, cognitive performance, which is linked to educational attainment, is affected both by heightened internal temperatures [5] and lack of ventilation [6][7][8].
School buildings should not be considered as static environments. The UK non-domestic building stock contributes around one fifth of all UK-based carbon emissions [9]. As a result, energy efficiency retrofitting is considered a key step towards meeting the national target to achieve net zero emissions by 2050 [10]. In addition, improvements in ventilation and shading are required to mitigate against hotter temperatures due to ongoing climate change [11] and increased levels of contaminants [12]. The dependence of IEQ on both energy retrofitting and IEQ improvement means that these should be combined to create pair-wise scenarios.
To predict building performance across a heterogeneous building stock, building simulation should incorporate national datasets [13,14] to enhance resolution. Tools for calculating annual, whole-building energy consumption were recently adapted [15] to auto-generate building archetypes by region and era based upon the nationwide Property Data Survey Programme (PDSP) building fabric dataset [16]. These models demonstrated a reasonable match to measured Display Energy Certificate (DEC) data. We have also incorporated airflow modelling into updated classroom models [17] to account for the effect of external temperature and wind on ventilation airflows.
Hence, school building stock models predictive of classroom performance should incorporate (a) energy and IEQ performance criteria, (b) pair-wise energy retrofit and IEQ improvement scenarios and (c) archetypes. On the basis of this multi-objective problem, we initiated the 'Advancing School Performance: IEQ, Resilience and Educational outcomes' (ASPIRE) project, funded by the UK Engineering and Physical Sciences Research Council (EPSRC). ASPIRE addresses whether high IEQ in UK school buildings, enhancing learning and health, can be incorporated into school buildings while achieving lower carbon emissions. A key output of this work is for us to create a toolkit to assess relative impact of different future scenarios on IEQ and energy criteria across the heterogeneous stock.

Research objectives
The aim of this study is two-fold:
1. To determine the extent to which combinations of pair-wise energy efficiency retrofit and IEQ improvement interventions influence comfort, health, attainment and energy performance criteria in classrooms across England and Wales.
2. To investigate the impact of building stock heterogeneity on the resilience of the best performing pair-wise intervention for each criterion.
The following research objectives address individual steps in response to the aims outlined above:
1. To generate model archetypes accounting for stock heterogeneity, and incorporating energy retrofit and IEQ improvement pair-wise scenarios.
2. To demonstrate optimum pair-wise retrofit and IEQ improvement scenarios for different performance criteria, considering four classroom orientations and three climate scenarios.
This paper presents our analysis of the results of the building stock-wide simulation of 195 archetypes of UK primary and secondary school classrooms, developed through the analysis of the PDSP database of over 18,000 schools, incorporating scenario modelling and performance criteria requirements discussed in the previous section.
In a forthcoming paper by the same authors, we will address quantitatively the resilience of the optimal performing pair-wise scenarios across multiple stakeholder criteria using multi-criteria decision analysis.
D. Grassie et al.
In this section, we describe previous school building research, first exploring the characterisation of building tools for integrating health, attainment and energy criteria in Section 2.1. We then discuss the use of building stock archetypes for stock-wide resolution in Section 2.2 and scenario modelling for dynamic resolution of future iterations of the stock in Section 2.3.

Multi-criteria school building-related research
Classroom monitoring studies have previously identified relationships and conflicts between energy efficiency and IEQ [12,18,19] in school buildings across Europe. Key findings, applicable to this study, are the need to incorporate within modelling:
-Airflow network modelling of the effect of ventilation type and rate [18], and hence the operation of ventilation, on energy performance.
-"Integrated design solutions" [12] combining both energy efficiency and IEQ improvements.
-Intra-day and intra-year investigation of periods when "external air can be more polluted than indoor air" [12].
-Simplified [19], rather than exhaustive "time consuming" construction of building simulation models, to sufficiently demonstrate differences in performance across archetypes and scenarios.
-Additional data such as "building envelope" and "renovation processes" [18], due to limitations of energy performance certificate (EPC) datasets.
The use of simulation modelling to scale up classroom monitoring to stock-level analysis has previously been limited to energy-related issues [20,21]. A key reason for not using models for IEQ prediction may be the empirical difficulty of separating out individual IEQ mechanisms [22]. It has been found that significant drops in attainment are usually a result of multiple IEQ factors [23]. Measured impacts have also varied across different attainment-based tasks [22], negating the value of such models in predicting a broad definition of attainment.
Due to these discrepancies between individual studies, meta-analysis [1], and surveys [24] have been used to derive effects of IEQ on student performance. Associations between cognitive performance and internal temperature [5] and ventilation rate [8] have been recently reported. By linking these findings to building simulation outputs [7], our research team has facilitated analysis of changes of climate, building typology and ventilation on cognitive performance. However, the school sector has not yet fully developed building models providing indicative outputs of attainment which could be coupled to modelling campaigns. Other UK sectors such as residential [25,26] and care homes [27] have focused on characterising buildings based on health effects of IEQ.
The relative importance of achieving various performance thresholds for each criterion is a key issue when generating multiple criteria (such as health and attainment measures) through a common modelling framework. While determining performance of buildings in terms of health, attainment and energy criteria requires definition of acceptable thresholds, there are three main categories of criteria available:
1. Criteria with defined exposure limits due to the need to drive policy to address specific health concerns. Such criteria and associated limits include external contaminants [3] and overheating/internal CO2 [4].
2. Criteria which require minimising/maximising and have a common scale for measurement and prediction but no current defined limit. This includes energy use, where benchmarking [28] is being used to track relative progress of individual buildings against the energy performance of the stock as a whole.
3. Criteria which require minimising/maximising, for which building simulation can predict relative changes in performance, but which are incompatible with measurements. This includes attainment, where acceptable thresholds or common measures have not yet been defined since direct measurement is not possible and proxies (such as exam performance) have to be created and used instead.
The clearer definition of standards of criteria in category 1) may make them easier to address at the expense of harder to report criteria in category 3). Hence, deriving cross-criteria performance of schools is critical in order to ensure consideration of less quantifiable but equally important indicators of performance. Studies which have focussed ostensibly on energy in different era buildings [29] mentioned changes in IEQ as being a driving force to more fresh air, and hence greater heating requirements in case studies. However, while overheating and indoor air quality have been analysed in recently built schools, these have been done in isolation to energy use rather than concurrently [30].

Stock-wide resolution of comfort, health, attainment and energy use
Energy [31,32] and IEQ criteria [26,33] have been simulated separately for entire districts. This has been achieved by coupling governmental census [31], land-use [32] and housing surveys [26,33] with geometric or contaminant datasets. These studies demonstrate the use of single criterion models across a diverse set of buildings comprising entire sectors. However, analysis of energy retrofitting across different typologies has only been possible where occupancy patterns [31] have been harmonised, and in this case only energy, and not IEQ measures, have been investigated.
A trade-off between more detailed archetypes, with smaller sample sizes within each archetype, and a greater number of buildings within less detailed archetypes has been demonstrated within residential building stock models [34]. While the former may constrain the significance of analysis of policy decisions due to small samples, the latter may suffer from archetypes being insufficiently descriptive, due to averaging across a large number of heterogeneous buildings [35]. A second trade-off is that, while approximation of building fabric within archetypes may prevent calibration at an individual building level, demonstrating dynamic impacts of changes to the stock, rather than a static description, can achieve engagement with policymakers [36].

Dynamic resolution utilising scenario modelling
Scenario modelling had been used previously to investigate energy use and internal temperatures concurrently for a range of operational measures for school buildings in a changing climate [37]. Although operational measures were added iteratively to the process, there was no indication of how this would change at stock-level once heterogeneity was accounted for. Governmental resilience modelling in the UK [38] accounted for this heterogeneity by including geometric changes, such as orientation, room height and glazing ratio, within a single baseline model of a real school. Relevant stock-wide measures were designed for mitigating overheating through worsening climate scenarios. However, stock-wide scenarios were IEQ improvement measures alone, such as night ventilation and use of thermal mass, rather than a mixture with retrofit. Similarly, a schools model in Karlsruhe, Germany [39] describes optimal overheating mitigation efforts by stock typology, but without further pairing with the degree of retrofit.
While school building datasets offer stock-wide coverage, this tends to be limited to condition and age [16], restricting the scope of stock archetypes to fabric rather than operational practice. In the comparable care-home sector, a scenario modelling approach [27] both individualised and combined "hard" fabric alterations to improve energy efficiency and "soft" operational measures to improve IEQ. These were provided as a matrix of options, and both future climate files and heatwave periods were used to analyse resilience rather than performance over a typical operational year.
School building stock models have been used previously to simulate energy and IEQ criteria individually. Converting these into health and attainment has proved challenging due to the difficulty of separating out individual effects and the lack of agreed metrics. Rather than showing a single snapshot, relative movements in these criteria could be captured by considering dynamic changes, provided that the impact of retrofit and IEQ improvement measures is considered within the same framework. We have differentiated between the measurability of different criteria against existing standards and their relevance to performance. Trade-offs in the use and granularity of archetypes to approximate the stock have also been identified as key elements. The combination of the three dimensions of resolution shown in Fig. 1, namely determining the performance of multiple criteria across the heterogeneous stock under future scenarios, is the basis of the novel approach reported in Section 3.

Construction of building simulation models
We have provided an overview of the process of automating 42,120 unique simulations of UK school buildings in Fig. 2, which is described in detail in the following section. From a single EnergyPlus [40] seed model, we incorporated four different orientations within each model. 195 archetype descriptions and 24 pair-wise scenarios were then auto-generated by utilising Python scripting [41] on UCL's Myriad High Performance Computing module, coupled with the EPPY library [42]. We batch simulated the auto-generated models for three different indoor contaminant types (CO2, NO2 and PM2.5) and three climate scenarios. We then carried out post-processing of the output files, also using Python, to derive the six performance criteria identified in Section 3.3.
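The total of 42,120 runs is consistent with the product of the counts given above, assuming the four orientations sit inside each model file and so do not multiply the number of runs. The following minimal sketch enumerates that batch; the integer identifiers are placeholders, not the study's file names.

```python
from itertools import product

# Counts taken from the text; the identifiers themselves are placeholders.
ARCHETYPES = range(195)          # archetype descriptions
SCENARIOS = range(24)            # pair-wise retrofit/IEQ scenarios
CONTAMINANTS = ("CO2", "NO2", "PM2.5")
CLIMATES = ("2020s", "2050s", "2080s")

# One simulation per combination; orientations are assumed to be handled
# within each model rather than as separate runs.
runs = list(product(ARCHETYPES, SCENARIOS, CONTAMINANTS, CLIMATES))
n_runs = len(runs)
print(n_runs)  # 42120
```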

Seed model construction
We created an EnergyPlus seed model with the geometry given in Fig. 3, approximating a single-side ventilated classroom. This is located within a block of classrooms inside a school building, with adjacent rooms above, below and on either side. We created an airflow network model as described previously [17], defining the external wall as shown in Fig. 3, with glazing properties and opening scheduling defined in the following sub-sections.
Table A-1 in the Appendix provides details of internal gains, defined by Building Bulletin 101 (BB101) [4], which we implemented within the model to facilitate heating demand and overheating calculations during heating and cooling seasons respectively.
UK school buildings are required to be open for at least 190 out of the 260 weekdays of the year [43]. The exact timing and number of holidays vary from region to region, and annually depending on the month in which Easter falls. In addition, the extent of occupancy during holiday periods varies, since some schools will be closed whereas others will continue to operate children's clubs. During the cooling season, the BB101 overheating calculation [4] specifically requires classrooms to be occupied during weekdays from 1st May to 30th September. This neglects the summer holiday period of around 6 weeks from July to September in England. We decided to follow this occupancy pattern consistently throughout the entire year, since it is not possible to match 190 occupied weekdays without removing up to 70 weekdays from the heating period from 1st October to 30th April, which would underestimate the requirement for space heating. An alternative approach could be to combine BB101 with the National Calculation Methodology (NCM) [44] for occupancy during the heating period. However, assumptions around internal loads differ between the NCM and BB101, and it was accepted that, due to the occupied summer period, annual heating calculations will not exactly match annual heating load in practice. As discussed in Section 2, the model has been defined to simulate relative changes in criteria between two model results rather than to define a baseline for a single model. Hence we decided that adding further assumptions and complexity about when the school is unoccupied is likely to reduce rather than enhance comparability of results. Therefore Table A-1 refers to a year-round schedule of occupancy on weekdays, rather than including holiday periods.

Archetype generation
We constructed archetypes by first incorporating phase of education (primary/secondary) and era of construction, shown below in Table 1, to include the following features:
-Categorisation by era was based upon previous life-cycle analysis [45].
-Use of cavity walls for post-war archetypes was based upon the practice in UK buildings at the time [46].
-Double-glazed windows were included in the most recent Post-1976 archetype only [47].
-Floor to soffit heights were implemented based on the Resilient School Buildings Programme [38].
-Glazing ratios were derived from a weighted average by floor area of all glazing ratios within each of the 10 combinations of primary/secondary school and construction era within the PDSP [16].
Variation of airtightness with construction age has not been found to be significant at the 95% confidence level [48]; hence we fixed permeability at 9 m³/h/m² @ 50 Pa, representing a "normal" school [49].
For the second stage of archetype construction, we interrogated the PDSP database to determine the number of schools in the UK under four different categorisations (phase of education, era of construction, geographical region and ventilation type). Where a combination was present, we generated the relevant archetype from the appropriate phase/era seed model. We used geographical region, defined as degree-day regions in energy benchmarking [28] and shown in Appendix Figure A-1, to select the regional weather files used for simulation, as described in Section 3.2, rather than to alter model construction.
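The "generate only if present" step can be sketched as a count over the four categorisations. This is a minimal illustration under stated assumptions: the records, field names and values are hypothetical stand-ins, not the actual PDSP schema.

```python
from collections import Counter

# Hypothetical records standing in for PDSP rows; the field names are
# illustrative and are not the real PDSP column names.
schools = [
    {"phase": "primary", "era": "Pre-1918", "region": "Thames Valley", "ventilation": "natural"},
    {"phase": "primary", "era": "Pre-1918", "region": "Thames Valley", "ventilation": "natural"},
    {"phase": "secondary", "era": "Post-1976", "region": "Midlands", "ventilation": "mechanical"},
    {"phase": "secondary", "era": "1945-1967", "region": "West Pennines", "ventilation": "natural"},
]
CATEGORIES = ("phase", "era", "region", "ventilation")

def count_archetypes(records):
    """Count schools under each combination of the four categorisations."""
    return Counter(tuple(r[c] for c in CATEGORIES) for r in records)

archetype_counts = count_archetypes(schools)
# Only combinations that actually occur in the data become archetypes.
archetypes_to_generate = sorted(archetype_counts)
```

Each key in `archetypes_to_generate` would then be mapped to the matching phase/era seed model, with the region used only to pick the weather file.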
Although our research group have previously categorised around 95% of over 18,000 premises in the PDSP database as naturally ventilated [50], the ASPIRE team auto-generated separate mechanical and naturally ventilated archetypes, where available within the PDSP. Since the PDSP has not recorded any assumptions about the mechanical ventilation system other than condition, we assumed that both infiltration and ventilation rates meet design standards, parameters of which are given within Appendix Table A

Scenario modelling
We have created a matrix of 24 pair-wise combinations of four energy retrofit and six IEQ improvement scenarios. Table 2 summarises progressive improvements which we applied to fabric to the base (Base) description from Table 1, to Building Regulations (MinR), intermediate (IntR) and EnerPHIt (EnPH) retrofit scenarios. The final row, indicating heating system efficiencies, is described in Section 3.3 on post-processing. Wall and window U-values corresponding to each retrofit scenario are summarised in Appendix Table A Table 3.
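The 4 x 6 matrix can be written out as a Cartesian product. The sketch below uses the scenario abbreviations that appear in the results (Base, MinR, IntR, EnPH; BaseOp, ExtSha, KpHtOt, PasVen, Manage, Cumtve); the hyphenated label format mirrors names such as Base-Cumtve, and the variable names are illustrative only.

```python
from itertools import product

RETROFITS = ("Base", "MinR", "IntR", "EnPH")
IEQ_MEASURES = ("BaseOp", "ExtSha", "KpHtOt", "PasVen", "Manage", "Cumtve")

# Every retrofit level paired with every IEQ improvement option.
pairwise_scenarios = [f"{r}-{m}" for r, m in product(RETROFITS, IEQ_MEASURES)]
print(len(pairwise_scenarios))  # 24
```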

Simulation
We simulated all 24 scenarios for each of the 195 archetypes at hourly timesteps over a calendar year for three different contaminant types (CO2, NO2 and PM2.5) and a range of variables, summarised later in Table 4. We carried out all simulations in parallel using the Myriad High Performance Computer (HPC) module to generate all output files in around 4 h. For each model, we selected three weather files, representing three climatic scenarios (2020s, 2050s and 2080s) corresponding to the relevant geographical region. We sourced weather files from CIBSE's current and future weather files, based on UKCP09 climate projections [53], for which EnergyPlus weather files are readily available in a suitable format for simulation. Table A-4 in the Appendix describes the process which was used to select and create hybrid weather files.
Internal CO2 concentration was calculated by EnergyPlus at each hourly timestep as a function of CO2 generated by occupants (based on Table A-1), diluted by ingress of external air. We fixed the current external and initial classroom concentration at 435 ppm for the 2020s, based on the 415 ppm 2020-21 average value [54] with a 20 ppm uplift to account for the urban setting [55]. Since the Medium emissions case corresponds to the A1B IPCC scenario [56], we calculated the 2050s and 2080s CO2 concentrations, including urban uplift, as 552 (532 + 20) ppm and 669 (649 + 20) ppm respectively.
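The external CO2 values above follow from adding the urban uplift to the background concentration for each climate scenario. As a small worked check (the dictionary and constant names are illustrative):

```python
URBAN_UPLIFT_PPM = 20  # uplift for urban setting, from the text

# Background concentrations (ppm) quoted in the text for each scenario.
BACKGROUND_PPM = {"2020s": 415, "2050s": 532, "2080s": 649}

external_co2_ppm = {
    scenario: background + URBAN_UPLIFT_PPM
    for scenario, background in BACKGROUND_PPM.items()
}
print(external_co2_ppm)  # {'2020s': 435, '2050s': 552, '2080s': 669}
```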
Since EnergyPlus does not permit external contaminant concentrations to vary on an hourly basis, hourly internal NO2 and PM2.5 concentrations were calculated using Python for each simulation during post-processing by multiplying:
-External concentration, specific to geographical region and contaminant.
-EnergyPlus-calculated indoor-outdoor ratios (I/O ratio) for each simulation.
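The multiplication described above is an element-wise product of two hourly series. A minimal sketch, assuming both series are already aligned hour by hour (the function name and sample values are hypothetical):

```python
def internal_concentration(external, io_ratio):
    """Hourly indoor concentration = hourly outdoor concentration x I/O ratio."""
    if len(external) != len(io_ratio):
        raise ValueError("external and io_ratio must cover the same hours")
    return [c_out * ratio for c_out, ratio in zip(external, io_ratio)]

# Illustrative values only: outdoor concentration in ug/m3, I/O ratio dimensionless.
outdoor_ugm3 = [20.0, 30.0, 40.0]
io_ratios = [0.5, 0.25, 0.5]
indoor_ugm3 = internal_concentration(outdoor_ugm3, io_ratios)
```

Keeping both inputs hourly preserves the intra-day peaks that averaged data would smooth out.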
In the absence of reliable estimates, and with the viability of future reduction strategies [57] outside the scope of this work, we used the same external concentration figures across all three climate scenarios. We used hourly rather than monthly or weekly averaged external concentration and I/O ratio data, since intra-day variations and week-long peaks of high NO2 or PM2.5 could be lost at lower resolutions. Fig. 4 illustrates this point, showing intra-day and weekly variation of both contaminants over a typical fortnight.
We defined the following process to calculate hourly internal NO2 and PM2.5 concentrations:
1. We downloaded raw external hourly data for all sites for which 2019 hourly data was available via the UK Department for Environment, Food and Rural Affairs (DEFRA) monitoring website [57]. We then grouped all hourly annual data by contaminant (2) and geographic region (13) to create 26 sub-sets of data. We replaced gaps in the data with preceding values and, for the few regions without monitoring stations, used data from adjacent regions.
2. We plotted all data from each sub-set to determine which single monitoring station most consistently provided values closest to the typical median for each respective contaminant and region.
We then calculated the I/O ratio of each contaminant, for each region, through windows and surfaces at each hourly timestep by setting external concentration to 1. For surfaces, EnergyPlus calculates flow using deposition velocities based on surface area and experimentally derived deposition rates of 0.87/h [58] for NO2 and 0.19/h [59] for PM2.5.
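The gap filling and station selection in the process above can be approximated numerically. The study selected the representative station by plotting the data; the sketch below substitutes a simple distance-to-median rule, which is our assumption, and all names and values are hypothetical.

```python
from statistics import median

def forward_fill(values):
    """Replace missing readings (None) with the preceding value.
    A leading gap stays None because there is nothing to carry forward."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def most_typical_station(series_by_station):
    """Assumed numeric stand-in for the visual check: pick the station whose
    hourly values lie closest, in total, to the cross-station hourly median."""
    stations = list(series_by_station)
    n_hours = len(series_by_station[stations[0]])
    hourly_median = [
        median(series_by_station[s][h] for s in stations) for h in range(n_hours)
    ]
    def total_distance(station):
        return sum(
            abs(series_by_station[station][h] - hourly_median[h])
            for h in range(n_hours)
        )
    return min(stations, key=total_distance)

filled = forward_fill([12.0, None, None, 15.0])
typical = most_typical_station(
    {"A": [10.0, 10.0, 10.0], "B": [20.0, 21.0, 19.0], "C": [50.0, 50.0, 50.0]}
)
```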

Post-processing and performance criteria derivation
We identified six performance criteria through engagement with a Project Advisory Group consisting of governmental, academic and industry stakeholders: attainment, overheating, stuffiness, air quality, energy cost and emissions. Table 4 describes the calculation of the metrics for the six criteria, with air quality comprising both NO2 and PM2.5 components and overheating reduced to a single criterion to simplify quantitative comparison. Using Python scripting, we coded the relationships to derive these metrics from the files of hourly output data created by each individual EnergyPlus simulation.
For attainment, it is important to note that the attainment calculation is valid only within specific temperature (20-28 °C) and ventilation rate (2-7 l/s/person) ranges. This is due to the availability of empirical data, and may limit the viability of such measures in extreme heat or high ventilation scenarios. Although a relative measure, the percentage result is related to reference conditions of 20 °C and 7.35 l/s/person, at which the attainment metric equals 100%.
To calculate air quality criteria, we used the same methodology, using I/O ratios from EnergyPlus, coupled with penetration factors and external concentrations, as presented previously [17]. Where the metric definitions refer to them, f is floorspace (m²) and η is the efficiency factor or COP. For energy cost and emissions, we used the following parameters for energy costs and carbon intensities:
-We derived unit gas and electricity costs of £0.08/kWh and £0.30/kWh respectively from the USave consumer website in July 2022 [60] and kept them constant for later periods.
-We used a heating carbon intensity factor of 203 gCO2/kWh, representative of domestic gas supply, for all scenarios. Grid electricity carbon intensity factors are 151.5, 51.8 and 0 gCO2/kWh for the 2020s, 2050s and 2080s scenarios, in line with the "Steady progression" scenario from National Grid [61] for decarbonising the grid. Although the UK government's Clean Growth Strategy [62] includes a move to electrical heat pumps, the grid/fuel mix has been maintained except where part of the EnerPHIt retrofit strategy.
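To show how these unit costs and carbon intensities might combine with a simulated heating demand, the sketch below divides demand by the efficiency factor or COP to get delivered energy, then applies the fuel-specific factors. The exact metric definitions are those of Table 4; this form, the function name, and the example demand and efficiencies are assumptions for illustration only.

```python
GAS_COST_GBP_PER_KWH = 0.08
ELEC_COST_GBP_PER_KWH = 0.30
GAS_CARBON_G_PER_KWH = 203.0
GRID_CARBON_G_PER_KWH = {"2020s": 151.5, "2050s": 51.8, "2080s": 0.0}

def annual_cost_and_emissions(heating_demand_kwh, efficiency, fuel, climate):
    """Return (cost in GBP, emissions in kgCO2) for one year of heating.
    Assumed form: delivered energy = heating demand / efficiency (or COP)."""
    delivered_kwh = heating_demand_kwh / efficiency
    if fuel == "gas":
        unit_cost, carbon = GAS_COST_GBP_PER_KWH, GAS_CARBON_G_PER_KWH
    elif fuel == "electricity":
        unit_cost, carbon = ELEC_COST_GBP_PER_KWH, GRID_CARBON_G_PER_KWH[climate]
    else:
        raise ValueError(f"unknown fuel: {fuel}")
    return delivered_kwh * unit_cost, delivered_kwh * carbon / 1000.0

# Hypothetical classroom: 1000 kWh heating demand, boiler efficiency 0.8,
# heat pump COP 2.5. None of these three numbers comes from the study.
boiler_cost, boiler_kg = annual_cost_and_emissions(1000.0, 0.8, "gas", "2020s")
pump_cost, pump_kg = annual_cost_and_emissions(1000.0, 2.5, "electricity", "2020s")
```

With these illustrative inputs the heat pump emits less but costs more per year, because the electricity unit price is several times the gas price.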
As discussed in Section 2.1, the derivation of implications for policymakers requires visibility on the regulations and standards which exist to drive improvements in the above criteria:
-Air quality: The World Health Organisation (WHO) [3] has recently updated annual air quality guidelines on NO2 and PM2.5 to 10 μg/m³ and 5 μg/m³ respectively.
-Overheating: BB101 [4] defines overheating in terms of three criteria, of which the first, annual hours of exceedance, provides the minimum requirement for assessing overheating risk at under 40 h annually. The other two criteria, daily weighted exceedance and upper limit temperature, while used to report extent of overheating, are indicative of peak short-term discomfort during heatwaves and as such are less indicative of performance over the year.
-Stuffiness: "sufficient outdoor air should be provided to achieve a daily average concentration of CO2 of less than 1500 ppm" [4]. An equivalent figure of 1000 ppm for mechanically ventilated classrooms is also described.
-Energy: Annual fossil fuel energy use intensity per unit floorspace in kWh/m² could be used for benchmarking individual school buildings against a "typical" school [28]. For this research, however, we could not convert classroom heating demand into a whole building figure without making many assumptions about the use of the rest of the building. Instead, our research team previously validated whole building models constructed with a similar methodology against DEC data to demonstrate the robustness of simulation methods [50] across the stock.
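The thresholds listed above can be collected into simple pass/fail checks. This is a hedged sketch: the function names are hypothetical, and treating the guideline values as strict "less than" or "not exceeding" tests is our reading of the wording rather than a statement of how compliance was assessed in the study.

```python
WHO_ANNUAL_LIMIT_UGM3 = {"NO2": 10.0, "PM2.5": 5.0}
OVERHEATING_LIMIT_HOURS = 40
CO2_DAILY_LIMIT_PPM = {"natural": 1500.0, "mechanical": 1000.0}

def meets_air_quality(contaminant, hourly_ugm3):
    """Annual mean concentration must not exceed the WHO guideline value."""
    annual_mean = sum(hourly_ugm3) / len(hourly_ugm3)
    return annual_mean <= WHO_ANNUAL_LIMIT_UGM3[contaminant]

def meets_overheating(annual_exceedance_hours):
    """BB101 first criterion: under 40 h of exceedance per year."""
    return annual_exceedance_hours < OVERHEATING_LIMIT_HOURS

def meets_stuffiness(daily_average_co2_ppm, ventilation):
    """Daily average CO2 below the limit for the ventilation type."""
    return daily_average_co2_ppm < CO2_DAILY_LIMIT_PPM[ventilation]

# Illustrative inputs only.
checks = {
    "overheating_39h": meets_overheating(39),
    "overheating_253h": meets_overheating(253),
    "co2_1200_natural": meets_stuffiness(1200.0, "natural"),
    "co2_1200_mechanical": meets_stuffiness(1200.0, "mechanical"),
    "no2_mean_8": meets_air_quality("NO2", [6.0, 8.0, 10.0]),
}
```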

Format and weighting of analysis
Since performance criteria vary across archetypes, climates and orientations as well as across scenarios, a common normalised scale is required to show the best-performing pair-wise scenario when aggregated, as in Section 4.1. Hence, we created a normalised criteria performance, θ_pcalo, for every pair-wise scenario, p, to facilitate evaluation of each criterion, c, across a set of models, alo, representing a specific combination of archetype, a, climatic scenario, l, and orientation, o:
Since attainment is the only criterion which requires maximising rather than minimising, we used the following formula, for when c = attainment, so that the best performing pair-wise scenario has θ_pcalo = 0 across all criteria.
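One normalisation consistent with this description is min-max scaling across the pair-wise scenarios for a fixed criterion, archetype, climate and orientation, flipped for attainment so that the best scenario scores 0. This is an assumption for illustration and is not necessarily the formula used in the study; the function name and sample values are hypothetical.

```python
def normalise(values_by_scenario, maximise=False):
    """Min-max scale one criterion across pair-wise scenarios for a single
    archetype/climate/orientation combination. Best scenario -> 0, worst -> 1.
    Assumed form, not necessarily the study's own formula."""
    lowest = min(values_by_scenario.values())
    highest = max(values_by_scenario.values())
    span = highest - lowest
    if span == 0:
        # All scenarios perform identically, so all are equally "best".
        return {p: 0.0 for p in values_by_scenario}
    if maximise:  # e.g. attainment, where higher is better
        return {p: (highest - x) / span for p, x in values_by_scenario.items()}
    return {p: (x - lowest) / span for p, x in values_by_scenario.items()}

# Illustrative values only.
overheating_hours = {"Base-BaseOp": 200.0, "Base-Cumtve": 80.0, "MinR-BaseOp": 140.0}
attainment_pct = {"Base-BaseOp": 97.0, "Base-KpHtOt": 99.0, "EnPH-BaseOp": 95.0}

theta_overheating = normalise(overheating_hours)
theta_attainment = normalise(attainment_pct, maximise=True)
```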
In Table A-5 of the Appendix we derived the frequency of occurrence of the 195 era-geographic archetypes within the PDSP, by ventilation type and phase. This provides weighting factors for each archetype in the following analysis, with archetypes constituting <0.1% of the floorspace in the PDSP excluded from the analysis. While the finding that naturally ventilated primary schools comprise 72.5% of the entire stock is not surprising, a considerably higher proportion of secondary school floorspace is mechanically ventilated (roughly 50:50) than of primary school floorspace (roughly 1 in 30). The three regions of Thames Valley (26.1%), West Pennines (15.6%) and Midlands (14.3%) comprise just over half of the stock, and the Pre-1918 and 1945-1967 era-archetypes are the most common, representing around half the stock. We have considered this propensity when simplifying the simulation results deck in Section 4.2.
For overheating, the top performing scenario is the least energy retrofitted option (Base), which provides the most consistent movement of heat and airflow through fabric, coupled with all four IEQ improvement measures within the cumulative (Cumtve) scenario. In terms of the relative impact of individual measures coupled to the base retrofit, use of albedo and blinds (KpHtOt) and external shading (ExtSha) appear most effective relative to passive ventilation (PasVen) and use of thermal mass (Manage). However, as the degree of retrofit increases, passive ventilation becomes more effective at mitigating overheating. This is indicative of the greater relative importance of night-time heat retention over day-time heat absorption in breaches of the overheating threshold during occupied day-time hours. The NO2 air quality criterion, as demonstrated in Appendix Figure A-2, displays similar trends to overheating in strongly favouring the leakier, non-energy retrofitted base case, since this maximises ingress of air at night time when pollutant levels are low.
Based on Fig. 6, the un-retrofitted base option provides the least negative impact on attainment. The presence of a separate set of outliers indicates differences for the less prevalent mechanically ventilated schools. In the equivalent plot for stuffiness in Appendix Figure A-3, a converse preference is demonstrated for the EnPH retrofit, since natural ventilation is utilised for a larger proportion of the year than in schools retrofitted to a lower energy efficiency standard. Aside from a 10 min forced purge period at the start of each occupied hour as indicated in Table A-1, the only means of CO2 removal is when cooling is required above the setpoint of
In terms of IEQ measures, for both criteria the use of passive night ventilation for overheating mitigation appears to have a negative effect. This is due to lower required ventilation rates when the classroom is occupied and ventilation rate being independent of CO2 concentration. A very marginal relative difference between the other three IEQ improvement measures (ExtSha, KpHtOt, Manage) and base operation demonstrates the impact of attainment having dual dependence on internal temperatures and ventilation rate, which are interdependent in the model.
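The floorspace weighting from Table A-5 described earlier in this section, including the exclusion of archetypes below 0.1% of floorspace, can be sketched as a weighted mean of normalised scores. Renormalising the weights after the exclusion is our assumption, and the archetype labels, scores and floor areas are hypothetical.

```python
MIN_FLOORSPACE_SHARE = 0.001  # archetypes under 0.1% of floorspace are excluded

def weighted_mean_score(score_by_archetype, floorspace_by_archetype):
    """Floorspace-weighted mean of a normalised score across archetypes,
    dropping archetypes below the minimum share and renormalising (assumed)."""
    total = sum(floorspace_by_archetype.values())
    kept = {
        a: area
        for a, area in floorspace_by_archetype.items()
        if area / total >= MIN_FLOORSPACE_SHARE
    }
    kept_total = sum(kept.values())
    return sum(score_by_archetype[a] * area for a, area in kept.items()) / kept_total

# Illustrative archetypes, floor areas (m2) and normalised scores.
floorspace_m2 = {"A": 600.0, "B": 399.5, "C": 0.5}
scores = {"A": 0.2, "B": 0.4, "C": 1.0}
stock_score = weighted_mean_score(scores, floorspace_m2)
```

Here archetype C holds 0.05% of the floorspace, so its score does not contribute to `stock_score`.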

Performance of each pair-wise scenario by criterion
Although some degree of retrofit is highly preferable, Fig. 7 shows the extent to which the middle-range MinR and IntR retrofit cases are sufficient to minimise heating without resorting to the expensive EnerPHIt (EnPH) case. As shown in the equivalent energy costs plot (Appendix Figure A-4), the relatively higher cost per unit (despite differences in efficiencies) of using the heat pump in the EnerPHIt option skews costs to make it less favourable. The IEQ improvement measures appear to be significant only for non-retrofitted scenarios.

Impact of school stock features on performance
We simplified the simulation results deck to investigate the influence of stock heterogeneity on the performance of each criterion for the top individual pair-wise scenarios. 36 different archetype-orientation combinations were retained by consolidating the following factors, based on the propensity derived in Section 3.4:
-Four orientations have been reduced to the two most extreme (North/South facing)
-Three climatic scenarios have been reduced to the two most recent (2020s/2050s)
-Five construction eras have been reduced to the three most diverse (Pre-1918, 1945-1967, Post-1976)
-Thirteen geographical regions have been reduced to the three most diverse, representing:
  o South Eastern - based on London Heathrow weather (Z01)
  o Western - Birmingham (Z06)
  o Northern - Leeds (Z10)
Using the North facing, 2020s climate, Pre-1918 construction, London Heathrow classroom as the reference baseline, multiple linear regressions have been carried out for each criterion for the best performing pair-wise scenario, as presented in Table 5, row by row. Regression coefficients and p-values for the multiple linear regressions carried out for each row in Table 5 are given in Appendix Table A-6. For each regression, absolute values of each criterion have been used as response variables and each of the four factors above was utilised as a dummy explanatory variable. To test the resilience of the best performing pair-wise scenario across the four factors, two regressions have been carried out for each criterion, with the Base-BaseOp scenario providing the baseline comparison.
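The dummy-variable regression described above can be illustrated with ordinary least squares over the 36 combinations, using the reference levels as the baseline so that the intercept is the baseline value. The sketch is pure Python and fits synthetic data generated from made-up coefficients; it is not the study's code, data or results.

```python
from itertools import product

# First level of each factor is the reference baseline.
FACTORS = {
    "orientation": ("North", "South"),
    "climate": ("2020s", "2050s"),
    "era": ("Pre-1918", "1945-1967", "Post-1976"),
    "region": ("London", "Birmingham", "Leeds"),
}

def dummy_row(combo):
    """Intercept plus one 0/1 column per non-reference factor level."""
    row = [1.0]
    for level, levels in zip(combo, FACTORS.values()):
        row.extend(1.0 if level == other else 0.0 for other in levels[1:])
    return row

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

combos = list(product(*FACTORS.values()))   # 2 x 2 x 3 x 3 = 36 combinations
rows = [dummy_row(c) for c in combos]
# Made-up coefficients: intercept, South, 2050s, 1945-1967, Post-1976,
# Birmingham, Leeds. Used only to generate synthetic responses.
TRUE_BETA = [200.0, 40.0, 60.0, -30.0, -50.0, -90.0, -120.0]
y = [sum(b * x for b, x in zip(TRUE_BETA, row)) for row in rows]
beta = ols(rows, y)
```

Because the synthetic responses contain no noise, `beta` recovers `TRUE_BETA` up to rounding; each coefficient reads as the change from the reference classroom when one factor level changes.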
Based on Section 4.1, we have identified the following four retrofit-IEQ improvement pair-wise scenarios as generally top-performing for various individual criteria, and have included them together with baselines in the second column descriptions.
- Base retrofit paired with Cumulative operational scenario (Base-Cumtve) for Overheating, NO 2 and PM 2.5 exposure
- Base-KpHtOt for Attainment
- EnPH-Manage for Stuffiness
- MinR-Cumtve for Energy costs and emissions
From left to right, Table 5 contains the following results of the regression.
- In the third column, we have included a comparison of the intercepts of each criterion's regression when applying the best retrofit-operational pair-wise scenario instead of the baseline scenario. For example, the baseline of 253 annual overheating hours is reduced to 100 h by applying all 4 operational measures via the Base-Cumtve scenario.
- The absolute change in each criterion is given by the regression coefficients of each individual change in factor, across all 36 different archetype-orientation combinations, for both baseline and top-performing pair-wise scenarios. For example, overheating is reduced by 93 h by moving from London to Birmingham under the baseline scenario (Base-BaseOp), and by 111 h under the best performing scenario (Base-Cumtve).
- The Adjusted R squared value is given for each regression, noting that the impacts of the Climate factor for the Energy cost and Emissions criteria are not significant at the 95% confidence level (shown in grey), while all other values are.
- The final two columns denote the colour coding applied to indicate improvement/deterioration and percentage change in each criterion. We selected the range arbitrarily to facilitate comparison of the relative impact of different factors within each criterion and across different criteria. We have applied these both to percentage change:
  o From Base-BaseOp to best-performing scenario in adjacent rows of column 3.
  o Due to factor changes (columns 4-9) from baseline (column 3 of same row).
Let us consider the first row, corresponding to the "Base-BaseOp" combination intervention. For the reference classroom, the number of overheating hours (the regression intercept) is 252.7. On average, the change relative to the London region is −93 h for Birmingham and −103 h for Leeds (assuming all other explanatory variables are at the 'reference' value). In terms of construction era, on average the change relative to the pre-1918 construction era is +150 h for the 1945-1967 era and +128 h for the post-1976 era. In terms of climate, on average the change relative to the 2020s is +54 h for the 2050s. Finally, in terms of class orientation, on average the change relative to north facing is +211 h for south facing.
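The reading of the first row can be reproduced with a short calculation. The sketch below uses only the Base-BaseOp overheating coefficients quoted above and assumes a purely additive model with no interaction terms; the function and variable names are illustrative, and the results are fitted regression values from rounded coefficients, not simulation outputs.

```python
from itertools import product

# Base-BaseOp overheating coefficients as quoted in the text (hours per year).
# The reference level of each factor carries a coefficient of zero.
INTERCEPT = 252.7
EFFECTS = {
    "region": {"London": 0.0, "Birmingham": -93.0, "Leeds": -103.0},
    "era": {"Pre-1918": 0.0, "1945-1967": 150.0, "Post-1976": 128.0},
    "climate": {"2020s": 0.0, "2050s": 54.0},
    "orientation": {"North": 0.0, "South": 211.0},
}
FACTORS = ("region", "era", "climate", "orientation")


def predict_overheating(region, era, climate, orientation):
    """Additive dummy-variable estimate of annual overheating hours."""
    levels = (region, era, climate, orientation)
    return INTERCEPT + sum(EFFECTS[f][lvl] for f, lvl in zip(FACTORS, levels))


# Enumerate every retained archetype-orientation combination (3 x 3 x 2 x 2).
combos = list(product(*(EFFECTS[f] for f in FACTORS)))
predictions = {c: predict_overheating(*c) for c in combos}
worst = max(predictions, key=predictions.get)

print(len(combos))
print(worst, round(predictions[worst], 1))
```

Under this additive reading, the enumeration covers the 36 retained combinations, and the most exposed one is the South facing, 2050s, 1945-1967, London classroom at roughly 668 h, since London is the reference region and every other adverse coefficient is positive.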
Since the direction of change in each criterion is fairly intuitive for most factors in Table 5, the relative magnitudes between factors are of most interest, with the following salient points for each criterion:
- Overheating: North/South orientation shifts from being the most important factor for overheating in the baseline to the least significant factor in the best-performing scenario. As suitable operational measures are applied, external climate and heat transfer become more significant.
- NO 2 /PM 2.5 annual averages: The impact of the Cumtve scenario (due to use of night ventilation when there is less traffic) and of location is relatively higher for NO 2 than for PM 2.5 . This is due to the daily cyclical nature of NO 2 and its sensitivity to location, compared to the greater persistence of PM 2.5 concentrations, documented in Fig. 4.
- Attainment: 1945-1967 era construction has been shown to be at least 3x more detrimental to attainment than any other factor. Lower floor to ceiling heights relative to pre-1918 classrooms make it more difficult to fully mitigate this effect using increased albedo alone to keep heat out. This is due to the reliance of attainment on both low internal temperatures and high ventilation rates, which although interdependent, are both boosted by having additional air volume.
- Stuffiness: Factors which improve heat retention (using EnerPHit, being South-facing) appear most positive, since they facilitate temperature-driven ventilation for a greater proportion of the year. However, there are a couple of notable exceptions. The negative impact of low ceilings in the 1945-1967 archetype outweighs increases in insulation. Also, increasing external CO 2 (by 117 ppm) in the 2050s is a more dominant effect than the increased use of ventilation, which would otherwise lead to lower internal CO 2 concentration.
- Energy cost/emissions: The improvement in energy criteria performance by using the MinR retrofit is equivalent in magnitude to the improvement seen through the 1945-1967 and post-1976 era-archetypes. However, other contextual factors such as orientation, region and climate are less critical. These have been quantified respectively at roughly 1/3, 1/4 and 1/6 of the impact in absolute energy demand terms, relative to the baseline.
In terms of reference to existing standards described in Section 2.1, Table 5 shows that all but a small number of cases (usually Pre-1918 Leeds cases with the Base-Cumulative scenario) breach existing guidelines of 10 μg/m 3 for NO 2 , 5 μg/m 3 for PM 2.5 and 40 h for annual overheating. This demonstrates a further need for optimisation of timing and temperature of fresh air provision in terms of setpoints for external temperature and contaminant concentration.
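The guideline comparison can be expressed as a simple threshold check. The sketch below hard-codes the three limits quoted above; the function name, the dictionary keys and the strict "greater than" comparison are assumptions, and the NO 2 and PM 2.5 inputs in the example are hypothetical values chosen for illustration rather than results from Table 5.

```python
# Guideline limits as quoted in the text above.
LIMITS = {
    "no2_ug_m3": 10.0,      # annual mean NO2, micrograms per cubic metre
    "pm25_ug_m3": 5.0,      # annual mean PM2.5, micrograms per cubic metre
    "overheating_h": 40.0,  # annual overheating hours
}


def breached(values):
    """Return the sorted names of criteria whose value exceeds its limit.

    Assumption: a value exactly equal to the limit is treated as compliant.
    """
    return sorted(name for name, limit in LIMITS.items() if values[name] > limit)


# Hypothetical classroom: the NO2 and PM2.5 figures are invented for
# illustration; 100 h is the Base-Cumtve overheating intercept from the text.
example = {"no2_ug_m3": 12.0, "pm25_ug_m3": 4.5, "overheating_h": 100.0}
print(breached(example))
```

Applied to each of the 36 combinations in turn, a check of this kind would identify the small number of cases that meet all three guidelines simultaneously.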

Implications for policymakers
We have demonstrated the development of the three-dimensional approach of multi-criteria, stock-wide and dynamic resolution within a UK classroom stock model, as detailed in Sections 2 and 3. In addition, we present the following findings, relevant for future policy making:
1. On overheating, Fig. 5 demonstrated a shift from shading and albedo to passive night-time ventilation as effective components for preventing overheating as degree of retrofit increased. However, fresh air supply while occupied remains critical for attainment to be maintained, potentially conflicting with guidelines on keeping windows closed during summer heatwaves. This is indicative of the need for published advice on the use and impact of passive measures to evolve as stock retrofit improvements are made.
2. In terms of extent of future retrofitting required, Figure A-3 and Fig. 7 demonstrate an example of the large incremental benefit of the first retrofit step to Building Regulations standard in stuffiness and energy demand criteria.
3. Comparison of baseline and best-performing pair-wise scenarios in Table 5 quantified the diminishing causal effect of orientation on overheating as IEQ improvement measures were implemented within the Cumulative scenario. This indicates a possibility of mitigating fixed contextual causes of poor IEQ through optimising air flow and heat transfer. However, it also potentially increases the relative importance of addressing unfavourable fabric related causes of overheating through retrofitting.
4. Classrooms have been identified where the existing set of passive IEQ improvement strategies combined with existing retrofit standard cannot adequately meet the selected criteria without volumetric alterations or some form of mechanical ventilation. For low-ceiling modern era constructions, reduced air volume results in an inability to rely simply on operational measures or basic retrofit to mitigate impact on attainment.
In terms of developing a tool for policy evaluation, most previous research has focussed on scaling up monitored case studies [12,19,20] that demonstrate operational factors influencing energy and IEQ. Such studies include significant discussion of whether well-calibrated results could be scaled up to other types of buildings. The four points presented above indicate that it is possible to account for differing impacts of energy retrofit and IEQ improvement by working in the opposite direction, scaling down stock modelling to specific sectors in the design of policy instruments. However, this comes mostly at the expense of calibration, which is a key strength of case studies and a key limitation of this work, as acknowledged in the next section.
Hence we have demonstrated a novel approach, which could provide policymakers with the means to determine the extent to which retrofit and IEQ improvement scenarios could be paired in order to reach net zero emissions without compromising attainment and IEQ. Within the constraints laid out in the next sub-section, we have been able to demonstrate the performance of various conflicting IEQ, attainment and energy criteria under a number of dynamic retrofit, IEQ improvement and climatic scenarios, while accounting for stock heterogeneity.

Limitations of current approach
We have identified the following limitations and constraints of this research:
- The simplified seed model design excludes other forms of ventilation, such as cross and stack ventilation, in favour of single-sided ventilation. Additionally, by using a single external surface, we have assumed that classrooms are surrounded on both sides, above and below by other classrooms. This negates the effect of specific design features such as flat roofs in post-1945 schools. While introducing a fourth dimension of study, a stock-wide approach could involve simulating a finite set of classroom geometry archetypes and ventilation layouts rather than a single seed model.
- While the oldest four era-archetypes are related to national school building programmes, our reduction of the post-1976 stock to a single archetype is a gross over-simplification which requires greater disaggregation. Since 1976 there have been a number of different school building programmes nationally and regionally. Many studies have demonstrated a wide range of energy and IEQ performance [21,38] due to vastly different construction types.
- While we have incorporated glazing ratio into archetypes, it was averaged over such a large sample of buildings that all glazing ratios reported in Table 1 fall in a narrow range between 25 and 30%, rendering it practically inconsequential.
- We previously demonstrated that the urban setting of each individual school is a major factor in providing additional detail on airflow and shading characteristics [17]. This could potentially be derived individually from the more recent Condition Data Collection (CDC) [63], which contains additional site-level data.
- Peer-reviewed multiple-year hourly air pollution data representative of each region would benefit this research, similar to that which exists for weather files. In conjunction with this and the previous point, the allocation of NO 2 and PM 2.5 monitoring site settings (defined as kerb-side, urban background and rural background) to specific archetypes could avoid entire regions being labelled as entirely rural or urban.
- More stringent setpoints in the model, possibly related to CO 2 concentrations, could improve operation of ventilation. For example, night ventilation reduces the need to ventilate as much for cooling when occupied. However, there is still a separate fresh air requirement for maximising attainment during occupied hours, which requires an option to ventilate further, dependent on external temperatures and contaminants.
The authors' previous work demonstrated agreement within 4-12% between whole building simulated and DEC measured data [15]. We also validated peak ventilation rates of 15-20 l/s/person and internal temperatures of 30-33 °C derived from base modelling [17] against measured values from Western classrooms [2]. However, a key limitation of this work is the lack of calibration of relative changes in criteria, as opposed to static performance, in response to implementing retrofit and IEQ improvement changes. While applying specific monitored examples to non-specific stock models may be difficult, it is hoped that in the future a meta-analysis of such examples could provide suitable values to calibrate such models.

Future work
Future development of the UK classroom stock model should largely be focussed on ensuring the relevance of IEQ and energy criteria to performance in real classrooms. While this work demonstrates the output of all criteria individually from a single simulation run of each model, future iterations of this work should utilise several runs to derive criteria. Such an approach could demonstrate an operating envelope, for example of IEQ versus energy measures, rather than a single point, demonstrating how trade-offs between different criteria would be achieved in practice. For example:
• The requirement to use BB101 schedules and setpoints to derive overheating prevents other operational practices, such as out-of-hours usage and vacating the building during summer months, from being investigated.
• Energy costs should also incorporate a penalty for mitigating overheating through an ideal cooling model, similar to how heating is modelled, rather than presenting overheating as a wholly IEQ-related issue. The significant overheating hours predicted in our model would, in practice, be partially or completely mitigated. Future multiple simulations of the same model with different treatment of excessive temperatures could demonstrate the trade-off between cooling load energy cost and extent of overheating.
Similar to changing classroom layout (discussed in the first bullet of Section 5.2), such models would likely require a fourth dimension to be applied within analysis to optimise across different operational patterns of ventilation and heating timing and setpoints.
The transparency of addressing criteria which have defined measurements and standards could lead to others (the second and third categories from Section 2.1) being ignored, without a means of prioritising criteria relative to each other. Hence a future paper will address how such criteria can be weighted using a stakeholder survey and combined within a Multi-Criteria Decision Analysis (MCDA) tool, to evaluate performance of the pair-wise scenarios across all criteria. This will be aided by the conversion of contaminant concentrations into specific health costs.

Conclusions
We have demonstrated a novel approach to investigate UK classroom stock resilience, based upon multiple IEQ and energy performance criteria, using archetypes to model heterogeneity. We used pair-wise coupling of future retrofit and IEQ improvement measures to demonstrate change in performance given dynamic changes to the stock. As degree of retrofit was increased, we have measured the increasing effectiveness of passive ventilation in mitigating overheating when compared to shading and albedo measures. Conversely, we have quantified diminishing returns to reductions in energy demand from retrofitting from Building Regulation standard through to EnerPHit.
Our analysis of individual factors of school building stock heterogeneity has helped to demonstrate how the impact of orientation on overheating could be mitigated through a range of improvement measures. However, through quantifying the impact on attainment of the reduced air volume associated with lower floor to ceiling heights in 1945-1967 era constructions, we have demonstrated a lower effectiveness of the package of IEQ improvement measures in these classrooms. The next step of this work is to consolidate these results into a single recommended option based on the preferences of different stakeholder groups through MCDA.

Funding information
This study was funded by an Engineering and Physical Sciences Research Council (EPSRC) grant: Advancing School Performance: Indoor environmental quality, Resilience and Educational outcomes (ASPIRE, EP/T000090/1).

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.