A framework to identify and prioritise the key sustainability indicators: Assessment of heating systems in the built environment

Sustainability indicators (SIs) are important instruments to quantify, analyse, and communicate complex sustainability information, with a history of application in energy research. It is critical to identify an effective set of indicators which can holistically evaluate energy systems, encompassing the three facets of sustainability: environment, economy, and society. However, the literature has been lacking in either proportionally representing the sustainability dimensions or reflecting the stakeholders' preferences. This paper develops a framework to identify and prioritise a set of SIs, critically reviewed to ensure reflection of a wide array of factors and conceptions of what sustainability entails. The developed framework utilises a series of methods within three phases: identification, refinement, and prioritisation. Applying the proposed framework to building heating technologies, a set of 22 SIs consisting of 4 economic, 8 environmental, and 10 social indicators were identified. According to the results, the economic indicators of Operation & Maintenance Cost and Net Present Value were found to be the most impactful factors, while environmental SIs contribute the most to the overall sustainability weight. The identified indicators apply to the assessment of heating systems and policies, and the proposed framework could more broadly support analysis of key sustainability criteria in various fields.


Introduction
Decarbonisation of heating in the built environment has been recognised as a key priority in transitioning toward future energy and climate change targets. The global energy crisis and related risks to energy security, combined with wider cost-of-living challenges and rising utility bills, are providing unprecedented momentum for a transition away from fossil fuel-based heating, particularly in Europe (IEA 2022). Heat transition, however, is tied up with a wide diversity of social, economic, and environmental factors that need to be considered before implementing transition measures and policies. These factors could be bridged and studied under the term of sustainability in an integrative and inclusive way in order to plan and deliver a sustainable and equitable transition.
First coined by Elkington (Elkington, 1997), the Triple Bottom Line (TBL) sustainability is a tri-dimensional concept that incorporates social, environmental, and economic dimensions to examine sustainability performance. These dimensions have a life of their own, but they are also closely intertwined and can trigger transformations in each other (Al Sarrah et al., 2020). Each dimension is measured by reference to sustainability indicators (SIs). SIs reflect the level of sustainability and provide means for monitoring and signalling the progress towards sustainability (Moldan and Dahl, 2007). SIs emerge from the fact that sustainability is affected by and dependent on a long list of factors (Kylili et al., 2016).
Based on the TBL notion of sustainability, environmental indicators measure various types of pollution and implications that result in environmental impacts from a local to a global scale. The environmental sustainability of the energy systems is often affected by air and water emissions, land degradation, freshwater exploitation, depletion of nonrenewable resources and changes in wildlife. Economic indicators contribute to the progress of society toward achieving its economic objectives. Clune and Zehnder (Clune and Zehnder, 2018) argue that economic objectives include wealth, employment, income, welfare and high productivity. Finally, social indicators usually deal with the impacts on human health, equity, community liveability, historic and cultural heritage, and aesthetics (Ajmal et al., 2018).
Selecting an effective set of SIs, encompassing all economic, environmental, and social aspects of the systems, is essential prior to any multi-criteria analysis. Both the building industry and the energy sector have a relatively long tradition of developing and using SIs for tracking sustainability in the built environment and energy systems (Liu, 2014, Lynch and Mosbah, 2017). The existing literature, however, often presents considerable limitations such as subjectivity of the SIs, lack of stakeholder participation, predominance of environmental criteria, and dissimilarity of the indicator sets (Fernández-Sánchez and Rodríguez-López, 2010). Regarding the building heating systems (BHSs) in particular, the lack of a comprehensive and consistent set of SIs is probably the major challenge in monitoring whether or not a particular heating scenario is on course for sustainability.
Built upon these gaps, the current study seeks to find out: a) How to derive a set of critical SIs for different systems while ensuring proportional representation of all facets of sustainability and reflection of the stakeholders' priorities? b) Which indicators could accurately portray the sustainability of BHSs and can be used to evaluate heating systems and strategies in the built environment? Therefore, a generic methodological framework is established, aimed at identifying, selecting, and prioritising a representative set of SIs in various fields. The framework accounts for a series of qualitative and quantitative data to determine the SIs and their importance weights, thus reducing the subjectivity and uncertainties of the process. The framework is then elaborated and tested through its application to the case of BHSs.

Table 1 Indicators used for sustainability assessment or multi-criteria analysis of energy systems and interventions in buildings.
Columns: Source; Year; Application (Case study location); Sustainability dimensions; Indicators
(Vasić, 2018); Indicators: Renewable energy, non-renewable energy, area, heat price, CO₂ emissions, SO₂ emissions, wastewater, regional added value, overall efficiency, avoided environmental impacts
(Kuznecova et al., 2017); 2017; Household heat generation systems (-)
Economic: Energy costs for one household member, share of costs from income, share of low-income households, Gross domestic product (GDP)
Environmental: Heating consumption in household, share of RES, share of fossil fuels, CO₂ emissions, PM emissions
Social: Number of rooms in a house, number of rooms per inhabitant, size of dwelling, environmental problems, expenditure problems
(Zhang et al., 2019); 2019

Motivation and context
According to (Vidal et al., 2011, Wang et al., 2009, Siksnelyte-Butkiene et al., 2021, Baker et al., 2001), the SIs utilised in multi-criteria analyses are required to have some qualities to reflect the sustainability and its roots within a system. They have to be (1) representative to holistically reflect the essential characteristics of the system; (2) sensitive and operational in addressing the changes in the system to accurately portray the differentiation between system elements and comparisons among them; (3) independently measurable and verifiable using methodologically-based and repeatable methods, as well as accessible and transparent data; and (4) concise and few in number to avoid repetition and overlapping between them and minimise the complexity and indeterministic nature (plurality) of the problem.
Developing an indicator set that can fulfil the SI requirements and comply with the relevant literature is essential in determining the direction and assessment of sustainability (Rajabi et al., 2022). Extensive literature is available regarding the identification of the SIs associated with the built environment and energy systems. Focusing on the overlap of these areas, Table 1 provides a list of some recent studies presenting SIs for building energy systems or building energy interventions.
The studies depicted in Table 1 have utilised various SI sets to develop a decision-making tool for new projects or to assess the sustainability of existing projects. However, a solid and uniform set of SIs which can be generally applied to BHSs is still lacking in the existing literature. This is due to some limitations that are discussed in the following, in addition to the general belief that there is no particular indicator set that is suitable for all applications (Grafakos et al., 2017).
Firstly, most of the aforementioned studies have established the SIs based on the conditions and requirements of a specific country. Therefore, they cannot be applied universally to different locations to track the sustainability of energy systems or transition plans. Additionally, depending on whether the technology or the whole sector is assessed, the selected indicators vary widely in terms of their application scale (Siksnelyte-Butkiene et al., 2021). The reviewed indicator sets are primarily produced based on the top-down approach and are often aimed at global, national, or state scales. Thus, the effectiveness of these methods in assessing sustainability at finer spatial scales could be problematic (Graymore et al., 2008).
Another important limitation is that many studies do not involve stakeholders in the decision-making process in a systematic and participatory way. They often attempt to mitigate stakeholders' preferences instead of directly including them in the decision-making process. This is despite growing support in the literature for socio-technological analytical approaches such as social construction of technology (SCOT) to further understand the relevant social groups and stakeholders and their concerns in the development of technologies (Elle et al., 2010). Indicator developers have also rarely attempted to validate the credibility of the SI selection, relying instead on the long-term acceptance of indicators by other users (Grafakos et al., 2017).
Finally, the existing literature has not equitably considered the three dimensions of TBL sustainability. Reviewing the articles in Table 1, what is often found to be underrated or not included at all is the social dimension of sustainability. In a broader sense, the lack of consideration of social factors in research and practice is underlined by several scholars (Hashempour et al., 2020). In building assessments, for instance, a recent review by Hashempour et al. (Hashempour et al., 2020) shows that social aspects are considered in only 22% of the assessment frameworks for analysing energy retrofits and sustainable renovations in buildings. Gathering 51 academic publications, they found that social indicators are considerably underdeveloped compared to economic and environmental ones. Fig. 1 shows the balance of sustainability indicators in the studies investigated by (Hashempour et al., 2020).
Similarly, Pombo et al. conclude that only three out of the 42 reviewed studies have incorporated social indicators in the multi-criteria assessment of sustainable renovations. Where social sustainability is included, the focus has been mostly on indoor air quality, functionality, employment, thermal comfort, and cultural aspects (Nielsen et al., 2016, Antunes and Henriques, 2016). As a result, some other important social factors such as fuel poverty and health issues that are directly influenced by building energy performance are not investigated properly in studies.
Likewise, a similar gap can be found in energy systems scholarship. Zanghelini et al. (Zanghelini et al., 2018) showed that social sustainability in energy systems can often be found in general propositions, usually integrated with the environmental or technical aspects. This gap is noticed by other scholars, generally stating that most multi-criteria analyses focus on environmental and technical aspects of energy systems (Grafakos et al., 2017, Kowalski et al., 2009, Campos-Guzmán et al., 2019). Afshari et al. (Afshari et al., 2022) noted that conflicting objectives and subjectivity of indicators often make implementing social sustainability difficult, which is one of the reasons why it has been understudied. The role of social factors, however, is receiving increasing attention in technology assessments (Mainali and Silveira, 2015). The highlighted gaps reveal the need for revisiting the traditional sustainability assessments and renewing the focus on the TBL notion of sustainability. Therefore, as this study aims to address, it is required to develop an inclusive and purpose-designed set of SIs for the assessment of BHSs at the product level. The motivation behind this research is that the sustainability of the building heating sector is increasingly gaining attention, but the evaluation of heating technologies needs to be further supported by the research. The proposed selection of SIs will reflect the TBL sustainability aspects of BHSs with easily accessible data and replicable processes. It also renews the focus on social sustainability and stakeholders' participation to address the existing gaps in the assessment of energy technologies.

Material and methods
For this research, a framework is developed to obtain the required set of SIs through three phases, comprising six stages, which are illustrated in Fig. 2. The process begins with the identification stage in which a preliminary list of indicators that have been applied in building and energy studies is gathered. Collecting SIs from the previous research through the literature review is a prevalent starting point for this process and a foundation for the development of an effective sustainability assessment model (Rigo et al., 2020, Daugavietis et al., 2022). Therefore, at this stage, a wide range of relevant SIs are obtained through a systematic review of the peer-reviewed literature that reflects sustainability issues in energy systems and building energy interventions.
The long list of identified indicators needs to be reviewed and clustered to shape the categories required for sustainability assessment. Therefore, in the second stage, the collected indicators are classified to comply with the principles of TBL sustainability which defines sustainability upon the three pillars of the economy, society, and environment. The SIs are recategorized into economic, social, and environmental indicators based on the area of their ultimate impact.
The abundance of the SIs, however, is problematic as it complicates the data collection and processing. Furthermore, the reliability and maturity of sustainability assessment rely on developing a concise set of indicators which can lay the ground for rational comparisons and decisions. Therefore, the refinement phase, consisting of three refinement stages, is designed to dismiss the indicators that are not vital and instead select those which reflect the most important aspects of sustainability.
The first stage of refinement is performed using the Pareto analysis method to identify the most frequently used indicators in the relevant literature. Using this method, the essential SIs under each dimension of sustainability are determined and the trivial indicators are screened out from further consideration. The shortlisted indicators, however, are sometimes not applicable or relevant to the context or have overlaps in functionality that need to be cleaned up to avoid confusion. These indicators are, therefore, eliminated or merged at the second stage of refinement, referred to as compatibility check, to ensure alignment between the SIs and the characteristics of the study context and scope. This is followed by the last stage of refinement based on the Staticized group technique to validate, revise, or improve the selected SIs using the experts' opinions. To do so, a survey is carried out to collect reliable comments from certified professionals in design, planning and policymaking.
M.H. Abbasi et al. Sustainable Cities and Society 95 (2023) 104629
The final selected SIs have different levels of importance and impact on the sustainability performance of the systems. The level of importance can be quantitatively expressed by the indicator's priority weight in sustainability assessment or multi-criteria decision-making (MCDM) frameworks. Priority weights are directly taken into the calculation and need to be assigned rationally and carefully. Thus, the last stage of the framework deals with the prioritisation of the indicators according to the Analytic Hierarchy Process (AHP) (Saaty, 1987). Priority weights are obtained via polling, based on the judgments of stakeholders. Aggregation of the experts' judgments and consistency checks are critical steps of prioritisation which are also addressed at this stage.
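As a minimal sketch of the prioritisation stage (not the authors' implementation), the following Python snippet aggregates several experts' pairwise comparison matrices, derives priority weights, and checks consistency. The element-wise geometric mean for aggregation, the power-iteration approximation of the principal eigenvector, the example judgments, and all function names are assumptions made for illustration. The random index values are the commonly tabulated ones attributed to Saaty, and a consistency ratio of about 0.1 or below is the threshold usually cited for acceptable consistency.

```python
import math

# Commonly tabulated random consistency index (RI) values by matrix size n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def aggregate_judgments(matrices):
    """Element-wise geometric mean of several experts' pairwise matrices."""
    k = len(matrices)
    n = len(matrices[0])
    return [[math.prod(m[i][j] for m in matrices) ** (1.0 / k)
             for j in range(n)] for i in range(n)]


def priority_weights(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration (sums to 1)."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


def consistency_ratio(matrix, weights):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical judgments of two experts over three criteria.
expert_1 = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]
expert_2 = [[1, 2, 4], [1 / 2, 1, 3], [1 / 4, 1 / 3, 1]]
group = aggregate_judgments([expert_1, expert_2])
weights = priority_weights(group)
print("weights:", weights)
print("consistency ratio:", consistency_ratio(group, weights))
```

The geometric mean is used here because it preserves the reciprocal property of the pairwise matrix, so the aggregated matrix remains a valid AHP input.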

Results and discussion
The methodological stages of the framework are elaborated in this section with their application to the case of BHSs. The collected data, conducted analyses, and derived results for this context are also presented as follows.

Identification
This stage aims to identify a preliminary list of SIs that can potentially be used for this study. A long history of SIs can be tracked both in the building industry and energy systems. On this occasion, the process of searching started with a focus on the overlap of these two sectors, i.e., the building energy technologies. However, to provide a more comprehensive list of SIs, the search domain was extended, covering a broader area of building energy interventions and distributed energy systems, using keywords such as 'sustainability indicators', 'multicriteria decision analysis', 'building heating technologies', 'energy renovations', and 'renewable energy technologies'.
The focus of this research was sustainability of energy systems at the product level, rather than at building level or larger spatial scales such as local or national level. From the initial list of articles that were found through extensive searching, those not addressing the sustainability of energy systems or building energy interventions are excluded. Finally, a set of 66 articles published between 2010 and 2022 are reviewed. A total of 156 SIs are identified from these articles as the preliminary list of indicators that could potentially be used for BHS studies.

Classification
The long list of SIs collected in the previous section needs to be recategorised into the TBL sustainability dimensions, which are the basis of this research. The TBL model has been the model for many studies, while in many other studies the presented SIs and their classification do not exactly correspond to the TBL definition of sustainability (Hehenberger-Risse et al., Yang et al., 2018, Chen et al., 2020). In such cases, indicators have to be re-categorised under one of the TBL dimensions of sustainability based on the area of their ultimate impact. For instance, indicators such as job creation and indoor air quality, which are both categorised under social sustainability in this study, are sometimes considered economic and environmental indicators in other studies. Furthermore, the identified SIs are reviewed to avoid any repetition of the indicators. This is because, despite the broad differences in indicator sets, some common indicators such as upfront costs, carbon emissions, and land use are referred to by different terms across the studies (Ahmad and Thaheem, 2017). Therefore, the initial SIs are reviewed and those with the same meaning and functionality are merged to ensure no duplication of SIs. Upon this filtration, the initial collection of 156 indicators is reduced. Table A-1 in the appendix presents the categorisation of these indicators and a few references for them.

Refinement step 1: Pareto analysis
The first refinement step aims to identify the critical indicators that are frequently used by researchers using the Pareto analysis. Also called the 80/20 rule, the Pareto Analysis is a statistical technique of decision-making, originally presented by Vilfredo Pareto (Craft and Leake, 2002). The Pareto principle is used in various areas, helping to identify a vital limited number of factors among a large number of factors that produce a significant overall effect. The Pareto principle states that 80% of consequences in many problems come from 20% of causes (Fernández-Sánchez and Rodríguez-López, 2010). Accordingly, it can be argued that 80% of sustainability can be achieved through 20% of the most important indicators (Fernández-Sánchez and Rodríguez-López, 2010). This principle is widely used in sustainability studies, assisting in distinguishing the "vital few" from the "trivial many" decision factors (Fernández-Sánchez and Rodríguez-López, 2010, Hasan et al., 2017, Gani et al., 2021, Gani et al., 2022, Lazar and Chithra, 2021). Therefore, the Pareto Analysis is utilised in this study to filter the critical SIs. This process can be demonstrated with the aid of a Pareto chart in which the frequencies of SIs are presented in descending order and their cumulative percentages are presented on the secondary axis. Where the cumulative percentage reaches 80%, the SIs can be divided into the vital few indicators and the trivial many (Gani et al., 2021). In this study, the Pareto analysis is separately performed for each category of SIs, depicted in Figs. 3-5. The vital indicators with less than or equal to 80% cumulative frequency are separated via the red line and proceed to the next stage.
Applying the Pareto analysis to the list of identified SIs, this list is narrowed down to 34 critical indicators that are frequently used by researchers. In brief, from the initial 48 environmental SIs, 15 of them are shortlisted, making up 78.5% of the total frequency of environmental SIs. Regarding the economic indicators, the initial list with 32 SIs is screened down to 8 indicators. Also, social SIs are reduced from 39 indicators to 11 critical ones after the Pareto analysis.
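The Pareto screening described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' procedure: the function name, the indicator labels, and the citation counts are hypothetical, and only the rule itself (rank by frequency, keep indicators whose cumulative share is at most 80%) comes from the text.

```python
def pareto_vital_few(frequencies, cutoff=0.80):
    """Return (indicator, cumulative share) pairs with cumulative share <= cutoff.

    `frequencies` maps an indicator name to the number of reviewed
    studies that use it.
    """
    # Rank indicators by frequency, most used first (ties keep input order).
    ranked = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(frequencies.values())
    vital, cumulative = [], 0
    for name, count in ranked:
        cumulative += count
        if cumulative / total > cutoff:
            break  # everything from here on belongs to the "trivial many"
        vital.append((name, cumulative / total))
    return vital


# Hypothetical usage counts for one sustainability dimension.
counts = {
    "indicator_a": 40,
    "indicator_b": 25,
    "indicator_c": 15,
    "indicator_d": 10,
    "indicator_e": 6,
    "indicator_f": 4,
}
for name, share in pareto_vital_few(counts):
    print(f"{name}: cumulative share {share:.0%}")
```

Running the function once per dimension mirrors the separate environmental, economic, and social analyses described in the text.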

Refinement step 2: Compatibility check
The indicators obtained from the Pareto analysis have not yet been evaluated against the range of SI qualities which were mentioned in Section 2, including representativeness, independency, and applicability. Furthermore, there is a risk of overlap among the indicators that undermines their independence and objectivity in assessments. The number of selected SIs is also still quite considerable, making them technically and practically impossible to implement in real-world projects. It is highlighted in the literature that having a reasonable number of indicators is beneficial to the sustainability assessment (Fernández-Sánchez and Rodríguez-López, 2010, Wang et al., 2009). Experiments show that most individuals cannot accurately judge between more than seven, plus or minus two, criteria (Bagočius et al., 2014). Therefore, the second round of refinement is required to filter out the indicators which do not meet the SI qualities, as well as merge those with overlap or correlation in functionality. This also further reduces the number of indicators, making their understanding and usage more consistent. This refinement step, called compatibility check in this research, is conducted based on the researchers' intuition and evaluation. By doing so, the following modifications are made with regard to the environmental indicators:
○ NOx and SO₂ emission factors are eliminated because these compounds are already included and addressed in the 'Acidification potential'.
○ The indicators of 'Global warming potential', 'GHG saving', and 'Climate change impact' have a clear overlap in addressing the same issue of GHG emissions. Thus, the 'GHG saving', and the 'Climate change impact' indicators are removed to avoid repetition.
○ Likewise, indicators of 'Fossil fuel depletion' and 'Primary energy consumption' overlap in capturing relevant aspects associated with resource depletion. 'Fossil fuel depletion' is thereby eliminated.
○ The acoustic performance and noise level of the systems are studied under social sustainability in this research. Therefore, the indicator of 'Noise pollution' is eliminated from the environmental SIs.
○ Fine particles are one of the biggest contributors to human health problems. Therefore, the PM emission factors are studied under the social indicator of 'Health impacts' and 'PM emissions' is removed from environmental SIs.
○ The indicator of 'Waste generation' is also removed because, concerning the case of buildings without solid fuel heating, the level of waste production and disposal is negligible (Lebersorger and Beigl, 2011).
Likewise, regarding the economic indicators:
○ Energy cost constitutes a sizeable share of the O&M costs of a heating system and is already taken into account in the O&M cost indicator. The 'Energy cost' indicator is therefore eliminated to avoid double-counting.
○ Net Present Value (NPV) and the Payback time are two different approaches to performing the life cycle cost (LCC) analysis. While the payback method is found to be the most used indicator, LCC based on NPV is more accurate and efficient as it uses cash flow instead of earnings (Jensen et al., 2018). Therefore, 'Net present value' is used in this study, and indicators of 'Payback period' and 'Life cycle cost' are removed from the list.
And finally, with respect to social indicators:
○ The indicator of 'Safety' in this article represents all the injuries, accidents, and mortality over the life cycle of the systems. Thus, 'Severe accidents' is eliminated from the SI list to avoid duplication.
○ The indicator of 'Social benefits' refers to the positive impact that an energy system has on the social progress of the community and region and is often used for large-scale energy systems (Saraswat and Digalwar, 2021). The crucial social impacts associated with household-level energy systems are covered via separate social SIs. Thus, this indicator is deemed less relevant to the scope of the study and is removed from further consideration.
Taking the above considerations into account, from the list of 34 SIs, 21 remain as the modified set of indicators. The outcome of the second refinement step is presented in Table 2.
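The bookkeeping of this step can be cross-checked with a short tally. The sketch below starts from the Pareto shortlist of 15 environmental, 8 economic, and 11 social indicators and subtracts the removals listed above; it assumes that each named removal corresponds to exactly one shortlisted indicator, and the labels used for the NOx and SO₂ emission factors are shorthand introduced here.

```python
# Indicators shortlisted by the Pareto analysis, per dimension.
shortlisted = {"environmental": 15, "economic": 8, "social": 11}

# Indicators removed in the compatibility check, as listed in the text.
removed = {
    "environmental": [
        "NOx emissions", "SO2 emissions", "GHG saving",
        "Climate change impact", "Fossil fuel depletion",
        "Noise pollution", "PM emissions", "Waste generation",
    ],
    "economic": ["Energy cost", "Payback period", "Life cycle cost"],
    "social": ["Severe accidents", "Social benefits"],
}

# Remaining indicators per dimension and overall.
remaining = {dim: shortlisted[dim] - len(removed[dim]) for dim in shortlisted}
print(remaining, "total:", sum(remaining.values()))
```

Under that assumption the tally gives 7 environmental, 5 economic, and 9 social indicators, which adds up to the 21 indicators carried into the next step.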

Refinement step 3: Staticized group technique
In most of the previous studies, the selection or validation of SIs is undertaken exclusively by the researchers without involving the stakeholders. However, compared to individuals' decisions, group decision-making provides the advantages of a broader perspective and more experience and knowledge, while reducing the harms of individuals' cognitive restrictions and evaluation mistakes (Ossadnik et al., 2016). Also, including stakeholders in the initial stages of the development process ensures the effectiveness and applicability of the framework and facilitates long-term commitment and cooperation in implementing the results (Grafakos et al., 2017, Figueiredo et al., 2021). Thus, the current study engages stakeholders in the process of identification of SIs, assuring that experts' perspectives are reflected in the assessments. This approach is similarly used in (Gani et al., 2021, Gani et al., 2022, Lazar and Chithra, 2021) to distinguish the critical SIs in different fields. This stage of the framework is thereby designed to:
a) Validate the selected set of SIs in the previous steps
b) Identify the potential missing indicators
c) Find out if any amendments for clarity purposes are required
Several participatory techniques exist to incorporate judgments from a group of experts. Traditionally, interviews and group-brainstorming techniques, which involved substantial bias and uncertainties, were often used to collect subjective data from experts in engineering areas (Hallowell and Gambatese, 2010). However, alternative methods that could control the bias and ensure the qualification of the respondents are increasingly employed to collect data in these fields. Methods such as the Delphi technique, Staticized groups, Dialectic procedure, and Nominal group technique allow researchers to maintain a greater level of control over bias in well-established rigorous processes with the aid of qualified experts (Hallowell and Gambatese, 2010, Contadini et al., 2002).
Among these methods, the Delphi technique has been useful for finding the key sustainability criteria. Comprehensive reviews of Delphi method applications in energy research and the building industry have been presented by J. Wang et al. (Wang et al., 2009) and D. Jato-Espino et al. (Jato-Espino et al., 2014), respectively. However, the Delphi process is not recommended in all circumstances, e.g., when there is limited access to participants, when achieving consensus is not desirable, or when objective data are available (Hallowell and Gambatese, 2010). When the Delphi method is inappropriate, the Staticized groups method is one of the alternatives with high accuracy of results (Graefe and The Staticized groups technique is identical to the Delphi method with the exclusion of feedback and iteration stages (Hallowell and Gambatese, 2010). It is described as the Delphi method with one round of estimates (Deniz, 2017). Therefore, there is no interaction between experts, avoiding the need for conformity among individuals as well as reducing bias in judgments. The Staticized group is preferred over the Delphi method by many researchers (López-Arquillos et al., 2015, Rey-Merchán and López-Arquillos, 2022), mainly because experts are not led to conform to a value which is not necessarily correct. In other words, this method avoids the lack of accuracy of consensus results after many iterations in the Delphi method. Therefore, the Staticized groups technique is used in this research to conduct the third refinement step.

Qualification and selection of experts
Research works that use group decision-making techniques tend to rely on the knowledge and skills of the experts, rather than depending on statistical methods and sources (Alqahtany, 2019). Therefore, selecting a group of competent experts is a fundamental step in such research and it is itself a matter of judgment (Zio, 1996), as underlined in many studies (Geist, 2010). To date, there are no universally agreed instructions or criteria for selecting the experts (Hallowell and Gambatese, 2010). However, as the Staticized group method is very similar to the Delphi technique, the same guidelines for the selection of experts can be applied to both methods (Skinner et al., 2015).
In general, an expert is defined as someone possessing a special or high-level education qualification, or someone with distinct skills or knowledge evident through their track record in professional organisations or academia (Ahmad and Wong, 2019). They also need to have the willingness, adequate time, and ability to participate in the process of the exercise (Rådestad et al., 2013). Furthermore, experts are required to be independent and have no conflict of interest with the study to minimise motivational biases (Zio, 1996). They should also represent a diverse spectrum of viewpoints and backgrounds to provide a realistic assessment of the given uncertainty (Zio, 1996).
Therefore, to constitute a decision panel, members are not chosen randomly, but purposively to meet the defined criteria. Accordingly, the candidates in this study are selected from the following groups to ensure a wide range of perspectives and a high level of expertise and competence: The Scopus database is used to explore relevant research and to find qualified academics and researchers who are based in the UK. For industry experts, professionals accredited by UK professional bodies such as CIBSE (Chartered Institution of Building Services Engineers), CIOB (Chartered Institute of Building), CIPHE (Chartered Institute of Plumbing and Heating Engineering) and the Energy Institute have been considered. Members of governmental bodies and professional institutions have also been contacted on the basis of their credibility, reputation, and authority in the respective fields.
The number of panellists is another important factor in determining the quality of group decision-making (López-Arquillos and Rubio-Romero, 2015). According to the literature, a minimum size of eight experts for homogeneous groups (experts in the same field) and a range from 20 to 60 participants for heterogeneous groups (experts from different social or professional groups) are deemed appropriate (Ahmad and Wong, 2019, López-Arquillos and Rubio-Romero, 2015). Particularly concerning sustainability studies, a range of 3 to 19 experts is often considered in the reviewed articles, e.g., (Ahmad and Wong, 2019, Hsu et al., 2017, Henning and Jordaan, 2016. For this research, 210 qualified experts from the mentioned references were invited to participate in the survey via e-mail and a link to the questionnaire. The survey was open for five months, between September 2021 to January 2022, and it was completed by 25 experts which is slightly higher than the number of experts normally used in Delphi and Staticized groups surveys. The response rate was 11.9% which is acceptable for an online survey with an average response rate of 10-15% in the literature (Xu et al., 2012).
The analysis of the characteristics of the respondents shows that a variety of well-educated and experienced professionals from different stakeholder groups participated in the survey. In terms of participants' affiliation, as illustrated in Fig. 6 (a), those from academia and industry-technical make up the biggest share of participants (36%), followed by respondents from professional/governmental bodies (12%). In the experts' panel, 64% of the members are postgraduates, holding a Master's degree (10 members) or a PhD (6 members) in the relevant fields. The composition of the participants based on their academic knowledge and professional history is illustrated in Figs. 6 (b) and 6 (c), respectively.
The questionnaire also includes questions to analyse the level of knowledge and expertise of the panel concerning the research focus points, i.e., building energy systems, building energy performance, and their sustainability understanding. On a Likert scale, participants are asked to indicate their level of knowledge/experience in these themes. As shown in Fig. 7, experts who either agree or strongly agree that they have an advanced level of knowledge/experience in each field constitute a range of 70 to 88% of respondents, with no one strongly opposed to these statements.

Survey design and results
A questionnaire survey is developed in three separate parts to collect all required data in a one-round survey. In the first part, questions are asked regarding the participants' knowledge and experience, which are discussed in Section 4.5.1. The second part of the questionnaire is designed to collect qualitative data to validate or improve the set of SIs in terms of effectiveness, inclusivity, and conciseness. The third part is designed for rating the importance weights of the indicators.
Obtained from the second round of refinement, 21 shortlisted SIs are put under the lens of the experts to be analysed at this stage. Indicators which are deemed incompatible or inapplicable by at least two experts are excluded from the analysis. Conversely, indicators suggested by at least two experts are considered for addition to the final list. Analysing the responses from experts, two indicators are added to the final list of SIs as follows:
○ The importance of embodied carbon emissions is highlighted by three experts as part of a whole life building assessment, which has come into sharper focus in recent years:

"The embodied carbon is critical to the efficient specification of the equipment. But it matters naught what I think once the client has possession of the system."
Recent studies show that embodied carbon associated with energy, mechanical, and electrical systems accounts for a large proportion of the building life cycle footprint (Rodriguez et al., 2020). Therefore, it is concluded that the embodied carbon of BHSs is important enough to be independently taken into account in the design and decision-making stages. Thus, the factor of 'Global warming potential' is split into two separate indicators, 'Operational carbon emissions' and 'Embodied carbon emissions', to differentiate the running and embodied footprints.
○ Concerning social indicators, 'fuel poverty' is added to the list of indicators, as households' struggle to pay the bills was brought up by three respondents:

"Selection of heating systems is usually a factor of who pays the bills when it is designed. Many options are pricy to install and operate, so not an option for many."
This finding chimes with Abbasi et al. (Abbasi et al., 2022), who thoroughly argued that fuel poverty is an essential consideration for designing effective, just, and targeted energy interventions in the built environment, but is often overlooked by designers and decision-makers. A new predictive indicator for fuel poverty is also devised to facilitate the inclusion of this factor in decision-making. Using the Potential Fuel Poverty Index (PFPI), proposed in (Abbasi et al., 2022), the probability of fuel poverty that different BHSs can pose to households can be estimated and included in sustainability assessments.
Survey analysis also resulted in the exclusion of one indicator from the initial list.
○ The survey is designed to gain a fresh look at the existing understanding and delivery of sustainable heat transitions that may lead to new conclusions. Thus, experts are asked to respond based on their own specialist perspectives. However, two respondents report that they were unsure which approach to take while completing the questionnaire, naming 'Availability of funds and subsidies' as one source of confusion:

"I generally feel that the answers to these questions will depend on the perception taken. Are these to be responded from a policy maker point of view as it is stated? Or low-income households? I wasn't sure how best to answer in some cases like the availability of public funds."
The authors agree that the existing funds and support should not be a matter of concern in this study, as this contradicts the purpose of the research and its critical eye on current policies. Therefore, this indicator is eliminated from the list of SIs.
In addition, some minor amendments are made to the indicators to improve their presentation based on the experts' feedback. For instance, the term 'job creation' is changed to 'employment impact' to extend its scope from the number of jobs created to include job losses. Accordingly, the final list of SIs is obtained and presented in Table 3.
Overall, 22 SIs are finalised, comprising 18% (4/22) economic, 36% (8/22) environmental, and 45% (10/22) social indicators, which will form the basis of the sustainability assessment of BHSs. The direction of impact of each indicator is also given in Table 3. A positive (+) or negative (−) sign is assigned to the indicators based on their direction of impact on sustainability. In other words, if increasing the score of an indicator positively contributes to sustainability, its sign is positive (+); otherwise, it is negative (−).

Prioritisation: AHP weighting method
Several weighting methods for multi-criteria analyses are suggested in the literature and reviewed in (Jahan et al., 2016). These methods can generally be divided into three groups as follows (Wang et al., 2009, Jahan et al., 2016):
○ Subjective methods, in which priority weights are assigned based on the judgment of decision-makers rather than on measured data or analysis, e.g., AHP, SIMOS, pairwise comparison, TRADEOFF, the Delphi method, SMART, SWING, and the best-worst method.
○ Objective methods, in which mathematical models based on the analysis of the initial or measured data are used to determine the importance of the indicators, e.g., the entropy method, TOPSIS, the least mean square method, and mean weighting.
○ Combined weighting methods, which integrate the two previous groups to strengthen the existing methods, e.g., multiplication synthesis, additive synthesis, and game theory.
Within the context of sustainability, subjective methods have been widely used since they can accurately reflect the preferences of different stakeholders (Ren and Toniolo, 2020). The AHP, in particular, has been the most popular weighting technique for energy systems analyses (Wang et al., 2009, Ren and Toniolo, 2020). The AHP weighting method, first developed by Saaty (Saaty, 1987), is part of a structured multi-criteria analysis method that relies on pairwise comparisons to obtain the relative importance of decision criteria. It transforms quantitative or qualitative comparison indices into numerical comparison matrices, from which the relative importance weight of each criterion can be obtained.
This research uses the AHP method to assign priority weights to the selected SIs. Accordingly, the third part of the questionnaire records the participants' views on the level of importance of each indicator. Once the required data is collected, the AHP process can be followed through the steps below (Taylan et al., 2020, Kamaruzzaman et al., 2018):
1. Build a hierarchical model
2. Prioritisation based on individual judgement matrices
3. Aggregate individual priorities to obtain the overall weights
4. Consistency check
The first step structures the problem into its constituent parts by building a hierarchical model to identify the goal of the process, criteria, sub-criteria, and alternatives (Kamaruzzaman et al., 2018). The hierarchy refers to a special form of system representation, in which the elements of the system are grouped into classified sets according to their entities and connections with other elements (Song and Kang, 2016). The hierarchical structure of the current study is presented in Fig. 8. The subsequent steps of the AHP process are discussed separately in the following sections.
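As a rough illustration of the first step, the hierarchy can be sketched as a small nested structure. The three dimensions and their indicator counts (4, 8, and 10) are taken from the text; the dictionary layout, the goal label, and the helper name `n_comparisons` are assumptions made for this sketch only, which also counts the pairwise comparisons each node would need if every pair were compared.

```python
# Hypothetical sketch of the AHP hierarchy: goal -> dimensions -> indicator counts.
# Only the dimension names and the counts (4, 8, 10) come from the paper.
hierarchy = {
    "goal": "Sustainability of building heating systems",
    "dimensions": {
        "economic": 4,
        "environmental": 8,
        "social": 10,
    },
}


def n_comparisons(n: int) -> int:
    """Number of pairwise comparisons among n elements: n(n - 1) / 2."""
    return n * (n - 1) // 2


dims = hierarchy["dimensions"]

# One comparison set for the dimensions themselves, then one per dimension.
per_node = {"dimensions": n_comparisons(len(dims))}
per_node.update({name: n_comparisons(count) for name, count in dims.items()})
total = sum(per_node.values())

print(per_node)
print(total)
```

Under these assumptions, the environmental node needs 28 comparisons, which matches the number quoted for the example expert later in the text.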

Prioritisation based on individual judgements
This step is founded on the pairwise comparisons collected from the survey. Experts have evaluated the SIs by comparing them to each other with regard to their impact on the element above them in the hierarchy structure. Comparisons are made by pairing two SIs based on the five-point Likert scale, as defined in Table 4. When the number of factors is $n$, a total of $n(n-1)/2$ comparisons should be made to establish the comparison matrix (Song and Kang, 2016). Fig. 9 shows an example of the pairwise comparisons needed to find the relative importance of the three dimensions of sustainability in the overall sustainability performance.
The resulting output of this procedure is the comparison (judgment) matrix, expressed as ratios and built to express each decision-maker's preference. Pairwise comparisons are converted into comparison matrices to derive the individual priority vectors. According to the AHP procedure (Saaty, 1987), the comparison matrix $A_{n \times n}$, based on each expert's judgment, is constructed as equation 1:

$$A_{n \times n} = [a_{ij}] = \begin{pmatrix} 1 & a_{12} & \cdots & a_{1n} \\ a_{21} & 1 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 1 \end{pmatrix} \qquad (1)$$

where $a_{ij}$ is the relative importance weight of indicator $i$ compared to indicator $j$, based on the comparison scale for AHP preferences given in Table 4. In fact, $a_{ij}$ indicates the experts' opinion on how much more important the $i$th factor is than the $j$th factor for achieving the AHP goal, meeting the following conditions:

$$a_{ij} > 0, \qquad a_{ij} = \frac{1}{a_{ji}}, \qquad a_{ii} = 1, \qquad i, j = 1, \ldots, n$$

Once the comparison matrix is built, the weights of the indicators can be computed by prioritisation. Prioritisation refers to the process of deriving the weight vector $w(A) = [w_i]^T = (w_1, \ldots, w_n)^T$ from the comparison matrix $A_{n \times n}$. The row geometric mean method (RGMM) is one of the most preferred methods in the prioritisation step (Dong et al., 2010).
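The construction of a reciprocal comparison matrix from the $n(n-1)/2$ collected judgments can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_comparison_matrix`, the dictionary encoding of judgments, and the example values for the three sustainability dimensions are all assumptions. A judgment in which the second item is preferred is encoded here as a reciprocal, such as 1/2.

```python
from fractions import Fraction


def build_comparison_matrix(n, judgments):
    """Build an n x n reciprocal matrix from upper-triangle judgments.

    `judgments` maps (i, j) with i < j to a_ij > 0. The diagonal is set to 1
    and the lower triangle is filled with reciprocals, so a_ji = 1 / a_ij.
    """
    expected = n * (n - 1) // 2
    if len(judgments) != expected:
        raise ValueError(f"expected {expected} judgments, got {len(judgments)}")

    matrix = [[Fraction(1)] * n for _ in range(n)]
    for (i, j), value in judgments.items():
        if not (0 <= i < j < n):
            raise ValueError(f"invalid index pair {(i, j)}")
        a_ij = Fraction(value)
        if a_ij <= 0:
            raise ValueError("judgments must be positive")
        matrix[i][j] = a_ij
        matrix[j][i] = 1 / a_ij
    return matrix


# Hypothetical judgments for one expert; indices 0, 1, 2 stand for the
# environmental, economic, and social dimensions respectively.
example_judgments = {
    (0, 1): 2,               # environmental vs economic
    (0, 2): 3,               # environmental vs social
    (1, 2): Fraction(1, 2),  # social preferred over economic
}
A = build_comparison_matrix(3, example_judgments)
for row in A:
    print([str(x) for x in row])
```

Exact fractions are used so that the reciprocity condition can be checked without rounding error; floats would work equally well for the weighting steps that follow.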
Crawford and Williams (Crawford and Williams, 1985) have shown that the unique weight vector $w(A)$ using the RGMM can be found as follows:

$$w_i = \frac{\left(\prod_{j=1}^{n} a_{ij}\right)^{1/n}}{\sum_{l=1}^{n} \left(\prod_{j=1}^{n} a_{lj}\right)^{1/n}}, \qquad i = 1, \ldots, n$$

where $w_i \geq 0$ and $w(A)$ satisfies the normalisation condition $\sum_{i=1}^{n} w_i = 1$. The comparison matrix and the weight vector are generated for all 25 respondents. Fig. 10 (a) shows an example comparison matrix constructed by a randomly selected expert, A, after making 28 comparisons concerning environmental indicators. The weight vector corresponding to this comparison matrix is presented in Fig. 10 (b), where W(A) Env represents the weight factor of each environmental SI from expert A's point of view.

Table 4 (partially recovered). Comparison scale for AHP preferences.
…, Very strongly important: One SI is strongly favoured and its dominance is demonstrated in practice
5, Extremely important: The evidence favouring one SI over another is of the highest possible validity

Fig. 9. A pairwise comparison example concerning the main dimensions of sustainability.
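The RGMM prioritisation described above can be illustrated with a short sketch. The function name `rgmm_weights` and the 3 × 3 example matrix are assumptions chosen for illustration; the matrix is deliberately perfectly consistent so the result is easy to verify by hand.

```python
import math


def rgmm_weights(matrix):
    """Row geometric mean method: normalised geometric mean of each row."""
    n = len(matrix)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Hypothetical, perfectly consistent comparison matrix (a_ij = w_i / w_j).
example_matrix = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = rgmm_weights(example_matrix)
print([round(w, 3) for w in weights])  # approximately [0.571, 0.286, 0.143]
```

For this matrix the row geometric means are 2, 1, and 0.5, so the normalised weights are 4/7, 2/7, and 1/7, and they sum to 1 as the normalisation condition requires.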
The variations of the weight factors obtained from the individuals' judgments are displayed via the box-whisker plot in Fig. 11. A comparatively lower spread of weighting was observed in the case of social sustainability as compared to significant variations in environmental and economic factors. The Net present value stands out as the indicator with the highest mean and median values. However, it is discussed in the next section that using the mean or median values is not the best method to represent the collective value of individual judgments.

Aggregation of individual priorities
The AHP weighting process is followed by the aggregation (consensus) step, in which different individual preferences are aggregated to obtain a single collective preference. The term consensus in decision-making traditionally referred to the unanimous agreement of all decision-makers (Dong et al., 2010). However, since full agreement is not always achievable in real-life problems, aggregation methods are utilised to combine the decision-makers' opinions and reach a collective decision.
The aggregation method used in this study is the Aggregation of Individual Priorities (AIP), also called the weight aggregation technique. In this method, individual weight vectors are estimated and then combined to obtain the consensus weight vector (Entani and Inuiguchi, 2015). The AIP is recommended in the specialist assessment processes where the decision-makers are experts with individual viewpoints, no supra decision-maker dominates the others, and they do not want to compromise their judgments (de FSM Russo and Camanho, 2015). The AIP is also the only method that does not require an agreement on a common decision model (Ossadnik et al., 2016).
Under the AIP approach, two calculation techniques, the Weighted Geometric Mean Method (WGMM) and the Weighted Arithmetic Mean Method (WAMM), can be used to obtain the aggregated weights (Forman and Peniwati, 1998). The WGMM, however, is favoured by several researchers (Ossadnik et al., 2016, Forman and Peniwati, 1998, Krejčí and Stoklasa, 2018) and, therefore, is utilised in this study. Within this process, let $w^{(k)} = (w_1^{(k)}, \ldots, w_n^{(k)})^T$ be the individual weight vector derived from the individual comparison matrix made by decision-maker $k$ ($k = 1, \ldots, m$), and let $\lambda_k$ be the weight of decision-maker $k$, where $\lambda_k \geq 0$ and $\sum_{k=1}^{m} \lambda_k = 1$. Then the normalised collective weight vector, $P = (P_1, \ldots, P_n)^T$, using the WGMM can be obtained by (Ossadnik et al., 2016):

$$P_i = \frac{\prod_{k=1}^{m} \left(w_i^{(k)}\right)^{\lambda_k}}{\sum_{l=1}^{n} \prod_{k=1}^{m} \left(w_l^{(k)}\right)^{\lambda_k}}, \qquad i = 1, \ldots, n$$

Applying this method to each group of SIs, the collective local weights can be obtained, as presented in Table 5. Local weights refer to the weights of the indicators with respect to the element above them in the hierarchy tree; that is, their importance regarding their parent criterion. The global weight, in contrast, is the product of the local weight of the SI and the weight of its dimension, representing the weight of the SI with respect to the overall goal of sustainability (Chatzimouratidis and Pilavachi, 2009).
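A minimal sketch of the WGMM aggregation and of the local-to-global conversion is given below. The function names, the two example weight vectors, and the choice of equal decision-maker weights are assumptions for illustration; the text does not state which decision-maker weights were used. The final lines combine the environmental dimension weight (39.5%) and the local weight of operational carbon (0.246) reported in the paper, purely to show the arithmetic.

```python
import math


def wgmm_aggregate(weight_vectors, expert_weights=None):
    """Aggregate individual weight vectors with the weighted geometric mean.

    `weight_vectors` holds one weight vector per decision-maker.
    `expert_weights` are the lambda_k values; equal weights are ASSUMED
    when none are supplied.
    """
    m = len(weight_vectors)
    n = len(weight_vectors[0])
    if expert_weights is None:
        expert_weights = [1.0 / m] * m
    if not math.isclose(sum(expert_weights), 1.0, rel_tol=1e-9):
        raise ValueError("expert weights must sum to 1")

    raw = [
        math.prod(w[i] ** lam for w, lam in zip(weight_vectors, expert_weights))
        for i in range(n)
    ]
    total = sum(raw)
    return [r / total for r in raw]


def global_weight(local_weight, dimension_weight):
    """Global weight = local weight of the SI x weight of its dimension."""
    return local_weight * dimension_weight


# Two hypothetical experts weighting three indicators.
individual = [
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
collective = wgmm_aggregate(individual)
print([round(p, 3) for p in collective])  # approximately [0.339, 0.322, 0.339]

# Illustration with figures quoted in the text (rounding may differ from Table 5).
operational_carbon_global = global_weight(0.246, 0.395)
print(round(operational_carbon_global, 4))  # approximately 0.0972
```

Because the geometric mean is renormalised after aggregation, the collective vector still sums to 1 even though the unnormalised products do not.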
In this study, the weights of the main criteria are obtained separately based on expert judgments. Independently weighted, and consequently unequally weighted, sustainability dimensions are employed in the literature to analyse the sensitivity of parameters under different scenarios, e.g., in (Si et al., 2016, Ghenai et al., 2020). According to Table 5 and Fig. 12 (a), the environmental dimension has received the highest weight, followed by the economic and social dimensions. This could be explained by the fact that sustainability is traditionally viewed and perceived in exclusively environmental terms (Redclift, 2000). Furthermore, the social aspect of sustainability is less prominent in the energy and building industry discourses and perhaps harder to pinpoint.
From the environmental viewpoint, the SIs related to pollution generally scored higher AHP weights. Operational carbon is the most crucial indicator in this group, with a weight of 0.246. Primary energy consumption has also attracted considerable attention from the decision-makers and accounts for almost 21% of the overall environmental score, while the two SIs at the bottom of the list, land requirement and acidification potential, collectively contribute less than 11% of the environmental sustainability. Embodied carbon emissions and share of renewable energy, the third and fourth environmental SIs, each weigh about half as much as the first indicator. The contribution of these indicators to the overall environmental sustainability of the BHSs is illustrated in Fig. 12 (b).
After the environmental dimension, the economic dimension was next in rank. This dimension is given a reasonable weight because all stakeholders, regardless of their sustainability knowledge and concern, feel directly connected to at least one of the identified economic indicators. For instance, occupants are often cautious about operational costs while developers care more about upfront costs. Overall, the O&M costs dominated the economic category, probably because they have a direct impact on the cost of living, whereas investment costs are the most important economic factor for industry (Chinese et al., 2011). Among the four economic indicators, only one represents profit (saving compared to the basic scenario), and it obtained the second rank in the indicators list, as per Fig. 12 (c).
Regarding social sustainability, although this dimension received a lower weight, it has the highest number of indicators. This could be explained by the fact that heating systems have more direct connections with human health and wellbeing than other energy systems. Thus, apart from the social factors that are commonly considered in different sectors, such as employment and safety, heating systems have a wider domain of impact on end-users and societies that must be explored. This is confirmed by the experts who added fuel poverty to the list of SIs and rated it as one of the most prominent indicators. The health impacts factor has also been given a high score because of the prevailing health problems and detriments that could be caused by poor indoor heat conditions. The least important SIs of this category are related to subjective factors such as user-friendliness and aesthetical aspects, as illustrated in Fig. 12 (d).

Consistency check
The AHP method has the advantage that the consistency of judgments can be verified using consistency check methods. In individual judgments, consistency represents the condition for rational decisions, since the comparison matrix could be affected by the experts' knowledge, bias, and many types of misattributions. In group decision settings, however, the consistency check examines the homogeneity of the group judgments, as well as the misattributions of individuals, ensuring the reliability of the outcomes.
In group decision-making, the aggregation process preserves the consistency properties of the individual comparison matrices (Dong and Cooper, 2016). Grošelj and Stirn (Grošelj and Stirn, 2012) have proved that if the degree of consistency of each of the initial comparison matrices is satisfactory, then the aggregated priorities will be consistent. Therefore, to check the reliability of the indicator weights, the consistency ratio of all the individual comparison matrices is calculated. The consistency ratio (CR), established by Saaty (Saaty, 1987), can be obtained using equations 5 and 6:

$$CR = \frac{CI}{RI} \qquad (5)$$

$$CI = \frac{\lambda_{max} - k}{k - 1} \qquad (6)$$

where CR is the consistency ratio; CI is the consistency index; $k$ is the number of criteria; and RI is the random index, whose value depends on the matrix's dimension and can be selected from Table 6. $\lambda_{max}$ is the largest eigenvalue of the judgment matrix $A$, that is, the largest $\lambda$ for which $A v = \lambda v$ holds for some non-zero vector $v$.
An expert's judgment and its associated comparison matrix have acceptable inconsistency only when CR is smaller than 10%. When the ratio exceeds this threshold, an inconsistency issue arises, and the comparison matrix needs to be reassigned and modified by the decision-makers. The new judgments then follow the AHP process until they meet the consistency check requirements. Typically, as the order of the comparison matrix grows with the increased number of pairwise judgments, the inconsistency issue appears and increases exponentially (Asadabadi et al., 2019).
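The consistency check can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the largest eigenvalue is estimated here by power iteration, which may differ from the calculation used by the authors, and the random index values are the commonly cited Saaty values, which are assumed (not confirmed) to match Table 6. The function names and both example matrices are hypothetical.

```python
# Commonly cited Saaty random index values; ASSUMED to correspond to Table 6.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def largest_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n  # start from a uniform vector whose entries sum to 1
    lam = float(n)
    for _ in range(iterations):
        av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(av)              # because sum(v) == 1, sum(Av) estimates lambda
        v = [x / lam for x in av]  # renormalise so the entries sum to 1 again
    return lam


def consistency_ratio(matrix):
    """CR = CI / RI with CI = (lambda_max - k) / (k - 1)."""
    k = len(matrix)
    if k < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    ci = (largest_eigenvalue(matrix) - k) / (k - 1)
    return ci / RANDOM_INDEX[k]


consistent = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
circular = [[1, 2, 0.5], [0.5, 1, 2], [2, 0.5, 1]]  # A > B, B > C, yet C > A

cr_consistent = consistency_ratio(consistent)
cr_circular = consistency_ratio(circular)
print(cr_consistent < 0.10, cr_circular < 0.10)
```

The circular example has a largest eigenvalue of 3.5, giving CI = 0.25 and a CR of roughly 0.43, so it would fail the 10% threshold, whereas the consistent matrix yields a CR of essentially zero.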
The conducted analyses showed inconsistencies in only four matrices, which were excluded from the aggregation process. For the rest of the matrices, consistency ratios range from 0.028 to 0.097, implying the reliability of the conducted assessments. For example, the CR corresponding to the example comparison matrix given in Fig. 10 (a) is 0.092 (9.2%), which meets the consistency check requirements. The results shown in Table 5 are obtained after treating the inconsistencies. Fig. 13 recaps the results of the study on BHSs in a pie chart.

Conclusions
The lack of a specific and applicable set of indicators is one of the major barriers to measuring and tracking the sustainability performance of energy technologies in the built environment. Current literature has often used SIs developed for national-scale assessments or building assessment tools, which do not always reflect the nuances of the three facets of sustainability in smaller-scale applications. Furthermore, the participation of stakeholders is essential in identifying the key sustainability criteria and structuring an effective and consistent analysis framework, and its absence is another important gap in the relevant studies.
To address the aforementioned gaps with specific consideration of the BHSs, a framework for the identification and prioritisation of the SIs set is proposed. The developed framework utilises a series of quantitative and qualitative methods in 6 stages to ensure the reflection of the stakeholders' priorities and a balanced representation of all facets of sustainability. Using the developed framework, a representative set of SIs can be determined to quantify, analyse, and communicate complex sustainability information through systematic, consistent, and transparent measures. This framework can be broadly applied to the routine determination and analysis of key sustainability factors in various fields.
Applying the developed framework to the BHSs, a total number of 25 experts from diverse stakeholders provided their judgments. The study ended up with a total of 22 SIs consisting of 4 economic, 8 environmental, and 10 social indicators. The environmental dimension was found to be the most crucial element of sustainability (39.5% of the overall weight), followed by the economic dimension (33.2%). It was also found that social sustainability constitutes a considerable proportion (27.3%) of the overall sustainability weight. Based on the obtained priority weights, the O&M cost, net present value, and operational carbon emissions were the top three critical SIs.
Further research, however, is required to determine the quantification method associated with each identified SI. The availability of data for some of the SIs, mostly social indicators, could limit their utility in practice. Therefore, quantification methods should be defined based on the accessible data so that they can be independently used by practitioners in different analyses. Following that, the sustainability of different low-carbon heating alternatives such as heat pumps, biomass boilers, and solar thermal systems should be assessed to determine whether they are mature enough to serve a just and sustainable transition. Furthermore, this study examined the functionality of the proposed framework in the context of BHSs, but it paves the way for future scholarship and public policy to holistically explore the SIs in different domains.
The SI framework is primarily developed for household-scale sustainability assessments (individual BHSs), but it can also support larger-scale evaluations such as communal systems, local interventions, and national strategic policies. The results from the study of BHSs further suggest that it is critical for policymakers to understand how adopting the TBL approach could shift the prevailing perceptions of a sustainable system. For instance, it is found that the sustainability of BHSs is highly susceptible to broader sociotechnical drivers such as fuel poverty and thermal comfort that are often disregarded in public policy.
In conclusion, the findings suggest that, by integrating the inclusion of experts with holistic sustainability principles, we might better understand the routes to more sustainable transition pathways, which can contribute the most to the planet, profit, and people.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.