An Ontology-Based Innovative Energy Modeling Framework for Scalable and Adaptable Building Digital Twins

Digitalization of buildings and the use of IoT sensing and metering devices are steadily increasing, oﬀering new opportunities for more autonomous, eﬃcient, and ﬂexible buildings. As part of this transformation and inspired by the added value demonstrated in other domains, the concept of a building digital twin that can monitor, simulate, manage, and optimize building operation has received increased interest. To aid such digital twin implementations, accurate and adaptable simulation models are required, which can eﬀectively integrate and utilize the available data. However, traditional building modeling and digital practices, such as Building Information Modeling and white-box modeling tools, are not easily compatible with these requirements. This work presents an innovative and ﬂexible energy modeling framework based on the SAREF ontology. With a basis in the SAREF4BLDG extension for buildings and the deﬁned classes, diﬀerent models are presented for a selection of typical systems and devices such as spaces, space heaters, dampers, coils, etc. Using the generic semantics and relations of the SAREF4SYST extension, a method for linking and simulating component models is then presented. A proof-of-concept of the modeling framework is provided, showing its application and feasibility to provide a dynamic simulation of the diﬀerent systems and devices included in a demonstration case. Finally, a future line of work is identiﬁed considering the implementation of the modeling framework in an actual building case study, including integration with actual sensing equipment to demonstrate diﬀerent digital twin services such as performance monitoring, strategy planning, and operational optimization.


Introduction
Driven by ambitious environmental goals and the promising potential for improved efficiency and automation by employing the Internet of Things (IoT) and Artificial Intelligence (AI) technologies, the building sector is currently undergoing fast-paced digitalization with increasing adoption of large sensing and metering networks.While the utilization of such data offers many opportunities, it also presents great challenges to status quo data management, building operation, and building energy modeling.As a consequence, the demand is steadily growing for more flexible and scalable data and energy models that integrate seamlessly.
As part of the digitalization of buildings, Building Information Models (BIM) have for the past two decades played a major role, unlocking significant cost savings and improving efficiency by easing communication and information exchange between contractors in the design and construction phases of buildings [1].However, despite its demonstrated success, the BIM technology is now facing significant challenges as at-Fig.1.The DT concept illustrated with typical systems and components present in buildings with their virtual counterpart and the bidirectional flow of data.these two systems.This is conceptualized in the context of a generic building energy system as shown in Fig. 1.Here, the physical system is the actual asset to be managed, i.e. a building including all systems, devices, spaces, occupants, etc.The virtual system employs relevant simulation models to emulate the behavior of the physical system as closely as possible.The physical and virtual systems are linked through a bidirectional data flow, where the virtual system receives raw data in the form of static design and topology information as well as dynamic sensor and meter data.The virtual system then processes this information to adapt, monitor, simulate, and optimize based on the employed simulation models.This processed information is then communicated back to the physical system for facility management.As a basis for the work presented in this paper and the envisioned DT, three fundamental services, targeting the operational phase of the building life cycle, have been defined as follows: Service 1 Efficient collection and management of data generated by sensors, meters, and IoT devices through a user-friendly open-standard context information model Service 2 Automated continuous commissioning and performance monitoring in real-time to detect faults and malfunctions, ensuring a smarter and more cost-effective facility management Service 3 Operational strategy planning support to enable more informed decision-making by running different control and management scenarios in a zero-risk virtual environment In recent years, successful implementations of fault detection and performance monitoring, as listed in Service 2, have been reported numerous times with different approaches.Typically, fault detection methods are divided into two categories.The first category is modelbased methods, which rely on comparisons between the simulated behavior of a system with actual measurements collected from the physical unit.The second category is model-free methods, which typically utilize only operational data collected from a system to find patterns that distinguish faulty from normal operation.Although model-free methods require no models to implement, their effectiveness is also often very limited in comparison to model-based methods when ap-plied to large-scale distributed systems such as Heating Ventilation Air-Conditioning (HVAC) systems [6].Therefore, most efforts in fault detection and continuous commissioning of buildings have considered model-based methods, often with the use of white-box simulation tools [7][8][9][10].In addition, an accurate dynamic energy model enables simulation and optimization of future operational scenarios, as required by Service 3, aiding decision-makers with operating the building on a day-to-day basis, taking into account weather forecasts, expected occupancy, and price signals.This potentially increases indoor comfort, reduces operational costs, and improves the overall energy flexibility of the building.
The added value of the outlined services is well-established and has been demonstrated numerous times in case-studies [11][12][13].However, using traditional energy modeling tools and being challenged by various interoperability issues, such implementations usually result in highly specialized solutions tailored to each specific building.This building-bybuilding approach, relies on manual workflows, is costly, hard to scale, and acts as a major barrier to broadly implementing such systems.
Therefore, inspired by the added value provided in different domains, there is a need to promote and develop DT technology for the building sector to harness the benefits provided by advanced sensing and metering devices and provide the outlined services at scale.As a key milestone in such promotion, an automated and adaptable framework for energy modeling of buildings is needed to effectively integrate such models into a building DT.
In this work, an innovative framework is proposed and presented for automated and adaptable energy model development to provide the simulation models required by building DTs.The framework builds upon the Smart Applications REFerence (SAREF) ontology [14] to ensure interoperability by using existing classes, concepts, and relations.A selection of SAREF classes is extended with adaptable data-driven gray-box and black-box models, to describe their dynamic behavior.Furthermore, a generic framework based on the SAREF4SYST extension is presented to describe how these component models interact based on simple topology information.To provide a proof-of-concept, a demonstration case is considered to illustrate the framework implementation considering the simple single-zone system shown in Fig. 1.
The presented framework in this work is capable to serve as a basis for the development and implementation of scalable DTs for building applications, allowing the delivery of major services to enhance the energy efficiency and intelligence quotient of buildings.This includes performance monitoring, data management, performance optimization, and strategy planning.
As established, one of the core elements of DTs is accurate simulation models.Therefore, in the following section, an overview and discussion of different approaches for building energy modeling are provided.

Building energy modeling
Over the years, considerable effort has been invested in developing large-scale energy simulation tools, e.g.EnergyPlus, DOE-2, and TRN-SYS [15][16][17].Models developed in such tools fall under the category of white-box models, which are based entirely on first-principles building physics, typically in the form of mass and energy balance equations, which often lead to large systems of Ordinary Differential Equations (ODE).However, being the case for the majority of large-scale whitebox models, they require extensive information on the building and a substantial amount of resources and time to develop.Taking EnergyPlus as an example, a typical workflow consists of three time-consuming phases before an accurate energy model is obtained.First, the building geometry is defined with detailed geometry data from floor plans, cross-sections, 3D models, etc.Second, HVAC systems design data must be specified along with material and thermal properties of the envelope and occupancy, lighting, and equipment schedules of the spaces.Finally, to achieve acceptable prediction accuracy, the model must be calibrated using actual consumption data from the building, a process that often relies on a manual trial-and-error approach that requires domain-specific expertise from the modeler [18].In addition to the high amount of resources and time that must be dedicated to the manufacturing of these high-resolution white-box models, high computational costs and execution time is also a well-known challenge, which makes them unsuitable for operational optimization purposes [19,20].
Therefore, to reduce the amount of manual work of white-box model development, considerable efforts have been put into the automatic translation of the design and geometrical data available in BIM to attain working Energy models for simulation tools (BIM2BEPS) such as EnergyPlus and Modelica.For instance, Ramaji et al. [21] proposed a BIM2BEPS transformation tool for conversion from the BIM-format International Foundation Classes (IFC) into an OpenStudio model.
Andriamamonjy et al. [22] proposed an automated BIM2BEPS workflow between IFC and Modelica.The workflow applies an intermediate representation called Model View Definition (MVD), on which different checks are performed to ensure a given IFC file includes all required information.If the IFC passes this test, it can be directly translated into a working Modelica model.While the workflow is successfully applied to a test case study, the authors also highlighted potential barriers of adoption.Namely, that it relies greatly on information that is typically not available in status quo IFC files.
Despite the significant efforts in BIM2BEPS research, many of the existing interoperability issues such as interdependence on extensive toolchains, inconsistent current practices by BIM practitioners, and loss of information still persist [23,24].
As a result of these shortcomings, data-driven modeling methods have recently gained increasing popularity for building modeling and simulation, due to their lower computational demands and a high potential for automation and integration in smart buildings where sensor networks and metered HVAC components collect large amounts of operational data.
Data-driven modeling methods are typically divided into black-box and gray-box methods.Black-box models are purely data-driven, meaning that no domain-specific model structure is assumed.Given a set of data, specific algorithms are thus applied to find appropriate functional relationships that map system inputs to system outputs.Commonly used black-box models are Artificial Neural Networks (ANN) [25,26], Support Vector Machines (SVM) [27], Decision Trees [28], and state-space models [29].Due to the absence of domain-specific assumptions, these types of models have been successfully applied across many academic disciplines as well as in industry [30,31].
Typically, black-box models are considered lighter than white-box models in terms of computational complexity and simulation speed.For existing buildings with available historical data, black-box models have been used for aggregated heating and electricity load forecasting [32][33][34].These types of models are usually referred to as monolithic models, i.e. whole-building models that do not incorporate any knowledge of the modeled building, besides historical weather and operational data such as heating or electricity consumption.While this offers convenient and fast implementation, there are also certain drawbacks to consider.If the building operation or design is changed, e.g. if system setpoints are changed or if the building is retrofitted, new data have to be collected and the model has to be re-trained.Furthermore, if the data available is scarce or of low quality, the performance of black-box models might be affected.A promising solution to this issue is the use of transfer learning, a technique where knowledge is transferred from one domain to another [35].Thereby, the knowledge contained in models that are trained on large amounts of data can be reused for training other models, given that the domains have a certain amount of overlap.This concept has been applied for both indoor temperature prediction [36] and energy prediction [37].
One of the often highlighted drawbacks of black-box models is the lack of interpretability, i.e. the model structure and the obtained parameters have no direct physical meaning and the underlying cause of model behavior is thus unknown.This is opposed to gray-box models, the second category of data-driven models.As the name suggests, these types of models are a mix of white-box and black-box models where the model structure is based on physical principles, while the unknown parameters of this model structure are estimated through parameter identification techniques.The physics-derived model structure enables interpretation and validation of the obtained parameters from parameter estimation in a physical context, e.g. as demonstrated by Macarulla et al. [38].An example of the use of gray-box modeling in buildings is the thermal Resistor Capacity (RC) model, a thermal analog to electrical circuits used to model the thermal dynamics of buildings [39,40].However, gray-box modeling is a general concept that can also be applied to individual building components such as heat pumps, recovery units, thermal storage tanks, cooling coils, etc., as demonstrated by Afram et al. [41].
In general, the various recent studies on data-driven modeling of building systems and components using black-box and gray-box approaches report clear benefits regarding automation potential and flexibility as well as high prediction accuracy.Considering the discussed challenges of BIM2BEPS automation, these are key properties for applications in a dynamic DT-environment where close integration between data collected from IoT networks and simulation models is fundamental for delivering the outlined DT services.

Ontologies and metadata schemes for improved interoperability and building energy modeling
Buildings have a high degree of heterogeneity and complexity, including a wide variety of systems and devices from different contractors and providers.This makes it difficult to develop applications and models that can be scaled and reused across different buildings and usecases.This has caused the development of ontologies, which attempts to standardize and structure metadata and communication between devices to avoid the status-quo many-to-many translation between the multitude of protocols used.
Pritoni et al. [42] provided a review of 40 different ontology and metadata schemas for building design, energy modeling, and building operation applications.From these 40 different ontologies, five of the most popular ontologies were selected for a deeper comparison based on their applicability in three use cases; Energy auditing, automated fault detection and diagnosis, and optimal control of HVAC systems.The selected ontologies were SAREF, Semantic Sensor Network/Sensor, Observation, Sample, and Actuator (SSN/SOSA), Building Topology Ontology (BOT), Brick, and RealEstateCore (REC).
Generally, it was highlighted that all five ontologies had varying degrees of missing concepts, which impacted their utility in the three use cases, i.e. choosing the right ontology depends on its intended application.For instance, BOT and SSN/SOSA missed central concepts for describing sensed properties, actuator idioms, and units of measurement.BOT, Brick, and REC missed equipment properties to describe e.g.nominal capacities, flowrates, efficiencies, etc.All ontologies except Brick missed schedule concepts for describing control strategies.Generally, Pritoni et al. conclude that none of the five ontologies accounts for all required concepts and that the ontologies must be customized and extended for certain applications.
In this work, SAREF is chosen as a backbone for the energy modeling framework.SAREF was developed in 2015 by Daniele et al. [43] in close collaboration with industry to define a unified reference for recurring concepts and relations in the IoT domain.The core SAREF ontology revolves around the concept of devices, e.g. a lighting switch, sensor, or meter.The ontology has extensions spanning across multiple domains, e.g.SAREF4BLDG [44], SAREF4CITY [45], SAREF4SYST [46], SAREF4ENER [47], etc.Here, SAREF4BLDG is an extension dedicated to the building domain and is based on the ISO-published Industry Foundation Classes (IFC) standard [48].The extension has 72 classes and 179 object properties that represent typical appliances and devices in buildings.Hence, this ontology allows an almost direct mapping between IFC and SAREF for most of these objects within the building domain.This is a great advantage compared with other relevant ontologies, which could potentially allow for significant reuse of existing well-structured information, contained in the frequently available IFC models for buildings.Another ontology that also has IFC mapping capabilities is IFCowl [49].However, as discussed by Rasmussen et al. [50], IFCowl was designed to be backwards compatible with IFC and as a result inherits two main drawbacks; complexity and size.This makes the ontology both hard to understand and hard to use.In comparison, SAREF has a higher degree of modularity and its extension SAREF4BLDG is lighter in complexity and size.In addition, the multiple domains and broader scope of SAREF could allow for further expansion of the DT concept to consider larger systems such as districts or cities at a later stage.
As is the case with the components in IFC, the component classes specified in SAREF4BLDG are data containers with no formal or mathematical description of their dynamic behavior or interaction with other components or systems.For instance, a s4bldg:SpaceHeater object holds the property s4bldg:outputCapacity op , but it has no attached model which specifies how this property should be used to simulate actual heat consumption.
Therefore, to build upon the SAREF4BLDG extension and ensure high flexibility, a modular and component-based modeling approach is proposed.Using the component definitions and class hierarchies from SAREF4BLDG, the energy model is thus split into a matching set of component models that can be automatically assembled and adapted, based on the specific use case.Each component should be able to learn from the collected data through various parameter estimation techniques and strive to improve prediction accuracy as the collected data accumulates.This approach will make it possible to continuously monitor and compare the performance of actual components in real-time with their digital counterparts in the digital twin environment, as required by Service 2. Furthermore, a highly modular and flexible model allows for the interchange of component models during the entire building lifecycle, which is necessary to maintain the Digital Twin as a close replica of the actual building, as the installed systems or building use might change over time.Therefore, the component models should be accurate enough to meet the defined services to a satisfactory level, but simple enough to allow for robust and automated model calibration.
With a basis in the components depicted in Fig. 1, a description of the components and their properties from the SAREF4BLDG ontology is given in the Appendix Table B.3.The superscripts op and dp stand for object property and data property, respectively.An object property points to another object, while a data property holds data in the form of standard types such as integer, string, floating-point, etc.A complete list of available components and a description of the listed properties can be found in the SAREF4BLDG documentation [44].

Component modeling
When modeling buildings, it is common practice to take advantage of the major difference in physical time constants that are associated with the HVAC system and the building envelope and spaces.The HVAC system is often governed by fast-moving dynamics that reach equilibrium within seconds or minutes, while the large thermal inertia of the building envelope and spaces has very slow-moving dynamics.Therefore, it can often be justified to model the HVAC systems as quasisteady-state systems, which decreases model complexity and lowers the computational burden considerably [51,52].For these reasons, building simulation tools such as EnergyPlus is also heavily based on this modeling approach [53].Therefore, with a basis in the presented components in Table B.3, potential candidate models have been found in the literature for implementation and detailed descriptions are included in Appendix A. The key selection criteria for the models have been simplicity and applicability for data-driven calibration and parameter estimation.
To provide an overview of all the identified models, the inputs, outputs, parameters, and constants have been summarized for each model in Table 1.The parameters for the models in Table 1 can either be given through design specifications or calculated through parameter estimation techniques if time-series data is available for the inputs and outputs.In the case where input-output data can be continuously provided, parameter estimation can be performed on a regular interval to attain a continuously improving performance on the different grayand black-box component models.This method is demonstrated for indoor temperature forecasting by Ruano et al. [54].Additionally, each physical component can be monitored closely, by comparing its operation with its digital replica to provide fault detection and continuous commissioning capabilities on a component level.If neither design specifications nor operational input-output data is available, default values must be assumed.
Based on the listed inputs and outputs in Table 1, and the model descriptions in the previous sections, certain expected relationships between components arise.For instance, the space heater model in Table 1 expects the massflow ṁ , and the room temperature   as input, which it can acquire from the valve model and the building space model, respectively.The valve model expects the valve position   as input, which it can acquire from the controller component etc.In the following section, these types of relations are formalized using simple but broadly applicable concepts from the SAREF4SYST extension.

Modeling framework
In the previous section, common building components were presented, along with generic and scalable component energy models.However, to express how the defined components interact, a domainindependent SAREF-compliant methodology for linking the inputs and outputs of component models is defined.The concept of having different self-contained component models defined by an input-output relationship and an overall framework that links these models share many similarities to the co-simulation framework defined in the Functional Mockup Interface (FMI) standard [55].Here, models are exported from simulation tools as Functional Mockup Units (FMU) and are used as Fig. 2. Overview of the SAREF4SYST ontology extension with available classes and properties [46].

Table 1
Overview of inputs, outputs, parameters, and constants of potential mathematical models for the highlighted components.

Component
Inputs Outputs Parameters Constants so-called slaves.The slaves are allowed to communicate at specific time intervals called communication points, managed by a master algorithm.Between communication points, the slaves are simulated independently.The modeling framework presented in this section shares many of the same principles as co-simulation and is thus well-aligned with the use of FMUs for modeling individual components.

Linking component models using SAREF4SYST
SAREF has a dedicated extension called SAREF4SYST, which aims at providing a generic framework for representing the topology of systems and how they interact [46].Through SAREF4SYST, the system topology can be expressed through three generic classes, s4syst:System, s4syst:Connection, and s4syst:ConnectionPoint along with nine applicable properties as shown in Fig. 2.
As shown in Fig. 2, a s4syst:System object can be a subsystem of other s4syst:System objects specified through the property s4syst:subSystemOf or have subsystems itself specified through s4syst:hasSubSystem.Various forms of flows between systems, e.g.information or physical quantities can be expressed through the s4syst:Connection and s4syst:ConnectionPoint classes.In this work, the s4syst:Connection class is used to represent system outputs and the s4syst:ConnectionPoint class to represent system inputs.Hence, all inputs and outputs in Table 1 can be represented by s4syst:ConnectionPoint and s4syst:Connection objects respectively, while each component can be represented by s4syst:System objects.This concept is shown in Fig. 3, where three SAREF4BLDG components are shown.As recommended in the SAREF4BLDG documentation [56], each component is extended with its identified model as a specialized subclass from section 4. Furthermore, these components all inherit from the s4syst:System superclass, enabling them to interact and communicate through the SAREF4SYST ontology.It should be noted that the full SAREF4BLDG class hierarchy is not included for clarity.
In  input-output terminology.Hence, the presented framework is not specific to the models shown in Table 1, which could be modified or extended to cover other components such as s4bldg:Chiller or s4bldg:Pump, where models from external simulation tools and libraries could be considered, e.g. as FMUs.

Model execution order
The topology in Fig. 3 defines the required information exchange between components at a given timestep.This produces a scheduling problem, where it must be ensured that all required inputs are available when a model is executed during a simulation time step.For instance, in Fig. 3, both the s4bldg:BuildingSpace and the s4bldg:Valve model must be executed before the s4bldg:SpaceHeater model can be executed.Such task scheduling problems are well-known in computer science and are formally known as topological sorting [57].Given a directed graph  = ( , ), with  being the set of nodes and  being the set of edges, topological sorting is the ordering of nodes such that for every edge (, ) ∈  the node  comes before the node .This is only possible if the graph has no cycles, i.e. if it is a Directed Acyclic Graph (DAG), in which case it is guaranteed to have at least one solution [57,58].Hence, if the s4syst:System instances are interpreted as nodes, and the s4syst:Connection, and s4syst:Connection-Point pairs are interpreted as directed edges, the correct execution order can be found using topological sorting, given that the graph is acyclic.

Cycle removal procedure
The no-cycles condition of topological sorting is essential to consider when applying the presented methodology to model actual buildings as feedback control loops are an integral part of all building automation systems.When such control loops are modeled, cycles are introduced.An example of a cycle produced by a controller is shown in Fig. 4. Here, a s4bldg:BuildingSpace model predicts CO 2 -concentration   , which is used as input  in a saref:Sensor model.The output   is then used as input in a s4bldg:Controller model.An appropriate input signal  is then computed, which is sent to a s4bldg:Damper model to calculate the resulting airflow   .Finally, this value is sent back to the s4bldg:BuildingSpace model for recalculation of indoor CO 2 -concentration   at the next time step, completing the cycle.To systematically deal with such cycles, an algorithm is proposed and is presented in Algorithm 1.Given the set of all s4syst:System objects  and an initially identical set  * , the algorithm iterates through the set of all s4bldg:Controller components  ∈  * , identifies reachable components   from  ∈  , and removes all edges from  ∈   to the controlled component .con-trolsProperty.isPropertyOfif such edges exist.For instance, applying Algorithm 1 on the simple system shown in Fig. 4 the Supply damper would be identified as a reachable component from the CO2 controller.The Supply damper is connected with the controlled component Space, and this connection is thus removed.

Topological sorting procedure
After applying Algorithm 1, topological sorting can be applied on the modified set of components  * .This is shown in Algorithm 2 where an adaptation of Kahn's topological sorting algorithm is presented, which matches the semantics of SAREF4SYST.The input of this algorithm is the set  * , while the output is a sequence  with all the components in  in a topologically sorted order.The algorithm recursively adds all mapped components  () ∈  with |.connectsAt| = 0 (i.e. the component  has no inputs) to a sequence  while removing all edges originating from .In addition, a sequence  is created which holds integers starting from 0 that represent the execution priority of the components in .Components in  with the same priority can be executed in parallel.For instance, applying Algorithm 2 on the simple system shown in Fig. 3 produces the sequence  = (Valve, Space, Space heater) and the priority sequence  = (0, 0, 1).This implies that the Valve and Space models can be executed in parallel and that they must be executed before the Space heater model.

Simulation procedure
When the execution sequence  is found using Algorithm 2, the system can be simulated by simply traversing through this sequence   end for 27: end for 28: end for of components, while gathering the needed inputs and executing the attached model.Executing this routine once completes one system timestep of length Δ and can be repeated indefinitely given the re-Algorithm 2 A SAREF4SYST-specific adaptation of Kahn's topological sorting algorithm [59].
1: Let  ∶  * →  be a function that maps from the modified components ℎ * ∈  * to the original components ℎ ∈  2: Let  be the subset of  * with no inputs, i.e.  = { ∈  * ∶ |.connectsAt| = 0} 3: Let  ← () be an empty sequence which will be populated with the components of  in a sorted order 4: Let  ← () be an empty sequence which will be populated with the priority of the components in the sequence  5: Let  ← 0 be the priority counter 6: The notation  ⌢ () is used to append the element  to the sequence  quired inputs, such as weather data and schedule values, are available.This is formulated in Algorithm 3, where each component model is assumed to have an input and output dictionary which are indexed by the properties inputName and outputName of the associated s4syst:ConnectionPoint and s4syst:Connection objects, respectively.These property names are simply the mathematical notation for the inputs and outputs shown in Table 1 translated into a representative name, e.g."indoorTemperature" to represent   .In addition, each component model is assumed to have a do_step() method, implementing the governing equations defined in section 4 or executing external models or libraries such as exported FMU models.
Algorithm 3 An algorithm for simulating building devices and systems given the execution sequence .

Demonstration case and implementation
To validate the proposed modeling framework, the single-zone system shown in Fig. 1 is considered as a demonstration case.The modeling framework is implemented in Python as a reusable library [60], following the semantic structure and guidelines provided by the SAREF ontology.However, to accurately model the system, basic topological information is required.To ensure that the model can easily be constructed and adapted without manually specifying all the logical connections between models as presented in section 5, a simple modelindependent representation is instead used as a starting point.Based on this representation, the logical connections between models can then be inferred based on simple rules.For this purpose, the single-zone system has been formulated using the SAREF ontology semantics and concepts as shown in Fig. 5 only considering the actual topology of the physical system.

Available topology information
The system consists of one s4bldg:BuildingSpace instance, containing one s4bldg:SpaceHeater, one s4bldg:ShadingDevice, one s4bldg:Valve, two s4bldg:Damper, two s4bldg:Controller, and two saref:Sensor instances as represented by the s4bldg:contains property.The s4bldg:BuildingSpace instance is linked with one saref:Temperature instance representing indoor temperature and one saref:CO2 instance representing CO2concentration through saref:hasProperty, which is inherited through the saref:FeatureOfInterest super class.The saref:CO2 subclass of saref:Property has been added as an extension to the core SAREF ontology similar to the work carried out by Weerdt et al. [61], as it is not included by default.The measured properties of the two saref:Sensor instances are then determined through the saref:measuresProperty, pointing either to the saref:Temperature instance or the saref:CO2 instance.Similarly, fashion, the controlled properties of the s4bldg:Controller instances are determined through the saref:controlsProperty.To determine which devices are actuated by the controllers, each s4bldg:Controller instance is associated with a saref:LevelControlFunction instance (subclass of saref:Function) through the saref:hasFunction property, which further points to a saref:SetLevelCommand instance (subclass of saref:Command) through the saref:has-Command property.Finally, the saref:SetLevelCommand instances each point to a saref:MultiLevelState instance (subclass of saref:State) through the actsUpon property.The specific subclasses used here indicate that the actuated s4bldg:Damper and s4bldg:Valve instances can be regulated with a level-setting, e.g. from 0-100%.Alternatively, for representing binary control of other objects such as windows or doors, the saref:OpenCloseFunction, saref:OpenCommand, saref:CloseCommand, saref:OpenState, and saref:CloseState could be used instead.For the shown system, the supply and exhaust dampers share the same state, resulting in balanced supply and exhaust airflows.However, for controlling supply and exhaust airflows separately, each damper instance should have its own saref:MultiLevelState instance each associated with two different s4bldg:Controller instances.
To describe the topology of the ventilation system components and their order of placement in the flow path, the s4syst:connect-edTo property is used, which is inherited from the s4syst:System class.This property is symmetric, meaning that two connected components both point to each other with the connectedTo property (although only one path is shown in Fig. 5).Hence, the s4bldg:Air-ToAirHeatRecovery instance is connected to both the supply and exhaust s4bldg:Fan instances while the exhaust fan is further connected to an exhaust Node instance.Here, the Node class represents nodes in a flow network and keeps track of flow and temperature before one stream branches out or after multiple streams join together.In the demonstration case, there is only one space, resulting in the same flow properties before and after the Node instances.However, this addition is necessary in the case of multiple spaces and a more complex flow path topology.The exhaust Node instance is further connected to an exhaust s4bldg:Damper instance.The supply s4bldg:Fan is connected to the heating s4bldg:Coil instance, which is connected to a supply Node instance.Finally, this supply Node instance is connected to a supply s4bldg:Damper instance.
For Fig. 5, inverse properties also exist for most of the shown properties, although these are omitted for clarity.For example, s4bldg:is-ContainedIn and saref:isMeasuredByDevice are inverse properties of s4bldg:contains and saref:measuresProperty.

Obtaining a working energy model
With the SAREF representation in Fig. 5 of the considered singlezone system, the topological context of all devices and components is defined.However, this representation cannot be directly used with the presented algorithms in section 5 to simulate the system.Therefore, inputs and outputs of each model extension, as described in section 5, are first matched using the topological relationships shown in Fig. 5 and the s4syst:Connection and s4syst:ConnectionPoint classes as demonstrated in section 5.
For example, a connection between the s4bldg:BuildingSpace instance and the CO2 saref:Sensor instance is inferred through the properties saref:measuresProperty and saref:isProper-tyOf of the saref:Sensor and saref:CO2 instance, respectively.A connection between the saref:Sensor and s4bldg:Controller is established through the saref:controlsProperty and saref:is-MeasuredByDevice properties.
The resulting connections are shown in Fig. 4. Completing this process for all s4syst:System instances produces the graph shown at the top of Fig. 6.Here, each pair of s4syst:Connection and s4syst:ConnectionPoint instances linking two models are represented as one edge labeled with the shorthand "C:" and "CP:" representing the outputName and inputName properties, respectively.As mentioned previously, these property names correspond to the mathematical notation of input and outputs in Table 1.In addition to the already discussed Node class, two other helper classes OutdoorEnvironment, and Schedule have also been introduced.OutdoorEnvironment represents relevant outdoor conditions such as outdoor temperature and solar irradiation, which affects different systems, in this Fig. 5. SAREF representation of the demonstration case, including the required topology information for the involved components.The representation incorporates concepts from both SAREF core, SAREF4BLDG, and SAREF4SYST.case, the s4bldg:BuildingSpace and s4bldg:AirToAirHeatRecovery instances.Schedule is a generic class that outputs predetermined schedules for a specific period, e.g.daily occupancy or setpoint profiles.
With the inputs and outputs defined for all components in the system, the three algorithms from section 5 can be applied to simulate the system.An overview of this workflow is shown in Fig. 6.As a first step, Algorithm 1 is applied to the set of components  , to identify and remove cycles caused by the controllers.This results in the updated set  * , where four edges caused by the two controllers are removed.As shown in Fig. 6, all four removed edges point to the s4bldg:BuildingSpace instance, which holds the controlled properties.The obtained set of components  * is then used as input for Algorithm 2 from which a sequence  is obtained, containing the components of  in a topologically sorted order.This sequence is then used in Algorithm 3 for simulating the original system  .To run this simulation, parameters have been specified for each model as listed in Table B.2.The parameters are based on a classroom in an actual university building.This is the case for the temperature prediction model for the s4bldg:BuildingSpace instance, which was trained on actual data collected from the classroom as described in our previous work [62,63].For other components such as the s4bldg:Fan and s4bldg:AirToAirHeatRecovery instances, standard values, suitable for the demonstration case, are used.It should be stressed that the simulation is not intended to validate the accuracy of the presented component models, but rather to provide a proof-of-concept and validation of the overall modeling framework, information carryover, algorithms, and workflow.

Simulation results
For simulation, a 4-day winter period using actual historical weather data from Denmark is considered.The results for some of the components are shown in Fig. 7  Due to the low solar irradiation in the winter period, the shading position   has a limited effect on the indoor temperature and is therefore set to 0 during the entire period.However, as expected, running simulations for summer periods, it is observed that shade position has a high impact on indoor temperature, due to the higher irradiation levels.7b shows the simulation results for the CO 2 model attached to the s4bldg:BuildingSpace instance, where an artificially generated occupancy profile   is used as input.Here, ṁ represents both the supply and exhaust airflow as they are equal.Fig. 7d shows the CO 2 setpoint  , is set constant at 600 ppm, which the controller tracks by inversely actuating the supply and exhaust damper positions   .As shown in Fig. 7b, the predicted CO 2 -concentration   is sensitive to rapid changes in occupancy and airflow, causing the tracking error to occasionally approach 100 ppm during occupancy periods.
The predicted power consumption Ẇ for the supply s4bldg:Fan instance is shown in Fig. 7f.As expected, this consumption closely follows the supply airflow ṁ .The exhaust fan power consumption is identical due to identical models and balanced supply and exhaust airflows and is therefore not shown.

Discussion
As demonstrated from the generated results, the modeling framework enables direct dynamic simulation of the components and systems represented through SAREF semantics.Although the demonstration takes basis in a simple single-zone system, the framework is not casespecific and can readily be applied to larger systems.
The majority of the presented component models are validated through other studies and implemented in simulation tools.However, their performance and effectiveness, as part of the presented modeling framework and a dynamic DT environment, are important to establish.Implementation of the framework in a real building case study is therefore planned as part of the next steps.Here, model parameters should be specified through a combination of available design data, and parameter estimation using measured operational data.The semantic alignment between SAREF4BLDG and IFC potentially provides an advantage in the task of obtaining the required design information and properties of the modeled devices and systems.
In addition, given the fact that sensor and meter equipment is virtually represented through the saref:Sensor and saref:Meter classes, the task of retrieving and comparing actual readings from the physical system and simulated readings from the virtual system for the same device is conceptually straightforward, which facilitates the implementation of continuous commissioning and parameter estimation.Relating to the demonstration case, the virtual temperature saref:Sensor instance reads the simulated indoor temperature shown in Fig. 7a while the CO 2 saref:Sensor instance reads the simulated CO 2 -concentration shown in Fig. 7b.These readings are logged as saref:Measurement instances and retrieved through the saref:hasFunction and saref:hasSensingRange properties.Therefore, if actual readings from the two physical sensors are represented using the same concepts, the primary communication point from the physical system to the virtual system would thus pass through the physical and virtual saref:Sensor and saref:Meter instances.Here, robust and automated live communication with the physical IoT devices and preprocessing of the data will play a crucial role in the overall DT solution for proper use in continuous commissioning and parameter estimation.

Conclusion and future work
As part of the global energy transition, increasing the smartness of buildings is a key milestone, improving automation, efficiency, and flexibility.As part of this transformation, increased availability, sharing, management, and use of data from IoT devices and sensor networks both entail new opportunities as well as challenges.
Inspired by the demonstrated success and applications of the DT concept in other fields and industries undergoing a similar digital transformation as the building sector, the building DT is now emerging as a promising solution to handle and utilize these data streams for different important services such as fault detection, operational optimization, and strategy planning.
One of the fundamental constituents of a building DT which enables this is a dynamic simulation model.However, to avoid manual and time-consuming modeling workflows, and to ensure the adaptability and scalability of building DTs, close integration between the simulation models and IoT-devices at the building site is essential.
This work presents an innovative energy modeling framework that builds directly upon the semantics of the SAREF ontology with an emphasis on modular data-driven models.With a basis in the SAREF4BLDG extension, different candidate models were considered as extensions for a selection of classes.Using the existing ontology patterns from the SAREF4SYST extension, a generic method for systematically linking model inputs and outputs between components was presented.To enable direct dynamic simulation of the different components and systems, three essential algorithms were then presented.As a proof-ofconcept, the framework was applied to model and simulate a generic single-zone system with common building components and systems.The presented framework is considered the foundation for future work that will consider an actual building as a case study with the integration of actual devices, real-time sensor equipment, parameter estimation, and delivery of different services.This will demonstrate the feasibility of using the DT services to enhance the efficiency and smartness of buildings.
The use of SAREF ontology as a semantic backbone for the modeling framework improves interoperability and integration with the actual physical system and devices and avoids translation to simulation software.This potentially enables broader adoption and scalability of building DTs.In addition, using the multiple domains covered by the SAREF extensions such as industry, energy, water, smart cities, etc., the DT concept could potentially be expanded to consider streets, districts, and cities.

Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Jakob Bjoernskov reports financial support was provided by Danish Energy Agency.

Coil
A coil is a device used to provide heat transfer between non-mixing media.airFlowRateMax op A common example is a cooling coil, which utilizes a finned coil in which circulates chilled water, airFlowRateMin op antifreeze, or refrigerant that is used to remove heat from air moving across the surface of the coil.
nominalLatentCapacity op A coil may be used either for heating or cooling purposes by placing a series of tubes (the coil) nominalSensibleCapacity op carrying a heating or cooling fluid into an airstream.The coil may be constructed from tubes nominalUa op bundled in a serpentine form or from finned tubes that give a extended heat transfer surface. operationTemperatureMax

Controller
A controller is a device that monitors inputs and controls outputs within a building automation system.A controller may be physical (having placement within a spatial structure) or logical (a software interface or aggregated within a programmable physical controller).

Building Space
An entity used to define the physical spaces of the building.contains op A building space contains devices or building objects.

Fig. 3 ,
simple information flows between the three components are shown.The s4bldg:BuildingSpace component points to a s4syst:Connection object through the s4syst:connect-edThrough property.This s4syst:Connection object represents the indoor temperature output   of the s4bldg:BuildingSpace model extension.Through the s4syst:connectsSystemAt property, it further points to a s4syst:ConnectionPoint object, which represents the input   of the s4bldg:SpaceHeater model extension.This relation is expressed through the s4syst:ConnectionPoint property of the s4syst:ConnectionPoint object.The same concepts are then applied to relate the s4bldg:Valve water massflow output ṁ to the s4bldg:SpaceHeater input ṁ .Altogether, this example shows how information is translated from model output to model input.Correspondingly, the inverse path from model input to model output is expressed through the s4syst:connectsAt, s4syst:connectsSys-temThrough, and s4syst:connectsSystem properties as shown in Fig. 2.This generic framework and the translation from output to input and vice versa at the s4syst:Connection and s4syst:Con-nectionPoint objects provide the flexibility to exchange information between any type of models independently of model formulation and

Fig. 3 .
Fig. 3. Example of data structure showing how SAREF4SYST is used to link models contained in the SAREF4BLDG components through the s4syst:System, s4syst:Connection, and s4syst:ConnectionPoint classes.

Fig. 4 .Algorithm 1
Fig. 4. Example of a cycle produced by a controller, which actuates the opening position of a supply airflow damper of a space.Some model inputs are neglected for clarity (e.g. the setpoint for the controller).Algorithm 1 Removes cycles created by feedback control loops.DFS( , ) is a recursive depth-first search to find the set of reachable components  from the component .

1 :
Let  be the set of all instances with superclass s4syst:System 2: Let  * contain a copy of all the elements in  such that the elements in  * can be modified to contain no cycles 3: Let  be the subset of all instances with the class s4bldg:Controller of  * where the output of each model is shown with the black dashed line while the colored lines show model inputs.The weather data used for the simulation are shown in Fig. 7i as a separate plot, characterized by low solar irradiance and temperatures.Results for the temperature and CO2 models attached to the s4bldg:BuildingSpace instance are shown in the two top plots, Fig. 7a, and Fig. 7b.Fig. 7c shows the inputs and outputs of the temperature s4bldg:Controller instance.As shown, a constant 23 • C setpoint  , is provided as input to the s4bldg:Controller model during the whole period, resulting in a fluctuating valve position, which closes when the indoor temperature   exceeds  , and opens when the temperature drops below  , .From Fig. 7e, the resulting heat consumption Qℎ is shown for the s4bldg:SpaceHeater instance following the same fluctuating pattern, based on the water massflow ṁ .

Fig.
Fig.7bshows the simulation results for the CO 2 model attached to the s4bldg:BuildingSpace instance, where an artificially generated occupancy profile   is used as input.Here, ṁ represents both the supply and exhaust airflow as they are equal.Fig.7dshows the CO 2 setpoint  , is set constant at 600 ppm, which the controller tracks by inversely actuating the supply and exhaust damper positions   .As shown in Fig.7b, the predicted CO 2 -concentration   is sensitive to rapid changes in occupancy and airflow, causing the tracking error to occasionally approach 100 ppm during occupancy periods.The predicted power consumption Ẇ for the supply s4bldg:Fan instance is shown in Fig.7f.As expected, this consumption closely follows the supply airflow ṁ .The exhaust fan power consumption is Fig.7gshows the predicted outlet temperature at the supply side  ,, of the s4bldg:AirToAirHeatExchanger instance.The inputs here are inlet temperature  ,, , i.e. the outdoor temperature, the exhaust inlet temperature  ,ℎ, , i.e. the indoor air temperature predicted by the s4bldg:BuildingSpace model, and the airflows at the supply and exhaust side ṁ .The simulated heat consumption for the heating s4bldg:Coil instance Qℎ is shown in Fig.7h.The consumption closely follows the trend of the supply airflow ṁ , due to a constant setpoint  , and a relatively invariant inlet air temperature  , between 18-20 • C.

Fig. 7 .
Fig. 7. Results of a 4-day winter period simulation for the demonstration case, employing the presented component models and modeling framework.(For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)

1 :
Let   and   be the start and end simulation time 2: Let Δ be the simulation time step 3:  ← op operationTemperatureMin op placementType dp Air To Air Heat Recovery An air-to-air heat recovery device employs a counter-flow heat exchanger hasDefrost dp between inbound and outbound air flow.It is typically used to transfer heat heatTransferTypeEnum dp from warmer air in one chamber to cooler air in the second chamber operationTemperatureMax op (i.e., typically used to recover heat from the conditioned air being exhausted operationTemperatureMin op and the outside air being supplied to a building), primaryAirFlowRateMax op resulting in energy savings from reduced heating (or cooling) requirements.primaryAirFlowRateMin op secondaryAirFlowRateMax op secondaryAirFlowRateMin op Fan A fan is a device which imparts mechanical work on a gas.capacityControlType dp A typical usage of a fan is to induce airflow in a building services air distribution system.motorDriveType dp nominalAirFlowRate op nominalPowerRate op nominalRotationSpeed op nominalStaticPressure op nominalTotalPressure op operationTemperatureMax op operationTemperatureMin op operationalRiterial op Damper A damper typically participates in an HVAC duct distribution system airFlowRateMax op and is used to control or modulate the flow of air.