A semantic web approach to uplift decentralized household energy data

In a decentralized household energy system composed of devices such as home appliances, electric vehicles, and solar panels, end-users can examine the system in greater detail, and move further toward energy sustainability, if they are given data on electric energy consumption and production at device-level granularity. However, many databases in this field are siloed from other domains and contain only energy-related information. As a result, context that influences each device's energy use (e.g. weather) is lost. Meanwhile, many of these datasets have been used extensively in computational modeling techniques such as machine learning. Although such approaches can achieve high accuracy by concentrating on a local view of the data, their reliability cannot be guaranteed: when relevant information is omitted, the models are highly vulnerable to fluctuations in the input data. This article tackles the data isolation issue in smart energy systems by applying Semantic Web methods to a household energy system. We offer an ontology-based approach for managing decentralized data at device-level resolution. As a consequence, provided the data is organized according to W3C standards, the scope of the data associated with each device can easily be expanded in an interoperable manner across the Web, and additional information, such as weather, can be obtained from the Web.


Introduction
In light of the climate urgency with which humanity has been confronted in recent years, the European Commission 2021 Work Programme has established a "Fit for 55" package intended to reduce greenhouse gas (GHG) emissions by at least 55% by 2030 and achieve a climate-neutral Europe by 2050 [1]. Among a variety of other considerations, energy efficiency is a major focus of the Union's decarbonization. This makes high energy efficiency a critical priority for all energy sectors, particularly the residential sector [2], which accounts for more than a quarter of the Union's total final energy consumption. Energy decentralization has emerged as one of the most popular contemporary research topics in this domain as a means of increasing energy efficiency [3]. With the growing use of Information and Communication Technologies (ICT) in the Internet of Things (IoT) sector, data on household energy consumption and production (HECP) can now be generated in a decentralized manner, for example by an electric vehicle, a heat pump, or home appliances. Owing to the range and granularity of these data-generating devices, a new generation of smart household energy systems is geared toward decentralization and has the potential to considerably assist the transition to a sustainable energy future [4,5].
On the other hand, evaluating household energy data is becoming increasingly difficult as various smart devices interact and form a complex network of energy flow data [6,7]. Decentralized energy systems are often paired with research into data-driven technologies (e.g. machine learning) that optimize the systems using large volumes of incoming data, in order to manage the risk inherent in the intermittent and unpredictable nature of energy use and to achieve energy sustainability goals such as cost reduction, emission reduction, and energy efficiency. However, most of these technologies are developed for project-specific decentralized data (i.e. data produced by a specific project) to solve problems in a specific energy sub-domain. A significant downside is that they may fail to produce highly realistic and reliable results, because the energy sub-domain is interdependent with other domains [8]. For example, solar energy generation is sensitive to weather conditions [9,10,11,12].
A key factor behind these data constraints, in terms of cross-domain impact, is the poor interoperability between energy systems and other systems such as weather forecasts [13,14]. Assembling data from various sources and establishing interoperability among heterogeneous systems can take considerable effort, since datasets published by different projects vary widely in naming conventions, data formats (e.g. CSV, JSON), metadata, etc. [15,16]. With the progress of energy studies over the last decade, many energy systems have been developed with interoperability in mind. This interoperability, however, is restricted to the transmission of messages across systems.
Interoperability at the knowledge level, i.e. shared, domain-aware semantics for the data and features that can be made accessible and exchanged, is already well advanced in fields such as social networks and encyclopedias, but it has seldom been studied for the semantic enrichment of energy data [17]. Consequently, researchers are likely either to obtain restricted data from a single project or to gather data within the confines of their own limited efforts and then organize it to build data-driven technologies [18,19].
Recently, Semantic Web researchers have made significant progress in establishing knowledge-level interoperability across data from diverse areas, such as climate analysis [20,21]. Central to these studies are ontologies, which serve as a specialized language for modeling domains shared by heterogeneous entities [22]. Incorporating semantics into the data transmitted between parties gives both parties a clear conception of the shared knowledge, which makes data sharing more effective by eliminating misunderstandings [23,24]. Furthermore, semantic models offer other benefits, such as computational inference and knowledge reuse [25]. They can be used to design systems that are independent of the data model, with a high degree of abstraction and flexibility that facilitates system growth, as well as to test the system's knowledge [26] and to apply rules using, for example, the Semantic Web Rule Language (SWRL) [27].
In this work 1, we propose to address these problems by building robust ontology models and examining how cross-domain variables, such as weather data, affect energy consumption data. These technologies make it feasible to explore the semantic relationships between individual energy devices in a decentralized energy network strategically, allowing knowledge to spread throughout the Web and be readily recognized by end users. At the same time, they allow decentralized energy data to be integrated with any external data source (a case study on meteorological data is presented in this work) without re-modeling the data, i.e. without redefining the schema or, for tabular data, extending the fields of the table.
We also explore how this can improve the reliability of today's data-driven technologies [28,29] in the smart energy field.
In summary, this work's novel contributions include:
• Creating systematic semantics for decentralized household energy consumption and production data so as to grant the data knowledge-level interoperability;
• Converting household energy data to Linked Data [30] to simplify the integration of Web-wide semantified data from other domains;
• A case study showing that connecting external meteorological factors with energy consumption/production improves data understanding and analysis.
1 In the spirit of reproducible research, the source code is available at https://github.com/futaoo/semantic-energy.
The remainder of this article is structured as follows. Section 2 reviews the literature that informs our study, including comparisons between our materials and those of others, as well as ideas we adopt from other work. Section 3 describes the materials used, including the raw data and several key semantic methods utilized in this work. Section 4 presents the complete proposed workflow for transforming local household consumption data into Linked Data, which provides Web-wide access and knowledge-level interoperability for the local data; the benefits of Linked Data are shown by augmenting household energy data with National Oceanic and Atmospheric Administration (NOAA) meteorological data. Section 5 illustrates the use of local household energy data on a Linked Data platform by analyzing device-level energy data, such as solar energy production, in relation to temperature. Section 6 discusses the value of semantic approaches for enhancing a local energy dataset and several facets to be aware of. Finally, Section 7 summarizes our work and outlines potential future endeavors.

Related work
An emerging research perspective is to represent data in an ontology model (defined in Section 3.2.2) so that the underlying relationships between data can be articulated in human-readable terms, bringing intelligence to data analysis.
Earlier literature has developed application-specific ontologies and semantics-based systems. Abid et al. [31] repurposed existing ontologies to develop a defect detection system capable of publishing user complaints about city problems, such as water leaks and broken street lights, as Linked Data to aid in the administration of smart cities. Synapse [32] is a Semantic Web-based annotation system designed to enhance the metadata of all data in smart cities. Well-known applications such as Demand Response (DR) in residential energy systems have also benefited from many semantic enrichment studies [33,34]. The unifying feature of these methods is that the generated models are oriented toward handling real data, such as user input, and their purpose is to improve the introduction of diverse data sources in order to increase the system's completeness. However, many projects have released data for their own purposes; for example, the CoSSMic (Collaborating Smart Solar-Powered Microgrids) energy statistics used in our work (see Section 3.1.1), which are now historical records, were originally collected to investigate the smart grid system of a city. The fundamental question addressed by this study is whether Semantic Web technologies can be used to harvest data that has been published for various reasons and then use it to supplement our own research. This viewpoint is distinct from that of the majority of semantic technologies built on real data. RDF (Resource Description Framework) is one of the most dependable models for combining different types of data in order to develop applications in the area of smart energy [35]. RDF models are often used in conjunction with the Linked Data principles [30], which are critical for data interoperability.
Numerous academics have developed RDF-based semantics for data collected from smart energy systems. Chun et al. [36] designed Energy Knowledge Graph (EKG) to incorporate existing information about decentralized grids. Wagner [37] developed semantics for the privacy concerns of smart grid device usage.
The major weakness of these models is their inability to account for the effects of other domains: the interoperability they provide benefits only the smart energy sector. In contrast to these methods, our research uses an RDF knowledge model to integrate multi-domain data with fixed energy data.
Several researchers have published general ontologies for energy data. The primary advantage of reusing a generic ontology is that it can be adapted to particular purposes and thereby improve interoperability in the semantic processing of the dataset. SAREF4EE (the EEBus/Energy@home extension of the Smart Appliances REFerence ontology) is an ontology developed by Daniele et al. [38] for the optimization of energy demand and response. SEAS was introduced by Lefrançois [39] with the goal of enabling interoperability across smart energy sectors. Our study is inspired by these prior ontologies for smart energy systems, but it focuses on decentralized household energy systems and on the easy inclusion of cross-domain impacts (e.g. the climate domain), which have received less attention from other researchers. To complete the climatic-impact portion of our model, we also draw on Wu's ontology CA [40] and Janowicz's ontology SOSA [41] to describe the climatic sensor data.

Overview of used data and technologies
In this section, we describe several common sources of smart energy and climate data, as well as related Semantic Web technologies, in preparation for the workflow description in Section 4. Note that the energy data input can be replaced as needed to accommodate broader research objectives in the context of smart energy development.

Data description
The next two subsections describe the experimental data used in this work.

CoSSMic household energy data
CoSSMic 2 is a smart grid project financed by the EU Framework Programme FP7 for Research and Innovation. Its objective is to optimize household energy consumption in the German city of Konstanz by creating intelligent microgrids [42]. The grid network's energy flow is managed by an autonomous ICT system that adapts energy consumption and distributed energy production in real time based on a variety of factors such as availability, pricing, and weather conditions. This decentralized energy flow is enabled through coordinated load shifting, in which energy consumers and producers can negotiate an optimal energy exchange. The project's energy data is accessible on the Open Power System Data platform 3 and can be used for reanalysis. However, the current dataset only includes data on energy flow; many other variables critical for understanding the energy exchange profile, such as weather data, are missing. Their absence would obstruct the reanalysis in our study. We partially address this issue in this work by republishing the data on the Web using semantic methods in order to access more data sources for reference. This approach improves the usefulness of a local energy consumption dataset.
2 http://isc-konstanz.de/en/isc/institute/public-projects/completed-projects/eu/cossmic.html
3 https://data.open-power-system-data.org/household_data/

Link-climate knowledge graph
Link-climate 4 is a climate observation knowledge graph (KG) that adheres to the concepts of Linked Data (details are provided in Section 3.2) [40]. It offers NOAA Climate Online Data 5 recorded by stations located in many European nations and cities (including Konstanz) through a Linked Data portal. Fig. 1 illustrates a Linked Data representation of a climate station in Konstanz. The climate observation station's data set contains a variety of meteorological measurements, including temperature and precipitation. These data are accessible through the web and provide a flexible interface to any published Linked Data in any domain provided a suitable ontology model can make the connection.

Useful semantic web approaches summarized
This section summarizes the Semantic Web technologies needed for modeling.

Resource Description Framework
The RDF data model [43] represents a statement as a graph consisting of 1) a node for the subject, 2) an arc from the subject to the object for the predicate, and 3) a node for the object. Uniform Resource Identifiers (URIs) are normally used to denote the subject, predicate, and object (SPO) of an RDF statement, except for blank nodes and literal values. Ontologies are usually coupled with the RDF data model to enable semantic data interchange on the Web.
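To make the SPO structure concrete, here is a minimal sketch, using only the Python standard library rather than an RDF library, that serializes a single statement in the W3C N-Triples syntax. The example IRIs under http://example.org/ are hypothetical, and blank nodes, datatypes, and language tags are deliberately left out.

```python
from typing import Union


class IRI(str):
    """An absolute IRI; written as <...> in N-Triples."""


class Literal(str):
    """A plain string literal; written as "..." in N-Triples."""


Term = Union[IRI, Literal]


def format_term(term: Term) -> str:
    if isinstance(term, IRI):
        return f"<{term}>"
    # Escape backslashes first, then quotes and newlines, as N-Triples requires.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(subject: IRI, predicate: IRI, obj: Term) -> str:
    """Serialize one (subject, predicate, object) statement as an N-Triples line."""
    return f"{format_term(subject)} {format_term(predicate)} {format_term(obj)} ."


if __name__ == "__main__":
    line = to_ntriples(
        IRI("http://example.org/cossmic/industrial1"),     # subject (hypothetical IRI)
        IRI("http://www.w3.org/2000/01/rdf-schema#label"),  # predicate (rdfs:label)
        Literal("industrial1"),                             # object (a literal)
    )
    print(line)
```

The subject and predicate are always IRIs here, while the object may be either an IRI or a literal, mirroring the exception noted above.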

Ontologies in short
According to the W3C definition 6, ontologies (or vocabularies) define the terms and relationships used to describe a specific area of interest. Taking a wireless sensor network as an example, an ontology can be used to refer to different kinds of sensors as "precipitation sensor", "solar radiation sensor", and so on. Similarly, meteorological observation results can be connected to the associated sensors through relationships such as "hasResult" using the RDF SPO (subject, predicate, object) grammar. By specifying a set of terms, a semantic layer of human-definable operational terms is constructed over the data.

Linked Data in short
Linked Data is an area of research that establishes a set of Linked Data principles … [47]. This can be achieved with SPARQL 1.1, which will be detailed in Section 6.4.

Proposed workflow of linked decentralized energy data
The proposed workflow combines the CoSSMic decentralized energy data with the link-climate data and then publishes them as Linked Data. The ultimate goal of this process is to broaden the scope of the fixed CoSSMic dataset so that it can be queried together with global Linked Data on the Web. We begin with an overview of the process, followed by subsections that explain each component of the workflow.

Workflow overview
We begin this section with an overview of the proposed process for the semantic enrichment of Web-wide heterogeneous data (see Fig. 2 for a graphical representation).

Ontology modeling
The ontology modeling process for the tabular CoSSMic datasets mainly consists of modeling the table headers (i.e. data fields) and the data entries.

Ontology modeling for table headings
The Open Power System Data dataset contains the raw tabular CoSSMic data, as well as comprehensive descriptions of the table headers. The first step is to clarify the meaning of each column heading in the documentation and then to identify the connections between individuals that can be deduced from the table headers. For example, the heading "DE_KN_industrial1_pv_1" has the annotation "Total photovoltaic energy generation in an industrial warehouse building in kWh". From it, a country, Germany ("DE"); a city, Konstanz ("KN"); an industrial building, industrial1 ("industrial1"); and a photovoltaic device, pv1 ("pv_1"), can all be extracted. When each energy sector is seen as a system inside the energy network, the heading can therefore be rephrased as the semantic assertion that "pv_1" is a subsystem of "industrial1", which is an industrial building in the German city of Konstanz. Following this approach, the set of table headers can be extended to include connections vertically, between the individuals represented within each heading, and horizontally, between individuals represented by separate headings. We use the SEAS knowledge model [48] to characterize the possible individuals and their connections in order to create a complete ontology model for the CoSSMic data. The following are some of the most frequently used terms for defining classes and properties 7.
• seas:ElectricPowerDistributionNetwork (CLASS) denotes a network used to distribute electric power;
• seas:ElectricPowerTransmissionSystem (CLASS) denotes an electric power transmission system capable of transmitting electricity;
• seas:isPoweredBy (PROPERTY) links a system to the system that powers it; its inverse is "seas:powers";
• seas:producedElectricPower (PROPERTY) denotes the produced electric power;
• seas:consumedElectricPower (PROPERTY) denotes the consumed electric power;
• seas:subSystemOf (PROPERTY) links a system to its super system.
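The header-to-individual mapping described above can be sketched in a few lines of standard-library Python. This is not the authors' script: it assumes every device column follows the country_city_building_device pattern of "DE_KN_industrial1_pv_1", uses a hypothetical namespace http://example.org/cossmic/ for individuals, and introduces a hypothetical city-level network individual typed as seas:ElectricPowerDistributionNetwork. Only the SEAS terms themselves come from the list above.

```python
from typing import List, NamedTuple, Tuple

SEAS = "https://w3id.org/seas/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/cossmic/"  # hypothetical namespace for CoSSMic individuals

Triple = Tuple[str, str, str]


class HeaderParts(NamedTuple):
    country: str   # e.g. "DE"
    city: str      # e.g. "KN"
    building: str  # e.g. "industrial1"
    device: str    # e.g. "pv_1"


def parse_header(header: str) -> HeaderParts:
    # maxsplit=3: device names such as "pv_1" or "grid_import" contain underscores
    parts = header.split("_", 3)
    if len(parts) != 4:
        raise ValueError(f"not a device column: {header!r}")
    return HeaderParts(*parts)


def header_triples(header: str) -> List[Triple]:
    """Turn one column heading into subsystem assertions (one possible modeling)."""
    h = parse_header(header)
    network = f"{BASE}{h.country}_{h.city}_network"  # hypothetical city-level network
    building = f"{BASE}{h.building}"
    device = f"{BASE}{h.building}_{h.device}"
    return [
        (network, RDF_TYPE, SEAS + "ElectricPowerDistributionNetwork"),
        (building, SEAS + "subSystemOf", network),
        (device, SEAS + "subSystemOf", building),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    # every term here is an IRI, so each one is wrapped in <...>
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(header_triples("DE_KN_industrial1_pv_1")))
```

Non-device columns, such as a timestamp column, raise ValueError from parse_header, so a caller iterating over all headers would catch that and skip them.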

Ontology modeling for data entries
To convert the tabular data to RDF for storage as Linked Data, this phase continues to use the SEAS ontology to describe the actual data objects, their connections, and their associated headings (data fields) in the CoSSMic dataset. This paper treats each record of energy consumption and production from each device as an "Evaluation", as specified by the SEAS ontology, and uses the vocabulary terms "seas:consumedElectricPower" and "seas:producedElectricPower" to differentiate between energy consumption and production. The primary vocabulary for modeling data entries is as follows (see Figure 4 for a graphical illustration):
• seas:ElectricPowerEvaluation (CLASS) denotes evaluations of electric power properties;
• seas:evaluation (PROPERTY) links a valuable entity to one of its evaluations;
• seas:evaluatedValue (PROPERTY) links an evaluation to a literal (a numeric value in this paper).
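A single data entry can then be expanded into SEAS evaluation triples. The sketch below follows the device → power property → evaluation → value chain described above, but the IRI naming scheme, the namespace http://example.org/cossmic/, and the ex:timestamp property used to attach the reading's time are hypothetical choices made for illustration; the model shown in Figure 4 may differ.

```python
from typing import List, Tuple

SEAS = "https://w3id.org/seas/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/cossmic/"  # hypothetical namespace for individuals

Triple = Tuple[str, str, str]  # terms already formatted for N-Triples


def evaluation_triples(device: str, timestamp: str, value_kwh: float,
                       produced: bool) -> List[Triple]:
    """Model one reading of one device as a seas:ElectricPowerEvaluation."""
    direction = "producedElectricPower" if produced else "consumedElectricPower"
    device_iri = f"<{EX}{device}>"
    power_iri = f"<{EX}{device}_{direction}>"
    # One evaluation individual per device, direction and timestamp;
    # colons are dropped from the timestamp to keep the IRI simple.
    eval_iri = f"<{EX}{device}_{direction}_{timestamp.replace(':', '')}>"
    return [
        (device_iri, f"<{SEAS}{direction}>", power_iri),
        (power_iri, f"<{SEAS}evaluation>", eval_iri),
        (eval_iri, f"<{RDF_TYPE}>", f"<{SEAS}ElectricPowerEvaluation>"),
        (eval_iri, f"<{SEAS}evaluatedValue>", f'"{value_kwh}"^^<{XSD}double>'),
        # Hypothetical property: how time is attached in the paper's model is not shown here.
        (eval_iri, f"<{EX}timestamp>", f'"{timestamp}"^^<{XSD}dateTime>'),
    ]


for s, p, o in evaluation_triples("industrial1_pv_1", "2016-06-01T00:00:00Z", 12.5, produced=True):
    print(s, p, o, ".")
```

Keeping the produced/consumed distinction on the property (rather than on the evaluation) matches the paper's use of seas:producedElectricPower and seas:consumedElectricPower to separate the two directions.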

Uplifting NOAA data
Using semantic methods, an energy distribution network is built in the steps of Section 4.2. By specifying appropriate ontology terms, it becomes simpler to integrate further Linked Data sources available on the Web, which provide additional information that aids in understanding household energy use and production. To enable semantic integration of meteorological data into the energy network, we add a new term, "retrieveWeatherFrom", to the CA ontology [40] (the ontology introduced in Section 3.1.2 for linked climate data). As stated in Section 3.1.2, the meteorological data used for modeling comes from our publicly available linked climate data. To illustrate the structure of the linked climate data, we list the vocabulary used in the CA ontology and provide a graphical representation (Fig. 5) of a single temperature record from a Konstanz weather station:
• *c:Station (CLASS) denotes a station that observes some feature of interest such as precipitation, temperature, etc.;
• *c:Observation (CLASS) denotes an observation of some feature of interest;
• *p:sourceStation (PROPERTY) links an observation to the station to which it belongs;
• *p:withDataType (PROPERTY) links the data to its data type;
• *p:retrieveWeatherFrom (PROPERTY) links an individual to the individual that provides its weather information;
• sosa:hasResult (PROPERTY) links an observation to its result;
• sosa:resultTime (PROPERTY) links an observation to the time when the observation result was generated.
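To show what the retrieveWeatherFrom link enables at query time, the sketch below stores a tiny in-memory graph as Python tuples and follows device → station → observation to look up a TMAX value. The namespace standing in for the CA property prefix, the individuals, and the numeric value are made up for illustration (the value is not a real measurement); only the SOSA namespace http://www.w3.org/ns/sosa/ is the real one. A real deployment would express the same traversal as a SPARQL graph pattern.

```python
from typing import List, Optional, Tuple

SOSA = "http://www.w3.org/ns/sosa/"
CA_P = "http://example.org/ca/property/"  # hypothetical stand-in for the CA property namespace
EX = "http://example.org/"                # hypothetical individuals

Triple = Tuple[str, str, object]


def toy_graph() -> List[Triple]:
    station = EX + "station/konstanz"
    obs = EX + "observation/konstanz_tmax_2016-06-01"
    return [
        (EX + "device/residential1_pv", CA_P + "retrieveWeatherFrom", station),
        (obs, CA_P + "sourceStation", station),
        (obs, CA_P + "withDataType", "TMAX"),
        (obs, SOSA + "hasResult", 24.0),  # illustrative value, not a real measurement
        (obs, SOSA + "resultTime", "2016-06-01"),
    ]


def objects(graph: List[Triple], subject: str, predicate: str) -> list:
    return [o for s, p, o in graph if s == subject and p == predicate]


def weather_for(graph: List[Triple], device: str, data_type: str, date: str) -> Optional[object]:
    """Follow device -> station -> observation and return the matching result, if any."""
    for station in objects(graph, device, CA_P + "retrieveWeatherFrom"):
        observations = [s for s, p, o in graph
                        if p == CA_P + "sourceStation" and o == station]
        for obs in observations:
            if (data_type in objects(graph, obs, CA_P + "withDataType")
                    and date in objects(graph, obs, SOSA + "resultTime")):
                results = objects(graph, obs, SOSA + "hasResult")
                if results:
                    return results[0]
    return None


print(weather_for(toy_graph(), EX + "device/residential1_pv", "TMAX", "2016-06-01"))
```

The device carries a single outgoing link to its station; everything else about the weather stays in the climate graph, which is why no energy-side schema change is needed to add it.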

Conversion to Linked Data
In this step, we transform the CoSSMic data from tabular to graph format, i.e. to RDF, organized according to the previously developed ontology model. We write a Python script that uses RDFLib 8 to convert all tabular data entries to RDF and then put the resulting RDF in the … Additionally, it enables the CoSSMic data to be integrated with information other than NOAA climate data, such as air pollution, greenhouse gas emissions, and additional meteorological data on other Linked Data platforms.
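The paper performs this conversion with RDFLib; the sketch below shows the same idea with only the Python standard library, emitting N-Triples lines directly. It is a hypothetical simplification: it assumes a timestamp column named utc_timestamp, treats columns whose device part begins with "pv" as production and everything else as consumption (columns such as grid export would need their own rule), skips empty cells, and uses the made-up namespace http://example.org/cossmic/.

```python
import csv
import io
from typing import Iterator

SEAS = "https://w3id.org/seas/"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/cossmic/"  # hypothetical namespace for individuals


def rows_to_ntriples(csv_text: str, timestamp_col: str = "utc_timestamp") -> Iterator[str]:
    """Yield N-Triples lines for every non-empty device reading in a CoSSMic-style CSV."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = row[timestamp_col]
        for column, cell in row.items():
            if column == timestamp_col or not cell:
                continue  # skip the time column and missing readings
            parts = column.split("_", 3)
            if len(parts) != 4:
                continue  # not a country_city_building_device column
            _, _, building, device = parts
            # Hypothetical rule: PV columns produce energy, all others consume it.
            direction = "producedElectricPower" if device.startswith("pv") else "consumedElectricPower"
            dev = f"<{EX}{building}_{device}>"
            power = f"<{EX}{building}_{device}_{direction}>"
            ev = f"<{EX}{building}_{device}_{direction}_{ts.replace(':', '')}>"
            yield f"{dev} <{SEAS}{direction}> {power} ."
            yield f"{power} <{SEAS}evaluation> {ev} ."
            yield f'{ev} <{SEAS}evaluatedValue> "{float(cell)}"^^<{XSD}double> .'


SAMPLE = """utc_timestamp,DE_KN_industrial1_pv_1,DE_KN_residential1_grid_import,interpolated
2016-06-01T00:00:00Z,10.5,,
"""

for line in rows_to_ntriples(SAMPLE):
    print(line)
```

Streaming lines from a generator keeps memory use flat for long time series; the output can be loaded into a triple store that accepts N-Triples.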
Section 5 will show how to do queries on the Linked Data platform and provide an example of climate and CoSSMic data analysis.

Linked Data platform analysis of HECP and climate data
The purpose of this case study of linked CoSSMic HECP data is to demonstrate that, on a concrete level, knowledge of the fixed CoSSMic data can be enhanced through integration with the link-climate knowledge graph and that, at a higher level of abstraction, the ontology model used to integrate cross-domain data can increase the reliability of data-driven methods developed to understand decentralized energy data. The improvement in data comprehension can be measured in many ways and depends heavily on the computational models used. We keep the analysis simple and use Pearson Correlation Coefficients (PCCs) to show some of the additional information that can be added to the CoSSMic energy data. The two datasets are retrieved from the SPARQL endpoint (see Section 5.1), and the PCCs calculated for a large number of smart devices are examined in detail.

Semantic queries on the combination of HECP and climate data
The published endpoint provides several ways to obtain the linked HECP and climate data. Users can either run queries directly in the endpoint interface 10 or embed them in their code as HTTP requests. The latter is supported by some of the endpoint server's HTTP APIs and may be more efficient for more complex query evaluations (i.e. those that consume computing resources on the client side). Queries should be written in the SPARQL 1.1 standard language to establish a graph pattern that retrieves the answers [50,51]. Listing 2 illustrates a query that obtains the time series of solar energy produced by all CoSSMic energy devices and their associated weather alignments.
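For readers who prefer the HTTP route, a minimal standard-library sketch follows. It builds, but does not send, a SPARQL 1.1 Protocol GET request and parses a response in the standard SPARQL JSON results format. The endpoint URL is a placeholder, and the query is a simplified illustration that only retrieves produced-power evaluations; it is not Listing 2, and joining the weather alignment would require the CA ontology's namespace, which is not reproduced here.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # placeholder; substitute the published endpoint

QUERY = """PREFIX seas: <https://w3id.org/seas/>
SELECT ?device ?value WHERE {
  ?device seas:producedElectricPower ?power .
  ?power seas:evaluation ?ev .
  ?ev seas:evaluatedValue ?value .
}
LIMIT 10"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """SPARQL 1.1 Protocol query via GET: the query text goes in the 'query' parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results_json: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per solution."""
    data = json.loads(results_json)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]


request = build_sparql_request(ENDPOINT, QUERY)
# To actually run it: urllib.request.urlopen(request).read().decode("utf-8")
```

Very long queries may exceed URL length limits; the SPARQL 1.1 Protocol also allows sending the query in a POST body, which such a client could switch to.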

An example analysis of HECP against temperature
By linking HECP to climatic data, a fresh perspective on HECP can be gained: the incorporation of climate data enables a more in-depth examination of the weather's impact on energy use and production. We use PCC analysis to measure the basic linear correlation between the two datasets. PCC analysis is a classic technique for selecting features based on the linearity of the variables, and it remains the most common first step for identifying the most significant features in many studies that use machine learning to analyze data, including HECP [52,53]. However, PCC analysis cannot capture nonlinear correlations or the curvature of a relationship, so a scatter plot depicting the relationship's shape is usually advised to provide more information beyond the plain numerical PCC [54]. The following Table 1 … To classify the patterns of HECP variables against maximum temperature (TMAX) by device type, we group all the CoSSMic smart devices into device categories, calculate the PCC between each device in a group and TMAX, and use scatter plots to verify that the calculated numerical linearity holds. For simplicity, we consider three device types commonly used by households in the CoSSMic project: photovoltaic (PV), refrigerator/freezer, and grid import. Other device types can be examined in the same way.
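The grouping-and-correlation step can be sketched as follows. Pearson's r is computed from its definition, r = Σ(x−x̄)(y−ȳ) / √(Σ(x−x̄)² · Σ(y−ȳ)²). The category rule keyed on tokens of the device name is a hypothetical stand-in for the paper's grouping, and the inputs are assumed to be daily series already aligned by date. As noted above, a PCC near zero does not rule out a nonlinear relationship, so scatter plots remain necessary.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


def device_category(device: str) -> str:
    """Hypothetical grouping rule based on the tokens of a device name."""
    tokens = device.split("_")
    if "pv" in tokens:
        return "pv"
    if "refrigerator" in tokens or "freezer" in tokens:
        return "refrigerator/freezer"
    if "grid" in tokens and "import" in tokens:
        return "grid_import"
    return "other"


def pcc_by_category(daily_energy: Dict[str, List[float]],
                    tmax: List[float]) -> Dict[str, Dict[str, float]]:
    """PCC of each device's daily energy against daily TMAX, grouped by category."""
    grouped: Dict[str, Dict[str, float]] = {}
    for device, series in daily_energy.items():
        grouped.setdefault(device_category(device), {})[device] = pearson(series, tmax)
    return grouped
```

Grouping the coefficients per category makes outliers within a group, such as a single refrigerator that departs from its peers, easy to spot before looking at the scatter plots.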

PV vs. TMAX
We begin by demonstrating how additional information can be acquired through a correlation study between daily maximum temperature and the solar energy produced by all CoSSMic PV panels. Table 2

Refrigerator/Freezer vs. TMAX
According to the PCCs in Table 3, except for "industrial3_refrigerator" and "residential6_freezer", a strong linear relationship exists between the refrigerator … Table 3, which should not simply be interpreted as a linear correlation. The non-linear relationship between these two variables may arise because refrigerator/freezer energy consumption vs. TMAX depends heavily on the household's users (e.g. the amount of food stored and the number of household members). Therefore, a universal model such as the one illustrated in Section 5.2.1 cannot be constructed for this pair of variables from the available data.

Grid Import vs. TMAX
As with household refrigerators/freezers, grid import depends strongly on the household's energy consumption profile, since grid import is the source of energy for all household appliances. Residential households often use less grid electricity at warmer temperatures, as shown in Figure 8.
This may be explained by the particularities of the city's climate. In certain warm cities, energy costs may be higher because of prolonged periods of high temperature [57]. No consistent linear pattern emerges (see Table 4 and

Discussion
This section discusses the value of semantic approaches for enhancing a local energy dataset. The technique used in this study is a sample study of the interplay between energy and climate data. However, because of the ontology model's higher level of abstraction, the method can be extended to account for … concerning the following facets to be aware of.

Adding new datasets
To be included in a SPARQL endpoint as part of the Web of Data (or Linked Data), a new dataset must be defined and linked to an ontology, in a manner similar to the process described in this article. However, since the schema (or ontology) of RDF triples is not standardized, users who want to use data provided by other Linked Data sources must also understand the ontology of that data in order to construct SPARQL graph patterns that identify the desired data. Linked Data should therefore normally come with itemized documentation to help users either update the data or write SPARQL queries.

Interlinking diverse data
In many scenarios, such as cross-domain analysis, heterogeneous data sources must be merged into a single dataset. At the ontology level, semantically linked heterogeneous data can be merged effectively with other specialized Linked Data sources. However, several important drawbacks are likely to persist. First, merging requires sufficient domain knowledge of the schemas of all relevant datasets, as well as a number of suitable ontology vocabularies for semantic enrichment [58,59]. Second, it significantly increases the complexity of the data, because ontologies created by different parties are difficult to unify under a common protocol, and the vocabulary selected for the same concept can differ substantially. Ontology creators should therefore take full advantage of existing ontologies, either by reusing them or by linking them to their own vocabularies.
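Linking vocabularies can be as simple as asserting owl:equivalentProperty between terms. The sketch below is a deliberate simplification rather than an OWL reasoner: it reads such assertions from the data, rewrites each aliased predicate to the term on the object side of the assertion, and merges two datasets without duplicates. The vocabulary http://example.org/vocabA/ and its hasResults term are hypothetical; owl:equivalentProperty and sosa:hasResult are real W3C terms.

```python
from typing import Dict, Iterable, List, Tuple

OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"
Triple = Tuple[str, str, str]


def alignment_from(triples: Iterable[Triple]) -> Dict[str, str]:
    """Map each aliased property to its preferred equivalent (object of the assertion)."""
    return {s: o for s, p, o in triples if p == OWL_EQUIVALENT_PROPERTY}


def align_and_merge(a: List[Triple], b: List[Triple]) -> List[Triple]:
    mapping = alignment_from(a + b)
    merged: List[Triple] = []
    seen = set()
    for s, p, o in a + b:
        if p == OWL_EQUIVALENT_PROPERTY:
            continue  # the alignment itself is consumed, not copied
        triple = (s, mapping.get(p, p), o)
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    return merged


VOCAB_A = "http://example.org/vocabA/"  # hypothetical third-party vocabulary
SOSA_HAS_RESULT = "http://www.w3.org/ns/sosa/hasResult"

dataset_a = [("http://example.org/obs/1", VOCAB_A + "hasResults", "24.0"),
             (VOCAB_A + "hasResults", OWL_EQUIVALENT_PROPERTY, SOSA_HAS_RESULT)]
dataset_b = [("http://example.org/obs/1", SOSA_HAS_RESULT, "24.0")]

print(align_and_merge(dataset_a, dataset_b))
```

Because owl:equivalentProperty is symmetric, the choice of which side is "preferred" is a convention of this sketch, and chains of equivalences are not followed; a real pipeline would delegate both to a reasoner.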

Increasing data collection to better understand smart energy
We show in this study how one additional data source (NOAA meteorological variables) can be used to enhance our knowledge of a smart energy system in Konstanz. Numerous other data sources, such as solar radiation data 11, other meteorological data (e.g. meteoblue 12), and points of interest (e.g. OpenStreetMap 13), could also be incorporated for this purpose. Incorporating these datasets should follow the considerations in Section 6.1 and Section 6.2.

Federated queries across SPARQL endpoints
Data published as Linked Data does not have to be contained in a single SPARQL endpoint. SPARQL endpoints that support federated queries (e.g. through the SERVICE keyword of SPARQL 1.1) can use data spread over many remote endpoints directly as sources and retrieve it through a single SPARQL query [60]. The Linked Open Data cloud 14 catalogs open Linked Data sources from which users can obtain data useful for energy data processing.
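A federated query of this kind can be written with the SPARQL 1.1 SERVICE keyword. The Python helper below only assembles the query text; the remote endpoint, the ex: link property standing in for retrieveWeatherFrom, and the cp: station property are hypothetical placeholders, since the actual namespaces are not given here. Whether a given endpoint permits SERVICE calls depends on its configuration, and a time-alignment filter between readings and observations is omitted for brevity.

```python
from urllib.parse import urlparse

QUERY_TEMPLATE = """PREFIX seas: <https://w3id.org/seas/>
PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX ex: <http://example.org/link/>
PREFIX cp: <http://example.org/climate/property/>
SELECT ?device ?value ?temperature WHERE {{
  ?device seas:producedElectricPower/seas:evaluation/seas:evaluatedValue ?value ;
          ex:retrieveWeatherFrom ?station .
  SERVICE <{endpoint}> {{
    ?obs cp:sourceStation ?station ;
         sosa:hasResult ?temperature .
  }}
}}"""


def federated_query(remote_endpoint: str) -> str:
    """Fill the SERVICE clause with a remote endpoint IRI after a basic sanity check."""
    parsed = urlparse(remote_endpoint)
    if parsed.scheme not in ("http", "https") or any(c in remote_endpoint for c in '<> "{}'):
        raise ValueError(f"unsafe or invalid endpoint IRI: {remote_endpoint!r}")
    return QUERY_TEMPLATE.format(endpoint=remote_endpoint)


print(federated_query("https://example.org/climate/sparql"))
```

The local part of the pattern binds ?station, and the SERVICE block reuses that variable, so the remote endpoint only has to answer for stations that the energy data actually links to.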

Conclusion and Future Work
In this article, we provide a process for converting a local decentralized household energy consumption and production dataset (CoSSMic) to a Linked dataset and republishing it on the Web. This study details the procedures necessary to accomplish this transformation; as a consequence, the scope of the local dataset is extended, and wider perspectives of meteorological information are added to the dataset through semantic approaches. The final study of this dataset using temperature and precipitation data at the device level shows how different sources of meteorological data may be beneficial for data-driven modeling tasks. In the future, we intend to work on additional semantic methods such as GeoSPARQL and temporal RDF, as well as a broader variety of energy-related datasets such as remote sensing data, energy markets, and policy data, in order to develop a robust, interoperable ontology model capable of serving a nearly complete knowledge-sharing energy data ecosystem spanning multiple domains, thereby improving understanding of the decentralized energy distribution mechanism.
11 https://www.dwd.de/EN/ourservices/solarenergy/maps_globalradiation_average.html
12 https://www.meteoblue.com/en/weather/archive/export/konstanz_germany_2885679
13 https://www.openstreetmap.org/
14 https://lod-cloud.net/