Dataset: A proxy for historical CO2 emissions related to centralised electricity generation in Europe

This paper presents data for estimating carbon dioxide (CO2) emissions resulting from public electricity generation in European countries over the period 1990 to 2018. The base data used to calculate the proxy are the national emissions reported to the United Nations Framework Convention on Climate Change (UNFCCC) and the European Union (EU) Greenhouse Gas Monitoring Mechanism. These data are compiled and held by the European Environment Agency (EEA), from where they are accessed. The emission data are reported as aggregates over thermal power stations, district heating plants, and cogeneration in combined heat and power (CHP) plants. We calculate a proxy for the emissions from electricity generation alone by combining the emissions from thermal power stations with the share of CHP emissions attributable to electricity generation. The computed data were validated for the period 2000 to 2015 by comparison with a secondary dataset. The resulting emission values for 1990 are of particular importance, as 1990 is a commonly used emission reference year. The provided dataset, charts, and figures can be reused both for analysing past emission evolutions and for building models of future electricity generation emissions in Europe. The dataset is freely available in [1]. A subset of the dataset has been applied in “CO2 quota attribution effects on the European electricity system comprised of self-centred actors” [2] to assess the effects of potential total and national CO2 quota attributions in the European electricity system of the near future.

Specifications table
Subject: Pollution
Specific subject area: Historical carbon dioxide emissions due to public electricity generation in Europe
Type of data: Table, Chart, Graph, Figure
How data were acquired: Relying on public data entries, the dataset was computed and validated using Python 3.7 running through JupyterLab 3.0.7 with the following packages:
• NumPy 1.19.
From the UNFCCC statistics, we use the data subset '1.A.1.a - Main Activity Electricity and Heat Production' and restrict the analysis to the pollutant 'CO2'. The emission split is facilitated by assuming a fixed heat efficiency; we use the heat efficiency of a standard heat boiler, 90%.

Description of data collection: The dataset is based primarily on national emission statistics reported by the European countries themselves. The raw data are downloaded from the respective public sources, and the archived sets are extracted. See the script files (an overview is given below) for details on how the raw data files are parsed and how the relevant subsets are selected. Historical energy balances from Eurostat are used to disentangle the emission statistics. Lastly, data from the JRC Integrated Database of the European Energy System (JRC-IDEES) are used to validate the calculated proxy over a narrower time period. The primary data and validation data are publicly available on the internet.
Data source location: The primary data sources and their accessibility are listed here. The input data consist of the following datasets: the UNFCCC emission inventory from [3], national energy balance statistics from Eurostat [4], and the JRC-IDEES database from [5], described in the publication [6].

Value of the Data
• A proxy for the CO2 emissions related solely to electricity generation is relevant in several contexts; without it, the evolution of emissions from public electricity generation cannot be tracked. For each included country, the past evolution can be analysed and related to future projections and goals.
• The dataset is valuable for researchers and industry professionals interested in the evolution of country-specific emissions due to electricity generation alone.
• The data can be used as a basis for computing future emission projections in the electricity sector or for building software tools that model energy systems.
• Since the emission values were validated against a secondary dataset for the period 2000 to 2015, they can be considered reliable.
• The evolution of historical emissions is a significant factor in planning, simulating, and assessing the future energy system transition necessary to mitigate anthropogenic climate change.
• The CO2 emissions of 1990 are of notable importance, as they are often used as the reference for future emission reduction targets.
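Because the 1990 values serve as a common reference, a typical first use of the dataset is to express each year's emissions as a percentage change from 1990. The following minimal Python sketch assumes that the emissions of one country have already been loaded into a dictionary keyed by year; the function name `change_vs_reference` and the example numbers are hypothetical and not taken from the dataset.

```python
from typing import Dict


def change_vs_reference(emissions: Dict[int, float],
                        reference_year: int = 1990) -> Dict[int, float]:
    """Return the percentage change of each year's emissions relative to reference_year."""
    base = emissions[reference_year]
    if base == 0:
        raise ValueError("reference-year emissions must be non-zero")
    # Negative values indicate a reduction compared with the reference year
    return {year: 100.0 * (value - base) / base
            for year, value in sorted(emissions.items())}


# Hypothetical emissions of one country in Mt CO2 (illustrative only)
example = {1990: 50.0, 2005: 45.0, 2018: 30.0}
print(change_vs_reference(example))
# {1990: 0.0, 2005: -10.0, 2018: -40.0}
```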

Data Description
As with the general public, the energy industries have placed a growing emphasis on CO2 emissions; this emphasis has increased significantly during the past decade and is pushing for fast reductions in many places. The year 1990 is often used as a reference year for emission statistics, and our emission dataset therefore goes back to this year. Since emissions from electricity generation alone are not reported directly, a proxy must be calculated, which can be done under different assumptions. We have established a clear methodology for calculating these emissions related solely to electricity generation. We believe it strikes a good compromise between simplicity and accuracy, requiring only the assumption of a fixed heat efficiency. See the following section for further details on the methodology.
The dataset was created with the development of energy system models in mind. Consider the tasks of assessing past electricity-generation-related emissions or allocating future emission allowances for electricity generation in Europe. Such tasks were previously not easily accomplished and relied on specific assumptions about splitting factors or on individual calculations made anew for each study. With the publication of this dataset, and the data thereby being openly available, the same validated data with a single set of clearly defined assumptions can easily be applied in multiple future studies. This saves both effort and time and makes different studies more comparable. The dataset can also be utilised for other applications. For example, educators could use it for training in data mining and visualisation, or to promote understanding of how much each sector contributed to emissions over the years.
The following is a description of all provided data in the dataset. This description is organised around the folder structure of the dataset.

Repository root folder
The root folder contains three subfolders that encompass the full dataset, scripts to create it, figures visualising it, and scripts and figures validating the dataset.

Dataset
The dataset folder contains the computed proxy for emissions resulting solely from public electricity generation. Two files are found in this folder. The first contains the proxy values without autoproducer contributions. The second contains the same values for the same countries and period but also includes the estimated CO2 emission contribution from autoproducers.

Scripts
The scripts folder contains all the computer code needed to produce both the entire dataset and all accompanying figures (including the validation described below). The computation is split into three steps, each given as a stand-alone Jupyter Notebook/JupyterLab file. The first file contains the routines for reading in the raw data, selecting the relevant quantities, and computing the emission proxies; this code produces the files found in the dataset folder.
The second file includes the steps to produce the figures visualising the CO2 emission development, both for the European countries together and as individual figures for each country. The third file, data_3_validation.ipynb, performs the validation of the computed proxy. Comparing the proxy with the secondary dataset requires reading in several additional data files. This script produces two figures, which are found in the validation part of the figures folder.

Figures
All figures are found in this folder, which contains three subfolders.

Experimental Design, Materials and Methods
The data acquisition procedure consists of two steps: computing the proposed proxy and subsequently validating it.

Computing the proxy for pure electricity generation emissions
The emissions from public electricity and heat production are reported only in aggregate. We disentangle these data by calculating a proxy for the emissions related solely to electricity generation. The UNFCCC emission data do not include autoproducer contributions; we estimate these contributions and present the data both with and without them. In this section, we lay out a methodology to compute the desired proxy for historical emission values.
National emission statistics are reported to the UNFCCC in a variety of categories and subcategories. An overview of the reporting standards and categories can be found in the IPCC Guidelines for National Greenhouse Gas Inventories [7]. The IPCC emission categories relevant to this study are as follows:
• 1A1 - Energy Industries
  • 1A1a - Public Electricity and Heat Production
    • 1A1a1 - Public Electricity Generation
    • 1A1a2 - Public Combined Heat and Power Generation (CHP)
    • 1A1a3 - Public Heat Plants
In 1990, the European countries were only required to report aggregated values for 1A1a, without splitting them into subcategories. Hence, these emission values need to be split into emissions from either heat or electricity generation. The process is further complicated by cogeneration in CHP facilities.
We split the emission data into the emissions from pure electricity, pure heat, and combined heat and power generation according to energy statistics from Eurostat. The CHP emissions are then split into electricity and heat contributions using the fixed-heat-efficiency approach. In this approach, one first fixes the efficiency of heat generation. The input to heat generation is then obtained by dividing the heat output by this efficiency, and the input to electricity generation is calculated as the residual of the total energy input. The assumed heat efficiency is that of a typical heat boiler, 90%.
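The fixed-heat-efficiency split of a CHP plant's fuel input can be sketched in Python as follows. This illustrates the approach described above and is not the authors' code; the function name `split_chp_input` is hypothetical, and the total input and the heat output are assumed to be given in the same energy unit.

```python
HEAT_EFFICIENCY = 0.9  # efficiency of a typical heat boiler, as assumed in the text


def split_chp_input(total_input: float, heat_output: float,
                    heat_efficiency: float = HEAT_EFFICIENCY):
    """Split CHP fuel input into (heat_input, electricity_input).

    Both arguments must use the same energy unit (e.g. TJ).
    """
    # Step 1: with the heat efficiency fixed, the input attributed to heat
    # is the heat output divided by that efficiency
    heat_input = heat_output / heat_efficiency
    if heat_input > total_input:
        raise ValueError("implied heat input exceeds total CHP input")
    # Step 2: the electricity input is the residual of the total input
    electricity_input = total_input - heat_input
    return heat_input, electricity_input


heat, electricity = split_chp_input(total_input=100.0, heat_output=45.0)
print(round(heat, 6), round(electricity, 6))  # 50.0 50.0
```

The share of CHP emissions attributed to electricity then follows as `electricity_input / total_input`, which is 0.5 in this example.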
The emission proxy for electricity-generation-related emissions is calculated as follows:

Emissions from electricity generation including the estimated contributions from autoproducers are calculated as follows:

$$\text{emission}_{\text{elec-only-incl-autoproducers}} = \text{emissions}_{\text{1A1a}} \cdot \left( \left( \text{MAP\_EI} - \frac{\text{GHP\_MAPCHP}}{0.9} \right) \Big/ \ldots \right)$$

where the following Eurostat indicators are used:
• TI_EHG_MAPE_E: Transformation input - Electricity and heat generation - Main activity producer electricity
• TI_EHG_MAPCHP_E: Transformation input - Electricity and heat generation - Main activity producer CHP
• TI_EHG_MAPH_E: Transformation input - Electricity and heat generation - Main activity producer heat
• TI_EHG_APE_E: Transformation input - Electricity and heat generation - Autoproducer electricity
• TI_EHG_APCHP_E: Transformation input - Electricity and heat generation - Autoproducer CHP
• TI_EHG_APH_E: Transformation input - Electricity and heat generation - Autoproducer heat
• GHP_MAPCHP: Gross heat production - Main activity producer CHP
• GHP_MAPH: Gross heat production - Main activity producer heat
• GHP_APCHP: Gross heat production - Autoproducer CHP
• GHP_APH: Gross heat production - Autoproducer heat
and the following categories:
For full details of the calculation, the reader is referred to the documentation in the provided code files.
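The following Python sketch shows one way the fixed-heat-efficiency approach can be combined with the main-activity-producer indicators listed above to scale the reported 1A1a emissions. It is our reading of the method, not the authors' exact formula, which is documented in their code files. It assumes that all indicators share one energy unit and that the 1A1a emissions are allocated in proportion to fuel input. The function name `electricity_only_emissions` and the example values are hypothetical.

```python
from typing import Mapping

HEAT_EFFICIENCY = 0.9  # fixed heat efficiency (typical heat boiler)


def electricity_only_emissions(emissions_1a1a: float,
                               indicators: Mapping[str, float],
                               heat_efficiency: float = HEAT_EFFICIENCY) -> float:
    """Scale 1A1a emissions by the share of main-activity fuel input used for electricity.

    Assumption (not confirmed by the source): emissions are allocated
    in proportion to transformation input.
    """
    mape = indicators["TI_EHG_MAPE_E"]      # electricity-only plants
    mapchp = indicators["TI_EHG_MAPCHP_E"]  # CHP plants
    maph = indicators["TI_EHG_MAPH_E"]      # heat-only plants
    total_input = mape + mapchp + maph
    # CHP input attributed to heat under the fixed-heat-efficiency approach
    chp_heat_input = indicators["GHP_MAPCHP"] / heat_efficiency
    # Electricity input: electricity-only plants plus the CHP residual
    electricity_input = mape + mapchp - chp_heat_input
    return emissions_1a1a * electricity_input / total_input


# Hypothetical inputs (energy indicators in TJ, emissions in kt CO2)
example = {"TI_EHG_MAPE_E": 600.0, "TI_EHG_MAPCHP_E": 300.0,
           "TI_EHG_MAPH_E": 100.0, "GHP_MAPCHP": 90.0}
print(round(electricity_only_emissions(50.0, example), 6))  # 40.0
```

This sketch covers main activity producers only; the variant including autoproducers would additionally involve the autoproducer indicators.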

Validation of the computed proxy
Let us now turn to the validation of the dataset. Since emissions from electricity generation alone are not publicly reported, we cannot directly assess the validity of the calculated quantities. Instead, we compare them to another data source. The JRC-IDEES database from [5], which is described in detail in [6], offers exactly the relevant quantities, but only for the period from 2000 to 2015. Fig. 1 visualises to what degree each sector contributes to the national CO2 emissions. Countries with large CHP contributions are especially interesting, as an accurate emission split matters more for them. Fig. 2 compares the proposed proxy for European electricity-generation-related emissions, based on the UNFCCC data, with the corresponding values provided by the JRC-IDEES database for the EU-27 countries together with Great Britain. The proposed proxy agrees well with the hatched area of the IDEES data, which represents emissions from public electricity generation plus the share of CHP emissions attributed to electricity generation.
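As a hedged complement to the graphical comparison, the agreement between the proxy and the JRC-IDEES values could also be quantified year by year over the overlapping period, for example as a relative deviation. The following Python sketch assumes that both series have already been extracted into dictionaries keyed by year; the function names and the numbers are hypothetical.

```python
from typing import Dict


def relative_deviation(proxy: Dict[int, float],
                       reference: Dict[int, float]) -> Dict[int, float]:
    """Percentage deviation of the proxy from the reference for years present in both."""
    common_years = sorted(set(proxy) & set(reference))
    return {year: 100.0 * (proxy[year] - reference[year]) / reference[year]
            for year in common_years}


def mean_absolute_deviation(deviations: Dict[int, float]) -> float:
    """Mean of the absolute percentage deviations."""
    if not deviations:
        raise ValueError("no overlapping years to compare")
    return sum(abs(d) for d in deviations.values()) / len(deviations)


# Hypothetical series in Mt CO2; only 2000 and 2001 overlap
proxy = {1990: 1500.0, 2000: 1300.0, 2001: 1320.0}
jrc_idees = {2000: 1250.0, 2001: 1320.0}
devs = relative_deviation(proxy, jrc_idees)
print(devs)                           # {2000: 4.0, 2001: 0.0}
print(mean_absolute_deviation(devs))  # 2.0
```

Restricting the comparison to the intersection of years reflects the fact that the secondary dataset only covers 2000 to 2015.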

Combining the dataset with electricity generation data
We presume that the provided data on CO 2 emissions related solely to electricity generation will in some cases be applied alongside data on the corresponding electricity generation.
Electricity generation data can conveniently be obtained from Eurostat [4]. The script "data_1_calculation.ipynb" found in the scripts folder of this dataset already reads similar information, including the gross heat generation. The routine can easily be adapted to read either the total power generation of each country or other associated quantities. An obvious application is the calculation of the emission intensities of electricity generation. To this end, the emission values in this dataset simply need to be divided by the corresponding total power generation.
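A minimal Python sketch of this division follows. The units are assumptions, since the units of the dataset are not stated in this section: emissions in kt CO2 and generation in GWh, converted to g CO2/kWh using 1 kt = 10^9 g and 1 GWh = 10^6 kWh. The function name `emission_intensity` and the example values are hypothetical.

```python
from typing import Dict


def emission_intensity(emissions_kt: Dict[int, float],
                       generation_gwh: Dict[int, float]) -> Dict[int, float]:
    """Emission intensity in g CO2/kWh for each year present in both inputs.

    Assumed units: emissions in kt CO2, generation in GWh.
    """
    intensity = {}
    for year in sorted(set(emissions_kt) & set(generation_gwh)):
        generation = generation_gwh[year]
        if generation <= 0:
            raise ValueError(f"non-positive generation in {year}")
        # kt/GWh equals (1e9 g)/(1e6 kWh), so multiply by 1e3 to get g/kWh
        intensity[year] = emissions_kt[year] * 1e3 / generation
    return intensity


print(emission_intensity({2018: 300.0}, {2018: 1000.0}))  # {2018: 300.0}
```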

Ethics Statement
The authors declare that this work does not involve the use of human subjects or experimentation with animals.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.