Analysis of greenhouse gas emissions in the European Union member states with the use of an agglomeration algorithm

Abstract
The use of fossil fuels as sources of energy is related to the emission of pollutants into the atmosphere. The implementation of international commitments on reducing emissions requires their continuous monitoring. The main energy resources for electricity production in the world include fossil fuels, i.e. oil, coal and natural gas, and according to projections their dominant role in the market of energy resources will persist for at least the next two decades. The aim of this article is to analyse the level of differentiation of European Union member states in terms of emissions of four greenhouse gases and to identify groups of similar countries based on these criteria. Such studies provide information that enriches our knowledge about the contribution of each European Union country to the emissions of greenhouse gases. This article uses a taxonomic method, cluster analysis, and specifically the agglomerative algorithm, which enables objects that are similar to each other to be extracted from the data and then merged into groups. In this way, a number of homogeneous subsets can be obtained from one heterogeneous set of objects. European Union countries make up the objects of segmentation. Each of them is described by its level of emissions of four gases: carbon dioxide, methane, nitrogen oxides and nitrous oxide. Groups of homogeneous countries are distinguished by total emissions and by the level of emissions per capita. The analysis is based on annual Eurostat reports concerning greenhouse gas emissions.


Introduction
The concept of sustainable development is having an increasing impact on the economic situation of European Union countries. The assumptions of sustainable development relate to the combined economic, social and environmental spheres, and are directed at maintaining and stimulating economic growth whilst treating social welfare and the quality of the environment as being of utmost importance. Unfortunately, a sharp increase in greenhouse gas (GHG) emissions has been observed throughout the world for several decades. According to the Kyoto Protocol, greenhouse gases include seven gases: carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), and four fluorinated gases (F-gases). Different GHGs stay in the atmosphere for different lengths of time. Carbon dioxide is the GHG most commonly emitted by human activities. In 2014, CO2 accounted for 81% of total emissions in EU countries, methane for 10.6%, N2O for 5.6%, and F-gases for 2.9% (EEA, 2016). The main emission sources in EU countries are: fuel combustion 55.1%, transport 23.2%, industrial processes and product use 8.5%, agriculture 9.9%, and waste management 3.3% (Eurostat, 2016).
A comparison of emission sources in 1990 and 2014 is shown in Fig. 1.
The use of fossil fuels as sources of energy is related to the emission of pollutants into the atmosphere. The implementation of international commitments on reducing emissions requires their continuous monitoring.
To meet its commitments to the 2005 Kyoto Protocol, the European Union created a system of measurement and limits for emissions of GHGs (UNFCCC, 2008). With the objective of reducing emissions of GHGs, the EU introduced three flexible mechanisms: the emissions trading system (ETS), joint implementation (JI) and the clean development mechanism (CDM) (Ranosz, 2008). The EU Emissions Trading System (EU ETS) is Europe's flagship tool to meet its carbon mitigation objectives. It remains the largest example of emissions trading in operation today, encompassing over 11 500 installations across 30 countries and covering approximately 40% of total EU emissions. The three most challenging key areas of evaluation are emissions abatement in relation to the balance with economic objectives, investment and innovation impacts, and profits and price impacts. These are presented by Laing, Sato, Grubb and Comberti (2013).
The dynamic development of the economy and the growth of the human population are closely related to the continuing growth in demand for electricity. Oil and coal represent a significant share of the global supply of primary energy sources: in 2014 they contributed 31% and 29% respectively. The emission of CO2 from coal is greater than the emission from other fuels such as oil or natural gas, and accounted for 46% of global CO2 emissions, although coal only represented 29% of the world TPES (total primary energy supply) in 2014 (Fig. 2). For Poland, the mining industry is a strategic sector of the economy. Coal accounts for nearly 90% of electricity generation, and for many years it has remained the primary source of energy in Poland (Dubiński & Turek, 2014).
Discussions are ongoing regarding the impact on the climate of GHG emissions from human activities, in particular CO2 emissions. Since the Industrial Revolution, annual CO2 emissions from fuel combustion have increased from approximately 40 Mt of CO2 in 1813 to more than 35.85 Gt of CO2 in 2013 (Fig. 3). The Intergovernmental Panel on Climate Change (IPCC) concluded that most of the observed increase in global average temperatures since the mid-twentieth century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations (Pachauri & Meyer, 2014). A different opinion is presented by the Nongovernmental International Panel on Climate Change (NIPCC), which argues that it is nature, not human activity, that governs climate (Idso, Carter, Singer, & Soon, 2013).
Currently, the subject of GHG emissions is very important because the policy of the European Union and electricity sector laws, including those relevant to the mining industry, are focused on implementing the strategy of sustainable development, mainly through the development of technologies which use renewable energy resources and the development of combined heat and electricity production. Sustainable energy policy is aimed at improving the long term welfare of society by striving to maintain a balance between energy security, satisfying social needs, economic competitiveness, and environmental protection (Lorek, 2011).
A. Kijewska, A. Bluszcz / Journal of Sustainable Mining 15 (2016) 133–142
In 1994 the United Nations Convention obliged Poland to develop and implement a national strategy for reducing greenhouse gas emissions, including economic and administrative mechanisms and the periodical monitoring of its implementation, as indicated in the "Climate Policy of Poland" (2003). Coal mining is a strategic sector of the Polish economy and plays a key role in ensuring the energy security of the country. However, it requires continuous restructuring in order to adjust itself to market conditions (Karbownik & Stachowicz, 1994; Karbownik, Turek, & Pawełczyk, 2001; Bluszcz & Kijewska, 2015; Jonek-Kowalska, 2014; Korski, Tobór-Osadnik, & Wyganowska, 2015) and also requires the development of innovative technologies of coal combustion, which may have an impact on the reduction of the ...
At the beginning of March 2011, the European Commission adopted an action plan which aims to reduce GHG emissions by 85–95% by 2050 in comparison to 1990. The energy policy model proposed by the European Commission aims to reduce CO2 emissions by 40% and 60% by 2030 and 2040 respectively. The ongoing policy corresponds with an internal emission reduction of 30% in 2030 and 40% in 2050 (Ciepiela, 2011). However, the predictions made by Korban and Manowska (2011) indicate that the directives of the EU to reduce CO2 emissions by 20% compared to 1990 emissions will not be met. Real and projected emission reductions in EU countries are shown in Fig. 4.
By 2014, greenhouse gas emissions in the European Union had been reduced by 24.4% relative to 1990 (taken as the base year), i.e. by about 1383.4 million tonnes of CO2 equivalent. The largest emitters of greenhouse gases in the European Union are Germany, Great Britain, France, Italy, Poland and Spain. The combined share of these six countries in EU greenhouse gas emissions in 2014 was over 70%. Sixteen of the 28 member states account for nearly 95% of the EU's emissions (Fig. 5). Among these countries, the most significant decrease in emissions from 1990 to 2014 was achieved by Romania (more than 56%) and the Czech Republic (nearly 37%), while emissions increased in three countries: Spain (15%), Portugal (6.4%) and Ireland (3.7%).
It should be noted, however, that the level of global CO2 emissions is currently affected to the greatest extent by countries such as China, the United States, India, Russia, Japan, Germany, Korea, Canada, Iran and Saudi Arabia, which together represent 75% of global CO2 emissions.
The aim of this article is to analyse the level of differentiation of the EU countries in terms of emissions of four types of greenhouse gases and to classify them into homogeneous groups. The article presents the taxonomic methods that allow the classification of various objects characterized by certain features. The subjects of this study are EU countries, and the features describing them are the emissions of carbon dioxide, methane and nitrous oxide, as the main "Kyoto gases", and additionally nitrogen oxides, which are also an important family of air polluting chemical compounds. In this study one taxonomic method was used, namely cluster analysis. The groups of homogeneous countries were distinguished by their total level of emissions and by their level of emissions per capita. Analyses using agglomeration algorithms were conducted based on the annual Eurostat reports concerning the levels of GHG emissions of the EU member states. The resulting division of countries into clusters allows the criteria behind it to be specified and directs special attention to the groups of countries constituting the greatest threat to the environment. The result of grouping can also act as the basis for a diversified policy of reducing GHG emissions in relation to particular clusters of EU countries. A further step may be to look for characteristic features of the countries included in each group.
Cluster analysis is used in many fields, e.g. in medicine to segment diseases or patients (Clinton, Button, Norring, & Palmer, 2004), in marketing for customer segmentation (Punj & Stewart, 1983; Wagner, Scholz, & Decker, 2005), and in the classification of countries or regions according to various criteria (Bluszcz, 2016). It is also applied in anthropology, archeology, and ethnography. Recently it has been widely used in research on artificial intelligence and pattern recognition. From the standpoint of GHG emissions, to our knowledge, there has only been one study, which uses a non-hierarchical method, i.e. k-means. In this study, we offer a hierarchical method: the method of agglomeration.
The rest of this article is organized as follows: a discussion of cluster methods, in particular the agglomeration method used for the analysis, then the results of calculations, a discussion and, finally, conclusions.

Methods
Researchers, scientists, and analysts are often faced with the need to organize a data set or various objects into a meaningful structure. Taxonomic methods are widely used in these situations because they deal specifically with problems of classification and data analysis. The scope of this issue includes, among others: grouping objects, properties of taxonomic procedures, taxonomy structures, taxonomy time series, the theory of multivariate distributions and their mixtures, division optimization, classification and regression, graph theory and data mining. Taxonomic methods include the following (Figura, 2013):
- methods of structural classification: cluster analysis,
- methods of classification ranking: linear ordering of objects,
- methods of the identification of representatives of classes.
This article uses a structural classification method called cluster analysis, which uses methods and techniques that enable objects that are similar to each other to be extracted from the data set and merged into groups. As a result, a few homogeneous subsets can be obtained from one heterogeneous set of objects. Objects which are in the same subset are considered "similar to each other" and objects from different subsets are treated as "dissimilar" (Mikut, 2009). The similarity criterion is either (a) distance: two or more objects belong to the same cluster if they are close according to a given distance (distance-based clustering), or (b) common concept: two or more objects belong to the same cluster if this cluster defines a concept common to all those objects (conceptual clustering). In clustering techniques, no a priori information about classes is required; that is, neither the number of clusters nor the rules of assignment into clusters are known. They should be discovered exclusively from the given dataset without any reference to a training set (Girão, Postolache, & Pereira, 2009).
If we take distance as a criterion of similarity, then we can use different metrics. When all the components of the data set are in the same physical units, the popular and relatively simple Euclidean distance can be used.
The Euclidean distance is calculated as the square root of the sum of the squared differences between the coordinates of a pair of objects (Giudici, 2003):

$$d_{rs} = \sqrt{\sum_{k=1}^{m} (x_{rk} - x_{sk})^2} \qquad (1)$$

where $x_{rk}$ and $x_{sk}$ are the $k$-th coordinates of objects $r$ and $s$, and $m$ is the number of coordinates. Another version of the Euclidean distance is the squared Euclidean distance. It gives greater weight to objects that are further apart.
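The Minkowski distance is a generalized metric distance:

$$d_{rs} = \left( \sum_{k=1}^{m} |x_{rk} - x_{sk}|^{p} \right)^{1/p} \qquad (2)$$

where $x_{rk}$ and $x_{sk}$ are the $k$-th coordinates of objects $r$ and $s$. It is a generalization because for $p = 2$ the distance becomes the Euclidean distance, and for $p = 1$ it becomes the city block distance. The Chebyshev distance is a special case of the Minkowski distance with $p = \infty$.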
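The distance measures described above can be illustrated with a minimal Python sketch. The function names and the two sample points are assumptions introduced only for illustration; they do not come from the study or from any statistical package.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x, y):
    """Euclidean distance without the square root."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def city_block(x, y):
    """Minkowski distance with p = 1."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    """Limit of the Minkowski distance as p grows: the largest difference."""
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical pair of objects described by two variables.
a = [0.0, 0.0]
b = [3.0, 4.0]
print(euclidean(a, b))          # 5.0
print(squared_euclidean(a, b))  # 25.0
print(city_block(a, b))         # 7.0
print(chebyshev(a, b))          # 4.0
```

For the sample points the squared Euclidean distance (25) is five times the Euclidean one (5), which illustrates how squaring gives greater weight to objects that are further apart.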
Cluster analysis involves many different algorithms. These can be broadly divided into hierarchical and non-hierarchical algorithms.
In hierarchical clustering, a treelike structure (dendrogram) is created through recursive partitioning (divisive method) or the combining (agglomerative method) of existing clusters. In agglomerative clustering methods, it is initially assumed that each observation is a small group consisting of only a single element. Then, in succeeding steps, the two closest clusters are aggregated into a new combined cluster. In this way, the number of clusters in the data set is reduced by one at each step. Finally, all elements are combined into a single big cluster. Divisive clustering methods begin with all elements in one big cluster, with the most dissimilar elements being split off recursively into a separate cluster, until each element represents its own cluster. Among non-hierarchical methods, k-means, probabilistic clustering and self-organizing methods are distinguished (Larose, 2005).
As previously mentioned, hierarchical cluster analysis comprises agglomerative techniques and divisive techniques. In practice, the most commonly used are agglomerative methods. In these methods, we assume that a set of N objects to be grouped and an N × N matrix of distances (or similarities) are given. The algorithm can be expressed in the following steps (Giudici, 2003; Stanisz, 2007):
1. Assign each object to its own cluster (N objects, N clusters).
2. Calculate the distances (similarities) between all objects. This can be done using different methods of linkage (e.g. single linkage, complete linkage, average linkage, weighted average linkage, the centroid method, or Ward's method).
3. Find the closest (most similar) pair of clusters and combine them into one single cluster; therefore, we get one cluster less.
4. Calculate the distances (similarities) between the new cluster and the others.
5. Repeat steps 2, 3 and 4 until all objects are grouped together in one single cluster of size N (N - 1 iterations).
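The steps above can be sketched as a naive Python routine. This is only an illustrative sketch, not the procedure implemented in any statistical package: the function `agglomerate`, its `linkage` parameter and the one-dimensional sample data are assumptions made for the example, and cluster-to-cluster distances are simply recomputed from the object distance matrix at every iteration.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def agglomerate(points, linkage=max):
    """Naive agglomerative clustering.

    points  : list of numeric vectors, one per object
    linkage : reduces the object-to-object distances between two clusters
              to one number (max = complete linkage, min = single linkage)
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    n = len(points)
    # Step 1: every object starts in its own cluster.
    clusters = [[i] for i in range(n)]
    # Step 2: distances between all pairs of objects.
    dist = [[euclidean(points[i], points[j]) for j in range(n)]
            for i in range(n)]

    def cluster_distance(c1, c2):
        return linkage(dist[r][s] for r in c1 for s in c2)

    history = []
    while len(clusters) > 1:
        # Step 3: find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((list(clusters[i]), list(clusters[j]), d))
        # Merge the pair; step 4 is implicit because distances to the new
        # cluster are recomputed from `dist` in the next iteration.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history


# Hypothetical one-dimensional data: five objects, hence four merges.
data = [[0.0], [1.0], [5.0], [6.0], [20.0]]
for left, right, d in agglomerate(data):
    print(left, right, d)
```

With complete linkage the merge distances for this sample are 1, 1, 6 and 20; passing `linkage=min` switches to single linkage and changes the last two to 4 and 14.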
As mentioned in step 2, clustering methods differ only in the way of calculating the distance between clusters. Below is a brief description of the most commonly used (Giudici, 2003;Mooi & Sarstedt, 2011). Graphic interpretation of some of the methods is shown in Fig. 6.
In the single linkage (nearest neighbour) method, the distance between two clusters is equal to the shortest distance between any object belonging to one cluster and any object belonging to the other cluster. Thus, the distance between the two groups is defined as the minimum of the distances between each of the $n_1$ observations belonging to cluster $C_1$ and each of the $n_2$ observations belonging to cluster $C_2$:

$$d(C_1, C_2) = \min(d_{rs}) \quad \text{for } r \in C_1,\ s \in C_2 \qquad (3)$$

In the complete linkage (farthest neighbour) method, the distance between two clusters is equal to the largest distance between any object belonging to one cluster and any object belonging to the other cluster. Thus, the distance between the two groups is defined as the maximum of the distances between each of the $n_1$ observations belonging to cluster $C_1$ and each of the $n_2$ observations belonging to cluster $C_2$:

$$d(C_1, C_2) = \max(d_{rs}) \quad \text{for } r \in C_1,\ s \in C_2 \qquad (4)$$

In the average linkage method (unweighted pair-group average), it is assumed that the distance between two clusters is equal to the average distance from each element of one cluster to each element of the other cluster:

$$d(C_1, C_2) = \frac{1}{n_1 n_2} \sum_{r \in C_1} \sum_{s \in C_2} d_{rs}$$

The method of weighted pair-group average is similar to the previous one, except that in the calculation, the size of the cluster (i.e. the number of elements contained in it) is treated as the weight.
In the centroid method (unweighted pair-group centroid) the distance between two groups $C_1$ and $C_2$, with $n_1$ and $n_2$ observations respectively, is defined as the distance between the respective centroids (usually the averages), $\bar{x}_1$ and $\bar{x}_2$:

$$d(C_1, C_2) = d(\bar{x}_1, \bar{x}_2)$$

The centroid of a new cluster is calculated according to this formula:

$$\bar{x} = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2}{n_1 + n_2}$$

It is worth noting the similarity between the average linkage method and the centroid method: the average linkage method considers the average distance between the observations of each of the two groups, whereas in the centroid method the centroid of each group is calculated and then the distance between the two centroids is calculated.
The method of weighted pair-group centroid (median) is identical to the centroid method, except that weighting is introduced into the calculations to consider differences in cluster sizes (i.e., the number of objects contained in them).
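To make the differences between these linkage rules concrete, the following Python sketch evaluates the single, complete, average and (unweighted) centroid distances for the same pair of clusters. The function names and the two small clusters are hypothetical and serve only as an illustration; the weighted variants are not covered.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise(c1, c2):
    """All distances d_rs with r in c1 and s in c2."""
    return [euclidean(r, s) for r in c1 for s in c2]


def single_linkage(c1, c2):
    return min(pairwise(c1, c2))


def complete_linkage(c1, c2):
    return max(pairwise(c1, c2))


def average_linkage(c1, c2):
    d = pairwise(c1, c2)
    return sum(d) / len(d)  # len(d) equals n1 * n2


def centroid(cluster):
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]


def centroid_linkage(c1, c2):
    return euclidean(centroid(c1), centroid(c2))


def merged_centroid(c1, c2):
    """Centroid of the merged cluster from the two centroids and sizes."""
    n1, n2 = len(c1), len(c2)
    x1, x2 = centroid(c1), centroid(c2)
    return [(a * n1 + b * n2) / (n1 + n2) for a, b in zip(x1, x2)]


# Hypothetical clusters of two and three objects in the plane.
C1 = [[0.0, 0.0], [0.0, 2.0]]
C2 = [[3.0, 0.0], [3.0, 2.0], [3.0, 4.0]]
print(single_linkage(C1, C2))    # 3.0
print(complete_linkage(C1, C2))  # 5.0
print(average_linkage(C1, C2))
print(centroid_linkage(C1, C2))
print(merged_centroid(C1, C2))
```

For this sample the four rules give four different distances (3, about 3.64, about 3.16 and 5), which is why the choice of linkage can change the order in which clusters are merged.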
Ward's method minimizes an objective function based on the principle that the aim is to create groups that have maximum internal cohesion and maximum external separation. This method attempts to minimize the error sum of squares (ESS) of any two (hypothetical) clusters which can be created at each step. This method is generally considered to be very effective; however, it tends to form clusters of small size.

$$ESS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

where $x_{ij}$ is the $j$-th object in the $i$-th cluster, $n_i$ is the number of objects in the $i$-th cluster, $\bar{x}_i$ is the mean of the $i$-th cluster and $k$ is the number of clusters. Each of the algorithms described can lead to very different results based on the same set of observations. In general, it is believed that the single linkage algorithm is the most versatile. The complete linkage method is strongly influenced by outliers, as it is based on the maximum distances. Clusters built by this method can be quite compact and tightly clustered. The average linkage and centroid algorithms tend to form clusters with a relatively low within-cluster variance and similar sizes. However, both these methods are influenced by outliers, although this influence is not as strong as in the complete linkage method.
Ward's method is the most different among the aforementioned methods. When enlarging one of the clusters, the intragroup variance grows (calculated by the squares of deviations from the average in clusters); this method involves combining groups that provide the smallest increment of the variance for a given iteration. This method works well when clusters of relatively similar sizes are expected, and the set of observations does not contain outliers (Mooi & Sarstedt, 2011).
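The merging criterion of Ward's method can be sketched in a few lines of Python: for every pair of clusters the increase in the error sum of squares caused by merging them is computed, and the pair with the smallest increase is chosen. The helper names `ess`, `ward_increase` and `best_ward_merge`, as well as the sample clusters, are assumptions made for this illustration.

```python
def ess(cluster):
    """Error sum of squares: squared deviations of objects from their mean."""
    n = len(cluster)
    mean = [sum(col) / n for col in zip(*cluster)]
    return sum((v - m) ** 2 for obj in cluster for v, m in zip(obj, mean))


def ward_increase(c1, c2):
    """Growth of the total ESS if clusters c1 and c2 were merged."""
    return ess(c1 + c2) - ess(c1) - ess(c2)


def best_ward_merge(clusters):
    """Return (increase, i, j) for the pair whose merge adds the least ESS."""
    candidates = [
        (ward_increase(clusters[i], clusters[j]), i, j)
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    ]
    return min(candidates)


# Hypothetical one-dimensional clusters: {0, 1}, {2} and {10}.
clusters = [[[0.0], [1.0]], [[2.0]], [[10.0]]]
print(best_ward_merge(clusters))  # (1.5, 0, 1)
```

In the sample, merging {0, 1} with {2} raises the ESS by only 1.5, whereas merging {2} with {10} would raise it by 32, so the first pair is combined.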
Since the presented algorithms are quite laborious, there are some programs such as STATISTICA, SPSS, STATA, ClustanGraphics, or others, which can be quite useful. By using them, cluster analysis, including the agglomerative method, can be performed. In these programs the results of clustering are illustrated in the form of dendrograms.
The agglomerative method does not directly indicate what the optimal number of clusters is. Unfortunately, there are no objective rules to determine this number. Intuition, experience and substantive knowledge of the tested objects matter, although some methods have been proposed for estimating the number of clusters.
The easiest way is to analyse the obtained dendrograms from the point of view of the differences between successive nodes. A large difference indicates a large distance between clusters, and the division can be made at this point. The graph of the course of the agglomeration, which shows the distance between the clusters at the point of joining, can also be used. A high jump on this chart indicates a great distance between the clusters being joined, and this jump marks the place of division. There are also some formulas that were proposed by Grabiński or Mojena (Stanisz, 2007).
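The "largest jump" heuristic described above can be sketched as follows. This is a simple illustration under the assumption that the agglomeration distances are available in merge order; it is not the Grabiński or Mojena rule, and the function name `suggest_cut` and the sample distances are hypothetical.

```python
def suggest_cut(merge_distances):
    """Suggest a number of clusters from the largest jump in merge distances.

    merge_distances : distances at which clusters were joined, in merge
                      order (N - 1 values for N objects)
    Returns (number_of_clusters, (distance_before, distance_after)).
    """
    if len(merge_distances) < 2:
        raise ValueError("at least two merges are needed to find a jump")
    jumps = [b - a for a, b in zip(merge_distances, merge_distances[1:])]
    # Index k of the largest jump: the tree is cut after merge k (0-based).
    k = max(range(len(jumps)), key=jumps.__getitem__)
    n_objects = len(merge_distances) + 1
    # Each merge removes one cluster, so k + 1 merges leave N - (k + 1).
    n_clusters = n_objects - (k + 1)
    return n_clusters, (merge_distances[k], merge_distances[k + 1])


# Hypothetical agglomeration schedule for five objects.
print(suggest_cut([1.0, 1.0, 6.0, 20.0]))  # (2, (6.0, 20.0))
```

In this sample the largest jump lies between the third and fourth merge (from 6 to 20), so the tree would be cut there, leaving two clusters. Such a rule only supports the analyst's judgement, which remains necessary as noted above.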
The research covered emissions of four gases: carbon dioxide (CO2), nitrogen oxides (NOx), methane (CH4) and nitrous oxide (N2O). Using the STATISTICA package, two perspectives were taken into consideration: the size of total greenhouse gas emissions in the EU countries and the emissions in these countries per capita. Data was obtained from the website of EUROSTAT (2016). This data covers 25 EU countries (without Malta, Cyprus and Luxembourg), and additionally Norway, Switzerland and Turkey.
Each clustering method has its advantages and disadvantages. In this study, a hierarchical agglomerative method was selected. The choice was guided by several factors. Firstly, the number of clusters does not need to be assumed in advance, as it does, e.g., in the k-means method. Secondly, the results of grouping can be visualized in the form of a dendrogram, which is more intuitively understandable. Moreover, if necessary, a different number of clusters can be chosen without the need for recalculation, and a new interpretation of the results can be made. Of the approaches to the agglomeration algorithm discussed previously, the complete linkage method was chosen after considering their advantages and limitations. Using this method, EU countries are segmented into clusters in terms of total emissions and emissions per capita.

Results and discussion
To achieve comparability between the data collected, normalization was conducted by standardization according to this formula (Morzy, 2013):

$$z_i = \frac{x_i - \bar{x}}{S_x}$$

where $\bar{x}$ and $S_x$ are, respectively, the mean and standard deviation of the variable in the sample. Standardized values of greenhouse gas emissions by EU country for the year 2012 are presented in Table 1.
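The standardization step can be sketched in Python as follows. The sketch assumes that the sample standard deviation (with the n - 1 denominator) is meant, which the text does not state explicitly, and the input values are made up for illustration; they are not Eurostat data.

```python
import statistics


def standardize(values):
    """Return z-scores: (x_i - mean) / standard deviation."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(v - mean) / sd for v in values]


# Hypothetical emission values of one gas for three countries.
print(standardize([2.0, 4.0, 6.0]))  # [-1.0, 0.0, 1.0]
```

After standardization each variable has a mean of 0 and a standard deviation of 1, so gases emitted in very different quantities contribute to the distance between countries on a comparable scale.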
As previously mentioned, the grouping of objects (countries) was conducted using the agglomerative algorithm with complete linkage. Incidentally Ward's method gave very similar results.
The division into clusters by the complete linkage method is shown in Fig. 7.
For each cluster the mean values of emissions were calculated. In this way, the characteristics of each cluster can be determined.
As can be seen from Table 2, when countries are grouped by the complete linkage method, cluster III (Germany) has the highest emissions of CO2, NOx and N2O. Emissions of CH4 are at a similar level in clusters I and III. Cluster I is in second place in terms of greenhouse gas emissions, and cluster II is in third place. Countries from cluster IV have the lowest emissions of all gases.
Emissions of carbon dioxide, nitrogen oxides, methane and nitrous oxide have also been analysed from the perspective of emissions per capita. Standardized data is presented in Table 3.
Based on analysis of GHG emissions per capita, a different structure of homogeneous countries has been created. Also, in this analysis complete linkage was chosen. In this case Ward's method provided almost the same clusters. The analysis of dendrograms allowed four clusters to be distinguished.
Thus, with the complete linkage method the following clustering was established (see Fig. 8): Cluster I: Denmark; Cluster II: Ireland;
As can be seen in Table 4, Denmark was placed in a separate cluster because it had the highest emissions per capita of CO2 and NOx. Ireland, in turn, was placed in a separate cluster because it had the highest emissions per capita of CH4, three times more than the average for all EU countries. Group III is characterized by the lowest values of all greenhouse gas emissions per capita.
Although cluster analysis is widely used, it is difficult to find publications on the use of these methods for grouping countries in terms of greenhouse gas emissions. Kijewska and Bluszcz (2016) used the k-means method for clustering EU countries with respect to GHG emissions. They also assumed the number of clusters to be equal to 4; however, the assignment of countries to groups is a little different from that shown in this study. Kolasa-Więcek (2013), on the other hand, uses the k-means method to group OECD countries in terms of agricultural GHG emissions. Fuzzy cluster analysis is used by Xia et al. (2011) to group industrial sectors (in China) in terms of several indicators that describe energy security, efficiency and carbon emission. If we assume that the grouping of countries according to the criterion of GHG emissions is important, then the question is not whether cluster analysis should be used, but which method is best.

Conclusions
The agglomerative methods presented in this article are universal and can be widely used in business practice. Clustering is used for the following reasons:
- reducing the large amount of collected information to a few basic categories, which enables easy orientation in a multidimensional phenomenon and the establishment of a typology in terms of the study issues and reasoning;
- identifying homogeneous objects of analysis, in which it is easier to extract systematic factors and possible causal relationships;
- reducing the time and cost of research by limiting consideration to the most important facts, with little loss of information and a relatively low probability of obtaining distorted results.
The results of the analysis supplement the information about the different levels of greenhouse gas emissions in the European Union. The classification of EU countries in terms of total greenhouse gas emissions enables the determination of groups of countries with similar emissions, the largest emitters, and the group of countries which contribute least to pollution. On the other hand, information about GHG emissions per capita is equally important. As demonstrated, the resulting clusters do not coincide with those concerning total emissions of GHG. Among the largest emitters of greenhouse gases are Germany, the United Kingdom, France, Turkey, Poland, Italy and Spain. However, in the classification of emitters per capita the negative leaders are Denmark and Ireland. The latter has been classified as a single cluster mainly due to its high methane emissions.
Poland is in a group of average emitters. It is worth noting that, in terms of emissions per capita of CO2, NOx and N2O, Poland has lower values than, for example, Germany, but it has the highest methane emissions per capita in the whole of cluster IV.
The resulting division of countries into homogeneous clusters enables the criteria that influenced such a division to be specified and special attention to be directed towards the group of countries which constitute the greatest threat to the environment. This division of countries should also be the basis for further debate in the context of the methodology of strategic guidelines for granting emission limits to EU countries.