Constructing a Method for an Evaluation Index System Based on Graph Distance Classification and Principal Component Analysis

Given the importance of a sound evaluation index system, a new method that combines principal component analysis (PCA) with graph distance classification is presented to make up for the deficiencies of PCA in index screening, and the method is applied to the construction of an evaluation index system for the environmental quality of decommissioning uranium tailing. The seepage indexes were classified into six classes using graph distance classification, and a representative element was selected from each class: pH, Σα, Pb, Po, F, and NO3. The representative elements were then analyzed by PCA, which determined the seepage indexes to be retained (pH, U, Ra, Σα, NH4-N, and F) and established an index system for environmental quality evaluation that consists of two primary indexes (seepage and radiation environment) and 12 secondary indexes. The results showed that the method ensures that the sifted indexes have a significant effect on the evaluation result, avoids the deletion of important indexes, and has stronger applicability and operability.


Introduction
The selection of the evaluation indexes and the reasonableness of the index system have an essential effect on the evaluation results. Thus, how to select indexes from a large and complicated index system is a problem in the construction of an evaluation index system for the environmental quality of decommissioning uranium tailing. If all of the indexes are evaluated, the computing effort of the information processing increases greatly because of the excessive number of indexes. However, if only a few indexes are selected individually, a large amount of the information in the original data can be lost, which makes the evaluation results inaccurate.
Principal component analysis (PCA), as an evaluation index screening method, can substitute a small number of comprehensive variables for the original multidimensional variables and considerably simplify the data structure while minimizing the loss of the information in the original data; it also avoids subjectivity and arbitrariness. PCA is widely applied in index selection, evaluation, and prediction, with strong applicability [1][2][3][4][5]. At the same time, it has some defects. For example, the contribution rate of a discarded component to the data analysis can be similar to that of a chosen principal component, or the major elements of a chosen principal component can be highly correlated with the elements of a discarded component; in both cases, important indexes are prone to being lost, which affects the evaluation results.
To make up for this deficiency of PCA, a new classification method that combines the correlation coefficient [6] with the shortest path theory [7,8], namely, graph distance classification (GDC), is presented. In the combined method, GDC is used to classify the indexes, and the representative elements selected from the classes are analyzed by PCA; this approach simplifies the data and removes redundant indexes, which simplifies the evaluation index system. The method was applied to simplify the evaluation index system for the environmental quality of decommissioning uranium tailing, which reduces the computing effort of the information processing and ensures the rationality of the index system.

Construction of the Method of an Evaluation Index System
The establishment of an evaluation index system based on graph distance classification and principal component analysis can be divided into two stages: index classification and index screening, as shown in Figure 1. The first step is the selection of the initial indexes; the indexes are then classified and selected by graph distance classification and principal component analysis. Eventually, the simplified index system is determined. In the graph distance classification method, the highly correlated elements are ascribed to the same class. Then, the representative elements that are selected from the classes are analyzed by PCA, which reduces the information processing workload. According to the screening results of the PCA, the selected indexes and the indexes of the same classes serve as important indexes when constructing the index system, which avoids the loss of important indexes.

Index Classification Model Based on a Graph Distance Classification
3.1. Data Preprocessing and Standardization.
For subjective and objective reasons, some values are missing in the sample data. Thus, the polynomial interpolation method is adopted to fill them in. At the same time, the units and dimensions of the indexes often differ, and to eliminate the influence of the dimension, the data standardization method [5] is used to make the indexes dimensionless.
Let $x_i$ $(i = 1, 2, \ldots, n)$ be the observed values of a certain index, and let $x_i'$ be the standardized data of the index. The standardization formula is
$$x_i' = \frac{x_i - \bar{x}}{s}, \quad (1)$$
where $\bar{x}$ and $s$ represent the mean and standard deviation of the observed values, respectively.
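As a minimal sketch of the standardization step, the following Python function computes $x_i' = (x_i - \bar{x})/s$ for one index. The function name `standardize` and the sample values are hypothetical, and the use of the sample standard deviation (denominator $n - 1$) is an assumption, since the text does not state which form of $s$ is used.

```python
from statistics import mean, stdev


def standardize(values):
    """Return the standardized values (x - mean) / s for one index."""
    x_bar = mean(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    s = stdev(values)
    return [(x - x_bar) / s for x in values]


# Hypothetical observations of one index at six monitoring points.
ph = [6.8, 7.1, 7.4, 6.5, 7.9, 7.3]
z = standardize(ph)
print([round(v, 3) for v in z])
```

After standardization the values of every index have zero mean and unit standard deviation, so indexes measured in different units become comparable.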

Determination of the Distance between the Indexes Based on the Correlation Coefficient.
The reciprocal of the correlation coefficient is used as the distance between the indexes: the higher the correlation coefficient between two indexes is, the shorter the distance between them is. The formula for the correlation coefficient is
$$r_{ij} = \frac{\sum_{k} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sqrt{\sum_{k} (x_{ki} - \bar{x}_i)^2 \sum_{k} (x_{kj} - \bar{x}_j)^2}}, \quad (2)$$
where $r_{ij}$ is the correlation coefficient between index i and index j. Here, $x_{ki}$ and $x_{kj}$ represent the observed values of index i and index j of the evaluation object k, respectively, and $\bar{x}_i$ and $\bar{x}_j$ represent the average values of index i and index j, respectively. Let the reciprocal of the correlation coefficient be the distance between the indexes. In other words, we have
$$d_{ij} = \frac{1}{r_{ij}}.$$
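The distance construction can be sketched in Python as follows. The function names and the sample columns are hypothetical. Taking the absolute value of the correlation coefficient, so that negatively correlated indexes still receive a positive distance, and assigning an infinite distance when the coefficient is zero are assumptions that the text does not state.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def distance_matrix(columns):
    """Distance between indexes as the reciprocal of their correlation."""
    m = len(columns)
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            r = pearson(columns[i], columns[j])
            # Assumption: use |r| and treat r == 0 as an infinite distance.
            d[i][j] = d[j][i] = float("inf") if r == 0 else 1.0 / abs(r)
    return d


# Hypothetical observations of three indexes on four evaluation objects.
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 4]]
d = distance_matrix(columns)
print(d[0][1], d[0][2])
```

The first two columns are perfectly correlated, so their distance is 1, the smallest possible value; the first and third columns have a correlation coefficient of 0.8 and therefore a distance of 1.25.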

Calculation of the Shortest Path Based on the Floyd Algorithm.
The complete weighted graph G is determined according to the distances between the indexes. Each initial index is a vertex of graph G, and the shortest path between two vertexes is the shortest distance between the corresponding indexes.
Let each index correspond to one vertex v in graph G. Suppose that graph G is a complete simple graph with the vertex set $V = \{v_1, v_2, \ldots, v_n\}$. Let the weight of each edge $(v_i, v_j)$ $(i \neq j)$ be $w_{ij} = d_{ij}$. The graph G is said to be a complete weighted graph if each edge has a determined weight $w_{ij}$. Let $d'_{ij}$ be the shortest distance between vertexes $v_i$ and $v_j$ $(i, j = 1, 2, \ldots, n)$. The shortest path between any two vertexes is calculated with the Floyd algorithm, whose steps are as follows:
(i) Input the weight matrix $W = [w_{ij}]_{n \times n}$ of the complete weighted graph.
(ii) For vertexes $v_i$ and $v_j$ in the adjacency matrix, whenever $e_{ik} + e_{kj} < e_{ij}$, update the data by using $e_{ik} + e_{kj}$ instead of $e_{ij}$. Repeat this step until the shortest paths are found, which determines the shortest path matrix $[d'_{ij}]_{n \times n}$.
Here, $e_{ij}$ is the distance between vertex $v_i$ and vertex $v_j$, namely, the weight of the edge $(v_i, v_j)$. Additionally, $e_{ik}$ and $e_{kj}$ represent the weights of edge $(v_i, v_k)$ and edge $(v_k, v_j)$, respectively.
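The update rule in step (ii) is the standard Floyd-Warshall relaxation. The following Python function is a minimal sketch of it; the function name `floyd` and the three-vertex weight matrix are hypothetical and are not taken from the monitoring data.

```python
def floyd(weights):
    """All-pairs shortest distances for a weight matrix given as nested lists."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input matrix is unchanged
    for k in range(n):          # intermediate vertex v_k
        for i in range(n):
            for j in range(n):
                # Relaxation: replace e_ij when the path through v_k is shorter.
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


# Hypothetical weights of a complete graph on three vertexes.
w = [
    [0.0, 1.0, 2.5],
    [1.0, 0.0, 1.1],
    [2.5, 1.1, 0.0],
]
shortest = floyd(w)
print(shortest[0][2])
```

In this example the direct edge between the first and third vertexes has weight 2.5, whereas the path through the second vertex has length 1.0 + 1.1, so the shortest distance is replaced by the shorter two-edge path.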
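Advances in Materials Science and Engineering
3.4. Index Classification.
Index classification is, in essence, a partition of the vertex set of graph G. In other words, we have $V = V_1 \cup V_2 \cup \cdots \cup V_m$ with $V_p \cap V_q = \varnothing$ $(p \neq q)$, in which indexes whose shortest distance $d'_{ij}$ does not exceed ρ are assigned to the same class, where ρ is the distance parameter, which is determined according to the actual situation.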
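One possible reading of this partition can be sketched in Python: two indexes are linked whenever their shortest distance is at most ρ, and every group of linked indexes forms one class. The linking rule (connected components found with a union-find structure), the function name `classify`, and the sample matrix are assumptions made for illustration; the exact partition conditions used in the paper may differ.

```python
def classify(shortest, rho):
    """Group vertex indices whose shortest distance is at most rho."""
    n = len(shortest)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the class root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if shortest[i][j] <= rho:
                parent[find(i)] = find(j)  # merge the two classes

    classes = {}
    for i in range(n):
        classes.setdefault(find(i), []).append(i)
    return sorted(classes.values())


# Hypothetical shortest-distance matrix for four indexes.
shortest = [
    [0.0, 1.05, 1.9, 2.4],
    [1.05, 0.0, 2.0, 2.2],
    [1.9, 2.0, 0.0, 1.15],
    [2.4, 2.2, 1.15, 0.0],
]
classes = classify(shortest, rho=1.2)
print(classes)
```

A smaller ρ yields more and smaller classes, while a larger ρ merges classes, which is why the parameter has to be chosen according to the actual situation.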

Selection of the Representative Element.
According to the shortest distance relationship graph of each class, the sum of the distances between one index and the other indexes of the class can be calculated. The smaller the sum is, the closer the relationship between that index and the rest of the class is. The index with the minimum sum is used as the representative element of the class and is analyzed by the PCA. The representative elements for all of the classes are shown in Table 1.
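The selection rule can be sketched as a short Python function that picks, within one class, the index whose summed shortest distance to the other members is smallest. The function name `representative` and the sample matrix are hypothetical.

```python
def representative(shortest, members):
    """Return the member with the smallest total distance to the other members."""

    def total(i):
        return sum(shortest[i][j] for j in members if j != i)

    return min(members, key=total)


# Hypothetical shortest distances within one class of three indexes.
shortest = [
    [0.0, 1.05, 1.15],
    [1.05, 0.0, 1.10],
    [1.15, 1.10, 0.0],
]
rep = representative(shortest, members=[0, 1, 2])
print(rep)
```

Here the distance sums are 2.20, 2.15, and 2.25, so the second index (position 1) is selected. A class with a single member is trivially represented by that member.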

Principal Component Analysis.
The basic model of principal component analysis [9] is as follows:
$$z_j = l_{1j} x_1 + l_{2j} x_2 + \cdots + l_{pj} x_p,$$
where $x_i$ is index i, $z_j$ is principal component j, and $l_{ij}$ is the principal component load of index i in principal component j. The concrete steps of selecting the indexes based on PCA are as follows:
(i) Calculate the correlation matrix $R = (r_{ij})_{p \times p}$ $(r_{ii} = 1)$ of the standardized data of the indexes.
(ii) Calculate the eigenvalues and eigenvectors of the correlation matrix R, the variance contribution rate, the cumulative contribution rate, and the factor loads of the principal components.
(iii) Select the principal components and determine their number according to the eigenvalues or the cumulative contribution rate.
(iv) Screen the indexes according to the absolute value of the factor load of the principal component. The larger the absolute value of the factor load is, the more significant the influence of the index on the evaluation results is. Such an index should be retained.
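Steps (i) to (iii) can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the procedure used in the paper: the function name `pca_loadings` and the sample data are hypothetical, the factor load is taken to be the eigenvector component scaled by the square root of its eigenvalue, and components are kept when their eigenvalue exceeds 1.

```python
import numpy as np


def pca_loadings(data, min_eigenvalue=1.0):
    """Eigenvalues, cumulative contribution rates, and loads of kept components.

    data: array of shape (samples, indexes).
    """
    r = np.corrcoef(data, rowvar=False)        # step (i): correlation matrix R
    eigvals, eigvecs = np.linalg.eigh(r)       # step (ii): R is symmetric
    order = np.argsort(eigvals)[::-1]          # sort components, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals / eigvals.sum())
    keep = eigvals > min_eigenvalue            # step (iii): eigenvalue rule
    # Assumption: factor load = eigenvector component * sqrt(eigenvalue).
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, cumulative, loadings


# Hypothetical data: six samples of three indexes.
data = np.array([
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 1.5],
    [3.0, 6.2, 0.7],
    [4.0, 7.8, 1.9],
    [5.0, 10.1, 1.1],
    [6.0, 12.0, 1.6],
])
eigvals, cumulative, loadings = pca_loadings(data)
print(np.round(eigvals, 3), np.round(cumulative, 3))
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of indexes, so the contribution rate of a component is its eigenvalue divided by that number.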

Application Example
The construction of an environmental quality evaluation index system for decommissioning uranium tailing is taken as an example to demonstrate the process of building and refining the index system. Graph distance classification was used to classify the indexes, and principal component analysis was used to select the indexes.

Selection of the Initial Evaluation Index.
According to the evaluation purpose, and considering the availability and integrity of the existing monitoring data, the construction of the environmental evaluation index system focuses on pollution and selects the seepage index and the radiation index as the primary indexes. The seepage indexes include pH, Σα, Σβ, U, Ra, ²³⁰Th, Th, ²¹⁰Po, ²¹⁰Pb, Mn, NH₄-N, F⁻, SO₄²⁻, NO₃⁻, Zn, and Cd. The radiation indexes include the radon concentration, radon exhalation rate, α aerosol, γ, surface α, and surface β.

Index Classification.
This paper takes the seepage indexes of decommissioning uranium tailing as the research object, and the water monitoring data of six monoliths (A-F) of a decommissioning uranium tailing are used as the sample data.
The original data originate from the environmental monitoring report of a decommissioning uranium tailing. The standardized data of the seepage indexes are shown in Table 1.
The standardized data of the seepage indexes were substituted into formula (2) to calculate the correlation coefficients, which are shown in Table 2. The distances between the indexes were the reciprocals of the correlation coefficients. The vertexes of graph G are determined according to the number of indexes. Let each index be a vertex v of graph G, so that the vertex set is $V = \{v_1, v_2, \ldots, v_{16}\}$. The shortest paths were calculated with the Floyd algorithm, programmed in MATLAB, which gives the shortest distance $d'_{ij}$ between each pair of indexes, as shown in Table 3.
Given ρ = 1.2, according to the shortest distances (Table 3), the vertex set is divided into six categories, $V_1, V_2, \ldots, V_6$, which satisfy the partition conditions of Section 3.4. According to the construction method of graph G, six subgraphs ($G_1, G_2, \ldots, G_6$) were obtained, as shown in Figure 2. The result of the index classification is shown in Table 4.

Index Screening.
Taking $G_2$ as an example, the relation graph of the shortest distances in graph $G_2$ is shown in Figure 3. The sum of the distances between U and the other indexes is 3.296, and the other indexes (Ra, Σα, and NH₄-N) have the sums 3.3977, 3.1888, and 3.2659, respectively. The sum for Σα is the smallest, which indicates that Σα has the closest relation with the other indexes of the class; this is why Σα was chosen as the representative element of graph $G_2$. The other representative elements are shown in Table 4.
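The choice for $G_2$ can be checked directly from the four distance sums reported above. In this short Python check, the key `sum_alpha` is a hypothetical label for the total α activity index.

```python
# Distance sums reported in the text for the four indexes of G2.
sums = {"U": 3.296, "Ra": 3.3977, "sum_alpha": 3.1888, "NH4-N": 3.2659}

# The representative element is the index with the smallest sum.
chosen = min(sums, key=sums.get)
print(chosen, sums[chosen])
```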
Using the standardized data of the representative elements, the principal component analysis was carried out in SPSS [11]. There are two criteria for the selection of the principal components: an eigenvalue greater than 1 and a cumulative contribution rate over 85%. In this paper, the first and second principal components were extracted, as shown in Table 5. In the first principal component, we select the indexes for which the absolute value of the factor load is over 0.9. In the second principal component, we select the index with the largest absolute value of the factor load. Table 6 shows that pH and F⁻ have higher loads in the first principal component and Σα has a higher load in the second principal component.
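The screening rule described above can be sketched in Python. The function name `screen` is hypothetical, and the factor loads below are illustrative placeholders, not the values of Table 6; they are chosen only so that the rule can be followed step by step.

```python
def screen(names, loadings, pc1_threshold=0.9):
    """Keep indexes with |load| above the threshold on PC1, plus the top index on PC2.

    loadings: one (pc1_load, pc2_load) pair per index.
    """
    kept = [n for n, (l1, _) in zip(names, loadings) if abs(l1) > pc1_threshold]
    # Index with the largest absolute load on the second principal component.
    top_pc2 = max(zip(names, loadings), key=lambda item: abs(item[1][1]))[0]
    if top_pc2 not in kept:
        kept.append(top_pc2)
    return kept


names = ["pH", "sum_alpha", "Pb", "Po", "F", "NO3"]
# Placeholder loads for illustration only (NOT the values in Table 6).
loadings = [
    (0.95, 0.10),
    (0.30, 0.92),
    (0.70, 0.40),
    (-0.60, 0.50),
    (-0.93, 0.20),
    (0.80, -0.30),
]
retained = screen(names, loadings)
print(retained)
```

The absolute value matters because a strongly negative load indicates as strong an influence on the component as a strongly positive one.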

Determination of the Simplified Index System.
According to the screening results of the PCA, the selected indexes have a significant impact on the evaluation results and are used as the important indexes to construct the index system. Because the indexes within one class of the index classification are highly correlated with one another, the indexes in the same class as a selected index also have a significant impact and were also used as important indexes. The seepage indexes were classified into 6 classes through the index classification, and the representative elements were selected.
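The expansion from retained representative elements to the final index set can be sketched in Python. The function name `expand` is hypothetical. Only the membership of the class containing U, Ra, Σα, and NH₄-N is stated in the text; treating pH and F as single-member classes is an assumption inferred from the final six-index result, and the remaining classes are listed by their representative element only.

```python
def expand(retained, classes):
    """Return every index whose class contains a retained representative."""
    final = []
    for members in classes:
        if any(m in retained for m in members):
            final.extend(members)
    return final


classes = [
    ["pH"],                               # assumption: single-member class
    ["U", "Ra", "sum_alpha", "NH4-N"],    # class G2, as given in the text
    ["Pb"],                               # representative only; members not listed
    ["Po"],                               # representative only; members not listed
    ["F"],                                # assumption: single-member class
    ["NO3"],                              # representative only; members not listed
]
final_indexes = expand(["pH", "sum_alpha", "F"], classes)
print(final_indexes)
```

Under these assumptions the three retained representatives expand to six seepage indexes, which matches the number of seepage indexes kept in the final system.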
The screening results of the PCA are shown in Table 6. The retained indexes were used as the important indexes of the evaluation index system, and the indexes in the same class as the retained indexes were also used as important indexes. The simplified index system for the environmental quality assessment of decommissioning uranium tailing is shown in Figure 4.

Conclusions
(2) On the basis of the index classification, the representative elements were analyzed by PCA, which retained pH, Σα, and F⁻. Taking the classification into account, 6 indexes remained in the final evaluation index system (pH, U, Ra, Σα, NH₄-N, and F⁻), which is consistent with the analysis of the main pollution sources of seepage in the environmental monitoring report.
(3) In the graph distance classification method, the highly correlated elements were placed in one class. Then, the representative elements selected from the classes were analyzed by PCA, which reduces the number of indexes entering the PCA and the information processing workload and avoids repeated analysis of the same information. According to the screening results of the PCA, the selected indexes, which have a significant impact on the evaluation results, were used as the important indexes with which to construct the index system. At the same time, the indexes in the same class as the retained indexes also have a significant impact and were also used as important indexes, which prevents the loss of important indexes. The method that combines graph distance classification with principal component analysis makes up for the deficiencies of PCA, is suitable for complex systems with many indexes, and favors the further application of PCA to the construction of index systems.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Figure 1: Flow chart for the construction of the index system. Flow chart labels, in order: Determination of the initial evaluation index; Calculation of the correlation coefficient; Calculation of the shortest path with the Floyd algorithm; Graph distance classification; Index classification; Selection of the representative element; Principal component analysis; Index screening; Determination of the simplified index system.

Figure 4: The sifted index system for environmental quality assessment in decommissioning uranium tailing.
Let $d_{ij}$ be the weight $w_{ij}$ of each edge $(v_i, v_j)$ of the graph on the 16 vertexes $v_1, v_2, \ldots, v_{16}$; this produces the weight matrix $W = [w_{ij}]_{n \times n}$.

Table 1: Standardized data of the seepage indexes.

Table 3: The shortest distance between each pair of indexes.

Table 2: Correlation coefficient matrix. Table 2 is reproduced from Mikoni [10] (under the Creative Commons Attribution License/public domain).

Table 4: The results of the index classification.

Figure 3: The relation graph of the shortest distance ($G_2$).

Table 5: The eigenvalue and variance contribution of the principal component.

Table 6: The factor loading matrix of the principal component.