Multivariate analyses to establish reference values for soils in Médio

- Quality Reference Values (QRVs) for potentially toxic elements are obtained from background levels in soils. However, this determination from mean values or percentiles is not appropriate given the variability in the natural distribution of these elements in soils. Therefore, the objective of this study was to propose a new methodology to establish the QRVs, using the Médio Paraíba region (RJ, Brazil) as an example, from groups of soils defined based on the pseudo-total levels of B, Ba, Co, Cr, Cu, Ni, Pb, and Zn through the use of multivariate analyses and discriminant functions. A total of 40 points, collected at depths of 0-20 and 20-40 cm, were used for the determination of pseudo-total contents, according to the EPA 3051A methodology. The samples were separated into three groups to better represent the variability of the soils of the region. The classification functions were obtained based on the variables Mn, Fe, and Mg. In general, the groups G1 and G2 presented lower values than the ones obtained when the sample universe was used, whereas G3 presented higher values. The QRVs obtained from the soil groups presented substantial differences that translate into advantages for the management of the contaminated areas of the region.


INTRODUCTION
The determination of the background levels of potentially toxic elements in soils, as well as the proposition of Quality Reference Values (QRVs), is essential for the construction of legislation that serves as the basis for the monitoring and remediation of these areas.The use of lists with guiding values has been common practice in countries that have developed an efficient environmental policy for the protection of natural resources.In these cases, contamination of resources is recorded when concentrations of the elements of environmental interest reach a level above a set threshold, called the guiding value.Internationally, although no uniformity exists regarding the nomenclature used (background values, baseline values, guiding values, reference values, trigger, etc.), the guiding values represent the basis of the soil and groundwater protection policy (CETESB, 2001).In Brazil, among the guiding values proposed by Resolution 420 of the National Council for the Environment (Conselho Nacional do Meio Ambiente -CONAMA) of 2009 are the Quality Reference Values, which indicate the natural concentration of a substance in soils that have not been influenced by anthropic actions (CONAMA, 2009).
In recent years, several research groups, from Brazil and from other countries, have directed their research toward obtaining reference values specific to each study region (ALFARO et al., 2015;ALMEIDA JÚNIOR et al., 2016;BINI et al., 2011;BIONDI et al., 2011;COSTA et al., 2014;PRESTON et al., 2014;RÉKÁSI;FILEP, 2012;SANTOS;ALLEONI, 2013).However, an evaluation of the obtained values allows clear distinctions to be made between them.This is because these values depend on the composition of the source material, the pedogenetic processes, and the degree of soil development, which are specific to each environment.
The state of Rio de Janeiro does not yet have soil QRVs, and values determined for other Brazilian states are being used.However, the use of QRVs from another locality can lead to errors in evaluations due to the diversity of geomorphoclimatic conditions.
The region of Médio Paraíba, located in the south of the state of Rio de Janeiro, has a very intense industrial activity, where several cases of contamination have occurred, harming the quality of life in the surrounding population.Therefore, a decision was made to study the background levels of the potentially toxic elements of this region, thus providing support for understanding the relation between the soil characteristics and the distribution of these elements in these areas.
To establish the QRVs of inorganic substances, the Brazilian legislation recommends the use of the 75 th or 90 th percentile of the sample universe, with the anomalous values (outliers) being previously withdrawn (CONAMA, 2009).However, the determination of QRVs from simple statistical criteria, whether based on the mean or percentiles of the frequency distribution of the data, has limitations.According to Paye, Mello, and Melo (2012), the main limitations are related to the variability of the physical and chemical attributes of the soils, which determine the distribution of the potentially toxic elements.In addition, the removal of values considered outliers may represent another error.Therefore, particularities can occur in an area that could generate natural values of potentially toxic elements above those observed for the other areas under study.Thus, further in-depth individual investigations into the origin of such values are needed (PRESTON et al., 2014).
The establishment of reference concentration ranges has been performed through different statistical procedures with the aim of minimizing this problem, where multivariate statistical methods, which allow the simultaneous interpretation of a large number of variables, have been the most used (FADIGAS et al., 2006;PAYE;MELLO;MELO, 2012).
The objective of this work was to propose the use of multivariate analyses and discriminant functions as a new methodology for the establishment of QRVs, using as an example the Médio Paraíba region -RJ, for groups of soils defined based on the pseudo-total levels of B, Ba, Co, Cr, Cu, Ni, Pb, and Zn.

Soil selection
Soil samples were selected from the Médio Paraíba region -RJ, whose collection points were obtained from a joint analysis of the soil, geology, use, and land cover maps of the state of Rio de Janeiro, all on a 1:500,000 scale.The maps were overlaid using ArcGIS Desktop 10, produced by ESRI; to obtain the sampling points, the cLHS -Conditioned Latin Hypercube System was used.In this way, 40 points, considered sufficient to obtain all the variability of environmental covariates, were chosen for the collection of soils in areas of minimal anthropic interference in the Médio Paraíba region -RJ.The samples were collected at a minimum distance of 200 m from the roads, thus avoiding interference from particulates emitted by vehicles.
All points were georeferenced using GARMIN ® GPS navigation equipment, and the information (geographical coordinates datum WGS84, and municipality) for each point is presented in Table 1.

Soil collection and preparation
The collection and preparation of soil samples were performed based on procedures described in Embrapa (2009).For each sampling point, small trenches were opened with the aid of a straight blade, where samples of approximately 2 kg were collected at depths of 0-20 and 20-40 cm.To avoid possible contamination, samples of the lower depth were collected first.
The soil samples were air dried using white paper and a veil-type screen to avoid contamination.After drying, they were crushed with an agate mortar, homogenized, and passed through a stainless steel sieve with a 2-mm opening (10 mesh) in preparation for the physical and chemical analyses and through a sieve with a 0.149-mm opening (100 mesh) for the soil digestion procedure.

Soil analysis
The chemical and physical analyses were performed on the 80 collected samples and consisted of organic carbon, pH in H 2 O (1:2.5), exchangeable cations, potential acidity, base sum calculation, cation exchange capacity (CEC), and quantification of the grain size fractions (sand, silt, and clay fraction) according to the methods proposed by Donagemma et al. (2011).
For the determination of the B, Ba, Co, Cr, Cu, Ni, Pb, and Zn in the soil, the samples (0.500 ± 0.001 g) underwent wet digestion in a closed system in a MARS Xpress® digester for 8 min and 40 s until reaching 175 °C according to the SW-846 3051A method (USEPA, 2007); a v/v ratio of 3 HNO 3 :1 HCl was used.For the determination of the element concentrations, the extracts were analyzed in an Inductively Coupled Plasma Optical Emission Spectrometry (ICP-OES) device.
The quality control of the analysis was carried out with SRM 2709 San Joaquin soil certified material, from the National Institute of Standards and Technology (NIST).The levels obtained were compared to the leachable concentrations, as well as to those concentrations recommended by NIST for pseudo-total digestion methods, i.e., partial sample digestion (NIST, 2003).When compared to the leached values, the recovery rates of the certified samples were satisfactory, above 60%, indicating the efficiency of the method.

Statistical analyses
The descriptive statistical analysis of the contents of the potentially toxic elements was performed in a Microsoft ® Office Excel ® spreadsheet.The Pearson correlation coefficients were determined in SAS (2010) software, with a statistical significance level of 1% probability.
The Euclidian distance and the Ward method were used as a measure of similarity in the cluster analysis (CA).For this, the mean pseudo-total values of eight elements (B, Ba, Co, Cr, Cu, Ni, Pb, and Zn) were used as clustering variables (standardized data for mean 0 and variance 1).For selection of the most appropriate number of groups, a cross-validation was performed in the discriminant analysis, using the same discriminatory variables used in the clustering, assuming an equal covariance matrix and equal classification probabilities for the groups.
After the groups were defined, successive discriminant analyses were performed to reallocate the observations within the groups until reaching a 0% classification error rate by the cross-validation method.This procedure is defined in this work as Reallocation with Discriminant Analysis (RDA).
After the groups were obtained, the soil variables that best discriminated them were selected.For this purpose, the proc stepdisc of the SAS statistical program (2010) was used, with backward and stepwise selection criteria at a 5% probability for the model variables' input or output.Subsequently, the stability of each set of variables selected by the proc stepdisc was verified, based on the coefficient of variation of the multiplier constants (β) of the discriminant model and on the error rate by cross-validation.After the soil variables were defined, the classification functions of each group were then elaborated.
The QRV of each element was established based on the general mean values, mean of each group, and percentiles (75th and 90th) of the frequency distribution of the results, as suggested by Conama (2009), with the percentiles also being determined for each group formed.

Correlation analysis between potentially toxic element contents and soil attributes
Pearson's correlations among the B, Ba, Co, Cr, Cu, Ni, Pb, and Zn contents and of these contents with the chemical attributes (pH, P, K + , Ca 2+ , Mg 2+ , Al 3+ , Na + , Corg, CEC, Fe, and Mn) and the granulometric (silt and clay) levels at a significance level of 1% are presented in Table 2.
According to Pearson's correlation analysis, the B, Ba, Co, Cr, Cu, Ni, Pb, and Zn elements showed high correlations with each other (Table 2).According to Sheng et al. (2012), these elements possibly were associated with the same soil source material and were released as weathering products of these rocks.Positive and significant correlations with Fe and/ or Mn were observed for all elements, suggesting a high geochemical affinity between them.In this study, the Fe and Mn contents were considered soil attributes, as they were directly related to the content of the Fe, Mn oxides, hydroxides, and oxyhydroxides of the soils.Correlations of Fe and Mn with potentially toxic elements are associated with the formation of secondary oxides that have high adsorption capacity (BURAK et al., 2010;SIPOS et al., 2008).According to Frierdich, Hasenmueller and Catalano (2011), Fe and Mn oxides control the distribution and speciation of potentially toxic elements in soils weathered by adsorption, occlusion, and coprecipitation reactions.
B and Cr presented similar behavior because the correlation was high among them (72%), and therefore, they were similarly influenced by Al 3+ , pH, and Fe.
Because the elements Ba, Co, Ni, and Zn had a very close correlation with each other (r > 70%), they showed similar associations and were positively correlated with pH, Ca 2+ , Mg 2+ , and Mn, and they showed a negative correlation with clay.Ba presented a high correlation coefficient with Mg (77%).Kabata-Pendias and Mukherjee (2007) reported that Ba behaves similarly to Ca 2+ and Mg 2+ and may even replace these elements in the structure of carbonate minerals.
Pb correlated with Na + , Mn, and silt, and Cu correlated with Mg 2+ , Fe, and Mn.

Classification of samples into groups
Figure 1 shows the dendrogram resulting from the cluster analysis, which was performed using the B, Ba, Co, Cr, Cu, Ni, Pb, and Zn contents of the 80 soil samples.The bond distance greater than 1.25 times the standard deviation of the bond distance among all the observations was adopted as a cutoff point (MILLIGAN; COOPER, 1985).This criterion suggested the formation of two or three groups.
To better elucidate the choice of the number of groups, the error rate of the cross-validation in the discriminant analysis was used separately for each analytical repetition.By means of this error rate, the formation of two groups and of three groups that presented low values (less than 20%) was verified.We then chose three groups, as a greater number of groups gives a greater distinction of variability in the samples.The number of samples composing the groups were as follows: group 1 (G1), composed of 39 observations; group 2 (G2), composed of 26 observations; and group 3 (G3), composed of 15 observations, totaling the 80 observations.
To select soil variables that classify the groups formed and provide a better measure of the discriminant function, the procedure was performed in two stages.The first step consisted of using the proc stepdisc procedure (SAS, 2010) through two selection criteria-backward and stepwise-in three training samples, which resulted in eight sets of variables (Table 3).The most common variables sets were verified as Mg-Fe-Mn-CEC-Corg, with 12 observations, followed by Mg-Fe-Mn, with eight observations.The second step consisted of verifying the stability of the variables, based on the coefficient of variation (CV%) of the multiplier constants (β), by using the variables of the discriminant function for all sets of variables obtained in the first step (Table 3).Only the Mg-Fe-Mn and Mg-Fe-Mn-Clay models presented β with CV% less than 50% in the two clustering procedures (CA and RDA), but the Mg-Fe-Mn-Clay model presented a higher classification error rate relative to the Mg-Fe-Mn model, indicating that it is therefore less appropriate than the Mg-Fe-Mn model.
By selecting the soil variables Mg, Fe, and Mn, the discriminant analysis allowed the elaboration of a function for each group, generating a classification model (Table 4), where new samples can be allocated within the established groups.The framing of the group to which the observation belongs will be defined by the highest value generated between the discriminant functions (SAS, 2010).

Characterization of groups
Figure 2 presents the standardized data (mean equal to 0 and variance 1) of the contents of B, Ba, Co, Cr, Cu, Ni, Zn, and Pb and of the soil attributes for each group.Group 1 (G1) is verified as being the one with the lowest levels of potentially toxic elements; it also has the lowest values of Fe, Mn, Mg, Ca, and pH, with means of 23,331.1 mg kg -1 ; 97.0 mg kg -1 ; 0.2 cmol c dm -3 ; 0.7 cmol c dm -3 ; and 5.1, respectively.
The highest values of B and Cr were obtained in group 2 (G2), where the highest values of Fe, Al, and Multivariate analyses to establish reference values for soils in Médio Paraíba, state of Rio de Janeiro, Brazil  On the other hand, the highest background levels of Ba, Co, Cu, Ni, Pb, and Zn were represented by group 3 (G3), which also presented the highest levels of Mn, Mg, Ca, and pH, whose averages were 601.2 mg kg -1 ; 1.3 cmol c dm -3 ; 2.5 cmol c dm -3 ; and 5.8, respectively.This confirms the Pearson correlation analysis (Table 1), where a positive and significant correlation was observed between the Mg and Mn contents and these six elements, except for Pb, which had no significant correlation with Mg.Clay presented a negative correlation with the G3 elements, explaining its low values in this group (mean of 278.2 g kg -1 ).
Thus, the relations between the soil attributes and the potentially toxic elements are evident; these relations corroborate the results obtained with the Pearson's correlation analysis (Table 1), and they confirm that the selection of the variables Mn, Fe, and Mg to compose the discriminant function classification is appropriate.

Establishment of Quality Reference Values (QRVs)
Table 5 shows the background levels of the B, Ba, Co, Cr, Cu, Ni, Pb, and Zn elements in the soils of the Médio Paraíba region -RJ; these levels are represented by the overall means and by the 75 th and 90 th percentiles for the total set of samples and for the three classification groups.In this study, the 75 th percentile (P75), or the upper quartile, was considered in the establishment of the QRVs because it presents an intermediate condition between the mean levels and the 90 th percentile.The elimination of 25% of the data becomes more appropriate because it ensures a greater and better environmental assessment, as well as contributing to human health safety.Since the areas without anthropic interference in the Médio Paraíba region are limited, this means that the lithological units are not all represented by the points sampled; thus, the adoption of the 75 th percentile becomes more adequate.
However, the adoption of groups better contemplates the variability in soil samples and presents greater homogeneity within the groups.This behavior can be verified by the coefficients of variation, which were µ-mean; CV-coefficient of variation; T-total sample Table 5 -Background levels of B, Ba, Co, Cr, Cu, Ni, Pb, and Zn (mg kg -1 ) in the soils of the Médio Paraíba region -RJ.Overall means, coefficients of variation, and 75th and 90th percentiles of the sample universe and groups smaller in the groups than in the total sample (Table 5).In this way, the use of the 75 th percentile of the groups, instead of the 75 th percentile of the sample set, is recommended since both the more restrictive and the more permissive situations would be represented.Thus, the acquisition of a range of variation is possible for the levels of potentially toxic elements in the soils of the region.
The B levels found in the soils of the Médio Paraíba region -RJ, both in the 75 th percentile (P75) and in the 90 th percentile (P90), are higher than the reference values established by Copam (2011) for the soils of Minas Gerais state (11.5 mg kg -1 ), and the lowest value was 74 mg kg -1 in the G1 mean.
Ba presented great variation in the overall contents, according to the coefficient of variation; this was also observed between the groups.The 75 th percentile ranged from 28 to 135 mg kg -1 in the groups.However, G1 and G2 presented levels below the value determined by Copam (2011), which is 93 mg kg -1 .The 90 th percentile of G3 presented a Ba value of 198 mg kg -1 , which is higher than the prevention value (150 mg kg -1 ) adopted by Conama (2009).
Co levels, in general, approach those levels found in other Brazilian states, varying in the P75 of the groups from 3 to 15 mg kg -1 , within the range established by Fadigas et al. (2006), which was 2-20 mg kg -1 .In G3, the values found were 15 mg kg -1 in P75 and 16 mg kg -1 in P90, which were higher but very close to the value of Cetesb (2014) of 13 mg kg -1 .
In general, the elements Cr and B varied little between the groups (40 to 58 mg kg -1 in P75), as they presented low coefficients of variation.In addition, throughout the study, B and Cr presented similar dynamics, proving the high correlation between them.

Figure 1 -
Figure 1 -Dendrogram formed by Ward's Cluster Analysis and Euclidian distance as a function of the mean contents of three analytical replicates of B, B, Ba, Co, Cr, Cu, Ni, Pb, and Zn

Figure 2 -
Figure 2 -Mean of the groups standardized for a total mean equal to zero and variance 1 after classification with the discriminant model Mg-Fe-Mn

Table 1 -
Information of the selected points for soil collection in the Médio Paraíba region -RJ in areas of lower anthropogenic interference

Table 2 -
Pearson correlation coefficients among the background levels of B, Ba, Co, Cr, Cu, Ni, Pb, and Zn and the soil attributes *significant values at 1% probability highlighted 1 Organic carbon ; 2 Cation exchange capacity at pH 7.0

Table 3 -
Characteristics of discriminant models with soil variables for training subsamples *Number of occurrences in the selection of variables by the STEPDISC procedure; 1 groups formed by cluster analysis, 2 groups formed by reallocation with discriminant analysis

Table 4 -
Discriminant functions of classification with the variables Mg, Fe, and Mn for the formation of three groups of observations