Abstract
The behavior and possible contamination risk due to the presence of potentially harmful metals (PHM) were studied based on 2250 soil samples that were collected in a 5-year period (2013–2017) from the plain of Thessaly (prefectures of Karditsa, Trikala, and Larissa). The vertical distribution of metals was also investigated from sample profiles at three depths 0–30, 30–60, and 60–90cm. The soils of the sampling belong to four taxonomy soil orders that are dominant in the studied area (Alfisols, Inceptisols, Endisols, and Vertisols). In a novel approach, robust quadratic regression analysis on multiple variables was used to define prediction models of the concentrations of two metals: Fe which is an essential metal and the toxic Cd. Linear and quadratic regression formulae were estimated based on the iteratively reweighted least squares robust regression approach in an effort to eliminate the impact of the outliers. These formulae define how several soil properties affect the distribution of the considered metals in each soil order. The evaluation of the estimated regression equations based on the R2 metric indicates that they constitute a useful, reliable, and valuable tool for managing, describing, and predicting the pollution in the studied area.
Similar content being viewed by others
Introduction
The increased amounts of heavy metals in the environment during the recent past can be attributed to a large increase of metals in soils (Kanwar et al. 2020). Heavy metals have an atomic density greater than 5 g cm−1. It is well known that some of them, in low concentrations, are necessary for the smooth growth of living organisms (Sall et al. 2020). However, in higher concentrations, they have a negative impact to the growth mechanisms, causing restrictions in the growth or even poisoning of enzyme systems, and therefore death of vital organs (Steffan et al. 2018; Bartkowiak et al. 2020). That is the reason that the term potentially harmful metals (PHM) is most often used to describe them.
Soils are known to have geochemical origins, and therefore the mineralogical composition of the parent material determines the type and content of minerals both at the surface soil horizon and soil profile (Kopittkea et al. 2019; Bam et al. 2020). Anthropogenic activities also contribute to an increased concentration of metals in soils such as the use of fertilizers, pesticides, biosolids and manures, wastewater and slurry, metal mining and milling processes, and industrial wastes along with air-borne sources which have been defined as anthropogenic sources of pollution (Wuana and Okieimen 2011; Navarro-Pedreño et al. 2021; Li et al. 2021). Atmospheric deposition of emissions from the industrial activity, from the continuous and increasing movement of vehicles, and from the continuous and long-term use of sewage sludge along with the application of fertilizers on agricultural land (Molina-Roco et al. 2018) are direct sources of minerals (metals and metalloids) in urban and agricultural environments. The contamination of agricultural land by metals and metalloids is highly concerning due to their potential effects on human health in contaminated areas (Jabeen et al. 2020). On the other hand, the population increase results in increased social activities and waste generation. Better waste management is needed to reduce the environmental impact from the presence of metal contaminants in the soil (Artiningsih et al. 2020). Urban development causes an increase in the volume of municipal waste, which is harmful to the environment and human health. Using the appropriate mathematical models, it has been calculated that in 5 years, given the development of society and economy in China, the population will continue to grow. By 2021, the volume of solid waste will be more from 10,000 tons (Liu et al. 2019). Food waste management estimates that worldwide, 1.3 billion tons/year of food are disposed of in landfills (including edible and inedible foods) (Loizia et al. 2019; Zorpas 2020). The exact identification of the sources of metals as well as other pollutants in an area is a critical process since it would lead to targeted measures and policies that could potentially reduce the additional pollution deposits. Factor analysis and correlation analysis, along with principal component analysis, can be used to evaluate possible contamination sources in an area (Li et al. 2021)
PHM can be incorporated, gathered, concentrated, transferred, and accumulated in crops, groundwater, and subsequently in the human body through the food chain. Contamination of soils with PHMs is widespread in the world and poses a long-term risk to environment and human health mostly because of their non-biodegradable nature (Kopittkea et al. 2019). The chemical form of metals or metalloids in the soil solution advocates and determines their solubility and therefore their availability to plants, as well as their potential leaching to the deeper layers of the soil profile (Molina-Roco et al. 2018; Kanwar et al. 2020). Several soil properties are known to affect PHM mobility in the soil solution, such as soil reaction, cation exchange capacity (CEC), redox potential, soil structure, and clay and organic matter content (Xu et al. 2021). Under certain conditions, the contamination of soils can have a significant impact on the sustainability of groundwater. Several conceptual models can be used to qualitatively assess the groundwater sustainability levels (Alexakis et al. 2021).
Metal concentrations are indicative of the depth distribution because of their geochemical origin. The enrichment of PHM in soils with high clay content and alkaline reaction strongly implies that the distribution of metals is effectively regulated by soil properties and reductive dissolution of Fe and Mn oxy-hydroxides (Van Groeningen et al. 2020). In the temperate zone and especially in soil profiles without any distinct layer, the reduction of the concentration of metals with depth is common (Shaheen and Iqbal 2018). This is largely attributed to weathering and the leaching of PHM. Si leaching is dominated by the weathering regime resulting in the residual accumulation of Al and Fe oxides (Chen et al. 2016). Since heavy metals are strongly sorbed to Fe oxides, they may accumulate in surface horizons of desilicated soils (Caporale and Violante 2016). High concentration of Fe oxides may also result in a strong sorption of anthropogenic PHM in topsoils. However, high precipitation rates and frequently good drainage conditions in soils may reduce the extent of accumulation or even compensate for it (Wilcke et al. 2000).
The Greek Thessalian agricultural soils date back to early Greek history have been cultivated for over 5000 years. They belong to different taxonomy orders and differ in their origin and physicochemical properties. Four soil orders are most common in agricultural soils in the Thessalian plain: Alfisols, Inceptisols, Endisols, and Vertisols (Golia et al. 2019).
Using statistical techniques and machine learning approaches to monitor, explain, describe, and predict the movement and concentration of metals in soils often leads to mathematical equations that are valuable tools for scientists (Srigiri et al. 2014). Regression analysis denotes a family of machine learning techniques which has been extensively used to address problems in several research areas including life and earth sciences. It aims at establishing a prediction model connecting a response to a set of predictor variables, where the parameters of the prediction model can be estimated from data samples (Wang et al. 2018; Ogbeh et al. 2019). In its simplest form, regression analysis is based on the linear assumption which defines a linear fit to the sample data. This model can be efficient when the linear assumption concerning the distribution of the paired observations (response vs predictors) is sufficiently accurate (Vereecken et al. 2016). However, the linear hypothesis is clearly inadequate in many cases, including several datasets investigated in this research. A greater flexibility could be introduced using stochastic piecewise linear functional forms in the regression models (Diakoloukas and Digalakis 1999); however they typically require larger amounts of data. A better alternative is to simply transform the dataset using simple polynomial or trigonometric basis functions. For instance, a basis function commonly used in the environmental sciences is the logarithmic functional form (Wang et al. 2018) which can be applied to either the response (log-linear) or the predictors (linear-log) or both (log-log) (Arkes 2019). Other common curvilinear regression forms use quadratic regression variations (Wang et al. 2018; Ogbeh et al. 2019; Fang et al. 2019) which are second-order constraints of the polynomial functions. The main advantage of these alternatives is that they remain linear in their parameters since they can be seen as transformations on the linear model. Therefore, the same linear least squares method can be used to perform the parameter estimation. Nonlinear approaches have also been used such as those based on support vector machines (SVMs), artificial neural networks (ANNs), or random forest to define environmental prediction models (Lima et al. 2013). However, these approaches are more complex and harder to train when smaller amounts of data are available and are prone to overfitting. Thus, the choice of the most appropriate regression approach is not always straightforward and is highly dependent on the type, distribution, and volume of the available data.
The novelty of the present research mainly resides in the following: (a) Through a 5-year field research in different soil orders of Thessaly, the levels of potentially harmful metals (PHM) in surface soils as well as their vertical distribution are studied; (b) the chemical, physical, and taxonomic factors that determine or influence PHM levels in soils were investigated; and (c) a series of robust regression formulae are estimated from the available data denoting a relationship of the concentration of two metals (Fe and Cd) to several soil parameters and coexisted metals. The proposed prediction equations serve as a solid mathematical framework which could be used to adequately describe, capture, and predict the pollution of these metals in the studied area. Furthermore, the adopted methodology could possibly be generalized spatially and temporally.
Materials and methods
Descriptive characteristics of the sampling area
Thessaly is a geographical region in the central part of continental Greece consisting of four prefectures. Its land is mostly a large plain surrounded by a ring of high mountains. The area is of great interest since intensive cultivation takes place; therefore, fertilizers and pesticides are used which probably contribute to soil pollution. The dominant climate in the whole region can be characterized as warm continental, with distinct summer and winter seasons and summer rains which augment the fertility of the plains. The annual average temperature reaches 14°C with an average precipitation of 388 mm.
The present study was carried out in three of the prefectures of Thessaly aiming at investigating the behavior of metals in four different soil orders (Fig. 1). For this purpose, 70 soil samples were collected from the prefecture of Karditsa (37 Alfisols and 33 Inceptisols). Forty-six soil samples from Trikala prefecture were classified as Endisols, while 34 soil samples were classified as Vertisols and were collected from Larissa prefecture (Soil Survey Staff 1999; Golia et al. 2019). During the 5 years (2013–2017) of the investigation, 750 samples were collected (150 per year). Furthermore, for each soil sample, their properties were studied at three depths, as follows: 0–30cm, 30–60cm, and 60–90cm. Thus, a total of 2250 soil samples were collected and analyzed during the 5 years of the survey.
Soil sample collection process
Soil sampling sites were representative of the main soil orders that appeared in Thessaly region. In every site, a small pit was dug with a wooden shove, and a 90 cm long soil profile was obtained which was consequently divided into three layers (0–30, 30–60, and 60–90 cm) in bottom-up order (IAEA 2004). The collected soil samples were labeled, stored in polypropylene bags, and then air-dried and ground using porcelain mortar and pestle. The ground soil samples were sieved through a 2-mm sieve and were used for all the laboratory analyses.
Soil chemical and physical analysis
Soil samples were analyzed as follows: soil pH and electrical conductivity (EC) was measured in deionized water with a soil solution ratio of 1:1 (JAOAC 1984). Particle size distribution was measured using Bouyoucos hydrometer, and the organic matter was determined by the modified Walkley-Black (wet digestion method) (McLean 1982; Nelson and Sommers 1982). Cation exchange capacity (CEC) was estimated by the ammonium acetate (pH 7) saturation method. Total “free” Fe and Al oxides were measured according to citrate-carbonate-dithionite method, using NaCl and acetone, while amorphous oxides were extracted using ammonium acetate and ammonium oxalate method in dark room shaking for 2 h (Page et al. 1982).
“Pseudo-total” potentially harmful metal concentrations was measured according to modified BCR method using aqua regia (with HCl: HNO3 3:1) and after 2 h digestion (JAOAC 1984; ISO/DIS 11466 1994; Golia et al. 2007). The concentrations of all the metals studied were determined using atomic absorption spectrophotometry (AAS) with flame and/or the graphite furnace techniques (Tokalıoğlu et al. 2001) according to their detection limits. Analytical grade phosphoric acid, nitric acid, and hydrochloric acid were purchased from Merck and used for all digestion and laboratory analyses. Glassware and plastic containers used in the study were cleaned using 20% HNO3 and rinsed with distilled water before use. Metals standard solutions (1000 mg L−1) were prepared from “Titrisol” Merck. For the evaluation of the analytical methods, a certified reference material (CRM) (No 141R, calcareous loam soil) from BCR (Community Bureau of Reference) was analyzed with the soil samples.
Statistical procedures
Statistical t-tests were used to determine metal enrichment in soil profile at the level of 95%. A one-way ANOVA test was applied to evaluate possible significant differences in results obtained from different taxonomy soil order (Kargoll et al. 2018). Using SPSS v26, the arithmetic average values along with the standard deviation and relative standard deviation of PHMs concentrations were calculated to describe the data variation. Means that were statistically different were separated using the least significant difference (LSD) (Wang et al. 2020). Two replicates for each data point were averaged prior to the statistical analyses.
Regression analysis
Regression analysis can be seen as a curve fitting process for a given set of data samples (Arkes 2019). Its goal is to estimate a function y = fθ(x) + e, where fθ(.) is some arbitrary function with parameter coefficients θ and e denotes a source of noise. The above equation defines the relationship between a response (or dependent variable) y and one or more predictors (or independent variables) x = [x1, ⋯, xd]T. Typically, the functional form fθ(x) is pre-defined when configuring the regression model, while the parameters θ are estimated based on a set of training pairs (yn, xn), n = 1, 2, ⋯, N, where yn ∈ ℝ is the value of the response and xn = ∈ ℝd is the continuous vector of d predictor values which serve as the key parameters of the model. When d > 1, the process is called multiple regression and involves more than one predictor (Morrissey and Ruxton 2018).
A special case of the regression analysis which is commonly used in earth sciences is linear regression (Vereecken et al. 2016; Wang et al. 2018). Linear regression constraints the estimated function to be linear:
where X = [1, x1, ⋯, xd]T is the extended vector of predictors and B = [b0, b1, ⋯, bd] is the vector of d + 1 linear coefficients. These coefficients can be interpreted as the amount of change in the output variable that is attributed to a specific input predictor and can be used as a form of nominal range sensitivity (Chatterjee and Hadi 1988). The parameter b0 denotes the bias or the interception of the regression line. Although simple and relatively efficient, the linear approach is not a panacea. There could be nonlinear relationships in the data that can only be modeled through nonlinear regression (Wang et al. 2018). In this work, quadratic regression forms (up to second-order polynomials) are considered in an effort to better model the nonlinearities that might exist in the data. Considering the relatively small sample set, the quadratic choice aims at avoiding overfitting while retaining some form of flexibility on the curve fitting process. The regression function can then be defined as in the following equation:
where B denotes a typically symmetric and positive definite matrix of the model coefficients. Single and multiple regression models are investigated based on the above quadratic form, to obtain optimum predictions of the Fe and Cd elements in four different soil categories.
Robust regression
In a number of cases, the datasets might have a statistical distribution with extended tails since there might be a small number of samples that reside in the extremes of the distribution curve (the outliers) (Shaheen and Iqbal 2018). The extended tails of the distribution are in fact a distortion of the distribution attributed to these unusual observations and might result in an unreliable model. When using the standard cost function of the regression analysis to obtain the regression coefficients (typically mean square or least square), these outliers can largely influence the form of the curve fit since the parameter optimization is based on the minimization of the squared derivatives (residuals) which can become very large. To overcome this problem, the dataset can be pre-processed to automatically identify and exclude possible outliers (Harrell 2001). A better alternative is to use a robust cost function which reduces the impact of the extreme deviations by assigning a reduced weight to large residuals. There are several variations of the robust regression analysis that can be used such as in Wang et al. (2013). A review on a number of different computational approaches for robust linear regression can be found in Holland and Welsch (1977). In this work, we adopt the iteratively reweighted least squares (IRLS) approach (Kargoll et al. 2018). IRLS is an iterative method which is used to find robust maximum likelihood estimations of the model parameters under the assumption of a normally distributed dataset. To the best of our knowledge, robust regression has not been used in the past in a prediction task related to the current study.
Results and discussion
Soil physical and chemical properties
The values of the physical and chemical properties of the soil’s samples collected in three layers (soil depths) are presented in Table 1. The physicochemical parameters that characterize soil samples had statistically significant differences between soil orders and among three different depths.
Soil pH values vary with depth. Thus, in Alfisols, the pH increases with depth by 20%. This is probably due to the leaching of the basic cations and their vertical movement to the lower levels of the soil profile. Τhe use of acid-forming fertilizers for agricultural purposes (Mustapha et al. 2011) could also contribute to the acidic soil reaction.
Mean soil EC values ranged between 1134 and 2651 μS cm−1. High concentrations of dissolved salts, which increase the value of electrical conductivity of soils, occur for two main reasons. Anthropogenic activities create conditions of impaired salinity in soils and especially in those adjacent to urban environments, such as the capitals of the prefectures studied. Agricultural activities and cultivation practices that have been applied in the plain of Thessaly may cause salinization of soils (Golia et al. 2015).
Organic matter values ranged between 1.1 and 5.2% across soil orders. Soil samples have low values of organic matter, which is usually observed in Mediterranean soils with the characteristic climate, relatively high temperature and low humidity (Mitsios et al. 2005; Golia et al. 2007, 2008, 2015, 2019).
High mean clay content was found in Vertisols, significantly (p<0.05) increasing along with soil depth. The possible removal of fine coarse fraction due to the surface run off and soil illuviation may lead to high clay contents in soil profile (Mustapha et al. 2011). Cation exchange capacity (CEC) determines soil’s ability to exchange ions, influencing structure stability as well as soil response to fertilizers application. Soils containing high clay content are typically strongly weathered and highly fertile (Van Groeningen et al. 2020).
Macronutrients and amount of Fe and Al oxides
Table 2 shows the mean concentration of macronutrients as well as Fe and Al oxides in soil samples during the 5 years of the study. Calcium and magnesium, like potassium, are positively charged ions held to the surface of clay and organic matter in soils by electrostatic charge. The exchangeable concentrations of base cations largely govern soil acidity, and Ca and Mg are the two most abundant base cations in Vertisols and Endisols. The exchangeable Ca and Mg pools are usually influenced by origin and nature of soil parent materials and slope position, along with movement of soil solution (Caporale and Violante 2016). High pH values and high CaCO3 content are observed in the soil samples belonging to these soil classes in the prefecture of Larissa and Trikala.
High values of phosphorus and potassium were evaluated in soils with intensive agricultural activities and a long-lasting use of P and K fertilizers. Phosphorus and potassium fertilizers were highly used in Thessaly plain during the past decades. In acidic soil conditions, calcium is the major component of phosphorus tie up. Relatively high P fertilizer rates are usually demanded for crops in alkaline soils (Kopittkea et al. 2019). The availability of P to plants for uptake and utilization is usually defected in alkaline and calcareous soils probably due to the formation of almost insoluble calcium phosphate (Molina-Roco et al. 2018).
The amounts of Fe and Al oxides in soils are of high concern because they have high specific surface area and high reactivity playing a major role in sorption processes mostly in acid soils (Van Groeningen et al. 2020). The highest values of oxides occurred in highly acidic (Alfisols) or almost neutral soils (Inceptions) in the prefecture of Karditsa (Golia et al. 2019). Fe and Mn oxides may be found in high contents as weathering products of often-encountered olivine of ultramafic rocks (Chen et al. 2016).
Potentially harmful metal contents
The soil samples studied during the 5-year survey did not appear to be particularly burdened by PHM (Table 3). In fact, their average concentrations are lower than the values reported in studies in Greek soils (Farmaki and Thomaidis 2008; Antibachi et al. 2012; Golia et al. 2015). The highest mean concentrations of Cd, Pb, Zn, Cu, and Ni were observed in the acidic Alfisols in the prefecture of Karditsa. This suggests probable anthropogenic input, resulting from long-term use of agrochemicals and fertilizers (Golia et al. 2019). Intensive agricultural activity is observed in these soils, and the use of fertilizers is necessary and uninterrupted. Furthermore, the use of manures and animal wastes in agricultural soils is a source of Cu. The continuous use of fertilizers may cause an increase in organic carbon content because of the higher crop productivity resulting at great amounts of residues.
Fertilizers used as carriers of nutrients contribute to heavy metal additions to the surface soil (Kabata-Pendias 2011). High levels of Zn and Cd impurities along with Pb, Ni and Zn exist in trace element fertilizers (Mitsios et al. 2005; Golia et al. 2007). Phosphatic fertilizers, manufactured from the phosphate rocks usually containing various metals as impurities in the ores (Molina-Roco et al. 2018), are among the sources of heavy metal inputs to agricultural soils (Srinivasarao et al. 2014). The presence of lead is strongly affected by the adjacentment to the national road that connects the two largest cities in Greece, Athens and Thessaloniki (Golia et al. 2015).
Vertical distribution of physical and chemical parameters, nutrients, and PHM
In Tables 1, 2, and 3, the vertical distribution of all the properties measured during the 5 years of research is presented. The soil reaction ranged from 4.6 to 8.4 indicating slightly acidic to moderate alkaline reaction. Soil reaction, expressed by pH values, varied significantly between the locations and depths considered. The upper (0–30cm) layer was more acidic than the lower one (Yutong et al. 2016). The removal of basic cations from the surface soils to lower depths may have caused the acidification of surface soil samples (Shaheen and Iqbal 2018).
Organic matter content ranged from low to medium across the locations of sampling. Clay content significantly (<0.05) increased with soil depth, as a result of the removal by surface runoff (Rajmohan et al. 2014). This is a common phenomenon in soil in agro-ecological systems. Significant differences in vertical profiles have been observed. The exchangeable bases, except of Na, did not significantly vary with location and depth. Na and Mg probably associated with the seawater intrusion into aquifers. The average concentration of Na increases with depth. In Endisols, the sand content is higher than in other soil classes, and the leaching of soluble salts is high and concludes to up to 30% increase of Na concentration in the 3rd layer (60–90cm).
Quadratic regression configuration
Both simple and multiple regression analysis were considered in the present research to extract a relationship between the dependent variable (concentration of Fe and Cd) and one or more independent variables (predictors). Regression analysis is performed according to a hypothesis of a specific model which best models the relationship between the variables. In this work, an emphasis was given in the use of quadratic kernels under the assumption of curvilinear relationships of the variables (Wu et al. 2020). The quadratic kernels in their most generic form include an interception term b0, the linear and squared terms for each of the independent variables xi, ∀ i = 1, ⋯, K, as well as the products of all predictor pairs as in the following regression equation:
In this generic model, the parameters to be estimated are the model coefficients b0, ai, bi, and cij. Estimation is performed using a set of training data and the least square criteria to minimize the prediction error. Based on a stepwise regression approach (Harrell 2001) which uses the significance (p-value) of the null hypothesis that a coefficient has no effect, it is possible to simplify the model by rejecting some of the predictor terms when their contribution in the efficiency of the regression fit is negligible. In our experiments, the p-value threshold was set to 0.05 meaning that a term is not considered significant when the significance level exceeds 5%. In addition, in an effort to smooth the inaccuracies introduced in the distribution of the dataset due to the outliers, robust quadratic regression has been applied (Wang et al. 2013), based on the iteratively reweighted least squares (IRLS) (Kargoll et al. 2018).
Evaluation metrics
The prediction performance of the regression equation is evaluated through the R2 (R-squared) coefficient of determination. The R2 is a statistical measure which denotes how close are the data to the fitted regression line. Mathematically it is defined as follows:
where VARRES denotes the variance of the residual and VARTOT the total variance of the samples, yi is the actual values of the response, \( {\hat{y}}_i \) is the corresponding model predictions, and \( \overline{y} \) denotes the sample mean. In other words, the coefficient of determination is defined as the amount of the response variation that is explained by the model, normalized over the total variance. Based on the above definition, R2 can be seen as single-factor sensitivity index since it is related to the first-order sensitivity index which is defined as the partial variance of a parameter relative to the total variance (Ten Broeke et al. 2016). It is valued between 0 and 1, where 1 means that 100% of the response variability is explained by the model and this is considered in most cases as an indication of a good fit.
As an additional measure of efficiency, the root mean square error (RMSE) was also calculated on the basis of a 5-fold cross-validation process of the dataset. RMSE is defined as:
where N denotes the number of samples considered. However, RMSE is primarily used as a metric to compare the efficiency of different experimental setups on the same dataset and model rather than a data-independent indication of the regression fit efficiency.
Quadratic regression models per soil order
The environment and programming language of MATLAB 2015a was used to perform the data processing, the regression analysis, the evaluation, and the plots of the related diagrams.
Figures 2, 3, 4 and 5 show 2-D and 3-D plots of the regression curves that relate the Fe and Cd concentrations with one or two soil parameters (predictors) in each soil order. The datasets used to estimate the regression line or hyperplane were based on the mean concentrations measured at the three depths of the study from the samples collected in a 5-year period. In the 2-D plots, both the simple (red line) and robust (green line) regression fit are presented along with the corresponding R2 value. It can be seen that in almost all diagrams, the robust regression produced a model with a higher R2 and a smoother curve, indicating a better regression fit. Therefore, the robust approach was also adopted to obtain the quadratic regression hyperplanes based on two independent variables as shown in the 3-D plots. These quadratic regression models with their corresponding R2coefficient values are summarized in Table 4. In the same table, the final multi-variable regression model based on the whole set of independent variables considered is presented. It can be observed that in several cases, some of the quadratic terms are missing. This is due to the stepwise regression process (Arkes 2019) that was applied and the corresponding p-values which denoted a negligible significance for these terms and were consequently rejected.
In Alfisols, soil pH influences Fe and Cd concentrations (MAFF 1988; Soil Survey Staff 1999; Golia et al. 2019). The regression equation for the prediction of the Fe concentration in Alfisols (Fig. 2a) was estimated using multiple quadratic regression analysis on a dataset of 37 × 5 years × 3 depths = 555 samples. Each data sample was described as a tuple including the corresponding values of the soil pH as well as the content of the Fe and Al oxides. The estimated multi-variable prediction equation obtained using robust regression has the following form:
The corresponding R2 = 0.999 indicates an efficient regression fit. The cross-validation RMSE was 0.264, the minimum among the alternative prediction equations examined, such as those including the squared prediction terms. Among the prediction variables considered in the model, the most significant ones, based on the measured p-value, were the Fe and Al oxides although by a small margin. Τhe same can be concluded from the pair-wise variable regression fits of the Fe concentration in Alfisols shown in Table 4. The largest R2 value among all three pairs of prediction variables is 0.998 and is obtained when considering predictions from the two oxides (Fe and Al). This seems to contradict the primary hypothesis on the importance of the pH in the prediction of Fe concentration in Alfisols (Golia et al. 2019). However, since the differences in R2 are not significant given the size of the dataset, the pH value was also included as one of the prediction variables in the final multi-variable model.
The dependency to the Fe and Al oxides found in our experiments can be probably attributed to the adsorption or release of H+ and OH- and the resulting pH-dependent charge. The absorbed organic and inorganic ligands can modify the Fe and Al oxide surfaces and increase the availability of the H+ and OH- (Van Groeningen et al. 2020). Iron oxides possess large specific surface area and highly reactive surfaces since they have a nanoparticulate size usually ranging from 5 to 200 nm (Sparks 2005). Adsorption process in soils is competitive and involves the displacement of one specifically adsorbed anion by other more strongly adsorbed species (Caporale and Violante 2016). Pollutants in the form of ions often bind to oxides, as they form insoluble complexes with AI and Fe, such as phosphoric, boric, fluorinated, sulfuric, silicic, humic, and fulvic acids. It is therefore expected that there is a strong relationship between the concentration of Fe in the soil and its content of iron and aluminum oxides.
The average concentration of cadmium (Cd) in Alfisols was predicted from the variables of soil pH, the concentration of phosphorus P, and the percentage of soil organic matter (OM) (Fig. 2b). The following estimated equation relates all these variables to the concentration of Cd:
The corresponding R2 of this model is 0.936 and the RMSE as low as 0.109. Both objective metrics show a very good predictive model which can be used to obtain Cd concentration predictions with relatively small residual errors. P fertilizers are considered a major anthropogenic source of soil pollution with Cd, and the high correlation of cadmium contained in contaminated soils with the amount of soil P is well known (Alloway 2013; Huang et al. 2016). Also, adsorption/desorption reactions of metals are influenced by soil reaction (pH), cation exchange capacity (CEC), organic matter (OM), as well as pedogenic oxides (Van Groeningen et al. 2020).
The predictive models for the Fe and Cd concentrations in Inceptisols are presented in Fig. 3 a and b, respectively. Fe concentration depends upon the manganese and cobalt contents as well as on the concentration of the iron oxides (Fe oxides). In the 2-D plots, the simple regression fits (in red) are highly curvilinear, and they do not exhibit a good fit on the samples. These findings are also quantified through the R2 metric in all 2-D diagrams which range from as low as 0.46 up to a maximum of 0.78. The use of robust regression (green lines) seems to highly improve the predictive model performance since it compensates for the existence of the outliers (Vereecken et al. 2016). The values of the R2 for the robust fit exceed 0.9 in all three diagrams, indicating a much improved and efficient predictive model. We therefore choose to use robust fitting in the 3-D plots as well which are shown in Fig. 3a and summarized in Table 4. The final robust multiple regression equation obtained on all three prediction variables is shown below:
The estimated R2 of the robust multiple regression fit is 0.95, significantly higher than that of the corresponding simple regression fit which was measured 0.89. The interaction of iron cations along with manganese is the result of many scientific studies (Caporale and Violante 2016; Yutong et al. 2016; Artiningsih et al. 2020). The optimum Fe/Mn ratio often must be taken into account for apparently normal growth and maximum yield of plants and usually has a wide variation depending upon the actual levels of iron and manganese supply. In the present study, the common presence of cobalt also leads to a possible common geochemical origin of the three metals (Golia et al. 2019).
The 2-D predictive models shown in Fig. 3b show an extensive sample distribution exhibiting a rather weak correlation among the average cadmium concentration in Inceptisols and zinc and copper concentrations. The simple and robust quadratic regression lines closely matches and the corresponding R2 is very low, particularly when modeling the relation of Cd concentration with Cu. The equation obtained through the multiple quadratic regression analysis in this case is shown below:
and the corresponding R2 is 0.515.
In Endisols, the predictive model of the response variable of Fe is based on the prediction variables of sand, Na and K. In the 2-D plots (Fig. 4a), the robust regression fit (green line) corresponds to a significantly larger R2 values indicating a more efficient model (Vereecken et al. 2016; Shaheen and Iqbal 2018). Similar are the findings in the 3-D plots since the R2 values presented in Table 4 approach the maximum value of 1 for all prediction variable pairs. The final regression equation for the prediction of the Fe concentration from all three prediction variables has the following linear form:
with a corresponding R2 of 0.985. An increase in the percentage of sand along with sodium and potassium salts has been observed in saline soils of Larissa region (Golia et al. 2008, 2019). Large quantities of salts are observed in saline soils such as Endisols which negatively affect the crop productivity and the soil water extraction capacity of the plants, due to the high levels of ionic toxicity, high osmotic potential, and imbalance of the soil solution (Kabata-Pendias 2011). Furthermore, in alkaline soils, the solubility of iron (Fe) is largely reduced causing symptoms of Fe deficiency chlorosis in plants (Kobayashi and Nishizawa 2012).
The prediction of the mean Cd concentrations (Fig. 4b) is based upon the concentration of Zn and Cu and takes the following form:
with a relatively low R2 = 0.788. The common presence of copper, zinc, and cadmium is often observed in soils where anthropogenic activity is intense. The chemical form and speciation of an element define its fate and availability in soil solution (Sparks 2005; Alloway 2013; Bam et al. 2020).
In Vertisols, the dependency of Fe mean concentration from soil properties is shown in Fig. 5a. The 2-D and 3-D plots depict regression models with a high efficiency in the prediction of Fe content based on the corresponding R2 shown in Table 4. The single and multiple regression analysis performed shows that CEC is the most important prediction variable. In the same soil order, the corresponding 2-D and 3-D regression plots for the Cd are shown in Fig. 5b.1 and b.2, respectively. It can be observed that Cd content predictions can be obtained based on four independent variables (Clay, CEC, Zn, and Pb). Among them, the strongest predictor is found to be Pb, either as a single independent variable or as a pair with CEC, in which cases the coefficient of determination R2 easily exceeds 0.97. When applying robust multiple-variable quadratic regression using all the prediction variables, the estimated equations obtained for the average concentrations of Fe and Cd are the following:
The R2 was measured as high as 0.985 for the Fe and 0.924 for the Cd. Furthermore, many of the Cd predictors, including the quadratic terms of all the independent variables, have been dropped from the stepwise regression approach. It is worth mentioning that Vertisols in the study area have high clay percentage with increasing fertility. Cation exchange capacity in soils depend upon the organic matter content along with clay type and content as well (Caporale and Violante 2016). The small particle size fraction of soils (clays) may have high metal contents since they have large specific surface areas (Soil Survey Staff 1999), as well as high organic matter content (Kabata-Pendias 2011; Yutong et al. 2016). The higher the CEC, the greater the ability to retain heavy metals including Cd (Chen et al. 2016).
Overall, anthropogenic practices and pedogenic processes can result in the accumulation of metals in the soil surface layer particularly. Wuana and Okieimen (2011) presented the main sources of metals in surface soils expressed in a simple mass balance model. This model indicates that the total metal content in soils is the sum of the following six sources of metals: the mass released from the parent material, the atmospheric deposition, the amount of metals derived from the use of agrochemicals, fertilizers, the slurry to improve soil properties, as well as the inorganic pollutants. The amount of metals that are removed by the normal metabolic activity of plants and the amount of those that are removed by filtration and evaporation must be subtracted from the sum (Wuana and Okieimen 2011).
The above experimental outcome shows that soil physicochemical parameters along with the concentrations of coexisting and co-occurring metals seem to influence the possible pollution rate caused by PHM over time in the study area, and the established regression equations introduced through this research serve as an efficient prediction mechanism of the accumulated quantities.
Conclusions
Several factors have been assumed to influence the pseudo-total concentrations of metals in soils of different soil orders. These include soil reaction (pH values) and electrical conductivity, along with soil components such as organic matter, clay and sand content, exchangeable cations, and metals oxides. A robust variant of quadratic regression analysis based on the iteratively reweighted least squares robust regression approach was applied on the data collected in a 5-year-long field research in the Thessalian plain. The robust approach compensates the effect of the outliers and improves the performance of the prediction model in terms of the R2 evaluation metric considered. Most single- and multi-variable prediction formulae of the Fe and Cd concentrations introduced verify that several of the above-mentioned soil parameters along with coexisting metals can consistently serve as predictive factors of soil metals contamination. Finally, the soil taxonomy-influenced physicochemical characteristics were found to play a major role, since they affect not only the concentration of PHMs but also the independent variables that could predict it. Although the outcome of this research refers to the region of Thessaly in Greece, the proposed methodology could be easily extended to other regions and soil orders. Based on new data collections in the future, the proposed prediction equations could be verified, further optimized, and proved to be accurate enough to serve as a useful tool for PHM’s contamination monitoring.
Data availability
Data that support the findings of this study are available from the corresponding author upon reasonable request.
References
Alexakis DE, Kiskira K, Gamvroula D, Emmanouil C, Psomopoulos CS (2021) Evaluating toxic element contamination sources in groundwater bodies of two Mediterranean sites. Environ Sci Pollut Res Int. https://doi.org/10.1007/s11356-021-12957-z
Alloway BJ (2013) Heavy Metals in Soils. In: Alloway BJ (ed) Trace metals and metalloids in soils and their bioavailability, 3nd edn. Blackie Academic and Professional, London
Antibachi D, Kelepertzis E, Kelepertsis A (2012) Heavy metals in agricultural soils of the Mouriki-Thiva Area (Central Greece) and environmental impact implications. Soil Sediment contam: Inter J 21:4
Arkes J (2019) Regression analysis: a practical introduction, 1st edn. Routledge, London. https://doi.org/10.4324/9781351011099
Artiningsih A, Zubair H, Imran AM, Widodo S (2020) Distribution of manganese heavy metal (Mn) in soil around of Antang Landfill, Makassar City, Indonesia. IOP Conf Ser: Earth and Environ Sci 473:012088
Bam EKP, Akumah AM, Bansah S (2020) Geochemical and chemometric analysis of soils from a data scarce river catchment in West Africa. Environ Res Commun 2:035001
Bartkowiak A, Lemanowicz J, Lamparski R (2020) Assessment of selected heavy metals and enzyme activity in soils within the zone of influence of various tree species. Sci Rep 10:14077
Caporale AG, Violante A (2016) Chemical processes affecting the mobility of heavy metals and metalloids in soil environments. Curr Pollution Rep 2:15–27
Chatterjee S, Hadi AS (1988) Sensitivity analysis in linear regression. Wiley, New York, NY
Chen Y-M, Gao J, Yuana YO, Ma Y, Yu S (2016) Relationship between heavy metal contents and clay mineral properties in surface sediments: implications for metal pollution assessment. Cont Shelf Res 124:125–133
Diakoloukas VD, Digalakis VV (1999) Maximum-likelihood stochastic-transformation adaptation of hidden markov models. IEEE Speech Audio Process 7:2–187. https://doi.org/10.1109/89.748122
Fang A, Dong J, Zhang R (2019) Simulation of heavy metals migration in soil-wheat system of mining area. Int J Environ Res Public Health 16(14):2550
Farmaki EG, Thomaidis NS (2008) Current status of the metal pollution of the environment of Greece-A Review. Global NEST J 10(3):366–375
Golia EE, Tsiropoulos NG, Dimirkou A, Μitsios ΙΚ (2007) Distribution of heavy metals of agricultural soils of central Greece using the modified BCR sequential extraction method. Int J Environ Anal Chem 87:1053–1063
Golia EE, Dimirkou A, Mitsios IK (2008) Influence of some soil parameters on heavy metals accumulation by vegetables grown in agricultural soils of different soil orders. Bull Environ Contam Toxicol 81:80–84
Golia EE, Dimirkou A, Floras SA (2015) Spatial monitoring of arsenic and heavy metals in the Almyros area, Central Greece: statistical approach for assessing the sources of contamination. J Environ Monit Assess 187:399–412
Golia EE, Tsiropoulos GN, Füleky G, Floras S, Vleioras S (2019) Pollution assessment of potentially toxic elements in soils of different taxonomy orders in central Greece. Environ Monit Assess 191:106
Harrell FE (2001) Multivariable modeling strategies. In: Regression Modeling Strategies. Springer Series in Statistics. Springer, New York, NY
Holland PW, Welsch RE (1977) Robust regression using iteratively reweighted least-squares. Commun Stat Theory Methods 6(9):813–827. https://doi.org/10.1080/03610927708827533
Huang G, Su X, Rizwan MS, Zhu Y, Hu H (2016) Chemical immobilization of Pb, Cu, and Cd by phosphate materials and calcium carbonate in contaminated soils. Environ Sci Pollut Res 23:16845–16856
IAEA (2004) Soil sampling for environmental contaminants. International Atomic Energy Agency, Austria, p 81
ISO/DIS 11466 (1994) In Environment Soil Quality. ISO standards compendium, Switzerland
Jabeen F, Aslam A, Salman M (2020) Heavy metal contamination in vegetables and soil irrigated with sewage water and associated health risks assessment. J Environ Agric Sci 22(1):23–31
JAOAC (1984) Official method of analysis. In: Williams S (ed) , 14th edn. Association of official chemists, Inc, Virginia
Kabata-Pendias A (2011) Trace elements in soils and plants. 4th Ed., CRC Press, Taylor and Francis Group, Ann Arbor, MI.
Kanwar VS, Sharma A, Srivastav AL (2020) Phytoremediation of toxic metals present in soil and water environment: a critical review. Environ Sci Pollut Res 27:44835–44860
Kargoll B, Omidalizarandi M, Loth I, Paffenholz J-A, Alkhatib H (2018) An iteratively reweighted least-squares approach to adaptive robust adjustment of parameters in linear regression models with autoregressive and t-distributed deviations. J Geod 92(3):271–297
Kobayashi T, Nishizawa NK (2012) Iron uptake, translocation, and regulation in higher plants. Annu Rev Plant Biol 63:131–152
Kopittkea PM, Menziesa NW, Wang P, McKenna BA, Lombic E (2019) Soil and the intensification of agriculture for global food security. Environ Int 132:105078
Li P, Karunanidhi D, Subramani T (2021) Sources and consequences of groundwater contamination. Arch Environ Contam Toxicol 80:1–10
Lima AR, Cannon AJ, Hsieh WW (2013) Nonlinear regression in environmental sciences by support vector machines combined with evolutionary strategy. Comput Geosci 50:136–144
Liu J, Li Q, Gu W, Wang C (2019) The impact of consumption patterns on the generation of municipal solid waste in China: evidences from provincial data Int. J Environ Res Public Health 16:1717
Loizia P, Neofytou N, Zorpas AA (2019) The concept of circular economy strategy in food waste management for the optimization of energy production through anaerobic digestion. Environ Sci Pollut Res 26:14766–14773
MAFF (1988) Fertilizer recommendations. Reference Book 209. HMSO, London
McLean EO (1982) Soil pH and lime requirements. In: Page AL, Miller HR, Keeney RD (eds) Methods of soil analysis Part ΙΙ -Chemical and microbiological properties. American Society of Agronomy, Inc. Soil Sci of America, Inc., Madison, Wisconsin
Mitsios IK, Golia EE, Tsadilas CD (2005) Heavy metal concentration in soils and irrigation water in Thessaly area, Central Greece. Commun Soil Sci Plant Anal 36:487–501
Molina-Roco M, Escudey M, Antile´n M, Arancibia-Miranda N, Manquia´n-Cerda K (2018) Distribution of contaminant trace metals inadvertently provided by phosphorus fertilizers: movement, chemical fractions and mass balances in contrasting acidic soils. Environ Geochem Health 40:2491–2509
Morrissey MB, Ruxton GD (2018) Multiple regression is not multiple regressions: the meaning of multiple regression and the non-problem of collinearity. PTPBio 10. https://doi.org/10.3998/ptpbio.16039257.0010.003
Mustapha S, Voncir N, Usar S, Abdulhamid NA (2011) Status and distribution of some available micronutrients in the Haplic usterts of Akko Local Government Area, Gombe State, Nigeria. Int J Soil Sci 6(4):267–274
Navarro-Pedreño J, Almendro-Candel MB, Zorpas AA (2021) The increase of soil organic matter reduces global warming, Myth or Reality? Sci. 3(1):18. https://doi.org/10.3390/sci3010018
Nelson DW, Sommers LE (1982) Total carbon, Organic, Carbon, and Organic Matter. In: Page AL, Miller HR, Keeney RD (eds) Methods of Soil Analysis Part ΙΙ - Chemical and Microbiological Properties. American Society of Agronomy, Inc. Soil Science of America, Inc., Madison, Wisconsin
Ogbeh GO, Tsokar TO, Salifu E (2019) Optimization of nutrients requirements for bioremediation of spent-engine oil contaminated soils. Environ Eng Res 24(3):484–494
Page AL, Miller HR, Keeney RD (1982) Methods of soil analysis, Part ΙΙ—chemical and microbiological properties. Soil Science of America, Madison, Wisconsin
Rajmohan N, Prathapar SA, Jayaprakash M, Nagarajan R (2014) Vertical distribution of heavy metals in soil profile in a seasonally waterlogging agriculture field in Eastern Ganges Basin. J Environ Monitor Assess 186:5411–5427
Sall ML, Diaw AKD, Gningue-Sall D (2020) Toxic heavy metals: impact on the environment and human health, and treatment with conducting organic polymers, a review. Environ Sci Pollut Res 27:29927–29942
Shaheen A, Iqbal J (2018) Spatial distribution and mobility assessment of carcinogenic heavy metals in soil profiles using geostatistics and random forest, Boruta Algorithm. Sustainability 10:799
Soil Survey Staff (1999) Soil taxonomy, a basic system of soil classification for making and interpreting soil surveys, 2nd edn. USDA, Was DC
Sparks DL (2005) Toxic metals in the environment: the role of surfaces. Elements 1(4):193–197
Srigiri S, Madasu H, Vysetti B, Parth V (2014) Forecasting the distribution of heavy metals is soil and groundwater near municipal solid waste dumpsites using linear regression. Curr Sci 107(1):78–88
Srinivasarao C, Rama Gayatri C, Venkateswarlu B, Jakkula VS, Wani SP, Kundu S, Sahrawat KL, Rajasekhara Rao BK, Marimuthu S, Gopala Krishna G (2014) Heavy metals concentration in soils under rainfed agro-ecosystems and their relationship with soil properties and management practices. Int J Environ Sci Technol 11:1959–1972
Steffan JJ, Brevic EC, Burgess LC, Cerda A (2018) The effect of soil on human health: an overview. Eur J Soil Sci 69:159–171
Ten Broeke G, Van Voorn G, Ligtenberg A (2016) Which Sensitivity Analysis Method Should I Use for My Agent-Based Model? J Artif Soc Soc Simul 19(1):5
Tokalıoğlu S, Kartal Ş, Güneş AA (2001) Determination of heavy metals in soil extracts and plant tissues at around of a zinc smelter. Int J Environ Anal Chem 80(3):201–217
Van Groeningen N, Thomas Arrigo LK, Byrne JM, Kappler A, Christl I, Kretzschmar R (2020) Interactions of ferrous iron with clay mineral surfaces during sorption and subsequent oxidation. Environ Sci Process Impacts 22:1355
Vereecken Η, Schnepf Α, Hopmans JW, Young IM (2016) Modeling soil processes: review, key challenges, and new perspectives. Soil Sci Soc Am 15(5):1–57
Wang Y, Zhang Y, Zhang F, Yi J (2013) Robust quadratic regression and its application to energy-growth consumption problem. Math Probl Eng 12:334
Wang Χ, Li Χ, Ma R, Li Y, Wang W, Huang H, Xu C, An Y (2018) Quadratic discriminant analysis model for assessing the risk of cadmium pollution for paddy fields in a county in China. Environ Pollut 236:366–372
Wang S, Zhang C, Zhan Y, Zhang Y, Lou T, Xue F (2020) Evaluation of ecological risk of heavy metals in watershed soils in the Daxia River Basin. AIP Adv 10:055109
Wilcke W, Kretzschmar S, Bundt M, Saborío G, Zech W (2000) Depth distribution of aluminum and heavy metals in soils of Costa Rican coffee cultivation areas. J Plant Nutr Soil Sci 123:499–502
Wu Z, Chen Y, Han Y, Ke T, Liu Y (2020) Identifying the influencing factors controlling the spatial variation of heavy metals in suburban soil using spatial regression models. Sci Total Environ 717:137212
Wuana RA and Okieimen FE (2011) Heavy metals in contaminated soils: a review of sources, chemistry, risks and best available strategies for remediation. Intl Sch Res Notices, ID 402647. https://doi.org/10.5402/2011/402647, 2011, 1, 20
Xu Y, Shi H, Fei Y, Wang C, Mo L, Shu M (2021) Identification of soil heavy metal sources in a large-scale area affected by industry. Sustainability 13:511
Yutong Z, Qing X, Shenggao L (2016) Chemical fraction, leachability, and bioaccessibility of heavy metals in contaminated soils, Northeast China. Environ Sci Pollut Res 23:24107–24114
Zorpas ΑΑ (2020) Strategy development in the framework of waste management. Sci Total Environ 716:137088
Author information
Authors and Affiliations
Contributions
Evangelia E. Golia: study design, analytical protocol design, writing, editing, and reviewing
Vassilios Diakoloukas: method design, data processing, writing, editing, and reviewing
Corresponding author
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
Not applicable.
Consent for publication
All the authors approved the final manuscript and agreed to its submission to the Environmental Science and Pollution Research.
Competing interests
The authors declare no competing interests.
Additional information
Responsible Editor: Kitae Baek
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Golia, E.E., Diakoloukas, V. Soil parameters affecting the levels of potentially harmful metals in Thessaly area, Greece: a robust quadratic regression approach of soil pollution prediction. Environ Sci Pollut Res 29, 29544–29561 (2022). https://doi.org/10.1007/s11356-021-14673-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11356-021-14673-0