Screening Risk Assessment of Agricultural Areas under a High Level of Anthropopressure Based on Chemical Indexes and VIS-NIR Spectroscopy

Intensive anthropogenic activity may result in uncontrolled release of various pollutants that ultimately accumulate in soils and may adversely affect ecosystems and human health. Hazard screening, prioritisation and subsequent risk assessment are usually performed on a chemical-by-chemical basis and need expensive and time-consuming methods. Therefore, there is a need to look for fast and reliable methods of risk assessment and contamination prediction in soils. One promising technique in this regard is visible and near infrared (VIS-NIR) spectroscopy. The aim of the study was to evaluate potential environmental risk in soils subjected to high level of anthropopressure using VIS-NIR spectroscopy and to calculate several risk indexes for both individual polycyclic aromatic hydrocarbons (PAHs) and their mixture. Results showed that regarding 16PAH concentration, 78% of soil samples were contaminated. Risk assessment using the most conservative approach based on hazard quotients (HQ) for 10 individual PAHs allowed to conclude that 62% of the study area needs further action. Application of concentration addition or response addition models for 16PAHs mixture gave a more realistic assessment and indicates unacceptable risk in 23% and 55% of soils according to toxic units (TUm) and toxic pressure (TPm) approach. Toxic equivalency quotients (TEQ) were below the safe limit for human health protection in 88% of samples from study region. We present here the first attempt at predicting risk indexes using VIS-NIR spectroscopy. The best results were obtained with binary models. The accuracy of binary model can be ordered as follows: TPm (71.6%) < HI (85.1%) < TUm (87.9%) and TEQ (94.6%). Both chemical indexes and VIS-NIR can be successfully applied for first-tier risk assessment.


Introduction
Anthropogenic activities such as industrialization, urbanization, mining and agriculture intensification can generate serious soil contamination problems worldwide. Soil pollution has been identified as the third most important threat to soil functions in Europe [1] and raises serious concerns about the risks to human and ecosystem health. The presence of both inorganic and organic pollutants may strongly affect different soil functions. One group of contaminants of special concern are polycyclic aromatic hydrocarbons (PAHs), semi-volatile, stable, hydrophobic compounds that are ubiquitous in the environment and are good markers of anthropogenic pressure [2][3][4]. Anthropogenic activities including biomass combustion, domestic and industrial coal combustion, coking industry, and emissions from road vehicles are major sources of PAH contamination; all these sources represent continuous inputs of PAH contaminants [5][6][7][8]. After deposition on soil surfaces, PAHs may further Sand-fraction of 2.0-0.05 mm; silt-fraction of 0.05-0.002 mm; clay-fraction < 0.002 mm; C org -total organic carbon content; TN-total nitrogen content; Min-minimum value; Max-maximum value; LQ-lower quartile; UQ-upper quartile; CoV-coefficient of variation (%); ΣPAH Carcin -concentration of carcinogenic PAHs (BaA, Ch, BbF, BkF, BaPyr, IndPyr, DahA); ΣPAH 4em -concentration of PAHs (BbF, BkF, BaPyr, IndPyr) derived from the emission.
The content of TN and C org was in the range of 0.2-11.7 g kg −1 (CoV 112%) and 5.2-187.2 g kg −1 (CoV 142%), respectively. However, in 75% of soils, TN did not exceed 1.4 g kg −1 , and C org was below 17.6 g kg −1 , which corresponds to the lower range according to the European Union criteria [33]. The low content of organic matter and clay promotes the increase in contaminant availability, which may result in disturbance of soil functions and create risk to soil organisms [11].
The high molecular weight hydrocarbons (HMW; ≥4 rings) dominated the PAHs profile in the region (Table 1) and ranged from 180 to 270 × 10 3 µg kg −1 . 54% of the HMW pool were compounds considered to be carcinogenic (ΣPAH Carcin ), and 30% four hydrocarbons (BbF, BkF, BaPyr, IndPyr), that are monitored within the European Monitoring and Evaluation Programme of Air Emissions [47]. The annual emission of ΣPAH 4em in the study area is estimated at 1.75 kg per km 2 , and the dust emission (another anthropogenic index) at 6827 kg per km 2 [4]. The predominance of HMW is typical in areas of high anthropogenic pressure [2,4,6,7,42] and may be due to the higher persistence of these compounds in soils, a dominance of combustion over petrogenic sources, and the tendency of HMW PAHs to accumulate in soils that are close to emission sources [2,4,5,10,36,44].
The PAHs isomer ratio of individual compounds is a common tool to identify sources of these hydrocarbons [4,7,10,37]. Three of the most stable molecular ratios were used in this study ( Table 1). The Fln/(Fln + Pyr) ratio ranged between 0.41 to 0.67 and pointed to coal and biomass combustion in almost 99% of samples. Similarly, the BaA/(BaA + Ch) ratios above 0.35 indicated combustion of vegetation and fossil fuel as a PAH source. This was also confirmed by the IndPyr/(IndPyr + BPer) ratio > 0.5, however, in 36% of samples this ratio ranged from 0.2 to 0.5, which is more typical for liquid fossil fuel combustion (vehicle and crude oil) [4,37].

Evaluation of Potential Ecological Risk-Chemical Indexes
Potential ecological risk was firstly assessed using hazard quotient values calculated for 10 individual PAHs included in Polish soil regulation [48]. According to Polish guidelines, the acceptable maximum permissible concentration (MPC) for Anth, Ch, IndPyr and BPer is 200 µg kg −1 , while for Napht, BaA, BbF, BkF, BaPyr and DahA, the MPC is 100 µg kg −1 . HQ values of individual PAH compound > 1 indicate a possible ecological risk [11,17,49]. Average HQ value of a single chemical (expressed as median) varied significantly depending on PAH compound from 0.06 for Anth to 1.32 for BbF (Table 2), what allows us to conclude that according to individual PAH, 7 to 62% of the study area might be under ecological risk (HQ above 1), and needs further actions (e.g., site-specific evaluation or remediation).
To reflect the potential risk caused by PAHs mixture the hazard index (HI) was additionally calculated, and to assess the degree of potential risk the approach of Moreno-Jiménez et al. [17] was used. As a threshold value for HI index above which the possible risk occurs-the value of 10 was assigned [11,14]. Obtained HI values ranged between 1.56 and 1523 (Table 2) but exceeded the threshold for HI only in 23% of soil samples from the study region. The moderate risk (HI = 10-100) and high risk (HI > 100) were noted in 13 and 4 soil samples, respectively.
Many authors have emphasized that the HQ-based methodology, although frequently used for screening risk assessment [17,49], as a very conservative approach, which may lead to risk overestimation [3,9]. Furthermore, in our study HQ/HI indexes were calculated only for 10PAH compounds to thus better predict ecological risk in the study region, other approaches were applied and toxic units and toxic pressure were also calculated. The model of toxic units is frequently used in ecotoxicology and represents the ratio between the concentration of a component in the mixture and its toxicological endpoint, e.g., effect concentration (EC 50 ) or non-observed effect concentration (NOEC) [3,[49][50][51]. Toxic units of the mixture (TUm) is calculated based on the concentration addition model (CA), which is based on the assumption that the mixture components possess a similar toxicological mode or mechanism of action, and in the case of PAHs it is caused by narcosis [50,51]. In our study, the measured soil PAH concentrations were compared to PNEC values for 16PAH compounds (Table 3). PNECs were adapted from the EU risk assessment report for coal-tar pitch [3,52], which were derived by applying an assessment factor to the lowest available NOEC values for the most sensitive terrestrial species, e.g., Folsomia candida, Folsomia fimetaria, Lactuca sativa or nitrifying bacteria (depending on the PAH compound). Although such deterministic approach based on only one species is rather conservative, it is a good protective measure and it is suggested for risk assessment of chemical mixtures in a first-tier of ERA process [51,52]. The TU values calculated for each individual PAH and TUm for 16PAHs can be found in Table 3.
Toxic units higher than 1 indicate that unacceptable effects on soil organisms are likely to occur. The TU for individual hydrocarbons varied significantly (CoV from 325 to 465%) and on average was 0.12 for Flu up to 20.5 for BaPyr. Generally, TU above 1 was noted mainly for higher molecular PAH compounds, an unacceptable effect on soil organisms may occur in 22-23% of soil samples contaminated with IndPyr, BPer and BbF, and in 43 and 66% of soil samples due to contamination by BaA and BaPyr, respectively. For other PAHs, the risk predicted with TU was acceptable (TU < 1) in 88-96% of soil samples ( Table 3). The toxic units calculated for the 16PAHs mixture (TUm) ranged from 1.2 to 1858 with a median value of 5.7, which indicated that the exposure to the 16PAHs mixture may reveal the risk to soil organisms in 100% of soil samples (Table 3). For the rating of the TUm values and to differentiate the magnitude of risk to soil organisms, the classification system elaborated by Persoone et al. [53] was adapted, according to which soil samples with TUm < 1 reveal no toxicity, when TUm is between 1-10, samples are considered as lowly toxic, while samples with TUm above 10 and 100 could be classified as highly toxic and very highly toxic, respectively. It was found that TUm values for the majority (77%) of soil samples ranged between 1.26 to 9.56 and could be considered as lowly toxic; the highest TUms above 100 were only in four sampling points (5.4% of the study area) located in the surroundings of the main contamination sources-coking plant and coal mining landfill [11]. The third approach used to assess potential ecological risk to the environment was based on toxic pressure (TP) calculation for both individual compounds, as well as for the 16PAHs mixture ( Figure 1, Table 4). TUm-toxic units calculated for the mixture of 16PAHs; Min-minimum value; Max-maximum value; LQ-lower quartile; UQ-upper quartile; CoV-coefficient of variation (%); PNEC-predicted no-effect concentrations [52].
Toxic units higher than 1 indicate that unacceptable effects on soil organisms are likely to occur. The TU for individual hydrocarbons varied significantly (CoV from 325 to 465%) and on average was 0.12 for Flu up to 20.5 for BaPyr. Generally, TU above 1 was noted mainly for higher molecular PAH compounds, an unacceptable effect on soil organisms may occur in 22-23% of soil samples contaminated with IndPyr, BPer and BbF, and in 43 and 66% of soil samples due to contamination by BaA and BaPyr, respectively. For other PAHs, the risk predicted with TU was acceptable (TU < 1) in 88-96% of soil samples ( Table 3). The toxic units calculated for the 16PAHs mixture (TUm) ranged from 1.2 to 1858 with a median value of 5.7, which indicated that the exposure to the 16PAHs mixture may reveal the risk to soil organisms in 100% of soil samples (Table 3). For the rating of the TUm values and to differentiate the magnitude of risk to soil organisms, the classification system elaborated by Persoone et al. [53] was adapted, according to which soil samples with TUm < 1 reveal no toxicity, when TUm is between 1-10, samples are considered as lowly toxic, while samples with TUm above 10 and 100 could be classified as highly toxic and very highly toxic, respectively. It was found that TUm values for the majority (77%) of soil samples ranged between 1.26 to 9.56 and could be considered as lowly toxic; the highest TUms above 100 were only in four sampling points (5.4% of the study area) located in the surroundings of the main contamination sources-coking plant and coal mining landfill [11].
The third approach used to assess potential ecological risk to the environment was based on toxic pressure (TP) calculation for both individual compounds, as well as for the 16 PAHs mixture ( Figure 1, Table 4).   The response addition model suggested by Jensen et al. [13] was applied for TP of mixture determination. This model assumes dissimilar modes of action for all compounds [13], although, the same mode of action is normally considered for chemicals like PAHs [3]. TP for single chemicals were calculated using the soil screening values as the effect level. Since according to Polish soil guidelines, screening values are available only for 10PAHs, to calculate toxic pressure for the 16PAHs, the maximum permissible concentrations (MPC, Table 4) from the Dutch regulations were adopted [3,54]. MPC values reflect PAH concentrations that should protect all species in the ecosystem from adverse effects. In our study, TPs obtained for individual hydrocarbons were rather low, and ranged on average from 0.03 for Flu to 0.26 for BaPyr ( Figure 1). To assess the magnitude of risk that pollutants may pose to the soil organisms, TP classification according to Jensen et al. [13] and Dagnino et al. [15] were applied. Considering the TP of single chemicals, the majority of samples (66-96%) have TP values below the lowest threshold 0.25 (Table 4), which indicates no risk in this area.
The highest TPs were noted for higher molecular weight (>4 ring) PAH, and can be ordered as follows: BaPyr > BaA > BbF > IndPyr > BPer > Pyr > DahA = Ch = BkF. Completely different results were obtained when TPs were calculated for the 16PAH mixture, TPm ranged from 0.0 to 1.0 with an average of 0.53 ( Figure 1). The lowest TPm < 0.25 (no risk) was found in only 34% of soil samples ( Table 4, Figure 2), in 11% of samples TPm ranged 0.34-0.50 (low risk), in 20% of samples between 0.51 and 0.74 (moderate risk), and as much as 35% of soil samples had TPm value above 0.75 (high risk). Extremely high TPm values (above 0.9) were observed in 16 soil samples from the study region. However, it should be remembered that TP index in our study reflects the total PAHs concentration and is based on the conservative assumption that 100% of measured contaminant concentration is bioavailable [9,11,13]. On the other hand, it is considered that such conservative approach (highly protective for soil organisms) should be used in a generic first tier risk assessment [3,51]. Table 4. Percentage share of soil samples in individual classes of toxic pressure (TP); TP classification according to Dagnino et al. [15]; MPC-maximum permissible concentration in µg kg −1 according to Verbruggen [54]. The response addition model suggested by Jensen et al. [13] was applied for TP of mixture determination. This model assumes dissimilar modes of action for all compounds [13], although, the same mode of action is normally considered for chemicals like PAHs [3]. TP for single chemicals were calculated using the soil screening values as the effect level. Since according to Polish soil guidelines, screening values are available only for 10 PAHs, to calculate toxic pressure for the 16 PAHs, the maximum permissible concentrations (MPC, Table 4) from the Dutch regulations were adopted [3,54]. MPC values reflect PAH concentrations that should protect all species in the ecosystem from adverse effects. In our study, TPs obtained for individual hydrocarbons were rather low, and ranged on average from 0.03 for Flu to 0.26 for BaPyr ( Figure 1). To assess the magnitude of risk that pollutants may pose to the soil organisms, TP classification according to Jensen et al. [13] and Dagnino et al. [15] were applied. Considering the TP of single chemicals, the majority of samples (66-96%) have TP values below the lowest threshold 0.25 (Table 4), which indicates no risk in this area. The highest TPs were noted for higher molecular weight (>4 ring) PAH, and can be ordered as follows: BaPyr > BaA > BbF > IndPyr > BPer > Pyr > DahA = Ch = BkF. Completely different results were obtained when TPs were calculated for the 16PAH mixture, TPm ranged from 0.0 to 1.0 with an average of 0.53 ( Figure 1). The lowest TPm < 0.25 (no risk) was found in only 34% of soil samples ( Table 4, Figure 2), in 11% of samples TPm ranged 0.34-0.50 (low risk), in 20% of samples between 0.51 and 0.74 (moderate risk), and as much as 35% of soil samples had TPm value above 0.75 (high risk). Extremely high TPm values (above 0.9) were observed in 16 soil samples from the study region. However, it should be remembered that TP index in our study reflects the total PAHs concentration and is based on the conservative assumption that 100% of measured contaminant concentration is bioavailable [9,11,13]. On the other hand, it is considered that such conservative approach (highly protective for soil organisms) should be used in a generic first tier risk assessment [3,51].

Evaluation of Potential Human Health Risk-Chemical Indexes
Toxic equivalency factors provided by Nisbet and LaGoy [55] were used in our study to estimate the exposure risks posed by individual compounds and mixture of PAHs to human health. The toxic potency of each PAH (TEQ) was assessed on the basis of its benzo(a)pyrene equivalent concentration [3,7,37,55,56] and for 16PAHs ranged from 21 to 52 × 10 3 µg kg −1 with a median value of 151 µg kg −1 (Table 5). Our results were comparable to toxic PAH concentrations reported for industrial and residential regions in Poland [8,44], Spain [35] and Italy [37], as well as for rural regions under high anthropogenic pressure in China [7,42]. Results highlighted a significant variation of TEQs values (CoV up to 465%) through the study region ( Figure 3). For the majority of samples TEQs ranged from 21 to 584 µg kg −1 and were below the safe value of 600 µg kg −1 established as a limit for protection of human health (regardless of the land use) in Canadian soil quality guidelines [3,57], which indicated a generally low risk in about 88% of samples from Czerwionka region. To better evaluate our results, we compared TEQ values with other guidelines values available for BaPyr [3,57,58]. All TEQ in our study exceeded the most conservative US EPA soil screening level of BaPyr for residential areas (15 µg kg −1 [3]). About 38% had the TEQ below 100 µg kg −1 , and 31% of samples < 200 µg kg −1 , which corresponded to values for residential areas in Italy and Spain [3,58], respectively. The highest toxicity value of PAHs (TEQ above Danish cut-off criteria 1000 µg kg −1 [3]) in soils was concentrated in the central and north part of the Czerwionka region (Figure 3) in the immediate vicinity of the main contamination sources (coke plant, asphalt production plant, post-mining landfill and coal recovery plant).
µg kg −1 [3]).About 38% had the TEQ below 100 µg kg −1 , and 31% of samples < 200 µg kg −1 , which corresponded to values for residential areas in Italy and Spain [3,58], respectively. The highest toxicity value of PAHs (TEQ above Danish cut-off criteria 1000 µg kg −1 [3]) in soils was concentrated in the central and north part of the Czerwionka region (Figure 3) in the immediate vicinity of the main contamination sources (coke plant, asphalt production plant, post-mining landfill and coal recovery plant). It is worth emphasizing that only in one sampling point TEQ exceeded 2000 µg kg −1 corresponding to the Finnish guideline value [58] which considers a target total lifetime cancer risk (TLCR) of 10 −5 , and only in three samples TEQ was above 10,000 µg kg −1 (Italian screening value for commercial and residential areas, [3]).

Application of VIS-NIR for Potential Ecological and Human Risk Prediction
Until now, VIS-NIR spectroscopy was used for total and bioavailable PAHs, alkanes and heavy metals determination [27], alkanes and PAHs in oil-contaminated soil [28] or the bioremediation of soils polluted with fuel oil [31]. In our study, this technique was used for diagnostic risk screening of long-term PAH contaminated soils, and to predict their toxicity based on various risk indexes.
The obtained spectra ( Figure 4) are typical for mineral soils with a high sand content (loamy sand) [59] but are relatively flat with no clear features. This is the result of sample drying and sieving but also of the relatively low Corg content [20,32] of most samples (upper quartile for Corg 17.6 g kg −1 ). It is worth emphasizing that only in one sampling point TEQ exceeded 2000 µg kg −1 corresponding to the Finnish guideline value [58] which considers a target total lifetime cancer risk (TLCR) of 10 −5 , and only in three samples TEQ was above 10,000 µg kg −1 (Italian screening value for commercial and residential areas, [3]).

Application of VIS-NIR for Potential Ecological and Human Risk Prediction
Until now, VIS-NIR spectroscopy was used for total and bioavailable PAHs, alkanes and heavy metals determination [27], alkanes and PAHs in oil-contaminated soil [28] or the bioremediation of soils polluted with fuel oil [31]. In our study, this technique was used for diagnostic risk screening of long-term PAH contaminated soils, and to predict their toxicity based on various risk indexes.
The obtained spectra ( Figure 4) are typical for mineral soils with a high sand content (loamy sand) [59] but are relatively flat with no clear features. This is the result of sample drying and sieving but also of the relatively low C org content [20,32] of most samples (upper quartile for C org 17.6 g kg −1 ). The differences between samples are mostly noticeable in baseline and peak intensities. The reflectance is usually increased with a lower SOM content [60]. In the present study, C org is probably not the only factor governing the reflectance intensities since the mean C org content from class 1 (200-600 µg kg −1 ) to class 5 (>10,000 µg kg −1 ) are 9.56; 9.81; 22.96; 17.33, and 87.94 g kg −1 , respectively. The absorption features at 1400 and 1900 nm are due to water (OH bonds) and the absorption feature at 2210 nm is due to hygroscopic water in clay minerals [61]. Overall, there was a reflectance decrease with the increase of PAHs content. This was also observed by Bray et al. [12] and Okparanma et al. [16,26], and could partly be explained by the correlation between soil organic carbon and 16PAHs.
To investigate further the structure hidden within the soil spectra, a principal component analysis (PCA) was undertaken. The score plot is presented in Figure 5. Most of the spectral variations (99%) are described by the first component (PC-1). Samples are distributed along the PC-1 axis. Two cluster are well separated (low PAH content (class 1) and high PAH content (class 5). Variations of C org content and soil texture in the other classes were too great to allow for clustering according to PAH content. Bray et al. [12] on suburban soil samples have noticed that samples were roughly distributed along PC-1 axis. That behaviour could be related to soil properties (e.g., C org content). not the only factor governing the reflectance intensities since the mean Corg content from class 1 (200-600 µg kg −1 ) to class 5 (>10,000 µg kg −1 ) are 9.56; 9.81; 22.96; 17.33, and 87.94 g kg −1 , respectively. The absorption features at 1400 and 1900 nm are due to water (OH bonds) and the absorption feature at 2210 nm is due to hygroscopic water in clay minerals [61]. Overall, there was a reflectance decrease with the increase of PAHs content. This was also observed by Bray et al. [12] and Okparanma et al. [16,26], and could partly be explained by the correlation between soil organic carbon and 16PAHs.
To investigate further the structure hidden within the soil spectra, a principal component analysis (PCA) was undertaken. The score plot is presented in Figure 5. Most of the spectral variations (99%) are described by the first component (PC-1). Samples are distributed along the PC-1 axis. Two cluster are well separated (low PAH content (class 1) and high PAH content (class 5). Variations of Corg content and soil texture in the other classes were too great to allow for clustering according to PAH content. Bray et al. [12] on suburban soil samples have noticed that samples were roughly distributed along PC-1 axis. That behaviour could be related to soil properties (e.g., Corg content).
To investigate further the structure hidden within the soil spectra, a principal component analysis (PCA) was undertaken. The score plot is presented in Figure 5. Most of the spectral variations (99%) are described by the first component (PC-1). Samples are distributed along the PC-1 axis. Two cluster are well separated (low PAH content (class 1) and high PAH content (class 5). Variations of Corg content and soil texture in the other classes were too great to allow for clustering according to PAH content. Bray et al. [12] on suburban soil samples have noticed that samples were roughly distributed along PC-1 axis. That behaviour could be related to soil properties (e.g., Corg content).
One sample (down the PC-2 axis) seems to be an outlier, but was not removed from the models since preliminary investigations have shown that it did not have any significant effect on the modelling. That sample was an organic soil (muck) while the other samples were mineral soils. The green triangles ( Figure 5) are soils from the 16PAH 1000-5000 µg kg −1 class. Their spread is probably due to the fact that they present most of the soil texture variation and texture affects soil reflectance [62], but also the most C org variation.
The texture heterogeneity leads to more spectral variation related to diffuse reflected light inside the class than the other PAH classes. No features in the 1600-1800 nm related to hydrocarbon contamination [16,26] were noticeable in the raw spectra probably due to the fact that samples were sieved and scanned dry. However, even if most features are not discernible to the naked eye, chemometric techniques (e.g., PLS) are able to extract important information from a spectral database.
The PLS regression models of individual PAHs were poorly defined with r 2 < 0.5 and high root mean square errors and are not reported here. Models with MA pre-processing were better than with other pre-processing methods and are the only ones presented below. The quantitative predictions of individual PAHs and 16PAHs were not satisfactory probably due to the low representation of extreme values and to the limited number of samples. The non-normal distribution of PAH contents is also a factor affecting the predictions. To our knowledge, only one paper predicted an individual PAH (BaPyr) concentration with success [12] on 65 urban soils from Australia. However, their BaPyr content was much higher than in our dataset (74 samples). Phenanthrene, as an individual PAH was also predicted in another study but on artificially contaminated soil samples with a very large amount of diesel oil [29]. The lack of robust prediction for individual PAH is probably due to an insufficient number of samples representing each class of contamination and to a relatively low PAHs content (except for a few samples). Despite good correlation with C org (r 2 = 0.81 for the five composite samples from Figure 5), PAH contents were not possible to predict. Nevertheless, the grouping of samples along the PC-1 component leads us to believe in the possibility to classify samples according to the different risk indexes presented in the above sections. Figure 6 presents a plot of PLS regression coefficients obtained by the PLS regression of total 16PAH. That plot allows identifying important variables (wavelengths) of the model construction.
the class than the other PAH classes. No features in the 1600-1800 nm related to hydrocarbon contamination [16,26] were noticeable in the raw spectra probably due to the fact that samples were sieved and scanned dry. However, even if most features are not discernible to the naked eye, chemometric techniques (e.g., PLS) are able to extract important information from a spectral database.
The PLS regression models of individual PAHs were poorly defined with r 2 < 0.5 and high root mean square errors and are not reported here. Models with MA pre-processing were better than with other pre-processing methods and are the only ones presented below. The quantitative predictions of individual PAHs and 16PAHs were not satisfactory probably due to the low representation of extreme values and to the limited number of samples. The non-normal distribution of PAH contents is also a factor affecting the predictions. To our knowledge, only one paper predicted an individual PAH (BaPyr) concentration with success [12] on 65 urban soils from Australia. However, their BaPyr content was much higher than in our dataset (74 samples). Phenanthrene, as an individual PAH was also predicted in another study but on artificially contaminated soil samples with a very large amount of diesel oil [29]. The lack of robust prediction for individual PAH is probably due to an insufficient number of samples representing each class of contamination and to a relatively low PAHs content (except for a few samples). Despite good correlation with Corg (r 2 = 0.81 for the five composite samples from Figure 5), PAH contents were not possible to predict. Nevertheless, the grouping of samples along the PC-1 component leads us to believe in the possibility to classify samples according to the different risk indexes presented in the above sections. Figure 6 presents a plot of PLS regression coefficients obtained by the PLS regression of total 16PAH. That plot allows identifying important variables (wavelengths) of the model construction. Assignment to specific compounds here is difficult too. No distinctive peaks were identified as important but rather regions (thick black line). Those regions are mostly related to organic matter (OM) and clay minerals. The first region (560-804 nm) is related to chromophores of OM and iron oxides [20,63]. The second region (1517-1898 nm), which is the first overtone region is mostly related to OM and contains the absorption bands by several bonds (CH, CH2, CH3, and Ar-CH aromatic ring [64]). Moreover, Okparanma et al. [16] identified a part of that region (1600-1800 nm) as related to Assignment to specific compounds here is difficult too. No distinctive peaks were identified as important but rather regions (thick black line). Those regions are mostly related to organic matter (OM) and clay minerals. The first region (560-804 nm) is related to chromophores of OM and iron oxides [20,63]. The second region (1517-1898 nm), which is the first overtone region is mostly related to OM and contains the absorption bands by several bonds (CH, CH 2 , CH 3 , and Ar-CH aromatic ring [64]). Moreover, Okparanma et al. [16] identified a part of that region (1600-1800 nm) as related to hydrocarbon contamination. That important variables region is discontinued before the 1900 nm absorption peak related to water. The next region (1983-2046 nm) starts just after that peak and is related to N-H bonds and aromatic amines [64]. The last region (2188-2236 nm) is probably related to the 2210 nm absorption band from hygroscopic water and clay mineral [61].
We, therefore, conclude that predictions of risk indexes (especially binary models) by VIS-NIR spectroscopy are possible because of the PAH relationship with OM but also with clay and perhaps iron oxides (see Figure 6). Since individual PAHs were poorly predicted, we decided to run the PLS algorithm on the index classes to use the method as a fast screening tool to predict environmental risk. Classes (e.g., 1 to 4 for TUm) were predicted instead of actual index values. Table 6 summarises the prediction of the classification results.
Prediction of Σ16PAHs classes was accurate in only 58.1% of the samples ( Table 6). The model could not correctly classify samples from the 200-600 µg kg −1 class (non-contaminated samples). We, therefore, divided the dataset into two classes (threshold-600 µg kg −1 ) to test if it was possible to predict if a sample was contaminated or not. In that case, prediction accuracy was 79.7% (59 correctly classified samples out of 74). Still, the model had difficulties in detecting samples of the non-contaminated class (11 cases were predicted as contaminated but being non-contaminated).
Besides, the prediction of the soil PAH concentrations, the VIS-NIR method was also used to predict environmental risk. The prediction of hazard index (HI) classes was better with 83.8% accuracy. The model very accurately predicted samples of low impact (1 < HI < 10), where HI index did not exceed threshold value for acceptable risk. Out of the 12 samples wrongly classified, four were classified as class 2 (HI = 10-100, moderate risk) instead of class 1 (HI = 1-10, low risk), six samples were classified as class 1 instead of class 2, and two samples classified as class 2 instead of class 3 (HI > 100, high risk). When considering for a binary prediction only HI index above which the possible risk occurs (HI > 10), the accuracy was 86.5% with eight samples classified as class 2 (HI > 10) instead of class 1 (HI < 10) and only two samples the other way around.
The classification of toxic units for 16PAH mixture (TUm) according to Persoone et al. [53] has provided satisfactory results with an accuracy of 83.8%. When predicting if a sample presented toxicity (TUm > 10) with a binary prediction, the accuracy was 87.9% with no over prediction of one class over the second. This is in agreement with Bray et al.'s [12] accuracy of 90.2% for the prediction of Σ16PAHs > 10 mg kg −1 using ordinal logistic regression.
The TP class predictions were unsatisfactory, with only 33.8% accuracy classifying the samples into the four TP classes [15]. The best predicted class was the low risk class (0.25 < TP < 0.50) with 75% accuracy. Nevertheless, there was rarely more than one class error difference (e.g., class "no risk" instead of class "moderate risk"). A binary prediction with a threshold of 0.5 was completed with an aim at detecting if there is a risk or not [15], and the obtained prediction risk accuracy was 71.6%. It needs to be highlighted that more investigations are needed to develop more accurate models of TP classification when using the 4 classes by including more samples of each class.
The prediction of TEQ classes were made separately for each threshold according to Carlon et al. [58] and Cachada et al. [3]. There were no samples of TEQ < 15 µg kg −1 , therefore seven independent models were produced (Table 6). There was an increase in prediction accuracy with increase of the TEQ threshold value. Bray et al. [12] obtained an opposite pattern, with lower accuracy while predicting higher threshold values (but for Σ16PAHs; 10, 20, and 40 mg kg −1 ). In our case, this is related to the fact that there were fewer samples with higher contamination values, and thus higher TEQs. For a TEQ threshold of 600 µg kg −1 , which is the limit Canadian value for the protection of human health [57], the binary model (if a sample has a TEQ higher or lower than 600 µg kg −1 ) accuracy is 94.6%.
There was overestimation in the predictions of 16PAHs and HI index but overall, our results are promising and the method could be of interest to use a fast screening tool to decide whether or not more investigations are needed in a specific area. This is the first time that VIS-NIR was used to predict HI, TUm, TP, and TEQ indexes in a view to allow a fast screening risk assessment of a contaminated area.

Study Area Characterization
The area considered in this study covers a territory of 115 km 2 located in the south-western part of the Silesian Province in Poland (Rybnik district, Czerwionka municipality). This region is rich in natural resources, including hard coal, rock salt, and construction sand. The research area combines typical urban, industrial and post-industrial character districts, as well as forests and agricultural areas of high natural and recreational value. The land usage structure is dominated by forest areas (42%) and arable lands (34%). The soil cover is represented by Cambisols and Luvisols with small contributions of Phaeozems, Fluvisols and Histosols. Soil contamination in this region is mainly the result of fossil fuel combustion, waste recovery, road transport, previous coal mining activity and long-term (for over 100 years) coke production. A detailed description of the research area can be found in Klimkowicz-Pawlas et al. [11].
The soil samples (n = 74) were collected from the surface layer (0-30 cm) of agricultural land (mainly arable fields-75%) in two sampling campaigns after the vegetation season and analysed for physical, chemical and spectral properties, as well as for the content of selected contaminants (PAHs). Geographical locations of sampling points were preliminary based on maps (1:25,000 scale; database of the Institute of Soil Science and Plant Cultivation in Puławy) and identified by a GPS technique. The sampling sites reflected direct exposure to local and transboundary pollution emission sources and variability of soil and hydrological conditions. Usually, the sampling density was higher in area of intensive anthropogenic activity and lower in recreational/forested areas. The soil material was air-dried at 20 • C, well mixed, sieved to pass a 2 mm sieve-mesh and stored in the dark at a temperature of 16-18 • C before chemical and spectral analyses.

Soil Analysis
Physicochemical characteristics of soils included particle size distribution, pH, total organic carbon and total nitrogen content, as well as PAH concentrations. Particle size distribution was measured by the laser diffraction method, using a Mastersizer 2000 apparatus with a Hydro UM attachment (Malvern Panalytical, Malvern, UK,) [65]. Soil reaction was determined potentiometrically in a suspension of 1 mol L −1 KCl (1:2.5 (m V −1 )). Total organic carbon (C org ) content was measured by sulfochromic oxidation followed by titration of the excess K 2 Cr 2 O 7 with FeSO 4 (NH 4 [4,10,34]. Soil samples (grain size ≤ 0.1 mm) were spiked with a mixture of five deuterated PAH compounds, extracted with dichloromethane in an Accelerated Solvent Extractor (ASE 200, Dionex Co., Sunnyvale, CA, USA) and analysed by gas chromatography with MS detection using Agilent GC-MS system (Agilent Technologies, Santa Clara, CA, USA). The quality control procedure included the analysis of certified reference material (CRM 131), laboratory control samples and solvent blanks. The recovery for individual compounds from CRM 131 was within 62-84% and the precision expressed as a relative standard deviation (RSD) was in the range of 5-12%. Limit of detection (LoD) for individual PAH compounds was within 0.5-2.3 µg kg −1 , while limit of quantification (LoQ) was 1.6-7.1 µg kg −1 .

Potential Ecological and Human Health Risk Calculation
The risk assessment was based on the total concentration of contaminants and the calculation of risk indexes for individual chemicals and for their mixture; the hazard index (HI), toxic pressure (TP), toxicity equivalency factor quotient (TEQ), and toxic units (TUm) were used. HI was expressed as the sum of hazard quotient values (HQs) calculated for individual PAH compounds and reflecting a ratio between the total measured concentration of PAH in soils (C PAH in µg kg −1 ) and the maximum permissible concentration (MPC in µg kg −1 ) for individual compound. The hazard index was calculated as follows [3,11]: The toxic unit (TU) model described previously by Gómez-Gutiérez et al. [49] and Cachada et al. [3] was used to calculate ecological risk. The soil PAH concentration (C PAH in µg kg −1 ) was compared with the predicted no-effect concentration (PNEC in µg kg −1 ). PNEC is defined in the literature as "the concentration below which unacceptable effects on organisms will most likely not occur" [3]. Since for pollutants like PAHs it is assumed that individual compounds have the same toxic mode of action, the individual TUs of the 16PAHs were summed obtaining the toxicity of the mixture (TU m ) and calculated as follows: In addition, the Toxic Pressure coefficient (TP m ) for the mixture was calculated using the approach suggested by Jensen et al. [13] and Dagnino et al. [14], based on the response addition model, which assumed a dissimilar toxic mode of action for all compounds. The following formula was used for TP m calculation: where TP 1 , TP 2 , TP 3 , TP n is the toxic pressure calculated for the individual PAH compounds [3]. The toxic equivalent factors (TEFs) are an example of relative potency factors (RPF) and were used for the toxic equivalency quotient (TEQ in µg kg −1 ) derivation according to the following formula [56]: where TEQ is obtained by summation of the products of the concentration of individual PAH (C PAH in µg kg −1 ) and corresponding TEF value. A TEF is estimated based on the relative toxicity of a chemical compared to a reference chemical, which for PAHs is benzo[a]pyrene (BaP). The TEF value of 1 is assigned to the BaP, for the other chemicals TEFs are order-of-magnitude estimates of potency.

Environmental Risk Prediction Using VIS-NIR Spectroscopy
Grounded and sieved samples (2 mm) were analysed with a PSR-3500 spectroradiometer (Spectral Evolution, Haverhill, MA, USA). The instrument acquires reflectance data in the 350-2500 nm range with spectral resolutions of 3.5 nm, 10 nm, and 7 nm at 700 nm, 1500 nm, and 2100 nm respectively. Soil reflectance was interpolated to 1 nm intervals. The instrument was calibrated with a 99% white reference panel (5 × 5 cm) every 10 samples. Samples were placed in Petri dishes and scanned with a contact reflectance probe to avoid contamination with ambient light. Each spectrum is the averaging of 30 consecutives spectra (software handled). Five replicates scans in different parts of the Petri dish were averaged to obtain a representative VIS-NIR spectrum of the sample.

Statistical Evaluations
The software package Statgraphics Centurion (version XVIII, Statpoint Technologies, The Plains, VA, USA) was used for statistical analysis. The following basic statistical parameters were analysed: median, mean, standard deviation (SD), extreme values (Min and Max), quartiles (lower-LQ and upper-UQ), skewness and kurtosis, and coefficient of variation (CoV). Inverse distance weighting method was used to construct maps of the spatial distribution of chemical indexes in soils by ArcGIS (10.0) software (ESRI, Redlands, CA, USA).
Mean average (MA) with a segment size of 5 was applied to all spectra before analysis. All predictions were made on MA spectra but also on spectra pre-processed with standard normal variate (SNV) and first derivative (Savitzky-Golay). Due to the relatively small number of samples (74) and to the preliminary nature of this study, full cross-validation (CV) was used all along the study.
A principal component analysis (PCA) of the spectra (MA) was used to reveal the hidden structure within the soil sample dataset. PCA allows visually exploring the spectra and having an insight into their relationships with soil properties. With regards to the number of samples and preliminary investigations partial least-squares (PLS) regression was used with full cross-validation to predict the sum of 16PAH concentrations, individual PAH concentrations, and all the investigated indexes. PLS regression is a popular technique for quantitative analysis of VIS-NIR spectra and was used to determine the best correlation between the PAHs data and the spectral data. This is a reduction dimension technique that seeks a set of latent variables by maximising the covariance between the spectra (X) and PAHs concentrations or indexes (Y). The goal of PLS regression was to predict Y from X. The predictions of the different indexes were done on the class number (e.g., for TUm four classes-1 to 4 corresponding to the following ranges of TUm values: <1; 1-10; 10-100; >100) and not on the actual values of that particular index. The predicted class numbers were then rounded to the nearest whole number. Spectra pre-processing, PCA and PLS regressions were completed with Unscrambler 10.3 (Camo Analytics, Oslo, Norway) software.

Conclusions
Rural areas are often located in the proximity of highly urbanised or industrial areas and may be exposed to high, continuous inputs of various pollutants, including PAHs, which may persist for a long-time in soils and pose a threat to ecosystems and human health. In this study, we demonstrated the combination of chemical indexes and VIS-NIR spectroscopy for first-tier screening assessment and identification of areas where site-specific assessment is needed. Additionally, for the rating of the degree of risk, different criteria were used. The 16PAH concentration ranged from 311 to 316.1 × 10 3 µg kg −1 and exceeded the current regulatory limit for uncontaminated soils in almost 80% of samples. The highest PAH levels were found in the central and north part of the study region in the vicinity of PAH sources (coke plant, coal mining landfill, coal recovery plant). It was shown that risk assessment results depend on the model used for risk index calculation and on the values established as a threshold for an unacceptable risk. In our study, HQ/HI indexes were based on the permissible concentration for only 10 individual PAHs (as recommended in Polish soil guidelines), while TU, TP and TEQ on toxicity data for 16PAH compounds. HQ and TU indexes calculated for individual PAHs indicated a high risk in 62-66% of the study area, whereas risk indexes obtained for PAHs mixture (TUm, TPm and TEQ) had the highest values (above acceptable limits) only in 22-55% of soil samples. Our approach based on total levels of PAHs may lead to risk overestimation, however, as a highly protective measure was useful to delineate area for detailed higher tier evaluations. VIS-NIR spectroscopy was also used for prediction of 16PAH concentrations and environmental risk. Our results shown that VIS-NIR is a promising and helpful technique which can be used for the screening assessment of PAH contaminated soils. Since it was difficult to predict individual soil PAH concentrations, the PLS model on specific index classes was used. The model accuracy ranged from 20 to 95.9%, depending on the class and type of index tested. Generally, risk indexes prediction gave better results when binary models were applied. The accuracies of binary models were 85.1%, 87.9%, 71.6% and 94.6% for HI, TUm, TPm and TEQ indexes, respectively. For better prediction of total PAH concentrations and to develop more accurate models of risk indexes classification, more investigations are needed by including more samples of each class. However, VIS-NIR technique can be successfully applied in the first-tier risk assessment to differentiate if a sample is contaminated/non-contaminated, or there is a risk/no risk.

Funding:
The study was carried out under the statutory project of IUNG-PIB no. 4.07 "Application of ecological risk assessment procedure for the evaluation of soil retention function for chemical pollutants". This work has been cofinanced by the National Science Center grant no. UMO-2011/03/B/ST10/05015.