Characteristics of groundwater quality monitoring data: absolute versus relative concentrations
Table 1 shows that the median values of most groundwater quality parameters, with the exception of redox potential (ORP), are significantly higher (p < 0.05) in the leachate (LW) than in the nearby groundwater (MW). Previous studies have shown that livestock mortality leachate contains high concentrations of inorganic and organic compounds (e.g., ammonium, alkalinity, chloride, sulfate, BOD, and COD) as a result of carcass decomposition34,51,52. The lower ORP in the leachate (median = -66 mV) compared to the surrounding groundwater (median = 115 mV) is due to the anaerobic conditions prevailing in the burial pits40,53,54. Leakage of carcass leachate from burial pits thus increases ionic concentrations in groundwater, and these concentrations correlate positively with EC and TDS but negatively with ORP.
Table 1
Statistical summary of groundwater quality data (n = 420) collected from leachate wells (LW) and groundwater monitoring wells (MW) in the livestock carcass burial pits (n = 30).
| Category | Quantile | pH | EC (µS/cm) | ORP (mV) | DO (mg/L) | BOD (mg/L) | COD (mg/L) | TN (mg/L) | NH4+-N (mg/L) | NO3−-N (mg/L) | Cl− (mg/L) | Ca2+ (mg/L) | Na+ (mg/L) | TP (mg/L) | PO43− (mg/L) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total Samples (n = 480) | Q25 | 6.20 | 304.5 | 6.0 | 1.3 | 1.3 | 13.8 | 7.3 | 0.0 | 1.3 | 17.9 | 21.4 | 11.0 | 0.0 | 0.0 |
| | Q50 | 6.58 | 599.5 | 93.8 | 2.7 | 4.8 | 40.0 | 20.9 | 0.9 | 2.9 | 36.4 | 41.5 | 19.1 | 0.2 | 0.0 |
| | Q75 | 6.94 | 1342.5 | 160.5 | 4.2 | 31.4 | 568.3 | 86.3 | 28.5 | 7.7 | 104.9 | 101.0 | 50.5 | 1.6 | 0.0 |
| Leachate Wells (n = 75) | Q25 | 6.00 | 2497.0 | -197.0 | 0.7 | 451.0 | 4120.0 | 735.8 | 324.4 | 0.6 | 123.2 | 98.4 | 63.1 | 1.1 | 0.0 |
| | Q50 | 6.41 | 4930.0 | -66.0 | 1.5 | 1879.2 | 11333.3 | 2890.9 | 1554.0 | 5.0 | 279.3 | 249.3 | 185.9 | 5.2 | 0.3 |
| | Q75 | 6.83 | 8790.0 | 62.0 | 3.0 | 6765.1 | 44000.0 | 7184.7 | 3666.0 | 13.5 | 723.4 | 582.9 | 591.0 | 14.0 | 1.0 |
| Monitoring Wells (n = 345) | Q25 | 6.23 | 279.0 | 34.0 | 1.4 | 0.9 | 11.0 | 5.8 | 0.0 | 1.3 | 16.0 | 20.0 | 10.1 | 0.0 | 0.0 |
| | Q50 | 6.61 | 456.0 | 115.0 | 3.0 | 2.7 | 24.0 | 13.9 | 0.3 | 2.8 | 27.3 | 35.0 | 15.6 | 0.1 | 0.0 |
| | Q75 | 6.95 | 854.0 | 173.0 | 4.6 | 8.9 | 105.0 | 33.8 | 3.8 | 6.9 | 57.6 | 71.6 | 27.9 | 0.6 | 0.0 |
| p-value | | p < .05 | p < .001 | p < .001 | p < .001 | p < .001 | p < .001 | p < .001 | p < .001 | p < .01 | p < .001 | p < .001 | p < .001 | p < .001 | p < .001 |
|
Given the elevated ionic concentrations in the leachate wells (LW) relative to the adjacent groundwater monitoring wells (MW), the log-scaled concentrations of Cl− and NH4+-N show strong positive Spearman correlations with EC (ρ = 0.65 and 0.61, respectively) in the total dataset (combined LW and MW) (Fig. 1). This indicates that the influence of leachate infiltration on proximate groundwater can be quantitatively diagnosed by measuring the correlation coefficients among hydrochemical parameters in the groundwater quality monitoring data. Nevertheless, as mentioned above (Section 2.2), the hydrochemical parameters are inherently compositional parts that carry relative information. Therefore, correlations computed between any pair of log-transformed variables can be spurious, and log-ratio transformations such as the centered log-ratio (clr) and isometric log-ratio (ilr) are necessary for hydrochemical parameters25,29.
Figure 2 shows the bivariate relationships, with correlation coefficients, between the clr-transformed values (i.e., relative to the geometric mean of all components) of Cl− and NH4+-N and log-transformed EC. A positive correlation is observed for NH4+-N (ρ = 0.56), consistent with the log-transformed data. Conversely, Cl− shows a weak negative correlation (ρ = -0.11) despite the positive association in its log-transformed data. This is attributed to the higher relative abundance of Cl− in the groundwater from monitoring wells (MW) compared to that in the leachate (LW) (right panel of Fig. 2). The elevated relative Cl− levels in MW result from agricultural practices, such as the use of livestock manures and fertilizers, which affect the background levels of the groundwater near the burial pits, whereas NH4+-N originates primarily from carcass leachate. These results demonstrate that the correlation structure in the total dataset can change when the relative compositions of individual hydrochemical parameters are considered.
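The sign change described for Cl− can be reproduced in miniature. The following is a minimal sketch in plain Python, not the software used in the study: the five four-part samples (Cl−, NH4+-N, NO3−-N, Ca2+) and their EC values are invented for illustration, and `clr`, `ranks`, and `spearman` are hypothetical helper names. Cl− rises with EC in absolute terms, but its share relative to the geometric mean of all parts falls because NH4+-N rises much faster.

```python
import math

def clr(parts):
    """Centered log-ratio: log of each part relative to the geometric mean."""
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Invented samples, ordered from background-like to leachate-like (mg/L).
ec  = [300.0, 500.0, 900.0, 2500.0, 5000.0]   # µS/cm
cl  = [16.0, 27.0, 58.0, 110.0, 279.0]
nh4 = [0.05, 0.3, 3.8, 324.0, 1554.0]
no3 = [1.3, 2.8, 6.9, 0.6, 5.0]
ca  = [20.0, 35.0, 72.0, 98.0, 249.0]

log_ec = [math.log(v) for v in ec]
clr_rows = [clr(row) for row in zip(cl, nh4, no3, ca)]
clr_cl = [row[0] for row in clr_rows]
clr_nh4 = [row[1] for row in clr_rows]

rho_log_cl = spearman(log_ec, [math.log(v) for v in cl])  # positive
rho_clr_cl = spearman(log_ec, clr_cl)                      # negative
rho_clr_nh4 = spearman(log_ec, clr_nh4)                    # positive
print(rho_log_cl, rho_clr_cl, rho_clr_nh4)
```

With these invented numbers the log-transformed Cl− correlates positively with EC while the clr-transformed Cl− correlates negatively, which is the qualitative pattern reported for the monitoring data. The toy data are perfectly monotonic, so the magnitudes are not comparable to the ρ values in the figures.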
Assessing the influence of leachate on groundwater quality using multivariate CoDa and RPCA
Given the multivariate compositional nature of hydrochemical data, it is necessary to employ a correlation matrix derived from log-ratio transformations to examine the interrelationships among the compositional parameters. Figure 3 shows the marked differences between the correlation matrices of log-transformed and clr-transformed variables (excluding EC, ORP, total bacteria, and total coliform) in the total dataset. This comparison demonstrates that the type of data transformation strongly influences the outcome of correlation analysis, as previously shown (Figs. 1 and 2). The result for log-transformed data (lower section of Fig. 3) reveals that the nine hydrochemical parameters (Cl−, Ca2+, Na+, BOD, COD, Total N, NH4+-N, Total P, PO43−), predominantly concentrated in the leachate (LW), exhibit positive correlations with each other. In contrast, these parameters display negative correlations with pH and redox-sensitive parameters (DO and NO3−-N), which typically decrease under anaerobic conditions.
On the other hand, the correlation matrix for clr-transformed data (upper section of Fig. 3) explains the relative compositional relationships based on their source attributions. For instance, parameters primarily originating from leachate (BOD, COD, Total N, NH4+-N) show inverse relationships with those dominant in background groundwater (Cl−, Ca2+, Na+) as well as with redox-sensitive ions (DO and NO3−-N). It is noteworthy that the relative compositions of hydrochemical data are inherently influenced by the proportional contributions from various solute sources, such as carcass leachate and agricultural practices (e.g., livestock manures and fertilizers). Therefore, the application of CoDa (i.e., log-ratio transformations) can be more useful and relevant than using absolute concentrations (such as raw or log-transformed data) for a statistical and practical assessment of the impact of leachate leakage on groundwater quality.
In the context of multivariate CoDa, RPCA provides a more comprehensive explanation of the relative compositional changes in hydrochemical data. In this study, RPCA was applied to the ilr-transformed data, and the resulting loadings and scores were back-transformed to clr-coordinates. The first two principal components (PC1 and PC2), accounting for 34.0% and 29.9% of the total variance, respectively, were extracted from the ilr-transformed data (Table 2 and Fig. 4). The loadings exhibit a correlation (or covariance) structure among the twelve hydrochemical parameters. Notably, PC1 has positive correlations with NH4+-N, BOD, and COD, which are predominantly enriched in carcass leachate, while it shows negative correlations with redox-sensitive parameters such as DO and NO3−-N. Ions such as Cl−, Na+, and Ca2+, despite their high absolute concentrations in leachate, show only weak correlations with PC1. This is attributed to their relative abundance in the background groundwater. On the other hand, PC2 correlates with total P and PO43−. However, both variables are redundant for the interpretation since they are immobile in groundwater due to their adsorption on soils and sediments55. Therefore, PC1 delineates the impact of leachate on groundwater quality, showing the relative increase in ionic concentrations compared to background levels and the formation of anaerobic conditions. These results are consistent with the outcomes of the correlation analysis on the clr-transformed data.
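The back-transformation from ilr to clr coordinates relies on the linear relation between the two coordinate systems. The sketch below illustrates that relation only; the robust covariance estimation behind RPCA is not shown. It assumes a pivot (Helmert-type) orthonormal basis, which is one of many valid choices and not necessarily the basis used in this study. The helper names and the four-part example composition are hypothetical.

```python
import math

def clr(parts):
    """Centered log-ratio coordinates of one composition."""
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def pivot_basis(d):
    """Return D-1 orthonormal, zero-sum vectors of length D (pivot type)."""
    basis = []
    for i in range(d - 1):
        k = d - 1 - i                      # number of parts after part i
        vec = [0.0] * d
        vec[i] = math.sqrt(k / (k + 1))
        for j in range(i + 1, d):
            vec[j] = -1.0 / math.sqrt(k * (k + 1))
        basis.append(vec)
    return basis

def clr_to_ilr(c, basis):
    """Project clr coordinates onto each basis vector."""
    return [sum(v * x for v, x in zip(vec, c)) for vec in basis]

def ilr_to_clr(z, basis):
    """Linear combination of basis vectors weighted by ilr coordinates."""
    d = len(basis[0])
    return [sum(z_i * vec[j] for z_i, vec in zip(z, basis)) for j in range(d)]

# Hypothetical four-part composition (mg/L), for illustration only.
x = [1554.0, 279.3, 5.0, 249.3]
basis = pivot_basis(len(x))
c = clr(x)
z = clr_to_ilr(c, basis)          # D-1 = 3 ilr coordinates
c_back = ilr_to_clr(z, basis)     # recovers the clr coordinates

# A loading vector estimated in ilr space maps to clr space the same way.
loading_ilr = [1.0, 0.0, 0.0]
loading_clr = ilr_to_clr(loading_ilr, basis)
print(z, c_back, loading_clr)
```

Because the basis vectors are orthonormal and sum to zero, the round trip clr → ilr → clr is lossless, which is why loadings and scores computed in ilr space can be reported per parameter in clr space.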
Table 2
Isometric log-ratio (ilr) coordinates of five selected subcompositions and their corresponding binary partitions, indicating the impact of leachate on groundwater. The table also includes Spearman correlation coefficients (ρ) between each ilr-coordinate and the PC1 score from RPCA.
| Isometric log-ratio (ilr) | Selected parts (dimension) | Binary partition of selected balance (ilr): numerator (+) | Binary partition of selected balance (ilr): denominator (-) | ρ |
|---|---|---|---|---|
| Z1 | 12 | BOD, COD, NH4+-N | DO, H+, NO3-N, TN, Cl−, Ca2+, Na+ | 0.75 |
| Z2 | 7 | BOD, COD, NH4+-N | NO3-N, Cl−, Na+, DO | 0.83 |
| Z3 | 5 | BOD, COD, NH4+-N | NO3-N, Cl− | 0.81 |
| Z4 | 5 | NH4+-N | NO3-N, Cl−, Na+, DO | 0.8 |
| Z5 | 3 | NH4+-N | NO3-N, Cl− | 0.79 |
|
Accordingly, the robust scores along PC1 distinctly differentiate between leachate (LW) and groundwater (MW) samples in the total dataset, while remaining insensitive to outliers (Fig. 4B). RPCA identified 110 outliers, constituting 22.9% of the total samples, most of which are leachate samples (LW). This suggests that the score, computed as a weighted linear combination of multivariate hydrochemical parameters, serves as an effective groundwater pollution index for assessing the impact of leachate on groundwater quality. Nevertheless, the eigenvectors obtained from RPCA, which provide the loadings used as weights for the hydrochemical parameters, can vary depending on the specific monitoring data used. Additionally, the computation of scores involves complex transformations of the observed concentrations of multiple parameters into log-ratio values. Thus, we aim to identify the critical subcompositional parts that reflect the variability in RPCA scores and introduce their ilr-coordinate as a single groundwater pollution index (GPI). This index serves as a versatile tool for assessing the impact of leachate on groundwater quality.
Development of ilr-based groundwater pollution index (GPI)
In the context of multivariate CoDa, although RPCA provides useful scores for evaluating the influence of leachate on groundwater quality, this study adopted the ilr transformation to develop a more straightforward method for formulating a univariate GPI. As explained above (Section 2.2), the ilr transformation yields D − 1 Cartesian coordinates, known as balances, based on an orthonormal basis established through the Sequential Binary Partition (SBP) of D selected components. Here, we construct the SBP from the PC loadings expressed in clr-coordinates (Fig. 4), which identify the important subcompositional parts and the relationships among them that reflect leachate pollution of groundwater. Figure 5 illustrates the SBP for a D = 12 subcomposition, partitioning the full set of hydrochemical parameters in accordance with the results of RPCA. From this partitioning, eleven (D − 1) independent isometric log-ratio (ilr) coordinates were derived according to Eq. (1).
Based on the SBP, we identified the second balance (labeled as ilr2 in Fig. 5 and Z1 in Table 2), which represents a binary partition excluding total P and PO43−, as a critical ilr-coordinate for evaluating the impact of leachate on groundwater. The selected ilr-coordinate (Z1) uses BOD, COD, and NH4+-N, which are mainly produced by carcass decomposition, as the numerator; the denominator comprises Na+, Ca2+, H+, TN, DO, Cl− and NO3−-N, which are relatively dominant in background groundwater affected by agricultural activities and oxic conditions. This log-ratio effectively retains the relative information of the data, as shown by the RPCA results, exhibiting a significant correlation (ρ = 0.56) with the first principal component score (PC1) (Table 2). Additionally, it shows a positive correlation (ρ = 0.56) with electrical conductivity (EC) and a negative correlation (ρ = -0.56) with redox potential (ORP). Consequently, this ilr-coordinate is considered a reliable ratio-based GPI for assessing the effects of leachate on groundwater quality.
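A balance between two groups of parts can be computed directly from a single sample. The sketch below uses the standard balance formula, the normalized log-ratio of the geometric means of the two groups with coefficient sqrt(rs/(r+s)) for r numerator and s denominator parts. It is illustrative only: the function name, the dictionary keys, and the sample values are hypothetical, and expressing H+ as 10^(−pH) alongside mg/L concentrations is an assumption made here for the example.

```python
import math

def balance(sample, numerator, denominator):
    """ilr balance between two groups of parts of one sample.

    sample: dict mapping part name -> strictly positive value.
    """
    r, s = len(numerator), len(denominator)
    mean_log_num = sum(math.log(sample[p]) for p in numerator) / r
    mean_log_den = sum(math.log(sample[p]) for p in denominator) / s
    # sqrt(rs/(r+s)) makes the coordinate part of an orthonormal basis.
    return math.sqrt(r * s / (r + s)) * (mean_log_num - mean_log_den)

# Partition corresponding to Z1: leachate-derived parts over background parts.
Z1_NUM = ["BOD", "COD", "NH4-N"]
Z1_DEN = ["DO", "H+", "NO3-N", "TN", "Cl", "Ca", "Na"]

# Hypothetical leachate-like sample (mg/L; H+ derived from an assumed pH).
sample = {
    "BOD": 1800.0, "COD": 11000.0, "NH4-N": 1500.0,
    "DO": 1.5, "H+": 10 ** -6.4, "NO3-N": 5.0, "TN": 2900.0,
    "Cl": 280.0, "Ca": 250.0, "Na": 190.0,
}

z1 = balance(sample, Z1_NUM, Z1_DEN)

# Rescaling every part by the same factor leaves the balance unchanged.
rescaled = {name: 10.0 * value for name, value in sample.items()}
z1_rescaled = balance(rescaled, Z1_NUM, Z1_DEN)
print(z1, z1_rescaled)
```

The function assumes strictly positive inputs. Zeros or non-detects, which occur for some parameters in Table 1, would need a replacement strategy before the logarithm is taken, and that step is not covered here.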
Table 3
Performance metrics (accuracy, sensitivity, and specificity) of the selected isometric log-ratio (ilr) based groundwater pollution index (GPI) compared with environmental criteria, suggesting the optimal cutoff value for effective groundwater pollution evaluation (positive rate).
| Selected GPI (ilr) | Cutoff | Accuracy | Sensitivity | Specificity | Positive rate |
|---|---|---|---|---|---|
| Z3 | -0.8655 | 0.7762 | 0.67 | 0.88 | 0.37 |
|
We further examined different ilr-coordinates derived from subcompositions with a reduced number of parts (specifically, D = 7, 5, and 3 parameters), using the same procedure, to develop simpler versions of the GPI. These ilr-coordinates not only correlate well with PC1 but also effectively account for the variations in EC and ORP (Table 2). This result suggests that the ilr-coordinates sufficiently capture the relative information relevant to the hydrochemical processes by focusing on key parameters, rather than incorporating all measured parameters. This is because log-ratios computed within a subcomposition satisfy the principle of subcompositional coherence of compositional data56.
The ternary diagram in Fig. 6 shows the distribution of three subcompositional parts (NH4+-N, Cl− and NO3−-N), characterized by two ilr-coordinates (ilr[NH4+-N | Cl−, NO3−-N] and ilr[Cl− | NO3−-N]) in Euclidean space. The first of these coordinates corresponds to Z3 in Table 3 and explains 90.1% of the total variance in the distribution. This ratio reflects the increase in NH4+-N relative to Cl− and NO3−-N, and it differs significantly (p < 0.05) between leachate (LW) and groundwater (MW) (Fig. 7B). Consequently, the ilr-coordinate of these three selected parts (NH4+-N, Cl− and NO3−-N) provides the simplest and most practical form of GPI while retaining the essential information of the groundwater quality monitoring data. We propose a univariate GPI to quantify the impact of leachate on groundwater, using the following ilr equation:
$$GPI\left({Z}_{3}\right)=\sqrt{\frac{2}{3}}\,\ln\frac{\left[{NH}_{4}^{+}\text{-}N\right]}{\sqrt{\left[{NO}_{3}^{-}\text{-}N\right]\left[{Cl}^{-}\right]}}\tag{5}$$
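As a worked illustration of Eq. (5), the sketch below evaluates the GPI for two sets of inputs and compares them with the cutoff listed in Table 3. Two assumptions apply. The inputs are the Q50 values of NH4+-N, NO3−-N and Cl− from Table 1, used as if they belonged to one representative sample per well type, which they do not. The rule "at or above the cutoff means flagged" is assumed from the direction of the index. The function and variable names are hypothetical.

```python
import math

CUTOFF = -0.8655  # cutoff for Z3 as listed in Table 3

def gpi_z3(nh4_n, no3_n, cl):
    """Eq. (5): balance of NH4+-N against NO3--N and Cl- (all in mg/L)."""
    if min(nh4_n, no3_n, cl) <= 0:
        # Zeros and non-detects need prior treatment; not handled here.
        raise ValueError("all three concentrations must be strictly positive")
    return math.sqrt(2.0 / 3.0) * math.log(nh4_n / math.sqrt(no3_n * cl))

def is_flagged(gpi, cutoff=CUTOFF):
    """Assumed decision rule: a sample at or above the cutoff is flagged."""
    return gpi >= cutoff

# Illustrative inputs taken from the Q50 rows of Table 1.
gpi_lw = gpi_z3(nh4_n=1554.0, no3_n=5.0, cl=279.3)   # leachate wells
gpi_mw = gpi_z3(nh4_n=0.3, no3_n=2.8, cl=27.3)       # monitoring wells

print(gpi_lw, is_flagged(gpi_lw))
print(gpi_mw, is_flagged(gpi_mw))
```

With these inputs the leachate value lies above the cutoff and the monitoring-well value below it, matching the direction of the index: NH4+-N enrichment relative to NO3−-N and Cl− pushes the GPI upward.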
The ilr-coordinate (Z3), proposed as a GPI, was compared with the assessment results of leachate impact on groundwater, as outlined by the government's environmental criteria (mentioned in Section 2.1.). For this, data samples were categorized into binary groups based on varying ilr values, and these classifications were then juxtaposed with those designated as contaminated or uncontaminated according to the environmental criteria, measuring sensitivity and specificity. Such a comparison not only validates the GPI's potential as a viable alternative to the environmental criteria but also suggests an appropriate GPI cutoff that aligns with the criteria.
We determined the optimal cutoff for the GPI using a receiver operating characteristic (ROC) curve, which plots sensitivity (recall) against 1 − specificity (false positive rate) across various cutoff points; the area under the curve (AUC) was 0.78. The optimal cutoff, determined at the point where sensitivity is maximized and 1 − specificity is minimized, was identified as -0.87 (Fig. 7A). At this point, the sensitivity was 0.67, meaning that 67% of the samples classified as contaminated by the environmental criteria were correctly identified, while the specificity was 0.88, meaning that 88% of the uncontaminated samples were correctly classified (Table 3). These results support the effectiveness of the GPI in differentiating between contaminated and uncontaminated groundwater and its reliability as a tool for environmental pollution assessment. Finally, the ilr-based GPI was adjusted to center around the cutoff and normalized between 0 and 1, using the maximum and minimum values according to Eq. (4). Within this normalized scale, a GPI value exceeding 0.5 is established as the threshold for identifying leachate contamination, in accordance with the government's environmental criteria. Notably, this normalized GPI revealed that more than 80% of the entire monitoring dataset exceeded this 0.5 threshold, suggesting significant contamination.
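The cutoff search can be sketched as a scan over candidate thresholds. The text does not name the exact optimality criterion, so the example assumes Youden's J statistic (sensitivity + specificity − 1), a common way to jointly maximize sensitivity and minimize 1 − specificity. The scores and reference labels are invented, and the function names are hypothetical.

```python
def rates(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff means 'contaminated'."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(scores, labels):
    """Scan observed scores as cutoffs; keep the first one maximizing Youden's J."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = rates(scores, labels, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]

# Invented GPI values with reference labels (True = contaminated according
# to some external criterion); not data from the study.
scores = [-3.1, -2.4, -1.9, -1.2, -0.9, 0.4, 1.0, 1.3, 2.2, 3.0]
labels = [False, False, False, False, True, True, False, True, True, True]

cutoff, sensitivity, specificity = best_cutoff(scores, labels)
print(cutoff, sensitivity, specificity)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can select a different cutoff from the same curve, so the criterion should be stated alongside any reported threshold.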