Spatial Distribution of multielements including technology critical elements in sediments of the Danube River

Background: The pollution of the second-longest European river (the river Danube) has been under monitoring and focused on various contaminants including metals/metalloids (Hg, As, Ni, Zn, Cu, Cr, Pb, and Cd), personal care products, technical additives, pesticides, pharmaceuticals, etc. Recent studies show that technology critical elements (TCE) – elements with a high supply risk and economic importance – are becoming emerging pollutants due to their wide application in new technologies. According to the European Union Water Framework Directive, sediments are one of the three major sources of river pollution. This study aims to determine major and trace elements including some TCEs in the Danube River sediment. The concentrations of the targeted elements in the surface sediments were discussed in the sense of the effect of building hydropower dam Iron Gate I and increasing the quantity of sediments in the Iron Gate gorge. Results: The surface sediments were collected on the Danube River-km 1141 to 864 and three tributaries along this waterway. Two samples of deep sediments were used for comparison. Instrumental Neutron Activation Analysis was applied for quantification of 36 elements, with special attention to selected TCE belonging to lanthanides (La, Ce, Nd, Sm, Eu, Gd, Tb, Dy, Tm, and Yb). Spatial distribution is discussed (i) in the total pool of all analyzed elements and (ii) only lanthanides. For better understanding and to highlight a hidden relationship between targeted elements, multivariate statistical techniques (cluster analysis and principal component analysis) were applied to analyze the analytical data and to identify possible pollution sources. The obtained results of the targeted elements in the surface sediments were discussed in relation to the influence of hydropower dam Iron Gates I and the increasing quantity of sediments. Conclusion: Overall results show increasing concentration of almost all investigated elements in the surface sediments from the Danube River-km 1112 to the dam. Sediment od the River Pek was separated as a location with extreme anthropogenic influence due to close vicinity of the copper mining site.

7 separation treatment is involved, simple sample preparation step for solid samples, a small quantity of sample (≈ 100 μg), determination of the total element concentration independent of chemical species, real total analysis since the test portion does not have to be dissolved [29]. The concentrations of the targeted elements in the surface sediments were discussed in the sense of the effect of building hydropower dam Iron Gate I and increasing the quantity of sediments in the Iron Gate gorge. Multivariate analysis was applied to explore relationship between variables (concentrations and sample locations).

Description of the study area
The Danube River (2850 km long with the river basin of 801463 km 2 ) flows through 9 countries and geologically different areas. The Danube River in Serbia is 588 km long, from 1433 km (Hungarian border) to 845 km (border with Bulgaria), and flows through the three geomorphological areas according to Vogel and Pall 30,31]: Reach 5 (from 1433 to 1202 km, the Pannonian part), Reach 6 (from 1202 to 943 km, Iron Gate gorge) and Reach 7 (from 943 to 845 km, Lowland river). The sample places are located in the Reach 6 and 7. Reach 6 is characterized by the anthropogenic impact of the Iron Gates hydroelectric power plant and the significant inflow of untreated urban (especially form the city of Belgrade) and industrial wastewater, as well as agriculture effluents along upstream of the dam 32. Reach 7 is characterized as the Lowland river (Aeolian sediments and loess). In the Serbian part, the River Danube receives two large tributaries (Rivers Sava and the Tisza) and several smaller ones (e.g. Velika Morava and Pek). Velika Morava is the last significant right bank tributary before the Iron Gates. River Pek is important because it passing close to the large dumpsites of a copper mine Majdanpek. 8 The riverbed of the Iron Gate section dominantly consists of coarse sediments and rocks.
The river sediment transport depends on the dimension of particles: larger particles (coarse and medium send) move along the river bed (bed-load), smaller particles (fine sand, silt and clay), which are suspended in water, move with water (as suspended load). The construction of the hydropower dam Iron Gates I and reservoir system along the Danube River (135 km long, upstream of the dam) has been influenced on both the hydrological regime of the surface and ground waters, and amount of sediments [25]. The main effect of hydroelectric plant construction is the increase of sediment accumulation, more than 75% of the incoming sediment remain inside the reservoir 28. Although, the concentration of the suspended sediment is low (1 -100 g m -3 ), as a consequence of big annual water flow (110 -220 10 9 m 3 ), the quantity of sediments reaches 7 -10  10 6 tons 25. The largest quantity of sediments (up to10 m high) was accumulated three years after the construction of the dam (1972)(1973)(1974) and located between 970 and 1003 km of the River Danube. After that time, the sedimentation zone moved towards the dam [25]. These sediments are built from suspended sediments of the Danube (39%), the Tisza (26%), the Sava (21%) and the Velika Morava (14%).

Sampling and sampling locations
The samples were taken at 8 sites along the Danube River from 1141 to 864 km and in 3 main tributaries belonging to the middle Danube basin ( Fig.1  (about 1 kg each) were transferred into the plastic containers and placed in a cooler for transport to the laboratory. The samples were mixed and air-dried in a thin layer in the dark at room temperature (23 ± 1 °C). After drying, the samples were homogenized using a pestle and mortar and sieved through a 1-mm sieve to ensure sample homogeneity.   [33].
Samples of sediments ( 100 mg each) were dried to constant mass at 100 °C and wrapped in polyetheneoraluminium foils for short and long-term irradiations, respectively.
For the determination of short-lived isotopes, samples were irradiated for 60 s and measured for 15 min. For quantification of long-lived isotopes, samples were irradiated for 4 days, repacked and measured twice, after 4-5 days and after 20-23 days, with acquisitionlasting for 30 min and 1.5 h, respectively. The nuclides of interest were determined by gamma rays spectroscopy using a high-resolution germanium detector (Canberra GC5519/7500SL). Data processing and determination of concentrations of the studied elementswere performed using software developed in FLNP JINR [35].
where A sample and A stand are activities measured in samples and standards, C sample and C stand are corresponding concentrations.
The reference material for irradiation is prepared using the same procedure as the samples. CRM is used for both quality assurance and as a standard (calibrator) in INAA relative method. The chemical matrix effect is known to be significant sources of error in many instrumental chemical analysis, but it is insignificant in INAA 31.

Determination of total organic carbon (TOC)
TOC was determined by a semi-quantitative method which involves thermal destruction of all organic matter in the samples. Sample of known weights (previously heated at 100°C) was placed in a ceramic vessel and heated at 400 °C until the constant weight (2h). Then, samples were cooled in a desiccator and weighted at analytical balance. TOC is calculated as the difference between the initial and final sample weight [36].

Multivariate analysis
Multivariate analysis and chemometric methods were applied to better understand the spatial distribution of the quantified elements and to highlight hidden relationship between variables, i.e. elements and sample locations. A data matrix, with locations presented in rows and elements as descriptors in columns, was built for further multivariate analysis [37]. At the begining, the Ryan-Joiner test was applied to examine whether or not the experimental data follow a normal distribution. It is important to mention that variables (features) must be standardized (transformed) for further processing because of different units and ranges of measurements. The mean value and standard deviation are frequently used for this purpose and each value (x i ) is replaced by the corresponding z i value using the expression: where z i is the autoscaled value of x i and SD is the standard deviation of variable x.
Then, a data matrix, with locations presented in rows and autoscaled values of elements' concentration as descriptors in columns, was built for further multivariate analysis. The obtained data set was subjected to correlation analysis. Exploratory approach of Factor analysis was applied.

Factor analysis
Factor analysis is one of the often-used methods for analyzing multivariate data. It explains observed multivariate data in terms of the linear relationships between a smaller number of unobserved variables, called latent variables or factors. Factor Analysis is based on an assumption that these factors exist and that their number is exactly known, so the method is reliable only if the modeled system is well understood. Contrary to this, principal component analysis does not rely on such assumptions about covariance matrix and uses linear combinations of the observed variables. We employed both methods to cross check the produced results [37].
An initial statistics of the correlation matrix was done using Eigen analysis. Eigen values are calculated according to the expression: where R is the correlation matrix, e are eigen vectors and λ are eigen values.
The eigen values are measure of the extracted variance from the total feature variance s2 total. The sum of the eigenvalues is equal to the number of features (variables). The total number of features describes the variance of the full data set (numerical patternsloadings).
We gave an advantage to Kaiser criterion over Catell's and retained only the first components with subsequent eigenvalues less than one since they accounted for a high percentage of the determinable variance.
After extraction of the factors, the obtained matrix will show the significant loadings in each factor and present the combination of variables. That means that principal component analysis (PCA) transformed a set of possibly correlated random variables into a set of uncorrelated variables, called principal components. Using PCA, the reduction of the data set is realized by transforming the data into orthogonal components that are linear combinations of the original variables. This way, by compression, PCA reduces the complexity of high-dimensional data while retaining its patterns. Furthermore, when the obtained matrix contained too many medium factor loadings it appeared to be dificult to interprete the solution of the factors. That's why we applied a rotation of the coordinate system of factors since it does not affect the position of the objects (locations) relative to each other, but will simplify the structure of the factors. Among several available numeric transformation algorithms we selected varimax because it minimized the number of variables with high absolute values of factor loadings.

Cluster analysis
Another unsupervised learning method that belongs to Factorial methods is cluster analysis (CA) [37]. This method is very powerful in visualizing structural similarities (groups, classes) in the data and can be seen as a pattern cognition method.
For finding structures in a data set, there is a necessity for similarity (or distance). This is derived from geometry. The Euclidean distance of any two objects A and B is calculated according to expression: where m is the number of features (variables) and d(i,k) is the Euclidean distance of any two objects A and B. In the obtained distance matrix the correlation coefficients appear as a measure of similarities of each pair of features, instead of distances between objects. In selecting strategy, hierarchical techinque was applied because the typical output of a hierarchical clustering method is dendrogram, as very helpfull in interpreatation of our results. As the mode of the agglomerative hierarchical method the Ward linkage was selected. Thirty-six elements were determined in the analyzed sediments of the River Danube and three tributaries applying INAA. All measurement data including the descriptive statistic results (mean concentration and standard deviation) are given in Table S1 in the supplementary material. The concentrations of studied elements in the monitoring sediments are very variable between the sample locations with relative standard deviation from 12.5%

Major, minor and trace elements composition in the surface and deep sediments
The most abound metals in the sediments that we found are: Al (4.9 -9.9%), Ca (2.3-9.9%), Concentrations of all other measured elements are below 0.1% (minor and trace elements, Table S1(b) in the supplementary material). Spatial distribution of the elements with variability within groups higher than 30% is presented at Fig. 2. The extremely high concentration of copper, higher than remediation value prescribed by national low (Table S2 in the supplementary material) [39], was found in the sediments of river Pek (13-Pek). This

Composition of the studied lanthanides in the surface and deep sediments
Spatial distribution of lanthanides in the investigated sediments is given in Fig. S1 in the the supplementary material. The concentration range of lanthanides is brode and ranges from 0.3 to 71 mg kg -1 . Based on the descriptive statistic of the investigated lanthanides (Table S1 (c) the most uniformly distributed elements in the studied samples with RSD 10.5% and 12%, respectively. Contrary, Nd, Sm and Dy are the elements with the highest variability in the studied sediments with RSD > 20%.
Given the large difference in Ln concentrations in nature compartment, a pattern of the rare earth elements is usually described by normalizing these elements concentrations in the sample to those of the crustal abundance of the earth. Fig. 3 shows spatial distribution of the studied lanthanides depicted as relative concentration against either Upper crust concentrations taken from Taylor and McLennan [39] or concentration in sample 3-SmedRB.
The second approach is based on an assumption that sample 3-SmedRB is unpolluted because was dig on 7.2 -7.3 m deep, so the ratio of C sediment. /C 3-SmedRB was used to highlight the possibility of anthropological influence. The spatial distribution of Ln normalized against upper crust concentration ranges from 0.6 to 2.6 ( Fig. 3a). Similar distribution was found for almost all Ln in the sediments. The normalized values > 1.5 were found for Nd, Eu, Tb and Dy in surface sediments located in Iron Gate which is an indication of the positive anomaly for these elements. Mean normalized values of sample 3-SmedRB is 1.12 ± 0.35. Because of that, we choose this sample as pristine and in further work applied for normalization of other samples.

Comparison of Figs. 3 (a) and (b) it is clear that the spatial distribution of Ln differs in the
case of different applied reference values. In the second approach (Fig. 3b), high homogeneity for almost all samples was obtained. The largest differences were observed for Nd and Dy, even 3 times higher for the sample 6-VelGrad (Fig 3b). It is obvious that the second approach, which resulted in much better outcome in terms of uniformity and clarity due to similar matrix composition, confirms that potentially unpolluted sample of river bad (sampled at 7m) is a better choice for matrix match.

Chemometric evaluation of total pool of measurements for all elements
Given that environmental data are strongly characterized by inherent variability. the nature of the study objectives in this work is of multivariate nature, in following part we appled multivariate statistical analysis approach. PCA and CA have been applied to assess the level of heavy metals in sediments [40,41], soil [42], etc. PCA is used to reduce dataset to a small number of independent components for analyzing relationships among the observed variables. CA has been applied to identify different geochemical groups, clustering the samples on the basis of the similarities of their chemical properties [43].
The data set of concentration measurements was subjected to a PCA in order to decrease the number of descriptors responsible for the highest percentage of the total variance of the experimental data. The reduction of high dimensional feature space, to one which can be explained with fewer variables, is important to highlight the significant correlations, usually hidden in the original dataset. When PCA was applied to the autoscaled data matrix, with Eigen analysis as an initial (Table S4 and Fig. S2 in the supplementary material), six Principal Components (PCs) were extracted according to the Kaiser criterion which explains upto 93.80% of variance. Thus, the true dimensionality of the descriptorspace is six. As a result, we have got a space that can be described with six factors. Features with high positive or negative loadings essentially determine the factor. Since there were too many medium factor loadings in the first (unrotated) matrix for a solution of the factor, the mathematical rotation appeared as necessary to simplify the structure of the factors for a better interpretation. We applied the Varimax rotation (Table S5 in the supplementary material).
PCA (Fig. 4a) (Table S1 in the supplementary material). Location 13-Pek is already discussed concerning the extremely high concentration of Cu due to the vicinity of the copper mining site. In certain cases of chemometric evaluation, such extreme values are being discarded as outliers to enable further modeling, but in this work, we retained it based on knowledge of its source.

Chemometric evaluation of the studied lanthanides
In order to get a better insight into the lanthanides profile in analyzed samples, the dataset was subjected to the second round of PCA. Score plot of the studied lanthanides is given in   Considering the wide application of lanthanides as technological critical elements and ecological importance which is still not well estimated, further work should be a focus on periodic monitoring of these elements in surface sediments as well as corresponding river water to deeper understand their sources and fate in environment. It is necessary to emphasize the importance of a large multi-element approach in the study of lanthanides. Having reliable data on the content of heavy metals that are much better studied, it is possible to more correctly identify the sources of lanthanides. OAC was involved in the experiments and manuscript writing. RB and SR were responsible for the data analysis. AP collected samples. TTP designed the study and corrected of the manuscript.

List of abbreviations
All authors read and approved the final manuscript.