Application of a Multivariate Exploratory Analysis Technique in the Study of Dissolved Organic Matter and Metal Ions in Waters from the Eastern Quadrilátero Ferrífero, Brazil

Water samples were collected at 10 points in stretches of the eastern Quadrilátero Ferrífero (QF), a mining region in southeastern Brazil. The objectives of this study were to find possible relationships between dissolved organic carbon (DOC), metals and other measured physico-chemical parameters, using the Kohonen neural network as a tool to analyse these multivariate geochemical data in the study area. Physico-chemical analyses were carried out in situ and in the laboratory, where the concentrations of DOC and several metal ions were determined. The Kohonen network allowed a more user-friendly visualisation and interpretation of the data, in addition to defining relationships among them. Thus, for the data analysed, a relationship between DOC and Fe and a possible effect of seasonality on the distribution of the samples were observed. Possible lithological evidence could be detected by the exploratory analysis, especially when the elements Ca, Mg, Mn and Sr are considered.


Introduction
The Quadrilátero Ferrífero (QF) is a geological structure in southeastern Brazil that is known worldwide for its mineral deposits.1 It covers an area of approximately 7000 km² in the Brazilian state of Minas Gerais and constitutes a southern extension of the Espinhaço Mountain Range, in the southeastern part of the São Francisco Craton.1,2 In this region, iron and gold are the dominant mining products, along with aluminium and topaz. Notably, the QF became the most important gold producer in the late seventeenth century, with a total historical production that probably exceeded 1300 t.3,4
The basal unit and surrounding areas of the QF are composed primarily of granitic gneisses. Above this basement lie three units of supracrustal metasedimentary rocks: the Rio das Velhas Supergroup, the Minas Supergroup and the Itacolomi Group. The Rio das Velhas Supergroup, considered an Archean greenstone belt, is composed of phyllites, schists and volcanic metasediments, which tend to release major elements such as Na, K, Ca, Mg, Mn and Fe, and trace elements such as Ni, Cr, Co and V. The Minas Supergroup comprises Proterozoic metasediments, which are a source of Fe, Mn, Ca and Mg. The Itacolomi Group consists predominantly of quartzitic rocks.1,2
One of the consequences of mining and associated activities is the alteration of element cycles in the environment, which influences the availability of metals to organisms. This is precisely what occurs in the Quadrilátero Ferrífero, where the exploitation of iron ore, for instance, is an important source of major elements such as Fe and Mn, and of trace metals. In addition, gold mining can release As and Cu, which are present in minerals such as iron sulfides (pyrite, FeS₂), copper sulfides (chalcopyrite, CuFeS₂) and arsenic sulfides (arsenopyrite, FeAsS), often in paragenesis with gold.2,5
In the aquatic environment, dissolved organic carbon (DOC) is considered a regulator of biotic and abiotic processes. It is operationally defined as the organic fraction that passes through a 0.45 µm filter.6
Dissolved organic matter is a vital resource that affects food webs either directly, through its use by organisms, or indirectly via mechanisms such as turbidity, pH and contaminant transport.6,7 DOC can decrease the toxicity of many metals because, by binding them chemically, it reduces their availability to organisms. This is particularly important because the QF is a mineral-rich region affected by mining: its variety of minerals can release a range of elements into the water bodies, where they interact with dissolved organic matter, especially humic substances (HS), which constitute about 80% of the DOC in natural waters.8 Since the concentrations of dissolved metal ions and organic material can influence the formation of metal-HS complexes, it is important to identify possible interactions between these metals and the organic material.9
Because environmental geochemistry studies generally involve a large number of variables, exploratory data analysis techniques have proved effective in identifying patterns in data sets and facilitating the interpretation of results.10 One tool developed for multivariate exploratory analysis is the Kohonen neural network (self-organising map, SOM), an artificial intelligence technique that can project high-dimensional data onto a lower-dimensional space without loss of the original information. It was developed by Teuvo Kohonen (Finland) and is closely related to the organisation of the cerebral cortex.11 An important advantage of this tool is the ease with which the data can be viewed and interpreted.12
In summary, the Kohonen neural network comprises self-organising maps formed by neurons arranged in a two-dimensional array. One of its main advantages is that all the information in the data (relationships between samples, relationships between variables and the influence of variables on samples) is obtained in this two-dimensional array. This distinguishes it from other exploratory approaches, such as principal component analysis (PCA), in which it is usually necessary to work in multidimensional spaces (more than two dimensions) spanned by the principal components (PC).12,13 In the Kohonen neural network, all samples placed at the same neuron are assumed to be similar to each other with respect to the aspect analysed. Another important attribute of this technique is the formation of clusters of samples that are considered to share the same characteristics because they are located in nearby (neighbouring) neurons.12
A picture representing the typical architecture of the Kohonen neural network is shown in Figure 1, where the neurons are represented by columns, or tubes, arranged inside a box. In this representation, an input data set with n samples yields n input vectors x. These vectors x may be absorbance values of a spectrum, peaks of a chromatogram or values of different physico-chemical water-quality parameters. The dimensionality of each vector is given by the number of variables in the data, so the number of weights of each neuron (w) equals the number of elements of the input vectors. The Kohonen network has not previously been applied to the study of metal ions and DOC in surface waters of the Quadrilátero Ferrífero, and no study of this kind in the evaluated area was found in the scientific literature. Worldwide, however, successful applications of the SOM technique in multivariate data analysis have been reported.13-17 The present work was an initial study that aimed to investigate the levels of dissolved organic matter in some brown-coloured water bodies in the eastern part of the QF and their probable relationships with metal ions and other measured parameters. Applying multivariate exploratory analysis to this kind of study in the region is an innovative way to visualise the results and the relationships among samples and their variables easily and effectively.
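The relationship between input vectors x and neuron weights w described above can be illustrated with a minimal Python sketch. It assumes a rectangular grid stored as nested lists, random initial weights and plain Euclidean distance for finding the neuron closest to a sample (its best-matching unit); the function names make_map and best_matching_unit are hypothetical and are not taken from the software used in this work.

```python
import math
import random

def make_map(rows, cols, n_vars, seed=0):
    """Create a rows x cols Kohonen map. Each neuron holds a weight vector w
    with n_vars elements, i.e. one weight per input variable."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(n_vars)]
             for _ in range(cols)]
            for _ in range(rows)]

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def best_matching_unit(som, x):
    """Return (row, col) of the neuron whose weight vector is closest to x."""
    if len(x) != len(som[0][0]):
        raise ValueError("len(x) must equal the number of weights per neuron")
    best, best_d = None, float("inf")
    for r, row in enumerate(som):
        for c, w in enumerate(row):
            d = euclidean(w, x)
            if d < best_d:
                best, best_d = (r, c), d
    return best

# A 5 x 5 map for samples described by 19 variables: 25 neurons x 19 weights.
som = make_map(5, 5, 19)
```

The length check in best_matching_unit makes explicit the point made in the text: a neuron can only be compared with a sample if it carries exactly one weight per measured variable.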

Sampling and preliminary analyses
Ten sampling points were selected in the upper Rio Doce River Basin based on accessibility and on visual inspection of the water bodies. Areas with brown-coloured waters were preferred because they could indicate higher levels of dissolved organic matter. The study area comprised parts of the eastern QF and is shown on the map in Figure S1 (Supplementary Information). The physico-chemical parameters pH, temperature (T), total dissolved solids (TDS), redox potential (ORP), resistivity (Resis), conductivity (Cond) and turbidity (Turb) were measured in situ using multiparameter equipment (Ultrameter II, Myron L Company) and a turbidimeter (DM-TU, Digimed), both previously calibrated.
Cwa and Cwb climates (Köppen classification) occur in the study area.18,19 Both climates are characterised by a dry winter. Where the Cwa climate occurs, the dry period is between April and September and the rainiest months are November and December. Where the Cwb climate occurs, the dry season is between May and August and rainfall is concentrated mainly between November and February. For this reason, some points were sampled more than once, at different times of the year, to evaluate possible seasonal effects. However, one sampling at point 1 could not be carried out because the swamp was completely dry during one winter field trip, so no water could be collected at that time. Based on the data available in the literature,18,19 the dry season in this work was considered to run from April to September and the rainy season from October to March. About 1 L of water was collected from accessible spots at the water bodies (banks or bridges) for the determination of sulfate, chloride (Cl) and alkalinity (Alc). These samples were kept refrigerated until laboratory analysis, which followed the standard methods of the American Public Health Association (APHA).20 About 40 mL of water was collected for metal analyses and about 20 mL for DOC analyses. These samples were filtered through 0.45 µm membranes and kept refrigerated at 4 °C until analysis in the laboratory. As described by Grasshoff, Kremling and Ehrhardt,21 storing samples in plastic containers can interfere with the results of carbon analyses. Therefore, the collected waters were kept in amber glass bottles, which also prevents light-induced changes in the humic material. After filtration, the samples for metal analysis were acidified with 3-4 drops of concentrated HNO₃ to keep the metals in solution. All reagents used in this work were of analytical grade.
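The seasonal convention adopted above (dry: April to September; rainy: October to March) is easy to apply consistently when labelling samples by collection month. A minimal Python sketch follows; the function name season is hypothetical, and the rule encodes only this work's convention, not the climatic definitions of Cwa and Cwb.

```python
# Months 4 (April) to 9 (September) inclusive are treated as the dry season.
DRY_MONTHS = range(4, 10)

def season(month: int) -> str:
    """Return 'dry' or 'rainy' for a calendar month (1 = January),
    using the convention adopted in this work."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return "dry" if month in DRY_MONTHS else "rainy"
```

For example, season(6) returns "dry" for the June 2011 sampling of point 3 (S3B), and season(3) returns "rainy" for the March sampling of S2B.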

Metal analyses
The metal analyses were performed in the Laboratory of Environmental Geochemistry (LGqA) at the Federal University of Ouro Preto (UFOP). The metals were determined by inductively coupled plasma optical emission spectrometry (ICP-OES; Spectro, Ciros CCD model) in radial mode. The generator output power was 1250 W, the pump rate 2 mL min⁻¹, the plasma gas flow 12 L min⁻¹ and the nebuliser gas flow 0.90 L min⁻¹. Argon was used for all gas flows.
In all cases, calibration was performed using analytical-grade standard stock solutions and was checked against the international reference material NIST 1643c. The elements determined were the major elements Al, Ca, Fe, K, Mg, Mn, Na and Ti, and the trace elements As, Ba, Be, Cd, Co, Cr, Cu, Li, Mo, Ni, P, Pb, S, Sc, Sr, V, Y and Zn.

DOC analyses
The DOC analyses were performed at the Federal Fluminense University (UFF, Niterói, Brazil) with a TOC-Analyser V-CPH (Shimadzu, Japan). The method involved determining the total dissolved carbon (TDC) and the dissolved inorganic carbon (DIC) of the samples; the DOC was then obtained as the difference between the two values (TDC - DIC). For the determination of TDC, the samples were introduced into a combustion tube filled with an oxidation catalyst and heated to 680 °C. All carbon in the sample is thereby converted into CO₂, which is finally detected by a non-dispersive infrared (NDIR) cell.
DIC was measured with the same equipment after acidifying the samples with HCl to pH below 3, which converts all carbonates to CO₂. This CO₂ was then purged from the sample by bubbling air or nitrogen and detected by the NDIR cell.
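The difference method amounts to simple arithmetic on the two measured quantities. A minimal Python sketch, assuming both results are expressed in mg C L⁻¹ for the same filtered sample; the function name doc_by_difference is hypothetical, and any values passed to it are illustrative rather than data from Table 3.

```python
def doc_by_difference(tdc_mg_l: float, dic_mg_l: float) -> float:
    """DOC (mg C/L) as TDC - DIC, both measured on the same filtered sample."""
    if tdc_mg_l < 0 or dic_mg_l < 0:
        raise ValueError("concentrations must be non-negative")
    doc = tdc_mg_l - dic_mg_l
    # A negative difference means DIC exceeded TDC, which points to a
    # measurement or calibration problem rather than a real DOC value.
    if doc < 0:
        raise ValueError("DIC exceeds TDC; check the measurements")
    return doc
```

Because DOC is obtained by difference, its uncertainty combines those of the TDC and DIC measurements, so the estimate becomes less precise when DIC is large compared with DOC.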

Multivariate analyses
The technique used for exploratory analysis in this work was the Kohonen neural network. The aim of this method was to reduce the number of dimensions to be analysed while preserving the relevant original information, in order to facilitate the observation and interpretation of the results. Some of the metals determined had to be excluded from the Kohonen analysis because their concentrations were below the limit of quantification (LOQ). The data set was therefore organised as a matrix of 16 samples (16 rows) and 19 variables (19 columns). The rows represent the samples collected at the sampling points, and the columns represent the values of pH, DOC, Cond, Alc, ORP, T, Turb, Resis, TDS, Cl, Ba, Ca, Fe, K, Mg, Mn, Na, S and Sr.
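Before SOM training, each column of this 16 × 19 matrix is autoscaled (mean-centred and divided by its standard deviation). A minimal Python sketch of that step follows. It uses the sample standard deviation (n - 1 denominator), which is an assumption because the text does not specify the denominator; the three-sample matrix is hypothetical and only illustrates how a variable on a 0-1000 scale dominates Euclidean distances until the columns are scaled.

```python
import math
import statistics

def autoscale(matrix):
    """Autoscale a samples x variables matrix column by column: subtract each
    column mean and divide by the column sample standard deviation (n - 1)."""
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    if any(sd == 0 for sd in sds):
        raise ValueError("constant columns cannot be autoscaled")
    return [[(row[j] - means[j]) / sds[j] for j in range(n_cols)] for row in matrix]

def euclidean(a, b):
    """Euclidean distance, the metric used by the SOM to compare vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Hypothetical data: variable 0 spans hundreds of units, variable 1 spans 0-1.
raw = [[100.0, 0.9], [900.0, 0.8], [500.0, 0.1]]
scaled = autoscale(raw)

# Unscaled, the distance between samples 0 and 1 is set almost entirely by
# variable 0; after autoscaling both variables contribute on a comparable scale.
print(euclidean(raw[0], raw[1]), euclidean(scaled[0], scaled[1]))
```

After autoscaling, every column has mean 0 and standard deviation 1, so no variable dominates the map merely because of its measurement units.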
Before processing by the SOM algorithm, the entire data set was autoscaled, i.e., each variable was mean-centred to zero and scaled to unit variance. Scaling is of vital importance in the Kohonen network because its algorithm uses the Euclidean metric to measure distances between vectors. If one variable ranges between 0 and 1000 and another between 0 and 1, for instance, the first will virtually dominate the organisation of the map because of its large impact on the distances. Hence, in most cases it is recommended that all variables be given equal importance. This pre-processing ensures that all variables have the same weight, allowing the user to assess the significance of each variable for the samples. Since the variables investigated in this work are different physical and chemical measurements, scaling the data is essential.12
The Kohonen maps were created and initialised linearly. In this process, the eigenvalues and eigenvectors of the data were calculated, and the weight vectors of the map were then initialised along the eigenvectors of the covariance matrix with the largest eigenvalues, their number being given by the dimensionality of the map (generally 2). The Kohonen neural network was trained using the batch training algorithm, in which the entire data set is presented to the map before any adjustment of the weights is made. A Gaussian neighbourhood function, a hexagonal lattice and a planar map shape were used in training.12
At the end of the process, a map was obtained that shows the grouping of the samples and the influence of the variables. Lighter colours in the neurons indicate higher values of a given variable, and darker colours indicate lower values. The neurons of the variable maps were compared with the neurons of the sample map to evaluate which parameters influence a given sample. During training, architectures of several sizes (from 2 × 2 to 6 × 6) were tested, and the architecture that gave the best (most informative) distribution of the samples into groups was chosen.
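The initialisation and training choices listed above (linear initialisation along the two leading eigenvectors, batch updates, Gaussian neighbourhood, planar hexagonal lattice) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the software used in this work: the eigenvectors are obtained by power iteration with deflation, the leading eigenvector is assigned to the column axis of the map, the neighbourhood width follows a hypothetical decreasing schedule (sigmas), the input X is assumed to be autoscaled already, and all function names are illustrative.

```python
import math

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def hex_coords(rows, cols):
    """Planar hexagonal lattice: odd rows shifted by half a unit, rows
    sqrt(3)/2 apart. Neuron index m = r * cols + c."""
    return [(c + 0.5 * (r % 2), r * math.sqrt(3) / 2)
            for r in range(rows) for c in range(cols)]

def covariance(X):
    """Sample covariance matrix (n - 1 denominator) and column means."""
    n, p = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(p)]
    C = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in X) / (n - 1)
          for j in range(p)] for i in range(p)]
    return C, mu

def leading_eigenpairs(C, k=2, iters=500):
    """Largest k eigenpairs of a symmetric matrix via power iteration + deflation."""
    p = len(C)
    C = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        v = [float(j + 1) for j in range(p)]  # arbitrary non-zero start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(p) for j in range(p))
        pairs.append((lam, v))
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs

def linear_init(X, rows, cols):
    """Spread the initial weights over the plane of the two leading eigenvectors,
    scaled by the square roots of their eigenvalues."""
    C, mu = covariance(X)
    (l0, v0), (l1, v1) = leading_eigenpairs(C, k=2)
    s0, s1 = math.sqrt(max(l0, 0.0)), math.sqrt(max(l1, 0.0))
    W = []
    for r in range(rows):
        for c in range(cols):
            a0 = -1.0 + 2.0 * c / (cols - 1) if cols > 1 else 0.0
            a1 = -1.0 + 2.0 * r / (rows - 1) if rows > 1 else 0.0
            W.append([mu[j] + a0 * s0 * v0[j] + a1 * s1 * v1[j]
                      for j in range(len(mu))])
    return W

def bmu(W, x):
    """Index of the best-matching unit (closest weight vector) for sample x."""
    return min(range(len(W)), key=lambda m: sqdist(W[m], x))

def batch_train(X, rows, cols, sigmas=(2.0, 1.5, 1.0, 0.75, 0.5)):
    """Batch SOM: in each epoch every weight becomes the Gaussian-neighbourhood
    weighted mean of all samples, using BMUs found with the previous weights."""
    coords = hex_coords(rows, cols)
    W = linear_init(X, rows, cols)
    p = len(X[0])
    for sigma in sigmas:
        bmus = [bmu(W, x) for x in X]
        new_W = []
        for m in range(len(W)):
            num, den = [0.0] * p, 0.0
            for x, b in zip(X, bmus):
                h = math.exp(-sqdist(coords[m], coords[b]) / (2.0 * sigma ** 2))
                den += h
                num = [nj + h * xj for nj, xj in zip(num, x)]
            new_W.append([nj / den for nj in num] if den > 0.0 else W[m])
        W = new_W
    return W
```

For the data in this work, X would be the autoscaled 16 × 19 matrix with rows = cols = 5. The map of a single variable j (one panel of Figure 3) then corresponds to [w[j] for w in W] laid out on the lattice.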
The software used for the Kohonen neural network analysis is freely available on the internet.22
For comparison, PCA was also performed on the same data set using the computing environment GNU Octave 3.6.4, freely available at http://www.gnu.org/software/octave/; before processing, the entire data set was autoscaled.

Results and Discussion
The locations of the sampling points, the sampling seasons and the types of water body sampled are shown in Table 1. Based on lithological data and maps of the region, Table 2 was compiled; it shows the stratigraphic units, the rocks in the region of the sampling points and some elements likely to be present in the sampled waters. Points 9 and 10 were sampled in an environmental protection area (Private Reserve of Natural Heritage of Caraça). The results of the parameters measured in the field and in the laboratory are shown in Table 3.
A Kohonen map with a hexagonal lattice was obtained from the data set in Table 3. Of the architectures evaluated (from 2 × 2 to 6 × 6), the 5 × 5 arrangement with 25 neurons gave the best distribution of samples on the map. After the Kohonen analysis, four groups could be distinguished; they are circled in Figure 2. Samples located at the same neuron or at neighbouring neurons form groups with similar characteristics. The maps of the variables are shown in Figure 3, where the greyscale bars beside the maps indicate the intensity of each parameter. Lighter colours in these bars correspond to higher values and hence to a greater importance of that variable in the formation of the groups.
Figures 2 and 3 show that K, temperature and Cl were the parameters that brought samples S1B, S3A and S5 together to form group I. These parameters were important because, within the data obtained in this exploratory study, they had their highest values in the samples of group I.
Although Cl was not measured for sample S3A, the SOM algorithm estimates missing values during training, so the behaviour of the missing value can be inferred.25
Potassium is a lithophile element that participates in the formation of silicates such as feldspars and micas (biotite and muscovite), which are mineral constituents of rocks such as gneisses and schists. Muscovite and biotite (which contain K in their structure) also occur in quartzites. Considering the lithotypes of the studied region (Table 2), the presence of K in the waters therefore indicates a lithological contribution of this element. The higher K values in group I may be explained by the fact that samples S1B, S3A and S5 were collected in the rainy season (Table 1), when K is leached more intensely by heavy precipitation.
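One common way for SOM implementations to cope with a missing entry, such as the unmeasured Cl value of sample S3A, is to compute the best-matching-unit distance over the observed variables only and then read the missing value off the weight vector of that unit. The Python sketch below assumes that strategy (the text does not state which one the software used) and marks missing values as None; the function names and the two-neuron map are hypothetical.

```python
def masked_sqdist(w, x):
    """Squared Euclidean distance using only the observed entries of x (None = missing)."""
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x) if xi is not None)

def bmu_with_missing(weights, x):
    """Index of the best-matching unit, ignoring missing components of x."""
    if all(xi is None for xi in x):
        raise ValueError("sample has no observed values")
    return min(range(len(weights)), key=lambda m: masked_sqdist(weights[m], x))

def fill_from_bmu(weights, x):
    """Replace each missing entry of x by the corresponding weight of its BMU."""
    w = weights[bmu_with_missing(weights, x)]
    return [wi if xi is None else xi for wi, xi in zip(w, x)]

# Hypothetical 2-neuron map with two autoscaled variables; the second
# variable of the sample was not measured.
weights = [[0.0, 0.0], [1.0, 1.5]]
print(fill_from_bmu(weights, [0.9, None]))
```

Under this assumption, the estimate for the missing variable is only as good as the similarity between the sample and its best-matching unit on the variables that were measured.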
In the environment, Cl can originate both from rock weathering and from human activities, via sewage discharges.26 In all samples, Cl concentrations were very low, ranging from 0.50 to 4.66 mg L⁻¹. As the rocks of the region do not contain Cl, the Cl is probably of atmospheric and/or biogenic origin (plants and animals). Samples S1B and S8 may be subject to some anthropogenic influence owing to the nearby village of Santa Rita Durão. Livestock breeding could influence sample S2B, since this activity is common in the region and was observed during the field trips. These three points had the highest Cl concentrations, above 4 mg L⁻¹.
In the southern hemisphere, summer (December to March) is the period with the highest precipitation in the studied area.19 Consequently, Cl and K can be leached more intensely by rainwater, resulting in higher concentrations. Together with temperature, this explains the higher values in group I. The second sampling of point 3 (S3B) plots in group II because it was carried out in the dry season (June 2011, Table 1). Figure 3 shows that DOC and Fe have their highest values in neurons at the same location of the map. These higher values influenced the positions of the samples in group II and suggest a positive relationship between Fe and DOC in the samples analysed in this paper. This could indicate complexation between the two, since Fe binds effectively to organic matter, especially humic substances.27,28 Although this is a mining region, some elements expected to be present in larger quantities were below the LOQ (e.g., As and Cu, with LOQ values of 57.7 µg L⁻¹ and 4 µg L⁻¹, respectively). One explanation could be that the waters were collected in areas without evident anthropogenic influence, including environmental protection areas (points 9 and 10). Consequently, it was not possible to determine with the Kohonen neural network whether these elements are related to the dissolved organic matter in the samples.
DOC values in the samples ranged from 0.72 to 3.88 mg L⁻¹. The highest DOC values were found in a swamp (sample S3B) and in a stream (sample S9A), located in the Itatiaia and Caraça Mountain Ranges, respectively. The lowest value was found in Brumado Stream (sample S8).
During the second sampling in Caraça (sample S9B), the DOC concentration was only 2.2 mg L⁻¹, much lower than in the first sampling (3.83 mg L⁻¹). A possible explanation is that this sampling took place at the end of the dry season, after more than 4 months of low average precipitation (between about 36 and 42 mm month⁻¹).19 As a result, the water level in aquifers and the amount of water in soils were probably low at that time (base-flow conditions), so less humic material was carried into the water bodies of the region by the direct response to rainfall (surface runoff and interflow). The DOC concentration was accordingly lower, especially considering that the main source of DOC at that time was probably groundwater, which has lower concentrations of dissolved organic matter than the uppermost soil layers.29 In contrast, the first sampling was performed at the end of the rainiest months (mainly November to February), with precipitation exceeding 210 mm month⁻¹.19 More water was therefore available and more humic material was leached into the streams. DOC concentrations decrease under base-flow conditions, although other factors may also have influenced the results.29
At point 1 (samples S1A and S1B), Table 3 shows that DOC levels were higher at the beginning of the rainy season (sample S1A) than at its end (sample S1B). One hypothesis is a dilution effect following the drying out of the water body in the dry season, as observed during the third field trip to this point. A large amount of DOC is probably carried into the swamp by the first rains at the beginning of the rainy season; this organic matter could originate from the degradation of organisms during the dry winter period. After the rainiest months (November to February), the DOC concentration decreases by dilution, and the organic matter accumulated during the dry season has by then been almost completely decomposed. This would explain the decrease of DOC at point 1 at the end of the rainy season. In this respect, Steinberg29 reports that at the onset of the rainy season the DOC concentration rises rapidly with discharge.25
Group II was formed by samples with higher values of DOC, Fe, Resis and ORP. At point 1 (especially sample S1A), the higher Fe contents are explained by the presence of banded iron formations (BIF), which are frequently covered by canga (the Brazilian name for a ferruginous breccia surface formation consisting of hematite fragments cemented by goethite), and by phyllites and schists. However, the Fe content was lower in sample S1B, which could be explained by the dilution effect described above. Sample S2C belongs to this cluster because of its high redox potential, which was the dominant factor in the composition of group II.
All samples in group III had more alkaline waters than those in the other groups, which indicates that pH was the predominant variable in the formation of this group (Sr was also important). Being a large group, it shows some heterogeneity among its samples. Sample S6, for instance, stood out for its more alkaline water and higher S concentration. The high S value is probably due to sulfide-bearing rocks upstream of this point. The higher alkalinity may be explained by increased carbonate concentrations, as evidenced by the presence of Ca and Mg. These ions may originate from dolomites, which are part of the Piracicaba and Itabira lithological units.
Considering only samples S6 and S7, they had higher concentrations of Ca, Mg and Mn (lighter colours in the bottom-right neurons of these three variables in Figure 3). These values may indicate a common lithological origin, especially because Ca, Mg and Mn are lithophile elements that participate in the formation of dolomites and schists, and these types of rock occur upstream of both sampling sites. These samples also had higher Na values, which, together with Ca, Mg and Mn, were responsible for their higher TDS levels. These five variables explain why samples S8 and S2A lie far from samples S6 and S7 within the same group.
The highest Sr levels were observed in the cluster formed by samples S4A, S4B, S6 and S7. Sr has chemical properties similar to those of Ca and Mg and can replace both elements in their minerals. Consequently, the presence of dolomite in the Itabira and Piracicaba lithological groups and of phyllites and schists in the Nova Lima and Itacolomi lithological groups could explain the presence of Sr in the collected samples. For this reason, the relationship observed among Ca, Mg and Sr may indicate their lithological origin.
Finally, group IV was formed because of the higher Ba concentration in sample S2B. This high Ba value can be explained by increased leaching in the rainy season, when the sample was collected (March). Ba concentrations decreased considerably in samples collected in the dry season or at the beginning of the rainy season, as explained above for other parameters. Temperature was also higher at this point, because the sampling was done in summer.
Considering the seasons of the year, all samples of group I were collected in the rainy season (summer) and showed higher temperatures and higher levels of Cl and K (probably as a result of leaching), as indicated by the lighter colours in Figure 3 and discussed before. Most samples in group II were collected in the dry period (winter); accordingly, this group showed lower temperatures (darker colours in Figure 3) than the other groups. Group III is heterogeneous, with samples collected in both the rainy and the dry seasons. Group IV corresponds to the rainy period, with sample S2B showing higher temperature and higher Ba concentration, as mentioned before.
No relationship was observed between the formation of clusters and the type of water body (swamp, lake, stream or river). This is illustrated by groups II and III, each of which contains samples taken from rivers, streams and lakes.
To evaluate a possible influence of sulfate concentration on the DOC content in this exploratory study, sulfate was analysed in seven selected samples (Figure 4), since S is known to be a component of humic substances.30 However, the small number of analyses did not allow such an influence to be demonstrated. Because sulfate was not measured for most samples, it was not included in the SOM analysis.
Part of the sulfate in the waters of the QF comes from the oxidation of sulfides such as pyrite (FeS₂), which are abundant in rocks such as schists and amphibolites in the region. Known anthropogenic sources of sulfate in surface waters include the discharge of domestic and industrial effluents and the use of coagulants in water treatment.26
For comparison, and to validate and illustrate the applicability of the Kohonen neural network in this study, PCA was performed on the same data set (Table 3). The results are shown in Figures 5 and 6, which present the PC1 × PC2 scores and loadings plots, respectively.
With PCA, 11 PCs were needed to explain 99.31% of the variability in the data set, and the first two components explained only 56.5% of the variance (Figures 5 and 6). To analyse all the information in the data, it would be necessary to examine the principal components in all possible two- or three-dimensional combinations. This would be laborious and would make data evaluation, and consequently data interpretation, considerably more difficult. In this respect, the Kohonen neural network had the great advantage of projecting all the data onto a two-dimensional space without loss of information.
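The comparison above rests on cumulative explained variance. A minimal Python sketch of that bookkeeping follows: given the eigenvalues of the covariance matrix of the autoscaled data (that is, the correlation matrix, whose eigenvalues sum to the number of variables), it returns the fraction of variance carried by each PC and the smallest number of leading PCs that reaches a target fraction. The function names are hypothetical, and the eigenvalues used in the check are illustrative, not those of this data set.

```python
def explained_variance(eigenvalues):
    """Fraction of total variance explained by each PC, largest first."""
    total = sum(eigenvalues)
    return [lam / total for lam in sorted(eigenvalues, reverse=True)]

def n_components_for(eigenvalues, target):
    """Smallest number of leading PCs whose cumulative explained
    variance reaches target (a fraction between 0 and 1)."""
    cumulative = 0.0
    for k, frac in enumerate(explained_variance(eigenvalues), start=1):
        cumulative += frac
        if cumulative >= target:
            return k
    return len(eigenvalues)
```

With hypothetical eigenvalues [4.0, 3.0, 2.0, 1.0], the PCs explain 40%, 30%, 20% and 10% of the variance, so two PCs suffice for a 65% target but all four are needed for 95%.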
Figure 5 shows the samples separated into seven groups. The samples of groups IV, V and VI lie in the same (bottom-left) quadrant; the same samples formed group II in the Kohonen analysis (Figure 2), showing agreement between the two methods. Samples S9A, S9B and S10 lie close together in both methods, indicating similarities among them. Group I in the PCA scores plot (Figure 5) consists of the same sample (S2B) as group IV in the Kohonen map (Figure 2). These similarities between the two methods support the robustness and reliability of the Kohonen network for exploring the data. Figures 2 and 5 do not show exactly the same relationships because the PC1 × PC2 scores plot represents only slightly more than half of the data variability.
In the loadings plot (Figure 6), Fe and DOC lie close to each other, indicating some similarity, although they are in different quadrants. The same argument as for the scores plot applies here, since not all of the variance in the data set is represented. The relationships among DOC, Fe, ORP and Resis in the Kohonen map (Figure 3) are also reflected in the loadings plot (Figure 6).
In general, the main tendencies of the data set (the greatest similarities or differences among samples or variables, and the influence of variables on samples) were shown by both the Kohonen neural network and PCA, although the PCA plot could not represent all of the variance, whereas the Kohonen neural network expressed it efficiently in a two-dimensional space.

Conclusions
The multivariate exploratory data analysis using the Kohonen neural network was effective in this study, particularly because it allowed easy, user-friendly interpretation of data with complex nonlinear relationships. Furthermore, it separated all samples into groups with similar characteristics while reducing a high-dimensional space to two dimensions. This was one of its main advantages over PCA applied to the same data set, for which it was necessary to work in a multidimensional space (11 PCs), making the results harder to analyse. Nevertheless, both methods showed similar relationships in the data set.
A positive relationship between DOC and Fe was observed in the Kohonen neural network, possibly indicating complexation between the two, as described in the literature. Some influence of seasonality on the distribution of the samples was also noticed, since some groups in the Kohonen map were formed according to sampling period (rainy or dry). In addition, some samples collected at the same point in different seasons fell into different groups because of leaching or dilution effects. Relationships among some elements in the Kohonen maps, notably Ca, Mg, Mn and Sr, indicated a contribution from the lithology of the studied area. Chloride may be partly of biogenic origin, since the rocks in the studied region do not contain Cl.
Further studies will be needed to measure and characterise the dissolved organic matter in the area and to fully understand its role in element cycling in the QF, especially considering the impacts of mining. Nevertheless, this study provided a screening of the evaluated area, particularly regarding the concentrations of DOC and of some metal ions. In addition, the first use of the Kohonen neural network with chemical data from this area showed that it is a promising technique that may help to analyse a variety of environmental data with complex interdependencies in an easy way.

Figure 1. Representation of the typical architecture of the Kohonen neural network.

Figure 2. Map of groups of samples (natural waters) obtained by Kohonen neural network.

Figure 3. Maps of the distribution of individual variables obtained by the Kohonen neural network. The colour bars indicate the value of each measured variable: the lighter the colour, the higher the value.

Figure 4. Concentration of sulfate in selected water samples from the Quadrilátero Ferrífero.

Figure 5. Scores plot on PC1 and PC2 in the study of dissolved organic matter and metal ions in waters from the eastern Quadrilátero Ferrífero, Brazil.

Figure 6. Loadings plot on PC1 and PC2 in the study of dissolved organic matter and metal ions in waters from the eastern Quadrilátero Ferrífero, Brazil.

Table 1. Location, type of water body and period of sampling in the studied area

Table 2. Stratigraphic units and lithology of the studied area; geological data from Dorr, Alkmim and Marshak.23,24

Table 3. Results of the physico-chemical parameters and metals analysed in natural waters of the upper Rio Doce River basin (Quadrilátero Ferrífero)
a Standard deviation calculated from replicate analyses was less than 10%; b given in mg CaCO₃ L⁻¹; c ND: not determined; d LOQ: limit of quantification; e DOC: dissolved organic carbon; f Alc: alkalinity; g Cond: conductivity; h ORP: redox potential; i Turb: turbidity; j Resis: resistivity; k TDS: total dissolved solids.