Combining indicators for better decisions – Algorithms vs experts on lakes ecological status assessment

The results of ecological condition assessments of ecosystems inform key decisions on remedial measures or on maintaining the current state. In the assessment process, experts encounter extensive datasets whose quality and completeness do not always allow for a reliable evaluation, especially if a single empirical approach is used. In this paper, results of machine learning algorithms are presented, with a focus on Self-Organizing Maps. Measurements of the component parameters used to assess the ecological state of lake ecosystems were subjected to unsupervised machine learning with the aim of creating an alternative assessment approach based on the capabilities of neural networks. Results are mapped and compared with expert evaluations, which extends knowledge about sub-clusters present in the data. The primary target of this paper is the ecological assessment expert. At an early stage, information was obtained about the presence of ecological outliers that may be subject to separate monitoring or verification of environmental activities and objectives. In the back-mapping process, the presented technique of map construction and clustering with various versions of the division was referenced to a set of expert classification findings, revealing the underlying structure of the results when approached through unsupervised analysis of the data. The approach introduced here is not intended to interfere with the format of the original assessment methodology. Rather, it aims at obtaining useful additional information which may help in making better decisions.


Introduction
Maintaining and enhancing the ecological status of water ecosystems is a particularly difficult issue faced by policymakers today (Derakhshannia et al., 2020;Fattahi Nafchi et al., 2021;Geist and Hawkins, 2016;Ostad-Ali-Askari and Shayannejad, 2021;Saarikoski et al., 2018). One of the main concerns of applied ecology is of an empirical nature. At the same time, sufficiently large datasets are becoming increasingly available in many cases (Peters et al., 2014;Zhang and Chen, 2017). Researchers and government agencies need access to a variety of tools that are best suited for analyzing these critical datasets. Many of the difficulties related to lake water quality may be solved using complicated, time-consuming and costly physical models of the environment (Cosgrove and Loucks, 2015;Ostad-Ali-Askari et al., 2017;Salehi-Hafshejani et al., 2019). As data scientists become more involved in the study of environmental concerns, machine learning models may increasingly have a role to play not just in ecological state modelling, but also in the development of density and point projections. This is very important when engaging in a process of knowledge based decisionmaking (Blair et al., 2019;Hadjimichael et al., 2016;Kanevski et al., 2004). Whilst successful ecological modelling does not require the use of structural models, it necessitates the completion of data pre-processing that underpins reduced-form, robust methods (Alamelu Mangai and Gulyani, 2020;Chrobak et al., 2021).
The European Union's Water Framework Directive (WFD) establishes operational definitions for assessing ecological conditions, for defining management goals, and for harmonizing the ecological assessment systems of EU Member States (Hendry, 2017). The WFD aims for all lakes in the European Union to reach 'good' ecological condition in the near future. In this context, the evaluation of particular aquatic assemblages, referred to as biological quality components, is used to determine 'good' ecological status (Nõges et al., 2009). Phytoplankton, aquatic flora (including macrophytes, macroalgae, and phytobenthos), benthic invertebrates, and fish are all parts of these components (Kolada et al., 2016). Moreover, especially in the case of lacustrine ecosystems, the measured parameters include levels of phosphorus, nitrogen, chlorophyll a or water conductivity (Bhateria and Jain, 2016;Javadinejad et al., 2019;Ostad-Ali-Askari et al., 2018;Pirnazar et al., 2018). The WFD deems biological quality elements (BQE) to be of good ecological quality if they deviate from near-natural reference conditions only minimally (Europe Environment Agency, 2018). Member States are responsible for periodically analyzing certain BQEs in order to categorize the quality of their water bodies into one of the five WFD classes: high, good, moderate, poor, or bad. In many cases, bioindication technologies are used to minimize the costs of water quality monitoring (Golian et al., 2020;Holt and Miller, 2011;Kohlmann et al., 2018;Navabpour et al., 2018;Vanani et al., 2017). Assessments are used not only to highlight the current level of water contamination but also to categorize water quality as a foundation for environmental and economic considerations (Hu et al., 2018;Javadinejad et al., 2021;Kishimoto and Ichise, 2013;Liu et al., 2016;Talebmorad et al., 2020). Thus, they are useful not only in practical terms, e.g. for water management and establishing environmental solutions, but also in terms of revealing how various factors impact ecosystem characteristics such as population composition, biological productivity, and overall ecological fitness. Environmental flows are increasingly being incorporated into freshwater ecosystem conservation management methods. The impacts of various hydrologically based flows on fish, macroinvertebrates, and macrophytes were investigated by adding artificially modified flow and discharge variations (Kuriqi et al., 2021, 2019;Pander et al., 2019). Various models for estimating the ecosystem status of lakes coexist in different parts of Europe, using different methodologies (Frassl et al., 2019;Menshutkin et al., 2014;Mooij et al., 2010;Vinçon-Leite and Casenave, 2019). Some of them are directly reliant on the WFD's criteria being implemented. In the current monitoring of aquatic environments, two types of data are available: (1) chemical, physical, morphometric, and climatically relevant variables labelled generally as environmental data, and biotic data such as the number, abundance, and biomass of algal, invertebrate, and aquatic plant species; as well as (2) bioindication groups represented by the number of species in each, and indices calculated for aquatic communities and their environments such as PMPL, ESMI, IOJ, and many others (Alizamir et al., 2021;Legendre, 2018;Lepš and Šmilauer, 2006;Zhang et al., 2021). Due to the comprehensive approach to assessment, the popularity of methods based on environmental mapping is also growing, especially in the basin approach. However, such solutions are time-consuming and hard to interpret. The multitude of approaches and available indicators, which often correlate because they use similar (or the same) variables, generates a number of problems, e.g. during data assimilation procedures (Beven, 2018;Mclaughlin et al., 2006;Wu et al., 2014).
According to existing research, there are over 300 aquatic ecological evaluation methodologies in use across Europe (Poikane et al., 2014). One of the main tasks today is to provide a methodology for intercalibrating existing models (Birk et al., 2013;Lyche Solheim et al., 2019;Mooij et al., 2010). Despite progress in standardizing methodologies, decisions in different countries are made on the basis of locally developed solutions (Arhonditsis et al., 2019;Liu et al., 2008;Moss, 2008;Paruch et al., 2017;Sojka et al., 2019). Some of them, based on the framework defined in ecological status classes, do not sufficiently cope with the presence of outliers. Here, activities should be conducted that take into account their specific conditions (Cristóbal et al., 2014;Díaz Muñiz et al., 2012;Jackson and Chen, 2004;Lin et al., 2020). Differentiation within classes, low sensitivity to outliers, as well as deficiencies in the sets of measurements negatively impact decisionmaking for remedial action. This is also a problem when monitoring the achievement of environmental goals of lake ecosystems (Liu et al., 2019;Park and Hwang, 2016). Shortcomings like these make it difficult to reach a compromise between science and practice. As will be discussed further below, the rapidly developing areas of machine learning and artificial intelligence can address some of these difficulties.
Machine learning techniques are used in applied water ecology, related to the assessment of the quality of ecosystems. This applies primarily to case studies, where machine learning methods have been demonstrated to outperform standard statistical approaches for a range of ecological indicators (Bui et al., 2020;Ebron et al., 2020;Li et al., 2021;Liu et al., 2019;Ostad-Ali-Askari et al., 2019;Sahaya Vasant and Adish Kum, 2019;Singh et al., 2021;Vinçon-Leite and Casenave, 2019). Such techniques are robust especially in situations when measured ecological data are known to be non-linear and highly dimensional, with strong interaction effects. Whilst methods that assume linearity are unable to cope with interaction effects they are still being used. Frequently some modifications are introduced for aiding their performance. Among the methods used to date, approaches focused on forecasting the ecological state of lakes, using decision trees, random forests, support vector machines, k-nearest neighbors, and artificial neural networks, deserve particular attention (Chen et al., 2020;Hadjisolomou et al., 2021;Hrnjica and Bonacci, 2019;Mellios et al., 2020). Only a little research has been devoted to retrospective analysis, which aims at being an alternative to testing existing methods and producing algorithms for assessing the current ecological state of lakes (Gophen, 2021;Klinard et al., 2019;Moges et al., 2017). This is particularly important in the context of measures that aim at saving endangered ecosystems, where decisions are made on the basis of an already completed evaluation, or that are based on the assessment of long-term activities carried out over several ecological classification processes. 
In these cases, decision support based on existing data is particularly useful: machine learning algorithms act as additional tools that support experts making decisions under the high uncertainty and complexity of the assessment process (Uddin et al., 2021).
The main aim of this paper is to provide support for current methods of assessing the ecological status of lakes, so as to avoid changing the approach developed among experts. The proposed solution may serve as a tool supporting the work of the expert team in the selection of priority water bodies in terms of the implementation schedule of remedial measures in the next period of the water management plan. The unsupervised classifier is introduced as a parallel tool to the expert assessment process in this paper, which is a novel approach in ecological state assessment of lakes. This allows for obtaining information supporting the prioritization of objects for remedial actions in a situation when the number of ecological status classes included in the source methodology does not allow for a case study for each water body. An additional advantage of the solution is the ability to adjust the class factor to the internal (relative) range of the analyzed data, which places the prioritization result in the context of the dataset, while maintaining the higher-order division resulting from the WFD provisions. In this context, the information capacity of previous measurement campaigns will be kept. Our approach was constructed using available measurement data from four measuring campaigns. Based on the outcomes of experts' work following WFD standards, an ecological status classifier for lake ecosystems was built in the study underlying this paper. The Self Organizing Map model used in this approach functions as an unsupervised data reinterpreter, creating subgroups of ecologically similar objects without disturbing the original class structure.
The remainder of the paper is organized as follows. First, the dataset and its pre-processing are described. This is followed by the introduction of methods, along with an explanation of the general analytical scope and a description of the specific machine learning-based method. The supporting classifier that has been created is introduced next, along with a description of the different model options. Results are summarized in the discussion section, including the difficulties encountered as well as the relevance of the study findings. Finally, the usefulness of the findings is explained and suggestions are made for future study.

Data and pre-processing
The Chief Inspectorate of Environmental Protection in Poland provided the input data for the study underlying this paper. The same repository is used as the one which is the basis for reporting to the European Commission concerning environmental monitoring (GIOŚ, 2015). Due to the fulfillment of the acquis communautaire obligations in the field of surface water monitoring and evaluation, measurements are carried out in accordance with the requirements set out in national as well as European Union legislation (Mantzouki et al., 2018). The data collected cover the ecological state assessment results supplied with data on 496 lakes in Poland from 2010 to 2015. Ecological State Macrophyte Index (ESMI), chlorophyll a, conductivity, Diatom Index for Lakes (IOJ), nitrogen, phosphorus, phytoplankton, Phytoplankton Method for Polish Lakes (PMPL), and visibility were measures used to determine the ecological state of lakes during an assessment conducted with expert analyses.
The dataset was pre-processed by means of correlation analysis and missing data imputation (Chrobak et al., 2021). These procedures were performed to construct a model reproducing the course of expert assessment, so that in the future it could be an auxiliary tool, especially in terms of re-evaluation of ecosystems when specific remedial actions are to be implemented. The previously published research contains the formulas and a formal record of the applied calculations, as well as the definitions employed (Chrobak et al., 2021). The pre-processed dataset used as input to this paper is available in raw format as Appendix A. Moreover, Appendix B contains an R language script that converts all of the analysis procedures in this paper into executable, repeatable code. The research was conducted with the use of software providing: data visualization (Tableau 2019.1.19, https://www.tableau.com/), data modelling (R 3.6.1 via RStudio 1.2.5033 "Orange Blossom", https://www.r-project.org/, https://rstudio.com/), and diagram development (draw.io 14.1.8, https://www.diagrams.net/). The list of all acronyms used in this work is presented below in alphabetical order:
ANN - Artificial Neural Network
BMU - Best Matching Unit
BQE - Biological Quality Elements
ESMI - Ecological State Macrophyte Index
EU - European Union
IOJ - Diatom Index for Lakes
SOM - Self Organizing Map
WFD - Water Framework Directive

Self-Organizing maps (SOM)
Self-Organizing Maps are a type of artificial neural network (ANN) algorithm that, in this study, was trained using an unsupervised learning process to produce a two-dimensional, discretized representation of the input space of training samples obtained from measured ecological parameters of lake ecosystems (Kohonen, 1998). Thus, here, the SOM acts primarily as a dimensionality reduction method (Thrun and Ultsch, 2020). Moreover, by grouping related features together, the SOM also performs clustering (Vesanto and Alhoniemi, 2000). As a result, the algorithm used dimensionality reduction to cluster a nine-dimensional dataset, which is depicted using feature maps. By competing for representation, each object in the dataset was recognized in the context of inter-set similarity and placed accordingly on the map (Kohonen, 2013). The initialization of the weight vectors for neurons was the first stage in the mapping process. Then, a sample vector was chosen at random and the map of weight vectors was searched for the weight that best describes that sample. Each weight vector had neighboring weights in its immediate vicinity. The chosen weight was rewarded by being allowed to become more similar to the randomly picked sample vector. The neighbors of that weight were rewarded as well, as they were also allowed to become more like the chosen sample vector. This allowed the map to expand and take on new forms during the training process (Kohonen, 1990). The algorithm used in this study consisted of the following steps, which are detailed further in the following subsections (following Ali Hameed et al., 2019):
Step 1. The node weight vectors of the map were distributed randomly.
Step 2. An input target vector D(t) was selected at random.
Step 3. Each node in the map was examined:
Step 3.1. The Euclidean distance formula was used to find the similarity between the input vector and the weight vector of the map node.
Step 3.2. The node that generated the shortest distance was found and tagged as the best matching unit (BMU).
Step 4. The weight vectors of the nodes in the BMU's neighborhood (with the BMU included) were updated by drawing them closer to the input vector:

$$W(t+1) = W(t) + \Theta(t)\,L(t)\,\bigl(D(t) - W(t)\bigr)$$

where W is the current weight vector of the node, t is the current time step, and L is the learning rate in the form of an exponential decay function:

$$L(t) = L_0 \exp\left(-\frac{t}{\lambda}\right)$$

for t being the current time step, L_0 the initial learning rate, and λ acting as a constant depending on the selected number of iterations. Θ is the influence rate, which decides the size (radius) of the neighborhood; as the number of iterations grows, the size gradually shrinks until reaching 0 at the end of the training:

$$\Theta(t) = \exp\left(-\frac{E_{dist}^{2}}{2\sigma^{2}(t)}\right)$$

with E_dist being the Euclidean distance between each pair of neurons and σ representing the radius size.
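The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the R script from Appendix B: the grid is rectangular rather than hexagonal, the names `train_som` and `find_bmu` and all parameter values are hypothetical, and the radius is assumed to decay exponentially with the same time constant as the learning rate.

```python
import math
import random


def find_bmu(weights, x):
    """Steps 3.1-3.2: node whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows=3, cols=3, n_iter=500, l0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM and return {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: random initial weight vectors.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    lam = float(n_iter)  # assumed time constant tied to the iteration count
    for t in range(n_iter):
        x = rng.choice(data)                     # Step 2: random input vector
        bmu = find_bmu(weights, x)               # Step 3: best matching unit
        lr = l0 * math.exp(-t / lam)             # learning rate L(t)
        sigma = sigma0 * math.exp(-t / lam)      # assumed radius decay
        for node, w in weights.items():          # Step 4: neighborhood update
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            theta = math.exp(-grid_d2 / (2 * sigma ** 2))  # influence rate
            for i in range(dim):
                w[i] += theta * lr * (x[i] - w[i])
    return weights


# Toy two-dimensional data standing in for normalized lake measurements.
lakes = [[0.10, 0.10], [0.15, 0.20], [0.90, 0.85], [0.80, 0.90]]
som = train_som(lakes)
```

Because every update is a convex step toward an input vector, the weights stay within the range spanned by the initial values and the data, which is one reason normalized inputs are convenient here.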
To measure the quality of the developed map, the node counts metric was used, which provides the count of objects (lakes) mapped to each node (Fig. 1). In order to facilitate the next steps, and to avoid the occurrence of 'empty nodes', the sample distribution should be relatively uniform. Moreover, the grid was checked for nodes with high counts, the presence of which would imply that a larger map might be advantageous (Tatoian and Hamel, 2018). The grid was created with 10x10 hexagonal nodes, which made it possible to obtain an average of close to 5 observations per node. The adopted mesh did not contain 'empty nodes'; however, nodes 1, 16, and 29 contained only 1 element each. The maximum number of objects in a node (12) was found in node 26. At the stage of constructing the map, a neighbor distance analysis was performed. It is illustrated by the so-called 'U-Matrix', which shows the distances between each node and its neighbors (Fig. 2) (Stefanovič and Kurasova, 2011). Low neighbor distances (<5) pointed to groups of similar nodes, whereas distances above 5 indicated dissimilarities between object groups.
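Both diagnostics can be illustrated with a short Python sketch. It is a simplified stand-in for the plots in Figs. 1 and 2: it assumes a rectangular grid with four neighbors per node (the study used a hexagonal grid), averages rather than sums the neighbor distances, and the function names and the tiny 2x2 codebook are hypothetical.

```python
import math
from collections import Counter


def node_counts(weights, data):
    """Count how many objects are mapped to each node (its BMU)."""
    counts = Counter({node: 0 for node in weights})
    for x in data:
        bmu = min(weights, key=lambda n: math.dist(weights[n], x))
        counts[bmu] += 1
    return counts


def u_matrix(weights):
    """Mean Euclidean distance from each node's weights to its grid neighbors."""
    u = {}
    for (r, c), w in weights.items():
        candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        nbrs = [weights[n] for n in candidates if n in weights]
        u[(r, c)] = sum(math.dist(w, v) for v in nbrs) / len(nbrs)
    return u


# Hypothetical 2x2 codebook and three toy observations.
codebook = {(0, 0): [0.0, 0.0], (0, 1): [0.0, 1.0],
            (1, 0): [1.0, 0.0], (1, 1): [1.0, 1.0]}
observations = [[0.1, 0.1], [0.2, 0.0], [0.9, 1.0]]

counts = node_counts(codebook, observations)
empty_nodes = sorted(n for n, k in counts.items() if k == 0)
distances = u_matrix(codebook)
```

Nodes listed in `empty_nodes` would suggest that the grid is too large for the sample, while very high counts would suggest the opposite.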

Variables distribution in SOM
In the initial phase of the analysis of the parameters measured for the lakes, their distributions across the grid were traced on the previously defined map (Fig. 3). This made it possible to trace distribution patterns and hence to determine the nature of the relationships between the indicators. In some cases (e.g., IOJ-visibility, phosphorus-PMPL), the observed results provide a basis for reducing the dimensionality of the dataset. Similarly, some variables show a noticeable inverse relationship, which in the case of the PMPL-ESMI or phosphorus-IOJ pairs may affect the formation of the resulting map. At this level of the investigation, it is already possible to indicate the locations where lakes with a particularly low (upper left corner) or high (lower right corner) ecological status will be concentrated. This is intuitive at this early stage because the output of the default algorithm (which depicts the normalized version of the dataset) was reaggregated to represent the variables from the original measurement set (Qian et al., 2019). As a result, the output is scaled to the true values of the eco-state variables. Both the size of the clusters and the class distribution for lakes that could fall into any of the categories between the extreme classes are unknown at this stage. It was therefore decided to develop a weight vector map prior to the model training in order to make pattern identification easier (Fig. 4) (Chaudhary et al., 2014). The findings of the map's weighted juxtaposition of variables were afterwards used to help explain why a specific object (or group of objects) acted as an outlier in the final SOM (Stefanovic and Kurasovay, 2018).
The patterns of the heatmaps can be seen again, with relatively large proportions of chlorophyll a, nitrogen, phosphorus, and PMPL readings dominating the left side of the map. On the opposite side of the map, the IOJ, ESMI, and visibility indicators dominate (higher visibility values indicate lower water turbidity in lakes). ESMI has a scale that spans from 0 to 1, with 1 signifying a high level of ecological status. The interaction of conductivity-IOJ-ESMI (upper right-hand corner) and conductivity-PMPL-IOJ (lower left-hand corner) has also shaped some distinguishable locations. There were also lakes with relatively low values across the whole cross-section of measurements, for which single markers were critical for a shift in one of the major directions. The shape of the data segmentation was determined in the following phases so that the model's results could be compared to the categorization presented in the original dataset (Wehrens and Kruisselbrink, 2018).

Model training results
The training dataset consisted of normalized measurements that were one of the outcomes of a prior study's approach (see 2.1 'Data and pre-processing'). The data were kept in the form of a matrix, with the result of the original classification hidden from the algorithm. Thanks to this, an unsupervised version of SOM could be implemented. The aim of subsequent iterations of the training stage was to reduce the distance from each node's weights to the samples represented by that node (Stefanovič and Kurasova, 2011). Ideally, this distance should reach a minimum plateau (Nuhoǧlu and Yildirim, 2018). The mean distance decreased in a series of noticeable drops. The series reached a turning point at iteration 8,364, when the mean distance steadied at around 0.013, with the lowest value being 0.0012. The training process was terminated after 10,000 iterations, once local stabilization of the curve was detected as the number of repetitions increased (see Figs. 5-8).
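The monitored quantity and a simple plateau check can be sketched as follows. This is an illustrative sketch only: the helper names, the window length, the tolerance, and the example history are assumptions and do not reproduce the training curve reported above.

```python
import math


def mean_bmu_distance(codebook, data):
    """Mean Euclidean distance from each sample to its closest weight vector."""
    return sum(min(math.dist(w, x) for w in codebook) for x in data) / len(data)


def has_plateaued(history, window=3, tol=1e-3):
    """True when the mean of the last `window` values differs from the mean
    of the preceding `window` values by less than `tol`."""
    if len(history) < 2 * window:
        return False
    recent = sum(history[-window:]) / window
    previous = sum(history[-2 * window:-window]) / window
    return abs(previous - recent) < tol


# Hypothetical codebook with two weight vectors and two samples.
example_distance = mean_bmu_distance([[0.0, 0.0], [1.0, 1.0]],
                                     [[0.0, 0.0], [1.0, 0.0]])

# Made-up distance histories: one still falling, one that has flattened out.
still_falling = [0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
flattened = [0.5, 0.3, 0.2, 0.021, 0.020, 0.020, 0.020, 0.021, 0.020]
```

A check of this kind only detects local stabilization, which matches the caution in the text: a flat stretch of the curve does not guarantee that further iterations would bring no improvement.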

Clustering and segmentation
The clustering was performed with the use of the k-means algorithm (Yang et al., 2012). The total within-cluster sum of squares was considered for 1 to 15 possible clusters. In order to detect the optimal number of clusters for the considered set of observations, the classic elbow method was used (Ghayekhloo et al., 2015). Three clusters were found to be optimal. In addition, the result of an alternative way of determining the number of clusters, the average silhouette, which indicated two clusters, was examined (Susilowati et al., 2020). In this research, the results of the techniques for determining the appropriate number of clusters played a supportive role. They were utilized as anchoring points for the segmentation method, ranging from two to five clusters (as recommended by the elbow technique), allowing the set's unsupervised segmentation to be traced back to the number of classes corresponding to the initial categorization of lake ecological state (five classes). Thus, it is possible to assign a value in the selected range (e.g. 1 to 3 or 1 to 5) to each of the objects as a measure accompanying the expert result.
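The elbow curve can be illustrated with a small, self-contained k-means sketch. It is a toy illustration rather than the R procedure used in the study: the deterministic initialization (evenly spaced points from the input list), the function names, and the six example points are all assumptions made for reproducibility.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wcss(points, k, n_iter=50):
    """Run Lloyd's k-means and return the total within-cluster sum of squares."""
    n = len(points)
    # Assumed deterministic start: k points evenly spaced through the list.
    centers = [list(points[int(i * n / k)]) for i in range(k)]
    groups = []
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(centers[c], p))
            groups[nearest].append(p)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(sq_dist(centers[j], p) for j, g in enumerate(groups) for p in g)


# Toy data with three well-separated groups.
points = [[0, 0], [0, 1], [10, 0], [10, 1], [20, 0], [20, 1]]
elbow_curve = [kmeans_wcss(points, k) for k in (1, 2, 3)]
```

For these points the curve drops sharply up to k = 3, which is the pattern the elbow method looks for. In practice the curve would be computed on the SOM codebook vectors for the full range of candidate cluster counts.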
In the case of segmentation with the use of two clusters, nodes containing lakes with a particularly bad ecological condition (original class: very bad) clearly stand out (Fig. 7a). However, the cluster that separates these objects from the rest is not homogeneous. The objects were assigned to a common group on the basis of the results of the measurements of nitrogen, PMPL and phosphorus. In this way, 16 objects were separated from the original 'very poor' class, creating a separate group of lakes. The three-class option proposed by the elbow algorithm produced a cluster separating a subset of lakes initially included in the 'good' and 'very good' classes, a total of 188 items (Fig. 7b). They are primarily distinguished by high visibility, IOJ, and ESMI measurements. The four-cluster version is the first to go beyond the framework imposed by the methods for determining the number of clusters (Fig. 7c). Here, the cluster of lakes with poor ecological condition that was separated in the early stage of modeling (when k = 2) was split. The three lakes therefore formed their own category, although they retained the original categorization, which was based mostly on the 'very bad' category (except for one case). The fourth split sequence, which resulted in a total of five clusters, was used to refer to the initial set of scores (Fig. 7d). Hence, the results of unsupervised SOM on the dataset served as notes indicating the position of the object in one of the meta classification options. This stage allowed for the identification of lakes with higher shares of results for PMPL, conductivity and IOJ, thus separating a valuable subset for the classification results inheriting features from the original 'very poor' class.
Fig. 1. The count of objects mapped to a given node. The average node has ca. 5 objects. The main goal was to prevent the occurrence of empty nodes and also to avoid creating nodes with large values compared to those existing in the entire grid.
Fig. 2. The unified distance matrix shows the Euclidean distance between each node and its neighbours. The dissimilarities are visible in the upper left part of the matrix, pointing to the potential boundaries within the studied set of lakes.
An interesting case is the early separation (when k = 2) of an object originally belonging to the 'poor class' and an assignment to a cluster grouping lakes with a particularly poor ecological condition, visible in the resulting diagram of the process of back-mapping the model results to the original set of classes. Case study analysis appears to be an acceptable strategy in situations like these to detect the source of the original expert choice at an early stage of future analyses, where the given assessment could be supported by conditions outside the scope of the measured indicators. Also, in such cases, human error or shortcomings in the evaluation methodology cannot be excluded. Nevertheless, even if the in-depth analysis of the case study does not show significant arguments in favor of a correction in the primary classification, the authors propose, in accordance with the approach used in this work, to leave the object in its original class, and assign to it the attribute of high risk of transition to a lower class in the absence of remedial actions taken until the next evaluation.
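The back-mapping of cluster labels to expert classes can be sketched as a simple cross-tabulation. The following Python sketch is illustrative only: the class labels, cluster numbers, and the rule for flagging an object (its cluster differs from the most common cluster of its expert class) are assumptions, not the procedure documented in Appendix B.

```python
from collections import Counter, defaultdict


def cross_tab(expert_classes, cluster_ids):
    """Count objects for every (expert class, cluster) pair."""
    return Counter(zip(expert_classes, cluster_ids))


def flag_atypical(expert_classes, cluster_ids):
    """Indices of objects whose cluster is not the most common cluster
    among objects sharing their expert class."""
    by_class = defaultdict(Counter)
    for cls, cl in zip(expert_classes, cluster_ids):
        by_class[cls][cl] += 1
    majority = {cls: c.most_common(1)[0][0] for cls, c in by_class.items()}
    return [i for i, (cls, cl) in enumerate(zip(expert_classes, cluster_ids))
            if cl != majority[cls]]


# Hypothetical expert classes and k = 2 cluster labels for seven lakes.
expert = ["bad", "bad", "bad", "poor", "poor", "poor", "good"]
clusters = [1, 1, 1, 1, 2, 2, 2]

table = cross_tab(expert, clusters)
atypical = flag_atypical(expert, clusters)
```

In this toy example the fourth lake (index 3) is flagged because it carries the expert class 'poor' but falls into the cluster dominated by 'bad' lakes, which mirrors the kind of case singled out for a closer case study.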
Obtaining solid information regarding the need to prioritize remedial measures in lakes that were identified at an early stage as a subgroup with an exceptionally low ecological state is one of the outcomes of the SOM development process. The issue of selecting lakes in terms of recommended pro-ecological practices also applies to cases with the potential for promotion from the moderate to a good group, where, based on the classification of similarity in the cluster, there are 8 such sites. Thus, the direction of corrective actions is to be indicated by:
a) analysis of the results contained in heatmaps: reduction of uncertainty related to the lack of measurement results by recognizing the patterns of shaping the results in the entire dataset,
b) detection of cumulative impacts in the dimensionally reduced dataset (vector maps) along with the definition of groups of factors influencing the ecological state of the lake,
c) identification of ecologically similar lakes within clusters, which enables the selection of specific remedial methods, if they were used (or function in the current water management program), and also helps in assessing their potential effectiveness,
d) prioritization supported by classification methods in the field of machine learning to obtain additional information resulting from the analysis of variables in n-dimensional terms.

Discussion
Based on measurements supplied for 497 lakes in Poland, the decision support potential of a machine learning-based unsupervised ecological state classifier for lake ecosystems is investigated. We proposed the use of the Self Organizing Maps algorithm as a tool complementing the results of the methodology used in expert research. Classification support is important in cases where the used capacity of the original class, resulting from the provisions of the WFD, does not allow for selecting subgroups of ecologically similar lakes. As shown by the example from this study, it was possible to separate from the area of "very poor" and "poor" classes sixteen objects forming a subgroup of lakes with a particularly bad ecological condition, with similar measurement results identified in the space reduced to two leading dimensions. In the subsequent stages of class segmentation, substructures of the set of observations were revealed. This gave a premise for the use of the solution as decision support in the process of prioritizing the use of remedial action groups in ecosystems. This means that before a lake reaches its target ecological status, the logical goal is a qualitative transfer to the subgroup that has the best chance of doing so in the future evaluation. A program of evaluation of actions taken on lakes that, retrospectively, were ecologically similar, may be helpful in effectively proposing specific actions aimed at the identified problems, and then, in the next campaign, change the subgroup to "more favorable".
Fig. 6. The evaluation of distance between objects assigned to each cluster plotted against the number of clusters. An inflection points to three clusters as the optimal solution (k = 3). Taking into account that the selection of the number of clusters depends on the number of map nodes, the versions with 2, 4 and 5 clusters were also taken into account.
The detection of ecological outliers at an early stage (k = 1) clearly indicates a possible approach to prioritizing ecosystems in the context of remedial actions within water management programs. The use of the number-of-clusters parameter k at the selection stages makes it possible to separate groups of outliers in the initial phase, and then repeat the function on the set reduced by these objects. Therefore, the given number of clusters is constant in the process and results from the adopted segmentation methodology. Lakes are classified until the sample is exhausted. Individual groups indicated in subsequent iterations can then be assigned a prioritization level corresponding to the n-th iteration in which the classifier indicated the outlier subset. The grouping conditions at each stage can be checked and interpreted on the basis of heatmaps or weight vector maps.
One of the main issues connected with the presented method concerns the quality of the input data (Flexer, 1997). In order to build a map, a value is needed for each dimension of each sample member. This is a limiting aspect of the usage of SOMs, sometimes referred to as the missing data problem, because it is not always possible, and is frequently very difficult, to obtain all of the required data. Another issue is that each SOM is unique and discovers distinct patterns in the sample vectors (Kohonen, 2013). SOMs organize sample data so that, in the end result, samples are generally surrounded by similar samples; nevertheless, similar samples are not necessarily close to each other. For instance, if there are hues of a given color in the map, the clusters may occasionally divide, resulting in two (or more) groupings with the same color (Halgamuge and Wang, 2005). The final issue with SOMs is that they are computationally costly. This is a substantial disadvantage: as the dimensionality of the data grows, dimension reduction and visualization techniques become more essential, but the time required to calculate them grows as well (Liu et al., 2006).
Fig. 7. a) result map with two clusters (k = 2) showing a group of lakes with a particularly low ecological status (upper left corner), also separating similar objects (3 lakes), creating a cluster that is not contiguous on the map surface; b) the map version with three clusters (k = 3), selected as optimal by the "elbow method", does not affect the structure of the cluster created in the previous example, but the main area of the map has been divided, creating two new subclasses, the continuity of which is disturbed by three objects in the upper right-hand corner of the map; c) in the case of a solution with four clusters (k = 4), the cluster created in the first approach (k = 2) was separated by dividing a subset of lakes with a low ecological status into two separate subgroups. On the other hand, the map fragment created in version 1 remains unchanged, when k = 3; d) the five-cluster version (k = 5) was created to reflect the original ecological status class division. The newly created division included a group of lakes intermediate between the ecologically worst and the average subgroup, including in this cluster three objects disturbing the continuity of the division in versions k = 3 and k = 4.

Despite the indicated limitations, the solution has the advantage of high information capability, which makes the results easy to understand and thus to interpret (Qu et al., 2021; Wehrens and Kruisselbrink, 2018). Moreover, the algorithms efficiently categorize data and then assess their own quality, allowing for the estimation of how good a map is and how strong the similarities between items are (Oprea et al., 2020; Yotova et al., 2021). Thanks to this, the solution meets the requirements of process transparency and, with conscious use, the results can be consulted between experts and policymakers on the basis of the intuitive visualizations presented in this paper (Chung et al., 2018; Khamassi et al., 2006; Paini et al., 2010). The presented method highlights the need for more effective and transparent visual communication between experts, society and decision makers. This facilitates the process of public consultation on the results of updated water management plans (Tokarczyk-Dorociak et al., 2019). The positive impact of consciously improving communication skills on the course of the environmental assessment process has been widely discussed in the literature (Few, 2006; Murchie and Diomede, 2020; Ståhl and Kaihovirta, 2019; Xiong et al., 2020). In addition, from the analytical point of view, the results obtained from the SOM analysis can be used to predict the ecological status of lakes by combining the non-linear map representation with linear statistical forecasting methods for each homogeneous sub-group to improve prediction accuracy.
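The self-assessment of map quality mentioned in this discussion can be illustrated with one commonly used measure, the quantization error, i.e. the mean distance between each sample and its best-matching map unit. The discussion does not state which quality measure was applied, so its use here is an assumption, and the function name, samples and weight vectors in the Python sketch below are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def quantization_error(samples: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    """Mean Euclidean distance from each sample to its nearest weight vector."""
    if not samples or not codebook:
        raise ValueError("samples and codebook must both be non-empty")
    total = 0.0
    for x in samples:
        # Distance to the best-matching unit (closest weight vector).
        total += min(math.dist(x, w) for w in codebook)
    return total / len(samples)


# Hypothetical two-dimensional samples and map weight vectors.
samples = [(0.0, 0.0), (3.0, 4.0)]
codebook = [(0.0, 0.0), (0.0, 4.0)]
qe = quantization_error(samples, codebook)
print(qe)
```

A lower value indicates that the weight vectors represent the samples more closely. The measure says nothing about how well the map preserves neighborhood relations, which would need a separate indicator.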

Conclusions
The solution suggested in this paper aims at supporting existing expert systems for assessing the ecological condition of lakes by using an unsupervised machine learning algorithm in the form of a Self-Organizing Map. Presenting the set of measurements as heatmaps makes it possible to trace intuitively how the variables are shaped and how they relate to one another. Moreover, the use of a weight vector map increases the interpretability of the measurements in a map reduced to two dimensions.
In this paper, the introduced procedure of map design and clustering with different versions of the division was referred to the set of expert classification results in the back-mapping process, showing the underlying structure of the results when treated with an unsupervised data comprehension approach. The need to optimize the selection of groups of pro-ecological actions, based on a mix of empirical facts and the advantages supplied by latent knowledge, led to the extension of the classic method with an additional, supportive prioritization step within decision making. Although this is a step toward increasing the informative capacity of lake ecological condition evaluation, we recognize the necessity of addressing the problem of data efficiency, for example in instances of data shortage, which results in significant, and frequently uncontrollable, uncertainty in the generated findings. With this in mind, the next step is to research the assimilation of measurement data on lake ecosystems in order to minimize the impact of identified quality deficiencies in the data on the further stages of the evaluation procedure.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 8. Results of back-mapping the segmentation obtained with Self-Organizing Maps to original ecological classes. Already in the case of the map with k = 2, there is a visible separation within the "very bad" class, which persists in subsequent runs of the model (only deepening the division). Thanks to the division into successive segments, it is possible to assign lakes (groups of lakes) to individual subtypes within one primary class.
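The back-mapping summarized in Fig. 8 amounts to cross-tabulating the expert classes against the cluster labels. The Python sketch below is illustrative only: the function names `back_map` and `split_classes` are hypothetical, and the class and cluster assignments are invented toy values, not results of this study.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence

Table = Dict[Hashable, Dict[Hashable, int]]


def back_map(expert_classes: Sequence[Hashable], cluster_labels: Sequence[Hashable]) -> Table:
    """Count, for every expert class, how many lakes fall into each cluster."""
    if len(expert_classes) != len(cluster_labels):
        raise ValueError("one expert class and one cluster label per lake are required")
    table = defaultdict(Counter)
    for cls, lab in zip(expert_classes, cluster_labels):
        table[cls][lab] += 1
    return {cls: dict(counts) for cls, counts in table.items()}


def split_classes(table: Table) -> List[Hashable]:
    """Expert classes whose lakes are spread over more than one cluster."""
    return [cls for cls, counts in table.items() if len(counts) > 1]


# Toy assignments for six lakes (hypothetical, k = 2).
expert = ["very bad", "very bad", "very bad", "bad", "moderate", "moderate"]
clusters = [0, 0, 1, 1, 1, 1]
table = back_map(expert, clusters)
print(table)
print(split_classes(table))
```

A class listed by `split_classes` is one in which the segmentation separates lakes that the expert assessment places together, which corresponds to the subtypes within a primary class discussed in the caption.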