Yes, We Can! Large-Scale Integrative Assessment of European Regional Seas, Using Open Access Databases

Substantial progress has been made in assessing marine health in an integrative way. However, managers are still reluctant to undertake such assessments, because of: (i) a lack of indicators; (ii) an absence of targets; (iii) the difficulty of aggregating indicators from different ecosystem components, habitats, and areas; (iv) an absence of criteria on the number of indicators to be used; (v) the discussion on the use of the ‘one-out, all-out’ (OOAO) principle in aggregating; and (vi) a lack of traceability when integrating data. Our objective was, using open access databases with indicators across all the European seas with agreed targets, to demonstrate whether the Nested Environmental status Assessment Tool (NEAT) can be used at the European scale, serving managers and policy-makers as a tool to assess the environmental status under the Marine Strategy Framework Directive (MSFD). We have used MSFD Descriptor D3 (commercial fish), with 341 stocks of 119 species and two indicators for each of them (years 2013-2015); D5 (eutrophication), with the 90th percentile of chlorophyll-a (years 2009-2014); and D8 (contaminants), with anthracene, fluoranthene, naphthalene, cadmium, nickel and lead as indicators (years 2009-2013). We have calculated the environmental status for each European subdivision, subregion and regional sea, nested at different levels. The analyses include weighting and no weighting by assessment area, and results by ecosystem component (water column, phytoplankton, fish, crustaceans and molluscs), descriptor (three), and habitat (pelagic, demersal/benthic), each with the confidence value of the status. A sensitivity analysis was undertaken to determine the minimum number of indicators to include for a robust assessment.
We demonstrated that using NEAT in assessing the status of large marine areas, by aggregating indicators, ecosystem components and descriptors, at different spatial scales, can remove at least four out of the six barriers that managers and policy-makers confront when undertaking such assessments. This can be done by using open-access databases and already established targets. Aggregating indicators of different origin is possible. Around 40 indicators seem to be enough to obtain robust assessments. It is better to integrate the assessment items using an ecosystem-based approach, rather than using the OOAO principle. Using NEAT, this approach supports identifying the problematic environmental issues needing management attention and measures.


INTRODUCTION
Due to increasing human activities and pressures in the oceans (Korpinen and Andersen, 2016; Lotze et al., 2018), some initiatives worldwide intend to assess the status of marine waters in an integrative way, incorporating multiple metrics, indicators and ecosystem components (i.e., from bacteria to mammals), under the ecosystem approach (Inniss et al., 2016). In some cases, the assessments are based upon national, regional, or international legislation (Borja et al., 2008). However, there are currently few methodologies able to assess the status of marine waters under an ecosystem approach. One of the difficulties when applying these integrative assessment methods is finding adequate indicators and targets (Gibson et al., 2000; Borja et al., 2012; Rossberg et al., 2017). In fact, different initiatives are looking for essential biodiversity variables (Pereira et al., 2013) and essential ocean variables (Miloslavich et al., 2018) to be used as indicators in this kind of marine assessment worldwide.
To obtain these indicators or essential variables, publicly available global databases and acquisition methods are being used (Muller-Karger et al., 2018). Hence, the assessment methods which use these databases tend to be attractive for managers and policy-makers, due to their ease of use [e.g., the Ocean Health Index (OHI), Halpern et al. (2012)]. However, some of these methods have been criticized because they evaluate the benefits that oceans provide to humans, rather than assessing the actual health or environmental status (Duarte et al., 2018). Other methods, such as the Nested Environmental status Assessment Tool (NEAT, Borja et al., 2016), were initially developed to assess the status under European legislation [i.e., the Marine Strategy Framework Directive (MSFD), (European Commission, 2008)]. In this case, managers usually complain about the lack of suitable indicators that can be used at large scale (i.e., at regional seas scale, such as the Baltic, Atlantic, Mediterranean and Black Seas), and each Member State can use different indicators, despite the guidance provided by the European Commission (2017). Hence, hundreds of indicators have been listed to be used in the MSFD (Hummel et al., 2015; Teixeira et al., 2016), making any standardized use difficult. In addition, the different indicator aggregation methods, used at different spatial and temporal scales (Gan et al., 2017), can produce different assessment results, even differing status classes (Langhans et al., 2014; Probst and Lynam, 2016). These differences can hinder the use of these integrative assessment methods by managers and policy-makers, despite some recent studies showing the usefulness of these methods in responding to human pressures, both spatially and temporally (Pavlidou et al., 2019).
Hence, although substantial progress has been made in the last few years in assessing marine health in an integrative way at global scale (Inniss et al., 2016), managers are still reluctant to undertake such assessments, for different reasons: (i) the supposed lack of indicators that can be used at large scale (Hummel et al., 2015; Teixeira et al., 2016), including their rigorous testing and validation (Moriarty et al., 2018); (ii) the absence of suitable reference conditions or targets for those indicators; (iii) the difficulty of aggregating indicators from different ecosystem components, habitats, areas, etc. (Langhans et al., 2014; Probst and Lynam, 2016); (iv) the absence of criteria on the number of indicators to be used for an adequate assessment; (v) the discussion of whether integration should be done using the "one-out, all-out" (OOAO) principle (Borja and Rodríguez, 2010), in which the worst status at any level (indicator, ecosystem component, assessment unit, etc.) determines the global status, or whether the integration can be undertaken following other principles; and (vi) the lack of traceability of the problems coming from different human pressures, if data are integrated to obtain a single value.
Taking this into account, our objective is, using open access databases which include indicators across all the European regional seas with agreed targets, to demonstrate whether an aggregation method, such as NEAT, can be used at the European scale, serving managers and policy-makers as a tool to assess the environmental status under the MSFD. Hence, our primary aim is not to determine an actual environmental status for Europe or each of the regional seas, but to help remove the six barriers identified above, which managers and policy-makers face when confronted with such an assessment task.

NEAT Description
NEAT is free software 1, which has been applied in different geographical areas, inside and outside Europe (Nemati et al., 2017; Pavlidou et al., 2019). Its principles are: (i) indicators, which constitute the basis of the assessment, and need a range of values and a target (i.e., the boundary between good status and non-good status); (ii) weighting and hierarchies: its central principle is a hierarchical, nested structure of Spatial Assessment Units (SAUs) and habitats, avoiding the dominance of certain indicators, habitats or SAUs by using a proper weighting procedure, which considers what information is available for different real spatial scales; (iii) aggregation: indicators and boundaries are normalized into a scale of 0-1, independently of their original scale, and aggregation is done across all indicators belonging to a SAU; (iv) visualization: the aggregation is expressed as a number (NEAT value) and a color, which corresponds to the status (i.e., high, good, moderate, poor, and bad). The NEAT value is obtained for the whole assessed area, but can be visualized at different spatial scales, for different ecosystem components (e.g., fish, water column, etc.), or habitats; and (v) confidence: each NEAT value is accompanied by a quantitative estimate of the confidence of the result.
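The normalization in principle (iii) can be illustrated with a minimal Python sketch. It is not the NEAT implementation: the function names, the piecewise linear interpolation and the class limits on the 0-1 scale (0.2, 0.4, 0.6, 0.8, with 0.6 as the good/moderate target) are assumptions made for illustration, although the 0.6 limit is consistent with the NEAT values and statuses reported in the Results.

```python
from bisect import bisect_right

# Assumed class limits on the normalized 0-1 scale (not taken from NEAT itself).
SCALE = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
CLASSES = ["bad", "poor", "moderate", "good", "high"]


def normalize(value, boundaries):
    """Map an indicator value onto 0-1 by piecewise linear interpolation.

    `boundaries` lists six values from worst to best:
    worst, poor/bad, moderate/poor, good/moderate (target), high/good, best.
    They may increase (e.g. B/Bmsy) or decrease (e.g. a contaminant).
    """
    if boundaries[0] > boundaries[-1]:
        # Lower is better: flip signs so that the sequence is increasing.
        value, boundaries = -value, [-b for b in boundaries]
    # Clamp values that fall outside the worst-best range.
    value = min(max(value, boundaries[0]), boundaries[-1])
    # Locate the class interval that contains the value.
    i = min(bisect_right(boundaries, value) - 1, len(boundaries) - 2)
    lo, hi = boundaries[i], boundaries[i + 1]
    fraction = (value - lo) / (hi - lo)
    return SCALE[i] + fraction * (SCALE[i + 1] - SCALE[i])


def status(score):
    """Return the status class of a normalized score."""
    return CLASSES[min(bisect_right(SCALE, score) - 1, len(CLASSES) - 1)]
```

With hypothetical boundaries such as `[20, 9, 3, 1, 0.5, 0]` for a contaminant, a value equal to the target maps to the good/moderate limit, whatever the original unit of the indicator.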

Indicators Selected, Reference Conditions, and Thresholds
From the many indicators proposed for the MSFD (Teixeira et al., 2016), only a few are available at large scale and as open access for an exercise such as the one proposed here, which aims to demonstrate the possibility of using NEAT at European scale within the MSFD. This directive is structured into 11 qualitative Descriptors (D) (European Commission, 2008), which include biodiversity (D1), alien species (D2), commercial fish (D3), food-webs (D4), eutrophication (D5), seafloor integrity (D6), hydrography (D7), contaminants in waters and sediments (D8), contaminants in seafood (D9), noise (D10) and litter (D11).
For commercially exploited fish and shellfish (D3), some of the indicators used to determine the environmental status are fishing mortality and spawning stock biomass (European Commission, 2017). The most recent data (years 2013-2015) on 397 European stocks are available in Froese et al. (2018), for each of the regional seas, subregional seas and subdivisions. From these, fishing pressure (F/Fmsy; F, fishing pressure; msy, maximum sustainable yield) and stock biomass (B/Bmsy; B, stock biomass) data for 341 stocks, corresponding to 119 species, have been considered in our investigation (obtained from the Supplementary Material in Froese et al., 2018, and corresponding to FAO fishing areas). The initial number of stocks was reduced because: (i) some stocks with data from 2 years were averaged, and (ii) Celtic Seas and Rockall stocks were merged, since they appear in the MSFD as a single subregion. For each fish stock, and the two associated indicators for each species (this means 238 indicators), mean and standard error values were calculated for each SAU (see Figure 1 for SAUs), using the above time period.
For eutrophication (D5), one of the indicators used across all Member States (e.g., within the Water Framework Directive) is the 90th percentile of chlorophyll-a, calculated using a period of 5 years (European Commission, 2018). In order to calculate this indicator, we used a web tool 3 to extract chlorophyll-a data from 87 different locations distributed through the different subregions and subdivisions considered (Figure 1). These chlorophyll-a data are extracted from different Copernicus marine products depending on the regional sea considered (Table 1). In the case of the Barents Sea, the web tool does not provide data for calculation, and in the Norwegian Sea it does so only for the southern part. Then, the 90th percentile of chlorophyll-a was calculated for each location, using the period between 2009 and 2014. Finally, the mean and standard error values of the 90th percentile of chlorophyll-a were calculated for each SAU.
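The two-step summary described above (a 90th percentile per location, then a mean and standard error per SAU) can be sketched as follows. This is an illustration only: the location names and chlorophyll-a values are hypothetical, and the choice of a linearly interpolated percentile and of a standard error computed as the sample standard deviation divided by the square root of n are assumptions, since the text does not specify them.

```python
from math import sqrt
from statistics import mean, quantiles, stdev


def p90(values):
    """90th percentile, linearly interpolated between data points."""
    return quantiles(values, n=10, method="inclusive")[8]


def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of a list of values."""
    n = len(values)
    se = stdev(values) / sqrt(n) if n > 1 else 0.0
    return mean(values), se


# Hypothetical chlorophyll-a series (mg m-3) for two locations in one SAU.
series_by_location = {
    "loc_A": [0.4, 0.6, 0.5, 1.2, 0.9, 0.7, 2.1, 0.8, 0.6, 0.5, 1.0],
    "loc_B": [0.3, 0.5, 0.4, 0.9, 0.7, 0.6, 1.5, 0.6, 0.5, 0.4, 0.8],
}

# Step 1: one 90th percentile per location.
p90_by_location = {loc: p90(v) for loc, v in series_by_location.items()}

# Step 2: mean and standard error of those percentiles for the SAU.
sau_mean, sau_se = mean_and_se(list(p90_by_location.values()))
```

The same `mean_and_se` helper would apply to the F/Fmsy and B/Bmsy values of D3 and to the contaminant concentrations of D8, which are summarized per SAU in the same way.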
Member States are required to report on contaminants (D8) in coastal and territorial waters (Reker et al., 2015). These data are included in the databases of the European Environment Agency (EEA) on the status and quality of Europe's transitional, coastal and marine waters 4 . This dataset contains data on many hazardous substances in seawater, and we have selected for the analysis anthracene, fluoranthene, naphthalene, cadmium, nickel, and lead, since they were the only ones present in the datasets of all Member States. We have used data from coastal waters (<1 nautical mile, nm), territorial waters (1-12 nm) and offshore waters (>12 nm), when they were available. A total of 2,841 stations were selected: 231 from the Baltic Sea, 62 from the Black Sea, 406 from the Mediterranean Sea and 2,142 from the North East Atlantic Ocean, for years between 2009 and 2013 (Figure 1). Again, mean and standard error values were calculated using the stations and years within each SAU, for each of the six indicators.
The raw data used in the NEAT calculations, for the three descriptors, can be consulted in Table S1.
To assess the status, all of these indicators need reference conditions (i.e., values for the best and the worst environmental status), targets (the boundary between good and non-good status), and thresholds for the remaining quality classes. The origin of the data and those values can be consulted in Table 2. Reference conditions and class boundaries for D3 (commercially exploited fish and shellfish) were obtained from Froese et al. (2018). In the case of D5 (eutrophication), the boundaries for chlorophyll-a were obtained from the intercalibration exercise undertaken for the Water Framework Directive and published in European Commission (2018). These boundaries depend on the types within the regional and subregional seas. In the Baltic Sea, with several types and countries, we calculated a mean value for the good/moderate and high/good boundaries (Table 2), since the chlorophyll values used in the calculation are for the whole regional sea. In the North East Atlantic Ocean, with many types and boundaries calculated for very coastal areas, we have used the lowest available boundaries, which correspond to Sweden (Table 2). This can be considered a precautionary approach, since the data used in the calculations are from the open ocean and are expected to be much lower than in coastal areas. For the Mediterranean and Black Seas we used the boundaries set for each subregional sea or subdivision (Table 2). NEAT calculates the remaining boundaries, between good/moderate and the worst value, by interpolation. Finally, for contaminants (D8), we used the environmental quality standards proposed by the European Commission (2013) (Table 2). In this case, as the worst values are high, we avoided an automatic linear interpolation between the target value (good/moderate) and the worst value by setting the moderate/poor boundary at three times the target value and the poor/bad boundary at nine times the target value.
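The effect of the D8 choice can be sketched by comparing the two ways of placing the moderate/poor and poor/bad boundaries. The function names and the numbers in the example are hypothetical, and the evenly spaced version is only an assumption about what a single linear stretch between the target and the worst value would imply.

```python
def linear_boundaries(target, worst):
    """Boundaries implied by one linear stretch from target to worst.

    Assumption: the three classes below the target (moderate, poor, bad)
    each take one third of the range between target and worst value.
    """
    step = (worst - target) / 3
    return {"moderate/poor": target + step, "poor/bad": target + 2 * step}


def multiple_boundaries(target):
    """Boundaries set as multiples of the target, as described in the text."""
    return {"moderate/poor": 3 * target, "poor/bad": 9 * target}


# Hypothetical contaminant with a target of 1.0 and a worst value of 100.0
# (arbitrary concentration units).
by_interpolation = linear_boundaries(1.0, 100.0)
by_multiples = multiple_boundaries(1.0)
```

In this hypothetical case, interpolation would keep a concentration of 30 times the target within the moderate class, whereas the multiples place it in the bad class, which is the kind of over-wide moderate class the authors wanted to avoid.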
FIGURE 1 | Map of the European seas, showing the Spatial Assessment Units (SAUs) defined in this study, at different levels (subunits, subregions, regional seas). The sampling stations for contaminants (pink points) and chlorophyll a (green points) are shown. BoB, Bay of Biscay; EU, European Union.
TABLE 1 | Copernicus marine products from which satellite chlorophyll-a data were extracted.

Analysis of Data
NEAT has been calculated for each subunit, subregion and regional sea, and for Europe as a whole, both weighting by the surface area of each of them and non-weighting, i.e., giving the same weight to each of them irrespective of their area. Also, we have calculated the NEAT value for each ecosystem component studied (water column, phytoplankton, fish, crustaceans, and molluscs), each descriptor (D3, D5, and D8), as well as each habitat (pelagic, demersal/benthic). In all cases, the confidence of the NEAT result has been calculated with 1,000 Monte Carlo iterations, using the indicator value along with its associated standard error to repeat the assessment multiple times. The error then propagates through the assessment scheme and leads to different NEAT values, expressed as the percentage of values falling into each of the five quality classes, which shows the probability that the environmental status corresponds to the one obtained.
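The confidence calculation can be sketched as a small Monte Carlo simulation. This is a simplified illustration and not the NEAT code: it assumes that each indicator is drawn from a normal distribution defined by its mean and standard error, that the draw is made on the already normalized 0-1 scale, and that the class limits are 0.2, 0.4, 0.6 and 0.8; the function names are hypothetical.

```python
import random
from bisect import bisect_right

LIMITS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]  # assumed class limits
CLASSES = ["bad", "poor", "moderate", "good", "high"]


def classify(score):
    """Status class of a normalized assessment value."""
    return CLASSES[min(bisect_right(LIMITS, score) - 1, len(CLASSES) - 1)]


def confidence(indicators, iterations=1000, seed=0):
    """Percentage of Monte Carlo runs that fall in each status class.

    `indicators` is a list of (normalized mean, standard error, weight).
    Each run draws every indicator from a normal distribution, clamps the
    draw to 0-1 and takes the weighted mean as that run's assessment value.
    """
    rng = random.Random(seed)
    total_weight = sum(w for _, _, w in indicators)
    counts = dict.fromkeys(CLASSES, 0)
    for _ in range(iterations):
        value = sum(
            w * min(max(rng.gauss(m, se), 0.0), 1.0) for m, se, w in indicators
        ) / total_weight
        counts[classify(value)] += 1
    return {c: 100.0 * n / iterations for c, n in counts.items()}
```

A result close to a class limit, or one built on indicators with large standard errors, spreads its runs over two or more classes, which is why confidence values below 100% appear for some SAUs.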
In addition, we have determined the environmental status using the OOAO principle at different levels: (i) the worst of the values of all indicators; (ii) the worst of the five ecosystem components mentioned above; and (iii) the worst of the three descriptors studied, at the subunit or subregion level and then integrating the worst of them at regional and European levels. Then, we have compared these results with the environmental status obtained using NEAT, calculated weighting and non-weighting by SAU.
TABLE 2 | The source on which they are based is included. F, fishing pressure; B, biomass; msy, maximum sustainable yield. The class boundary values are those included in NEAT, taken from the sources listed; the software calculates the empty cells by interpolation between the other boundaries.
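The OOAO rule used for the comparison reduces to taking the worst status at each step. A minimal sketch, with hypothetical function names and status labels ordered from worst to best:

```python
ORDER = ["bad", "poor", "moderate", "good", "high"]  # worst to best


def ooao(statuses):
    """One-out, all-out: the worst status determines the result."""
    return min(statuses, key=ORDER.index)


def ooao_nested(statuses_by_subunit):
    """Worst status within each subunit, then the worst across subunits."""
    return ooao(ooao(s) for s in statuses_by_subunit.values())
```

For example, `ooao(["good", "high", "moderate"])` gives a moderate status, and a single poor item in any subunit is enough for `ooao_nested` to return poor for the whole region, however many items are in good or high status.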
Finally, we have investigated the sensitivity of the aggregation by undertaking a one-way sensitivity analysis in NEAT, varying only one indicator in the model at a time and examining the impact of that change on the quality status result (Nemati et al., 2018). The NEAT software includes a function that, after an assessment has been run, undertakes the sensitivity analysis automatically. The assessment was first run with all indicators. Indicators were then removed one at a time, selected at random, with 1,000 Monte Carlo iterations run each time an indicator was removed. The analysis continued until there was only one indicator left in the assessment. This makes it possible to identify when an assessment changes from the initial status to a different one, or when the system becomes unstable and alternates between statuses, and hence the number of indicators needed for a robust assessment.
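The removal procedure can be sketched as follows. This is a simplified illustration with hypothetical function names, not the NEAT functionality: it uses the plain mean of normalized indicator scores as the assessment value, assumed class limits of 0.2, 0.4, 0.6 and 0.8, and it omits the Monte Carlo iterations that are run after each removal.

```python
import random
from bisect import bisect_right

LIMITS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]  # assumed class limits
CLASSES = ["bad", "poor", "moderate", "good", "high"]


def classify(score):
    """Status class of a normalized assessment value."""
    return CLASSES[min(bisect_right(LIMITS, score) - 1, len(CLASSES) - 1)]


def removal_path(scores, seed=0):
    """Status after each random removal, down to a single indicator.

    Returns (number of indicators left, status) pairs, starting with the
    full set. The status is the class of the plain mean of the scores.
    """
    rng = random.Random(seed)
    remaining = list(scores)
    rng.shuffle(remaining)  # random removal order
    path = []
    while remaining:
        path.append((len(remaining), classify(sum(remaining) / len(remaining))))
        remaining.pop()
    return path


def min_stable_count(path):
    """Smallest indicator count down to which the full-set status holds."""
    full_status = path[0][1]
    stable = path[0][0]
    for count, status in path:
        if status != full_status:
            break
        stable = count
    return stable
```

Repeating `removal_path` with different seeds would show how much the minimum number of indicators depends on the removal order for a given SAU.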

RESULTS
In the Supplementary Material, several tables present all the analyses undertaken, showing the data as exported from NEAT: Table S2 includes results of ecosystem components, weighting by SAU area; Table S3, the same but for habitat; Table S4, ecosystem components without weighting; and Table S5, the same by habitat. The remaining tables have results weighted by area, referring to each of the MSFD descriptors analyzed: Table S6 shows Descriptor 3, commercial fish; Table S7, the same, but for habitat; Table S8, Descriptor 5, eutrophication; Table S9, Descriptor 8, contaminants in seawater; and Table S10, the sensitivity analysis results.
Tables S2-S9 have been summarized in Table S11. Descriptor 3 (commercial fish) shows the worst status, with all SAUs in poor or moderate status, except the Barents Sea, which is in good status. When looking at each of the ecosystem components of this descriptor, crustaceans are the most affected (12 out of the 18 SAUs are in poor or bad status, mainly due to the observed values for the B/Bmsy indicator of Nephrops norvegicus, Palinurus elephas, Homarus gammarus, Parapenaeus longirostris, Aristeomorpha foliacea, and Pandalus borealis stocks, and for the F/Fmsy indicator of N. norvegicus and Penaeus kerathurus stocks) (Table S1), followed by molluscs and fish (in this case, the only two seas in good status are not MSFD locations: Barents and Iceland Seas) (Table S11). In the case of fish, the worst observed values for the B/Bmsy indicator were registered in the Ionian Sea, the Aegean Sea and Sardinia (stocks of Epinephelus marginatus, Solea solea, Scomber scombrus, Dicentrarchus labrax, Atherina boyeri, Mullus barbatus, Dentex dentex, and Belone belone) and for the F/Fmsy indicator in the North East Atlantic and the Black Sea (stocks of Raja clavata, Trachurus mediterraneus and Scophthalmus maximus) (Table S1). In the case of molluscs, the worst observed values for the B/Bmsy indicator were registered in Sardinia, the Adriatic Sea and Cyprus (stocks of Chamelea gallina, Pecten jacobaeus, Loligo vulgaris, and Sepia officinalis) (Table S1).
Regarding Descriptor 5 (eutrophication, based on chlorophyll-a as a proxy of phytoplankton), most SAUs are in good or high status, except the Black and Baltic Seas, classified as moderate status (Table S11).
In the case of contaminants (Descriptor 8), most SAUs are in good or high status, except the coastal areas (<1 nm) of Balearic Sea and Black Sea, which are in moderate status. In Balearic Sea there are some high concentrations of cadmium and lead, whilst in the Black Sea, in addition to these two metals, there are also high concentrations of anthracene and fluoranthene. When there are data close to the coast (<1 nm), territorial (1-12 nm), and offshore (>12 nm), the quality increases with the distance to the coast, showing higher NEAT values in the open sea than in the coast ( Table S11). The exception is the Baltic Sea, in which the lowest NEAT values were found in intermediate and offshore areas (due to the high concentrations of cadmium and lead) ( Table S11).
When undertaking the calculations by habitat, instead of by ecosystem components, it can be seen (Table S11) that demersal/benthic habitats are in much worse status (25 out of 26 SAUs are in poor or moderate status) than pelagic habitats (14 out of 50 SAUs are in moderate status, the remainder being in good or high status). This is mainly due to the demersal and benthic species in D3, which have a worse status than pelagic ones.
When comparing the integrated NEAT values, weighting or not by the SAU area, it can be seen (Table S11) that weighting generally seems to lead to lower NEAT values in our dataset, with 18 out of the 50 SAUs in moderate status, compared with only 11 out of 50 when no weight was applied.
Some of these results have been represented in Figure 2 and detailed in Table 3 for Europe as a whole, each regional sea and some subregions and subdivisions. For the indicators and descriptors selected, Europe as a whole has a moderate status (NEAT value: 0.586) when weighting by SAU area, and a good status (NEAT value: 0.612) when no weighting is applied. The confidence in the result is 99.9 and 100%, respectively. Regarding regional seas, the Baltic, Mediterranean and Black Seas are also in moderate status (confidence values ranging between 99.3 and 100%), whilst the Atlantic Ocean is in good status (confidence 85.1%), when weighting (Table 3). When no weight is applied, the status of all regional seas except the Black Sea is good (confidence 100%). In the Atlantic Ocean, the subregions not included in the MSFD show lower NEAT values than those within the MSFD, but in both cases the status is good when weighting (confidence 100 and 68.4%, respectively), decreasing to moderate in no-MSFD areas when no weighting is applied (confidence 80.6%). This is a good example of the influence of weighting on the result when using NEAT: 3 out of the 5 SAUs (Faroes, Greenland and Iceland Sea) within the no-MSFD seas are in moderate status, but they have the smallest surfaces, representing only 31% of the total surface of no-MSFD seas (Table 3). Hence, when weighting, these SAUs do not "compensate" for the 69% of the area (Barents and Norwegian Seas) in good status, and the no-MSFD seas result in a good global status. However, when no weighting is applied (the SAU weight is 0.017 for each no-MSFD SAU), the mean NEAT value of the three SAUs in moderate status and the two in good status results in a moderate status.
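The compensation effect described for the no-MSFD seas can be reproduced with a short sketch. The NEAT values (0.52 and 0.70) and the individual area shares are hypothetical and chosen only to respect the 31%/69% split given in the text; the good/moderate limit of 0.6 is an assumption consistent with the values reported above, and `aggregate` is an illustrative helper, not a NEAT function.

```python
def aggregate(values, weights=None):
    """Weighted mean of SAU values; equal weights when none are given."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical values: three small SAUs in moderate status (0.52) and two
# large SAUs in good status (0.70). Area shares follow the 31%/69% split.
values = [0.52, 0.52, 0.52, 0.70, 0.70]
area_shares = [0.10, 0.10, 0.11, 0.34, 0.35]

weighted = aggregate(values, area_shares)   # dominated by the two large SAUs
unweighted = aggregate(values)              # each SAU counts the same
```

With these numbers the area-weighted value stays above the assumed 0.6 limit (good status), whereas the unweighted mean falls just below it (moderate status), mirroring the pattern in Table 3.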
Within the MSFD Atlantic Ocean subregions, only the Bay of Biscay and Iberian coast is in moderate status, the remaining subregions being in good or high status ( Table 3). In the Mediterranean Sea, when weighting, all subregions and subdivisions are in moderate status, except the Aegean Sea, which is in good status (all of them with 100% of confidence). If no weighting is applied, the Western Mediterranean remains in moderate status, but the other subregions are in good or high status, resulting in a good status for the whole Mediterranean Sea (Table 3).
At the global European level, the moderate classification comes mainly from Descriptor 3, with the three ecosystem components fished (crustaceans, molluscs and fishes) in the same status, whilst Descriptors 5 and 8 are in high status (Table 3). This low quality is reflected also in the habitats at the whole European level, since demersal/benthic habitats (in moderate status), most affected by fishing, are in worse status than the pelagic habitats (in good status). When looking at the regional seas level, the pattern is similar in the Atlantic Ocean and the Mediterranean (D3 in moderate status, D5 and D8 in high status), whilst in the Baltic and Black Seas Descriptor 5 (Eutrophication) is in moderate status and Descriptor 8 (Contaminants) is in good status, with D3 (Fishing) in moderate and poor status, respectively, showing a more degraded situation ( Table 3). In addition, the demersal and pelagic habitats of the Mediterranean and Black Seas are in poor status, indicating more pressures in them than in the remaining regional seas.
When comparing the results obtained using NEAT (no weighting and weighting) with the "one-out, all-out" (OOAO) principle at the level of descriptors, ecosystem components and indicators, a clear pattern is observed, with a gradient of improvement from the most conservative method (OOAO at the level of indicators) to the NEAT method without weighting by area (Table 4). Hence, when more indicators are included in the OOAO approach, the assessment shows bad status for most of the subunits and subregions, as well as for all regional seas and Europe as a whole. The only exceptions are those areas in which either fish stock data or contaminant data do not exist (e.g., rest of Macaronesia) or which have a small number of fish indicators (e.g., Norwegian Sea) (Table 4). When applying the OOAO at the level of the five ecosystem components, Europe as a whole is still in bad status, as is the Atlantic, but the Mediterranean and Black Seas are in poor status and the Baltic Sea in moderate status (Table 4). Again, those subunits and subregions with fewer components are more likely to show a better status. Applying the OOAO at the level of the three descriptors used here, the status is the same in the Baltic, the Mediterranean and the Black Seas, but there is an improvement in the Atlantic and in Europe as a whole (Table 4). Using NEAT, the status is moderate in most of the cases when weighting by the surface area, and good if no weighting is applied (Table 4).
Regarding the sensitivity analysis, the raw results can be consulted in Table S10. Using many indicators, the status shown in Table 3 is maintained during the process of removing indicators randomly, until the result becomes unstable (i.e., a permanent change in the status, or changes to different statuses when removing additional indicators). Hence, from the 24 subunits, subregions and regional seas studied: (i) four (Black Sea, Mediterranean Sea, Adriatic Sea, and Barents Sea) needed at least 10-15 indicators of the initial set to achieve the same final environmental status obtained when using all of them; (ii) four (North East Atlantic Ocean, Macaronesia, Aegean Sea and Cyprus) needed between 65 and 235 indicators; and (iii) the remaining sixteen needed between 20 and 40 indicators (Table 5).

DISCUSSION
Despite the substantial progress made in assessing marine health in an integrative way (Inniss et al., 2016), we have identified at least six barriers that managers and policy-makers confront when undertaking such assessments. In the discussion below, we show how these barriers can be removed, based on the results obtained in our analyses.

Lack of Tested and Validated Indicators
There is a contradiction in the fact that, whilst managers claim there is a lack of suitable indicators to assess the environmental status of marine systems and to apply the criteria from the European Commission (2017), plenty of indicators are available to be used (HELCOM, 2010; Pereira et al., 2013; UNEP, 2014; Hummel et al., 2015; Teixeira et al., 2016; Miloslavich et al., 2018). It is true that in some cases there is a lack of rigorous testing and validation (Moriarty et al., 2018), which has been mitigated in recent times in Europe with the intercalibration of some indicators within the WFD (European Commission, 2018). At the same time, there are plenty of indicators (e.g., contaminants, those related to fish stock management) for which a long history of application and development allows us to use them with more confidence (Froese et al., 2018). However, suitable indicators are still needed for descriptors such as D1 (biodiversity) and D4 (food-webs) (Rombouts et al., 2013; Azzellino et al., 2014), despite the efforts made to integrate them in NEAT (Haraldsson et al., 2017).
In this research we have used well-known indicators, available in different open access sources, whose number can be increased relatively easily; e.g., we have used only contaminants in waters, but contaminants in sediments and biota also exist in European databases 5 . Also, information from other MSFD descriptors could be available in coming months, such as litter, biodiversity (from Habitats Directive and MSFD reporting), nutrients from WFD reporting, etc., in databases such as EMODNET 6 . This would make it possible to select a number of new indicators, corresponding to several MSFD descriptors, common across different Member States. For the moment, with the indicators used in our research, we have demonstrated that it is possible to use current open access datasets to assess the environmental status of European marine waters in an integrative way. These databases can be completed in coming months with new indicators suitable for MSFD assessment.

Absence of Suitable Reference Conditions and Targets
There are some discussions at European level on whether reference conditions and targets are needed when assessing the environmental status in the context of the MSFD; however, Borja et al. (2012) highlighted the importance of having such values when a quantitative assessment is going to be undertaken. In the absence of suitable reference conditions (best and worst values, as defined in NEAT) and, at least, target or threshold values, i.e., the boundary between good and not good status, it is not possible to undertake quantitative environmental assessments. In fact, some so-called global integrated marine assessments have a clear lack of targets and are more a qualitative description of the actual marine environmental situation (Reker et al., 2015; Inniss et al., 2016) than an assessment of the status.
5 Available at: https://www.eea.europa.eu/data-and-maps/data/waterbasetransitional-coastal-and-marine-waters-11
6 Available at: http://www.emodnet.eu
TABLE 3 | Summary of the assessment values, for the whole Europe, for the four regional seas (in bold), for the subregional seas (centered), and several subunits (aligned to the right), weighting by Spatial Assessment Unit (SAU) area and non-weighting.
TABLE 4 | Summary of the environmental status, for the whole Europe, for the four regional seas (in bold), for the subregional seas (centered), and several subunits (aligned to the right), assessed following different methods: (i) NEAT calculated weighting and non-weighting by Spatial Assessment Unit (SAU); and (ii) using the "One-out, all-out" principle at different levels: the worst of the values of the three descriptors, the worst of the ecosystem components and the worst of the indicators studied, at the subunits or subregion level and then integrating the worst of them at regional and European levels.
Setting these target values is crucial, since the failure to achieve a good status will require some management measures to improve the situation and achieve the required good status (Elliott et al., 2017). Although most of the hundreds of available indicators have not targets and reference conditions (Teixeira et al., 2016), for some of them targets have been set, after intercalibration at European level (Birk et al., 2012;European Commission, 2018). In other cases, methodologies to set quantitative targets have been proposed and they are available to be used immediately (Rossberg et al., 2017). Hence, to undertake global and regional environmental status assessments, we have already at least eight options to set targets, which we have summarized below (in decreasing order of preference, slightly modified from Borja et al., 2012 Commission, 2018). This is the option used here for D5 and D8.

(ii) Using agreed boundaries, accepted by the scientific community or managers, e.g., for different fish and shellfish stocks there are targets proposed by the International Council for the Exploration of the Sea (ICES; Froese et al., 2018), which is the option used here for D3; there are also boundaries for different eutrophication status (eutrophic, mesotrophic, oligotrophic), which can be used in other descriptors and indicators (Ferreira et al., 2011).
(iii) Using information from pristine areas, which can be considered as reference, although these are difficult to find in areas subjected to high human pressures, such as Europe. This method is considered the most adequate by some environmental agencies, after those legally binding (Gibson et al., 2000).
(iv) Using information from gradients of pressure, which can be used to set the targets.
(v) Using targets from existing literature, for indicators used in similar habitats, e.g., Sediment Quality Guidelines at regional level (Bakke et al., 2010; Menchaca et al., 2012, 2014).
(vi) Modeling a target, such as the proposal in Rossberg et al. (2017), although there are numerous issues in developing and applying such models (see a discussion on the topic in Borja et al., 2012).
(vii) Using information from the past (e.g., before any human pressure), although under the current global change this could be very problematic, due to shifting baselines which can prevent their use (Elliott et al., 2015).
(viii) The last option could be the use of expert judgment, achieving a consensus on a target value, if possible, working together with managers and stakeholders. We have used this option here for D8, to set the boundaries of Moderate/Poor and Poor/Bad status, because in NEAT it is important to avoid large ranges between those classes, which can tend to classify the SAUs as being in moderate status. This option is ranked much better by some authors. However, managers are afraid that those targets might be neither politically expedient nor even legally permissible (sensu Elliott, 2013).
With the current scientific knowledge in Europe, we consider that it would be feasible to set targets, using any of these methods, for the indicators to be used in the MSFD assessment. In the meantime, we have demonstrated that with the current targets [options "(i)" and "(ii)"], it is possible to assess the status for several MSFD descriptors.

Difficulty of Aggregating Indicators From Different Sources
The difficulty of aggregating indicators coming from different monitoring networks, covering different spatial and temporal frameworks, and including different ecosystem components (biotic and abiotic) as well as different habitats, has been widely debated (Langhans et al., 2014; Link and Browman, 2014; Probst and Lynam, 2016; Gan et al., 2017). Although here we have used the same temporal frameworks for the descriptors in the assessment, Pavlidou et al. (2019) recently explored the use of NEAT in different time periods, demonstrating its applicability and its response to changes in human pressures. Despite the reluctance of managers to aggregate information coming from different sources, it is increasingly accepted that any ecosystem-based approach to marine management requires a certain aggregation of indicators (Levin et al., 2009; Tett et al., 2013). It has been demonstrated that, depending on the aggregation method, the results can be totally different (Langhans et al., 2014; Probst and Lynam, 2016), as we have seen in our research (Table 4). However, the aggregation/integration requires a flexible and adaptive approach (Dickey-Collas, 2014), since each case and study area could require different approaches, depending on factors such as the number of indicators and their spatial and temporal coverage. In addition, the decision of whether or not to weight spatially should be discussed at regional seas level. NEAT has demonstrated such flexibility and the possibility to customize the assessment, not only in our study, but also in previous applications (Nemati et al., 2017; Pavlidou et al., 2019). These demonstrations can be useful for managers when undertaking global assessments.
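To illustrate why the decision of weighting or not by area matters, the following minimal sketch aggregates the scores of three SAUs both ways. The SAU names, areas, and scores are hypothetical, the 0 to 1 scale is an assumption, and the arithmetic is a plain (weighted) mean, not the full NEAT algorithm.

```python
# Hypothetical SAUs: name -> (assessment score on an assumed 0-1 scale, area in km2).
# None of these values come from the study; they only illustrate the arithmetic.
saus = {
    "SAU_A": (0.80, 100_000.0),
    "SAU_B": (0.30, 5_000.0),
    "SAU_C": (0.50, 20_000.0),
}


def unweighted_mean(scores):
    """Every SAU counts the same, regardless of its size."""
    return sum(scores) / len(scores)


def area_weighted_mean(scores, areas):
    """Each SAU contributes in proportion to its surface area."""
    total_area = sum(areas)
    return sum(score * area for score, area in zip(scores, areas)) / total_area


scores = [score for score, _ in saus.values()]
areas = [area for _, area in saus.values()]

plain = unweighted_mean(scores)
weighted = area_weighted_mean(scores, areas)
print(f"unweighted: {plain:.3f}  area-weighted: {weighted:.3f}")
```

In this made-up case the small, degraded SAU_B pulls the unweighted value down, whereas the area-weighted value is dominated by the large SAU_A, so the two options can lead to different status classes for the same data.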

Criteria on the Number of Indicators to Be Used
Little work has been undertaken on setting the minimum number of indicators to be used when assessing the environmental status of a regional sea or at European scale. When the scale is very small, very few indicators can be enough to assess the status with the same accuracy as using many different ones, as demonstrated by Nemati et al. (2018) using NEAT. However, at large scale, in most cases a number of indicators between 20 and 40 could be enough in NEAT, as shown in Uusitalo et al. (2016). Of course, this will depend on the surveys dedicated to obtaining values to calculate those indicators, which would be time consuming and costly, probably with insufficient spatial-temporal coverage (Baudrier et al., 2018). Currently each Member State designs its own surveys and decides on the indicators to be used, although some of them are suggested across all countries as a sort of primary or common indicators (European Commission, 2017), which can make the MSFD assessment more or less comparable across regional seas and countries. In our investigation, 16 out of the 24 SAUs studied needed between 20 and 40 indicators to achieve the final environmental status, with few needing fewer or more than that amount, results similar to those obtained by Uusitalo et al. (2016). This probably means that, in most cases, using 40 indicators could be robust enough to assess the environmental status at regional and subregional scales. This is in line with the current recommendation of the European Commission (2017), which includes 27 primary criteria (assimilable to indicators) and 15 secondary criteria when assessing the MSFD status. While primary criteria should be used to ensure consistency across the European Union, flexibility is granted regarding secondary criteria: Member States decide when to use them to complement a primary criterion (European Commission, 2017).
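One generic way to explore how many indicators are needed is to subsample the available indicators and check how often the subsample yields the same status as the full set. The sketch below is only an illustration of that idea, not the sensitivity analysis used in this study: the scores are synthetic, the aggregation is a simple mean, and the 0.6 good/not-good boundary is an assumption.

```python
import random

GOOD_THRESHOLD = 0.6  # assumed boundary between good and not good status


def is_good(scores):
    """Status from a simple mean of indicator scores (not the NEAT algorithm)."""
    return sum(scores) / len(scores) >= GOOD_THRESHOLD


def agreement_by_subset_size(scores, sizes, n_draws=500, seed=0):
    """Fraction of random subsets of each size giving the same status as the full set."""
    rng = random.Random(seed)
    full_status = is_good(scores)
    agreement = {}
    for size in sizes:
        matches = sum(
            is_good(rng.sample(scores, size)) == full_status
            for _ in range(n_draws)
        )
        agreement[size] = matches / n_draws
    return agreement


# Synthetic scores for 60 hypothetical indicators, evenly spread from 0.25 to 0.84.
scores = [0.25 + 0.01 * i for i in range(60)]
agreement = agreement_by_subset_size(scores, sizes=[5, 20, 40, 60])
for size, fraction in agreement.items():
    print(f"{size:2d} indicators: same status as full set in {fraction:.0%} of draws")
```

The number of indicators at which the agreement levels off depends on how variable the indicator scores are and on how close the full-set value lies to the class boundary.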

The OOAO Principle
The use of the OOAO principle has been repeatedly criticized (Moss et al., 2003; Moss, 2008; Caroni et al., 2013; Langhans et al., 2014), because it tends to downgrade the quality of the assessed locations unjustifiably, depending on the number of indicators included in the assessment, as demonstrated in our study and elsewhere (Borja and Rodríguez, 2010). Although this principle is consistent with the precautionary principle, it tends at the same time to inflate Type I errors (concluding that the assessed area is below good status, even if the real status is good). In fact, it has been demonstrated that integrative assessments better show the improvement of estuarine and coastal areas after management measures are taken, whilst with the OOAO there is no trend in the improvement, because of the probability of having individual indicators below the good status (Borja and Rodríguez, 2010). This means that there is a risk of implementing additional management measures to revert the situation where they are not strictly needed (Borja and Rodríguez, 2010). Hence, the OOAO principle increases the likelihood of scoring a lower status class by sheer randomness, whereas misclassifying to a higher status (than the actual status) becomes less likely (Hering et al., 2010). In our study we have seen that, when increasing the number of indicators, ecosystem components or descriptors, the possibility of downgrading the quality status in the assessment increases exponentially. However, this is not a real situation (are the whole European regional seas in bad status, without nuances?), and using NEAT weighting by the surface area of the subunit, subregion or regional sea allows identifying different status in different regional seas. Although most of them still need management measures to achieve good environmental status, those measures can be targeted more adequately than by looking at an OOAO assessment.
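The effect of the number of indicators on the OOAO outcome can be sketched with elementary probability. Under the simplifying assumptions that indicators are independent and that each one is classified as good with the same probability p, the chance that all n of them pass is p to the power n. The value p = 0.95 and the scores below are hypothetical, and the mean is only a stand-in for an integrative assessment.

```python
def ooao_pass_probability(p_single: float, n_indicators: int) -> float:
    """Probability that all n independent indicators are classified as good."""
    return p_single ** n_indicators


def ooao_status(scores):
    """One-out, all-out: the worst indicator determines the result."""
    return min(scores)


def mean_status(scores):
    """Simple averaging, used here as a stand-in for an integrative assessment."""
    return sum(scores) / len(scores)


# Assumed: each indicator has a 95% chance of being classified as good.
pass_probabilities = {n: ooao_pass_probability(0.95, n) for n in (1, 10, 40)}
for n, probability in pass_probabilities.items():
    print(f"{n:2d} indicators: P(all good) = {probability:.2f}")

# Hypothetical indicator scores on an assumed 0-1 scale.
scores = [0.85, 0.75, 0.70, 0.65, 0.35]
worst = ooao_status(scores)
average = mean_status(scores)
print(f"OOAO value: {worst:.2f}  mean value: {average:.2f}")
```

Under these assumptions the probability of an area passing under OOAO falls from 0.95 with one indicator to about 0.60 with 10 and about 0.13 with 40, even though every single indicator is very likely to be good.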

Lack of Traceability When Integrating
This barrier links directly with the previous one, since the management measures to be developed require identifying the problems that cause the impairment of the system. Recently, it has been demonstrated that the use of NEAT, spatially and temporally, allows linking the assessment with the human pressures and the measures taken to revert a degraded situation (Pavlidou et al., 2019). Our study shows that the ecosystem components, descriptors or habitats affected at each SAU level can easily be tracked in the tables with NEAT results, to determine which of them is causing an impairment in the environmental status. Thus, it is readily possible to identify specific needs for management measures to achieve the good environmental status. When necessary, it is even possible to go down to each individual indicator, in case a management measure should be taken at that level. This is especially important in the case of specific fish stocks that are in less than good status in a SAU or regional sea, as can be seen in our results but also in Froese et al. (2018), in which many stocks are in bad or poor status.
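The kind of drill-down described above can be sketched as a walk through a nested result structure. The layout (SAU, then descriptor, then indicator), the indicator names, the scores, and the 0.6 good/not-good boundary are all hypothetical; this is not the output format of the NEAT software.

```python
GOOD_THRESHOLD = 0.6  # assumed boundary between good and not good status

# Hypothetical nested results: SAU -> descriptor -> indicator -> score (0-1 scale).
assessment = {
    "SAU_A": {
        "D3": {"stock_1_F": 0.35, "stock_1_SSB": 0.70},
        "D5": {"chl_a_p90": 0.82},
        "D8": {"cadmium": 0.65, "lead": 0.55},
    },
}


def failing_items(results, threshold=GOOD_THRESHOLD):
    """List (SAU, descriptor, indicator, score) below the threshold, worst first."""
    problems = []
    for sau, descriptors in results.items():
        for descriptor, indicators in descriptors.items():
            for indicator, score in indicators.items():
                if score < threshold:
                    problems.append((sau, descriptor, indicator, score))
    return sorted(problems, key=lambda item: item[3])


problems = failing_items(assessment)
for sau, descriptor, indicator, score in problems:
    print(f"{sau} / {descriptor} / {indicator}: {score:.2f}")
```

Keeping the indicator-level values alongside the aggregated result is what makes it possible to point a management measure at a specific stock or contaminant, rather than only at the SAU as a whole.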

CONCLUSIONS
We have demonstrated that using NEAT in assessing the environmental status of large marine areas, under the ecosystem-based approach, by aggregating indicators, ecosystem components and descriptors at different spatial scales, can contribute to removing at least four out of the six barriers that managers and policy-makers confront when undertaking such assessments, since: (i) we already have indicators, from open-access databases, which are ready to use, although more effort is needed in some descriptors; (ii) we already have suitable reference conditions and targets for them (and tools to develop them if they are still not available), but agreed targets across regional seas are still needed; (iii) we can aggregate indicators, even if they are of very different origin (biotic and abiotic); (iv) different results show that the use of around 40 indicators could be enough to obtain robust assessments; (v) it is better to integrate under the ecosystem-based approach, rather than using the OOAO principle, which masks the actual status; and (vi) using NEAT, managers can track where the problems needing management measures are.

DATA AVAILABILITY STATEMENT
All raw data and results are available in the Supplementary Material.

AUTHOR CONTRIBUTIONS
AB developed the idea of the study. YS obtained and calculated data related to Chlorophyll a. JG, AU, and AB obtained and calculated data related to commercial fish. IM obtained and calculated data related to contaminants. All authors defined the SAUs and included data from indicators in NEAT. AB wrote the first draft of the article and all authors contributed equally to the interpretation of the results and in writing the final manuscript.

FUNDING
This study has been partially funded by a convention between AZTI and the Basque Water Agency (URA) and the project MEDCIS: Support Mediterranean MSs toward Coherent and Coordinated Implementation of the second phase of the MSFD, funded by the European Commission, DG Environment, Grant agreement 11.0661/2016/748067/SUB/ENV.C2.

ACKNOWLEDGMENTS
We thank three reviewers for their comments, which have helped to improve the first version of the manuscript. This is contribution number 898 from AZTI (Marine Research Division).