Defining ecological reference conditions in naturally stressed environments - How difficult is it?

The present study, performed in Horten Inner Harbor (southern Norway), shows that foraminifera link the present-day Ecological Quality Status (EcoQS) to the EcoQS of former times and, in this way, bridge an important knowledge gap concerning the determination of reference conditions, even in naturally stressed environments such as transitional waters and oxygen-depleted habitats. In Horten Inner Harbor, geochemical data from the oldest deposits showed stable background concentrations for about 200 years (from about 1600 to 1800) before human activity became noteworthy, reflecting 'good' to 'high' status. Hence, it is reasonable that organisms that lived in the area during the same nearly un-impacted time interval represent the biologically defined reference conditions, irrespective of whether the biotic indices are classified as 'good' or 'bad'. The present paper illustrates, with a conceptual model, how the retrospective foraminiferal biomonitoring method can be used to detect environmental perturbations in estuaries and meet the difficulties of the Estuarine Quality Paradox.


Introduction
Fjords are a type of estuary (e.g., Syvitski et al., 1987). They are relatively sheltered coastal marine areas, often with a naturally stressed environment due to the presence of one or more sills, estuarine circulation with high fresh-water input, stratified water columns, and irregular deep-water exchange that may lead to natural oxygen depletion. Furthermore, they are commonly preferential urbanization areas and thus exposed to domestic and industrial effluents. Because of their high sediment accumulation rates (commonly >1 mm yr⁻¹), they serve as pollution traps and sediment sinks for, e.g., organic material (e.g., Alve, 2000; Howe et al., 2010; Skei, 1996).
Increasing population growth along the coasts and intensified human-induced activities (e.g., agricultural land-use, industrial activity) may result in severe pollution (including enhanced supply of nutrients, metals, and organic contaminants) in the marine system. In addition, cultural eutrophication is often identified as one of the main causes of oxygen deficiency in bottom waters and sediments (e.g., Aure et al., 1996;Dale et al., 1999;Johannessen and Dahl, 1996). These human-induced environmental conditions affect the benthic faunal community and may cause e.g., an increase in opportunistic and tolerant species and a decline in faunal diversity and biomass (e.g., Diaz and Rosenberg, 1995;Gray et al., 2002;Levin et al., 2009).
All countries implementing the EU's Water Framework Directive (WFD) are committed to achieving 'good' or 'high' Ecological Quality Status (EcoQS) in their coastal waterbodies by 2020 (European Parliament, 2008). For areas with less than 'good' status, improvement is needed to obtain conditions similar to the ecological reference conditions, i.e., conditions "with no, or very minor disturbance from human activities" (European Commission, 2003, p. 36). Currently, due to the lack of long biological time series, reference conditions are defined either by comparison with reference sites of comparable ecoregions and water types or by expert judgement (European Commission, 2003).
Evaluating the EcoQS in transitional waters (TW) and naturally oxygen-depleted fjord environments is a challenge. Faunal communities living in such naturally variable environments are adapted to cope with temporally changing environmental parameters (such as salinity, temperature, organic matter input) and might show faunal characteristics (e.g., low diversity, high abundance of stress-tolerant species) similar to those of assemblages exposed to anthropogenic stress. This difficulty of separating natural and anthropogenic stress in estuaries is known and described as the 'Estuarine Quality Paradox' (Dauvin, 2007; Dauvin and Ruellet, 2009; Elliott and Quintino, 2007). This paradox, together with the lack of long time series, makes the ability to define reference conditions the main uncertainty when classifying ecological status based on soft bottom macrofauna (the traditionally used biological quality element). In recent years, quantitative analyses of fossil benthic foraminifera in dated sediment cores have been used to determine in situ reference conditions in fully marine, subtidal habitats (e.g., Alve et al., 2009; Dolven et al., 2013; Polovodova Asteman et al., 2015). Norwegian national guidelines now recommend the retrospective foraminiferal method for defining reference conditions in coastal regions where unaffected reference sites are difficult to find (Veileder 02:2018). This foraminiferal method allows anthropogenic stress to be detected against a background of natural conditions by the use of a number of biotic and abiotic parameters.
Within this context, the overall aim of the present study is to investigate whether reference conditions, sensu the WFD, can be defined in naturally stressed (e.g., brackish) fjord environments where species diversity can be naturally low and the 'Estuarine Quality Paradox' applies. To achieve this, temporal changes (past 300-400 yrs) in benthic foraminiferal assemblages and associated sediment and geochemical parameters in the present-day polluted Horten Inner Harbor (southern Norway) have been analyzed.

Investigation area and its pollution history
The Oslofjord, south-eastern Norway, is separated from the northern Skagerrak by a ~120 m deep sill. Horten Inner Harbor is an enclosed, small (3.8 km²) and shallow (max. 27 m water depth) basin at the western side of the outer Oslofjord (Fig. 1). In addition to a few shallow sounds (<6 m water depth), the inner harbor basin is separated from the Oslofjord by a sill at 9 m water depth. The astronomical tidal range is small, ca. 20 cm, and the basin has no major fluvial inputs. Still, there is a stratification of the water masses with a pronounced halocline between 8 and 10 m water depth. Above the halocline, the salinity and temperature ranges are 15-26 and 4-6 °C (in March), respectively, and reach values of 33 and 8 °C in the basin (NGI, 2014). Since the environmental monitoring studies in the late 1980s (Baalsrud, 1990), the dissolved oxygen concentrations at >10 m water depth have been continuously so low (<0.6 ml/l) that hardly any living macrofauna has been observed (Lund, 2013; Saunes and Konieczny, 2013; Walday et al., 2012).
For centuries, Horten Inner Harbor has been an easily defendable, enclosed basin. This was the main reason why it was established as Norway's main naval base and shipyard in 1818. Additionally, the area was established as a municipality in 1837 and obtained official status as a trading place in 1857. These activities led to increased urban settlement, and the city of Horten was established in 1907. From 1849 to 1953, the area was used as a base and shipyard for the Navy and subsequently as a shipyard for civilian ships until 1987 and for other industry until 2000. In 1945, the base and shipyard were completely destroyed in a bomb attack but soon rebuilt (Krokaas, 2012). The shipyards typically discharged waste such as oils, sandblasting components, and antifouling paints, containing PAHs, metals, PCBs and TBT. Over the years, parts of the shoreline became landfills created by the municipality, armed forces, and local industries. The landfills were in operation during most of the 1900s and were closed down between 1979 and 1993. Their most active periods were during the 1960s and 1970s, and typical pollutants included various kinds of hydrocarbons, slag, lead and other metals (Krokaas, 2012).
A number of recent environmental studies have documented that the sediments in parts of Horten Inner Harbor are contaminated with heavy metals and organic pollutants (e.g., Øxnevad et al., 2011a, b). Therefore, remediation work was initiated by the Norwegian Defense Estate Agency (Forsvarsbygg), municipality of Horten (Horten kommune), Horten havn, Horten Industripark AS, and the County Governor of Vestfold in winter 2019.

Field work
Sediment cores from three sites in Horten Inner Harbor (Fig. 1) were collected in March 2014 using the RV Trygve Braarud. Three core sites, HIH-3, HIH-1, and HIH-4 at 8, 12 and 20 m water depth, respectively, were chosen in order to obtain an optimal coverage of the dissolved oxygen gradient of the water masses in the basin (Table 1). At each site, at least three sediment cores were collected using a Gemini (80 mm inner diameter) or an Abdullah (56 mm inner diameter) gravity corer. The latter can provide longer sediment cores (>70 cm length) than the former. Only sediment cores with an undisturbed sediment-water interface were selected for down-core analyses.
Immediately after retrieval, the salinity, temperature and dissolved oxygen concentration in the water just above the sediment-water interface were measured using a YSI Pro ODO optical oxygen meter. In addition, the sediment pore water oxygen concentration in one sediment core from the oxygenated site HIH-3, located at 8 m water depth, was measured with a Unisense Oxygen Micro Sensor (Table 1). Subsequently, two of the longest Gemini sediment cores from each site were sectioned into 1 cm slices down to 20 cm core depth and into 2 cm slices below 20 cm. The remaining cores were pushed out on deck, split vertically in two and used for sediment profile description. At site HIH-4 (20 m water depth), the 52-80 cm core interval of the Abdullah sediment core was subsampled and used as an extension of the shorter Gemini cores. The position of the overlap zone between the two cores was later determined from the sediment water content. All sediment core samples were frozen immediately after sectioning. For living foraminiferal assemblage studies, surface samples (0-1 cm) from three replicate Gemini cores per site (8, 12 and 20 m water depth) were preserved in a 70% ethanol/rose Bengal (2 g/l) solution to differentiate living (stained) from dead tests (Murray and Bowser, 2000; Walton, 1952).

Laboratory analyses
All sediment core samples were freeze-dried and the water content was calculated to detect possible unconformities in the sediments. Subsamples (>5 g dry weight) of one replicate core from each site were analyzed for radiometric activity of ²¹⁰Pb, ²²⁶Ra and ¹³⁷Cs at the Gamma Dating Center, University of Copenhagen (Denmark). The measurements were carried out on a Canberra ultralow-background Ge-detector. Chronologies and sediment accumulation rates for the cores were calculated using a modified version of Appleby's (2001) constant rate of supply (CRS) model. Ages for core levels below the oldest dated level were calculated by linear extrapolation assuming constant sedimentation rates. To indicate the uncertainty connected to these estimations, the extrapolated ages are written in italics in the figures. From now on, the dated cores are termed "the 8m-core, the 12m-core, and the 20m-core", reflecting the water depth at the respective sampling sites.
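The dating steps described above can be sketched as follows. This is a minimal illustration of the textbook CRS age equation, t = (1/λ) ln(A(0)/A(x)), and of linear down-core extrapolation; it is not the modified CRS version used for the cores, whose details are not given here. The function names, the layer inventories and the 21 cm example depth are assumptions made for illustration; the ²¹⁰Pb half-life is taken as about 22.3 years.

```python
import math

# Decay constant of 210Pb (1/yr), assuming a half-life of about 22.3 years.
PB210_DECAY = math.log(2) / 22.3


def crs_ages(layer_inventories):
    """Textbook CRS ages (years before coring) at the base of each layer.

    layer_inventories: unsupported 210Pb inventory of each layer
    (e.g. Bq m^-2), ordered from the core top downwards.
    The deepest layer has no inventory below it, so its age is undefined.
    """
    total = sum(layer_inventories)
    ages = []
    below = total
    last = len(layer_inventories) - 1
    for i, inventory in enumerate(layer_inventories):
        below -= inventory  # inventory remaining below the base of this layer
        if i == last or below <= 0:
            ages.append(None)
        else:
            ages.append(math.log(total / below) / PB210_DECAY)
    return ages


def extrapolate_year(depth_cm, last_dated_depth_cm, last_dated_year, rate_mm_per_yr):
    """Calendar year at depth_cm, assuming a constant sedimentation rate
    below the oldest dated level."""
    extra_mm = (depth_cm - last_dated_depth_cm) * 10.0
    return last_dated_year - extra_mm / rate_mm_per_yr


# Hypothetical inventories: half of the total inventory lies below layer 1.
print(crs_ages([50.0, 25.0, 25.0]))
# Illustrative extrapolation: oldest dated level at 11 cm (about 1970),
# 2.5 mm/yr, target depth 21 cm (hypothetical).
print(extrapolate_year(21, 11, 1970, 2.5))
```

Because the extrapolation assumes a constant rate, any change in sedimentation or compaction below the dated interval translates directly into an age error, which is why extrapolated ages are flagged separately.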
Samples for total organic carbon (TOC) and nitrogen (TN) analyses were acid treated (1M HCl), rinsed with distilled water, dried at 50 °C, and analyzed using a Flash EA 1112 NC Analyzer. The TOC values were corrected for the sand content and are expressed as TOC in the fraction <63 μm (TOC₆₃) according to Norwegian guidelines (Veileder 02:2018). Stable carbon isotope analyses of dry bulk sediment from the 12m-core were performed by the Iso-Analytical laboratory in Crewe (UK). Stable carbon isotopes and C/N ratios were used to obtain information on the origin of the organic material in the sediments. Marine particulate organic carbon (POC), mainly reflecting marine phytoplankton, typically has stable carbon isotope values ranging from −21‰ to −18‰ and, as it tends to be nitrogen rich, it has relatively low C/N ratios, typically <8 (Lamb et al., 2006 and references therein).
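As a small illustration of how the two source indicators quoted above can be applied to a sample, the sketch below checks a δ¹³C value and a C/N ratio against the typical marine POC ranges (−21‰ to −18‰; C/N < 8). The function name and the first example sample are hypothetical, and real interpretation would also weigh mixing of sources and diagenesis rather than rely on hard thresholds.

```python
def marine_poc_criteria(d13c_permil, c_to_n):
    """Report which of the two typical marine-POC criteria a sample meets.

    Thresholds follow the typical ranges quoted in the text:
    d13C between -21 and -18 permil, and C/N below 8.
    """
    return {
        "d13C_in_marine_range": -21.0 <= d13c_permil <= -18.0,
        "C/N_below_8": c_to_n < 8.0,
    }


# Hypothetical phytoplankton-dominated sample.
print(marine_poc_criteria(-19.5, 6.5))
# Sample with a lighter isotope value and a higher C/N ratio,
# i.e. pointing away from a purely marine source.
print(marine_poc_criteria(-23.0, 13.0))
```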
Heavy metal and organic contaminant analyses of sub-samples were performed at ALcontrol Laboratories in Linköping (Sweden). Sample treatment and analyses were performed according to the Norwegian standard (NS4770 1994), which is the accepted methodology for classification of environmental quality in Norwegian coastal areas.
For fossil (dead) foraminiferal analyses, c. 5 g of dry sediment was precisely weighed and washed over a 63 μm sieve. After drying at 40 °C, the fraction >63 μm was weighed and the sand content was calculated as a percentage of the weight of the washed sample following the Norwegian guidelines (Veileder 02:2018). Subsequently, about 250 foraminiferal tests were picked from each sample, counted and determined to species level where possible. Absolute abundances were standardized to foraminiferal tests per g dry sediment (tests g⁻¹ dry sed.). Benthic foraminiferal accumulation rates, BFAR (tests cm⁻² y⁻¹), were calculated according to Herguera (1992). Living (stained) foraminiferal assemblage (0-1 cm) analyses mainly followed the FOBIMO protocol (Schönfeld et al., 2012). Due to the generally low absolute abundance and the dominance of small (<125 μm) species, the size fraction >63 μm was analyzed in this study. The diversity indices Shannon-Wiener (H′ log₂) (Shannon and Weaver, 1963), Hurlbert's index (ES100) (Hurlbert, 1971), and exp(H′bc) (Chao and Shen, 2003; Hill, 1973) were calculated for all samples except those containing <100 foraminiferal tests. The first two indices are recommended in the Norwegian guidelines for biological classification of environmental status (Veileder 02:2018), whereas a classification system based on exp(H′bc) was suggested by Bouchet et al. (2012). For EcoQS class boundaries, see Table 2 (Alve et al., 2019; Bouchet et al., 2012). In order to obtain one unified expression for each sample based on all calculated biotic indices, a normalised Ecological Quality Ratio (nEQR) was calculated per index and per sample following the Norwegian guidelines (Veileder 02:2018):

\[
\mathrm{nEQR} = \frac{\text{index value} - \text{lower class value of the index}}{\text{upper class value of the index} - \text{lower class value of the index}} \times 0.2 + \text{lower class value of nEQR}
\]
Finally, the average of the indices' nEQR values was calculated to give the average nEQR value per sample or time. The class boundary values for nEQR are shown in Table 2.
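The index calculations described above can be sketched in a few lines. The code below computes Shannon-Wiener H′ (log₂) and Hurlbert's ES100 from species counts, converts each index to nEQR with the formula given in the text, and averages the results. It is an illustrative sketch only: the class limits are HYPOTHETICAL placeholders rather than the Table 2 values, the nEQR class limits are assumed to be five equal classes of width 0.2 (consistent with the factor 0.2 in the formula), the species counts are invented, and exp(H′bc) is left out for brevity.

```python
import math


def shannon_log2(counts):
    """Shannon-Wiener diversity H' using log base 2."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def hurlbert_es(counts, m=100):
    """Hurlbert's expected number of species in a random subsample of m tests."""
    total = sum(counts)
    if total < m:
        raise ValueError("sample has fewer than m individuals")
    all_draws = math.comb(total, m)
    # 1 - P(species absent from the subsample), summed over species.
    return sum(1 - math.comb(total - c, m) / all_draws for c in counts if c > 0)


# Lower nEQR limit of each status class (assumed: five classes of width 0.2).
NEQR_LOWER = {"bad": 0.0, "poor": 0.2, "moderate": 0.4, "good": 0.6, "high": 0.8}


def neqr(value, class_limits):
    """Normalised EQR for one index value.

    class_limits: iterable of (status, lower index value, upper index value).
    """
    for status, lower, upper in class_limits:
        if lower <= value <= upper:
            return (value - lower) / (upper - lower) * 0.2 + NEQR_LOWER[status]
    raise ValueError("index value outside the supplied class limits")


# HYPOTHETICAL class limits, for illustration only (not the Table 2 values).
H_LIMITS = [("bad", 0.0, 1.0), ("poor", 1.0, 2.0), ("moderate", 2.0, 3.0),
            ("good", 3.0, 4.0), ("high", 4.0, 5.0)]
ES_LIMITS = [("bad", 0.0, 5.0), ("poor", 5.0, 10.0), ("moderate", 10.0, 15.0),
             ("good", 15.0, 20.0), ("high", 20.0, 50.0)]

counts = [120, 60, 40, 20, 10]  # invented tests per species, 250 in total
h = shannon_log2(counts)
es100 = hurlbert_es(counts, 100)
sample_neqr = (neqr(h, H_LIMITS) + neqr(es100, ES_LIMITS)) / 2
print(h, es100, sample_neqr)
```

Normalising each index before averaging puts indices with different numerical ranges on the same 0-1 scale, so that no single index dominates the combined value.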

Sediment chronology and accumulation rates
The 8m-core showed an unsupported ²¹⁰Pb profile with an almost exponential decline in the upper 12 cm. A broad ¹³⁷Cs peak occurred around 8 cm core depth, corresponding to the Chernobyl 1986 accident. However, the broad depth range of this peak indicated some bioturbation at this site. Concentrations of ²¹⁰Pb suggested a sediment accumulation rate of c. 0.07 g cm⁻² y⁻¹ with an average sedimentation rate of approx. 2.5 mm y⁻¹. A core chronology was calculated for the upper 11 cm (post-1970; Fig. 2, Table 3). Down-core extrapolation is somewhat uncertain due to the high sand content at the base of the core, which possibly indicates re-sedimentation.
The 12m-core showed an unsupported ²¹⁰Pb profile with a clear exponential decline with depth in the upper 15 cm. The ¹³⁷Cs concentrations showed a distinct peak around 8 cm core depth, which can be considered a reliable chronostratigraphic marker, corresponding to the atmospheric fallout from the Chernobyl 1986 accident. Concentrations of ²¹⁰Pb suggested a relatively stable sediment accumulation rate of c. 0.05 g cm⁻² y⁻¹ with an average sedimentation rate of approx. 2 mm y⁻¹. The ²¹⁰Pb dating of the 12m-core indicates that the recovered sediment record extends beyond 1800 (Fig. 2).
The sediments of the 20m-core showed an irregular decline of unsupported ²¹⁰Pb in the upper 6 cm, which could indicate some periods with sediment slumping or higher sediment accumulation. The ¹³⁷Cs concentrations showed a distinct peak around 8 cm depth, which is most likely related to the Chernobyl 1986 accident. The chronology has therefore been calculated by using this peak as a reference date for 1986. The sediment accumulation rate was not constant throughout the core but fluctuated around 0.04 g cm⁻² y⁻¹ with an average sedimentation rate of approx. 1.7 mm y⁻¹. A core chronology was calculated for the upper 12 cm (post-1950s; Fig. 2) and down-core extrapolation indicates that the recovered sediment record extends beyond 1860 (Table 3).
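The mass accumulation rates (g cm⁻² y⁻¹) and linear sedimentation rates (mm y⁻¹) reported above are linked through the dry bulk density of the sediment. As a rough consistency check, the sketch below back-calculates the average dry bulk density implied by each pair of rates. Bulk densities are not reported in the text, so the resulting values are derived quantities under the assumption that both rates are averages over the same interval; the function and variable names are illustrative.

```python
# Reported rates per core: (mass accumulation rate in g cm^-2 yr^-1,
# linear sedimentation rate in mm yr^-1).
REPORTED_RATES = {
    "8m-core": (0.07, 2.5),
    "12m-core": (0.05, 2.0),
    "20m-core": (0.04, 1.7),
}


def implied_dry_bulk_density(mar_g_cm2_yr, rate_mm_yr):
    """Dry bulk density (g cm^-3) implied by a mass accumulation rate
    and a linear sedimentation rate (converted from mm to cm)."""
    return mar_g_cm2_yr / (rate_mm_yr / 10.0)


for core, (mar, rate) in REPORTED_RATES.items():
    print(core, round(implied_dry_bulk_density(mar, rate), 2), "g cm^-3")
```

All three cores imply low dry bulk densities of a few tenths of a gram per cubic centimetre, which is in keeping with soft, water-rich mud.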

Hydrography and sediment characteristics at the sampling sites
At the shallow-water site HIH-3 (8 m w.d.), the bottom temperature was 6 °C, the salinity 26, and the bottom water oxygen concentration 4.9 ml/l at the time of collection (Table 1).
The sediment cores from this site consisted of grey, sandy mud. The water content ranged between 35 and 85% and was highest in the core top. The sediments became gradually browner, softer and finer up-core. The sediment surface was light brown, indicating well-oxygenated conditions, and some polychaete tubes were observed. The oxygen concentration in the sediment column decreased rapidly and reached zero at only 3 mm depth (Table 1). The sand-sized fraction (>63 μm) ranged between 1.3 and 48.1%, reaching maximum values in the lowermost 10 cm of the core (Fig. 3). Shell fragments and plant debris occurred regularly in the sediments.
At the 12m- and 20m-core sites, the bottom temperature was 8 °C and the salinity 31 and 33, respectively, at the time of sampling. Oxygen concentrations in the bottom water were low (0.6 ml/l and 0.4 ml/l, respectively) (Table 1).
Both the 12m- and 20m-cores were located in the deeper harbor basin (Fig. 1). These sediment cores consisted of homogenous, medium grey mud containing small shell fragments of molluscs up to approx. 25 cm core depth. The sediment became gradually darker and softer upwards. The water content ranged between 54 and 89% with highest values in the top layers. The upper 20 cm of the sediment column were black and laminated, and an H₂S smell was detected during sampling. No sign of life was observed on the sediment surfaces. The sand-sized fraction (>63 μm) at both sites was generally low, around 0.5%, and reached a maximum of 5.5% in the deepest parts of the cores (Fig. 3). Plant debris occurred regularly in the uppermost c. 40 cm of the cores. Coal and slag particles were found in the sediment intervals between 12 and 25 cm at both sites.
Table 2. Classification system for environmental and ecological quality status used in this study (based on Veileder 02:2018, *Alve et al., 2019, and **Bouchet et al., 2012).

Sediment geochemistry
Total organic carbon (TOC₆₃) concentrations were lowest in the deeper core intervals in all analyzed sediment cores and showed an increasing trend upwards. Highest values occurred in the surface sediments (Fig. 3). The organic carbon accumulation rates were comparable at all sites, ranging from 11 to 42 g C m⁻² y⁻¹ (8m-core), 10-32 g C m⁻² y⁻¹ (12m-core) and 7-63 g C m⁻² y⁻¹ (20m-core) (Fig. 4). In general, the organic carbon accumulation rate was stable until the early to middle 1800s and increased continuously thereafter.
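An organic carbon accumulation rate of this kind is commonly obtained by multiplying the organic carbon content by the sediment mass accumulation rate. The sketch below shows the arithmetic and the unit conversion from g cm⁻² to g m⁻². It is a hedged illustration: the exact calculation behind the reported ranges is not spelled out in the text, the function name is hypothetical, and the example inputs (2% organic carbon, 0.05 g cm⁻² y⁻¹) are round numbers of the same magnitude as those reported for the deeper basin.

```python
def carbon_accumulation_rate(toc_percent, mar_g_cm2_yr):
    """Organic carbon accumulation rate in g C m^-2 yr^-1.

    toc_percent: organic carbon content of dry sediment (%).
    mar_g_cm2_yr: sediment mass accumulation rate (g cm^-2 yr^-1).
    """
    cm2_per_m2 = 1.0e4
    return toc_percent / 100.0 * mar_g_cm2_yr * cm2_per_m2


# Illustrative inputs: 2 % organic carbon, 0.05 g cm^-2 yr^-1.
print(carbon_accumulation_rate(2.0, 0.05))
```

With these inputs the result is about 10 g C m⁻² y⁻¹, i.e. of the same order as the lower end of the ranges quoted above.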
The C/N ratio showed the same trend in all cores, with stable values around 10 upwards until the last half of the 1800s, increasing values to a maximum around 13 in the middle of the 1900s, followed by a decrease to around 8 in the surface sediment (Figs. 3 and 4). The stable organic carbon isotope data (δ¹³Corg) of bulk sediment in the 12m-core showed stable values around −21.5‰ upwards until the last half of the 1800s. The values then decreased to minima around −23‰ in the middle of the 1900s, followed by a final increase to about −22‰ in the surface sediment layers (Fig. 4).
Heavy metal (Cu, Pb, Hg) concentrations showed similar distribution patterns in all analyzed cores and were stable at low levels in the deeper core intervals, indicating 'good' to 'high' status (Fig. 3). Metal accumulation rates were stable until the early 1800s and increased continuously until they culminated with maximum values between the 1920s and 1970s. All metal values subsequently decreased after the mid-1980s and indicated 'good' status in the surface sediments (Figs. 3, 5).
Concentrations of polycyclic aromatic hydrocarbons (PAHs) and polychlorinated biphenyls (PCBs) showed low background values ('good' status) until the early 1900s and increased to maximum values between 1920 and 1980 (indicating 'moderate' to 'bad' conditions). Concentrations declined during the past 2 decades at all three sites (Supplementary Tables 1-3).
Tributyltin (TBT) concentrations reflect the development from the 1950s to the day of collection (2014). Highest concentrations occurred in the 12m- and 20m-cores in the late 1990s. TBT concentrations were lower in the top-most sediments but still reflected 'bad' status (Supplementary Tables 1-3).
The foraminiferal records of the 12m- and 20m-cores were similar. Benthic foraminiferal accumulation rates (BFAR) were very low, reaching highest values in the lowermost sediment horizons with 16 and 2 tests cm⁻² y⁻¹, respectively (Fig. 3). Accumulation rates decreased up-core and values <1 tests cm⁻² y⁻¹ were recorded above approx. 50 cm core depth. Benthic foraminifera were nearly absent from the beginning of the 1600s. The most abundant species in these two cores were S. fusiformis, Bulimina marginata, Elphidium albiumbilicatum and E. excavatum (Supplementary Table 4). The average nEQRs indicated 'moderate' to 'poor' status in the 12m-core and 'poor' to 'bad' status in the 20m-core (Fig. 3). No living (stained) foraminifera were found in surface sediments (0-1 cm) at these two sites (Supplementary Table 4).
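For orientation, a BFAR value combines the concentration of tests in the sediment with the rate at which sediment mass accumulates: tests per gram of dry sediment multiplied by the mass accumulation rate (g cm⁻² y⁻¹). The sketch below shows this product. The function name and the concentration of 320 tests g⁻¹ are hypothetical; the latter is simply chosen so that, at a mass accumulation rate of 0.05 g cm⁻² y⁻¹, the result matches the order of magnitude of the highest BFAR reported above.

```python
def bfar(tests_per_g_dry, mar_g_cm2_yr):
    """Benthic foraminiferal accumulation rate (tests cm^-2 yr^-1).

    tests_per_g_dry: absolute abundance (tests per g dry sediment).
    mar_g_cm2_yr: sediment mass accumulation rate (g cm^-2 yr^-1).
    """
    return tests_per_g_dry * mar_g_cm2_yr


# Hypothetical concentration of 320 tests per g at 0.05 g cm^-2 yr^-1.
print(bfar(320.0, 0.05))
```

Expressing abundance as a flux rather than a concentration removes the diluting effect of changes in sediment supply, which matters when accumulation rates vary down-core.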

Pollution history
The investigated cores from three sites in Horten Inner Harbor (Norway) reflected shallow-water (8-20 m water depth), brackish to marine sequences deposited in a semi-enclosed basin during the past 400 years (i.e., since the early 1600s). Most investigated metals and some organic compounds (e.g., PAH) occur in the sediments for natural reasons. At all sites, such naturally occurring, potential pollutants were present at low, stable background concentrations in the oldest, lower parts of the sediment cores, representing 'good' or 'high' status (Veileder 02:2018) for at least 200 years (Fig. 3, Supplementary Tables 1-3).
The concentration of all analyzed potential pollutants started increasing during the 1800s. The pollution history of the area indicates that the increase was due to the growing human activity (see section 2). Concentrations of potential pollutants in the sediments started increasing from the background levels and reached maxima during the middle to late part of the 1900s. PAHs increased in connection with shipyard activities and extensive wood and charcoal usage late in the 1800s. This is supported by increased TOC and C/N values, the presence of wood and coal fragments, and the low stable carbon isotope values (e.g., Meyers, 1994; Lamb et al., 2006, and references therein) between the late 1800s and late 1900s (Fig. 4). The distribution of PCBs (Supplementary Tables 1-3) reflected their general historical usage in Norway. They were introduced early in the 1940s (e.g., in paints and coolants), the concentrations peaked during the following decades and declined following their ban in Norway in 1980 (Arp et al., 2011). From the 1960s onwards, TBT was used as an agent in antifouling paints (Arp et al., 2014; Hugget et al., 1992) until it was banned in Norway in 2003. Accordingly, the sediments showed increasing TBT concentrations from the mid-1900s; these peaked during the 1990s and decreased in the sediments deposited during the past few decades, although the surface sediments are still classified as 'bad' status (Supplementary Tables 1-3). Lead (Pb) and mercury (Hg) concentrations reached maximum values between the 1960s and 1970s, while copper (Cu) concentrations peaked in the 1980s-1990s (Fig. 5, Supplementary Tables 1-3). All were common components of antifouling paints and extensively used in connection with shipyard activities and boat traffic. The analyzed metals showed decreasing concentrations towards the most recently deposited sediments, where they approached levels representing 'good' status (Fig. 3).
In the same way as for the inner Oslofjord, the subsurface metal maxima, which represent the mid to late 1900s, are probably not significantly influenced by diagenetic remobilization because the up-core metal-distribution profiles occur in cores with oxic as well as in cores with anoxic surface sediments (Dolven et al., 2013; Lepland et al., 2010).
Organic matter accumulation over several hundred years and a gradual isostatic rebound (see section 5.2) probably had a negative influence on the oxygen conditions in the basin at the 12m- and 20m-sites. Consequently, a long period of anoxic bottom conditions, in addition to a continuous, high supply of organic matter (either directly or, during the 1900s, indirectly through eutrophication; decreased C:N, Fig. 3), hampered a natural re-establishment of improved oxygen conditions in the deeper basin.
In concert, the above reflects 1) low, stable background concentrations of naturally occurring, potential pollutants in the oldest, lower parts of the sediment cores representing 'good' or 'high' status, 2) temporally increasing concentrations of pollutants in the sediments due to local industrial development, and 3) recent (since the 1990s) improvement of geochemical status, which is now approaching background conditions. This indicates that the environmental conditions in Horten Inner Harbor are improving and there is a chance of natural recovery as long as sediment resuspension can be avoided. The temporal variation in concentration of geochemical parameters from reference to present-day conditions is illustrated by the left-hand diagram in Fig. 6.

Conceptual model to define reference conditions
Various approaches to define ecological reference conditions in marine and transitional waters (TW) have been suggested (e.g., Borja et al., 2012 and references therein). In this study, we define ecological reference conditions as value(s) of biotic indices that existed during a time interval of decades to centuries with stable geochemical conditions reflecting "no or very minor disturbance from human activity" (European Commission, 2003, p. 36). This implies that biotic indices reflecting the ecological reference conditions will, for natural reasons, be low in stressful environments (i.e., naturally oxygen-depleted waters or TW) and higher under well-oxygenated, normal marine environmental conditions (Fig. 6). Reference conditions may reflect natural variability, as indicated in the lower parts of the columns in Fig. 6 (cf., European Commission, 2003, p. 37).
Fig. 6. Conceptual model illustrating the temporal development of in situ ecological status from pre-impacted reference conditions (1), through deteriorating periods (2) to potential recovered conditions (3), based on the retrospective foraminiferal biomonitoring method. The upper, dashed lines reflect different potential scenarios. A = normal marine coastal waters, B = naturally oxygen depleted waters, and C = transitional waters (TW). Alterations between status classes in the lower column parts reflect natural variability.
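The definition above can be read as a simple selection rule on a dated down-core record: find the pre-impact interval from the geochemical status, then report the biotic-index values that occurred within it, whatever class they fall in. The sketch below illustrates that rule. It is a simplified reading, not the authors' procedure: the record is invented, the field names are hypothetical, and the check that the interval spans decades to centuries with stable concentrations is left out.

```python
from statistics import mean

# Geochemical status classes taken to reflect "no or very minor disturbance".
ACCEPTED_GEOCHEM = {"good", "high"}


def reference_conditions(samples):
    """Summarise biotic-index values over the pre-impact interval.

    samples: list of dicts with keys 'year', 'geochem_status' and 'neqr',
    ordered from the core base (oldest) upwards. The reference interval is
    the unbroken run of samples, starting at the core base, whose
    geochemical status is 'good' or 'high'.
    """
    reference = []
    for sample in samples:
        if sample["geochem_status"] not in ACCEPTED_GEOCHEM:
            break
        reference.append(sample)
    if not reference:
        return None
    values = [s["neqr"] for s in reference]
    return {
        "from_year": reference[0]["year"],
        "to_year": reference[-1]["year"],
        "neqr_min": min(values),
        "neqr_max": max(values),
        "neqr_mean": mean(values),
    }


# Invented record: unpolluted but naturally stressed in its older part.
record = [
    {"year": 1620, "geochem_status": "high", "neqr": 0.35},
    {"year": 1700, "geochem_status": "high", "neqr": 0.25},
    {"year": 1780, "geochem_status": "good", "neqr": 0.15},
    {"year": 1900, "geochem_status": "moderate", "neqr": 0.05},
    {"year": 2010, "geochem_status": "good", "neqr": 0.0},
]
print(reference_conditions(record))
```

In this invented record the reference nEQR values are low even though the geochemical status is 'good' to 'high', and the youngest sample is excluded despite its 'good' geochemical status because it lies above the impacted interval.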

Coastal waters
In coastal, marine soft bottom habitats, benthic invertebrate macrofauna is the only accepted faunal biological quality element (BQE) to classify ecological quality status (EcoQS), including reference conditions (European Commission, 2003). However, due to lack of long time series, reference conditions cannot be directly determined using macrofauna and this is a recurring problem (European Commission, 2003, p. 41). An alternative method is to base the classification on fossilisable species of benthic foraminifera in dated sediment cores, back to pre-impact times, as suggested by Alve et al. (2009). How can the use of foraminifera be relevant for ecological information required to be based on macrofauna? Recently, quantitative comparisons have shown that living (stained) foraminiferal and macrofaunal community compositions in samples collected at the same time from the same coastal, marine stations correlated significantly (cross-taxon congruence), implying that foraminiferal distribution patterns parallel those of the associated macrofauna (Bouchet et al., 2018). Additionally, applying the same diversity and sensitivity indices on the two groups of organisms has shown that they reflect the present-day environmental conditions similarly (e.g., Alve et al., 2019;Wlodarska-Kowalczuk et al., 2013). Finally, an intercalibration study between biotic indices based on the two groups showed that it was possible to define ecological classes and class boundaries for foraminifera-based EcoQS in present-day coastal waters using the Norwegian macrofauna-based classification system (Alve et al., 2019). Since the intercalibration was based on fossilisable, living (stained) foraminiferal species, the classification system also allows assessment of historical (biostratigraphical) changes in EcoQS beyond time intervals for which macroinvertebrate time series exist. 
Hence, in coastal, marine environments, foraminifera serve the function of bridging the gap between information based on present-day EcoQS and the in situ Paleo-EcoQS of former times, back to reference conditions (e.g., Alve et al., 2009; Dolven et al., 2013; Duffield et al., 2017; Polovodova Asteman et al., 2015). This temporal development of the ecological status from reference conditions in pre-impacted deposits through younger, impacted deposits to present-day status is illustrated in Fig. 6A.

Naturally oxygen depleted waters
As opposed to the geochemical parameters that showed stable, environmental background concentrations until the late 1700s in Horten Inner Harbor, the associated foraminiferal assemblages at the two deeper sites (12 and 20 m), reflected declining ecological status during the same time interval (Figs. 3 and 5). The abundance of foraminifera decreased and the species diversity, which reflected 'bad' to 'poor' nEQR already during the 1600s and 1700s, reached minimum values before the assemblages dominated by S. fusiformis and B. marginata disappeared at the end of the 1700s. While most benthic foraminifera require oxic bottom water conditions (Murray, 2006, p. 56), some species like S. fusiformis and B. marginata are well-known opportunistic, low-oxygen tolerant species (e.g., Alve and Bernhard, 1995;Alve, 2003;Hess et al., 2014) that profit from slow food degradation rate in the sediment (Bernhard and Sen Gupta, 1999). Stainforthia fusiformis even thrives under hypoxic to severe hypoxic conditions (Bouchet et al., 2018) and may be able to survive anoxia for weeks by performing anaerobic metabolism (e.g., Risgaard-Petersen et al., 2006). However, no foraminifera survive in basins with permanently to nearly permanently anoxic bottom waters (Alve, 1995).
At assemblage level, the diversity of living benthic foraminifera in marine fjord environments has been shown to be significantly positively correlated with bottom water dissolved oxygen concentration, i.e., decreased foraminiferal diversity along a decreasing oxygen gradient reflects deteriorating conditions (Bouchet et al., 2012). Consequently, based on the dominance of S. fusiformis and B. marginata and low species diversity, the most likely explanation for the deteriorating ecological status during the 1600s and 1700s is that the reference conditions in the deeper areas developed from strongly oxygen-depleted to nearly permanently anoxic conditions for natural reasons. The reference conditions in Horten Inner Harbor have higher TOC values (about 2%, Fig. 3) than many other southern Norwegian fjord deposits (<1%) of similar age (e.g., Alve, 1991, 2000). A possible reason for the deteriorating oxygen conditions already during the 1600s and 1700s may therefore have been a combination of the naturally occurring organically enriched sediments and increased residence time of the bottom water below about 10 m water depth. The latter may have developed as a result of the gradual isostatic rebound, which has been about 2-3 mm per year during the last 1000 years (Sørensen et al., 2014), representing about 1 m elevation of the land in 400 years. With the present-day halocline in the harbor at about 10 m water depth and the sill connecting the harbor water with the Oslofjord (Fig. 1) at 8-9 m, the residence time of the bottom water has been sensitive to elevation of the sill. This natural development continued until the time of sampling (2014). It implies that the basin water in Horten Inner Harbor reflected by the deeper parts of the 12m- and 20m-cores was first subject to natural stress (oxygen depletion) and then, additionally, to human-induced stress.
Together, this caused development of successively more oxygen depleted, and finally anoxic, bottom water conditions devoid of benthic foraminifera during the past few centuries (Fig. 3). Continued elevation of the land will probably not allow natural re-oxygenation in the deeper parts. A model of the temporal, faunal development from a combination of unpolluted ('good'-'high' geochemical status) and naturally stressed, oxygen depleted reference conditions ('poor'-'bad' ecological status) to increasingly polluted and strongly oxygen depleted-anoxic bottom water conditions with no foraminifera is shown in Fig. 6B.
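The rebound arithmetic behind the "about 1 m in 400 years" estimate can be checked with a minimal sketch. The rates and the present-day sill depth are the approximate values quoted in the text. The function names are hypothetical, and the back-projection of the sill depth rests on assumptions not made in the original: a constant uplift rate, and no change in absolute sea level or in sediment accumulation on the sill.

```python
# Illustrative arithmetic only. Rates (2-3 mm/yr) and sill depth (8-9 m) are the
# approximate values quoted in the text; the function names are hypothetical.

def uplift_m(rate_mm_per_yr: float, years: float) -> float:
    """Land uplift in metres, assuming a constant rebound rate."""
    return rate_mm_per_yr * years / 1000.0


def past_sill_depth_m(present_depth_m: float, rate_mm_per_yr: float, years: float) -> float:
    """Back-projected sill depth in metres.

    Assumption (not from the source): the sill shoaled by exactly the amount
    of land uplift, with no sea-level change and no sedimentation on the sill.
    """
    return present_depth_m + uplift_m(rate_mm_per_yr, years)


if __name__ == "__main__":
    years = 400
    # Uplift over 400 years for the lower, middle and upper rebound rates.
    for rate in (2.0, 2.5, 3.0):
        print(f"{rate} mm/yr over {years} yr: {uplift_m(rate, years):.1f} m uplift")
    # Back-projected sill depth for the shallow and deep ends of the 8-9 m range.
    for depth in (8.0, 9.0):
        past = past_sill_depth_m(depth, 2.5, years)
        print(f"sill now at {depth:.0f} m: about {past:.1f} m deep {years} yr ago")
```

Under these simplifying assumptions, the sill would have been roughly 1 m deeper around 1600 than at the time of sampling, which is consistent with the suggestion that the bottom water residence time has been sensitive to elevation of the sill.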

Transitional waters
For natural reasons, transitional waters (TW) have lower species diversity than most coastal waters. This is because TW are naturally stressed and have a high abundance of stress tolerant species. This makes it difficult to distinguish between naturally and human-induced stress factors and is known as the Estuarine Quality Paradox (Elliott and Quintino, 2007; Dauvin, 2007; Dauvin and Ruellet, 2009). Still, according to the WFD's classification system, un-impacted TW represent 'moderate' or worse ecological status, implying that action is needed to improve the conditions. How then, is it possible to define acceptable reference conditions in such environments? Is it possible to use an approach similar to that shown above for coastal and naturally oxygen depleted waters? Although not characterized using governmental classification systems, numerous biostratigraphic studies from transitional waters worldwide indicate that this is the case (e.g., Cearreta et al., 2002; Hayward et al., 2004; Tsujimoto et al., 2006; Francescangeli et al., 2016).
The geochemical parameters in the older sediments in Horten Inner Harbor from the 8m-site showed stable background concentrations reflecting reference conditions of at least 'good' status until the late 1800s (Figs. 3 and 5). On the other hand, the associated foraminiferal assemblages had low species diversity reflecting only 'moderate' reference conditions during the same time interval. It is reasonable to infer that the organisms which lived at the site during times when the sediments were not polluted represent the in situ biologically defined reference conditions, irrespective of whether the biotic indices are classified as 'good' or 'bad'. In other words, as long as the geochemical status reflects nearly un-impacted reference conditions, the associated biota represents the ecological reference conditions irrespective of how their biotic indices (according to the current WFD-system) are classified. In concert, this implies that the natural background environment (reference conditions), rather than pollution, was the reason for the 'moderate' ecological status.
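The inference described above can be written as a simple decision rule. The following is a minimal sketch and not part of the WFD classification system or of the method used in this study. The function names, the returned labels and the use of 'good' as the geochemical threshold (taken from the phrase "at least 'good' status") are assumptions made for illustration.

```python
# Hypothetical decision rule illustrating the reasoning in the text.
# The five status classes follow the WFD scale; everything else is an assumption.

STATUS_ORDER = ["bad", "poor", "moderate", "good", "high"]  # worst to best


def at_least(status: str, threshold: str) -> bool:
    """True if `status` is equal to or better than `threshold`."""
    return STATUS_ORDER.index(status) >= STATUS_ORDER.index(threshold)


def interpret_interval(geochem_status: str, biotic_status: str) -> str:
    """Interpret one dated core interval from its two status classes.

    Assumption: geochemical status of at least 'good' is taken to mean
    nearly un-impacted conditions, so the biota of that interval defines
    the in situ reference conditions whatever its biotic index class.
    """
    if at_least(geochem_status, "good"):
        if at_least(biotic_status, "good"):
            return "reference conditions"
        return "naturally stressed reference conditions"
    return "impacted"


if __name__ == "__main__":
    # Illustrative inputs loosely based on the status classes quoted in the text.
    print(interpret_interval("good", "moderate"))  # naturally stressed reference conditions
    print(interpret_interval("high", "poor"))      # naturally stressed reference conditions
    print(interpret_interval("moderate", "bad"))   # impacted
```

The rule deliberately gives the geochemical status priority: a low biotic index is attributed to natural stress only when the sediments of the same interval are unpolluted. In the "impacted" case the sketch does not attempt to separate the natural from the human-induced part of the stress.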
At this shallow, 8m-site there are no indications of oxygen depletion in the older deposits that are characterised by e.g. H. germanica and C. williamsoni. Both species are common "intertidal to shallow-subtidal (6 m) temperate brackish" species in the Skagerrak/Kattegat area (Murray, 2006, p. 79). Pigment analyses have indicated that they live in symbiotic association with diatoms or their chloroplasts (Knight and Mantoura, 1985). For H. germanica, this has been further supported by its ability to crack open and feed on/sequester living diatoms on mudflats (Austin et al., 2005). Consequently, the low species diversity and the assemblage composition in the older sediments reflect pre-impacted, estuarine, brackish water depositional conditions. This inference also fits with the position of the halocline between 8 and 10 m water depth at the time of sampling and a salinity of about 26 at the 8 m site (Table 1).
During the first half of the 1900s, the geochemical conditions deteriorated while the nEQR changed from 'moderate' to 'poor' (Fig. 3). Possibly due to further organic enrichment and nutrient input from the city of Horten in the late 1900s, the ecological status deteriorated to 'bad'. The development, although less severe, followed the same pattern as that of the two deeper sites: the foraminifera disappeared during the 1980s and were absent for a few decades but (as opposed to the deeper sites) re-appeared with a living (stained) assemblage of very low diversity around the time of collection in 2014 (Fig. 3). This probably reflects a positive trend in recent times, as also indicated by the geochemical data.
Overall, the temporal faunal development at the 8m-site reflects variable and stressful conditions during the 200-year time span represented by the analyzed sediments. Initially, this was due to characteristics of naturally stressed transitional waters with reference conditions represented by 'moderate' to 'poor' ecological status while the geochemical conditions reflected 'good' to 'high' status. The conditions finally deteriorated during increasingly impacted times and the ecological status turned to 'bad' as schematically illustrated in Fig. 6C. These in situ reference conditions represent the "type-specific" conditions sensu the European Commission (2003). The biotic indices used should be part of a classification system that fulfills the criteria set by the WFD.

Conclusion
Within the Water Framework Directive, naturally stressed environments will, due to naturally low species diversity, be classified as having 'moderate' or worse EcoQS. Strictly speaking, this implies that management action is needed to improve the conditions, even though the conditions are natural. Consequently, it is crucial to be able to determine the ecological quality of the in situ reference conditions, i.e., to evaluate whether the conditions are stressful for natural reasons or due to anthropogenic forcing.
Horten Inner Harbor is an example where the local geochemical reference conditions (the natural background) reflected 'good' to 'high' status at all three investigated sites. On the other hand, during the same time interval, two different kinds of ecological reference conditions were defined: 1) naturally oxygen depleted conditions in the two deepest basins (the 12 and 20 m sites) reflecting 'poor' to 'bad' status and 2) transitional, estuarine waters with naturally variable but low species diversity reflecting 'moderate' status at the shallower 8 m site. Both show temporally stable geochemical reference conditions with 'good' to 'high' status, whereas the associated biotic indices are classified as worse than 'good' due to the naturally stressful conditions.
The present study underlines that retrospective foraminiferal analyses of dated sediment cores can be used in combination with geochemical analyses to determine ecological reference conditions in naturally stressed environments, in which species diversity is naturally low. It illustrates, with a conceptual model, how the retrospective foraminiferal biomonitoring method allows the development of ecological status to be traced from reference to present-day conditions. Our results represent another example of the potential of benthic foraminifera in biomonitoring. Through their fossil record, they provide additional ecological information about in situ reference conditions that is not achievable with traditional tools such as macrofauna. Therefore, we suggest that benthic foraminifera should be accepted as a Biological Quality Element within the WFD.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.