Predicting New Zealand riverine fish reference assemblages

Biomonitoring is a common method to monitor environmental change in river ecosystems, a key advantage of biomonitoring over snap-shot physicochemical monitoring is that it provides a more stable, long-term insight into change that is also effects-based. In New Zealand, the main biomonitoring method is a macroinvertebrate sensitivity scoring index, with little established methods available for biomonitoring of fish. This study models the contemporary distribution of common freshwater fish and then uses those models to predict freshwater fish assemblages for each river reach under reference conditions. Comparison of current fish assemblages with those predicted in reference conditions (as observed/expected (O/E) ratios) may provide a suitable option for freshwater fish biomonitoring. Most of the fish communities throughout the central North Island and lower reaches show substantial deviation from the modelled reference community. Most of this deviation is explained by nutrient enrichment, followed by downstream barriers (i.e. dams) and loss of riparian vegetation. The presence of modelled introduced species had relatively little impact on the presence of the modelled native fish. The maps of O/E fish assemblage may provide a rapid way to identify potential restoration sites.


INTRODUCTION
Biomonitoring is the use of biota to detect and track change in an environment and underpins much of the environmental management in developed countries (Friberg et al., 2011;Li, Zheng & Liu, 2010). In aquatic systems, physicochemical monitoring, such as the measurement of nutrient and sediment concentrations, is popular the world over. However, physicochemical monitoring often only provides periodic snapshots of water quality that likely mislead environmental managers on the health of the system (Hazelton, 1998). One common example is where rivers are spot sampled for dissolved oxygen concentration during the day, despite dissolved oxygen having large diurnal fluctuations. Whilst dissolved oxygen may be at sufficient levels during the day, at night (when photosynthesis is not occurring) levels may plummet to stressful or even lethal levels, yet these minima are overlooked by seemingly healthy day-time concentrations (Hazelton, 1998). Whereas biological communities are continually exposed to the ranges of Clapcott et al. (2017) show that one way to potentially circumvent the lack of suitable reference sites is to model the communities across all conditions (allowing the use of many sites), allowing for the encapsulation of responses across a gradient of anthropogenic impact. The model can then be used to predict communities at a defined reference state.
Using a similar approach to that by Clapcott et al. (2017), this study aims to develop an O/E indicator for New Zealand riverine fish and decapods. This first involves modelling the current distribution of common fish throughout New Zealand, then predicting fish distribution under defined reference conditions and calculating the O/E ratio for each river reach. A secondary aim is to then explore the potential anthropogenic impacts driving fish O/E scores.

Fish data
Fish and decapod presence absence data were sourced from the New Zealand Freshwater Fish Database (NZFFD) (Richardson, 1989). Only sites that were sampled since 2000 and during the Summer period (December through to March inclusive) using electric fishing over a minimum reach of 150 m (as suggested by Joy, David & Lake (2013)) were included in the study. Restricting our analysis to Summer months is to minimise the influence of migration on predicted distribution. As a result, the predicted probability of fish occurrences only applies to electric fishing in wadable rivers. Furthermore, the influence of temporal changes in fish communities since 2000 were not considered, this represents a trade-off between recent data whilst retaining sufficiently enough surveys of each fish species. Furthermore, Crow et al. (2016) assessed temporal changes in the NZFFD records and found that most species (except Brown trout, Canterbury galaxias and Shortfin Eel) had indeterminate trends between 1977 and 2015. Despite this, Joy (2009Joy ( , 2014 shows national decline in the Fish IBI (Joy & Death, 2004) at the decadal scale. To ensure a sufficient site selection for the models to learn habitat, only fish species that were present in at least 150 sites were included. Where sites had multiple survey records, only one survey record was included (randomly selected). Overall, 24 native fishes and eight exotic fishes (Table 1) across 19,892 sites were modelled (only native fish were included in the O/E indicator, exotic fish were modelled to explore their impacts).

Environmental data
At each river reach, most environmental variables were extracted from the Freshwater Environments New Zealand (FENZ) geodatabase (Leathwick et al., 2010), except for the nitrate-nitrogen (N) and dissolved reactive phosphorus (DRP) predicted concentrations which were sourced from Unwin & Larned (2013), and the hydrological characteristics which were sourced from Booker & Woods (2014) ( Table 2).
The FENZ geodatabase also classifies river reaches into groups with similar environmental conditions (Leathwick et al., 2010(Leathwick et al., , 2011. The classifications were made using Generalised Dissimilarity Modelling that used the FENZ geodatabase to explain the biological dissimilarity in fish and macroinvertebrate distributions (Leathwick et al., 2011). To reduce the risk of over-extrapolation, only river classes that contained at least 1,000 fish survey sites were included in the exercise, thus all predictions are restricted only to those river classes (A, C, G and H) where sufficient records exist.

Fish distribution models
Using all records from FENZ river classes with at least 1,000 survey sites, each of the 24 native fish and decapod species and eight exotic species included in the study (Table 1)

Gambusia affinis Gambusia
Note: All natives were included in the observed/expected indicator, whilst exotics were included in the impact assessment. were modelled from all environmental variables (Table 2) using boosted regression tree (BRT) models. BRT models consist of many simple tree models that when combined can fit complex relationships (Elith, Leathwick & Hastie, 2008). BRT models are capable of fitting interactions and non-linear predictors, can handle non-normal error terms and missing values, and can identify the most informative predictors whilst ignoring irrelevant ones (Elith, Leathwick & Hastie, 2008;Friedman & Meulman, 2003). Tree complexity was set at five, whilst the learning rate was set to ensure that at least 1,000 trees were assembled, as recommended by Elith, Leathwick & Hastie (2008). Models were crossvalidated with a bag fraction of 0.15 and 10 K-folds. Model performance was assessed using the cross-validated area under the receiver operator curve (AUC). Linear regressions were used to screen co-linearity between predictor variables (Table S1). The BRT models also give the relative importance of environmental variables for each species. Using the relative influence of the five most important variables for each species, non-metric multidimensional scaling (NMDS) was used to portray the dissimilarity (Euclidean method) in the important environmental variables predicting each species. The NMDS was produced using the PAST3 software package (Hammer, Harper & Ryan, 2001).
To estimate current fish distribution, each of the BRT models were extrapolated across all New Zealand river reaches within the FENZ classes (A, C, G and H). Following a similar approach to Clapcott et al. (2017), fish distribution at each reach in reference conditions was estimated by setting: the proportion of upstream and riparian native cover were set to 100%; the proportion of upstream pasture set to 0%; the proportion of riparian shade was set to that estimated assuming complete vegetation in pre-human conditions as published in FENZ; the predicted N and DRP concentrations were reduced (where exceeded) to 0.11 and 0.006 mg/L respectively, as these reflect the upper range of nutrient enrichment for river reaches with high ecological health as defined by Death et al. (2018).
The BRT models predict the probability of species capture for each reach. Many models assume that species are present when the probability of capture is greater than 0.5; however, unless the data used to create the model is balanced (i.e. species occurrence at ∼50% of sites) then a threshold of 0.5 would not provide the best representation. To circumvent this, Schroder (2004) was used to compare actual species presence-absence with predicted probability of capture and calculate the Cohen's Kappa for all potential thresholds between zero and one in 0.005 steps. The threshold that maximised Cohen's Kappa for a given species was selected as the threshold with the best prediction. Fish were considered present at a site when their probability of capture was equal to or exceeded the threshold selected for that species.
The O/E ratio was predicted for each river reach by: 1. Counting the number of native fish species (from the 24 modelled) that are predicted (i.e. expected) to occur in reference conditions (Fig. 1A).
2. Counting the number of native fish species (from the 24 modelled) that are predicted to be present in present-day conditions and were predicted to occur in human-absent conditions (i.e. observed) (Fig. 1B).
3. Dividing the number of observed species by the number of species expected gives a ratio between zero and one (Fig. 2). High ratios indicate that fish presence assemblages are similar to those expect in reference conditions, whilst low ratios suggest fish presence assemblages are substantially different from human-absent conditions. Crow et al. (2014) used Regularized Random Forest models to predict the present-day New Zealand-wide distribution of freshwater fishes, also using data from the NZFFD. As a comparison, the percent agreement (of fish presence-absence) between the predicted Human impacts on the predicted fish observed/expected ratio The fish O/E was predicted for 208,449 FENZ river reaches. The predicted O/E for each river was modelled using the same BRT procedure described above, but predicted from the following human-influenced factors: The presence/absence of a downstream dam (Leathwick et al., 2010); predicted nitrate-nitrogen and DRP concentrations (Unwin & Larned, 2013); predicted E. Coli concentrations (Unwin & Larned, 2013); the O/E riparian vegetation cover (Leathwick et al., 2010); the O/E fine sediment cover (Clapcott et al., 2011); and the predicted presence of exotic Oncorhynchus mykiss (Rainbow Trout), Salmo trutta (Brown Trout), Perca fluviatilis (Perch), Scardinius erythrophthalmus (Rudd), Carassius auratus (Goldfish), Gambusia affinis (Gambusia), Oncorhynchus tshawytscha (Chinook salmon) and Ameiurus nebulosus (Catfish).

RESULTS
All fish modelled had good or excellent performances as measured by the AUC (>0.8 and >0.9 respectively; Table 3). The best thresholds for species presence range from 0.025 to 0.48. Slope, air temperature, nutrients and flood frequency were among the most common influential factors determining fish distribution (Table 3; Fig. 3). When the predicted current fish distributions were compared with those derived by Crow et al. (2014), on average there was 93% agreement in the predicted presence-absence of a species at a reach. The percentage agreement for individual species ranged from 71% to 100% and are shown in Table 3.
The O/E was, perhaps not surprisingly, greatest in naturally forested areas followed by areas dominated by low intensity agriculture. The lowest scores were in primarily in the central North Island (Waikato and Manawatu regions) where intensive agriculture and (in the case of Waikato) many dams (primarily for hydropower generation and irrigation) exist. BRT modelling of the predicted O/E values had a cross-validated correlation coefficient of 0.56 and suggests that DRP concentration (26.4%), NO3-N concentration (21.6%), downstream dams (20.7%), O/E riparian cover (16.1%) and O/E sediment cover (8.0%) are among the most influential factors in predicting the O/E, with exotic fish, such as Rainbow Trout (1%), Goldfish (0.5%), Perch (0.03%) and Brown Trout (0.005%), having relatively little influence (as indicated by percentages).

DISCUSSION
The BRT modelling successfully predicted the distribution of all 32 species (native and exotic) with high accuracy. Geographic and climatic variables were the primary drivers of New Zealand freshwater fish distribution, this is consistent with the findings of Joy & Death (2002) and Leathwick et al. (2005). Furthermore, the BRT models were optimised using cross-validation, which has been shown to have substantially less bias than resubstitution approaches or simple training/test splits (Olden & Jackson, 2002;Olden, Jackson & Peres-Neto, 2002); the predictions also had high inter-rater agreement with the predictions by Crow et al. (2014); thus, instilling confidence in the predictions. Whilst the models performed well in predicting each species, each species was modelled independently of each other. This means that there is spatial autocorrelation between individual species and model parameters but not with other native species that typically co-occur. Other approaches that model whole communities have found that they may provide greater predictive accuracy, particularly for rare species, by learning when species co-exist (Nieto-Lugilde et al., 2018;Olden, Joym & Death, 2006), this may be an interesting comparison for future analysis.
The vast majority of New Zealand lowland streams have some degree of human impact. This may have led the BRT models to associate natural characteristics of lowland streams, such as distance to coast and slope, with fish distributions that are actually the result of human impacts. However, given that lowland streams differ substantially in their degree of human impact, it is also plausible that the gradient of impact on fish distribution is well encapsulated, resulting in reasonable predictions under more natural conditions. The predicted reference conditions do not represent natural conditions but provide a benchmark that is close to natural conditions. Another key issue is that the defined reference condition could not account for changes in geomorphology (physical habitat), primarily due to a lack of data on geomorphological reference condition. This may be somewhat alleviated by physical habitat likely correlating with land use, which was defined in the reference condition. Nevertheless, not all changes in physical habitat are the result of land use, with flood management engineering being another key driver of physical habitat change. The river environment classification (REC) Geographic information system (GIS) layers used are also representative of current river extent, the definition of reference condition did not account for the straightening of rivers, as such the extent of river reaches in reference condition is defined by the REC, rather than historical/pre-human extent. Despite these pitfalls, Clapcott et al. (2017) showed that an almost identical approach to that recruited here was accurate in predicting the MCI in close to natural conditions. Still, this risk is one of the unfortunate disadvantages of having very little lowland sites in natural condition. Future of comparison of this approach with traditional O/E approaches (using representative sites in reference condition) would also be worthwhile to observe the influence of reference definition on O/E outcomes.
The BRT models are also prone to data uncertainty from several sources. First, the success of electric fishing can be heavily impacted by user skill, water conductivity and machine settings. The freshwater fish database does not include data on these aspects to allow for their control. However, these impacts were minimised by only selecting reaches that had at least 150 m surveyed, as recommended by Joy, David & Lake (2013), and only assessing fish presence/absence rather than relative abundance or density. Second, many New Zealand freshwater fish are migratory and may be absent from streams at various times of year (McDowall, 1990); also, many fish become less active at colder temperatures making them harder to catch (David & Closs, 2003;Jellyman, 1991;Jellyman, Glova & Todd, 1996;Ryan, 1984). Temporal variability can also present from sampling that occurred soon after a flood (David & Closs, 2002). Whilst it was not possible for surveys following recent floods to be identified and removed, seasonal impacts were reduced by excluding data collected from May to October (inclusive) to maximise species presence (Joy, David & Lake, 2013).
The predicted O/E suggests the vast majority of absences are throughout the central North Island (predominantly the Waikato and Manawatu regions), followed by lowland flat areas. The presence of downstream dams and heightened nutrient enrichment were the most influential anthropogenic factors impacting the O/E, followed by riparian loss and sedimentation. Interestingly, introduced salmonids and perch had relatively very little influence on the O/E of native fish. Due to data limitations, this study could not assess the loss of physical habitat on fish O/E; the loss of physical habitat is likely to be a large driver of fish exclusion and warrants further exploration.
Many New Zealand fish are diadromous (Joy & Death, 2001;McDowall, 1998aMcDowall, , 1998b and it is probable that the inhibition of migration is the key factor underlying the high influence of downstream dams. Baker (2003) found that juvenile migratory galaxiids and common bullies were restricted by weir fall heights as little as 10 cm, whilst adult galaxiids were restricted by falls greater than 20 cm. Furthermore, Joy & Death (2001) examined 85 sites across 38 streams with dams or weirs in the Taranaki region and found that fish species richness was consistently higher downstream than upstream of dams. The predictions of expected (reference condition) fish distribution could be used to prioritise fish passage improvement at dams where there is a large area of upstream catchment potentially habitable to multiple species.
New Zealand freshwater ecosystems are primarily threatened by agricultural intensification, which has driven large increases in excess nitrogen, phosphorus and sediment. Excess nitrogen and phosphorus can impact fish community health either directly at physiologically toxic levels (e.g. nitrite toxicity) or, more commonly, indirectly by permitting excessive algal growth. High algal growth often alters macroinvertebrate communities from one dominated by mayflies, caddisflies and stoneflies to one dominated by chironomids midges, snails and worms-the latter community being less energetically rewarding for fish. The extra algal growth and decomposition can also increase diurnal oxygen fluctuations resulting in stressful hyperoxic and hypoxic conditions. Though, excessive growth can be mediated by riparian shading.
Riparian vegetation can benefit fish communities by reducing algal growth, sustaining allochthonous inputs and supporting diverse habitat. Riparian vegetation can intercept and absorb nutrients flowing into the river as well as shade the river bed-both can limit the growth and consequences of excessive algal growth. High leaf litter and terrestrial invertebrate inputs from vegetated riparian can also sustain shredding invertebrates (which fish can then consume) and support the diets of some fish species. For example, Bonnett & Lambert (2002) found that terrestrial invertebrates occurred in 83% of Giant Kokopu (Galaxias argenteus) stomachs, and comprised 25% of the gut content. Riparian vegetation can also create and maintain habitat diversity by stabilising banks, shading streams, regulating temperature and providing root structures and woody debris (Pusey & Arthington, 2003).
Interestingly, the presence of the introduced species, including the very widespread Brown Trout, had almost negligible influence on the O/E of native fish. Previous studies have suggested that salmonids have replaced non-migratory galaxiids in some streams (but not others) and reduce the relative abundance of large and drifting invertebrates (McIntosh & Townsend, 1996;McIntosh, Townsend & Crowl, 1992;Townsend, 2003). However, this study did not include all non-migratory galaxiids as they have restricted distributions with too few surveys to be modelled. Furthermore, this study only assessed the presence or absence of species, rather than changes in abundance. Whilst the presence of exotic fish may not have marked impacts on the presence of species assessed, brown trout have been associated with reduced galaxiid abundance and size (McIntosh, Crowl & Townsend, 1994;McIntosh et al., 2010;Olsson et al., 2006). Therefore, an O/E of species abundance rather than presence-absence may be more sensitive to the impacts of introduced species.

CONCLUSIONS
In conclusion, this study presents a presence-absence fish O/E indicator that is applicable for the majority of New Zealand river reaches. It may be useful for the rapid assessment of native fish communities and may be useful for identifying potential restoration sites. It is shown that barriers, such as dams, and nutrient enrichment have considerable influence over the distribution of native fish. Further work is needed to assess the impact of physical habitat change on the exclusion of native fish.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
The author received no funding for this work.

Competing Interests
Adam Canning is an employee of the Wellington Fish and Game Council.

Author Contributions
Adam D. Canning conceived and designed the experiments, performed the experiments, analysed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.