Estimating intertidal seaweed biomass at larger scales from quadrat surveys

The amount of macroalgal biomass is an important ecosystem variable. Estimates can be made for a sampled area, or values can be extrapolated to represent biomass over a larger region. Typically, biomass is scaled up by multiplying the area by the mean: a non-spatial method. Where algal biomass is patchy or shows gradients, non-spatial estimates for an area may be improved by spatial interpolation. A separate issue with scaling up biomass estimates is that conventional confidence intervals based on the standard error (SE) of the sample may not be appropriate. The issues around interpolation and confidence intervals were examined for three fucoid species using data from 40 × 0.25 m² quadrats thrown in a 0.717 ha sampling plot on the shore of Galway Bay. Despite evidence of spatial autocorrelation, interpolation did not appear to improve estimates of the total plot biomass of Fucus serratus and F. vesiculosus. In contrast, interpolated estimates for Ascophyllum nodosum had less error than those based on the non-spatial method. Bootstrapped confidence intervals had several benefits over those based on the SE, including the avoidance of negative confidence limits at low sample sizes and no assumption of normality in the data. If there is reason to expect strong patchiness or a gradient of biomass in the area of interest, interpolation is likely to produce more accurate estimates of biomass than non-spatial methods. Development of methodologies for estimating biomass would benefit from better definition of local and regional gradients in biomass and their associated covariates.


Introduction
Large-scale estimates of seaweed biomass are needed for evaluations of resource availability, carbon capture, food web structure and ecosystem function (Burrows et al., 2014; Krause-Jensen et al., 2018; Quartino and Boraso de Zaixso, 2008; Trevathan-Tackett et al., 2015). The amount of macroalgal biomass is often calculated by multiplying the area of suitable habitat by a quadrat-scale estimate of biomass (e.g., Sharp et al., 2008; Werner and Kraan, 2004). Quadrat-scale biomass of seaweeds can be very variable. For example, the average coefficient of variation for the dry weight m⁻² of Ascophyllum nodosum surveyed on five shores in Brittany was 67% (Gollety et al., 2011). Such variability inevitably causes uncertainty in scaled-up estimates of biomass. Confidence intervals for the total biomass of Ascophyllum nodosum and Fucus vesiculosus in Irish counties were typically 50% of the estimate (Cullinane, 1984).
Including additional information on sources of variability can potentially reduce the uncertainty of seaweed biomass estimates. For example, biomass can be related to environmental covariates such as wave exposure (Burrows et al., 2010; Gorman et al., 2013). Some of the variables influencing seaweed biomass may not be well defined, or the data may not be available at an appropriate scale. Models that include spatial information may capture some of the variability associated with differences between locations. With a suitable geostatistical model, differences between locations can be interpolated to make estimates of biomass (e.g., Addis et al., 2009; Rufino et al., 2006). This approach is relatively common in fisheries but has rarely been applied to studies of algal biomass (but see Givernaud et al., 2005).
E-mail address: mark.johnson@nuigalway.ie.
The distribution of biomass among quadrats can cause problems in describing the uncertainty of estimates, particularly when values are spread asymmetrically. Parametric confidence intervals based on the standard error of raw data may not be appropriate for skewed data. The errors are likely to be greater where the number of quadrats is relatively small, such that the sampling distribution of the mean does not approach normality under the central limit theorem. Data transformation does not offer a simple way of dealing with skewed data, and the problems can be illustrated in a number of ways. One example is a situation where the entire population has been sampled. The biomass is then the total amount measured, equal to the arithmetic mean multiplied by the number of sampled units. A back-transformed mean will not equal the arithmetic mean, so it would not be an appropriate basis for calculating the total biomass measured. Where several transformations are plausible, different back-transformed confidence intervals are possible, and there is no clear rationale for deciding which would be most appropriate. Where the true distribution of the data is not well known, bootstrapping provides a method to estimate confidence intervals. Bootstrapping is based on resampling the data, as the sample can be considered the best available source of information about the measured variable (Manly, 1997).
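A few lines of Python make the back-transformation point concrete. This is a minimal sketch using made-up, positively skewed quadrat weights (not the study data). It compares the total implied by the arithmetic mean with the totals implied by back-transforming the mean after a log(x + 1) or a square-root transformation; these two transformations are chosen only for illustration.

```python
import math
import statistics

# Hypothetical, positively skewed fresh weights (kg per quadrat); NOT the study data.
biomass = [0.0, 0.05, 0.1, 0.1, 0.2, 0.3, 0.4, 0.6, 1.2, 4.0]


def total_arithmetic(values):
    """Total of the sampled units: arithmetic mean multiplied by the number of units."""
    return statistics.mean(values) * len(values)


def total_log1p(values):
    """Total implied by back-transforming the mean of log(x + 1)."""
    back_mean = math.expm1(statistics.mean([math.log1p(v) for v in values]))
    return back_mean * len(values)


def total_sqrt(values):
    """Total implied by back-transforming (squaring) the mean of sqrt(x)."""
    back_mean = statistics.mean([math.sqrt(v) for v in values]) ** 2
    return back_mean * len(values)


if __name__ == "__main__":
    print(f"sum of measurements:       {sum(biomass):.3f} kg")
    print(f"arithmetic mean x n:       {total_arithmetic(biomass):.3f} kg")
    print(f"log(x + 1) back-transform: {total_log1p(biomass):.3f} kg")
    print(f"sqrt back-transform:       {total_sqrt(biomass):.3f} kg")
```

Only the arithmetic mean recovers the measured total. Both back-transformed totals are smaller, as expected from Jensen's inequality for concave transformations, and they also differ from each other, so the answer depends on which transformation is chosen.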
This paper estimates the biomass of three intertidal fucoids in a sampling plot. In doing this, estimates generated by interpolation across the site using geostatistics are compared with the non-spatial estimate based on the mean. Confidence intervals for extrapolated totals are estimated by bootstrapping as an alternative to parametrically defined intervals. Comparison of different species is used to examine the extent to which sampling guidelines can be generalized. The quadrat density in the sampled area is relatively high compared with typical field studies, which allows an examination of the variation in biomass estimates when sampling at different levels of intensity. The analyses developed in the current study contrast with previous studies of quadrat sampling in macrophytes, which have mostly focused on the efficiency of different replicate sizes (e.g., Downing and Anderson, 1985; Pringle, 1984).

Methods
Samples were taken from the shore at Furbo in Galway Bay. This site has an extensive intertidal area, with a mixture of areas dominated by bedrock, boulders, cobble or sediment. "Sampling plot" is used in this manuscript to refer to the area of shore within the polygon shown in Fig. 1. The plot was defined with an upper boundary at the point where Fucus spiralis L. became the dominant cover and a lower boundary where kelps became more frequent than Fucus serratus L. The sampling plot contained Ascophyllum nodosum (L.) Le Jolis, Fucus vesiculosus L. and Fucus serratus as the dominant macroalgae. The outline of the sampling plot was recorded using a WAAS/EGNOS-enabled Garmin eTrex 10 GPS. There is good satellite coverage at Furbo (frequently > 15 satellites visible), the plot outline suggests good positional accuracy with respect to identifiable features, and previous observations with similar technology have indicated a median horizontal displacement error of 0.37 m (Witte and Wilson, 2005).
Fresh weight biomass of the three dominant fucoids in the sampling plot was measured in quadrats (0.25 m², n = 40). Quadrats were thrown haphazardly, and all macroalgae within each quadrat were removed and placed in a plastic bag. The central location of each quadrat was recorded using the GPS unit. Species were identified and separated for weighing in the laboratory. Quadrat locations were transformed from WGS1984 to the Irish grid (EPSG:29902) so that rasters of algal biomass could be defined in metric units for the sampling plot. Raster interpolation used ordinary kriging with a raster cell size of 0.25 m². Kriging was based on experimental variograms of the raw algal data. A variogram shows how the semivariance (half the average squared difference between measurements) changes as a function of the geographical distance between the points of measurement. The optimal kriging model was chosen from a comparison of exponential, spherical, Gaussian, nugget, Matérn and exponential class models. All of the models except the nugget describe a tendency for data to be spatially autocorrelated; the nugget model describes a situation with no spatial dependence between points. The model with the lowest fitting error was used for subsequent interpolation. Rare, high-biomass quadrats of Ascophyllum resulted in the nugget model being chosen. Following Rufino et al. (2005), outliers were therefore omitted to estimate the spatial structure in the absence of extreme values. In this approach, the largest value is omitted and the fitted models are judged for goodness of fit. The process can be repeated, removing the next largest value and evaluating the results.
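The experimental variogram, the spherical model and the omission of extreme values can be illustrated without GIS libraries. The sketch below is a minimal Python illustration of the classical (Matheron) semivariance estimator binned by lag distance, the spherical model form, and a hypothetical `drop_largest` helper for re-fitting without the largest values. It is not the gstat implementation used in the study, model fitting is not shown, and the bin width, maximum distance and example data are assumptions. Coordinates are assumed to be projected metres, as after transformation to the Irish grid; `math.dist` needs Python 3.8 or later.

```python
import math
from itertools import combinations


def experimental_variogram(points, values, bin_width, max_dist):
    """Classical (Matheron) semivariance estimator, binned by lag distance.

    For lag bin k, gamma = sum((z_i - z_j) ** 2) / (2 * N) over the N pairs whose
    separation lies in [k * bin_width, (k + 1) * bin_width). Points are (x, y)
    tuples in projected metres. Returns (bin midpoint, gamma, N) for non-empty bins.
    """
    n_bins = math.ceil(max_dist / bin_width)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])  # Euclidean separation
        if d >= max_dist:
            continue
        k = min(int(d // bin_width), n_bins - 1)
        sums[k] += (values[i] - values[j]) ** 2
        counts[k] += 1
    return [((k + 0.5) * bin_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_bins) if counts[k] > 0]


def spherical(h, nugget, partial_sill, range_):
    """Spherical model: rises from the nugget and levels off at nugget + partial_sill at h = range_."""
    if h == 0:
        return 0.0  # semivariance is zero at zero separation by definition
    if h >= range_:
        return nugget + partial_sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def drop_largest(points, values, k):
    """Hypothetical helper: remove the k largest values (with their locations)
    so that a variogram can be re-fitted without the most extreme quadrats."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    keep = sorted(order[:len(values) - k])
    return [points[i] for i in keep], [values[i] for i in keep]


if __name__ == "__main__":
    # Ten hypothetical quadrats 1 m apart along a line, biomass rising steadily.
    pts = [(float(x), 0.0) for x in range(10)]
    vals = [float(x) for x in range(10)]
    for lag, gamma, n_pairs in experimental_variogram(pts, vals, 1.0, 5.0):
        print(f"lag {lag:.1f} m: gamma = {gamma:.2f} ({n_pairs} pairs)")
```

In this toy example the semivariance keeps rising because the values follow a trend. Autocorrelated but stationary data would instead level off at a sill, at a lag equal to the model range, which is the quantity reported for each species in the Results.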
The value of spatial information for estimates of biomass can be examined by comparing predictions for unobserved data made using spatial interpolation and using the mean. Unobserved data were simulated by 5-fold cross-validation: the data were divided into 5 blocks, with each block used once as test data while the other blocks were used as training data. Outliers were not omitted when using cross-validation. This reduces the complexity of the comparison and avoids a 'tuning' step that might inadvertently improve the performance of interpolation models relative to the non-spatial alternative.
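The cross-validation comparison can be sketched as follows. This is a minimal Python version under stated assumptions: folds are assigned at random with a fixed seed, and inverse-distance weighting (IDW) stands in for the ordinary kriging used in the study, purely to keep the example short and dependency-free. As in the text, the non-spatial predictor is the mean of the training quadrats.

```python
import math
import random


def kfold_indices(n, k=5, seed=1):
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def rmse(predicted, observed):
    """Root mean square error of predictions."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def idw_predict(train_points, train_values, target, power=2.0):
    """Inverse-distance-weighted prediction (a stand-in for ordinary kriging)."""
    num = den = 0.0
    for p, v in zip(train_points, train_values):
        d = math.dist(p, target)
        if d == 0:
            return v  # target coincides with a training quadrat
        w = d ** -power
        num += w * v
        den += w
    return num / den


def cross_validate(points, values, k=5, seed=1):
    """Return (RMSE of training-mean predictions, RMSE of IDW predictions)."""
    mean_pred, idw_pred, observed = [], [], []
    for fold in kfold_indices(len(values), k, seed):
        test = set(fold)
        train = [i for i in range(len(values)) if i not in test]
        train_pts = [points[i] for i in train]
        train_vals = [values[i] for i in train]
        train_mean = sum(train_vals) / len(train_vals)  # non-spatial predictor
        for i in fold:
            mean_pred.append(train_mean)
            idw_pred.append(idw_predict(train_pts, train_vals, points[i]))
            observed.append(values[i])
    return rmse(mean_pred, observed), rmse(idw_pred, observed)


if __name__ == "__main__":
    # 40 hypothetical quadrats on an 8 x 5 m grid with biomass rising along x.
    pts = [(float(x), float(y)) for x in range(8) for y in range(5)]
    vals = [x for x, _ in pts]
    print(cross_validate(pts, vals))
```

With 40 quadrats each fold holds 8 test observations, as in the study. In this strongly trended toy example the spatial predictor beats the mean; with patchy field data the comparison can go either way.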
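95% confidence intervals are frequently used as a guide to the uncertainty of estimates. Conventional parametric confidence intervals are calculated as $\bar{x} \pm t_{(0.95)}\,\mathrm{SE}$, where $\bar{x}$ is the sample mean, $\mathrm{SE} = s/\sqrt{n}$ is the standard error of a sample of n quadrats with standard deviation s, and $t_{(0.95)}$ is the value of the two-tailed t statistic at α = 0.05 with n − 1 degrees of freedom.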
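The parametric interval can be reproduced from summary statistics alone. The sketch below assumes the two-tailed critical value with n − 1 degrees of freedom; to avoid a SciPy dependency it finds the t quantile by integrating the Student t density with Simpson's rule and bisecting (`scipy.stats.t.ppf(0.975, 39)` should give the same value). With the A. nodosum summary statistics reported in the Results (mean 0.85 kg, SD 2.281 kg, n = 40) it should return roughly 0.12 to 1.58 kg per quadrat, matching the reported interval to rounding. The second call, with the same mean and SD but a hypothetical n = 20, gives a negative lower limit.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df, upper=50.0):
    """Quantile for p in (0.5, 1) by bisection; assumes the answer lies below `upper`
    (true for p = 0.975 with any df >= 1)."""
    lo, hi = 0.0, upper
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def parametric_ci(mean, sd, n, level=0.95):
    """Conventional interval: mean +/- t * SE, with SE = sd / sqrt(n) and n - 1 df."""
    t_crit = t_quantile(1 - (1 - level) / 2, n - 1)
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Summary statistics for A. nodosum as reported in the Results.
    print(parametric_ci(0.85, 2.281, 40))
    # Hypothetical: the same mean and SD with only 20 quadrats.
    print(parametric_ci(0.85, 2.281, 20))
```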
Fig. 1. Outline of the sampling plot used for algal biomass estimation on the shore at Furbo.
Where the sampling distribution of the mean is not normal (e.g., with skewed data), bootstrapping may provide a more appropriate estimator for confidence intervals. Bootstrapping resamples the measurements with replacement and calculates statistics from each resampled data set. The percentile method ranks all the estimated statistics (means in this case) and then discards the top and bottom 2.5% to define the confidence interval. The bias-corrected and accelerated percentile method (bca) estimates two parameters, a bias correction and an acceleration constant, which reflect the asymmetry of the bootstrap distribution and the influence of individual observations (such as outliers) on the estimated statistic (Manly, 1997). These parameters are then used to adjust which percentiles of the bootstrap distribution define the confidence limits.
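Both bootstrap intervals can be written in a few dozen lines. The sketch below is a plain-Python illustration, not the implementation in the R boot package used in the study (whose `boot.ci()` function offers both interval types). It resamples the quadrat values with replacement and takes the 2.5 and 97.5 percentiles of the resampled means for the percentile interval. For the bca interval it estimates the bias correction z0 from the proportion of bootstrap means below the observed mean and the acceleration a from jackknife (leave-one-out) means, then shifts the percentile levels accordingly. The nearest-rank percentile rule, seed and example data are assumptions for illustration.

```python
import random
from statistics import NormalDist


def bootstrap_means(data, reps=5000, seed=42):
    """Sorted means of `reps` resamples (with replacement) of the same size as data."""
    rng = random.Random(seed)
    n = len(data)
    return sorted(sum(rng.choices(data, k=n)) / n for _ in range(reps))


def nearest_rank(sorted_vals, p):
    """Value at proportion p of a sorted list (simple nearest-rank rule)."""
    k = round(p * (len(sorted_vals) - 1))
    return sorted_vals[min(max(k, 0), len(sorted_vals) - 1)]


def percentile_ci(data, level=0.95, reps=5000, seed=42):
    """Percentile interval: discard the top and bottom (1 - level) / 2 of bootstrap means."""
    boot = bootstrap_means(data, reps, seed)
    tail = (1 - level) / 2
    return nearest_rank(boot, tail), nearest_rank(boot, 1 - tail)


def bca_ci(data, level=0.95, reps=5000, seed=42):
    """Bias-corrected and accelerated (bca) interval for the mean."""
    std_normal = NormalDist()
    boot = bootstrap_means(data, reps, seed)
    n = len(data)
    theta_hat = sum(data) / n
    # Bias correction z0: how far the bootstrap distribution sits from theta_hat.
    prop_below = sum(b < theta_hat for b in boot) / reps
    if not 0 < prop_below < 1:
        raise ValueError("bias correction undefined for this bootstrap distribution")
    z0 = std_normal.inv_cdf(prop_below)
    # Acceleration a: skewness of the jackknife (leave-one-out) means.
    total = sum(data)
    jack = [(total - x) / (n - 1) for x in data]
    jack_mean = sum(jack) / n
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0
    # Adjusted percentile levels replace the nominal (1 - level) / 2 tails.
    tail = (1 - level) / 2
    limits = []
    for z in (std_normal.inv_cdf(tail), std_normal.inv_cdf(1 - tail)):
        adjusted = std_normal.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        limits.append(nearest_rank(boot, adjusted))
    return tuple(limits)


if __name__ == "__main__":
    # Hypothetical right-skewed quadrat weights (kg); NOT the study data.
    data = [0.0] * 20 + [0.1] * 10 + [0.5] * 5 + [2.0, 4.0, 6.0, 9.0, 12.0]
    print("percentile:", percentile_ci(data))
    print("bca:       ", bca_ci(data))
```

For right-skewed data like this, both intervals extend further above the mean than below it, and neither lower limit can fall below the smallest observation. `boot.ci()` interpolates between order statistics rather than using the nearest rank, so its limits would differ slightly.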
To illustrate the effect of sample size on confidence intervals, the data were resampled with 5000 replicates for each sample size from 1 to 40 quadrats. Approximate 95% confidence intervals were defined using the 2.5 and 97.5 percentiles. All bootstrapping and spatial data handling were carried out in R (R Core Team, 2019). Geospatial data were processed using the sp, raster, GISTools and rgeos packages (Bivand et al., 2013; Bivand and Rundel, 2019; Brunsdon and Chen, 2014; Hijmans, 2019; Pebesma and Bivand, 2005). Geostatistical modelling and kriging were carried out using gstat (Gräler et al., 2016; Pebesma, 2004). Colour palettes from the viridis package (Garnier, 2018) were applied to rasters generated in R. Bootstrap estimates were generated using the boot package (Canty and Ripley, 2019; Davison and Hinkley, 1997).
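The sample-size curves can be generated with a short resampling loop. A minimal sketch, assuming the quadrat values are held in a Python list: for each number of quadrats n it draws `reps` resamples of size n (with replacement) from the observed values and records the 2.5 and 97.5 percentiles of the resampled means. The default of 5000 replicates follows the text; the seed, the nearest-rank percentile rule and the example data are hypothetical.

```python
import random


def percentile_limits(values, p_low=0.025, p_high=0.975):
    """Nearest-rank percentiles of a list of statistics."""
    s = sorted(values)

    def at(p):
        return s[min(len(s) - 1, max(0, round(p * (len(s) - 1))))]

    return at(p_low), at(p_high)


def limits_by_sample_size(data, max_n=None, reps=5000, seed=1):
    """Map each sample size n (1..max_n) to approximate 95% limits for the mean
    of n quadrats drawn with replacement from the observed values."""
    rng = random.Random(seed)
    max_n = max_n or len(data)
    limits = {}
    for n in range(1, max_n + 1):
        means = [sum(rng.choices(data, k=n)) / n for _ in range(reps)]
        limits[n] = percentile_limits(means)
    return limits


if __name__ == "__main__":
    # Hypothetical right-skewed quadrat weights (kg); NOT the study data.
    data = [0.0] * 20 + [0.1] * 10 + [0.5] * 5 + [2.0, 4.0, 6.0, 9.0, 12.0]
    for n, (lo, hi) in limits_by_sample_size(data).items():
        if n in (1, 5, 10, 20, 40):
            print(f"n = {n:2d}: {lo:.3f} to {hi:.3f} kg per quadrat")
```

Because every resampled mean is an average of observed non-negative values, the lower limit never goes below 0, however small n is.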
Observed data are likely to lie somewhere on a spectrum between a spatially random pattern and a strong spatial gradient of values. Estimates of biomass based on the mean essentially assume the first case, spatial randomness. The effect of a strong gradient can be simulated. For example, a strong gradient in biomass can be created by sorting the data for F. vesiculosus so that the highest values occur at the lowest latitudes in the plot (representing the seaward edge in this study). The effect of such a gradient on the confidence that can be placed in interpolation was examined by applying the same kriging and cross-validation techniques that were used with the observed spatial distribution of algal biomass.
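The artificial gradient amounts to re-assigning the observed values to quadrat locations by rank. A minimal sketch, assuming each quadrat location is an (easting, northing) pair with northing standing in for latitude: the largest biomass goes to the southernmost quadrat, the next largest to the next, and so on. Only the pairing of values and locations changes, so the mean, SD and total are unchanged.

```python
def impose_gradient(points, values):
    """Return values re-assigned so that biomass decreases with northing.

    points: list of (easting, northing) tuples; values: matching biomass list.
    The multiset of values is unchanged, so non-spatial summaries are too.
    """
    by_northing = sorted(range(len(points)), key=lambda i: points[i][1])
    ranked_values = sorted(values, reverse=True)
    new_values = [0.0] * len(values)
    for rank, i in enumerate(by_northing):
        new_values[i] = ranked_values[rank]  # largest value to the lowest northing
    return new_values


if __name__ == "__main__":
    # Four hypothetical quadrats (easting, northing) in metres.
    pts = [(0.0, 3.0), (1.0, 1.0), (2.0, 2.0), (3.0, 0.0)]
    vals = [0.5, 2.0, 0.1, 1.0]
    print(impose_gradient(pts, vals))
```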

Results
The overall area of the sampling plot was 0.717 ha. Its uneven outline reflects boundaries between midshore fucoid habitat and other substrates, such as a gully filled with kelps along the southern border of the plot. The three target fucoid species occurred throughout the sampling plot (Fig. 2), except for a gap in A. nodosum towards the southeast corner and a lower frequency of F. serratus towards the north of the plot. These two species were not distributed randomly with respect to each other: co-occurrences of A. nodosum and F. serratus were less frequent than expected by chance (8 co-occurrences, p < 0.05, Fisher's exact test). In contrast, F. vesiculosus showed no segregation from the other two species, with 17 and 23 co-occurrences in quadrats with A. nodosum and F. serratus, respectively.
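The co-occurrence test treats each quadrat as presence or absence of two species, giving a 2 × 2 table. A minimal pure-Python sketch of Fisher's exact test follows. The two-sided p-value sums the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one (the convention used by R's `fisher.test()`), and a one-sided lower-tail p-value tests for fewer co-occurrences than expected. The counts in the example are hypothetical, because only the number of co-occurrences is reported here.

```python
from math import comb


def fisher_exact_2x2(both, a_only, b_only, neither):
    """Fisher's exact test for a 2 x 2 co-occurrence table.

    both: quadrats with species A and B; a_only / b_only: one species only;
    neither: quadrats with neither species.
    Returns (two_sided_p, p_fewer_than_expected).
    """
    n_a = both + a_only  # quadrats containing A
    n_b = both + b_only  # quadrats containing B
    n = both + a_only + b_only + neither

    def prob(k):
        # Hypergeometric probability of k co-occurrences given the margins.
        return comb(n_a, k) * comb(n - n_a, n_b - k) / comb(n, n_b)

    k_min, k_max = max(0, n_a + n_b - n), min(n_a, n_b)
    probs = {k: prob(k) for k in range(k_min, k_max + 1)}
    p_obs = probs[both]
    # Small relative tolerance so tables tied with the observed one are counted.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    lower_tail = sum(p for k, p in probs.items() if k <= both)
    return min(1.0, two_sided), min(1.0, lower_tail)


if __name__ == "__main__":
    # Hypothetical margins for 40 quadrats with 8 co-occurrences.
    print(fisher_exact_2x2(both=8, a_only=14, b_only=15, neither=3))
```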
There was patchiness in the distributions of fucoid fresh weights, with evidence of autocorrelation in the experimental variograms for each species (Fig. 3). The spherical variogram model was chosen as the one with the lowest fitting error. The range of autocorrelation was similar in the two Fucus species (16.1 m for F. vesiculosus and 15.7 m for F. serratus). A. nodosum had a larger range, 29.3 m, indicating detectable spatial dependence between quadrat biomass values over greater distances than in the other species. Interpolated maps of seaweed biomass emphasize how patchy the distribution of macroalgae can be (Fig. 4). The largest mean quadrat biomass was recorded for A. nodosum. Despite the broader coverage of F. vesiculosus, the total biomass in the sampling plot was largest for A. nodosum (Table 1). The maximum difference between estimation techniques was 1519 kg, 6% of the estimate based on the mean. One way of evaluating estimates is to test them against data not used in making them. This was done using 5-fold cross-validation, splitting the data into training and test data. Each of the five folds creates test data consisting of eight observations that were not used in calculating the mean or the interpolated surface. The root mean square error (RMSE) of predictions from interpolated surfaces was higher for F. vesiculosus and F. serratus (Table 2), indicating that estimates based on the mean are likely to be more accurate for these species. In contrast, estimates based on an interpolated surface had a lower RMSE for A. nodosum, so biomass estimates based on an interpolated surface would be more accurate for this species.
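The scaling behind the two kinds of total can be written out explicitly. A minimal sketch, assuming a quadrat area of 0.25 m², a plot area of 7170 m² (0.717 ha) and, for the raster case, cells of the same 0.25 m² area, so that each interpolated cell value in kg per quadrat is the biomass of that cell. With the rounded A. nodosum mean of 0.85 kg per quadrat the non-spatial total is about 24 400 kg, so the reported 1519 kg difference is about 6% of it; the exact Table 1 values are not reproduced by these rounded inputs.

```python
def total_from_mean(mean_per_quadrat, quadrat_area, plot_area):
    """Non-spatial total: mean per quadrat x number of quadrat-sized units in the plot."""
    return mean_per_quadrat * plot_area / quadrat_area


def total_from_raster(cell_values):
    """Spatial total: sum of interpolated values over cells inside the plot,
    assuming each cell has the same area as a quadrat (values in kg per cell)."""
    return sum(cell_values)


def relative_difference(spatial_total, mean_total):
    """Difference between the spatial and non-spatial totals as a fraction of the latter."""
    return (spatial_total - mean_total) / mean_total


if __name__ == "__main__":
    mean_total = total_from_mean(0.85, 0.25, 7170.0)
    print(f"non-spatial total: {mean_total:.0f} kg")
    print(f"1519 kg as a share of it: {1519 / mean_total:.1%}")
```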
The artificial gradient in F. vesiculosus (Fig. 5) does not affect the non-spatial summary statistics, but the simulated gradient gives a revised biomass estimate of 18354 kg of F. vesiculosus in the plot and a cross-validation (5-fold) RMSE of 0.378 kg. The inherent predictability of a strong spatial gradient therefore improves the confidence that can be placed in the interpolated surface.
A. nodosum had the largest quadrat mean and standard deviation of the three fucoids (0.85 kg, SD 2.281 kg). None of the species had a normal distribution of biomass among quadrats (Shapiro-Wilk tests, 0.429 ≤ W ≤ 0.855, all p < 0.05). The measurements for A. nodosum also had the largest skew of the three species. The larger variance and skew are reflected in the wide empirical bootstrap confidence intervals for A. nodosum (Fig. 6). These limits are also asymmetrical around the mean, reflecting the positive skew of the measurements. The empirical bootstrap confidence intervals narrow with increasing quadrat number, as would be expected, with more replicates improving the precision of biomass estimates. To have an upper confidence limit of approximately double the mean, bootstrapping suggests that 5 quadrats will be needed for F. vesiculosus, 17 for F. serratus and 36 for A. nodosum. The percentile and bca confidence limits are at larger values than those from conventional parametric statistics. For example, the SE-based confidence interval for 40 quadrats of A. nodosum is 0.121-1.580 kg quadrat⁻¹, compared with 0.383-2.086 kg quadrat⁻¹ from the bca method. Confidence limits (bca method) were 0.456-0.885 kg quadrat⁻¹ for F. vesiculosus and 0.113-0.360 kg quadrat⁻¹ for F. serratus. The lower limit becomes problematic if the conventional (SE-based) confidence intervals are estimated for smaller sample sizes. With fewer than 30 quadrats, the lower limit for A. nodosum is negative, which has no physical interpretation. Confidence intervals from the empirical bootstrap do not have this issue, because a resampled mean can be no lower than the smallest measurement, and biomass cannot be less than 0.
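The criterion of an upper limit of about double the mean can be turned into a simple search over sample sizes. A minimal sketch, assuming the same percentile bootstrap approach described in the Methods: it returns the smallest number of quadrats n for which the 97.5 percentile of resampled means is at most a chosen multiple of the observed mean (2 by default), or None if no n up to the number of observations qualifies. The data, seed and number of replicates are hypothetical, so the output will not reproduce the quadrat numbers reported for the study species.

```python
import random


def quadrats_for_upper_limit(data, multiple=2.0, reps=5000, seed=1, max_n=None):
    """Smallest n whose bootstrap 97.5% limit for the mean is <= multiple x mean.

    Returns None if no n up to max_n (default len(data)) meets the criterion.
    """
    rng = random.Random(seed)
    target = multiple * sum(data) / len(data)
    for n in range(1, (max_n or len(data)) + 1):
        means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(reps))
        upper = means[round(0.975 * (reps - 1))]  # nearest-rank 97.5 percentile
        if upper <= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical data sets (kg per quadrat): low variability vs strongly skewed.
    even = [0.9, 1.0, 1.1] * 10
    skewed = [0.0] * 20 + [0.1] * 10 + [0.5] * 5 + [2.0, 4.0, 6.0, 9.0, 12.0]
    print(quadrats_for_upper_limit(even), quadrats_for_upper_limit(skewed))
```

Low-variability data meet the criterion with very few quadrats, whereas strongly skewed data need many more, which is the pattern seen across the three fucoids.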

Discussion
There are good arguments for avoiding the conventional $\bar{x} \pm t_{(0.95)}\,\mathrm{SE}$ confidence limits to express the uncertainty in algal biomass estimates. Quadrat measurements were not normally distributed, and the skew in the data makes symmetrical confidence limits unreliable. The conventional confidence limits were biased, in that both the upper and lower limits were lower than the bootstrapped limits. As the aim is to estimate total biomass, transforming the data and then back-transforming the confidence limits cannot be recommended, because no specific transformation can be defined or justified. Furthermore, the lower limit of conventional confidence intervals can be negative at small sample sizes, and it is not clear how to interpret negative biomass values of this type. Bootstrapped confidence intervals should be used for expressing uncertainty in algal biomass measurements from quadrats.
If extrapolating biomass measurements to a region that has not been directly surveyed, there is no alternative to multiplying a mean biomass by area. When summarizing biomass across an area that includes the measured quadrats, the evidence is mixed. Spatially explicit information for F. vesiculosus and F. serratus did not improve predictions for quadrats withheld from the estimation. The mean of the training set was a marginally better predictor for these species, and differences in the total biomass predicted by spatial and non-spatial methods were relatively small. In contrast, interpolated surfaces had some additional predictive value for A. nodosum. An estimate based on the mean quadrat biomass may have underestimated the A. nodosum biomass in the sampling plot by 1519 kg (6%).
The value of spatial information for A. nodosum reflects the wider extent of spatial autocorrelation (spatial dependence) between quadrats in this species compared with the other fucoids. The influence of spatial dependence is further emphasized by the example of F. vesiculosus with an artificially constructed gradient. The reduced prediction error, compared with that obtained from the observed data, reflects the increased spatial dependence in the simulated gradient: the training set measurements contain more information about the test measurements.
Gradients in biomass can clearly be important for estimates of overall totals. Ignoring gradients is not necessarily a problem for the accuracy of non-spatial summaries of biomass, although if sample locations were not stratified with respect to the gradient, an unintentional bias could occur. The recommendations of Miller and Ambrose (2000) include stratified sampling and/or sampling on transects perpendicular to the elevational contours, approaches that address the relative weakness of random quadrat placement in the face of intertidal gradients. There are probably few generalities about the strength of vertical and horizontal gradients in biomass on shores, as variation in environmental conditions and ecological processes is potentially complex. In the sampling plot investigated in the current study, the spatial variation was patchy, without clear gradients. This may reflect the uneven shore, which has variations in height and mixtures of boulders, cobbles and bedrock. Algal biomass will theoretically be lower with increasing elevation on the shore (Johnson et al., 1998), as long as interactions with other processes such as grazing do not override the pattern. Eriksson and Bergström (2005) give an example of how the biomass of species in the Baltic (including F. vesiculosus) is structured by a number of environmental variables, including depth. Given the possibility of covariates that can predict algal biomass, it would be useful for large-scale estimates of algal biomass to define predictors that could be extracted from digital elevation models or other remotely sensed sources of data.
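The benefit of stratifying along a gradient can be illustrated by simulation. A minimal sketch with an entirely hypothetical shore: 1000 cells whose biomass increases linearly down the shore plus random noise, sampled 2000 times with 40 quadrats, either by simple random sampling or by stratified sampling with 4 quadrats in each of 10 equal bands along the gradient. Both schemes estimate the mean without bias here, but the stratified estimates vary much less because most of the variation lies between bands. The sketch assumes the number of cells and the sample size are both divisible by the number of bands.

```python
import random
import statistics


def make_shore(n_cells=1000, slope=0.01, noise_sd=1.0, seed=3):
    """Hypothetical cells ordered down the shore: biomass = slope * position + noise."""
    rng = random.Random(seed)
    return [slope * i + rng.gauss(0.0, noise_sd) for i in range(n_cells)]


def random_sample_mean(cells, n, rng):
    """Mean of n cells chosen by simple random sampling without replacement."""
    return sum(rng.sample(cells, n)) / n


def stratified_sample_mean(cells, n, n_strata, rng):
    """Mean of n cells with equal allocation to equal-sized bands along the gradient."""
    band = len(cells) // n_strata
    per_band = n // n_strata
    picks = []
    for s in range(n_strata):
        picks.extend(rng.sample(cells[s * band:(s + 1) * band], per_band))
    return sum(picks) / len(picks)


def compare(reps=2000, n=40, n_strata=10, seed=7):
    """Return the true mean and the mean and SD of each scheme's estimates."""
    cells = make_shore()
    rng = random.Random(seed)
    srs = [random_sample_mean(cells, n, rng) for _ in range(reps)]
    strat = [stratified_sample_mean(cells, n, n_strata, rng) for _ in range(reps)]
    return {
        "true_mean": statistics.fmean(cells),
        "random_mean": statistics.fmean(srs),
        "random_sd": statistics.stdev(srs),
        "stratified_mean": statistics.fmean(strat),
        "stratified_sd": statistics.stdev(strat),
    }


if __name__ == "__main__":
    for key, value in compare().items():
        print(f"{key}: {value:.3f}")
```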
It is not clear why different species would have different spatial dependencies and maximum biomass levels. A. nodosum had a higher biomass in quadrats than the two Fucus species, in addition to a longer range of spatial dependence. The greater accumulation of biomass may reflect the longer life span of A. nodosum (Åberg, 1992) and a growth form that can lead to longer fronds (Johnson et al., 1998). Fucoid zygotes sink rapidly and may be released when conditions are calm (Serrao et al., 1996), factors contributing to relatively restricted recruitment at a distance from adult fronds (e.g., ) that could potentially cause patchiness.  caution, however, that zygote dispersal is variable and that post-settlement processes will also have a role in the spatial pattern of algal densities.
The precision of estimates of algal biomass could potentially be improved by combining quadrat biomass data with remotely sensed data. Satellite images, aerial photography and drones offer potential means to identify the cover and extent of algal beds (Davies et al., 2007; Brodie et al., 2018; Konar and Iken, 2018; Murfitt et al., 2017; Setyawidati et al., 2017). If species can be identified, a mean biomass estimate can be multiplied by the area occupied to provide a shore-wide or regional biomass total (Guillaumont et al., 1993). Unfortunately, three issues complicate the combination of remote sensing and field survey: (1) canopies can be mixed (as in this study), and the appropriate mean biomass for each species is not yet obvious in such cases; (2) species, particularly in the same order or genus, can be difficult to separate, even with multispectral information (e.g., McIlwaine et al., 2019 were unable to separate Fucus species); (3) biomass reflects the thickness of the canopy, a property that is difficult to estimate from remote data. Fucoids are optically dark, and the upper layers obscure information about the fronds below, making remotely sensed data incomplete for areas of higher biomass (e.g., Guichard et al., 2000). Difficulties in relating remotely sensed data to biomass are not limited to fucoids (Mitchard et al., 2014). The optical density of fucoids contrasts with the more transparent fronds of Ulva, for which Hu et al. (2017) were able to calibrate a remotely sensed floating algae index using a sensor above tanks filled with different amounts of seaweed.
While bootstrapped confidence intervals have advantages over other approaches, they are likely to be overoptimistic. This relates to the extent to which the sample data represent the unobserved variability in the system. If the bootstrapped dataset is 'small', the confidence intervals may be underestimates (Schenker, 1985). Of course, without information on unobserved areas, it is difficult to judge what sort of sample is 'small'. The current study sampled up to 0.14% of the area to which the biomass extrapolation was made. This level of extrapolation involves at least an order of magnitude greater coverage than other studies (e.g., Sharp et al., 2008). It is clear that most extrapolations of algal biomass will involve wide confidence intervals, reducing the precision of estimates of trends (or their absence), food web models and carbon budgets. Ultimately, robust estimates of macroalgal biomass will require integration across a range of scales, incorporating any meaningful covariates, with shared protocols and data so that estimates of variability can be placed in an appropriate context (Duffy et al., 2019). Development of agreed methodologies is particularly urgent if reliable estimates of ecosystem change, resource availability and carbon storage are to be made.
Data available at: https://data.mendeley.com/datasets/txt7ks2zbv/draft?a=c0ebba30-249a-40c3-89d2-c948c8e3e0e1

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.