Characterization of measurement errors using structure‐from‐motion and photogrammetry to measure marine habitat structural complexity

Abstract Habitat structural complexity is one of the most important factors in determining the makeup of biological communities. Recent advances in structure‐from‐motion and photogrammetry have resulted in a proliferation of 3D digital representations of habitats from which structural complexity can be measured. Little attention has been paid to quantifying the measurement errors associated with these techniques, including the variability of results under different surveying and environmental conditions. Such errors have the potential to confound studies that compare habitat complexity over space and time. This study evaluated the accuracy, precision, and bias in measurements of marine habitat structural complexity derived from structure‐from‐motion and photogrammetric measurements using repeated surveys of artificial reefs (with known structure) as well as natural coral reefs. We quantified measurement errors as a function of survey image coverage, actual surface rugosity, and the morphological community composition of the habitat‐forming organisms (reef corals). Our results indicated that measurements could be biased by up to 7.5% of the total observed ranges of structural complexity based on the environmental conditions present during any particular survey. Positive relationships were found between measurement errors and actual complexity, and the strength of these relationships was increased when coral morphology and abundance were also used as predictors. The numerous advantages of structure‐from‐motion and photogrammetry techniques for quantifying and investigating marine habitats will mean that they are likely to replace traditional measurement techniques (e.g., chain‐and‐tape). To this end, our results have important implications for data collection and the interpretation of measurements when examining changes in habitat complexity using structure‐from‐motion and photogrammetry.


KEYWORDS
3D habitat mapping, coral ecology, photogrammetry, structural complexity, structure-from-motion

| INTRODUCTION
Habitat structural complexity is one of the most important factors in structuring biological communities. In marine environments, there exists a wealth of studies linking increases in habitat complexity to increases in abundance and diversity of both benthic and mobile organisms (Graham & Nash, 2013; Gratwicke & Speight, 2005; Harborne, Mumby, Kennedy, & Ferrari, 2011; Kovalenko, Thomaz, & Warfe, 2012; Luckhurst & Luckhurst, 1978; Meager, Schlacher, & Green, 2011; Rees, Jordan, Price, Coleman, & Davis, 2014). Structurally complex habitats may promote biodiversity and abundance of biota through increasing habitat niches and/or increased habitat availability (Johnson, Frost, Mosley, Roberts, & Hawkins, 2003; Willis, Winemiller, & Lopez-Fernandez, 2005). In coral reef ecosystems, structural complexity has been identified as one of the most important attributes in determining reef resilience (Graham, Jennings, MacNeil, Mouillot, & Wilson, 2015) and whether a reef community returns to a coral-dominated or shifts to an algal-dominated state following disturbance.
In marine benthic habitats, structural complexity is measured in a variety of ways, including both qualitative (e.g., visual assessment; Lara & Gonzalez, 1998; Wilson, Graham, & Polunin, 2007; Graham et al., 2015) and quantitative (e.g., chain-and-tape; Luckhurst & Luckhurst, 1978, and profile/level gauge; McCormick, 1994) techniques. Recently, there has been a proliferation of studies using photogrammetry and structure-from-motion techniques for measuring structural complexity in underwater habitats (e.g., Burns, Delparte, Gates, & Takabayashi, 2015; Burns et al., 2016; Ferrari, Bryson, et al., 2016; Figueira et al., 2015; Friedman, Pizarro, Williams, & Johnson-Roberson, 2012; Leon, Roelfsema, Saunders, & Phinn, 2015). These techniques use a series of overlapping images, taken from multiple perspectives, to reconstruct the three-dimensional (3D) structure of the seafloor and habitat-forming organisms at high resolution and accuracy, from which structural complexity measurements can then be derived. Structural complexity is commonly represented by a rugosity index that is the ratio of habitat surface area to the total areal extent of the habitat (Friedman et al., 2012). This measure of complexity is an area-based equivalent to the linear transect measurements available using traditional chain-and-tape or profile techniques and provides a more specific representation of the habitat complexity present (Friedman et al., 2012). Unlike chain-and-tape methods, the generation of 3D models allows for the spatial resolution of complexity measurements (equivalent to the link size in chain-and-tape methods) to be easily controlled and varied (Ferrari, Bryson, et al., 2016), a feature important for comparing complexity measurements (Knudby & LeDrew, 2007) but frequently ignored or unreported (Graham & Nash, 2013).
Furthermore, photogrammetric methods allow for habitats to be measured over larger areas than traditional techniques (Leon et al., 2015) and for measurements to be made in a noninvasive fashion (Bridge et al., 2014).
The underwater environment provides many challenges to photogrammetry not encountered in above-water settings; large variations in ambient light and water clarity affect image quality and control over the precise position and orientation from which photographs are acquired is difficult. Although the use of photogrammetric techniques in underwater environments is not new (see, e.g., Done, 1981), its application has been vastly simplified in recent years with the availability of software (such as Agisoft PhotoScan [http://www.agisoft.com] and Visual SFM [http://ccwu.me/vsfm]) that can process data without the need for ordered images, detailed ground control or prior camera calibration that previous photogrammetry methods have relied on. Although these techniques can be employed with great success, they are inherently complex, relying on a plethora of factors including image quality, resolution, image textural properties, camera lens distortions and artifacts, surface brightness, shape, and roughness. What is lacking in recent studies using these techniques is a clear understanding of how these factors influence and induce errors in measurements of structural complexity. In the context of measuring changes in habitat complexity, these factors have the potential to confound studies where factors affecting measurement error vary across space and time. Characterization of these errors is crucial for proper inference of changes in structural complexity over time with respect to changes in other covariates such as metrics of biological assemblage structure that change across disturbance events.
The purpose of this study is to quantify the accuracy, precision, and potential bias of ecologically relevant mesoscale (tens to hundreds of meters) structural complexity measurements of marine benthic habitats determined from structure-from-motion photogrammetry. On coral reefs, existing ecologically focused evaluations of the accuracy of underwater photogrammetry have focused primarily on the scale of individual coral colonies (Bythell, Pan, & Lee, 2001; Courtney, Fisher, Raimondo, Oliver, & Davis, 2007; Lavy et al., 2015). These studies use survey techniques that image a single, small region of space around the colony, where occluding objects are not present in the scene, and from multiple perspectives and ranges to the target. Lavy et al. (2015) found photogrammetric techniques resulted in accuracies of single-colony surface area ranging from 2% to 18% of the total surface area, depending on colony shape. Courtney et al. (2007) found errors less than 12% in computed coral volume. The resulting quality of 3D models is not indicative of the performance of these methods when applied to more extensive regions of the benthos, where habitat patchiness results in a complex, interwoven network of colonies, and survey techniques cannot be tailored to each individual colony. In recent studies (Ferrari, Bryson, et al., 2016; Ferrari, McKinnon, et al., 2016; Figueira et al., 2015), measurement precision was considered for both colony scale and patch scale (19 × 6 m regions of temperate and tropical reefs), but these works did not study the driving factors behind the precision of the approach.
Our study addresses the following questions: c) The differing community composition and abundance of habitat-forming organisms (i.e., coral) with differing morphologies?
Using coral reef habitats as a case study, we use repeated measurements over an artificial reef scene (with known 3D structure) and shallow water coral reefs on the Great Barrier Reef, Australia, to quantify both accuracy (difference from the "true" surface rugosity, SR) and precision (variability in measured structural complexity) of photogrammetric measurement techniques. Measurement errors are examined in the context of different survey techniques, changing ambient lighting conditions, and differences in water clarity. We develop models of complexity measurement errors as a function of varying image coverage pattern, actual SR, and changes in dominant coral morphology. In the context of using structural complexity as a measure of reef condition, these models have the potential to inform decisions about how data should be collected and processed.
Where repeated measurements are made, such characterization is crucial for understanding what level of structural complexity change might be detectable, for example, given prior knowledge of the complexity and habitat-forming communities present.

| Study sites
Initial experiments using a 1.1-by-1.1 m artificial reef (see Figure 1) were performed in an ocean swimming pool in Sydney, Australia

| Equipment
A diver-operated stereo camera rig (Figure 2, referred to herein as the "diver-rig") was used to collect imagery over survey sites on the seafloor to build image-derived 3D topographic surface models. The diver-rig carried a downwards-looking stereo camera pair utilizing one color and one monochrome Prosilica GC1380 12-bit camera, each with a resolution of 1,360 × 1,024 pixels and an imaging field of view of 42° × 34° in water, providing a ground sampling distance of ~1 mm per pixel over a 1.5 × 1.2 m footprint at 2 m from the substrate. The diver-rig also carried a pressure-depth sensor, tilt sensors, a magnetic

FIGURE 1 (a) Artificial reef used to evaluate reconstruction accuracy and repeatability, (b) underwater photomosaic of artificial reef, (c) corresponding digital surface model colored by relief height. (d) Comparison of rugosity measurements derived from 3D models reconstructed using diver-rig underwater stereo imagery and rugosity measurements from an in-air reference model, reconstructed using in-air high-resolution images
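The footprint and ground sampling distance quoted above can be checked with a short calculation. This is a minimal sketch that assumes an idealized pinhole camera looking straight down at a flat seabed; the function name `footprint_and_gsd` is hypothetical and is not part of the authors' processing pipeline.

```python
import math


def footprint_and_gsd(altitude_m, fov_deg, pixels):
    """Return (footprint in m, ground sampling distance in mm) along one image axis.

    Assumes a nadir-pointing pinhole camera over a flat surface, so that
    footprint = 2 * altitude * tan(FOV / 2).
    """
    footprint_m = 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)
    gsd_mm = 1000.0 * footprint_m / pixels
    return footprint_m, gsd_mm


# In-water field of view and sensor resolution as stated in the text.
across_m, gsd_across_mm = footprint_and_gsd(2.0, 42.0, 1360)
along_m, gsd_along_mm = footprint_and_gsd(2.0, 34.0, 1024)
print(f"footprint: {across_m:.2f} x {along_m:.2f} m")
print(f"GSD: {gsd_across_mm:.2f} and {gsd_along_mm:.2f} mm per pixel")
```

Under these assumptions, the result agrees with the stated 1.5 × 1.2 m footprint and a ground sampling distance of roughly 1 mm per pixel.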

| Data collection and survey techniques
Imaging surveys using the diver-rig were conducted at each site and repeated over 4 days at each location. During each survey, overlapping images of the site were captured at 2 m from the substrate with a ground spacing distance between images of approximately 50 cm.
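As a rough illustration of what a 50 cm image spacing implies, the overlap between consecutive images can be estimated from the footprint size. This sketch assumes a flat seabed and constant altitude; the text does not state which footprint dimension lies along the direction of travel, so both are shown, and the function name `overlap_fraction` is hypothetical.

```python
def overlap_fraction(footprint_m, spacing_m):
    """Fraction of one image footprint shared with the next image along a line."""
    return max(0.0, 1.0 - spacing_m / footprint_m)


# Footprint dimensions (m) at 2 m from the substrate, as stated for the diver-rig.
for footprint in (1.2, 1.5):
    pct = 100.0 * overlap_fraction(footprint, 0.5)
    print(f"{footprint} m footprint, 0.5 m spacing: {pct:.0f}% overlap")
```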
Two survey techniques were used to guide the trajectory of the diver-rig over the terrain, depending on the reef environment type. When

| Structure-from-motion postprocessing
For each survey of each site, collected images and other diver-rig sensor data were used to reconstruct photomosaics and 3D surface models of the seafloor (Figure 3). Estimation of the position and orientations (poses) of the stereo pairs was performed using a feature-based stereo bundle adjustment algorithm (Johnson-Roberson et al., 2016). A similar procedure is used by commercially available structure-from-motion/photogrammetry software such as Agisoft PhotoScan

| Calculation of surface rugosity
Topographic surface models were used to derive multiscale measures of structural complexity using SR, the ratio of the total surface area of the model to the projected surface area of the sampled region onto a plane. For each survey site, a plane of best fit to the overall site surface was calculated using a least squares fit and used as the axis of projection for SR measurements, to decouple measurements of SR from the overall slope of the environment. Each topographic surface model was divided into nonoverlapping virtual quadrats (2 × 2 m), and one value of rugosity was extracted from each virtual quadrat, for each survey, using Equation 1:

$$SR = \frac{\sum_{i=1}^{N} a_i}{\sum_{i=1}^{N} a_{proj,i}} \qquad (1)$$

where $a_i$ is the actual area of a surface face element $i$, $i = 1$ to $N$ (all faces in a given virtual quadrat), and $a_{proj,i}$ is the orthographically projected area of face $i$, corresponding to the coordinate system based on the plane of best fit. SR measurements were first computed using the full resolution topographic surface models (2.5 cm resolution) and subsequently computed on several down-sampled versions of each mesh at increments of 5 cm up to 25 cm resolution to generate SR measurements across a range of link-size resolutions.
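The ratio in Equation 1 can be sketched for a triangulated surface as follows. This is an illustrative sketch only, not the authors' implementation: it assumes the mesh is given as a vertex list and triangular faces, and it takes the unit normal of the plane of best fit as an input (defaulting to vertical) rather than performing the least squares fit. The function names are hypothetical.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def surface_rugosity(vertices, faces, plane_normal=(0.0, 0.0, 1.0)):
    """SR = sum of triangle areas / sum of their areas projected onto a plane.

    vertices: list of (x, y, z); faces: list of (i, j, k) vertex indices.
    plane_normal: normal of the projection plane (assumed known here).
    """
    length = math.sqrt(sum(c * c for c in plane_normal))
    n = tuple(c / length for c in plane_normal)
    area = 0.0
    projected = 0.0
    for i, j, k in faces:
        c = _cross(_sub(vertices[j], vertices[i]), _sub(vertices[k], vertices[i]))
        # Half the cross-product magnitude is the true triangle area.
        area += 0.5 * math.sqrt(sum(x * x for x in c))
        # Half the component along the normal is the orthographic projection.
        projected += 0.5 * abs(sum(x * y for x, y in zip(c, n)))
    return area / projected


# A ridge with two 45-degree slopes: expected SR is sqrt(2).
ridge_vertices = [(0, 0, 0), (1, 0, 1), (2, 0, 0), (0, 1, 0), (1, 1, 1), (2, 1, 0)]
ridge_faces = [(0, 1, 4), (0, 4, 3), (1, 2, 5), (1, 5, 4)]
print(surface_rugosity(ridge_vertices, ridge_faces))
```

A flat surface parallel to the projection plane gives SR = 1, and steeper relief gives larger values.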

| Surface rugosity measurement repeatability and error estimation
To assess the accuracy and precision of structural complexity measurements derived from the resulting topographic surface models, the surveys were repeated multiple times at each site over varying temporal scales. Once surface models from each survey of a given site had been produced, they were spatially registered into a common reference frame using a rigid six-degree-of-freedom transformation (Bryson et al., 2013) with a residual accuracy of ±5 cm. The precisely registered surface models allowed for measurements to be compared within the 2 × 2 m virtual quadrats across multiple surveys.

FIGURE 3 (a) Orthographic imagery mosaic and (b) topographic surface model at Horseshoe Reef survey site, Lizard Island using "Reef Record" survey method. (c) Orthographic imagery mosaic and (d) topographic surface model at Blue Pools survey site, Heron Island using "Mow-the-Lawn" survey method
Given the logistical challenges of imaging a natural reef in air, we used an artificial reef to assess accuracy of the SR measurements (see Figure 1).
At the Sydney ocean swimming pool, the artificial reef was placed into the water and surveyed eight times on the same day using a Mow-the-lawn survey pattern, from which eight separate reconstructions of the same area were built. The accuracy of the SR measurements was ascertained by examining the difference in SR between topographic surface model- surveys were also performed four times back-to-back during a 1-hr period in a single day (single-day surveys were not performed at the Horseshoe site owing to logistical restrictions at the study site). Data from each survey were used to reconstruct separate topographic surface models, one for each survey. An average SR measurement across all available surveys, $\mu_{SR,j}$, was computed for each 2 × 2 m virtual quadrat $j$. SR measurement errors for each quadrat were then estimated by taking the standard deviation of differences from this average, giving the SR measurement standard error $\sigma_{SR,j}$ of quadrat $j$, where $N_{survey}$ is the number of surveys over quadrat $j$ and $SR_{j,i}$ is the SR measurement for quadrat $j$ from survey $i$, $i = 1$ to $N_{survey}$. Estimates of the per-survey bias ($b_i$) for each site were generated (one value for each survey of each site) by averaging the differences $SR_{j,i} - \mu_{SR,j}$ across all quadrats at a given site. Measurements for each quadrat were also grouped according to surveys taken back-to-back on the same day (N = 4) and surveys taken over multiple days (N = 4) to generate two separate estimates of measurement error and bias (single day and multiday), corresponding to similar and different environmental conditions (Table 1).
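The repeatability statistics described above can be sketched as follows. The text does not make the normalization of the standard deviation explicit, so this sketch assumes division by the number of surveys (a population standard deviation); the function name and the example values are hypothetical.

```python
import math


def repeatability_stats(sr):
    """Per-quadrat mean and standard error, and per-survey bias.

    sr[j][i] is the SR measured for quadrat j in survey i.
    Returns (mu, sigma, bias) where
      mu[j]    = mean over surveys of sr[j][i]
      sigma[j] = sqrt(mean over surveys of (sr[j][i] - mu[j])**2)  # 1/N assumed
      bias[i]  = mean over quadrats of (sr[j][i] - mu[j])
    """
    n_quadrats = len(sr)
    n_surveys = len(sr[0])
    mu = [sum(row) / n_surveys for row in sr]
    delta = [[sr[j][i] - mu[j] for i in range(n_surveys)] for j in range(n_quadrats)]
    sigma = [math.sqrt(sum(d * d for d in row) / n_surveys) for row in delta]
    bias = [sum(delta[j][i] for j in range(n_quadrats)) / n_quadrats
            for i in range(n_surveys)]
    return mu, sigma, bias


# Hypothetical example: two quadrats, two surveys.
mu, sigma, bias = repeatability_stats([[1.0, 1.2], [2.0, 2.4]])
print(mu, sigma, bias)
```

A nonzero `bias[i]` indicates that survey `i` measured quadrats consistently higher or lower than the across-survey average, which is the per-survey effect the study attributes to survey conditions.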

| Annotation of coral morphotypes
Imagery mosaics were digitally annotated by proportion of the benthic classes: "Sand," "Abiotic/Mixed Hard Bottom," and coral morphotypes "Massives," "Plating," "Fine-Branching," and "Coarse Branching" using GIS. The abundance of each class in each virtual quadrat was calculated and quadrats were assigned into categories according to the dominant coverage type (highest area proportional to other labeled classes in the quadrat), when the dominant coverage class abundance was >25%. When the greatest abundance in a quadrat was <25%, the quadrat was assigned as "Mixed."
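The assignment rule above can be sketched as a small function. This sketch makes two assumptions that the text leaves open: any annotated class (including "Sand" and "Abiotic/Mixed Hard Bottom") may be dominant, and a top class at exactly 25% is treated as "Mixed". The function name is hypothetical.

```python
def dominant_class(cover, threshold=0.25):
    """Return the dominant benthic class of a quadrat, or "Mixed".

    cover: dict mapping class name to its fractional abundance in the quadrat.
    """
    label, fraction = max(cover.items(), key=lambda item: item[1])
    # Assumption: a top class at exactly the threshold counts as "Mixed".
    return label if fraction > threshold else "Mixed"


print(dominant_class({"Massives": 0.40, "Sand": 0.35, "Plating": 0.25}))
print(dominant_class({"Massives": 0.20, "Plating": 0.20, "Sand": 0.20,
                      "Fine-Branching": 0.20, "Coarse Branching": 0.20}))
```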

| Image coverage metrics
Image coverage metrics for each virtual quadrat were computed by creating an additional spatial layer on top of the topographic surface models that measured the number of images that observed each spatial point in the quadrat, reproduced at a resolution of 0.5 cm per pixel. Two metrics were calculated: the first measured the average coverage of a quadrat (average number of images observing each spatial point in the quadrat), and the second measured the coverage variation (standard deviation of the number of images observing each spatial point in the quadrat).
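The two coverage metrics can be illustrated with a simplified sketch. It assumes each image footprint is an axis-aligned rectangle on a flat quadrat, which ignores terrain relief and camera tilt, and it uses a population standard deviation; both function names are hypothetical and this is not the authors' code.

```python
from statistics import mean, pstdev


def count_raster(footprints, size_m=2.0, cell_m=0.005):
    """Count how many footprints cover each cell centre of a square quadrat.

    footprints: list of (x_min, y_min, x_max, y_max) rectangles in metres.
    """
    n = round(size_m / cell_m)
    counts = [[0] * n for _ in range(n)]
    for row in range(n):
        y = (row + 0.5) * cell_m
        for col in range(n):
            x = (col + 0.5) * cell_m
            counts[row][col] = sum(
                1 for (x0, y0, x1, y1) in footprints
                if x0 <= x <= x1 and y0 <= y <= y1
            )
    return counts


def coverage_metrics(counts):
    """Return (average coverage, coverage variation) for a count raster."""
    flat = [c for row in counts for c in row]
    return mean(flat), pstdev(flat)


# Coarse 1 m cells keep the example small; the text uses 0.5 cm cells.
example = count_raster([(0, 0, 2, 2), (0, 0, 1, 2)], size_m=2.0, cell_m=1.0)
print(example, coverage_metrics(example))
```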

| Statistical analyses
Accuracy of the in-water topographic surface models was tested using two-tailed t-tests comparing the average in-water SR measurements to the SR measurements derived from the in-air reference surface model (Question 1). To examine the relationship between errors in measured SR and the way in which images were collected from the target site (Question 3a), correlation coefficients and ordinary least squares (OLS) fits for SR measurement standard error versus the two image coverage metrics were computed.
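As an illustration of the accuracy test, a t statistic for repeated in-water SR values against a single reference value can be computed as below. The text does not say whether the test was one-sample or two-sample; this sketch assumes a one-sample form, and the SR values are hypothetical placeholders rather than data from the study.

```python
import math
from statistics import mean, stdev

# Two-tailed critical value of Student's t for alpha = 0.05 and 7 degrees of freedom.
T_CRIT_DF7 = 2.365


def one_sample_t(values, reference):
    """Return the t statistic for H0: mean(values) == reference."""
    standard_error = stdev(values) / math.sqrt(len(values))
    return (mean(values) - reference) / standard_error


# Hypothetical SR values from eight repeat surveys and a hypothetical in-air reference.
in_water_sr = [1.10, 1.20, 1.15, 1.25, 1.05, 1.20, 1.10, 1.15]
t_stat = one_sample_t(in_water_sr, reference=1.33)
print(f"t = {t_stat:.2f}, significant at 0.05: {abs(t_stat) > T_CRIT_DF7}")
```

A negative t statistic whose magnitude exceeds the critical value corresponds to the in-water measurements lying significantly below the reference.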
To examine the effect of the underlying surface on the SR measurement error and to investigate whether higher complexity surfaces yielded larger measurement errors (Question 3b), OLS was used to estimate a linear model of SR measurement error as a function of average rugosity, using quadrat-scale data from all four survey sites. In order to additionally examine the effect of the dominant coral morphotype on rugosity errors (Question 3c), this OLS model was extended to include dominant coral morphotype as a categorical variable based on the classes "Massives," "Plating," "Fine-Branching," and "Coarse Branching" using a simple contrast coding scheme with the class "Mixed" as the base reference category. For all OLS modeling, residual plots were used to verify that model assumptions were not violated. All statistical tests (OLS and ANOVA) were performed using the StatsModels Python package (http://statsmodels.sourceforge.net/devel/index.html).
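The structure of the extended model can be sketched without external packages. This is not the StatsModels analysis used in the study: it solves the normal equations directly, and it uses treatment (dummy) coding with "Mixed" as the reference as a stand-in for the simple contrast coding described above, which should give the same class-versus-"Mixed" contrasts but a different interpretation of the intercept. All function names and the synthetic data are hypothetical.

```python
CLASSES = ["Massives", "Plating", "Fine-Branching", "Coarse Branching"]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def design_row(rugosity, morphotype):
    """Intercept, rugosity, then one 0/1 indicator per non-reference class."""
    return [1.0, rugosity] + [1.0 if morphotype == c else 0.0 for c in CLASSES]


def fit_ols(rugosity, morphotype, error):
    """Return OLS coefficients [intercept, slope, one offset per class]."""
    x = [design_row(r, m) for r, m in zip(rugosity, morphotype)]
    p = len(x[0])
    xtx = [[sum(row[a] * row[b] for row in x) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * y for row, y in zip(x, error)) for a in range(p)]
    return solve(xtx, xty)


# Synthetic, noise-free data generated from known coefficients.
true_beta = [0.01, 0.05, -0.02, -0.01, 0.005, 0.03]
rug, morph = [], []
for name in ["Mixed"] + CLASSES:
    for r in (1.2, 2.0):
        rug.append(r)
        morph.append(name)
err = [sum(b * v for b, v in zip(true_beta, design_row(r, m)))
       for r, m in zip(rug, morph)]
beta = fit_ols(rug, morph, err)
print([round(b, 4) for b in beta])
```

Each class coefficient is the offset in predicted error for that morphotype relative to a "Mixed" quadrat of the same rugosity.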

| Artificial reef
Measurements of SR from in-water topographic models consistently underestimated the reference model SR computed using in-air photographs (Figure 1). T-test results at each link size indicated that differences were statistically significant (Table 2). SR measurements were between 8% and 15% lower for in-water reconstructions when compared to the reference reconstruction, but highly repeatable, with a standard deviation of 0.065 (4.9% of the true rugosity) for the 2.5 cm link-size resolution. Underestimation was likely due to a combination of factors, including loss of contrast when imaging through water and the inability of the survey techniques employed in this study, which observed surfaces from above, to image overhanging surfaces and so resolve the complexity they contribute.

| Coral reef topographic reconstructions
Data collection time for field surveys varied from 12 to 20 min, and each reconstruction was produced using approximately 1,400-2,400 stereo image pairs (Table 1). Variations in the ambient lighting (sunny vs. cloudy conditions and time of day) and water clarity over the multiple surveys resulted in variations in the shadow direction, color, and contrast observable in reconstructed photomosaics (Figures 3 and 4).
The corresponding topographic surface models exhibited subtle variations in how well fine-scale features were resolved (Figure 5).

| Coral reef structural complexity measurements
Habitat complexity was observed to be highly heterogeneous across the spatial range of each of our sites (12-30 m), owing to the heterogeneous benthic composition of different coral morphotypes, highlighting the importance of mesoscale sampling (Figure 6). The

TABLE 2 Results of two-tailed Student's t-tests comparing rugosity derived using underwater imagery against rugosity from an in-air reference 3D model of an artificial reef

| Relationship between measurement errors and image coverage metrics
Correlations between rugosity measurement errors and image coverage metrics (average number of covering images and variation in covering images) were weak and found to be nonsignificant using OLS when using quadrat data from all sites (Figure 10a, Table 4a).
Analyses were reperformed using data for each site one at a time (four separate models, one for each site); only the Blue Pools site model exhibited a statistically significant correlation between rugosity measurement error and coverage parameters (Figure 10b, Table 4b). Errors had a slight negative correlation to the average coverage (number of images) and slight positive correlation to the coverage variation (standard deviation in number of images touching any part of the quadrat), although both relationships were weak (low coefficient values).

FIGURE 6 Distribution of average rugosity for 2 × 2 m virtual quadrats ($\mu_{SR}$) at each of the four reefs surveyed in this study (resolution 2.5 cm)

FIGURE 7 2 × 2 m quadrat rugosity measurement errors at each site: (a) 2 × 2 m quadrat standard deviation from quadrat average $\sigma_{SR}$, averaged for all quadrats at each site. (b) Root mean square (RMS) of average 2 × 2 m quadrat rugosity difference from quadrat average per survey ($b$), a measure of the per-survey bias induced by conditions for a particular survey

FIGURE 8 Boxplot distributions of 2 × 2 m quadrat rugosity measurement differences $\delta SR_{j,i}$ per survey ($i = 1$ to 4) from quadrat average across all surveys, for surveys performed on the same day. The distribution of differences from average for a given survey may be greater or less than zero, indicating that quadrats are being measured consistently higher or lower for the conditions in which the survey was performed

| Relationship between measurement errors, surface rugosity, and dominant morphotype
Rugosity measurement errors exhibited statistically significant positive correlations to actual rugosity (Figure 11, Table 5a, adjusted $R^2$ = .309). When dominant coral morphotype within each quadrat was added to the model as a categorical variable, model fit improved from $R^2$ = .309 to $R^2$ = .473 and three of the four coral morphotypes had statistically significant model coefficients (Table 5b), when compared to the "Mixed" quadrat type base category. Quadrats dominated by "Massives" and "Plating" morphotypes had lower mean rugosity errors per quadrat rugosity, whereas "Coarse Branching" morphotypes had higher mean rugosity errors per quadrat rugosity when compared to "Mixed" quadrats (Figure 12

| DISCUSSION
Our study has provided, for the first time, a detailed quantification of errors in structural complexity measurements made using photogrammetry/structure-from-motion and of the factors inducing those errors. In-water SR measurements consistently underestimated SR when compared to in-air SR values, but were highly repeatable with a 4.5% precision SR

FIGURE 9 Boxplot distributions of 2 × 2 m quadrat rugosity measurement differences $\delta SR_{j,i}$ per survey ($i = 1$ to 4) from quadrat average across all surveys, for surveys across multiple days. The distribution of differences from average for a given survey may be greater or less than zero, indicating that quadrats are being measured consistently higher or lower for the conditions in which the survey was performed

FIGURE 10 2 × 2 m quadrat rugosity errors (

The advantages of using photogrammetric techniques for measuring structural attributes of coral reefs mean that these techniques are likely to be employed more widely within ecologically focused research and monitoring programs in the future (Burns et al., 2016; Ferrari, McKinnon, et al., 2016; Leon et al., 2015). In the context of monitoring, the quantitative models generated by this study provide a basis for predicting SR measurement error based on the level of complexity measured and knowledge of the coral morphotype imaged. Knowledge of measurement error and precision is important for determining whether changes in an environment following a disturbance event are significant, because the statistical power to detect changes depends on the number of samples and the sample variance (Cohen, 1988; Green & Smith, 1997). Our results provide quantitative estimates of sample variance (depending on dominant coral morphologies) that could be used to determine the appropriate number of samples (quadrats) necessary for a given level of statistical power in experimental designs intended to detect changes in complexity.
The optimization of sampling effort with appropriate consideration to statistical power is particularly important in the context of monitoring programs with limited resources (Hill & Wilkinson, 2004).
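To make the link between variance and sampling effort concrete, the number of quadrats needed to detect a given change can be approximated with the standard normal-approximation formula for comparing two means, n = 2 (z_{1-α/2} + z_{power})² σ² / δ² per group. This is a generic textbook sketch rather than an analysis from the study; the function name is hypothetical, and `sigma` stands for whatever per-quadrat standard deviation of SR is judged appropriate.

```python
import math
from statistics import NormalDist


def quadrats_needed(sigma, delta, alpha=0.05, power=0.80):
    """Approximate quadrats per survey needed to detect a mean SR change of delta.

    Uses the two-sample, two-tailed normal approximation with equal group sizes
    and a common standard deviation sigma.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Halving the detectable change roughly quadruples the required sample size.
print(quadrats_needed(sigma=0.1, delta=0.1))
print(quadrats_needed(sigma=0.1, delta=0.05))
```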
The survey techniques presented here are designed to observe habitats from above, limiting the ability to resolve space underneath large overhanging structures, such as under large plating coral morphotypes. Goatley and Bellwood (2011) discuss the issues of horizontal planar views and sampling of coral reefs as missing information on the 3D structure of coral habitats, where the structural complexity and cover of organisms below the dominating upper "canopy" is missed.
The photogrammetric techniques presented here can be adapted to measuring 3D structure on the underside of corals by adding photographic views of these points (e.g., Bythell et al., 2001; Courtney et al., 2007) but with increased cost of image acquisition and sampling effort per sampling unit. The results presented here are more likely to represent the realities of sampling using imagery in real monitoring programs in which the use of planar views maximizes the effective sampling area observable for a given constraint in sampling effort (Hill & Wilkinson, 2004).

FIGURE 12 Relationships between 2 × 2 m quadrat rugosity errors and quadrat rugosity categorized by dominant coral morphotype coverage class modeled using OLS (adj. $R^2$ = .473). Model predicted rugosity error versus rugosity is shown for each class (solid line) and compared to the model predicted relationship for "mixed" type quadrats (dashed line)