Enhancing automated analysis of marine soundscapes using ecoacoustic indices and machine learning


Historically, ecological monitoring of marine habitats has primarily relied on labour-intensive, non-automated survey methods. The field of passive acoustic monitoring (PAM) has demonstrated the potential of this practice to automate surveying in marine habitats. This has primarily been through the use of 'ecoacoustic indices' to quantify attributes from natural soundscapes. However, investigations using individual indices have had mixed success. Using PAM recordings collected at one of the world's largest coral reef restoration programmes, we instead apply a machine-learning approach across a suite of ecoacoustic indices to improve predictive power of ecosystem health. Healthy and degraded reef sites were identified through live coral cover surveys, with 90-95% and 0-20% cover respectively. A library of one-minute recordings was extracted from each. Twelve ecoacoustic indices were calculated for each recording, in up to three different frequency bandwidths (low: 0.05-0.8 kHz, medium: 2-7 kHz and broad: 0.05-20 kHz). Twelve of these 33 index-frequency combinations differed significantly between healthy and degraded habitats. However, the best performing single index could only correctly classify 47% of recordings, requiring extensive sampling from each site to be useful. We therefore trained a regularised discriminant analysis machine-learning algorithm to discriminate between healthy and degraded sites using an optimised combination of ecoacoustic indices. This multi-index approach discriminated between these two habitat classes with improved accuracy compared to any single index in isolation. The pooled classification rate of 1000 cross-validated iterations of the model had a 91.7% (±0.8, mean SE) success rate at correctly classifying individual recordings. The model was subsequently used to classify recordings from two actively restored sites, established >24 months prior to recordings, with coral cover values of 79.1% (±3.9) and 66.5% (±3.8). Of these recordings, 37/38 and 33/39 received a classification as healthy respectively. The model was also used to classify recordings from a newly restored site established <12 months prior with a coral cover of 25.6% (±2.6), from which 27/33 recordings were classified as degraded. This investigation highlights the value of combining PAM recordings with machine-learning analysis for ecological monitoring and demonstrates the potential of PAM to monitor reef recovery over time, reducing the reliance on labour-intensive in-water surveys by experts. As access to PAM recorders continues to rapidly advance, effective automated analysis will be needed to keep pace with these expanding acoustic datasets.

Introduction
Ecological monitoring of marine habitats is key to understanding these ecosystems and successfully measuring the outcomes of conservation and restoration programmes happening in our oceans. This kind of ecological monitoring often relies on visual census surveys. However, these come with limitations: they require expert data collectors, are logistically complex and often expensive, are typically poor at monitoring cryptic organisms within the ecological community, and collect only a snapshot in time of the target site rather than continuous long-term data (Mooney et al., 2020; Munger et al., 2022). Moreover, conservation and restoration programmes are typically limited by time and resources, making it a challenge to sufficiently report on their progress using these methods (Boström-Einarsson et al., 2020; Rilov et al., 2020).
Automated passive acoustic monitoring (PAM) of whole soundscapes has the potential to address many of these limitations (Lindseth and Lobel, 2018; Mooney et al., 2020; Lamont et al., 2021). Low-cost acoustic recording technology capable of recording continuously for several days, or longer with duty cycling, is becoming available (Chapuis et al., 2021; Lamont et al., 2022). These devices can be deployed rapidly by non-expert data collectors and left to record autonomously. They can also collect data on cryptic organisms, which disproportionately rely on acoustic communication compared to non-cryptic species (Lamont et al., 2022). A growing number of studies have found relationships between the soundscapes of marine habitats and traditional ecological metrics such as benthic cover, fish communities and overall habitat quality using automated approaches (Nedelec et al., 2015; Butler et al., 2016; Freeman and Freeman, 2016; Harris et al., 2016; Gordon et al., 2018; Elise et al., 2019). As well as being useful for the tracking of habitat characteristics, soundscapes also constitute important components of ecosystem functioning, especially for orientation and recruitment of a variety of organisms (Simpson et al., 2005; Lecchini et al., 2018; Gordon et al., 2019). Surveying reefs using acoustics can therefore provide new understandings that may explain both ecological and behavioural processes.
Given recent innovations in autonomous hydrophone technology, our ability to capture large databases of long-term soundscape recordings is expanding (Sousa-Lima et al., 2013; Lamont et al., 2022). However, analytical approaches must keep pace with this if the potential of these data is to be maximised. Early analysis was based on aural assessments or visual inspection of spectrograms to score characteristics such as the frequency of occurrence and diversity of acoustic events like fish vocalisations (Putland et al., 2017; Archer et al., 2018; McWilliam et al., 2017, 2018; Bertucci et al., 2020; Lamont et al., 2021). However, these approaches can be slow and labour intensive, introducing a severe limit on the speed at which PAM data can be analysed.
Computationally generated ecoacoustic indices are becoming a popular approach to overcome this limitation (Gibb et al., 2019). Ecoacoustic indices have primarily been developed for terrestrial habitats (Sueur et al., 2014) where they are used to quantify soundscape attributes including variability across time and/or frequency bands (Stowell and Sueur, 2020). These indices can be automatically derived from long-term acoustic recordings, enabling extended recordings to be analysed. Several indices have been tested recently in the marine environment, revealing relationships between these and attributes of the ecological community, habitat quality and ecological functioning of marine habitats (Harris et al., 2016; Lindseth and Lobel, 2018; Gordon et al., 2018; Elise et al., 2019b; Mooney et al., 2020).
So far, these studies have primarily used individual ecoacoustic indices to test for differences between sites (e.g., low- and high-quality habitats, reefs pre- and post-bleaching), or relationships with other ecological metrics (e.g., biodiversity, abundance). However, these indices do not perform consistently across all marine investigations (Kaplan et al., 2015; Bertucci et al., 2016; Dimoff et al., 2021). Results from any individual index can also be biased by individual components of the soundscape, such as a high density of snapping shrimps or repetitive fish chorusing, limiting their utility to characterise the wider community (Staaterman et al., 2013; Bolgan et al., 2018; Dimoff et al., 2021). Some terrestrial soundscape ecology investigations have attempted to overcome similar performance issues by combining several ecoacoustic indices into multivariate representations of acoustic recordings, known as 'compound indices', which give a more holistic representation of the soundscape (Eldridge et al., 2018). These compound indices can then be input into machine-learning algorithms, which identify relationships between this 'feature set' of indices and the task asked of the algorithm by finding patterns and interactions between the indices (Eldridge et al., 2018; Bradfer-Lawrence et al., 2019; Gibb et al., 2019; Sethi et al., 2020).
Such tasks include supervised approaches such as classification problems, which group soundscape recordings into categories specified by the researchers (e.g. logged or unlogged forest), or regression tasks, which place recordings along a gradient (e.g. avian diversity) (Sethi et al., 2020, 2021). Alternatively, unsupervised approaches can be used, such as clustering to identify key groups from recordings, or anomaly detection to identify recordings which deviate significantly from the majority (e.g. containing noise pollution or a rare animal chorus) (Sethi et al., 2020).
In this study, we use whole soundscape recordings to test whether a machine-learning algorithm could be trained using a compound index to accurately classify recordings from two different ecostates of a marine habitat. We used coral reef soundscapes due to the diversity of sounds present on these ecosystems and known relationships between their soundscape and ecological attributes (Kaplan et al., 2015; McWilliam et al., 2017; Nedelec et al., 2015). We then trialled this model in a real-world application using recordings from neighbouring reefs which had been actively restored by one of the world's largest reef restoration programmes, located in Sulawesi, Indonesia. Coral restoration programmes typically fail to collect adequate monitoring data (Bayraktarov et al., 2019; Boström-Einarsson et al., 2020). Through demonstrating the utility of this approach, we intend to highlight a more efficient and cost-effective means of field data collection for the monitoring of restoration and other marine conservation projects alike.

Acoustic recordings
The recordings were made across the seven sites during August-September 2018 as part of the MARRS monitoring programme. We used a regime which sampled one-hour blocks from sites five days either side of the full moon (26th August 2018) and three days either side of the following new moon (10th September 2018) during daylight (09:00-15:00), twilight (half an hour either side of sunrise and sunset) and night time (half an hour either side of midnight) periods. Three SoundTraps (SoundTrap 300STD, 48 kHz sampling rate, Ocean Instruments, Auckland, NZ) were used, with one suspended 0.5 m above the seabed for each deployment at a site. In each new round of deployments, SoundTraps were assigned randomly to recording sites within a counterbalanced blocking design, in order to control for potential instrument error.
We sub-sampled five non-overlapping one-minute segments from each of the hour-long periods at random. Only samples recorded during calm conditions (wind speed < 20 km h⁻¹) were used. These samples were also screened for motorboat noise and any recordings with this disturbance were removed, resulting in 262 recordings in the final sample set. This sample set was previously used in Lamont et al. (2021) to compare fish sound diversity between sites.
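The sub-sampling step can be illustrated with a minimal Python sketch. The source does not state how the random segments were drawn, so the approach below (dividing the hour into 60 whole-minute slots and drawing five without replacement) is an assumption, as are the function name `sample_segments` and its parameters.

```python
import random

def sample_segments(block_minutes=60, n_segments=5, seed=None):
    """Draw non-overlapping one-minute segments from a recording block.

    Assumption: segments are aligned to whole minutes, so choosing distinct
    minute slots without replacement guarantees they cannot overlap.
    Returns a sorted list of (start_s, end_s) tuples in seconds.
    """
    rng = random.Random(seed)
    slots = sorted(rng.sample(range(block_minutes), n_segments))
    return [(m * 60, (m + 1) * 60) for m in slots]

# Example: five one-minute windows from one hour-long block
segments = sample_segments(seed=42)
```

Segments drawn this way would still need the screening described above (calm conditions, no motorboat noise) before entering the sample set.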

Processing recordings
Each recording was filtered using a short-term Fourier transform band-pass filter into three frequency bands: low-frequency (0.05-0.8 kHz), medium-frequency (2-7 kHz) and a broadband (0.05-20 kHz). The low-frequency band covered all the known fish vocalisations within the dataset (Lamont et al., 2021), while the medium-frequency band comprised invertebrate (primarily snapping shrimp) sound (Elise et al., 2019a). The broad-frequency band was used to encompass the full spectrum of potentially relevant frequencies, as previously used in coral reef soundscape investigations (Kaplan et al., 2015; Lyon, 2018). Frequencies below 0.05 kHz were excluded from the low- and broad-frequency band recordings to remove geophonic noise and self-noise from the recording system (Curtis et al., 1999). All processing was performed in R (v3.4.2; R Development Core Team, 2020): audio files were read and written using tuneR (v.1.3.3) (Ligges et al., 2018) and the filters were implemented using Seewave (v2.1.6) (Sueur et al., 2008).

Ecoacoustic indices
Twelve ecoacoustic indices were chosen from a range of previous soundscape studies (Table 1). Each index was calculated for all three frequency bands, with two exceptions: snap rate was only calculated for the middle- and broad-frequency bands, because snapping shrimp cavitation bubbles are not audible at lower frequencies (Bohnenstiehl et al., 2016), and the normalised difference soundscape index (NDSI) was only calculated for the broad-band recordings. NDSI is typically used to quantify discrepancies in amplitude between an anthropogenic noise band up to 1 kHz and a biophonic noise band at selected higher frequencies (Kasten et al., 2012). We instead used this index to quantify differences between the 1 kHz band, where fish noise dominates, and a higher 2-7 kHz frequency band, where snapping shrimp sound dominates (Au and Banks, 1998). Thus, we established a feature set of 33 index values across 12 indices and three frequency bands for each of the 262 one-minute recordings. Indices were calculated using the R package Seewave (Sueur et al., 2008) where possible. All remaining indices were calculated in the Soundecology R package (v.1.3.3) (Villanueva-Rivera et al., 2018), other than SPL, which was calculated in paPAM (Nedelec et al., 2016), and snap rate, which was calculated using a custom MATLAB script (Gordon et al., 2018).
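As a worked illustration of the two calculations in this paragraph, the Python sketch below shows the general normalised-difference form of NDSI and the arithmetic behind the 33-value feature set. It is not the Seewave or Soundecology implementation. The sign convention (higher band minus lower band, mirroring the usual biophony-minus-anthrophony form) and the names `ndsi` and `feature_count` are assumptions made for illustration.

```python
def ndsi(power_low, power_high):
    """Normalised difference between the acoustic power in two bands.

    Assumption: the higher (snapping shrimp) band takes the place of the
    biophony term and the lower (fish) band the place of the anthrophony
    term, so values range from -1 (all power in the lower band) to
    +1 (all power in the higher band).
    """
    total = power_low + power_high
    if total == 0:
        raise ValueError("both bands contain zero power")
    return (power_high - power_low) / total

def feature_count(n_indices=12, n_bands=3, skipped=3):
    """Index-band combinations: all pairs minus those not calculated.

    Skipped pairs: snap rate in the low band (1) and NDSI in the low and
    medium bands (2).
    """
    return n_indices * n_bands - skipped

n_features = feature_count()  # 12 * 3 - 3 = 33
```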

Selection of indices to differentiate healthy and degraded habitats
All 33 indices were examined for separation between recordings from healthy (n = 81) and degraded (n = 71) habitats using Mann-Whitney U tests. Since the purpose was to explore the potential of each candidate index for model development, we did not control for spatial and temporal pseudoreplication within the library of recordings, nor control for cross-correlation between indices with similar trends. Violin plots were used to visualise the degree of overlap between the distributions for indices with indicative differences. Where minimal overlap was observed between the two habitats, the index could be considered likely to provide a promising measure with which to differentiate between healthy and degraded habitats.
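For readers unfamiliar with the test, a minimal pure-Python sketch of a two-sided Mann-Whitney U test follows. It uses the large-sample normal approximation without tie or continuity corrections, so it will not match standard statistical software exactly for small or heavily tied samples. The function name `mann_whitney_u` is hypothetical, and this is not the code used in the study.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no corrections).

    Returns (U, p) where U is the smaller of the two U statistics.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p
```

In this setting `x` and `y` would hold one index-band combination's values for the healthy (n = 81) and degraded (n = 71) recordings respectively.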

Machine-learning approaches to develop a compound index
Following analysis of individual indices, we developed a supervised machine-learning model to assign recordings to either healthy or degraded habitat classes. A regularised discriminant analysis (RDA) algorithm was selected to account for the high level of collinearity reported between indices (Supp. 1). An optimised set of indices was selected in a 'feature selection' stage, using recursive feature elimination (RFE) and a multivariate adaptive regression spline (MAR) (Kuhn and Johnson, 2019) (Supp. 1). The RFE highlighted increases in model accuracy with the multi-index approach as additional indices were added sequentially (Supp. 1, Fig. S1). Predictive accuracy was greatest with eight indices, followed by a gradual decline as the addition of further indices introduced noise and/or caused model overtraining. The list of suggested features from RFE included the following index/frequency band combinations: broadband ACI, H, NDSI and Ht; and medium-frequency band ACI, BI, H and Ht. This was highly congruent with rankings obtained from the relative importance scores using the MAR (Fig. 3).
Following RFE, further manual feature selection was conducted by systematic removal and addition of indices whilst executing the full model, to select a final feature set with the lowest misclassification rate. This led to discarding Ht in both the broad-range and middle-frequency bands, and the introduction of low-frequency band ACI and middle-frequency band AR. Thus, the final set was: low-frequency band ACI, medium-frequency band ACI, AR and BI, broadband ACI, H and NDSI. Feature selection was performed using the R packages mlbench (v2.1.1) (Leisch and Dimitriadou, 2010) and Caret (v.6.0-86) (Kuhn, 2020).
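The reason RDA tolerates collinear indices can be seen from how it regularises the class covariance matrices. The block below gives one common simplified parameterisation; the symbols $\lambda$, $\gamma$ and $p$ are introduced here for illustration, and the tuning values used in the study are not reproduced.

```latex
% Class covariance shrunk towards the pooled covariance (\lambda),
% then towards a scaled identity matrix (\gamma); p = number of features.
\hat{\Sigma}_k(\lambda) = (1-\lambda)\,\hat{\Sigma}_k + \lambda\,\hat{\Sigma},
\qquad
\hat{\Sigma}_k(\lambda,\gamma) = (1-\gamma)\,\hat{\Sigma}_k(\lambda)
  + \gamma\,\frac{\operatorname{tr}\big[\hat{\Sigma}_k(\lambda)\big]}{p}\,I_p

% A recording with feature vector x is assigned to the class k
% (healthy or degraded) that minimises the discriminant score
d_k(x) = (x-\hat{\mu}_k)^{\top}\,\hat{\Sigma}_k(\lambda,\gamma)^{-1}\,(x-\hat{\mu}_k)
  + \ln\big|\hat{\Sigma}_k(\lambda,\gamma)\big| - 2\ln\hat{\pi}_k
```

With $\gamma = 0$, setting $\lambda = 1$ recovers linear discriminant analysis and $\lambda = 0$ recovers quadratic discriminant analysis. Increasing $\gamma$ adds weight to the diagonal, which keeps the covariance matrix invertible when indices are strongly correlated.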

Constructing the final model
Table 1
Twelve ecoacoustic indices calculated from recordings with summary description of the mechanistic principle, software used and respective settings employed.

An RDA model was constructed using the healthy and degraded datasets, using the R packages MASS (v.7.3-53) (Venables and Ripley, [...] et al., 2005). Model accuracy was assessed using k-fold cross validation (10 folds), whereby the dataset was partitioned into 'training' and 'test' sets to prevent overestimation of model accuracy when presented with new data (Supp. 1). To account for variation in the RDA, 1000 repeats of the cross-validated model construction were performed to assess accuracy (Rao et al. 2008).
A principal component analysis (PCA) and a pairs plot comparing each combination of the eight selected indices were generated for all recordings. These were used to test whether soundscape properties from restored reefs diverged from the healthy and degraded classes, which would lead to potentially inappropriate classifications using the RDA trained on the healthy and degraded recordings. Both tests were conducted in R using inbuilt functions (Supp. 1, Fig. S4).
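The repeated k-fold cross-validation used to assess model accuracy in this section can be sketched as follows. This is a pure-Python illustration, not the authors' R workflow. A nearest-centroid classifier stands in for the RDA to keep the example self-contained, folds are not stratified by class, and all function names are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Stand-in classifier (NOT RDA): store the mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict_centroid(model, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2
                                          for a, b in zip(row, model[lab])))

def repeated_cv(X, y, fit, predict, k=10, repeats=1000, seed=0):
    """Return the mean misclassification rate and the per-repeat rates."""
    rng = random.Random(seed)
    n = len(X)
    rates = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]  # k roughly equal folds
        errors = 0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train])
            errors += sum(predict(model, X[i]) != y[i] for i in fold)
        rates.append(errors / n)
    return sum(rates) / len(rates), rates
```

Every recording is held out exactly once per repeat, so each repeat yields one prediction per recording. Averaging those predictions across repeats gives a per-recording proportion classified as healthy.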

Comparing indices between healthy and degraded sites
Exploratory Mann-Whitney U tests revealed significant differences between healthy and degraded habitat index scores for 12 of the 33 indices (Fig. 4). Bonferroni corrections were also used to reduce the likelihood of false positives in the search for a significant difference; the original alpha value of 0.05 was therefore divided by 33 to provide a new value of 0.00152. Using this more conservative approach no longer reveals AI in the broad and medium-frequency bands, as well as AEI in the medium-frequency band, to be significantly different.
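The Bonferroni threshold quoted above follows directly from dividing alpha by the number of tests, as this short Python sketch shows. The function names and the example p-values are hypothetical and are not results from the study.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def significant_after_correction(p_values, alpha=0.05):
    """Return, for each p-value, whether it passes the corrected threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]

threshold = bonferroni_alpha(0.05, 33)  # 0.001515..., reported as 0.00152
```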
Violin plots of the three most significantly different index results between the healthy and degraded sites revealed large zones of overlap between values for these indices between habitat classes (Fig. 5). The strongest significant difference was reported for H in the 2-7 kHz band; here, 71 of the 152 recordings (47%) reported results that did not fall in the range of both habitat types.
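One way to quantify the overlap described above is to count recordings whose index value lies outside the range shared by both classes. The Python sketch below assumes that "range" means the minimum-to-maximum span of each class, which the source does not state explicitly. The function name `unambiguous_fraction` is hypothetical.

```python
def unambiguous_fraction(healthy, degraded):
    """Fraction of recordings whose value lies outside the shared range.

    Assumption: the shared range is the intersection of the two classes'
    min-max spans. Values inside it could belong to either class, while
    values outside it can only have come from one class.
    """
    shared_low = max(min(healthy), min(degraded))
    shared_high = min(max(healthy), max(degraded))
    values = list(healthy) + list(degraded)
    outside = sum(1 for v in values if v < shared_low or v > shared_high)
    return outside / len(values)

# The figure reported for H in the 2-7 kHz band: 71 of 152 recordings
reported = 71 / 152  # about 0.47
```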

Comparing indices to phonic richness
We searched for a correlation between each ecoacoustic index and the diversity of fish sounds present in each recording using the 'phonic richness' method, which counts the number of unique sound types audible in each recording (Lamont et al., 2021) (Supp. 1). This revealed no strong relationships (Pearson correlations) between phonic richness and any of the 33 indices trialled (Supp. 1, Fig. S3). The strongest relationship was a weak negative correlation with the acoustic entropy index (H) for the broad-frequency band (Pearson correlation; rho = -0.43; p < 0.001), with all other indices reporting weaker correlations than this.
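For reference, the Pearson correlation coefficient used in this comparison can be computed as in the minimal Python sketch below. The function name `pearson_r` is hypothetical, and the study's own calculations were performed in R.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Here `x` would hold one index-band combination's values and `y` the phonic richness counts for the same recordings.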

Regularised discriminant analysis
From the 1000 repeated constructions of the cross-validated model using the 152 recordings taken across healthy and degraded sites, the pooled mean misclassification rate was 8.27% (0.84, mean SE). Across these model constructions, of the 81 recording samples taken from the two healthy sites, 73.0 (0.1) of these were correctly classified as healthy, with 8.0 (0.1) misclassified as degraded. Of the 71 recordings taken from the two degraded sites, 67.2 (0.1) of these were classified as degraded, with 3.7 (0.1) misclassified as healthy (see Fig. 6 for individual results for each recording sample).
Cluster analysis using the principal component analysis (PCA; Fig. 7) and a pairs plot (Supp. 1, Fig. S4) were used to examine whether the 110 samples taken from recordings of the three restored sites overlapped with recordings from the control sites. If they deviated, the model would be likely to provide inappropriate classifications for these sites. For the mature restored and newly restored sites, 70/81 and 70/71 samples respectively fell within one or both of the predictive ellipses for the two existing classes. This indicates that the soundscapes of the restored sites did not diverge from the soundscape present at the other two habitat types when using the properties investigated here. This supports the inputting of restored samples into the model, as the data were not divergent from the original training data. Additionally, the PCA showed that 61/81 samples from the mature restored sites fell within the ellipse that could be used to predict healthy sites, whereas 24/27 samples of recordings from the newly restored site fell within the ellipse that could be used to predict degraded sites. However, it is important to note that there was a large region of overlap between the healthy and degraded class when using only the two dimensions shown by the PCA, with most of the ellipse of the degraded class encompassed by that of the healthy class.
Analysis of the restored site samples revealed that the majority of samples from mature restored sites were classified as healthy, but samples from the newly restored site were mainly classified as degraded (Fig. 8). The Bontosua mature restored site was classified more clearly than the Badi mature restored site, with 37/38 and 33/39 samples classified as healthy respectively. The six samples classified as degraded from the Badi mature restored site occurred consecutively on the new moon at night. At the newly restored site, 27/33 samples were classified as degraded, and all of these were during the full moon (though only four new moon samples were available) and five of these were at night.
The model trained on the 2018 recordings was also tested on a smaller number of recordings taken at the same sites 10 months later (June/July 2019). Here, the model provided similar predictions for six of the seven sites, while one site (Healthy Bontosua) exhibited a change in prediction between 2018 and 2019, changing from primarily being classified as healthy to degraded (Table 2, full results in Fig. S5).

Discussion
This study compared the value of individual ecoacoustic indices and a machine-learning model trained on a compound index to discriminate between coral reef ecostates. While no single ecoacoustic index could reliably discriminate between healthy and degraded reefs, a supervised machine-learning approach more accurately predicted habitat class from randomly drawn acoustic samples. This highlights the potential of combining PAM with machine learning for monitoring the health of marine ecosystems.
Twelve individual ecoacoustic indices were calculated in up to three frequency bandwidths, totalling 33 values, of which 12 were significantly different between healthy and degraded reefs (Fig. 4). There were no strong correlations between any of these indices and phonic richness (Supp. 1), indicating that fish sound diversity alone is not the dominant driver of these results; rather, alternative aspects of the soundscape are responsible. A metric that combines abundance alongside diversity of fish vocalisations may reveal more about the role these play in driving index values. Other contributors may originate from alternative biotic or abiotic sources. Invertebrates are well-documented contributors to reef soundscapes, including snapping shrimp (Bohnenstiehl et al., 2016), urchin feeding activities (Radford et al., 2008) and the movement of hard-shelled organisms (Freeman et al., 2014). The photosynthetic process of macroalgae has also been reported as a sound producer (Freeman et al., 2018). Though understudied, abiotic attributes may also influence the soundscape as ecostates change (Duarte et al. 2021). Though recordings were taken in calm conditions, low-level geophonic noise produced by waves and wind may propagate differently through a rubble field compared to a more structurally complex reef, or the regular movements of unconsolidated coral rubble due to hydrodynamic forcing might also contribute to the soundscape (Kenyon et al. 2020).

Fig. 6. Machine learning classification of acoustic samples from the healthy and degraded sites. Each cell indicates a single one-minute recording from the 152 taken in healthy and degraded habitats. The model was executed 1000 times on the dataset, generating a new habitat class prediction each time for every recording. Values within cells represent the proportion of these 1000 iterations in which the recording was predicted as originating from a healthy site, with the remaining being predicted as degraded (green shading: >0.5; pink shading: <0.5). Recordings on the left of the partition were made during the day and recordings to the right were made during crepuscular or nighttime periods. Although frequent gaps were present in the sampling regime, the order with which cells are presented within their respective blocks conserves the overall order with which they were sampled across time.
The distribution of values within indices showing significant differences between healthy and degraded reefs exhibited substantial overlap between the two habitat classes. The best performing individual index was H in the 2-7 kHz band. For this index, 71 of the 152 recordings (47%) had non-overlapping values, meaning they could be correctly classified as healthy or degraded; all other recordings were ambiguous, as their values fell within the range reported for both habitat types. This means that the ability to distinguish between habitat classes from a single recording using an individual index is low, as any given value from one class has a high chance of being reported from a recording in the other class. Violin plots of the three most significant results demonstrate this large overlap between the values of each class (Fig. 5). In isolation, these indices could discriminate between habitats if extensive sampling is achievable for all sites of interest to build up a dataset that can be tested for statistical significance, as demonstrated here. However, their potential to deliver reliable results from short 'snapshot' recordings is lower, and their ability to deliver insights into more complex tasks, beyond a coarse healthy-degraded comparison, may be limited.
In contrast to bivariate analyses, combining multiple indices with regularised discriminant analysis (RDA) gave a strong predictive ability to classify habitats based on single recordings. Recursive feature elimination (RFE) highlights the increase in accuracy attainable through constructing an optimised set of multiple indices (Supp. 1, Fig. S1) compared to using individual indices (Fig. 5). The misclassification rate of the final RDA model was 8.27% (0.84, mean SE) when applied to recordings from the same season. The model made accurate predictions despite being kept blind to diel and lunar period, which are both known to influence marine soundscapes (Staaterman et al., 2014); this highlights its robustness to temporal changes in soundscapes. The model also reliably delivered the same classification for recordings from six of the seven sites taken ten months later. The feature-selection stage of this approach is specific to the data and questions considered in this study. However, indices within the final feature set may offer a useful starting place for similar investigations elsewhere. To produce optimised models, investigations at new locations addressing new questions should carry out an independent feature selection process on their own data.
Following the successful classification of healthy and degraded habitats, our compound-index-based model was used to examine soundscape recordings taken from nearby coral reef habitats that had been restored (Williams et al., 2019). This tested the ability of this approach to perform a rapid assessment of these restored sites using one-minute soundscape recordings. The model was able to detect differences between the two mature restored sites and the newly restored site. Of the recording samples from the two mature restored sites, 33/39 and 37/38 were primarily classified as healthy, whereas 27/33 samples from the newly restored site were classified as degraded (Fig. 8). The mature restored sites were more than twice as old as the newly restored site (restoration started >24 months prior to recordings on mature restored sites, compared to <12 months for the newly restored site), and had approximately three times more live coral cover (79.1% ± 3.9 and 66.5% ± 3.8 for the mature restored sites, 25.6% ± 2.6 for the newly restored site; values all % live coral cover mean ± SE; full data in Supp. 1). Restoration progress was clearly detected in the soundscape, with better classification made possible by using a machine-learning-driven approach, suggesting PAM can be a useful tool for monitoring restoration against reference sites. More generally, this study highlights the potential for using machine-learning approaches to explore PAM data to provide greater analytical power in coral reef monitoring programmes. Further improvements include considering sources of observed error in the model, which could be due to several factors working in isolation or in combination. The RDA approach operates best when input features have Gaussian distributions (Wu et al., 1996), but some features used in this study exhibited sub-Gaussian distributions. This is likely due to inclusion of samples from various times of day and from multiple sites. Diel trends are frequently detected in reef soundscapes across a range of ecoacoustic indices (Kaplan et al., 2015; Bertucci et al., 2020b; Carriço et al., 2020). Additionally, reef soundscapes can differ over small spatial scales (Putland et al., 2017). In this study, samples were taken from spatially separated sites to reduce pseudoreplication, thus differences within habitat classes are likely. Both these factors may have skewed the distributions of the feature sets. Furthermore, the dataset used to train the model will also contain natural outliers through ecological randomness that cannot be resolved at the sampling resolution employed. Longer periods of recording can be used to minimise impacts of this natural variation (Bradfer-Lawrence et al., 2019), though we show here that short periods of recording can still be used to identify accurate classifications between significantly different habitat states.
Six of the seven sites studied in 2018 retained similar classifications when resampled 10 months later in 2019. The outlier was Bontosua Healthy, for which 10/12 recordings were incorrectly classified as degraded. Recordings at this site were only collected during the day in 2019, and 9/12 of these were taken during the new moon period. The soundscape may therefore have been inadequately sampled. Alternatively, this could be an early indicator of a changing state of reef health at this site that is not yet seen in the coral cover data, which were similar in both years (Supp. 1), or it could demonstrate the specificity of the model to the time at which its training recordings were taken.
Future investigations could build on the present study by considering a more nuanced approach to classifying ecostate. For example, this study employed a binary classification of reef health but, in reality, marine habitats occur across gradients of ecostates (Downs et al., 2005; Smith et al., 2008). Sampling across these gradients, and using regression-based algorithms such as logistic regression, random forests or neural networks, could support models that can predict on a continuous scale. Additionally, within the context of coral reefs, although live coral cover may be a strong indicator of overall reef health (Smith et al., 2016; Dietzel et al., 2020), other attributes of interest could be incorporated to better determine the ecostate of a site. For example, soundscape-based machine-learning models could be trained to predict metrics that are effort- and training-intensive to collect, such as fish or invertebrate abundance or diversity, and other habitat attributes could be explored, such as structural complexity or ecosystem stability. Similar approaches could also be applied to other kinds of marine habitats where soundscape research has so far been limited (Pieretti and Danovaro, 2020). By drawing comparisons with a wider range of traditional metrics used in marine monitoring, the potential for machine-learning-based analyses of ecoacoustic recordings can be further developed.

Conclusion
Given the increasing availability of hydrophone technology (Chapuis et al., 2021; Lamont et al., 2022), the number of acoustic datasets from the marine environment is set to rise. Automated analysis is needed to process large acoustic datasets efficiently so that insights from the information held within can be maximised. However, this is so far underdeveloped. Our investigation presents an automated approach, through the use of a compound index and machine learning, that improves upon existing approaches used in the marine environment. We first demonstrate its use in classifying coral reef habitats into healthy or degraded ecostates based on short-term recordings. We then apply it in a restoration setting, highlighting its utility for assessing areas of restored reef and revealing that restoration progress is detectable in the soundscape. This investigation provides the first evidence that compound indices and machine learning are able to outperform the use of single ecoacoustic indices on a tropical reef, and suggests that this approach should be considered for use in other marine and terrestrial habitats.

Fig. 1. Location and habitat class of the seven reef sites, present within the broader Spermonde Archipelago, Indonesia (A), where soundscape recordings were collected. Fringing reefs from two nearby islands, Bontosua (B) and Badi (C), were used. Modified from Lamont et al. (2021).

Fig. 2. Representative habitat and coral cover images from the four habitat classes at which soundscape recordings were made. (A) Degraded, (B) healthy, (C) newly restored and (D) mature restored.

Fig. 3. Relative importance rankings of indices obtained from the multivariate adaptive regression (MAR) analysis used for feature selection. The eight recommendations obtained from the recursive feature elimination (RFE) analysis are indicated by the black lines. The top eight indices of the MAR analysis were congruent with the eight recommendations from RFE, though the order was not conserved. Black dots to the right of bars indicate features which were selected for the final model after further manual feature selection.

Fig. 4. Heat map of results from Mann-Whitney U tests between the ecoacoustic index scores calculated from recordings of healthy (n = 81) and degraded (n = 71) sites in low-, medium- and broad-frequency bands. The habitat class with the higher mean is indicated by the letter in the bottom right of each cell (H = Healthy; D = Degraded). Blank cells indicate indices for which values from the corresponding frequency band were not calculated (see Methods).
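For readers unfamiliar with the test summarised in Fig. 4, the rank-based statistic behind it can be sketched in a few lines of Python. This is an illustrative sketch only: it computes the U statistics with midranks for ties and does not compute p-values, the function names are hypothetical, and the index scores shown are invented rather than taken from this study.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups must contain at least one value")
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Invented index scores for illustration only.
healthy_scores = [0.82, 0.77, 0.91, 0.68]
degraded_scores = [0.55, 0.61, 0.70, 0.49]
print(mann_whitney_u(healthy_scores, degraded_scores))
```

In practice an established statistics package would normally be used so that p-values and tie corrections are handled consistently across all index-frequency combinations.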

Fig. 7. Plot from the principal component analysis of PC1 and PC2 scores for the Healthy and Degraded site recording samples. Samples from recordings of Restored sites are overlaid on this to help determine whether these conform with either of the two existing classes or whether the properties of their soundscape are distinct. Ellipses indicate the zone within which a new sample can be assigned to a class using the two principal components presented in this figure. Overlapping areas indicate ambiguous recordings which cannot be differentiated by PCA.

Fig. 8. Machine-learning classification of acoustic samples from the restored sites. Each cell indicates a single one-minute recording from the 110 taken from restored sites. The model was executed 1000 times on the dataset, generating a new habitat class prediction each time for every recording. Values within cells represent the proportion of these 1000 iterations in which the recording was predicted as originating from a healthy site, with the remainder being predicted as degraded (green shading: >0.5; pink shading: <0.5). Recordings on the left of the partition were made during the day and recordings to the right were made during crepuscular or nighttime periods. Despite gaps in the sampling regime, the order within blocks conserves the overall order with which they were sampled across time. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
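The cell values described in the Fig. 8 caption can be derived from repeated model runs roughly as follows. This is a minimal Python sketch under stated assumptions: predictions are assumed to be stored as one list of labels per iteration, the label strings and function names are hypothetical, and the tiny example uses four iterations rather than 1000.

```python
def healthy_proportions(runs):
    """Return, per recording, the share of iterations predicting 'healthy'.

    runs: list of iterations, each a list with one label per recording.
    """
    if not runs:
        raise ValueError("at least one iteration is required")
    n_recordings = len(runs[0])
    counts = [0] * n_recordings
    for run in runs:
        if len(run) != n_recordings:
            raise ValueError("every iteration must cover the same recordings")
        for i, label in enumerate(run):
            if label == "healthy":
                counts[i] += 1
    return [count / len(runs) for count in counts]


def majority_class(proportion, threshold=0.5):
    """Map a proportion to the shading rule in the caption (>0.5 or <0.5)."""
    if proportion > threshold:
        return "healthy"
    if proportion < threshold:
        return "degraded"
    return "tie"


# Hypothetical predictions: 4 iterations x 3 recordings (illustration only).
example_runs = [
    ["healthy", "healthy", "degraded"],
    ["healthy", "healthy", "degraded"],
    ["healthy", "degraded", "healthy"],
    ["healthy", "healthy", "degraded"],
]
proportions = healthy_proportions(example_runs)
print([(p, majority_class(p)) for p in proportions])
```

Keeping the proportion alongside the majority label preserves information about how consistently each recording was assigned to a class, which a single hard label would discard.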

Table 2
Results from the application of the 2018 model when tested on recordings taken at the same sites in 2019.
B.Williams et al.