Pseudoreplication of sound treatments in underwater exposure studies

Sound exposure studies require replicated sound treatments for the results to be representative of sound classes in general. Additionally, reused treatments in replicated designs need to be accounted for statistically. Failure to do so is referred to as simple and sacrificial pseudoreplication, respectively, and results should be interpreted accordingly. We quantified the occurrence of these issues and subsequent interpretation of results in 104 underwater sound exposure studies (2019–2023). The majority of the studies (85%) did not replicate sound treatments. Of the ones that did, most did not statistically acknowledge the hierarchical structure of the data. Unreplicated treatment designs limit the generalizability of the findings. Nevertheless, only small differences were found in how the results of unreplicated and replicated treatment designs were interpreted. This commentary aims to provide guidance in the design, analysis and interpretation of sound exposure studies, which is equally valid for aquatic and terrestrial research. © 2024 The Author(s). Published by Elsevier Ltd on behalf of The Association for the Study of Animal Behaviour. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

The effects of sound on humans and nonhuman animals are increasingly gaining societal and scientific interest (Duarte et al., 2021; Kunc & Schmidt, 2019; WHO, 2011). It is estimated that 20% of the European Union population lives in areas where traffic noise levels are harmful to health (EEA, 2020). Increasing evidence shows that nonhuman animals are also impacted by noise pollution. Over the last 20 years, there has been growing interest in research that focuses on the effects of underwater sound on aquatic animals (Duarte et al., 2021). This interest stems from the important role sound plays in communication and orientation of animals within the underwater environment, which is also a noisy environment due to human activities. Numerous studies aim to gain insight into the effects of specific sound sources on aquatic animals and compare the effects of different sources or classes of sound. This provides input for impact assessments and mitigation measures and can yield insight into animal cognition. To draw valid conclusions from exposure experiments, sound treatments should be designed with care, and their effects should be analysed and interpreted accordingly.
It is widely accepted that animal experiments require sufficient sample sizes, in this case animal subjects, to draw more robust conclusions, control for confounding factors and enhance the generalizability of research findings. Animal subjects exhibit individual variability in terms of genetics, behaviour and physiology (MacKinlay & Shaw, 2023). Hence, individuals can exhibit unique characteristics or responses that may not be representative of the overall population. By replicating trials with a sufficiently sized sample of animals, results are more likely to provide meaningful insights into the broader population (Tipton et al., 2017).
Interestingly, in sound exposure studies, it appears less common to replicate sound treatments as well (Hurlbert, 1984; Kroodsma et al., 2001). However, sounds of one sound class can also vary due to individual, and potentially unique, traits of the sound source. Additionally, a sound recording can be confounded by additional unwanted and unnoticed sound sources (Slabbekoorn, 2013). Hence, exposing all animal subjects to a single sound treatment as representative of a class of sounds is referred to as simple pseudoreplication (Kroodsma et al., 2001). By using sufficient sound treatments in independent trials, the obtained results on the responses of animals are more likely to be representative of the response to this sound class in general (Slabbekoorn & Bouton, 2008). A suitable, truly replicated sound treatment design would be either to expose each subject to a unique sound treatment (so, used once) of a class or to use multiple sound treatments to represent one class, and test multiple animals per sound treatment.
When multiple animals are individually tested using one of the sound treatments of a sound class, it is important to recognize that the animals that were exposed to the same sound treatment, even when individually tested, are not true independent replicates (Hurlbert, 1984; Kroodsma et al., 2001). This means that the results cannot simply be pooled in the analysis (Machlis et al., 1985). Such unjustified pooling is referred to as sacrificial pseudoreplication (Hurlbert, 1984). To compare two sound classes that each comprise multiple treatments, one should ideally use models that can handle data with hierarchical or nested structures, such as a nested ANOVA or mixed-effect model (Kroodsma et al., 2001; Millar & Anderson, 2004; Wiley, 2003). Attributing part of the observed variation to the individual treatments may actually increase the statistical power (Millar & Anderson, 2004).
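One simple way to respect this hierarchy is sketched below: the animals tested with the same sound treatment are collapsed to a single treatment mean, and the sound classes are then compared with an exact permutation test that relabels treatments rather than individual animals. This is a deliberately simpler alternative to the nested ANOVA or mixed-effect models named above, not an implementation of them. The function names, treatment labels and response values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def treatment_means(responses_by_treatment):
    """Collapse the animals tested with each sound treatment to one mean."""
    return [mean(values) for values in responses_by_treatment.values()]


def exact_permutation_p(means_a, means_b):
    """Two-sided exact permutation test on the difference in class means.

    The sound treatment, not the individual animal, is the unit that gets
    relabelled, so animals that shared a treatment are never counted as
    independent replicates.
    """
    pooled = means_a + means_b
    n_a = len(means_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(means_a) - mean(means_b))
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical response values (e.g. seconds), three animals per treatment.
ambient = {
    "ambient_1": [60, 64, 62],
    "ambient_2": [58, 61, 61],
    "ambient_3": [65, 63, 64],
    "ambient_4": [59, 62, 59],
}
boat = {
    "boat_1": [50, 52, 51],
    "boat_2": [47, 49, 48],
    "boat_3": [55, 53, 54],
    "boat_4": [70, 72, 71],
}

p_value = exact_permutation_p(treatment_means(ambient), treatment_means(boat))
print(f"Exact permutation p-value on treatment means: {p_value:.3f}")
```

With four treatments per class there are only 70 possible relabellings, so the smallest attainable two-sided p-value is 2/70 (about 0.029), however many animals were tested per treatment. The number of treatments, not the number of animals, bounds the evidence about the sound class.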
The exposure design of an experiment affects not only the statistical methods that should be used, but also the conclusions that can be drawn from the study. For instance, speakers may not be able to mimic the sound field generated by the original source completely due to low-frequency limitations or the speaker being a point source. Consequently, studies that use speakers should be careful in generalizing their results to the sound of the original sound source (e.g. boat noise). Additionally, one should be aware that the actual source (e.g. a boat) may affect animals not only through sound, but also through stimulation in other modalities. Lastly, studies that have only used a single sound treatment to represent a sound class cannot draw conclusions about the sound class in general, because of the previously mentioned potential confounders in that sound treatment (Slabbekoorn, 2013).
For the current paper, we examined previously published underwater sound exposure studies and identified the sound treatment designs, analyses and interpretations. Using these results, we discuss to what extent pseudoreplication still occurs in this field and what the consequences are. The occurrence of pseudoreplication in terrestrial exposure studies has been quantified before and this has decreased over time (Kroodsma, 1991; Kroodsma et al., 2001). Now that underwater exposure studies are proliferating, it may be useful to map pseudoreplication here too. This commentary should provide guidance for all future (underwater) sound exposure experiments and can hopefully contribute to reducing pseudoreplication. Despite the focus of the literature survey on underwater sound exposure studies, all principles apply to in-air studies on terrestrial organisms too.

METHODS
We used Web of Science (see search query in the Appendix) to find papers on underwater sound exposure studies, published from 2019 to 2023, in English. We screened the resulting records (N = 495) on title and abstract. Records were eliminated if they did not contain sound exposure experiments, if they focused on humans or plants, and if they used stimuli that were not recorded underwater (e.g. cars, aeroplanes) or artificial stimuli only. The remaining records (N = 95) were assessed for eligibility based on the full text: one record was excluded during this round because it lacked information regarding composition of playback sound stimuli and sound sources. Additional records (N = 10) were added at this stage, gathered by looking up referenced articles from the earlier identified records. A flow diagram of the systematic literature search is provided in Appendix Fig. A1.
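The record counts in this paragraph can be tallied as a quick consistency check. The variable names below are illustrative; the numbers are those reported in the text.

```python
# Counts reported in the text for the literature search.
records_identified = 495      # Web of Science records, 2019 to 2023
after_title_abstract = 95     # remaining after title and abstract screening
excluded_full_text = 1        # lacked information on stimuli and sound sources
added_from_references = 10    # found via references of the identified records

screened_out = records_identified - after_title_abstract
included = after_title_abstract - excluded_full_text + added_from_references

print(f"Removed on title/abstract: {screened_out}")
print(f"Included papers: {included}")
```

The tally arrives at the 104 papers that were scored in the next step.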
All included papers (N = 104) were scored in the subsequent step. All studies compared two or more sound classes: for example, ambient sound (as control) and boat sound, or one or more sound class(es) and silence. We scored whether only one sound treatment was used for each sound class (yes/no). In other words, were all subjects, per sound class, exposed to the same sound treatment? Sound classes that were just silence playback were ignored. We also scored how many unique stimuli (recordings or sources) were used to make the sound treatment(s) of each sound class. Multiple recordings of, or exposures to, one individual source were scored as one stimulus or treatment. Multiple treatments that used the same recordings (but e.g. in a different order) were regarded as one. When the number of stimuli was unknown (but clearly more than one per sound class) or uneven between sound classes, we scored the number as '>1'. Sound classes with artificial stimuli (e.g. white noise, pure tones) were ignored for the scoring. The two scores allowed us to differentiate between three different stimulus designs: (1) all subjects were exposed to the same sound treatment per sound class, and the sound treatment consisted of one stimulus only; (2) all subjects were exposed to the same sound treatment per sound class, and the sound treatment consisted of a combination of multiple stimuli; or (3) the subjects were exposed to different sound treatments per sound class, consisting of different stimuli. A number of papers consisted of multiple experiments, but in those cases, either only one experiment met the inclusion criteria or the different experiments fell into the same stimulus design category and were therefore treated as one study. For papers with the third treatment design, we scored whether replicated (reused) sound treatments were taken into account in the statistical analyses. Additionally, we scored whether the subjects were exposed to sound using a speaker or by the original sound source. To get an impression of the generalizations made by the authors based on their studies, and whether these fitted their experimental designs, we checked the title and abstract of each paper and noted whether these contained any definitive conclusions about (1) the actual source of the sound, when the study had used playbacks, (2) only the sound of a source, when the study had used the source with potentially also stimuli in other modalities, and (3) the sound class as a whole. We present the results below, give examples and directly discuss the implications, informed through previous perspective papers.
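The mapping from the two scores to the three stimulus designs can be written down as a small decision rule. The sketch below is an illustration of the scoring logic as described above, not the authors' actual scoring procedure. The function and argument names are hypothetical, and rejecting the combination "different treatments, one stimulus" is an assumption based on the rule that treatments built from the same recordings count as one.

```python
from typing import Union


def stimulus_design(single_treatment_per_class: bool,
                    n_stimuli: Union[int, str]) -> int:
    """Return stimulus design 1, 2 or 3 from the two scores.

    n_stimuli is a positive integer, or the string '>1' when the exact
    number was unknown or uneven between sound classes.
    """
    if n_stimuli == ">1":
        multiple_stimuli = True
    elif isinstance(n_stimuli, int) and n_stimuli >= 1:
        multiple_stimuli = n_stimuli > 1
    else:
        raise ValueError("n_stimuli must be a positive integer or '>1'")

    if single_treatment_per_class:
        # Designs 1 and 2: every subject got the same treatment per class.
        return 2 if multiple_stimuli else 1
    if not multiple_stimuli:
        # Assumption: treatments made from the same recording count as one
        # treatment, so this combination should not occur.
        raise ValueError("different treatments require more than one stimulus")
    return 3  # different treatments per class, made of different stimuli


print(stimulus_design(True, 1))     # one treatment, one stimulus
print(stimulus_design(True, ">1"))  # one treatment, several stimuli combined
print(stimulus_design(False, 6))    # several treatments, different stimuli
```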

Sound Treatment Design
The systematic literature analysis of underwater sound exposure studies revealed that the majority of the studies were performed using a single sound treatment per sound class, each class consisting of one stimulus only (Fig. 1a). This means that the results of these studies are only valid for the sound treatments used and cannot be extrapolated beyond them. Depending on the goal of the study, and given that appropriate statistics are used, this may not be problematic. For example, if only one boat or boat type is allowed in a specific area, and the goal is to study the effects of this specific boat on the local ecosystem, then generalization beyond the tested treatments is not important. When such a study is performed using speaker playback of recordings, it is still recommended to use different recordings from the boat or preferably multiple boats of the specific type to dilute and control for potentially unknown biases in a recording. It is under debate whether a study examining a single stimulus should even use inferential statistics or rather descriptive statistics only (Cottenie & De Meester, 2003; Hurlbert, 1984; Oksanen, 2001). Most importantly, for both authors and readers of such papers, it should be clear that the results may only be valid for the stimuli used.
A study that highlighted the need for sufficient replicated sound treatments exposed foraging shore crabs, Carcinus maenas, to one of six ambient and six boat playbacks (Hubert et al., 2021). The time the crabs needed to reach the food was scored. This foraging time was similar during all six ambient playbacks. For five of six boat playbacks, the crabs reached the food faster than during the ambient playbacks, while one boat playback resulted in slower foraging. If only the deviating boat playback had been used, the outcome of the study would have been completely reversed. The variability in responses to the boat playbacks used may also reflect the variability in response levels in situ with actual sound sources.
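The risk can be made concrete with a small enumeration. The sketch below encodes only the direction of each boat playback's effect as described above (five faster, one slower). It assumes, purely for illustration, that a single-treatment version of the study would have drawn one of the six boat playbacks uniformly at random. The playback labels are hypothetical.

```python
from fractions import Fraction

# Direction of the boat effect relative to ambient, per boat playback,
# as described in the text: five faster, one slower.
boat_effects = {
    "boat_1": "faster", "boat_2": "faster", "boat_3": "faster",
    "boat_4": "faster", "boat_5": "faster", "boat_6": "slower",
}


def conclusion_probabilities(effects):
    """Chance of each conclusion if one playback is drawn uniformly at random."""
    n = len(effects)
    outcomes = {}
    for direction in effects.values():
        outcomes[direction] = outcomes.get(direction, Fraction(0)) + Fraction(1, n)
    return outcomes


probabilities = conclusion_probabilities(boat_effects)
for direction, probability in probabilities.items():
    print(f"{direction}: {probability}")
```

Under that assumption, one in six single-treatment studies would report the opposite direction of effect, with nothing in the data to signal that the playback was atypical.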
Almost a fifth of the papers used multiple stimuli per sound class but combined them in single sound treatments that were used for all subjects (Fig. 1b). It may appear as if stimuli were replicated; however, the response of the animals may be a result of just one of the stimuli. This means that the sound class is not truly replicated and that the results still cannot be extrapolated beyond the specific sound treatment. Again, it depends on the goal of the study, the statistical analysis and interpretation of the study whether this is really problematic, but the potential for generalization is probably limited. Both designs (Fig. 1a and b) can be labelled as simple pseudoreplication (Kroodsma et al., 2001). It should be mentioned that some (field) studies are logistically very challenging, time-consuming and expensive (e.g. McQueen et al., 2022), especially in long-term studies. Despite the absence of replication for the sound treatment, these studies can still contribute significant value to the field due to the scarcity of such data, if one is aware of the limitations.
Only 16 of the 104 papers used multiple sound treatments per sound class, consisting of different stimuli (Fig. 1c). This is a design with true replication and more readily allows for generalization and extrapolation of the results, especially when there is little variation between the effects of individual sound treatments of a sound class. In that case, it is likely that untested sound treatments that fall acoustically within the variation of the tested sound treatments will elicit a similar effect. When different sound treatments of one sound class seem to cause opposing effects or the effects vary markedly, it is more complex to generalize the results.

Figure 1. Overview of three sound treatment designs to compare effects of two sound classes, the summarized implication, and the occurrence of such a design in the literature (N = 104 included papers). Legend entries: 'Subjects'; 'Sound', a sound treatment consisting of either the repetition of a single stimulus or a sequence of different stimuli, which is equivalent to a single long stimulus for the entire trial and any other combination of different stimuli, respectively; and a circle diagram of all papers that were scored, in which the dark sector indicates the proportion of papers to which the sound treatment design applies. The majority of the papers used a single sound treatment (per sound class), each consisting of a single stimulus (1). These results cannot be generalized beyond the sound treatment used because it may not be representative of the sound class in general, and a single recording may be confounded by other unwanted sources. Less than one-fifth of the papers used a single sound treatment (per sound class), each consisting of multiple stimuli (2). These results may be explained by just one of the stimuli in the treatment, so the same risks as for design 1 apply here. Only the smallest category of papers used multiple sound treatments (per sound class), consisting of different stimuli (3). The latter generally leads to results with the highest external validity, because the treatments more likely represent the sound class as a whole and the impact of confounded treatments is diluted.
Examination of the acoustic characteristics may inspire new hypotheses and experiments about sound impact. Additionally, it would be valuable to have information on the frequency of occurrence of each stimulus in situ.
A point for consideration in the latter category is that three of the 16 studies only used two sound treatments per sound class, and five only used three. Although two sound treatments are much better than just one, it may still be a poor representation of a sound class. Besides the limited replication within studies, several studies also reused treatments from earlier studies, leading to an overall limited pool of sound treatments.

Analyses
Of the studies that used multiple sound treatments per sound class (N = 16), the majority (68.8%; N = 11; Fig. 2a) pooled the results of all treatments from a class and did not statistically attribute potential variation in results to the individual treatments. This is referred to as sacrificial pseudoreplication or inappropriate pooling (Kroodsma et al., 2001); it violates the assumption of independence of many statistical tests and can actually reduce the statistical power of a test (Millar & Anderson, 2004). One study did not perform inferential statistics, one did not pool the results at all and three prevented sacrificial pseudoreplication by using each treatment once: Varola et al. (2021), Kok et al. (2021), and Sal Moyano et al. (2023) all used sufficient sound treatments to use each of them for one animal subject only. Within our sample, we found no studies that used different sound treatments multiple times and used statistical models that handle data with hierarchical or nested structures, which is another way to prevent sacrificial pseudoreplication.
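Why pooling violates independence can be illustrated with the design effect used in cluster sampling, 1 + (m - 1) * rho, where m is the number of animals per sound treatment and rho is the intraclass correlation, that is, the share of the response variance that lies between treatments. This formula is not taken from the reviewed studies. The sketch assumes equal numbers of animals per treatment, and the values of rho are hypothetical.

```python
def effective_sample_size(n_treatments: int,
                          animals_per_treatment: int,
                          icc: float) -> float:
    """Number of independent observations that the pooled data are worth.

    Uses the cluster-sampling design effect 1 + (m - 1) * icc, assuming
    every treatment was used for the same number of animals.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    n_total = n_treatments * animals_per_treatment
    design_effect = 1.0 + (animals_per_treatment - 1) * icc
    return n_total / design_effect


# Hypothetical study: 3 sound treatments per class, 10 animals per treatment.
for icc in (0.0, 0.2, 1.0):
    n_eff = effective_sample_size(3, 10, icc)
    print(f"icc = {icc:.1f}: 30 pooled animals count as {n_eff:.1f} independent ones")
```

When responses depend only on the animal (rho = 0), all 30 animals are independent. When responses are fully determined by the treatment (rho = 1), the 30 animals carry no more information than the three treatments themselves.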

Speakers
The majority of the studies examined used speakers (74.0%; Fig. 2b) to expose subjects to sound, rather than the original sound source (24.0%; Fig. 2b), and two studies used both (1.9%; Fig. 2b). Speakers allow for an easy and controllable exposure and for manipulation of sound treatments. On the other hand, it can be difficult to mimic original sound sources using a speaker. Underwater speakers are typically unable to produce low-frequency sound and are point sources whereas sources such as boats and pile driving are not. In laboratory settings, it is often not possible to use the original sound sources. In situ, it can be logistically challenging and expensive to replicate trials with the actual sound sources. Either way, one should be aware of the opportunities and limitations of speakers and actual sources, which should match the goal of the experiment and interpretation of the results.
As an aside, two of the studies that used a speaker played back natural sounds and had the explicit goal to test the effect of the played back sound rather than the original sound in order to evaluate the efficacy of playbacks in reef restoration projects (McAfee et al., 2023;Williams et al., 2023).So, a speaker can also be the desired source.

Interpretation
Of the 77 papers that only used speakers for the exposure, 45 (58.4%; Fig. 2c) contained generalizations of the results to effects of the actual sound source in the title and/or abstract. This was, for instance, the case in papers that described experiments with playbacks of prerecorded boat noise, and derived conclusions about the effect of boat noise or anthropogenic noise in general from these. On the other hand, of the 25 papers that only used original sources, such as actual boats and pile driving events, five (20.0%) contained definitive conclusions about just the sounds produced by these sources in the title and/or abstract, even though the effect of other aspects of these sources could not be ruled out.
The title and/or abstract of 54 (53.5%) of the 104 papers contained definitive generalizations of results of the sound treatment(s) used to the sound class as a whole, for instance about boat noise in general. As mentioned before, studies using only a single sound treatment to represent a sound class should be cautious in drawing conclusions about the sound class in general, because the treatment is probably not representative of the entire sound class (Slabbekoorn, 2013). However, of the 88 papers using only a single sound treatment per sound class, 45 (51.1%; Fig. 2d) contained definitive generalizations to the entire sound class in the title and/or abstract. Interestingly, an only slightly higher percentage (56.3%, nine of 16; Fig. 2d) of the papers that used multiple sound treatments per sound class contained definitive generalizations to the sound class as a whole in the title and/or abstract, even though, as mentioned earlier, this design more readily allows for extrapolation of the results to the entire sound class.

Conclusion
Exposure studies with appropriate design and testing of sound stimuli can provide insight into sound impact, mitigation and animal cognition. To be able to extrapolate results from sound exposure studies beyond the specific sound treatments used in the study, multiple sound treatments representing a sound class must be used in separate sound exposure trials. The current systematic literature analysis shows that this is not yet common practice in underwater exposure studies, which limits the external validity of results. Avoiding pseudoreplication may be more challenging for long-term studies in the field and with original sound sources rather than speakers, all of which increase the ecological validity of the study; nevertheless, one should be aware of the risks and interpret the results accordingly. To enable researchers to replicate sound treatments, we call for generous sharing of (unused) recordings.

Figure 2. (a) Type of pooling that was applied for papers with design 3 (see Fig. 1 for the three treatment designs). (b) Sound source used by all papers. The boat symbol is just an example of an original source. (c) Generalization to the original sound source by papers that used a speaker only. (d) Generalization of the sound treatment(s) used to the sound class. Note that most y-axes differ.