How background complexity impairs target detection

Camouflage is frequently used by animals for concealment and thereby improves survival. Typically, it is the animal's own colour and patterning that are expected to affect its detectability; however, the complexity of the background can also have an influence. Although there is a growing literature examining this, the exact underlying mechanism is unknown. In this study we addressed this issue by using humans as proxy 'predators' in a computer-based search task and monitoring their detection times for targets on varying backgrounds. By using artificial greyscale targets and backgrounds, we were able to isolate and manipulate the normally covarying factors that comprise 'complexity' in natural habitats. We show that reduced detection was explained not by greater information content (entropy) or higher variance in the background's features per se, but instead by a reduced signal-to-noise ratio in the visual features that potentially distinguish target from background. This raises questions about when the term complexity should be used, and how observers learn the characteristics of a background.

One of the most common forms of protective coloration is camouflage (Cuthill, 2019; Stevens & Merilaita, 2011), and the study of how an animal's colours impede detection or recognition has provided the classic textbook examples of natural selection in the wild (e.g. Cain & Sheppard, 1952; Endler, 1978; Kettlewell, 1965). Many characteristics of the camouflaged individual have been proposed to aid concealment, from matching the background (Merilaita & Stevens, 2011), through disruptive coloration (Cuthill et al., 2005; Merilaita, 1998; Stevens & Merilaita, 2009), to distracting from vulnerable areas with salient features (Dimitrova et al., 2009). However, using an artificial neural network model, Merilaita (2003) demonstrated that an extrinsic factor may also be involved in concealment: the visual complexity of the background. As natural scenes vary considerably in their complexity, it is important to understand the effects this might have on the animals that live within them. In this paper, we investigate whether an object can be concealed within a complex background even when it matches no part of that background. We show that it can, but only when its features lie within the bounds of the overall background distribution. Some of the first empirical evidence that visual complexity has a deleterious effect on visual search came from humans attempting to discriminate targets from similar-looking, already detected objects, 'distractors' (Duncan & Humphreys, 1989), and to detect targets through 'visual clutter' on displays (Rosenholtz et al., 2005, 2007). The same effect has been found in both birds (Dimitrova & Merilaita, 2010, 2012) and fish (Kjernsmo & Merilaita, 2012) searching for artificial targets on artificial backgrounds, each with abstract patterns of varying diversity in their component features. Similar adverse effects of complexity on search performance have also been shown in a natural setting by monitoring wild avian predation on artificial moth-like prey on tree bark (Rowe et al., 2021; Xiao & Cuthill, 2016). Further experiments with artificial backgrounds and human participants indicate that background complexity can mitigate poorer background matching to some degree (Murali et al., 2021). However, it appears that the protection that complexity brings may have limits (Murali et al., 2021; Rowe et al., 2021).
But what exactly is visual complexity? Complexity can be variously quantified in terms of objective mathematical constructs related to entropy, or perceptual metrics of 'visual clutter'. Entropy relates to the degree of randomness in the pattern and will be inversely related to how 'structured' it is. With a less varied and/or more repetitive background, the amount of information needed to encode it decreases, and thus so does its entropy. Merilaita (2003) suggested that it is the amount of information required to encode the scene that generates difficulty in detecting targets against a complex background. In computational terms, the more colours there are, and the less ordered their distribution, the more bits of information are required to encode that background.
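As an illustration of this information-theoretic view (our sketch, not code from the studies cited), the Shannon entropy of a background's grey-level distribution can be computed in a few lines of R; for k equiprobable greys it is simply log2(k) bits per block.

```r
# Shannon entropy (bits per block) of a background's grey-level distribution.
shannon_entropy <- function(greys) {
  p <- table(greys) / length(greys)  # empirical frequency of each grey level
  -sum(p * log2(p))
}

# Example: a 48 x 48 background drawn from three equiprobable greys
bg <- sample(c(0, 128, 255), size = 48 * 48, replace = TRUE)
shannon_entropy(bg)  # close to log2(3) = 1.58 bits per block
```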
Quite distinct conceptually from this information-theoretic approach, perceptual measures of complexity attempt to quantify the amount of 'visual clutter': salient features of the scene that might interfere with the detectability or identifiability of a target. For example, the 'feature congestion' metric of Rosenholtz et al. (2005, 2007) combines luminance contrast, colour and variation in 'edge' orientation, at different spatial scales, into a single measure of complexity (Rosenholtz et al., 2005). Although this measure has been successful at predicting detection success in experiments on humans and birds (Rosenholtz et al., 2007; Rowe et al., 2021; Xiao & Cuthill, 2016), the exact process by which complexity, however it is defined, impedes detection is, as yet, unclear. Potentially, if a scene has more information to encode, the processing itself induces a delay (Merilaita, 2003), or the defining characteristics of the background take longer to learn. However, it may be that it is not information content per se that impedes detection. First, the more variable the background, the greater the probability of perceptual similarity between the target's features and at least some aspects of that background. Second, greater background 'complexity' might covary with higher contrast between features in the background, which, of itself, could impede visual search through visual masking. The latter is a well-studied phenomenon whereby the visibility of a stimulus is reduced, or even eliminated, by the presence of another stimulus (e.g. Stromeyer & Julesz, 1972). One example is the way in which predators and prey adapt their behaviour in dynamic environments (e.g. Bian et al., 2016; Fleishman, 1985; Pohl et al., 2022), effectively employing motion masquerade to avoid detection (Cuthill et al., 2019). This works because masking is more effective when both stimuli involved are moving than when, for example, a prey item moves over a static background (Matchette et al., 2018, 2019, 2020).
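For the greyscale block stimuli used here, a deliberately simplified, luminance-only proxy for clutter conveys the idea; this is our sketch, not the Rosenholtz et al. metric, which also pools colour and edge-orientation variability across multiple spatial scales.

```r
# Mean absolute grey-level difference between adjacent blocks:
# a crude, luminance-only stand-in for 'feature congestion'.
local_contrast <- function(bg) {  # bg: matrix of block grey levels
  dh <- abs(bg[, -1] - bg[, -ncol(bg)])  # horizontally adjacent blocks
  dv <- abs(bg[-1, ] - bg[-nrow(bg), ])  # vertically adjacent blocks
  mean(c(dh, dv))
}

bg <- matrix(sample(c(0, 128, 255), 48 * 48, replace = TRUE), nrow = 48)
local_contrast(bg)  # rises with the contrast between background greys
```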
Masking suggests a third possible approach to characterizing complexity. In line with the idea that an animal must learn the distributions of background and target features to discriminate between them, when a target's colours are 'bracketed' by those of the background (in our experiments, when the background contains greys that are both darker and lighter than the greys of the target), this discrimination should be harder, because this stimulus configuration results in higher levels of masking. Bracketing eliminates the simple rule of finding a target by looking for something that is lighter, or darker, than the background. Instead, two rules are now needed: the target is both lighter than one component of the background and darker than another. Unlike the effect of within-background contrast, this deleterious effect on target detection should be particularly strong when the grey levels are perceptually closer to each other and to the target, because the masking effect is then stronger.
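The distinction can be made concrete with a minimal R sketch (ours, for illustration only): a one-sided search rule works only when the background greys do not straddle the target greys.

```r
# TRUE if the background contains greys both darker and lighter than the
# target's greys, so no one-sided 'lighter/darker than all' rule will work.
is_bracketed <- function(target_greys, bg_greys) {
  any(bg_greys < min(target_greys)) && any(bg_greys > max(target_greys))
}

is_bracketed(c(64, 191), c(96, 128))  # FALSE: target greys lie outside the
                                      # background range; one rule suffices
is_bracketed(c(64, 191), c(32, 224))  # TRUE: a two-component rule is needed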
Perceptual measures of colour difference are now well embedded in behavioural and evolutionary ecology, with the 'receptor noise limited model' (Vorobyev & Osorio, 1998) being the most widely used, and conveniently implemented in various software packages (e.g. Maia et al., 2013; van den Berg et al., 2020). In this model, the variance in each colour being compared (the 'noise') is generated at the first level of perceptual processing, photoreception. This is appropriate for comparing two homogeneous colour patches (Vorobyev & Osorio, 1998) but, in natural scenes, the background (and, frequently, the target) consists of both multiple perceptually discrete colour patches and continuous variation across surfaces. Thus, the detectability of a camouflaged animal is rarely a matter of the discriminability of 'the' animal's colour from 'the' background colour; rather, it is a matter of discriminating two distributions (Barnett et al., 2021; Endler, 1984, 2012; Endler & Mielke, 2005). An animal needs to learn the distributions of background and target colours (and, more generally, feature properties) to make the discrimination. Knowledge of only the background distribution may still allow detection of unfamiliar targets, as their features could fall outside the expected range (Chen & Hegdé, 2010).
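For two homogeneous patches, the achromatic form of the receptor noise limited distance is simple; a minimal R sketch follows. Treating linearized intensities as stand-ins for receptor quantum catches, and the Weber fraction of 0.05, are illustrative assumptions, not values from this study.

```r
# Achromatic receptor-noise-limited contrast, in just noticeable
# differences (JNDs): |ln(qA / qB)| divided by the channel's Weber fraction.
rnl_achromatic <- function(q_a, q_b, weber = 0.05) {  # weber: assumed value
  abs(log(q_a / q_b)) / weber
}

# Two homogeneous patches at the target's two intensities (illustrative):
rnl_achromatic(191, 64)  # ~21.9 JNDs: trivially discriminable in isolation
```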
It is hard to isolate the mechanisms by which complexity affects detectability, and to address the related issue of how best to define complexity, using natural backgrounds. A background with more colour patches has more features to encode, but it also has more colour patches potentially confusable with the target and may have higher between-patch contrast. Therefore, to separate the different factors that covary in nature, we used an experimental protocol in which the problem was simplified to a colourless 'greyscale world' in which all target and background greys were easily discriminable, but the number of background greys and their relationship to the target greys were systematically varied. Human participants searched for targets in two computer-driven experiments. In both experiments, targets and backgrounds were random mosaics of greyscale tiles. In all treatments, no component grey of the target matched any background grey (pairwise discrimination was trivially easy). In this way, the effect of different aspects of background complexity could be isolated independent of background matching as typically conceived (target greys matching background greys).
In both experiments we predicted that, all things being equal, targets should be hardest to find when there are more grey levels in the background. If this is due to the information content of the background, then detection should become more difficult as the background gains grey levels, regardless of their relationship to the target greys. The feature congestion metric of visual complexity makes a different prediction: the target should become more difficult to find when the background grey levels are of higher contrast; the number of grey levels has an effect only in so far as it affects within-background contrast.

METHODS

Stimuli
The targets and backgrounds were produced in R 4.0.2 (R Core Team, 2020), with the target being a 4 × 4 square of grey blocks randomly placed on a 48 × 48 background of blocks (see Fig. 1; further examples can be found in the Appendix). Blocks were 16 × 16 pixels. The 'palette' used for the backgrounds comprised nine grey levels, ranging from black (RGB pixel values 0, 0, 0) to white (255, 255, 255), equally spaced in a standard red, green, blue (sRGB) colour space. At 32 RGB values apart, each grey level was well above a just noticeable difference from any other: the grey levels used were all easily discriminated from one another. With a good quality monitor that has been perceptually linearized, such as ours, one would expect over 700 grey levels to be discriminable (Kimpe & Tuytschaever, 2007), so even one grey level on the 256-point (8-bit) scale used would represent ca. three just noticeable differences (jnds; Teghtsoonian, 1971). This stimulus configuration simplified the specification of target and background attributes to one dimension: lightness. Therefore, our data can only strictly support strong conclusions about differences in lightness, but we would expect the points we make here to generalize to other stimulus attributes. The 4 × 4 blocks of the target always aligned with the blocks of the background (blocks could not be partially occluded; see Murali et al., 2021), so any difference between the greys in the background and either or both target greys was a unique and perfect predictor of target location.
The target was always made up of two grey levels (sRGB = 191 and 64) while the background contained either one, two, three or, in the case of experiment 1, four levels (Fig. 1). The patterning of background grey levels was fully and independently randomized (the grey shade of each block was assigned using R's 'sample' function, with replacement, with the probability of each grey depending upon treatment: e.g. 1/2 for a treatment with two greys, 1/3 for a treatment with three greys, and so on). Randomization of blocks meant that differences in entropy or feature congestion between treatments were driven by the number of, or contrast between, grey levels, and not their spatial distribution. With so many (48² = 2304) blocks in every background, although there was random local variation in the density of different grey blocks, the frequency of each grey was close to that expected from the probability. Conversely, every target had exactly eight light grey and eight dark grey blocks, but their positions were independently randomized (again using R's sample function, but without replacement). Randomization prevented any search image for the target, based on learning of pattern configuration (Troscianko et al., 2018), from developing. Sampling without replacement (from eight light and eight dark greys) prevented variation in the average colour of targets (as targets comprised only 16 blocks, full randomization could have resulted in a predominantly light or predominantly dark square, which could affect detectability).
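For concreteness, the construction described above can be sketched in a few lines of R (an illustrative reconstruction, not the exact script used to generate the stimuli; the palette values follow from the nine equally spaced levels described earlier):

```r
palette9 <- round(seq(0, 255, length.out = 9))  # nine equally spaced greys
target_greys <- c(64, 191)                      # never used in backgrounds

# Background: 48 x 48 blocks, greys sampled with replacement and
# equiprobably from the treatment's subset of the palette.
bg_greys <- c(0, 128, 255)                      # e.g. a three-grey treatment
background <- matrix(sample(bg_greys, 48 * 48, replace = TRUE), nrow = 48)

# Target: 4 x 4 blocks, exactly eight of each grey; sampling without
# replacement fixes the average lightness while randomizing the pattern.
target <- matrix(sample(rep(target_greys, each = 8)), nrow = 4)
```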
Two experiments were performed, varying the number of grey levels in the background (and hence complexity in terms of information content or entropy) and the difference in grey level between them (and hence feature congestion, as this metric rises with the local contrast between blocks). In experiment 1, the similarity between background greys and target greys could be asymmetrical with respect to the two component target greys (e.g. a background grey could be perceptually closer to one of the two target greys than to the other, as in background 5; Fig. 1a) or symmetrical (e.g. if there was a background grey similar to the lighter of the target greys, there was another background grey equally close to the darker of the two, as in background 6; Fig. 1a). In experiment 2, lightness differences were all symmetrical.
The treatments for both experiments are shown in Fig. 1; note that the target greys never coincided with any of those in the background, removing the possibility that better concealment might have been due to part of the target matching part of the background. In addition, the symmetry of the grey levels in experiment 2 means that the average grey of the background was always the same, and the same as that of the target. This is relevant because it means that the target could not be easily located by a difference in average local luminance, for example in the peripheral visual field where spatial resolution is lower (see the discussion in Murali et al., 2021, in relation to their results).

Procedure
All the visual search tasks were run using PsychoPy2 (Peirce et al., 2019), using the stimuli described above, saved as PNG files. All participants were naïve to the purpose of the experiments and had normal or corrected-to-normal vision.
Each experiment was run twice, on different sets of participants: once online, with each participant using their own computer/monitor, and, once COVID restrictions had eased and in-lab testing was possible, in a controlled test environment. The replication was necessary because commercial displays vary in their relationship between pixel (RGB) value and screen brightness, meaning that different online participants potentially saw slightly different images (in terms of contrast range and target–background difference), and those differences were unknown. In addition, the test conditions would have been variable (viewing distance, mouse or touchpad, attention to the task, etc.). Thus, the online data were ratified by repeating the experiment with a gamma-corrected (calibrated and linearized) 21.5-inch iiyama ProLite B2280HS monitor (iiyama, Hoofddorp, Netherlands), with a refresh rate of 60 Hz, a resolution of 1920 × 1080 pixels, a screen size of 27 × 48 cm and a mean luminance of 64 cd/m², at a viewing distance of ca. 50 cm. The monitor was driven by a MacBook Pro (Apple Inc., Cupertino, CA, U.S.A.). The main room lights were turned off, and the participant was left alone, after briefing, to complete the experiment. The participants were given written instructions that they would have to find and click on a target made up of two grey levels on a background that varied in the number of grey levels. They were told that every screen had one, and only one, target in a random location, and that they should search until they found it. However, they were also told that, if they really could not locate the target, they could click to the side of the screen to advance to the next image. The experiment was preceded by a random ordering of the same 10 practice images, varying in the number of background greys, in which the target was always in the middle of the screen. Following this, treatments were shown to the participants in five blocks of nine screens for experiment 1, and five blocks of seven screens for experiment 2. For experimental trials, the target was in a random location. Each block contained one replicate of each treatment, with each participant's five blocks comprising a random subset, without replacement, of 24 possible blocks. Participants saw random subsets of a pool of replicate images, rather than unique target–background combinations, because images were constructed in advance rather than generated in real time during the experiment. Each block was followed by an instruction that a break could be taken before the next block; in practice, most participants paused for only a few seconds. The response time was measured as the latency between the stimulus appearing and the participant clicking on the screen.

Ethical Note
All participants were briefed in line with the Declaration of Helsinki. Ethical approval was granted by the joint research ethics committee of the University of Bristol Animal Welfare and Ethical Review Body.

Participants
The two online experiments involved 110 participants from the Life Sciences undergraduate student cohort of the University of Bristol. Participants were randomly assigned (using the Python random number generator within PsychoPy) to one of the two experiments. The first experiment had 69 participants (46 female, 21 male, two undisclosed; aged 19–27, median 20) and the second had 41 (27 female, 14 male; aged 19–26, median 20). The female-biased sex ratio reflects the female bias in the student population for these degrees. For the laboratory replication, 24 participants (50% female; aged 20–36, median 22.5) completed both experiments in turn, with the order balanced across participants.

Analyses
All data were analysed in R 4.0.2 (R Core Team, 2020). Using linear mixed models (function lmer in the package lme4; Bates et al., 2015), we tested the effect of treatment (background type) on log-transformed response time (RT); transformation fulfilled the assumption of Gaussian residuals. Only trials in which the target was correctly located were used in the analysis of response time. Although particular pairwise comparisons were of a priori interest (i.e. those that differentiate between the hypotheses for the effect of complexity on search difficulty), we present all pairwise comparisons, using the Tukey method to maintain the experiment-wise type I error rate at 0.05. These were obtained using the function glht in the multcomp package (Hothorn et al., 2008).
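In code, the model and the post hoc comparisons take roughly the following form (a sketch with illustrative variable names; 'participant' is the random intercept implied by the repeated-measures design):

```r
library(lme4)
library(multcomp)

# Response-time model: treatment (a factor) as the fixed effect, participant
# as a random intercept; 'trials' is a data frame of correct-detection trials.
m <- lmer(log(rt) ~ treatment + (1 | participant), data = trials)

# All pairwise treatment comparisons with Tukey-adjusted P values:
summary(glht(m, linfct = mcp(treatment = "Tukey")))
```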
To provide a global comparison of the success of the different hypotheses (entropy, feature congestion and bracketing), we fitted the data to the predictions of those models and then used Akaike's information criterion (AIC) to compare them (Burnham & Anderson, 2002). For example, entropy predicts that response time is an increasing function of (only) the number of grey levels. The degree of feature congestion was calculated using the MATLAB functions (The MathWorks, Natick, MA, U.S.A.) of Rosenholtz et al. (2007), but can be inferred intuitively from the amount of contrast between the component greyscale patches. Bracketing predicts interference with search efficiency only when background greys fall either side of the target greys, with greater effects when the bracketing is 'tighter' (closer to the target greys). (To help the reader follow what is a fairly complex set of treatments, a fuller account of the reasons why the predictions diverge for particular treatment comparisons is provided in the Appendix.) Once the data were fitted to a particular model, AIC was calculated using the R function AIC. AIC is based on likelihood, penalized by the number of fitted parameters, where likelihood is the probability of obtaining the observed data given a particular statistical model (Edwards, 1992). Likelihood-based comparisons therefore provide a means of assessing the goodness of fit of the competing models. For non-nested models (as here) we cannot use likelihood ratio tests for quantitative testing, but AIC does not assume nesting, so we used this measure for model comparison (Burnham & Anderson, 2002). We also included the null model (no treatment effect) and the saturated model (all treatments differ) for completeness.
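The comparison set can be expressed as follows (again a sketch: n_greys, feat_cong and bracket_rank stand for hypothetical treatment-level covariates coding each hypothesis's prediction, and models are fitted by maximum likelihood so that AIC values are comparable across different fixed effects):

```r
# Fit each hypothesis as a fixed-effect predictor (REML = FALSE gives
# maximum likelihood fits, appropriate for AIC comparison of fixed effects).
m_null      <- lmer(log(rt) ~ 1            + (1 | participant), data = trials, REML = FALSE)
m_entropy   <- lmer(log(rt) ~ n_greys      + (1 | participant), data = trials, REML = FALSE)
m_congest   <- lmer(log(rt) ~ feat_cong    + (1 | participant), data = trials, REML = FALSE)
m_bracket   <- lmer(log(rt) ~ bracket_rank + (1 | participant), data = trials, REML = FALSE)
m_saturated <- lmer(log(rt) ~ treatment    + (1 | participant), data = trials, REML = FALSE)

AIC(m_null, m_entropy, m_congest, m_bracket, m_saturated)  # lowest AIC = best fit
```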
The effect of sex on detection was also analysed to assess, post hoc, whether there were any unexpected sex differences. In these secondary analyses, only data for participants who reported their binary sex were analysed, instances of undisclosed sex being too rare to include as an additional level. First, a model with the interaction between sex and treatment, and their main effects, was fitted; then, where the interaction was not significant, a model without the interaction was fitted to test the main effect of sex. We had also intended to analyse the proportion of 'hits' (correctly clicking on the target) as an additional measure of detection success, with generalized linear mixed models using the function glmer in the package lme4, with binomial error. However, none of these models converged because misses were relatively rare. Nevertheless, we report the percentage of misses to address the possibility of any speed–accuracy trade-offs.
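For completeness, the intended accuracy model would have taken a form along these lines (a sketch only, documenting the analysis that failed to converge; variable names are illustrative):

```r
# Binomial GLMM for detection accuracy: hit is 1 if the participant
# clicked on the target, 0 otherwise. Did not converge with misses so rare.
m_hit <- glmer(hit ~ treatment + (1 | participant),
               data = trials, family = binomial)
```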

RESULTS

Experiment 1
Treatment affected response time in both the online (χ²₈ = 1587.0, P < 0.001) and laboratory-based (χ²₈ = 1030.3, P < 0.001) versions of the experiment. Although some comparisons that were significant in the online version of the experiment were not significant in the laboratory version, the pattern of the effects was very consistent (Fig. 2; Appendix Table A1). In experiment 1, entropy predicts an increasing order of difficulty (i.e. increased response time) in line with the number of grey levels: (1, 4, 8) < (2, 5, 6, 9) < 3 < 7. For feature congestion the prediction is (1, 4, 8) < 2 < 5 < 3 < 9 < 6 < 7. For bracketing it is (1, 2, 3, 4, 5, 8, 9) < 6 < 7. Comparing the overall pattern of pairwise differences to the predictions of the three models of background complexity, bracketing appears to be the closest (Fig. 3a) and, based on either a ranking of log likelihoods or AIC values, is a better fit than entropy or feature congestion (Table 1). Bracketing does not explain all the treatment variation in the online version of the experiment (the saturated model has the lowest AIC), but it comes much closer than the other models (Table 1).
Misses were too rare to be analysed but, consistent with the response time data, were most common in treatment 7 (7.6% of trials in the online experiment; 0.8% in the laboratory version), followed by treatment 6 in the online experiment (2.6% of trials); there were no errors in any treatment apart from treatment 7 in the laboratory version of the experiment. There was therefore no evidence of any speed–accuracy trade-off. The effect of sex on response time was not significant, either in interaction with treatment (online: χ²₈ = 8.4, P = 0.392; laboratory: χ²₈ = 4.5, P = 0.808) or as a main effect (online: χ²₁ = 0.5, P = 0.484; laboratory: χ²₁ = 0.4, P = 0.548).
Experiment 2

As with experiment 1, the results of the online and laboratory versions of the experiment were very consistent. For experiment 2 the entropy prediction is 1 < (2, 3, 4) < (5, 6, 7); the feature congestion prediction is 1 < (4, 7) < 6 < 3 < 5 < 2; the bracketing prediction is (1, 4, 7) < 2 < 3 < 5 < 6. Comparing the overall pattern of pairwise differences to the predictions of the three models of background complexity, once again bracketing appears to be the closest (Fig. 3b). As with experiment 1, bracketing does not explain all the treatment variation (the saturated model has the lowest AIC) but, based on both a ranking of log likelihoods and AIC values, is a far better fit than entropy or feature congestion (Table 2).
As with experiment 1, there was as much variation between treatments with the same entropy as between treatments with different entropies. For example, treatment 7, with three greys, produced significantly shorter response times than some of the treatments with two greys (Fig. 4). Indeed, its difficulty of target detection was similar to that of treatment 1, with only one grey level, on which the target was immediately detectable.
As with experiment 1, it is informative to consider how the data compare to the predictions of the three hypotheses (Table 2). Like experiment 1, the results of the online and laboratory versions of the experiment were very consistent and, again like experiment 1, they strongly favour bracketing over the other two hypotheses as the aspect of 'complexity' that impedes target detection. Misses were too rare in some (but not all) treatments to be analysed but, consistent with the response time data, were most common in treatment 6 (37.1% of trials in the online experiment; 20.8% in the laboratory version), followed by treatment 5 (22.9% of trials in the online experiment; 15.8% in the laboratory version); there were no errors in any other treatment in the laboratory version of the experiment, and they occurred in 6% of trials or fewer in the online experiment.
The effect of sex on response time was not significant, either in interaction with treatment (online: χ²₆ = 9.7, P = 0.138; laboratory: χ²₆ = 2.7, P = 0.848) or as a main effect (online: χ²₁ = 0.5,

DISCUSSION
While, broadly speaking, response times increased with the number of grey levels in the background, this alone could not explain the, sometimes very large, differences in response times between treatments with the same number of grey levels in the background. Therefore, the information content of the background, in purely mathematical terms, is not the sole determinant of the difficulty created by increased complexity. As demonstrated in other contexts, an information-theoretic definition of complexity, as quantified by entropy, is not the best predictor of visual search performance (Rosenholtz et al., 2005, 2007). If visual 'complexity' were captured by the number of bits of information, we would expect detection to become slower with an increasing number of grey levels, regardless of the contrast between those levels. It does not. Therefore increased 'complexity' of a background, in terms of mathematical information, is not the driver of increased difficulty in visual search for our stimuli. That entropy is a poorer predictor of visual search than feature congestion has been advanced before in studies of natural and manufactured backgrounds (Rosenholtz et al., 2007; Xiao & Cuthill, 2016); by designing backgrounds where the predictions of the different metrics are opposed, our study pinpoints that failing. We suggest that when biologists discuss the camouflaging effect of backgrounds that are variable and/or have lots of 'visual clutter', they should be specific when using the term complexity. Complexity has a precise mathematical definition, and it is not the number of bits of information in a scene that is the greatest impediment to detection of a target. The results of experiment 1 provide more support for the explanatory power of feature congestion, as defined by Rosenholtz et al. (2005, 2007), than for entropy (Table 1). For example, in experiment 1, comparing two-level backgrounds only, the longest response times are seen in treatment 6, with the highest feature congestion (driven by the highest variance in grey level), followed by treatment 9, with the next highest, with treatments 5 and 2 having the lowest response times and variance (Fig. 2).
However, looking more closely, the differences between treatment 9 and treatments 5 or 2 are not that marked (significant only in the online version of the experiment, where screen calibration was absent). The nature of the deviation from the predicted pattern supports our hypothesis that a target can hide 'within' a multimodal background distribution, as a result of bracketing, even when the target is readily discriminable from any colour in the background. Remember that all background and target greys were at least 32 grey levels apart on the 256-point scale, and so were readily discriminated, as reflected in the very low response times for treatments 1, 4 and 8, where the background was monochrome. In treatment 6, which, of the two-grey treatments, produced the greatest increase in response times, both target greys fall within the range of the background greys (i.e. lie between the dark and light levels of grey in the background), whereas in treatments 2, 5 and 9, both target greys fall outside the limits bounded by the background greys. The difficulty created when a target falls within the range of background grey levels is increased when there is a closer match between target and background greys (but only when there is more than one grey level). Consider treatments 2, 5, 6 and 9, all of which have two grey levels. In 2 and 5, one of the two grey levels is more different in luminance from either of the target greys than are the greys in treatments 6 and 9, and the targets in treatments 2 and 5 are detected somewhat more rapidly (Fig. 2). The results of experiment 2 confirm this interpretation of the causes of detection difficulty in experiment 1. The treatments with the longest detection times (2, 3, 5 and 6) are those where the target greys fall within the limits bounded by the background greys. In addition, all else being equal, having background greys closer in luminance to those of the target increases detection times (Fig. 3: e.g. 3 > 2 and 6 > 5). This is, we contend, what makes detection against background 6 so difficult: the target greys fall within a range spanned by greys in the background, and those greys are closer to each other in perceptual space. Significantly, the addition of a third grey to the background, intermediate between the lightest and darkest levels, greatly increases search difficulty even though it decreases feature congestion by reducing the average contrast in luminance between adjacent squares (compare treatments 6 and 3 in Fig. 3). The target greys are then 'doubly bracketed', with background greys interspersed between, and close to, the target greys. This conclusion is consistent with previous work on prey detection and natural complexity in the field, which showed that greater feature congestion can impede wild avian predation, but only when the target colours fall within the distribution of those in the background (Rowe et al., 2021).
When comparing treatment 3 with treatment 9 in experiment 1, information content and feature congestion each point to one of the treatments concealing the target better, but the data suggest that adding the intermediate grey makes no difference (Fig. 2, Table 1). This is because in both treatments the target can be located by the simple rule 'find the lightest grey' or 'find the darkest grey'. The explanation also holds in experiment 2, where there is no detectable difference between treatments 4 and 7. Such a 'unidirectional' rule cannot be used where the target greys lie within the limits of the background distribution; instead, a two-component ('greater than x but less than y') rule must be used, or even a search for a unique grey level. Consistent with this, as discussed earlier, concealment is greatly improved by the addition of an intermediate grey, but only when the lightest and darkest background greys lie outside the two target greys (Fig. 3: treatment 6 is a much more difficult task than 3, and 5 than 2, but 7 and 4 are almost identical). When a simple unidirectional 'lighter than all' or 'darker than all' rule cannot be used, the addition of grey levels makes locating a unique target grey even harder. Greater background complexity offers protection but not, as suggested previously (Dimitrova & Merilaita, 2010, 2012), independently of the confusability of target and background features. We argue that there is always some level of interaction between background and target features. Camouflage works through reducing the signal-to-noise ratio (Merilaita et al., 2017). As the latter can be achieved via increased noise as well as reduced signal, this is how 'complexity' aids concealment: by increasing the variance in background features potentially confusable with those of a target. It is interesting that a background with just three grey levels (e.g. treatment 6 of experiment 2 in Fig. 3) makes for a very difficult search task, even though all of the grey levels used in our experiments are very easily discriminated from those in the target (differences in grey level of 1 are discriminable, and our greys were spaced 32 units apart). A key result from our experiments is that confusability is not a simple matter of background matching as commonly understood, or as quantified in measures such as ΔS, the discriminability of any two colours in the target and background (not that this was ever claimed in Vorobyev & Osorio, 1998). How confusing a background is may be linked to the viewer's estimate of the distribution of features in the background, and our results show that this estimation can be a difficult task even with only two or three grey levels in the background. How viewers learn the nature of the background, and thus detect mismatches, is an important area for future study.
Appendix. Rationale for, and Examples of, the Stimuli Used in Experiments 1 and 2

Targets are always 4 × 4 blocks of grey and are always the same two shades of grey, but the spatial locations of the darker and lighter blocks vary in every trial. Backgrounds are also composed of shades of grey, always differing from the greys in the targets. Backgrounds can have one, two, three or four shades of grey, representing an increase in entropy. However, within any one level of entropy, the contrast between the grey levels differs, and thus the feature congestion and bracketing predictions differ. (See Fig. A1 for an explanation of treatment differences in experiment 1 and Fig. A2 for examples of each treatment; and see Fig. A3 for an explanation of treatment differences in experiment 2 and Fig. A4 for examples of each treatment.) In experiment 1, entropy predicts an increasing order of difficulty (i.e. increased response time) in line with the number of grey levels: (1, 4, 8) < (2, 5, 6, 9) < 3 < 7. For feature congestion the prediction is (1, 4, 8) < 2 < 5 < 3 < 9 < 6 < 7. For bracketing it is (1, 2, 3, 4, 5, 8, 9) < 6 < 7. For experiment 2 the entropy prediction is 1 < (2, 3, 4) < (5, 6, 7); the feature congestion prediction is 1 < (4, 7) < 6 < 3 < 5 < 2; the bracketing prediction is (1, 4, 7) < 2 < 3 < 5 < 6.

Figure 1. The stimuli in experiments (a) 1 and (b) 2. Left to right: in the tables, there are nine possible grey levels (column 'Grey') from black (pixel value 0) to white (255), with the target always being a square of eight dark grey (pixel value 64) and eight light grey (191) square blocks, randomly arranged (column 'Target'). An example of a target is shown in the middle panels of (a) and (b). The grey levels in each of the nine treatments of experiment 1 and the seven of experiment 2 are shown in the tables to the left ((a) and (b), respectively), with the pixel values in each cell. Examples of backgrounds for treatment 9 of experiment 1 and treatment 6 of experiment 2 are shown in the right-hand panels of (a) and (b), respectively. The target location is indicated by a red circle (not present in the experiment).

Figure 2. Mean response time (s; ±95% confidence intervals) for the nine background treatments of the (a) uncalibrated online and (b) calibrated laboratory versions of experiment 1. Values are based on the model estimates, back-transformed from the log scale. The bottom key shows which grey levels were present within the background for each treatment; the two grey levels of the target are provided for comparison (dotted lines).

Figure 3. Predictions of the three models of background complexity in terms of whether a positive (green), negative (purple) or no difference (white) is predicted in pairwise comparisons between treatments. The observed pairwise Tukey comparisons from (a) experiment 1 and (b) experiment 2 are displayed as heat maps, based on the average coefficients across the online and laboratory versions of each experiment. The full set of comparisons can be found in Appendix Tables A1 and A2.

Figure 4. Mean response time (s; ±95% confidence intervals) for the seven background treatments of the (a) uncalibrated online and (b) calibrated laboratory versions of experiment 2. Values are based on the model estimates, back-transformed from the log scale. The bottom key shows which grey levels were present within the background for each treatment; the two grey levels of the target are provided for comparison (dotted lines).

Figure A1. (a)–(i) A step-by-step explanation of the treatment differences in experiment 1.

Figure A3. (a)–(c) A step-by-step explanation of the treatment differences in experiment 2. Note that the variance in grey levels differs within each set of two- and three-grey treatments.

Table 1
Log likelihoods and AIC values for the data from experiment 1
logLik: log likelihood; ΔAIC: the difference between each model and the 'best' model (lowest AIC). This table displays models of entropy, feature congestion (FC) and bracketing (plus the null model and the saturated model, where all treatments differ). Online and calibrated-screen versions of the experiment are analysed separately.

Table 2
Log likelihoods and AIC values for the data from experiment 2
ΔAIC: the difference between each model and the 'best' model (lowest AIC). This table displays models of entropy, feature congestion (FC) and bracketing (plus the null model and the saturated model, where all treatments differ). Online and calibrated-screen versions of the experiment are analysed separately.

Table A1
Pairwise comparisons between all treatments in experiment 1, using Tukey's method