Does crowding predict conjunction search? An individual differences approach

Searching for objects in the visual environment is an integral part of human behavior. Most of the information used during such visual search comes from the periphery of our vision, and understanding the basic mechanisms of search therefore requires taking into account the inherent limitations of peripheral vision. Our previous work using an individual differences approach has shown that one of the major factors limiting peripheral vision (crowding) is predictive of single feature search, as reflected in response time and eye movement measures. Here we extended this work by testing the relationship between crowding and visual search in a conjunction-search paradigm. Given that conjunction search involves more fine-grained discrimination and more serial behavior, we predicted it would be strongly affected by crowding. We tested sixty participants with regard to their sensitivity to both orientation and color-based crowding (as measured by critical spacing) and their efficiency in searching for a color/orientation conjunction (as indicated by manual response times and eye movements). While the correlations between the different crowding tasks were high, the correlations between the different crowding measures and search performance were relatively modest, and no higher than those previously observed for single-feature search. Instead, observers showed very strong color selectivity during search. The results suggest that conjunction search behavior relies more on top-down guidance (here by color) and is therefore relatively less determined by individual differences in sensory limitations such as crowding.


Introduction
I'm looking for my keys. Have you seen them? Most of us have been in this (frustrating) situation. We are in a rush, trying to leave the house, and suddenly cannot locate a crucial object that is somewhere amongst the other household clutter. Why is it that, on some occasions, we are quick to find the item we need, while in others we look all over our surroundings and simply cannot find it? What factors limit the success or efficiency of our search? Identifying these factors is fundamental to grasping the mechanisms underlying visual search, and requires understanding mechanisms of perception, attention, memory, and oculomotor control (Chan and Hayward, 2013; Eimer, 2015; Nakayama and Martini, 2011; Wolfe, 2015; Zelinsky, 2008).
At the input level, visual search is constrained by the inhomogeneous resolution observed across the visual field, as retinal photoreceptor density rapidly decreases with distance from the fixation point, resulting in reduced fidelity for peripheral vision (Curcio et al., 1990). Likewise, compared to peripheral vision, more of the brain's cortical surface is assigned to process foveal information, as is expressed by the cortical magnification factor (Cowey and Rolls, 1974; Duncan and Boynton, 2003). The loss of visual resolution with eccentricity results in worse detection of visual search targets in the periphery (unless stimuli are rescaled according to this factor, see Carrasco et al., 1995, 1998; also Carrasco and Frieder, 1997; Motter and Simoni, 2007). There are also attentional differences that favor the fovea: allocation of attention is biased towards the center of vision (Heusden et al., 2023), and the spatial resolution of peripheral attention is severely limited over and above visual acuity limitations (He et al., 1996; Intriligator and Cavanagh, 2001; Wolfe et al., 1989; however, see Rosenholtz et al., 2019 for a different perspective). Finally, peripheral vision is highly susceptible to visual crowding, a phenomenon in which observers are unable to discriminate stimuli embedded in local clutter, due to competitive interactions between the target and surrounding stimuli (the flankers; Bouma, 1970; Toet and Levi, 1992; Wolford and Chambers, 1983). Crowding disrupts our perception of target features, and the strength of this effect is modulated by several factors including stimulus properties (Pelli et al., 2004; Rosen et al., 2014; Scolari et al., 2007) and attention (Dakin et al., 2009; Kewan-Khalayly and Yashar, 2022; Yeshurun and Rashal, 2010; Herzog et al., 2015; for reviews see Pelli, 2008; Whitney and Levi, 2011).
Both foveal and peripheral visual processing contribute to visual search, yet by the nature of the process of looking around for something, the vast majority of visual information comes from the periphery, with all its inherent limitations. This has led to the development of a set of visual search theories that not only recognize the sensory and attentional constraints of peripheral vision, but in fact state that those same constraints are the major determinants of search efficiency (Akbas and Eckstein, 2017; Engel, 1977; Geisler and Chou, 1995; Hulleman and Olivers, 2017; Motter and Simoni, 2008; Rosenholtz, 2016; Rosenholtz et al., 2012; Zelinsky, 2008). These theories can be collectively described as relying on some form of functional viewing field (FVF), which is the area of the visual field, centered on fixation, within which a certain target stimulus can still be discerned. Target discrimination is then determined by a number of factors that affect local contrast, including acuity, attention, and clutter. The premise of this framework is that all items encompassed within the FVF are processed in parallel, and this collective information at each eye fixation is what guides search. When the FVF is smaller than the to-be-searched area, and thus targets cannot be sufficiently discriminated with peripheral vision, eye movements become necessary as a serial solution that brings objects back within the FVF. Importantly, visual search efficiency is thus determined by the efficacy of peripheral vision, rather than by a central covert attentional bottleneck as posited by classical accounts (Treisman and Gelade, 1980; Wolfe, 1994; Wolfe et al., 1989; Wolfe and Gray, 2007; for the most recent views on this, see Hulleman and Olivers, 2017; Lleras et al., 2022).
Our previous work adopted an individual differences approach to test the prediction from FVF theories that search performance is determined by the efficacy of peripheral vision (Veríssimo et al., 2021). In particular, we focused on the contribution of crowding, which we assumed to be a major factor limiting peripheral vision (Rosenholtz, 2016). In a visual search task, participants were asked to find a left- or right-tilted target (and determine the tilt direction) among multiple vertical distractors, with varying set size. Such a task is typically referred to as single feature search, as the target differs from the distractors on a single feature dimension (here orientation). Importantly, observers were allowed to freely move their eyes while looking for the target, and we took manual response times and eye movements as the main dependent measures of search performance. The same stimuli were used in a crowding task, where participants were asked to keep central fixation and discriminate the orientation of a target that was placed at various eccentricities left or right of fixation, and that was surrounded by two tangentially positioned distractors (often referred to as flankers). We then assessed individual susceptibility to crowding by determining the critical spacing (CS) at different eccentricities, which is the minimum target-flanker distance below which their proximity impacts discrimination accuracy (following Bouma, 1970). Our findings showed that higher susceptibility to crowding, as reflected in larger CS values, correlated with longer reaction times (RTs), more eye movements, and longer fixation durations in the feature-search task. This provided a direct empirical link between classical measures of crowding and single feature search, consistent with earlier findings (Gheri et al., 2007; Vlaskamp and Hooge, 2006; Wertheim et al., 2006).
The abovementioned findings provided evidence for a core prediction of FVF theories, despite the fact that visual search was quite efficient to begin with: the shallow RT over set size slopes suggested that a considerable part of the single feature search may have been parallel in nature (cf. Treisman and Gelade, 1980), and would thus by definition be less limited by the FVF (as targets that are easy to discriminate in the periphery require fewer eye movements to find). We therefore reasoned that the FVF may impact search even more when the target is relatively difficult to discriminate. In the present study, we sought to extend this work by testing the relationship between crowding and visual search in a conjunction-search paradigm. In conjunction search, stimuli vary within two or more feature dimensions, and the target is defined by a unique combination of feature values instead.
Conjunction search unfolds more slowly, with more eye movements, and is more error prone than feature search, an indication of serial processing (Treisman and Gelade, 1980). According to FVF theories, this occurs due to the heterogeneous nature of the display, where either target-defining feature is shared with distractors: target-distractor contrast is rather weak, and in principle no stronger than distractor-distractor contrast (Duncan and Humphreys, 1989), something that is only aggravated further into the periphery (Chang and Rosenholtz, 2016; Rosenholtz, 2016; Rosenholtz et al., 2012). As a more fine-grained discrimination is necessary, more eye movements are required, resulting in a serial component to behavior, as reflected for example in substantial set size effects on manual responses and number of eye fixations. Therefore, we expected crowding to have a stronger impact on conjunction search performance than previously observed for single feature search (Veríssimo et al., 2021), as suggested by earlier findings that coupled search efficiency with peripheral discriminability across stimulus types (Rosenholtz et al., 2012; Sayim et al., 2011; Zhang et al., 2015).
As in Veríssimo et al. (2021), we adopted an individual differences approach. The study consisted of two parts, both illustrated in Figure 1. In the crowding task, participants reported the identity (color and orientation) of a target Gabor pattern located at a fixed eccentricity along the horizontal meridian, left or right of fixation. On most trials, the target was accompanied by four flankers. These flankers could differ from the target in orientation, color, or both feature dimensions. A staircase procedure varying the target-flanker distance allowed us to estimate the CS for each individual and flanker type. The same participants also completed a visual search task, using the same type of stimuli, but now randomly arranged at different eccentricities and with larger set sizes. Previous work has suggested that item features (such as color, size, and orientation) can be differentially prioritized, due to task relevance and/or saliency, in order to guide conjunction search behavior (Egeth et al., 1984; Kaptein et al., 1995; Wolfe, 1994). Given this, we hypothesized that conjunction search may be differentially related to color and orientation CS values.
Last but not least, by assessing crowding susceptibility for different feature dimensions in the same set of individuals, we are able to analyze the relationship between them. Observers vary considerably in the efficacy of their peripheral vision (e.g. Frömer et al., 2015; He et al., 2019; Petrov and Meleshkevich, 2011; Veríssimo et al., 2021), yet to our knowledge it has not been reported whether this variation is a general property or feature-specific. The current study allows us to address whether someone who suffers more from orientation crowding is also likely to be strongly impacted by color crowding. Moreover, several studies have characterized how crowding disrupts the recognition of individual features (Kennedy and Whitaker, 2010; Van den Berg et al., 2007; Wilkinson et al., 1997), and in some cases also the conjunction of features (Poder and Wagemans, 2007), yet little is known about how crowding by conjunction-defined flankers relates to crowding by the individual constituent features (though see Yashar et al., 2019; and Greenwood and Parsons, 2020). Our design allowed us to determine whether crowding for conjunction-defined targets is determined by the strongest-crowding individual feature, or whether the whole is worse than the underlying components. The following sections present the methods and findings in detail.

Participants
In total 68 participants completed the study, recruited from the student population of the Vrije Universiteit Amsterdam. The inclusion criterion was normal or corrected-to-normal visual acuity. Data are shown for 60 participants (40 female, mean age 23.4, age range 17-45). Eight participants were excluded from the analyses because their performance failed one or more of the following criteria: search task accuracy (across set sizes) had to be at least 85 %, and at least 75 % for any given set size; the crowding task had to contain at least 75 % valid trials (i.e. where a response was given within the time limit), with accuracy on the unflankered trials of at least 30 %. Finally, to prevent floor or ceiling effects, the staircased estimates of CS should not converge on the minimum or maximum values of the tested range; we therefore only included participants with a median CS value (across flanker types) between 0.2 and 0.65 in the analyses. Participants gave written informed consent and were financially compensated for their time. The protocol was approved by the ethics committee of the Faculty of Behavioral and Movement Sciences.

Experimental setup
The experimental protocol was similar to the one used in Veríssimo et al. (2021). Each session consisted of two tasks - a crowding task and a visual search task - each preceded by a short practice session. Task order was randomized across participants. The entire session lasted 60-90 min, and participants were allowed to take short breaks between blocks and tasks. Visual stimuli were created with Psychopy functions (Peirce et al., 2019) and customized code developed in Python 3.6. The experiment was displayed on an ASUS ROG Strix XG248Q monitor (screen dimensions 526 × 296 mm, native resolution of 1920 × 1080, and refresh rate of 240 Hz) at a viewing distance of 73 cm. Throughout the experiment the background was set to grey (mean luminance of 99 cd/m²). Participants' head position was kept stable with a chinrest. Eye movements were recorded throughout the experimental session using an EyeLink 1000 (SR Research, Inc.) remote eye-tracker with a sampling rate of 1000 Hz. A standard calibration-validation procedure was performed at the start of each task, and also between blocks in the visual search task when deemed necessary.

Crowding task
Figure 1A graphically depicts the experimental design of the crowding task. Example trials can be found at the Open Science Framework. During the experiment, participants were asked to keep fixation on a white cross (0.5 dva diameter) located at the center of the screen. On each trial, a target Gabor patch was displayed in either the left or right hemifield, at an eccentricity of 12 dva from central fixation.
Participants were asked to indicate the target identity by making a conjoint judgement of its color (blue/pink) and orientation (5° from vertical, either clockwise - CW - or counterclockwise - CCW). In 1/6th of the trials the target appeared on its own, and in the remaining trials it was accompanied by four radially positioned flanker patches, offset from the cardinal directions by 45°. The Gabor patches had a diameter of 2.2 dva and a spatial frequency of 6 cycles/dva.
On each trial, the stimuli were presented for 75 ms and participants had an additional 4000 ms to respond via the keyboard. Between trials, only the fixation mark was displayed (for 500 ms). The crowding task consisted of a total of 576 trials, divided into four blocks, in which visual hemifield (left/right) and trial type (flankered vs. unflankered) were balanced and randomly mixed. The experiment was preceded by an additional 112 practice trials.
Flankers (when present) could be of three types: orientation, color, and conjunction. In orientation flanker trials, all stimuli shared the same color but differed in orientation (if the target was CW, half the flankers would be CCW). In color flanker trials, target and flankers shared the same orientation, but half the flankers differed in color. Finally, in conjunction flanker trials, all flankers differed from each other in their conjunction of features (with only one flanker sharing the same features as the target). Regardless of trial type, the task remained the same - to indicate the identity of the target Gabor patch - but by manipulating the flanker feature(s), we increased the uncertainty within the specific feature dimension(s) involved. This also means that the difficulty level between features varied across conditions (e.g., stimuli with uniform color vs. heterogeneous orientations in orientation flanker trials). To avoid systematic reporting differences between features, crowding conditions were presented in a randomized order and participants were asked to disregard the surrounding flankers. Additionally, the spatial distribution of the flanker features varied on each trial, to prevent grouping by collinearity. It is worth mentioning that participants could conceivably try to identify the duplicate stimuli in order to perform the task (e.g., in conjunction trials). However, the spatial variance in feature location and the brief display time make such a strategy unlikely.
Flanker-present trials were staircased according to participants' performance, in a 1-up-2-down scheme. When participants successfully indicated the identity of the target on 2 consecutive trials, the radial distance between target and flankers decreased by 0.05 dva (fixed step size); after each error it increased by the same amount. Target-flanker distance could range between 0.15 and 0.70 × target eccentricity. Separate staircases were used per flanker type.
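For illustration, the staircase logic can be sketched as a minimal simulation. The observer model, function names, and the expression of distance as a fraction of eccentricity are our own assumptions, not the original experimental code:

```python
# Minimal sketch of a 1-up-2-down staircase on target-flanker distance.
import random
import statistics

def run_staircase(p_correct_at, n_trials=576, start=0.70, floor=0.15,
                  step=0.05, seed=0):
    """Track target-flanker distance (here as a fraction of eccentricity).

    p_correct_at(d) -> probability of a correct response at distance d.
    Returns the distance visited on each trial.
    """
    rng = random.Random(seed)
    d = start                # current target-flanker distance
    streak = 0               # consecutive correct responses
    history = []
    for _ in range(n_trials):
        history.append(d)
        if rng.random() < p_correct_at(d):
            streak += 1
            if streak == 2:  # 2 correct in a row -> move flankers closer
                d = max(floor, d - step)
                streak = 0
        else:                # 1 error -> move flankers further apart
            d = min(start, d + step)
            streak = 0
    return history

# Hypothetical observer whose accuracy falls off with flanker proximity.
hist = run_staircase(lambda d: 0.5 + 0.5 * min(1.0, d / 0.5))
# CS estimate as described later in the paper: the median distance
# over the last 96 trials.
cs = statistics.median(hist[-96:])
```

A 1-up-2-down rule of this kind converges on the distance yielding roughly 70.7 % correct, which is why the absolute accuracy levels on flanker-present trials carry limited meaning.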

Visual Search task
The visual search procedure is illustrated in Figure 1B. Search displays were structured on an oval grid, arranged in 8 concentric rings. Gabor patches could be presented in any of the 94 possible item locations, which ranged from 2 to 16 dva from the central fixation cross. The minimum center-to-center distance between adjacent items was 2 dva. At the beginning of each block, participants were shown the target to search for, which could be one of four possible Gabor patches - identical to the stimuli used in the crowding task - in a specific combination of color (pink/blue) and orientation (CW/CCW). The search display appeared once participants directed their gaze towards the center of the screen. They were instructed to scan the colored, tilted Gabor patches in order to find the target as quickly as possible. Their task was to indicate which side of the target contained a small white dot, by pressing the right or left arrow key. The dot was 5 % of the size of the Gabor, and was displayed with an alpha level of 30 %, to ensure it did not disrupt the appearance of the stimuli. In contrast to the crowding task, participants could now freely move their eyes. After pressing the response key, the search display disappeared and participants had to return their gaze to fixation in order to start a new trial.
The search task was divided into 4 blocks of 180 trials each (during data collection this was changed to 8 blocks of 90 trials, to allow for more breaks), resulting in a total of 720 trials. The two main factors of the experimental design were set size (7, 16 and 31 items) and target eccentricity (4, 8 or 12 dva from central fixation). Both factors were balanced, with 80 trials per set size × eccentricity combination, and randomly mixed within blocks.

Data processing and analyses
Behavioral and eye movement data were analyzed with custom-written code, following methodology similar to our previous work (Veríssimo et al., 2021).
Crowding data. Average response accuracy and reaction times, with and without flankers, were compared using Wilcoxon signed-rank tests (due to the non-Gaussian distribution of the data). CS values - the main measure of interest - were calculated per participant and flanker type, as the median target-flanker distance of the last 96 trials (the last 1/6th of the total number of trials). We also compared the distribution of CS values across flanker types using Wilcoxon signed-rank tests, and assessed the association between flanker types by calculating pairwise Spearman correlations.
Visual search, manual response data. As a first step in the RT analysis, we excluded incorrect trials, trials where the response was given less than 250 ms after stimulus onset, and trials with an RT more than 3 standard deviations beyond the mean RT for a given target eccentricity and set size. Participant mean RTs were then calculated for each combination of set size and eccentricity, and used in a two-way repeated-measures analysis of variance (ANOVA) with set size and eccentricity as factors (α = 0.05). A similar approach was used for the average response accuracy. Subsequently, we fitted a linear regression to the individual RT data and calculated the slopes of the search functions across set size, for each target eccentricity. To evaluate the stability of the linear search slopes, we performed a split-half reliability analysis. For this, the RT data of each participant were randomly split into two folds (balancing for target eccentricity and set size), and each fold was fitted with a linear model. The slope values of both halves were then correlated across participants to obtain a reliability score.
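The split-half procedure can be sketched as follows. The data here are synthetic, and all helper names and the simulated observer parameters are illustrative assumptions:

```python
# Sketch of the split-half reliability analysis for RT x set-size slopes.
import random
import statistics

def lin_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

def split_half_slopes(trials, rng):
    """trials: list of (set_size, rt). Random split, balanced per set size."""
    half_a, half_b = [], []
    by_size = {}
    for t in trials:
        by_size.setdefault(t[0], []).append(t)
    for group in by_size.values():
        rng.shuffle(group)
        mid = len(group) // 2
        half_a.extend(group[:mid])
        half_b.extend(group[mid:])
    return (lin_slope([s for s, _ in half_a], [r for _, r in half_a]),
            lin_slope([s for s, _ in half_b], [r for _, r in half_b]))

rng = random.Random(1)
set_sizes = (7, 16, 31)
slopes_a, slopes_b = [], []
for p in range(20):                  # 20 simulated participants
    true_slope = 20 + p              # ms per item, varies across people
    trials = [(s, 500 + true_slope * s + rng.gauss(0, 50))
              for s in set_sizes for _ in range(80)]
    a, b = split_half_slopes(trials, rng)
    slopes_a.append(a)
    slopes_b.append(b)
# Correlating slopes_a with slopes_b across participants yields the
# reliability score; here the two halves agree closely by construction.
```

In the actual analysis this split-and-correlate step was repeated over many random splits and the resulting coefficients averaged.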
Visual search, eye tracking data. Fixation and saccade detection was based on the EyeLink standard criteria. We excluded target fixations and early fixations (<150 ms after display onset), as these are not representative of the search process. Per participant, target eccentricity and set size, we calculated the average number of fixations and analyzed these in the same ANOVA as above. We also calculated linear slopes across set size. To quantify saccadic selectivity for color and orientation, we used a k-d tree algorithm to find the stimulus closest to each fixation center-point (as defined by the Euclidean distance in 2D space; Maneewongvatana and Mount, 1999) and calculated the ratio of feature-selective fixations across trials.
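The fixation-to-item assignment behind this selectivity measure can be sketched as follows. The original analysis used a k-d tree (e.g. scipy's KDTree); a brute-force nearest-neighbour search gives identical assignments and keeps the sketch dependency-free. All names and the toy display are illustrative:

```python
# Sketch: label each distractor fixation by the feature(s) of the
# display item nearest to it.
import math

def nearest_item(fix, items):
    """Index of the item whose (x, y) is closest to the fixation point."""
    return min(range(len(items)),
               key=lambda i: math.dist(fix, items[i][:2]))

def color_selectivity(fixations, items, target_color):
    """Fraction of distractor fixations landing nearest to an item
    sharing the target's color. items: (x, y, color, is_target)."""
    hits = total = 0
    for fix in fixations:
        i = nearest_item(fix, items)
        if items[i][3]:              # skip fixations on the target itself
            continue
        total += 1
        hits += items[i][2] == target_color
    return hits / total if total else float('nan')

# Toy display: three distractors and one (blue) target.
items = [(0, 0, 'pink', False), (5, 0, 'blue', False),
         (0, 5, 'pink', False), (5, 5, 'blue', True)]
ratio = color_selectivity([(0.4, 0.2), (4.6, 0.1), (0.2, 4.8)],
                          items, target_color='blue')
```

Here one of the three distractor fixations lands nearest a target-colored item, so the color-selectivity ratio is 1/3.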
Correlation analyses. To evaluate whether crowding performance was predictive of conjunction search performance, we correlated the CS values with the above-mentioned visual search measures: the mean search RTs for each set size and eccentricity separately, the linear search RT slopes, the mean number of fixations per set size and eccentricity, and the fixations × set size slopes. The nonparametric Spearman correlation was chosen over the Pearson correlation, to avoid prior assumptions about the nature of the relation (whether linear or not) and the normality of the underlying distributions. Permutation tests were performed to assess the statistical significance of the obtained correlation coefficients.
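A minimal sketch of such a permutation test on a Spearman correlation follows (synthetic data; tie handling is omitted for brevity, and all names are our own):

```python
# Permutation test for the significance of a Spearman correlation.
import math
import random

def ranks(v):
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order):
        r[i] = float(rank)            # ties not handled in this sketch
    return r

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    return pearson(ranks(x), ranks(y))   # Spearman = Pearson on ranks

def perm_p(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles at least as extreme."""
    rng = random.Random(seed)
    obs = spearman(x, y)
    y = list(y)
    extreme = sum(abs(spearman(x, rng.sample(y, len(y)))) >= abs(obs)
                  for _ in range(n_perm))
    return obs, (extreme + 1) / (n_perm + 1)

rng = random.Random(2)
cs_values = [rng.random() for _ in range(60)]            # fake CS values
slopes = [10 * c + rng.gauss(0, 1.5) for c in cs_values] # related slopes
rho, p = perm_p(cs_values, slopes)
```

Shuffling one variable destroys any pairing between the two, so the shuffled coefficients approximate the null distribution against which the observed ρ is judged.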

Crowding
As expected, participants were faster and more accurate in flanker-absent trials than in flanker-present trials, irrespective of flanker type (see supplementary material, Table S1). Note that, due to the staircase, the absolute values here have limited meaning (a 1-up-2-down staircase targets the 70.7 % level on the psychometric function). Nonetheless, the result indicates that the flankers caused crowding, with performance on the flanker-absent trials well above chance, suggesting overall little limitation in terms of visual acuity. To further assess whether there were relative differences in performance across crowding conditions, we calculated the percentage decrease in color/orientation accuracy on flanker-present trials relative to flanker-absent trials: ΔAccuracy = (Accuracy present / Accuracy absent − 1) × 100. The resulting values are portrayed in Figure 2A. In color flanker trials (where target and flankers shared the same orientation, yet differed in color) participants correctly identified the orientation of the target more often than its color (ΔAccuracy orientation = -8.4 % and ΔAccuracy color = -9.1 %). The opposite occurred in orientation flanker trials, where the heterogeneity of target and flanker orientations led to higher accuracy in identifying the target color (and the difference between feature dimensions was also more pronounced, ΔAccuracy orientation = -11.9 % and ΔAccuracy color = -6.9 %).
Finally, the accuracy difference in conjunction flanker trials - where target and flankers differed in both color and orientation - showed a similar pattern to that observed in orientation flanker trials, for both feature dimensions (ΔAccuracy orientation = -12.9 % and ΔAccuracy color = -6.4 %).
The results show that differences in flanker features led to specific feature-related errors in performance, and suggest that the misreporting errors seen for color and orientation flanker trials do not linearly combine in conjunction flanker trials. We will return to this in the Discussion.
To characterize crowding susceptibility across our sample, we calculated each participant's critical spacing (CS). Figure 2B shows the distribution of CS values for each flanker type, as well as the group mean. CS was lower for color flanker trials than for orientation or conjunction trials, which was confirmed with post-hoc Bonferroni-corrected Wilcoxon tests of the pairwise comparisons (orientation vs. color: p = 3.09 × 10^-5; conjunction vs. color: p = 9.41 × 10^-5; no significant difference between orientation and conjunction at the corrected threshold, p = 0.04). This confirms once more that color was somewhat easier to discriminate than orientation.
Another question of interest is to what extent crowding by different flanker features is related. To this end, we calculated the Spearman correlation of CS values between the different flanker types, across individuals (Figure 3). All three pairwise correlations were strongly positive (ρ = 0.72 to 0.87; p < 0.001), with the strongest relationship observed between orientation and conjunction CS values.
Additionally, a partial Spearman correlation was computed to determine the relationship between orientation and conjunction CS values whilst controlling for the effect of color crowding sensitivity. Once again, there was a strong positive correlation between the measurements: ρ = 0.71, p < 0.001. Even though susceptibility to crowding is common across feature dimensions (clutter impacts some participants' overall target discrimination more than others), these results indicate that conjunction and orientation CS are more closely related to each other than either is to color CS, at the intra-subject level.
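For illustration, a partial Spearman correlation of this kind can be obtained from the three pairwise coefficients via the standard first-order partial correlation formula. The sketch below uses synthetic data generated from a hypothetical shared crowding factor; all names are assumptions:

```python
# Partial Spearman correlation of ori and conj CS, controlling for col.
import math
import random

def ranks(v):
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order):
        r[i] = float(rank)            # ties not handled in this sketch
    return r

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return ((r_xy - r_xz * r_yz)
            / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2)))

rng = random.Random(5)
n = 60
g = [rng.gauss(0, 1) for _ in range(n)]   # shared crowding factor
s = [rng.gauss(0, 1) for _ in range(n)]   # ori/conj-specific factor
ori = [gi + si + rng.gauss(0, 0.5) for gi, si in zip(g, s)]
conj = [gi + si + rng.gauss(0, 0.5) for gi, si in zip(g, s)]
col = [gi + rng.gauss(0, 0.5) for gi in g]

r_oc = spearman(ori, conj)
partial_oc = partial_corr(r_oc, spearman(ori, col), spearman(conj, col))
```

In this toy model the orientation-conjunction correlation remains strong after partialling out color, mirroring the pattern reported above.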

Visual search
Figure 4A shows the mean search RTs as a function of set size, for each target eccentricity. A two-way repeated-measures ANOVA revealed significant effects of set size (F(2,118) = 586.24, p < 0.001) and eccentricity (F(2,118) = 360.80, p < 0.001), as well as their interaction (F(4,236) = 89.73, p < 0.001). Analogous results were obtained when performing the same analysis on the mean number of fixations, see Figure 4B. Here the ANOVA also revealed significant main effects of set size (F(2,118) = 743.92, p < 0.001) and eccentricity (F(2,118) = 419.50, p < 0.001), and an interaction between them (F(4,236) = 103.97, p < 0.001). Search times and number of fixations increased with increasing target eccentricity and set size, and these effects amplified each other. Average accuracy values are displayed in the supplementary material (Table S2).
Finally, we calculated search RT slopes across set size (using simple linear regression) to obtain a measure of search efficiency. Other authors have argued for a logarithmic relationship between set size and RT (Buetti et al., 2016, 2019). We applied both models to the conjunction RT values and evaluated model performance by calculating the Bayesian information criterion (BIC). For 48/60 participants, the linear model provided the best fit to the data (lowest BIC score). Split-half reliability analysis of the linear slopes indicated that the measurement was highly reliable (ρ = 0.87 ± 0.025, averaged across 1000 iterations). Therefore, we opted to use the linear slopes for the subsequent analyses.
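The model comparison can be sketched as follows, with synthetic RTs from a hypothetical linear observer and BIC computed from the residual sum of squares under Gaussian errors (the standard OLS shortcut). All names and parameter values are our own:

```python
# BIC comparison: RT as a linear vs. logarithmic function of set size.
import math
import random
import statistics

def fit_bic(x, y):
    """OLS fit y = a + b*x; return (a, b, BIC) assuming Gaussian errors."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    k = 2                                  # intercept + slope
    return a, b, n * math.log(rss / n) + k * math.log(n)

rng = random.Random(3)
set_sizes = [s for s in (7, 16, 31) for _ in range(80)]
# Hypothetical participant whose RTs grow linearly with set size.
rts = [600 + 25 * s + rng.gauss(0, 120) for s in set_sizes]

_, b_lin, bic_lin = fit_bic(set_sizes, rts)
_, _, bic_log = fit_bic([math.log(s) for s in set_sizes], rts)
linear_wins = bic_lin < bic_log            # lower BIC -> preferred model
```

Fitting the logarithmic model is just OLS on log-transformed set sizes, so both candidates have the same number of parameters and the BIC comparison reduces to comparing residual fit.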

Crowding -Search correlation analyses
The main research question was whether individual differences in crowding susceptibility predict conjunction search performance. To answer this question we assessed the Spearman correlation between CS values and different search metrics. Figure 5A shows the correlation between CS values and search efficiency. Slope values showed low to moderate correlations with the different CS types (ρ ranging from 0.28 to 0.30). To assess the probability that these positive correlations could arise by chance alone, we performed permutation tests by shuffling the data and correlating the two variables 10,000 times. The distribution of the permutation correlation coefficients is displayed in the bottom row of Figure 5A, with the dashed line indicating where the observed coefficient lies within this distribution. For all three comparisons, ρ differed significantly from the mean of the sampling distribution (p < 0.05; we note that standard computation of Spearman p-values generated very comparable, if not identical, values). Correlations between each CS type and absolute search RTs (rather than slopes) were weaker and largely non-significant (see supplementary material, Figures S1-S3). These outcomes cannot be explained by simple differences in participant motivation to perform the tasks. Instead, the correlations indicate a specific link between peripheral discriminability and search efficiency.
Analogous results were found when correlating each CS with the number of fixations per item (Figure 5B). The overall pattern follows a similar trend as for the RT data, with the strongest positive correlations for orientation and conjunction CS (ρ = 0.29 and 0.30, permutation p < 0.05). For color flanker trials, the correlation did not reach significance (ρ = 0.22, permutation p = 0.089). Correlations between each CS type and the absolute number of fixations during search can be found in the supplementary material (Figures S4-S6).
FVF theories predict a strong association between the number of eye movements and RTs. Given our previous results (Veríssimo et al., 2021), we expected that participants who were most limited in their peripheral vision (high CS values) would also be the ones relying most strongly on eye movements to bring potential targets within a discriminable range. To test this, we first calculated the Spearman correlation coefficient between RT and number of fixations across search trials, for each individual separately. This gives us a measure of how strongly search times were driven by eye movements, per participant. Subsequently, we correlated these Spearman coefficients with the CS values from the crowding task. As portrayed in Figure 6, search RTs were quite strongly driven by eye movements, with an average RT-fixation ρ = 0.93 ± 0.03 (range 0.78-0.96 across participants). Moreover, this eye-RT relationship itself was indeed (modestly) predicted by the CS values (significant ρ = 0.30 and 0.29, p < 0.05, for color and conjunction CS respectively; for orientation ρ = 0.16, p = 0.223).
Finally, we examined the influence of display design on the observed correlation values. Search trials were randomly intermixed, varying in target eccentricity, set size and spatial distribution of the stimuli. Randomization was chosen to avoid the influence of cognitive strategies (for example, prioritizing locations that - based on knowledge or context - are more likely to contain the target), as these do not reflect the sensory-based search processes we were interested in. This also means that we did not actively control the level of crowding of any particular target. To verify that targets were actually crowded, we calculated the proportion of trials in which the target was accompanied by at least one distractor within the critical spacing for any one fixation during the trial. Figure S7 in the supplementary material shows that, across participants, this proportion ranges on average from 19 % to 97 %, depending on set size and eccentricity (as would be expected). The figure also shows this metric for the very first fixation only. Further control analyses indicated that there was no feature bias of target-flanking stimuli across trials (supplementary material, Table S3). We conclude that, even without predefined clutter levels per display, the target was often crowded during search. Note further that even if distractors fall outside the measured CS, this does not necessarily mean that there is no crowding: our CS metric is defined relative to a ~70 % performance threshold, and crowding interference could occur beyond that.
To assess whether distractor density around the target determined performance, we calculated the average proximity of distractors around the target for each search trial. We then employed a linear mixed-effects model (LMM) to examine the relationship between search times and this density estimate, while taking into account subject-specific variability. Interestingly, these analyses indicated that increased proximity of distractors around the target was linked to decreases in overall search duration (more details in the supplementary material). This unexpected outcome might be the result of strategic scanning behavior, in which participants focus their search on highly dense parts of the display.
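The reported analysis used a full LMM; as a simplified numpy-only sketch of the same question, one can fit an ordinary least-squares slope of RT on density per subject (a crude stand-in for the model's random-slope structure) and inspect the sign of those slopes. Data and variable names here are synthetic, generated to mimic the reported pattern:

```python
import numpy as np

def per_subject_slopes(rts_by_subject, density_by_subject):
    """OLS slope of search RT on target-neighborhood density, fitted per subject.
    A crude stand-in for the random-slope structure of a full mixed-effects model."""
    return np.array([np.polyfit(dens, rt, 1)[0]
                     for rt, dens in zip(rts_by_subject, density_by_subject)])

rng = np.random.default_rng(1)
# Synthetic data mimicking the reported pattern: higher density, shorter RTs.
density = [rng.uniform(1.0, 5.0, 60) for _ in range(15)]
rts = [2000 - 120 * d + rng.normal(0, 80, 60) for d in density]
slopes = per_subject_slopes(rts, density)   # negative slopes: denser displays, faster search
```

A consistently negative slope across subjects corresponds to the counterintuitive density effect described above.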

Exploratory analyses - Saccadic selectivity
We conducted a set of exploratory analyses to further characterize conjunction search behavior, with a focus on saccadic selectivity. Note that items in the visual search task varied along two feature dimensions (color and orientation), and either or both could be used to guide the search process. To quantify the saccadic selectivity for each feature dimension, we identified the stimulus closest to each fixation center-point (that is, the one with the smallest Euclidean distance). Target fixations were excluded from this analysis. If the nearest neighboring item shared the same color as the target, we labeled the distractor fixation as selective to color. Likewise, if the nearest neighboring item shared the same orientation as the target, we labeled the fixation as selective to orientation. We then calculated the ratio of fixations for the selected feature by dividing the number of such fixations by the total number of distractor fixations. Figure 7 shows the average ratio of color-selective fixations, as a function of set size and eccentricity. The results show that the vast majority of distractor fixations were selective to color (on average 85 %), indicating that this was the main feature guiding the search process.
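The selectivity measure can be sketched as follows (hypothetical data structures): find the nearest item to each fixation by Euclidean distance, skip target fixations, and compute the ratio over the remaining distractor fixations.

```python
import numpy as np

def color_selectivity(fixations, item_pos, item_color, target_idx, target_color):
    """Fraction of distractor fixations whose nearest item shares the target color."""
    selective, total = 0, 0
    for fix in fixations:
        nearest = int(np.argmin(np.linalg.norm(item_pos - fix, axis=1)))
        if nearest == target_idx:                      # target fixations are excluded
            continue
        total += 1
        selective += item_color[nearest] == target_color
    return selective / total if total else np.nan

# Example: three items, two red (one of them the target) and one green.
pos = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
colors = ["red", "red", "green"]
fixes = [np.array([5.0, 1.0]), np.array([1.0, 5.0])]
print(color_selectivity(fixes, pos, colors, target_idx=0, target_color="red"))  # → 0.5
```

An analogous function with `item_orientation` in place of `item_color` yields the orientation-selectivity ratio.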
Next, we assessed whether there was a relationship between saccadic selectivity and search efficiency. Figure 8 (top row) shows the Spearman correlation coefficient between slope values and the ratio of color-selective fixations, separately for the different target eccentricities. When the target was closest to fixation (4 dva), there was no relation between the two metrics (ρ near 0), indicating no behavioral gain (or loss) from adopting this strategic behaviour. However, at farther eccentricities (8 and 12 dva) we did observe low to moderate correlation values (ρ = 0.34 and 0.22, respectively). Permutation tests were performed to assess where each observed coefficient lies within the null distribution (Figure 8, bottom row). This outcome suggests that, when the target was located in the periphery, the participants who were less efficient searchers were also the ones that relied more on color guidance.
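The permutation test can be sketched generically: shuffle one variable, recompute the Spearman coefficient many times to build a null distribution, and locate the observed coefficient within it. This is a two-sided version with an arbitrary default number of permutations; the exact settings of the reported analyses are not assumed here.

```python
import numpy as np
from scipy.stats import spearmanr

def permutation_pvalue(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the Spearman correlation of x and y."""
    rng = np.random.default_rng(seed)
    observed = spearmanr(x, y)[0]
    # Null distribution of rho under random re-pairing of x and y.
    null = np.array([spearmanr(x, rng.permutation(y))[0] for _ in range(n_perm)])
    # Proportion of permuted |rho| at least as extreme as observed (+1 correction).
    return (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
```

Applied per eccentricity to the slope values and color-selectivity ratios, this yields the p-values annotated in Figure 8 (bottom row).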
In light of these findings, we then explored whether varying degrees of crowding lead to different eye movement strategies. We assumed that participants with higher CS values would be more likely to resort to such feature selectivity, as a strategy to circumvent their peripheral sensory limitations. To assess this, we correlated each CS type with the ratio of distractor fixations that were selective to color, combined across eccentricity and set size. However, this did not reveal any significant relationship between saccadic selectivity in the search task and CS values in the crowding task (orientation: ρ = 0.07, p = 0.603; color: ρ = 0.16, p = 0.233; conjunction: ρ = 0.07, p = 0.593).

Discussion
FVF theories of visual search have provided a substantial step towards integrating the characteristics of extrafoveal processing into working theories of attention and visual search (see Hulleman and Olivers, 2017 for an overview). Our previous work (Veríssimo et al., 2021) leveraged individual differences to investigate whether a major sensory limitation on peripheral vision, crowding, is indeed predictive of visual search performance. We found that participants with larger susceptibility to crowding took longer to complete an orientation search. They also made more eye movements, with longer fixation durations. In the current work, we tested the same hypothesis of a relationship between crowding and search, but now taking into account the interaction of different feature dimensions in these processes. For this purpose, we assessed conjunction search instead of feature search, and measured crowding susceptibility for different features in conjunction and in isolation.
Crowding impairs the discriminability of an object's features when surrounded by clutter, and it can be quantified by determining the CS - the center-to-center distance between the target and the flanking objects at which recognition attains a criterion level of performance (Bouma, 1970; Toet and Levi, 1992). Bouma (1970) reported that this distance increases linearly with target eccentricity, with a factor of 0.5. Later studies have shown that this factor may vary substantially between stimulus types and tasks (Pelli et al., 2004; Rosen et al., 2014; Scolari et al., 2007; Whitney and Levi, 2011), depending on target-flanker similarity (Greenwood and Parsons, 2020; Kooi et al., 1994; Wilkinson et al., 1997), number of distractors (Poder, 2008; Strasburger et al., 1991), attention (Dakin et al., 2009; Kewan-Khalayly and Yashar, 2022; Yeshurun and Rashal, 2010), and individuals (Frömer et al., 2015; He et al., 2019; Petrov and Meleshkevich, 2011; Veríssimo et al., 2021). The current study confirms that any such presumed constant really reflects only an average - the critical spacing varied (although was still quite similar) for orientation and color-based crowding, and ranged from 0.2 to 0.7 across participants. Note that due to the fixed threshold criterion, it is possible that the absolute CS values were also influenced by intersubject discrepancies in acuity or, in the case of the different hues, color contrast sensitivity. Nonetheless, performance would then still reflect individual differences in the efficacy of peripheral vision, albeit a combination of crowding with other factors that impair target discriminability. In addition, we found that color flanker trials yielded lower CS values than orientation or conjunction flanker trials, suggesting that participants are overall less susceptible to color crowding. It is important to highlight that overall susceptibility to crowding is common across feature dimensions, as shown by the strong correlations between the different CS types. These observations are not mutually exclusive: the fidelity of feature representations may vary overall from individual to individual, while for each individual, information for different features is encoded with varying levels of precision.
Several reports have shown a larger crowding effect in the estimation of orientation than that of color, both when features are reported separately (Van den Berg et al., 2007) and in conjunction (Greenwood and Parsons, 2020; Kewan-Khalayly and Yashar, 2022; Yashar et al., 2019), and these studies have suggested that crowding disrupts certain combinations of visual features in a feature-specific manner. For example, Yashar et al. (2019) showed that in a crowded display, color and orientation misreporting errors remain unbound, even when both dimensions are jointly reported. Greenwood and Parsons (2020) asked observers to identify the color and motion direction of a "cowhide" stimulus surrounded by flankers, while modulating the strength of each feature dimension independently, and showed that when crowding was weak for color and strong for motion direction, errors were reduced for color but remained for motion, and vice versa with weak motion and strong color crowding. This suggests that the ability to recognize one aspect of a cluttered scene, such as color, offers no guarantees for the correct recognition of other aspects, like motion direction. We observed a similar pattern: when analyzing response accuracy for each of the target features separately, we could observe the influence of flanker features on performance for each condition, as well as a dissociation between features in the conjunction flanker trials. These results contradict traditional models that view crowding as a singular object-selective mechanism, in which participants mistake one of the flankers for the target in its entirety (Ester et al., 2015; Strasburger et al., 1991) or average color and orientation signals from all items equally (Parkes et al., 2001). Instead, our findings may be best accounted for by the summary statistics approach of pooling models (Freeman et al., 2012; Keshvari and Rosenholtz, 2016). According to this framework, the visual system tiles the visual field with overlapping patches (pooling regions) that encompass multiple items at a time. Each pooling region retrieves local image statistics that can contain information about both target and flanker features and serve to guide eye movements towards the target location. Increasing the density of the display (crowding) makes it more difficult to identify which patch might contain the target features, independent of which object they belong to. Therefore, if an observer is more sensitive to orientation than to color crowding, that feature will also have a stronger effect in the conjunction flanker trials - which is then reflected in the similarity of CS values and feature-specific accuracy between the conjunction and orientation flanker conditions (as was the case here). Although the current observations are drawn from a discrete set of stimuli (and consequently feature values), pooling models have successfully explained several crowding effects (Keshvari and Rosenholtz, 2016), demonstrating the robustness of the approach.
Drawing upon a pooling model framework, we hypothesized that conjunction search displays would inherently increase the variance of the local image statistics (when compared to single-feature search), and consequently lead to a more pronounced effect of crowding on search performance. In the search task, we found reliable effects of both eccentricity and set size on search RTs and number of eye movements (Carrasco et al., 1995; Scialfa and Joffe, 1998), and used these measures to correlate search efficiency with participants' CS values, across flanker types. The results showed positive correlations between these measures - participants with higher sensitivity to crowding were less efficient searchers - in line with other empirical studies that have linked peripheral discriminability with search efficacy (albeit across stimulus types rather than individuals; Gheri et al., 2007; Rosenholtz et al., 2012; Sayim et al., 2011; Vlaskamp and Hooge, 2006; Wertheim et al., 2006; Zhang et al., 2015). These studies have generally shown that targets that are hard to discriminate at far eccentricities are also hard to find in search. For example, Rosenholtz et al. (2012) showed that performance on a set of classical feature and conjunction search paradigms (e.g., tilted among vertical, and orientation-contrast conjunction) and the peripheral discriminability of target-present vs. target-absent image patches can both be predicted by the discriminability of statistically transformed stimuli (referred to as "mongrels") in a free-viewing task. Zhang et al. (2015) used a similar approach for items with more complex feature configurations (specifically, shaded 3D cubes vs. matching 2D patterns). They found that differences in search efficiency (previously attributed to preattentive processing of 3D shape and lighting; Enns and Rensink, 1990) strongly correlate with the peripheral discriminability of crowded target-present vs. target-absent patches, and once again showed that the information available in peripheral vision provides a critical limit on visual search. Our results reinforce this converging evidence by providing a direct empirical link between classical measures of crowding and conjunction search across individuals. Another relevant finding in this study is the strong correlation between number of eye movements and manual RTs (Zelinsky and Sheinberg, 1995, 1997) - a further sign that when search is difficult, eye movements are a required serial solution that brings potential target candidates within a discernible distance (Hulleman and Olivers, 2017). These RT-fixation correlation values themselves correlated with the individual CS values, indicating that the participants who were most limited in their peripheral vision also relied more strongly on eye movements. Yet, contrary to expectations, the relationship between CS values and conjunction search outcomes was ultimately only of relatively modest strength - no stronger than those obtained for single-feature search (Veríssimo et al., 2021).
We believe that a crucial difference may reside in the extent to which search is predominantly stimulus-driven, versus driven in a top-down manner on the basis of spatial and/or feature-based scanning strategies. Note that during conjunction search, eye movements were strongly biased towards items that shared the target color. This strategic scanning behavior is in line with previous studies showing that eye movements can be guided by information regarding the target (Eckstein et al., 2007; Williams, 1966; Wolfe, 1994; Wolfe et al., 1989; Wolfe and Gray, 2007; Zelinsky, 2008), and that color information plays a dominant role in selecting the next candidate item for inspection, over and above other features such as orientation or size (Alexander et al., 2019; Egeth et al., 1984; Hannus et al., 2006; Kaptein et al., 1995; Motter and Belky, 1998; Rutishauser and Koch, 2007). For example, Kaptein et al. (1995) independently varied the set sizes of distractors that shared the target color and of distractors that shared the target orientation. They found that search RTs were almost solely driven by the former and not the latter, suggesting that search was confined to the color-matching set. Color selectivity also played a major role in our results. Participants confined search to distractors that shared the same color as the target, and within this functional set size (cf. Neider and Zelinsky, 2008) then probably discriminated the orientation largely foveally, thus by its very nature eliminating any eccentricity effects on orientation discrimination. Moreover, in our exploratory analysis we found no correlation between the ratio of color-selective fixations and crowding sensitivity. This suggests that color selectivity itself is not determined by peripheral constraints, as observers simply scan from one nearby item to the next.
It could be argued that, in our experiment, color was the easier feature to discriminate (as we did not a priori match orientation contrast and color contrast for difficulty), and that this is what propelled the strong color selectivity during search. Interestingly, color selectivity did not lead to behavioral gains, as the most color-selective participants were also the least efficient searchers. Previous studies have demonstrated that observers do not consistently make eye movements in a manner that maximizes information gain (Nowakowska et al., 2017), and that suboptimal strategies may be linked to individuals who are "effort minimizers" (that is, who reduce cognitive load by, for example, minimizing target switching; Irons and Leber, 2018). Interestingly, recent work has shown that participants' search strategies are stable over time, but context-specific (Clarke et al., 2022). Our results are in line with these findings - individuals who relied on color guidance performed worse in the conjunction search task, while presumably the participants who were more flexible in their search strategy (alternating between color and orientation selection) obtained better outcomes. Thus, even if color was easier to discriminate, that cannot in and of itself account for our results. Moreover, in studies where color and orientation differences were matched for discrimination performance, observers still showed a strong bias for color (Hannus et al., 2006). Another example comes from Alexander et al. (2019), who used displays consisting of real-world objects and asked observers to look for a target that looked like a previewed example, but that could differ in shape or orientation. Their results showed that search was less efficient when targets were dissimilar from the example, but only when objects were shown in greyscale (search did not suffer when the objects were shown in color). This suggests that shape and orientation only guide visual search when color is not sufficiently informative. What differentiates color from other feature dimensions? It could be due to its multidimensional properties, which allow for a more robust perception of color in the real world (for a review of color constancy, see Witzel and Gegenfurtner, 2018). Color has an exceptional role in mediating the relationship between an organism and its environment (e.g., to determine if a fruit is ripe or when an animal might attack; Cuthill et al., 2017), which might have subsequently led to the development of specialized mechanisms that facilitate color perception in the periphery (Johnson, 1986; Nuthmann and Malcolm, 2016). Further work might tell us more about the special status of color.
It is important to emphasize that our findings do not mean that the role of crowding in conjunction search is minimal. A number of other studies have shown strong and robust correlations between peripheral discriminability and search performance, including when the target is defined by a conjunction of features (Rosenholtz et al., 2012; Zhang et al., 2015). However, these studies demonstrated this relationship across stimulus types, while our experiment focused on differences across individuals. We believe that the substantial differences in stimuli that have been used in past research will always be a much stronger predictor of performance than any subtle differences between the perceptual systems of healthy young individuals. At the same time, those observers can still differ in their choice of behavioural strategies - feature prioritization, spatial scan paths (for example, starting at the top left of the display) - and in their ability to implement them, which may then overshadow such subtle relationships to crowding. Paradoxically, the impact of such top-down feature-based and spatial strategies would mean that search for a conjunction of features (although harder to discriminate as such than a single feature) is relatively less impacted by crowding than feature search, where selection is likely to be more directly driven by the strength of the feature contrast.
Returning to our initial scenario - I'm looking for my keys. Have you seen them? - we realize that some people will find themselves in this situation more often than others. The ability to predict who will be a good searcher depends on the information that is available from sensory input, as well as on individual strategy. We conclude that, at the level of individual differences, top-down feature-based modulations help to further shape the functional viewing field (Wolfe, 2021) in addition to sensory effects such as crowding.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Schematic illustration of the experimental procedure (stimuli not drawn to scale). (A) Crowding task. A target Gabor patch and four flankers were shown for 75 ms, after which participants had up to 4000 ms to indicate the conjunction of features that identified the target (color and orientation). Flanker-present trials could be of three different types: color, orientation, or conjunction crowding. Target-flanker distance was staircased per crowding type. (B) Visual search task. Displays consisted of a target (a Gabor patch composed of a conjunction of features, cued at the start of each block) surrounded by distractors, making a total of 7, 16, or 31 items on screen. The target could be located at one of three possible eccentricities (4, 8, and 12 dva). Participants had to indicate which side of the target presented a small white dot.

Fig. 2. Crowding results. (A) Percentage decrease in feature accuracy, for target orientation and color, across crowding conditions. Values represent the mean performance difference for flanker-present trials relative to flanker-absent trials. Error bars in grey indicate the standard error of the mean across participants. (B) Critical spacing values for the different flanker types. Individual participant values are connected by grey lines, overlaid by the mean value across participants (error bar shows 95 % confidence interval). Asterisks indicate the significance levels for the pairwise comparisons (p < 0.001).

Fig. 3. Pairwise correlations of CS values from the different flanker types. Left panel - color vs. orientation. Middle panel - color vs. conjunction. Right panel - conjunction vs. orientation. Regression line in grey (shading indicating 95 % confidence interval), with Spearman correlation and corresponding p-value annotated in text. Dotted line illustrates the equality line. Histograms show the percentage of CS values in each bin (bar heights sum to 100).

Fig. 4. Visual search results. Boxplots and kernel density estimate plots show the distribution of the observations in the dataset (individual points show participant values). (A) Mean search RTs, as a function of set size and eccentricity. (B) Mean number of fixations, as a function of set size and eccentricity.

Fig. 5. (A) Top row - scatter plots showing the relationship between crowding CS and average search RT/set size slopes, for each crowding type. Regression line in grey (shading indicating 95 % confidence interval), with Spearman correlation coefficient annotated in text. Bottom row - histogram showing the distribution of the permuted correlation coefficients. The black dashed line indicates where the observed coefficient lies within this distribution. Permutation p-value annotated in text. (B) Top row - scatter plots showing the relationship between crowding CS and average number of fixations/set size slopes, for each crowding type. Regression line in grey (shading indicating 95 % confidence interval), with Spearman correlation coefficient annotated in text. Bottom row - histogram showing the distribution of the permuted correlation coefficients. The black dashed line indicates where the observed coefficient lies within this distribution. Permutation p-value annotated in text.

Fig. 6. Top row - scatter plots showing the relationship between each crowding CS type and the search RT-number of fixations correlation coefficients, across participants. Regression line in grey (shading indicating 95 % confidence interval), with Spearman correlation annotated in text. Bottom row - histogram showing the distribution of the permuted correlation coefficients. The black dashed line indicates where the observed coefficient lies within this distribution. Permutation p-value annotated in text.

Fig. 7. Ratio of color-selective fixations in the visual search task, per eccentricity and set size. Boxplots and kernel density estimate plots show the distribution of the observations in the dataset (individual points show participant values). Grey dotted line indicates the 50 % level.

Fig. 8. Top row - scatter plots showing the relationship between the ratio of color-selective fixations in the visual search task and average RT/set size slopes, per target eccentricity. Regression line in grey (shading indicating 95 % confidence interval), with Spearman correlation coefficient annotated in text. Bottom row - histogram showing the distribution of the permuted correlation coefficients. The black dashed line indicates where the observed coefficient lies within this distribution. Permutation p-value annotated in text.