In the Thatcher illusion, faces appear grotesque as a consequence of inverting the eyes and mouth in an otherwise upright face (Bartlett & Searcy, 1993; Cornes, Donnelly, Godwin, & Wenger, 2011; Thompson, 1980). However, the grotesque appearance disappears when Thatcherised faces are inverted. Given that the Thatcher illusion is affected by inversion, the Thatcherisation of features is thought to affect relational (or configural) information within the faces (e.g., Boutsen, Humphreys, Praamstra, & Warbrick, 2006; Donnelly & Hadwin, 2003; Leder & Bruce, 2000; Lewis, 2001; Maurer, Grand, & Mondloch, 2002; Stürzel & Spillmann, 2000). Typically, the explanation of the Thatcher illusion is couched in terms of the dual-mode hypothesis (Bartlett & Searcy, 1993), which states that faces are processed in terms of features and of the relationships between features (configurations), but with the processing of configurations restricted to upright faces.

The aim of the present research was to test whether the grotesqueness of Thatcherised faces is subject to a form of configural or face superiority effect (Homa, Haver, & Schwartz, 1976; Pomerantz, Sager, & Stoever, 1977). In relation to the Thatcher illusion, the issue of configural superiority can be explored by measuring the processing capacity for detecting the oddity signalled by the inversion of eyes and mouths. Processing capacity reflects the resources available to complete a task at a given time. It is calculated from the relationship between the amounts of work that can be achieved in the presence of different amounts of information (e.g., both eyes and mouth inverted vs. either eyes or mouth inverted). Importantly, measures of processing capacity provide an assessment of performance (work achieved) that goes beyond that which can be inferred from simple differences in average response time (RT) data across conditions. Most importantly, processing capacity can be determined at all time points, rather than being based on a comparison of measures of central tendency.

Three levels of processing capacity relate to the effect on performance when variations in the amount of information (i.e., workload) occur (see Townsend & Nozawa, 1995). Limited capacity is signified by a decrement in performance as workload increases. Unlimited capacity occurs when there is no change in performance as workload increases. Finally, in supercapacity processing, positive interactions between the channels cause performance to improve under increased workload (Wenger & Townsend, 2001).

Given that supercapacity is required for configural processing (see Blaha, 2004, and Blaha & Townsend, 2006, for changes in capacity with configural learning; see Wenger & Townsend, 2001, for the relationship between capacity and configural, or Gestalt, processing), we hypothesised that supercapacity processing should be observed in detecting oddity in Thatcherised faces. With respect to this hypothesis, it is important to be clear how configural processing is causal in the experiencing of the Thatcher illusion. There are two ways in which the grotesqueness in the Thatcher illusion may be related to configurality. First, inversion of the eyes and mouth might create highly atypical configurations, and such atypicality causes the experience of grotesqueness. Second, the feature inversion of eyes and mouths may impede the ability to encode the eyes and mouths within facial configurations that are otherwise fluently processed. In this second hypothesis, Thatcherisation is associated with local difficulties in configural processing. Nevertheless, and according to either version, the configurations computed for upright faces act as carriers for signals of oddity (grotesqueness). Evidence of supercapacity in oddity detection would reveal a form of configural superiority for the detection of oddity from emergent features in Thatcher faces.

Here we manipulated the number of target (inverted) features, to create partial Thatcher faces, in which either eyes or mouths were inverted, as well as full-Thatcher faces, in which both eyes and mouths were inverted. Changes in the number of inverted features are known to correspond to perceived grotesqueness, with grotesqueness ratings increasing as the number of inverted features increases from one to two (Cornes et al., 2011). By manipulating the number of features inverted, we used the redundant-targets paradigm (e.g., Miller, 1982; Townsend & Wenger, 2004) to determine the processing capacity for Thatcherised faces.

To the extent that the Thatcher illusion is processed configurally, then interaction or dependencies between the eyes and the mouth should allow more work to be achieved in the full than in the partial Thatcher conditions. We therefore hypothesised that we would find supercapacity processing for detecting oddity when both eyes and mouths are inverted relative to when only eyes or only mouths are inverted. Failure to find supercapacity would bring into question the presence of configural processing in generating the illusion. Moreover, given the status of the Thatcher illusion as the prime exemplar of the effect of configurality on face processing, any evidence that it does not result from configural processing would require us to ask fundamental questions about configural face processing in general.

In the redundant-targets paradigm, performance is compared in three conditions: A (here defined as inverted eyes), B (here defined as inverted mouths), and A plus B (here defined as inverted eyes and mouths). First, RTs are compared in order to determine whether there is a gain for the redundant-target condition as compared with the single-target conditions. If such a redundancy gain is found, the critical issue is whether this gain arises from supercapacity processing. Supercapacity arises when the work done in the A-plus-B condition is greater than the sum of the work done in the A and B conditions. The work done is calculated from the RT distributions, as explained later. In examining this issue, we used a number of measures: the proportional-hazards model (Allison, 2010; Cox, 1972), the capacity coefficient (Townsend & Nozawa, 1995), the Miller inequality (Miller, 1982), and the Grice inequality (Grice, Canham, & Boroughs, 1984; see Townsend & Wenger, 2004, for a discussion of the relations among these measures). All four measures were used to provide converging evidence for or against the hypothesis that supercapacity processing of full Thatcherisation (eyes and mouth inverted) will occur relative to partial Thatcherisation (either the eyes or the mouth inverted). Average RTs alone cannot be used to calculate capacity, and therefore cannot address the question of configurality in the detection of Thatcherisation.

Processing efficiency across time: methods of analysis

The formulae for calculating the capacity coefficient, the Miller inequality, and the Grice inequality are presented in the following sections, together with discussion of possible outcomes.

Hazard ratios

The hazard function provides a measure of the probability of making a response at the next time point, given that the task is not yet completed (Wenger & Townsend, 2000). Formally, it is the probability that a response is made at a given time point (the probability density function of task completion over time) divided by the probability that a response has not yet been made up to and including that time point (the survivor function; note 1). The hazard function can be used as a measure of the amount of work that can be achieved in a given unit of time, and as such lends itself to measuring the processing capacity of a system—that is, the relationship between amounts of work achieved under different conditions (Townsend & Ashby, 1978; Wenger & Gibson, 2004).
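As a minimal illustration (not the analysis code used in the study), the empirical survivor function and a discrete-time hazard estimate can be computed directly from a sample of RTs; the RT values below are hypothetical:

```python
def survivor(rts, t):
    """Empirical survivor function S(t): proportion of RTs greater than t."""
    return sum(rt > t for rt in rts) / len(rts)

def hazard(rts, t, dt=25.0):
    """Discrete-time hazard over [t, t + dt): the probability of responding
    in the next bin, given that no response has occurred by time t."""
    s_t = survivor(rts, t)
    if s_t == 0:
        return 0.0
    density = (s_t - survivor(rts, t + dt)) / dt  # estimated density over the bin
    return density / s_t

rts = [420, 455, 470, 500, 510, 540, 600, 650]  # hypothetical RTs in ms
print(survivor(rts, 500))  # 0.5: half of the responses are slower than 500 ms
```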

The proportional-hazards model (see Allison, 2010; Cox, 1972; Wenger & Gibson, 2004) can be used to determine the ordering of hazard functions, and as such allows a test of whether work is achieved earlier in the full-Thatcher condition than in the partial-Thatcher conditions by calculating the ratio of the hazard function for the full-Thatcher condition to that for the partial-Thatcher conditions. Because this measure relies on the relative ordering of the hazard functions, rather than on precise values, it is more robust than alternative methods when there are relatively few data points (Allison, 2010; Wenger & Gibson, 2004).

An alternative method for measuring capacity using the hazard function is the capacity coefficient (Townsend & Nozawa, 1995). This measure is calculated using Eq. 1, where t is a given time point, $H_F(t)$ is the integrated hazard function for the full-Thatcher condition, and $H_{P1}(t)$ and $H_{P2}(t)$ are the integrated hazard functions for the two partial-Thatcher conditions.

$$ C(t) = \frac{H_F(t)}{H_{P1}(t) + H_{P2}(t)} $$
(1)

Given that the hazard function provides a measure of the amount of work done in a given unit of time, its integral provides a cumulative measure of work done (Townsend & Ashby, 1978; see also Wenger & Townsend, 2000). This cumulative hazard function is calculated as the negative log of the survivor function. When the work done at a given time point is less for the full-Thatcher condition, $H_F(t)$, than for the sum of both partial-Thatcher conditions, $H_{P1}(t) + H_{P2}(t)$, capacity is limited, in that only a certain amount of information can be processed, and any extra information does not cause more work to be achieved. Limited capacity therefore occurs when $H_F(t) < H_{P1}(t) + H_{P2}(t)$, such that C(t) < 1. Unlimited capacity is defined as when $H_F(t) = H_{P1}(t) + H_{P2}(t)$, such that C(t) = 1. Supercapacity is characterised by $H_F(t) > H_{P1}(t) + H_{P2}(t)$, such that C(t) > 1, demonstrating that the work achieved in the full-Thatcher condition is greater than the sum of the work achieved with the partial-Thatcher changes.
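Equation 1 can be sketched from these definitions using the identity H(t) = −ln S(t); the RT samples below are hypothetical, and this is an illustrative sketch rather than the authors' implementation:

```python
import math

def survivor(rts, t):
    """Empirical survivor function S(t)."""
    return sum(rt > t for rt in rts) / len(rts)

def cum_hazard(rts, t):
    """Integrated hazard H(t) = -ln S(t)."""
    s = survivor(rts, t)
    return -math.log(s) if s > 0 else float('inf')

def capacity_coefficient(full, p1, p2, t):
    """C(t) = H_F(t) / (H_P1(t) + H_P2(t)).
    C(t) > 1 indicates supercapacity; C(t) < 1, limited capacity."""
    denom = cum_hazard(p1, t) + cum_hazard(p2, t)
    return cum_hazard(full, t) / denom if denom > 0 else float('nan')

# Hypothetical RTs: more work is done early in the "full" condition
full = [400, 400, 400, 600]
p1 = [400, 600, 600, 600]
p2 = [400, 600, 600, 600]
print(capacity_coefficient(full, p1, p2, 500) > 1)  # True: supercapacity
```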

The Miller inequality (Ingvalson & Wenger, 2005; Miller, 1982; Townsend & Nozawa, 1995)

$$ S_F(t) - S_{P1}(t) - S_{P2}(t) + 1 \geqslant 0 $$
(2)

The Miller inequality is given in Eq. 2, where t is a given time point, $S_F(t)$ is the survivor function for the full-Thatcher condition, $S_{P1}(t)$ is the survivor function for the first partial-Thatcher condition, and $S_{P2}(t)$ is the survivor function for the second partial-Thatcher condition.

A violation of the Miller (1982) inequality implies supercapacity. If $S_F(t) = S_{P1}(t) + S_{P2}(t)$, it can be clearly seen that the Miller inequality will not be violated (the left-hand side [LHS] will be 1). Similarly, the inequality will not be violated when $S_F(t) > S_{P1}(t) + S_{P2}(t)$ (the LHS will be greater than 1). A violation of the Miller inequality will occur when $S_F(t) + 1 < S_{P1}(t) + S_{P2}(t)$. That is, at time t, the probability that a response has not occurred for the full-Thatcher condition is lower than the summed probabilities for the partial-Thatcher conditions, and this difference is at least 1 (note 2). In other words, responses occur faster for the full-Thatcher condition than they do for either of the partial-Thatcher conditions. Given that the Miller inequality is a more conservative test of supercapacity than the capacity coefficient, a violation is evidence for extreme supercapacity (see Ingvalson & Wenger, 2005, p. 19; Townsend & Wenger, 2004).
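The LHS of Eq. 2 can be evaluated pointwise from the empirical survivor functions, with a negative value at any time point counting as a violation. A sketch with hypothetical RTs:

```python
def survivor(rts, t):
    """Empirical survivor function S(t)."""
    return sum(rt > t for rt in rts) / len(rts)

def miller_lhs(full, p1, p2, t):
    """LHS of the Miller inequality; values below zero are violations,
    implying extreme supercapacity."""
    return survivor(full, t) - survivor(p1, t) - survivor(p2, t) + 1

# Hypothetical extreme case: at t = 500 ms, all full-condition responses
# have occurred, but no partial-condition responses have
full = [300, 310, 320, 330]
partial = [700, 710, 720, 730]
print(miller_lhs(full, partial, partial, 500))  # -1.0: a violation
```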

The Grice inequality (Grice et al., 1984; Ingvalson & Wenger, 2005; Townsend & Nozawa, 1995)

The Grice inequality (Eq. 3) is computed by subtracting the survivor function for the full-Thatcher condition from the survivor function of either the eye condition or the mouth condition, depending on which of these values is smaller:

$$ \min\left[ S_{P1}(t), S_{P2}(t) \right] - S_F(t) \geqslant 0. $$
(3)

As well as measuring processing capacity, the Grice inequality can also be used to determine a redundancy gain in a finer-grained way than can be achieved with mean RTs.

A violation of the Grice inequality implies limited capacity. If $S_F(t) > S_{P1}(t)$ or $S_F(t) > S_{P2}(t)$, then the Grice inequality is violated (the LHS is negative). In other words, a violation occurs at time t if the probability of not having responded is lower in one of the partial-Thatcher conditions than in the full-Thatcher condition. Put simply, a violation occurs when responses are faster in at least one of the partial-Thatcher conditions than in the full-Thatcher condition. A violation therefore provides evidence for limited capacity, in which there is no capacity for processing the extra information in the full-Thatcher condition, and responses are actually slower than in the faster of the two partial-Thatcher conditions. Because a violation requires responses in the full-Thatcher condition to be slower, it constitutes evidence for extreme limited capacity (see also note 8 of Ingvalson & Wenger, 2005; Townsend & Wenger, 2004): The full-Thatcher condition is actually less helpful than at least one of the partial-Thatcher conditions.

Unlike the capacity coefficient and the Miller inequality, the Grice inequality is not reliant on a comparison between the full-Thatcher and both partial-Thatcher conditions. Rather, it compares the full-Thatcher condition with the faster of the partial-Thatcher conditions, regardless of whether one partial-Thatcher condition was relatively fast and the other relatively slow.
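In the same illustrative style as above (hypothetical RTs, not the study's code), the LHS of Eq. 3 compares the full-Thatcher survivor function against the smaller of the two partial-Thatcher survivor functions:

```python
def survivor(rts, t):
    """Empirical survivor function S(t)."""
    return sum(rt > t for rt in rts) / len(rts)

def grice_lhs(full, p1, p2, t):
    """LHS of the Grice inequality; values below zero are violations,
    implying extreme limited capacity."""
    return min(survivor(p1, t), survivor(p2, t)) - survivor(full, t)

# Hypothetical extreme case: at t = 500 ms, full-condition responses are
# slower than those in either partial condition
full = [700, 710, 720, 730]
p1 = [300, 310, 320, 330]
p2 = [400, 410, 420, 430]
print(grice_lhs(full, p1, p2, 500))  # -1.0: a violation
```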

Experiment 1

The aim of Experiment 1 was to test whether inverted mouths and eyes, presented in an upright face context, produce supercapacity processing, or whether responses to full-Thatcher faces are equivalent to the combination of responses to inverted eyes and inverted mouths in partial-Thatcher faces. Inverted faces were not tested, as we did not hypothesise supercapacity processing for inverted faces.

Method

Participants

A group of 12 undergraduate students participated in Experiment 1 for course credit. The participants had a mean age of 20.5 years (SD = 3.6). Two of the participants were male, ten were right-handed, and all had self-reported normal or corrected-to-normal vision.

Apparatus

The experiment was conducted using custom E-Prime software (Psychology Software Tools, Inc., Sharpsburg, PA), with stimuli displayed on a 15-in. monitor set to a refresh rate of 60 Hz and a resolution of 1,280 × 1,024. Responses were made using the left and right mouse buttons, with the buttons counterbalanced across participants. The participants were seated approximately 57 cm from the screen.

Stimuli

Sixteen greyscale female faces were obtained from the Stirling Picture Database. These 16 faces were used to create four sets of stimuli: no stimulus manipulation (typical faces), only the eyes or only the mouths inverted (partial-Thatcher faces), and both the eyes and mouths inverted (full-Thatcher faces) (see Fig. 1; note 3). In total, there were 64 face stimuli. The faces were manipulated in Adobe Photoshop, with the blur tool used to remove high-contrast edges caused by manipulating the images. Whole images were blurred using Gaussian blur with a radius of one pixel. The stimuli subtended 6.7° × 8.3° of visual angle, with the face occupying 4.1° × 6.3° within the image, and, perhaps most critically for the task, the distance between the top of the eyebrows and the bottom of the mouth was 2.9° (see Fig. 1).

Fig. 1

Example stimuli from Experiments 1 and 2: a typical face (top left), eye partial-Thatcher (top right), mouth partial-Thatcher (bottom left), and full-Thatcher (bottom right). Note that the face image shown is not one of the stimuli used in the experiment, but an illustrative example of the Thatcher illusion as instantiated in this study

In order to address the hypothesis that detection of the Thatcherised features in upright faces would be marked by supercapacity processing, only upright faces were presented.

Procedure

Participants were presented with a single typical, partial-Thatcher, or full-Thatcher face. The participants were asked to decide whether the face was “odd” or “normal” as quickly and accurately as possible. The stimulus was displayed until response. Given the nature of the response (“odd” or “normal”), the task was to detect oddity, or grotesqueness, within the stimulus.

The experiment comprised 768 trials, divided into four blocks. Each Thatcherised face was presented twice in each block, to give 32 eye, 32 mouth, and 32 eye-and-mouth trials. Each unchanged face was presented six times in each block to give 96 typical trials. In total, each participant viewed 128 eye, 128 mouth, and 128 eye-and-mouth Thatcher trials, and 384 typical trials. The participants had a short, self-paced break between blocks. Face identity and type of manipulation were randomised within all blocks.

The primary aim in deciding the frequency of each trial type was to equate the number of “odd” and “normal” responses, such that the probability of finding an inversion in any given face was .5. However, the trial proportions chosen may increase the chance of finding a redundancy gain by artificially inhibiting responses to one of the partial-Thatcher faces. Such inhibition can occur for two reasons. Firstly, given no inversion for one feature (e.g., the eyes), the probability of an inversion for the other feature is .25 (e.g., 32 mouth-inversion trials vs. 96 typical trials). This probability is lower than the overall probability of an inversion (.5), and will therefore inhibit responses to single-inversion trials (e.g., responses to mouth-inversion trials will be slower than those to full-Thatcher trials) (Mordkoff & Egeth, 1993; Mordkoff & Yantis, 1991). It is worth noting that the same contingencies are not implied by the presence of an inversion: Inversion of one feature (e.g., the eyes) is not predictive of the inversion of the other feature (e.g., the mouth), because given an eye inversion, the proportion of trials with a mouth inversion is .5. Secondly, the RTs for the single-inversion trials can be artificially slowed if the participant preferentially attends to one location or another (Mullin, Egeth, & Mordkoff, 1988). For example, if the participant attends to the eye location, an eye inversion would result in faster responses than would a mouth inversion. This difference would not affect the responses to full-Thatcher trials but would slow the mouth-inversion trials relative to the other trial types, causing an overall slowing for single-inversion trials. The participants were not informed about these probabilities of co-occurrence.
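The contingencies described above follow directly from the per-block trial counts; as a quick check of the conditional probabilities:

```python
# Per-block trial counts in Experiment 1
counts = {'eye': 32, 'mouth': 32, 'both': 32, 'typical': 96}
total = sum(counts.values())  # 192 trials per block

# Overall probability that a face contains at least one inversion
p_inversion = (counts['eye'] + counts['mouth'] + counts['both']) / total

# Given no eye inversion (mouth-only or typical trials), the probability
# of a mouth inversion drops below the base rate
p_mouth_given_no_eye = counts['mouth'] / (counts['mouth'] + counts['typical'])

# Given an eye inversion (eye-only or both-inverted trials), a mouth
# inversion remains at the base rate
p_mouth_given_eye = counts['both'] / (counts['eye'] + counts['both'])

print(p_inversion, p_mouth_given_no_eye, p_mouth_given_eye)  # 0.5 0.25 0.5
```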

Results

Before assessing processing efficiency, analyses were conducted on the performance data (RT and accuracy) in order to establish whether a redundancy gain occurred for detecting grotesqueness in the full-Thatcher condition as compared with the partial-Thatcher conditions. These analyses were conducted both for individual participants and over all participants. Secondly, given a redundancy gain, the measures of processing efficiency were calculated in order to establish the capacity levels underlying the redundancy gain. All of the analyses were conducted only on Thatcher trials, in which a feature inversion had occurred, in order to compare between the partial- and full-Thatcher conditions. All post-hoc t tests were Bonferroni corrected by adjusting the p value.

Redundancy gain: analyses of group mean and individual RTs

To test for an overall redundancy gain, the effect of the number of feature changes (partial vs. full Thatcher) on RTs and error rates was first examined at the group level. In order to minimise positive skew in the RT data, and hence to eliminate the need for outlier trimming, reciprocal RTs were analysed (Ratcliff, 1993) using the mean reciprocal RT over correct trials. Nontransformed mean RTs were also analysed (see note 5 for the results). The error data were transformed using the arcsine transformation, arcsin(√x), where x is the proportion of errors. This transformation was required to normalise the binomially distributed accuracy data.
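The two transforms can be sketched as follows; note that the error rate must be expressed as a proportion in [0, 1] for the arcsine transform to be defined:

```python
import math

def reciprocal_rt(rt_ms):
    """Reciprocal transform of an RT (in ms), used to reduce positive
    skew in RT distributions (Ratcliff, 1993)."""
    return 1.0 / rt_ms

def arcsine_error(x):
    """arcsin(sqrt(x)) transform for an error proportion x in [0, 1],
    used to normalise binomially distributed accuracy data."""
    return math.asin(math.sqrt(x))

print(reciprocal_rt(500))             # 0.002
print(round(arcsine_error(0.25), 4))  # 0.5236 (i.e., pi/6)
```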

One-way repeated measures ANOVAs revealed a significant effect of feature change for RTs and error rates, Fs(2, 22) = 22.03 and 35.97, both ps < .01, partial eta-squared (ηp²) = .67 and .77, respectively (see Fig. 2). Follow-up t tests revealed a redundancy gain, with faster responses and higher accuracy in the full-Thatcher condition than in both partial-Thatcher conditions [all ts(11) > 6.36, ps < .01]. There was no significant difference between the eye and mouth conditions for RTs or errors [ts(11) = 2.26 and 2.68, ps = .14 and .07, respectively] (note 4).

Fig. 2

Mean response times (RTs, in milliseconds) and error rates (percentages) from Experiment 1 (black) and Experiment 2 (grey). Error bars represent one standard error

As the capacity measures will be examined at the level of the individual, each participant’s RT data were also analysed separately to test for a redundancy gain for each individual. For each participant, a separate one-way independent-samples ANOVA (note 5) was conducted for the Feature Change factor, with three levels (eye and mouth vs. eye vs. mouth) and with up to 128 trials in each condition. The effect of feature change was significant for ten of the 12 participants [Fs ranging from F(2, 325) to F(2, 371), all Fs > 3.5, ps < .04, ηp² = .02 to .20], approached significance for one participant [F(2, 322) = 2.60, p = .08, ηp² = .02], and was not significant for the last participant [F(2, 302) = 1.32, p = .27, ηp² = .01]. Follow-up t tests were conducted on the ten participants showing a significant effect of feature change. For five of the participants, RTs were faster in the full-Thatcher condition than in one of the partial-Thatcher conditions (all ps < .05). For the remaining five participants, RTs were faster in the full-Thatcher condition than in both of the partial-Thatcher conditions (all ps < .05), demonstrating a redundancy gain.

Processing efficiency across time

The RT analyses revealed a significant redundancy gain at the level of the group, and at the level of the individual for five participants. We wished to determine whether or not this redundancy gain arose due to supercapacity processing. Mean-RT analyses alone cannot be used to answer questions regarding processing capacity, particularly when capacity changes over time, and therefore the measures of processing capacity defined earlier were employed.

The three panels in Fig. 3 show the mean values for the capacity coefficients, Miller inequalities, and Grice inequalities calculated across groups (those exhibiting a redundancy gain vs. those not), and Fig. 4 presents these measures for all 12 participants individually. Given that the three methods are nonparametric, no transformation of RTs was required before analysis. RTs for incorrect trials were removed before analysis, and for each incorrect-trial RT, the most similar correct-trial RT was also removed from the same condition. This removal of twin RTs is standard practice and is necessary because there were two possible responses (odd and normal), so for every incorrect trial that could be considered a guess, there was also a correct trial that could be considered a guess (Eriksen, 1988). The survivor functions were calculated using the SAS LIFETEST procedure, with the life-table method and a width (bin size) of 25 ms (see Allison, 2010; note 6).
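The twin-removal step can be sketched as below. This is an illustrative implementation of the procedure described above, in which "most similar" is assumed to mean the closest correct-trial RT; it is not the code used in the study:

```python
def remove_twins(correct_rts, error_rts):
    """For each incorrect-trial RT, remove the closest correct-trial RT
    from the same condition (its 'twin'), balancing correct and
    incorrect guesses (Eriksen, 1988)."""
    remaining = list(correct_rts)
    for err in error_rts:
        if not remaining:
            break
        twin = min(remaining, key=lambda rt: abs(rt - err))
        remaining.remove(twin)
    return remaining

print(remove_twins([400, 500, 600], [610]))  # [400, 500]: 600 ms was the twin
```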

Fig. 3

Processing efficiency data from Experiment 1, presented at the level of the group. Black lines represent the mean values for the group of participants exhibiting a redundancy gain, and grey lines represent the mean values for the group who did not. The capacity coefficient means were calculated by lining up the earliest response of each participant before averaging across relative time bins

Fig. 4

Processing efficiency data from Experiment 1. Each line represents the efficiency measure for one participant over time. Black lines represent values from those participants who exhibited a redundancy gain, and grey lines represent values from those who did not

For the group of participants exhibiting a redundancy gain (n = 5/12), overall capacity measures at the level of the group (Fig. 3) showed supercapacity at early time points (capacity coefficient > 1, violation of the Miller inequality), as well as a little evidence for extreme limited capacity at late time points (violation of the Grice inequality). For the group of participants not exhibiting a redundancy gain (n = 7/12), there was no evidence of supercapacity (capacity coefficient < 1, one very minor violation of the Miller inequality) or of extreme limited capacity (violations of the Grice inequality). In sum, there was only modest evidence for supercapacity processing in five out of 12 participants, and for most of these the evidence was limited to very early time points.

For completeness, at the level of individuals, there was very little evidence of extreme supercapacity from the Miller inequality (few violations, all values close to or above zero), with the few violations that did occur falling at early time points (< 800 ms). Those participants demonstrating the strongest violations were those who exhibited a redundancy gain (black lines, Fig. 4). There was some evidence of extreme limited capacity from the Grice inequality for some participants, with violations at some time points (values below zero). The participants demonstrating violations were ones who did not exhibit a redundancy gain (grey lines, Fig. 4). From the capacity coefficients, there was some evidence of supercapacity, with values exceeding 1 for the majority of time points for one participant, and at some early time points (< 750 ms) for five participants. All but one of these participants exhibited a redundancy gain (black lines, Fig. 4). For mid-to-late time points, the capacity coefficients were between 0 and 1, indicating limited to unlimited capacity, but no evidence of supercapacity processing.

Discussion

The analysis revealed a redundancy gain at the level of the mean, as well as redundancy gains at the level of the individual for five of the 12 participants. For most participants, the capacity data are inconsistent with interdependencies between features that serve to boost performance. Therefore, the results demonstrate that supercapacity is neither necessary nor automatic for detection of the Thatcher illusion, and some participants were able to complete the task with limited capacity. However, there were some participants, particularly those demonstrating a redundancy gain, for whom there was some evidence of processing reaching supercapacity, with nonadditive processing of eyes and mouths.

The results suggest that the discrimination between Thatcher and typical faces does not rely on supercapacity processing and can occur using a limited to an unlimited capacity system. There is little evidence that the redundancy gain arises from supercapacity, and no evidence that supercapacity is necessary for a redundancy gain, especially at mid-to-late time points. No extra work is necessarily achieved in the full-Thatcher condition, as compared with the total work across both partial-Thatcher conditions. In addition, there is little evidence that the redundancy gain, when present, emerges because of positive interactions between features. These results could be explained by a race to completion of two independently processed features.

In these tasks, our reasoning was that participants sought evidence of inversion (and, hence, oddity). The presence of one inverted feature would require an “odd” response, while the presence of a typical feature would not necessarily indicate a “normal” response. The gain in RTs for the redundant inverted feature (full over partial Thatcher) does indeed suggest that participants were using the inverted feature(s) to conduct the task. However, we should note that there is nothing to prevent participants from assigning the same task relevance to upright as to inverted features. In this case, the presence of both types of information in the partial-Thatcher condition would engage additional resources for processing relative to the full-Thatcher condition. The effect of this would be to suppress the measure of processing capacity. Although this is possible, the account is highly unlikely to hold, as it would require participants to engage in the simultaneous detection of two types of targets. In other types of visual-processing tasks, it is known that performance is compromised when participants are required to, for example, search for and detect two different targets simultaneously (Menneer, Barrett, Phillips, Donnelly, & Cave, 2007).

It might be possible that participants could detect a feature inversion but not experience grotesqueness. If so, the results would not require reference to grotesqueness at all. This is, however, highly unlikely to be the case. Firstly, the phenomenon of grotesqueness in the illusion is experienced automatically (e.g., Bartlett & Searcy, 1993), with a correlation between the timings of activation of the fusiform face area and the amygdala when viewing Thatcherised faces (Donnelly et al., 2011; Rotshtein, Malach, Hadar, Graif, & Hendler, 2001), and secondly, the magnitude of perceived grotesqueness is higher for full- than for partial-Thatcherised faces (Cornes et al., 2011). Therefore, it is highly unlikely that any strategy for perceiving inversion without the experience of grotesqueness could succeed. In addition, our analyses concerned comparisons across full-Thatcher and partial-Thatcher conditions. The presentation of features remains constant across these conditions, and therefore any particular instance of a feature in which grotesqueness was not apparent would be the same across both partial- and full-Thatcher faces.

Experiment 2

There are at least three possible reasons why a redundancy gain could be found in Experiment 1 without strong evidence for supercapacity. The first is that the task of categorising faces as “odd” or “normal” could have encouraged participants to focus on the indicative features (eyes or mouth) to make the categorisation. Secondly, the 2.9° distance between eyebrows and mouth may have encouraged eye movements, which in turn are likely to encourage the selective processing of individual features more than when eye movements are prohibited, particularly because eye movements are frequently made to eyes and mouths in the processing of faces (Smith, Gosselin, & Schyns, 2004; Spezio, Adolphs, Hurley, & Piven, 2007). Such serial processing would allow for self-termination after examination of a single feature, rather than exhaustive processing of both features. Such a change would suggest that configural processing was no longer taking place (Wenger & Townsend, 2001) and would therefore affect the inferences regarding capacity made in Experiment 1. Thirdly, Experiment 1 might not have provided strong evidence for supercapacity if participants used typicality as well as oddity to conduct the task, as we discussed earlier.

In Experiment 2, we explored whether the failure to observe supercapacity in Experiment 1 might have resulted from the reasons described above (categorisation task demands, eye movements, and two sources of information in the partial-Thatcher condition). Using the same stimulus set, we adopted the procedure employed by Ingvalson and Wenger (2005), in which same/different judgements were to be made to successively presented face pairs. The design allowed for “same” and “different” responses to be orthogonal to the presentation of typical and Thatcherised faces. In doing so, Experiment 2 would prevent the use of a single absolute value (odd/normal) to complete the task. While this task could still be terminated on the first feature of the second face, because only one difference was required in order to provide a “different” response, the probe face was presented sufficiently briefly that eye movements could not be made in response to these faces. Nevertheless, eye movements were measured, and data were excluded from trials on which eye movements were made.

Method

Participants

A group of 15 student volunteers was recruited. Three of the participants were excluded because their average error rates across conditions exceeded 25% (Footnote 7). The mean age of the remaining participants was 25.8 years (SD = 6.6). Three of the participants were male, 11 were right-handed, and all had self-reported normal or corrected-to-normal vision.

Apparatus

The stimuli were presented on a 21-in. monitor with a refresh rate of 60 Hz and a resolution of 1,280 × 1,024. Responses were made via a two-button response box, with the buttons counterbalanced across participants. Eye movements were recorded using an SR Research EyeLink 1000 eyetracker and the associated software. Although stimulus viewing was binocular, recordings of eye movements were made from the right eye only. A chin rest and a forehead rest were used, with the participants seated 57 cm from the monitor.

Stimuli

The stimuli were the same as in Experiment 1, except for the following. First, only ten of the 16 faces were used, in order to reduce the number of trials and to prevent the participants from becoming uncomfortable during the eyetracking process; the ten faces were randomly selected from the 16. Second, the faces were presented at a smaller visual angle of 2.1° × 2.6°, giving a maximum vertical distance of 1.0° from the top of the eyebrows to the bottom of the mouth. The smaller stimulus was used in order to allow processing of the face within a single fixation, and thus to minimise serial or piecemeal processing of the eyes and mouth.

Procedure

There were four variants of each face: one typical, two partial-Thatcher faces, and one full Thatcher. Each variant of a face was paired with the three other variants of the same face, giving six pairs. Two randomly selected faces were reserved in order to provide 16 practice trials. For the eight faces used in the experimental trials, this process created 48 pairs of faces matched for identity but varying in structure. The order of presentation of the two faces in each pair was counterbalanced, providing 96 “different” trials, of which 32 were eye change only, 32 were mouth change only (either typical to partial Thatcher or partial Thatcher to full Thatcher), and 32 were double change (typical to full-Thatcher faces or partial-Thatcher eye change to partial-Thatcher mouth change). Each variant was also matched with itself to create 32 “same” pairs, which were each presented three times throughout the experiment, to provide 96 “same” trials. There were a total of 192 experimental trials. The trial frequencies were the same as had been presented in Experiment 1, in that the numbers of “same” and “different” responses were equated, at the expense of artificially increasing the chance of finding a redundancy gain (see the Exp. 1 Procedure).
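As a check on the design arithmetic described above, the pairing scheme can be enumerated directly. This is an illustrative sketch, not part of the experimental software; the variant labels are our own, with each variant represented by the set of features that are Thatcherised:

```python
from itertools import combinations

# Four variants per face identity, described by their inverted features.
VARIANTS = {
    "typical": frozenset(),
    "eyes": frozenset({"eyes"}),
    "mouth": frozenset({"mouth"}),
    "full": frozenset({"eyes", "mouth"}),
}
N_FACES = 8  # 10 faces minus 2 reserved for practice trials

# Six unordered pairings of the four variants per face identity.
pairs = list(combinations(VARIANTS, 2))

# Counterbalancing presentation order doubles each pair:
# 8 faces x 6 pairs x 2 orders = 96 "different" trials.
different_trials = [
    (face, first, second)
    for face in range(N_FACES)
    for a, b in pairs
    for first, second in ((a, b), (b, a))
]

def changed(a, b):
    # Features that differ between the two variants (symmetric difference).
    return VARIANTS[a] ^ VARIANTS[b]

eye_only = sum(changed(a, b) == {"eyes"} for _, a, b in different_trials)
mouth_only = sum(changed(a, b) == {"mouth"} for _, a, b in different_trials)
double = sum(len(changed(a, b)) == 2 for _, a, b in different_trials)

# "Same" trials: each of the 4 variants of the 8 faces paired with
# itself (32 pairs), each presented three times.
same_trials = len(VARIANTS) * N_FACES * 3

print(len(different_trials), eye_only, mouth_only, double)  # 96 32 32 32
print(same_trials, len(different_trials) + same_trials)     # 96 192
```

Note that the double-change count of 32 comprises both typical/full-Thatcher pairs and eye-change/mouth-change partial-Thatcher pairs, matching the breakdown in the text.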

The eyetracker was calibrated within 0.3° accuracy for each participant using a nine-point calibration display. Each trial began with a drift correction dot, which was displayed until the participant fixated it within 0.5°. Calibration was repeated if this level of accuracy was not achieved. After the drift correction, the first face was displayed for 500 ms, followed by a blank screen for 500 ms, then the second face for 150 ms. This timing was chosen in response to piloting, with the aim of avoiding multiple fixations on the second face. During practice trials, the second face was presented for 250 ms to allow the participants to familiarise themselves with the task. In all trials, a response screen of four centrally located question marks was shown until response.

The participants were instructed to decide whether pairs of sequentially presented faces were the same or different. This task therefore differed from that in Experiment 1, in that responses could not depend on the mere detection of grotesqueness. The participants were told to respond as quickly as possible while minimising errors. The data from the practice trials were not recorded. Participants were asked to keep their eyes still when viewing the stimulus. If more than one fixation was made on the second face, the trial and data were discarded, which resulted in two trials being discarded from the analysis across all participants.

Results

The data were analysed as in Experiment 1 unless otherwise stated. All analyses were conducted only on “different” trials that contained a typical face, such that a double feature change (typical paired with full Thatcher) exhibited a greater change in grotesqueness than did a single feature change (typical paired with partial Thatcher). A double feature change also occurred between a Thatcherised-eye face and a Thatcherised-mouth face, but that difference would not represent a change in the total amount of grotesqueness. For completeness, the data for all types of trial are presented in Fig. 5.

Fig. 5

Mean response times (RTs, in milliseconds) and error rates (percentages) from Experiment 2 for each type of trial. Error bars represent one standard error

Trials containing a typical face (i.e., those analysed) could be presented with the typical face appearing first or second. Bonferroni-corrected t tests on reciprocal RTs and arcsine-transformed error rates revealed shorter RTs for both single-change conditions, as well as lower error rates for the mouth change condition, when the typical face was presented first than when it was presented second (all ts > 2.97, ps < .05). There was no effect of face order for the double-change condition, or for the eye change condition on error rates (all ts < 1.88, ps > .26). This analysis shows that there were some differences in difficulty within each type of single-change trial. However, such differences were to be expected, and given that both orders were used for each pair of faces, any effects of order should cancel out across the experiment.
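For reference, the two transforms used above are standard: reciprocal RTs reduce the right skew of latency distributions, and the arcsine-square-root transform stabilises the variance of proportions. A minimal sketch (the values are illustrative, not data from the experiment):

```python
import math

def reciprocal_rt(rt_ms):
    """Reciprocal RT, scaled to responses per second; reduces the
    right skew typical of RT distributions before ANOVA/t tests."""
    return 1000.0 / rt_ms

def arcsine_error(p):
    """Arcsine-square-root transform of an error proportion p;
    variance-stabilising for proportion data."""
    return math.asin(math.sqrt(p))

print(round(reciprocal_rt(500), 2))   # 2.0 responses/s
print(round(arcsine_error(0.25), 3))  # 0.524 rad (asin(0.5) = pi/6)
```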

Redundancy gain: analyses of group mean and individual RTs

One-way ANOVAs across all participants revealed a significant main effect of feature change for errors, F(2, 22) = 13.24, p < .01, ηp² = .55, but no effect for RTs, F < 1, ηp² = .05 (see Fig. 5). Error rates were lower in the double-change condition than in both single-change conditions (both ts > 4.28, ps < .01), and there was no difference between the single-change conditions (t < 1).

One-way ANOVAs for individual participants were used to analyse the reciprocal RTs, with up to 16 trials in each condition. One of the 12 individual-participant ANOVAs gave a significant effect of feature change, F(2, 47) = 5.75, p < .01, ηp² = .20, and one showed a trend towards a significant effect, F(2, 43) = 3.07, p = .06, ηp² = .13. All other ANOVAs showed no significant effect (all Fs < 1.79, ps > .18, ηp² = .02–.08). For the participant exhibiting an effect of feature change, RTs were faster in the double-change (full-Thatcher) condition than in the eye change (partial-Thatcher) condition, t(30) = 3.94, p < .01, but not than in the mouth change (partial-Thatcher) condition, t < 1. These results do not exhibit a redundancy gain at the level of the individual.

There was no evidence for a redundancy gain in the RT data, either at the level of the mean or at the level of individuals (Footnote 8). However, the error data did suggest a redundancy gain, implying a speed–accuracy trade-off in which RTs were speeded in the single-change conditions at the expense of accuracy. If accuracy levels had been equal across the single-change and double-change conditions, a redundancy gain in RTs may have been found, because single-change RTs may have needed to be longer to achieve the levels of accuracy exhibited in the double-change condition. The processing capacity measures were therefore calculated. Given that these measures were calculated after incorrect and twinned correct RTs were removed, the RTs analysed represent correct, nonguessed responses, and therefore compensate for a speed–accuracy trade-off.

Processing efficiency across time

The data were analysed as for Experiment 1, except that the Cox proportional-hazards ratio was calculated instead of the capacity coefficient because of the reduced number of trials, using the PHREG procedure in SAS. Figure 6 shows the Cox proportional-hazards ratios, Miller inequalities, and Grice inequalities at the levels of the group and the individual. Schoenfeld residuals were calculated for the RTs over all participants and change conditions (eye, mouth, or both), using the RESSCH option of the PHREG procedure in SAS. There was no significant correlation between the RTs and the Schoenfeld residuals, confirming that the proportional-hazards assumption was met and that the RTs therefore did not require trimming before analysis.

Fig. 6

Processing efficiency data from Experiment 2. For the hazards ratios, the first 12 bars represent the individual participants, and the 13th bar represents the ratio calculated over all participants. For the Miller and Grice measures, the black lines represent the mean values over all participants, and each grey line represents the efficiency measure for one participant over time

At the level of the group, the proportional-hazards ratio was calculated over all participants, with stratification by participant to control for heterogeneity between participants. The ratio was not significantly different from 1 (p = .125), indicating that the number of changes had no effect on processing capacity. To examine changes in capacity over time, the Miller and Grice inequalities were calculated; these showed no evidence of supercapacity at any time point (no violations of the Miller inequality), but some evidence for limited capacity (violations of the Grice inequality).
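The actual ratios were estimated with the PHREG procedure in SAS. Purely as an illustration of what the proportional-hazards ratio compares, the sketch below estimates the cumulative hazard of responding with the Nelson–Aalen estimator for two made-up RT samples and takes their ratio at a single time point; a ratio above 1 is the direction diagnostic of a capacity increase. All RT values are hypothetical:

```python
from bisect import bisect_right

def nelson_aalen(rts):
    """Nelson-Aalen estimate of the cumulative hazard from a sample of
    correct RTs (every observation treated as an event, no censoring):
    H(t) = sum over event times <= t of d_i / n_i."""
    rts = sorted(rts)
    n = len(rts)
    times, H = [], []
    h = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and rts[j] == rts[i]:
            j += 1
        d = j - i          # responses occurring at this RT
        at_risk = n - i    # responses not yet made
        h += d / at_risk
        times.append(rts[i])
        H.append(h)
        i = j
    return times, H

def cum_hazard_at(times, H, t):
    k = bisect_right(times, t)
    return H[k - 1] if k else 0.0

# Illustrative (made-up) RTs in ms: the double-change sample is faster,
# so its cumulative hazard at time t is larger.
double = [480, 510, 530, 560, 600, 640]
single = [520, 560, 600, 650, 700, 760]

td, Hd = nelson_aalen(double)
ts_, Hs = nelson_aalen(single)
ratio = cum_hazard_at(td, Hd, 600) / cum_hazard_at(ts_, Hs, 600)
print(round(ratio, 2))  # 2.35 (i.e., > 1, consistent with a capacity gain)
```

The Cox model generalises this single-time-point comparison by assuming a constant hazard ratio across time, which is why the Schoenfeld-residual check of proportionality reported above matters.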

At the level of individuals, three participants had hazards ratios that were significantly greater than 1 (all ps ≤ .05), indicating an increase in processing capacity between one and two changes (eyes or mouth vs. eyes and mouth). For the remaining nine participants, and over all participants together, the hazards ratios were not significantly different from 1 (all ps > .2). For the changes in capacity over time (Miller and Grice inequalities), one participant, who also had a hazards ratio greater than 1, exhibited a large violation of the Miller inequality, demonstrating supercapacity at the early time points (<625 ms). The same participant showed no violations of the Grice inequality, indicating that capacity was not limited. Two other participants had a few minor violations of the Grice inequality at early time points. However, the remaining participants had larger violations of the Grice inequality, implying limited capacity, particularly at early time points (<850 ms).
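The Miller and Grice inequalities themselves are simple bounds on the empirical RT distribution functions: the Miller (race model) inequality requires F_double(t) ≤ F_eye(t) + F_mouth(t), with violations indicating supercapacity, and the Grice inequality requires F_double(t) ≥ max[F_eye(t), F_mouth(t)], with violations indicating limited capacity. A sketch with made-up RT samples (not the experimental data):

```python
from bisect import bisect_right

def ecdf(sample):
    """Empirical CDF: F(t) = proportion of RTs <= t."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n

# Illustrative (made-up) RT samples in ms for the "different" conditions.
rt_eye    = [560, 610, 660, 720, 790]
rt_mouth  = [570, 620, 680, 740, 800]
rt_double = [540, 590, 640, 700, 770]

F_e, F_m, F_d = ecdf(rt_eye), ecdf(rt_mouth), ecdf(rt_double)

for t in range(550, 851, 50):
    miller_violated = F_d(t) > F_e(t) + F_m(t)       # supercapacity
    grice_violated  = F_d(t) < max(F_e(t), F_m(t))   # limited capacity
    print(t, round(F_d(t), 2), miller_violated, grice_violated)
```

With these values, the double-change distribution leads both single-change distributions but exceeds their sum only at the earliest time point, mirroring the pattern in which Miller violations, when they occur, appear early in the RT distribution.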

Discussion

The lack of evidence for supercapacity in Experiment 1 might have been due to the task characteristics allowing piecemeal sampling of the facial features. In Experiment 2, such sampling was prevented by restricting eye movements. However, there was no redundancy gain in RTs for any individual or at the level of the mean. The lack of a redundancy gain suggests that there was no benefit of two changes (typical as compared with full Thatcher) over the single-change conditions (typical as compared with partial Thatchers). In addition, the capacity analyses provided very little evidence for the supercapacity processing of feature inversion in upright faces. The results are consistent with inverted eyes and mouths being processed independently in upright faces, or with their having interdependencies that are not excitatory, thus failing to produce supercapacity.

Despite the difference in the tasks, and the measures taken to encourage configural processing, these results agree with those from Experiment 1 by demonstrating that processing of grotesqueness in the Thatcher illusion is not at supercapacity when participants can terminate on the first feature completed. There is, however, evidence that such experimental designs sometimes encourage analytic, nonconfigural perception of features in faces (Wenger & Townsend, 2001), while exhaustive processing promotes findings of supercapacity (Wenger & Townsend, 2006).

The lack of a redundancy gain for grotesqueness detection here contrasts with the findings of Bradshaw and Wallace (1971; replicated by Perry, 2008, and Perry, Blaha, & Townsend, 2008), who found that the speed of same/different face judgements increased with the number of differences between the faces. The discrepancy could be explained by the mode of stimulus presentation: simultaneous in Bradshaw and Wallace, but successive here. Simultaneous presentation may have encouraged piecemeal comparison between the faces, and thereby increased the probability of attending to a changed feature early in the trial when there were two changes rather than one. In addition, despite being face-like, the faces used by Bradshaw and Wallace were sketch-like Identi-Kit faces, which may have discouraged configural processing (Leder, 1996).

General discussion

Across successive same/different and categorisation tasks, whether or not eye movements were allowed, similar results held: Distinguishing Thatcherised faces from typical faces generally occurred at sub-supercapacity (i.e., limited to unlimited capacity), such that the grotesqueness signalled by full-Thatcher faces rarely went beyond the sum of the grotesqueness signalled by partial-Thatcher faces. This finding is contrary to what was expected. Furthermore, although a simple redundancy gain was reliably found in Experiment 1, where a categorisation response was required, it was not found in Experiment 2, where a same/different response was required. This result was obtained despite the fact that the experiments were conducted with trial frequencies that should have biased towards finding a redundancy gain. We therefore conclude that the detection of grotesqueness in Thatcher faces can be aided when grotesqueness is signalled by two features as compared with one, but that this facilitation only occurs when categorisation responses to single faces are required. Even then, the processing capacity data are consistent with simple, parallel horse-race processing, and possibly with serial processing on occasion (when extreme limited capacity is found). These data are consistent with two accounts: first, that no interactions form across inverted eyes and mouths to speed the detection of grotesqueness; second, that inhibitory interactions exist between the feature-processing channels. In the absence of other data, Occam’s razor favours the more parsimonious account, and we therefore conclude that there is little evidence of interactions across eyes and mouths in the Thatcher illusion.

While the stimuli and tasks were not equated across Experiments 1 and 2, a comparison across the two reveals something of interest: Whereas responses were faster to eyes than to mouths in Experiment 1, there was a trend in the opposite direction in Experiment 2. This shift is also reflected in the accuracy data, with a significant interaction across experiments for accuracy [F(2, 44) = 3.19, p = .05], although the interaction missed significance for RTs [F(2, 44) = 2.36, p = .11]. The two tasks were different, and there was therefore no a priori expectation that RTs in the full-Thatcher condition would be matched across experiments. Nevertheless, the fact that they were matched makes it appropriate to comment on the differential trends across eyes and mouths. We suggest that the categorisation task used in Experiment 1 was approached much as a recognition task, in which attention tends to be allocated to the eyes (Riby, Riby, & Reay, 2009; Smith et al., 2004). The same/different task used in Experiment 2 may have been approached more as a pattern-matching exercise, such that the preference to attend to the eyes was removed.

The present finding goes beyond the fact that the orientation of whole faces is critical to enabling the perception of grotesqueness in the Thatcher illusion. The present experiments addressed whether there is a positive interaction between Thatcherised features, as measured by processing capacity, and showed that no more work is necessarily achieved in response to full-Thatcher faces than the sum of the work achieved in response to individual features. These data are inconsistent with upright faces leading to a configural superiority effect for the detection of grotesqueness in the Thatcher illusion, but they are consistent with research demonstrating that effects that appear configural in nature can be accounted for by nonconfigural models (Fifić & Townsend, 2010).

The lack of supercapacity in processing Thatcherised faces is apparently in contrast to the findings of Ingvalson and Wenger (2005), who found supercapacity processing in their test of the dual-mode hypothesis. However, there was an important difference between their work and the present experiments. Here, we measured capacity for two configural changes that led to grotesqueness; both changes were of the same type, so combining them would not affect the type of information available. Ingvalson and Wenger, in contrast, measured capacity for different types of changes (featural and configural), to test whether the two channels of information interact. The difference between a single change (featural or configural) and both types of change lies not only in the amount of information, but also in the types of information available. This change in information type did not take place for the stimuli used here.

The results of the present study should be considered alongside those of Cornes et al. (2011). Cornes et al. explored the Thatcher illusion within the terms of general recognition theory (GRT), using response frequencies rather than latencies. They demonstrated that the effect of inversion on the Thatcher illusion could be accounted for by shifts in perceptual sensitivity and decision criteria (in terms of GRT, a violation of perceptual and decisional separability) across orientations. Their data suggested that within-stimulus interactions between internal and external features in Thatcher faces were limited. We added to those data here by reporting very limited evidence of oddity detection occurring with supercapacity.

The present findings provide some initial evidence about the relationship between configural processing, processing capacity, and the Thatcher illusion. Given the surprising lack of evidence for configural processing in the Thatcher illusion, it may be that other reported configural face effects should be similarly explored.

The present experiments were designed to address the issue of configurality in terms of supercapacity processing, and they cannot directly speak to questions of processing architecture (serial vs. parallel vs. coactive) or the stopping rule (self-terminating vs. exhaustive) (see Wenger & Townsend, 2001). However, the lack of evidence for supercapacity in the present data suggests that a coactive architecture with positive dependencies between feature channels (eyes and mouth) can be ruled out as a possible architecture for the processing of the Thatcher illusion. For instance, a positively dependent coactive architecture would give rise to capacity coefficient values of up to 5 (see Townsend & Wenger, 2004). Future research should extend these findings such that the results can speak directly to the processing architecture and stopping rule, as well as to processing capacity. Such research should also evaluate performance in tasks that require exhaustive processing (e.g., Wenger & Townsend, 2006). We also suggest that other tasks used as markers of configurality should be explored in terms of processing capacity.