Revealing the mechanisms of human face perception using dynamic apertures

Faces are harder to perceive when presented upside-down. These perceptual decrements are often thought to reflect a qualitative switch from parallel whole-face processing to serial analysis of individual features. We examined observers' ability to categorize faces presented in their entirety, or viewed through a dynamic aperture that moved incrementally across the facial image. Viewing faces region-by-region disrupts holistic processing, but permits serial analysis of local features. In line with the holistic accounts, we predicted that aperture viewing would greatly impair judgements of upright, but not inverted faces. As expected, identity, gender, age, and expression were categorized more precisely when faces were presented upright and in their entirety. Contrary to holistic accounts, however, the detrimental effects of inversion seen in the whole-face condition were no greater than in the aperture condition. Moreover, we found comparable aperture effects for upright and inverted faces; observers exhibited less decision noise when faces were viewed in their entirety than when viewed through the aperture, irrespective of orientation. We replicate these findings in control experiments and show that the same pattern is seen irrespective of the direction of aperture transition or the nature of the fill used to replace the occluded regions of the to-be-judged image. These results challenge holistic accounts of the face inversion effect and support an alternative interpretation. First, in line with previous findings, they indicate that perceptual decrements when viewing upside-down faces result from impoverished descriptions of local regions, not the loss of whole-face processing. Second, when interpreting inverted faces, access to the wider face context appears to be far more important than currently believed.

Proponents of holistic face processing have been criticized for stating their assumptions informally (Fifić & Townsend, 2010; Fitousi, 2015, 2016; Wenger & Townsend, 2001). Nevertheless, some important features of the hypothesized information processing characteristics may be delineated. For example, its characterization as 'the simultaneous integration of the multiple parts of a face into a single perceptual representation' (e.g., Rossion, 2008, 2013) suggests that upright faces are processed by multiple parallel channels, each describing a particular local region. The activations of these parallel channels may be subsequently combined in a single output channel that conveys an integrated representation of the whole face, upon which perceptual decisions are based (see coactivation architecture; e.g., Fifić & Townsend, 2010). Assertions that integrated representations are 'more than the sum of their parts' (e.g., Shen & Palmeri, 2015) imply that lateral interactions between different channels may improve processing accuracy and efficiency for upright faces. In contrast, the characterization of inverted face processing as 'parts-based' or 'piecemeal' suggests that decisions about inverted faces depend on evidence accumulated through a serial analysis of local features. In the absence of lateral interactions between parallel channels, the description of one region remains unaffected by the processing of other regions; i.e., processing independence is hypothesized.
In its strongest form, holistic processing theory argues that parallel whole-face processing is engaged only in the presence of an intact whole face (Farah et al., 1998; McKone & Yovel, 2009; Tanaka & Farah, 1993; Tsao & Livingstone, 2008). Such gating may increase neurocognitive efficiency by ensuring that resource-intensive processing is engaged only where appropriate (Tsao & Livingstone, 2008). The possibility that holistic face processing is dependent on the detection of canonical first-order facial information has been used to explain diminished integration of information from upper and lower face halves when composite face arrangements are spatially misaligned or inverted. This view is also consistent with reports that judgments about cropped facial features presented in isolation (i.e., in the absence of a facial context) are relatively insensitive to orientation inversion (Palermo et al., 2011; Rhodes, Brake, & Atkinson, 1993), although this remains controversial (see Leder et al., 2001; Rakover & Teucher, 1997). Finally, it accords with findings that scrambled faces, in which local features do not appear in their typical locations, do not elicit putative markers of holistic processing, including inversion effects (Martini, McKone, & Nakayama, 2006; Tanaka & Farah, 1993; Towler, Parketny, & Eimer, 2015).

Composite face effects
To date, the principal line of evidence for the holistic processing account comes from the composite face effect (Hole, 1994; Young, Hellawell, & Hay, 1987). When the top half of one face is aligned with the bottom half of another, and presented upright, the halves appear to 'fuse', resulting in a compelling percept of a novel facial configuration (Rossion, 2013). When composite arrangements are presented upside-down, however, fusion is greatly diminished (McKone et al., 2013; Susilo, Rezlescu, & Duchaine, 2013). This effect suggests that the visual system integrates information from disparate regions when observers view upright, but not inverted, faces, consistent with theories of holistic face processing (Farah et al., 1998; Maurer et al., 2002; McKone & Yovel, 2009; McKone et al., 2007; Piepers & Robbins, 2013; Rossion, 2008).
The literature on the composite face effect has also been dogged by methodological issues. For example, the size and pattern of composite effects may be affected by the presence or absence of a small gap between the target and distractor halves (Rossion & Retter, 2015), the presence of subtle signs of facial emotion in distractor regions (Gray, Murphy, Marsh, & Cook, 2017), and observers' attentional strategy (Fitousi, 2016). Moreover, there has been considerable debate about the respective merits of the original matching design and a more recent congruency variant (DeGutis et al., 2013; Murphy et al., 2017; Richler, Wong et al., 2011; Rossion, 2013). Whilst the size of composite effects measured using the original design may be affected by response bias (Richler, Cheung, & Gauthier, 2011a), estimates of susceptibility obtained using the congruency paradigm may be contaminated by general factors including response priming and interference (Rossion, 2013).

Aperture paradigms
Given the controversy surrounding the composite face effect, it is important that complementary tests of holistic face processing are developed. One promising line of innovation uses aperture viewing to compare the visual processing recruited by upright and inverted faces. Aperture paradigms restrict observers' field-of-view, such that only a small part of an image is visible. During aperture viewing, participants must therefore base perceptual decisions on information extracted from exposed local regions, either viewed serially (e.g., in dynamic aperture paradigms) or in isolation (e.g., in static aperture paradigms). Similar aperture techniques have been used elsewhere in the vision sciences to investigate a range of topics including shape perception, object recognition, and reading (Anstis & Atkinson, 1967; Craddock, Martinovic, & Lawson, 2012; McConkie & Rayner, 1975; Morgan, Findlay, & Watt, 1982; Rieger, Grüschow, Heinze, & Fendrich, 2007; Rock, 1981).
Employing this logic, a recent study compared the perception of upright and inverted faces using a gaze-contingent aperture paradigm, in which the to-be-revealed portion of the target face is determined by the fixation behavior of the observer (Van Belle et al., 2010). Consistent with holistic accounts, observers showed marked inversion effects only when judging the whole face; when basing decisions on local information sampled serially through the aperture, observers' upright and inverted performance was broadly comparable. Gaze-contingent aperture paradigms potentially allow observers to fixate on the stimulus naturally. It is not clear, however, how the aperture manipulation alters observers' fixation strategies; many observers appear to fixate the regions between features more often in the aperture conditions, relative to free-viewing conditions (Van Belle et al., 2010). Moreover, gaze-dependent paradigms relinquish control of stimulus presentation to the participant; different participants receive different visual input depending on their fixation strategies. Consequently, manipulations of facial orientation (Van Belle et al., 2010) and observer group (Evers et al., 2017; Van Belle et al., 2011) may be confounded with the basic visual information presented to observers.

Present study
In the present study we sought to test the predictions of holistic face processing theory using a novel dynamic aperture paradigm in which participants were unable to influence the speed or trajectory of the viewing window. Observers made simple binary categorization judgments about faces presented in their entirety, or viewed through a dynamic aperture that moved incrementally across the image. Observers' perceptual ability in each viewing condition (their 'decision noise') was inferred from the slope of the resulting psychometric function. If our expert processing of upright faces is a product of holistic processing, aperture viewing should result in marked performance decrements. Conversely, judgments about inverted faces, thought to depend on a serial part-based analysis, should be relatively unaffected by aperture viewing.

Participants
In all experiments, observers were healthy adults with normal or corrected-to-normal vision. Participants were screened for developmental prosopagnosia using the PI20 self-report instrument (Shah, Gaule, Sowden, Bird, & Cook, 2015). Sample size was determined a priori, informed by previous psychophysical investigations of face processing (e.g., Cook, Aichelburg, & Johnston, 2015; Cook, Brewer, Shah, & Bird, 2014; Shah, Bird, & Cook, 2016). Participants whose response function could not be modelled using a cumulative Gaussian were replaced and their data not subject to further analysis. Ethical clearance was granted by the local ethics committee and the study was conducted in line with the ethical guidelines laid down in the 6th (2008) Declaration of Helsinki. All participants gave informed consent and were fully debriefed upon task completion.

Stimuli
Video stimuli presented a single facial image drawn from morph continua (Fig. 1), either upright or inverted. Continua consisted of seven levels that varied attribute strength from 20% to 80% in increments of 10%. Images were morphed using Morpheus Photo Morpher Version 3.11 (Morpheus Software, Indianapolis, IN). Bitmap image sequences were compiled in Matlab (The MathWorks, Natick, MA) and saved as .avi files. Video stimuli were presented at 30 frames-per-second on an LCD display (60 Hz refresh rate).
The identity continuum was created by morphing images of George W. Bush and Bill Clinton (sourced online). The age continuum was created by morphing a composite of eight male child faces with a similar composite of eight adult faces. The gender continuum was created by morphing a composite of eight male faces with a similar composite of eight female faces. The original images for the age and gender morphs were sourced from the Radboud Face Database (Langner et al., 2010). The smile-sincerity (Ipser & Cook, 2016), as well as the Surprise-Fear and Disgust-Sadness continua (Biotti & Cook, 2016), have been described elsewhere.
Fig. 1. Video stimuli presented a single facial image drawn from morph continua, either upright or inverted. The continua consisted of seven levels that varied attribute strength from 20% to 80% in increments of 10%.

Procedure
In the aperture condition, a viewing window moved over the image (Fig. 2a). The aperture was 12.5% the height of the face (approximately 6.5° of visual arc when viewed at 58 cm) and took 6 s to move across the face. In the whole-face condition, faces were presented in their entirety for 1000 ms (Fig. 2b). On each trial participants were presented with a single video stimulus and were required to make a binary categorization judgment about the face presented (e.g., 'Bush or Clinton?' when judging identity). Following stimulus offset, a response screen was visible until a response was registered. The procedure comprised 448 experimental trials in total (4 conditions × 16 presentations × 7 stimulus levels). Trials for the four conditions (upright whole-face, inverted whole-face, upright aperture, inverted aperture) were randomly interleaved within four mini-blocks of 112 trials. All experiments were programmed in MATLAB using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997).
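The trial structure described above can be sketched in a few lines of Python. This is an illustration only, not the authors' MATLAB code. It assumes that each mini-block contains an equal share (4 of the 16 presentations) of every condition and stimulus level, which the text does not state explicitly; the names `build_schedule`, `CONDITIONS`, and `MORPH_LEVELS` are hypothetical.

```python
import random

# The four conditions named in the Procedure section.
CONDITIONS = [
    ("upright", "whole-face"),
    ("inverted", "whole-face"),
    ("upright", "aperture"),
    ("inverted", "aperture"),
]
MORPH_LEVELS = [20, 30, 40, 50, 60, 70, 80]  # attribute strength in percent
PRESENTATIONS = 16  # presentations per condition and stimulus level
N_BLOCKS = 4        # mini-blocks


def build_schedule(seed=None):
    """Return a list of mini-blocks, each a randomly ordered list of trials."""
    rng = random.Random(seed)
    # Assumption: the 16 presentations are split evenly across mini-blocks.
    per_block = PRESENTATIONS // N_BLOCKS
    blocks = []
    for _block in range(N_BLOCKS):
        trials = [
            {"orientation": orientation, "viewing": viewing, "level": level}
            for (orientation, viewing) in CONDITIONS
            for level in MORPH_LEVELS
            for _rep in range(per_block)
        ]
        rng.shuffle(trials)  # interleave the four conditions at random
        blocks.append(trials)
    return blocks


schedule = build_schedule(seed=1)
print(len(schedule), "mini-blocks of", len(schedule[0]), "trials")
```

Under this assumption each mini-block holds 4 conditions × 7 levels × 4 presentations = 112 trials, which matches the block size and the total of 448 trials reported in the text.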

Analysis
Participants' responses were analyzed by modelling psychometric functions (Fig. 2c), using the Palamedes toolbox (Prins & Kingdom, 2009). For each participant, separate cumulative Gaussian functions were fitted for each of the four conditions. For the purposes of the current study the key parameter of interest is decision noise: a measure of the precision with which stimuli are categorized. Noise estimates were inferred directly from the slope of the psychometric function, defined as the standard deviation of the symmetric Gaussian distribution underlying each cumulative Gaussian function. Lower noise estimates (steep slopes) indicate that observers can perceive subtle differences in stimulus strength and vary their responses accordingly. Greater noise estimates (shallow slopes) reveal that participants' responses are relatively invariant to changes in stimulus strength. Estimates of decision noise were analyzed using ANOVA with Stimulus Orientation (upright, inverted) and Viewing Condition (aperture, whole-face) as within-subjects factors.
Fig. 2. (b) In the whole-face condition, faces were presented in their entirety for 1 s. (c) Participants were required to make binary categorization judgments about facial stimuli (e.g., 'male' or 'female'?). Responses were analyzed by fitting psychometric functions. The slope of the function is an estimate of decision noise; steep and shallow slopes are associated with low and high noise estimates, respectively.
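To make the link between slope and decision noise concrete, the following Python sketch fits a cumulative Gaussian to response counts by maximum likelihood over a coarse parameter grid. It is not the Palamedes procedure used in the study: the grid search, the absence of guess and lapse parameters, and the two response-count datasets are all assumptions made for illustration, and the counts are invented rather than taken from the experiments.

```python
import math


def cum_gauss(x, mu, sigma):
    """Cumulative Gaussian: probability of the second response at level x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def neg_log_likelihood(mu, sigma, levels, n_yes, n_total):
    """Binomial negative log-likelihood of the counts under (mu, sigma)."""
    nll = 0.0
    for x, k, n in zip(levels, n_yes, n_total):
        # Clamp probabilities so that log() is always defined.
        p = min(max(cum_gauss(x, mu, sigma), 1e-6), 1.0 - 1e-6)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def fit_cum_gauss(levels, n_yes, n_total):
    """Grid-search fit; returns (mu, sigma). sigma is the decision noise."""
    best = None
    for mu_tenths in range(200, 801, 5):       # mu from 20.0 to 80.0
        for sigma_tenths in range(10, 601, 5):  # sigma from 1.0 to 60.0
            mu, sigma = mu_tenths / 10.0, sigma_tenths / 10.0
            nll = neg_log_likelihood(mu, sigma, levels, n_yes, n_total)
            if best is None or nll < best[0]:
                best = (nll, mu, sigma)
    return best[1], best[2]


levels = [20, 30, 40, 50, 60, 70, 80]  # morph levels in percent
n_total = [16] * 7                      # 16 presentations per level

# Hypothetical counts of one response category (illustration only).
steep_counts = [0, 1, 3, 8, 13, 15, 16]    # responses track the stimulus
shallow_counts = [3, 5, 6, 8, 10, 11, 13]  # responses vary less with level

mu_steep, sigma_steep = fit_cum_gauss(levels, steep_counts, n_total)
mu_shallow, sigma_shallow = fit_cum_gauss(levels, shallow_counts, n_total)
print("steep:   mu=%.1f sigma=%.1f" % (mu_steep, sigma_steep))
print("shallow: mu=%.1f sigma=%.1f" % (mu_shallow, sigma_shallow))
```

The steeper dataset yields the smaller fitted standard deviation, which is the sense in which a steep slope corresponds to low decision noise. A dedicated toolbox would normally replace the grid search with a proper optimizer and may also estimate lapse rates.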

Results
In our first set of experiments we examined observers' judgments of stimuli drawn from morph continua blending facial identity, age, and gender incrementally. These attributions are thought to depend on the encoding of face structure: a semi-permanent, durable source of facial variation that changes slowly over time (Bruce & Young, 1986; Haxby, Hoffman, & Gobbini, 2000). In our second set of experiments, we examined observers' judgments of transient facial expressions. The visual analysis of facial expression may partially dissociate from the analysis of semi-permanent features such as identity (Biotti & Cook, 2016; Bruce & Young, 1986; Calder & Young, 2005; Duchaine, Parker, & Nakayama, 2003; Haxby et al., 2000) and is thought to depend on information sampled from different local regions (Gosselin & Schyns, 2001). Observers categorized stimuli drawn from morph continua blending expressions of fear and surprise, disgust and sadness, and sincere and insincere happiness, respectively. Contrary to the predictions of the holistic account, we found no evidence of disproportionate aperture effects when observers judged upright faces. Instead, two striking findings replicated consistently in all experiments: (i) substantial inversion effects when faces were viewed through the aperture, and (ii) substantial aperture effects when viewing inverted faces. Mean decision noise estimates obtained in the four viewing conditions are shown in Fig. 3.

High-contrast occlusion
In the foregoing experiments we found no evidence of disproportionate aperture effects for upright faces. One possibility is that compensatory processes obscured the detrimental effects of aperture viewing in the upright condition. In the aperture conditions, the occluded facial regions appeared black (pixel values were set to zero). When faces were presented in the upright aperture condition, it is possible that participants were able to retain previously exposed face parts in visual working memory, facilitating better performance. We examined this possibility in a further set of experiments. Observers judged structural attributes (identity, age, gender) using an identical procedure to that employed previously. Crucially, however, occluded regions were replaced with a mask constructed from high-contrast greyscale ellipses (Fig. 4a). If visual working memory augmented perceptual decisions in previous upright aperture conditions, high-contrast occlusion ought to increase the detrimental effects of aperture viewing. Contrary to this prediction, however, the addition of the high-contrast occluder had no material effect on the pattern of results (Fig. 4b).

Reversing the aperture direction
In the aperture conditions described above, the viewing window always moved downwards, from the top of the display to the bottom, irrespective of the orientation of the face. In the upright aperture condition, observers therefore sampled the eye-region before the nose, and then the mouth. Conversely, in the inverted aperture condition, observers first sampled the mouth, then the nose, and finally the eyes. Certain features of our results, notably the high levels of decision noise seen in the inverted aperture conditions, may therefore reflect the order in which perceptual evidence was accumulated. To determine whether the results obtained above are sensitive to the direction of aperture motion, we compared observers' judgments of structural attributes in the whole-face condition with a novel aperture condition, in which the viewing window moved upwards, from the bottom of the display to the top (Fig. 5a). Once again, however, this modification failed to change the pattern of results observed (Fig. 5b).
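The geometry of the moving window, in either direction, can be sketched as follows. This is a hypothetical Python illustration rather than the authors' stimulus code. The frame count follows from the reported 6 s transit at 30 frames-per-second (180 frames), and the window height from the reported 12.5% of face height. The image height of 400 pixels, the linear motion, and the start and end positions (window flush with the image edges) are assumptions, as are the function names.

```python
N_FRAMES = 6 * 30  # 6 s transit at 30 frames-per-second


def aperture_rows(frame, n_frames=N_FRAMES, image_height=400,
                  fraction=0.125, direction="down"):
    """Return (top, bottom) rows visible on a frame; bottom is exclusive."""
    if direction not in ("down", "up"):
        raise ValueError("direction must be 'down' or 'up'")
    window = round(image_height * fraction)  # aperture height in pixels
    travel = image_height - window           # distance the top edge moves
    progress = frame / (n_frames - 1)        # 0.0 on first frame, 1.0 on last
    top = round(progress * travel)           # assumption: linear motion
    if direction == "up":
        top = travel - top                   # mirror the trajectory
    return top, top + window


def apply_aperture(image, top, bottom, fill=0):
    """Copy the image, replacing rows outside the window with a fill value."""
    return [
        list(row) if top <= r < bottom else [fill] * len(row)
        for r, row in enumerate(image)
    ]


# Tiny 8-row stand-in "image" in which every pixel stores its row index + 1.
image = [[r + 1] * 4 for r in range(8)]
top, bottom = aperture_rows(0, n_frames=8, image_height=8, direction="down")
first_frame = apply_aperture(image, top, bottom)
print(top, bottom, first_frame[0], first_frame[1])
```

A fill value of zero corresponds to the black occluder described for the main experiments. Reproducing the high-contrast ellipse mask would require substituting a textured fill, which this sketch does not attempt.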
Identity classification. Sixteen observers (6 males, M age = 28.44, SD age = 7.70) completed the identity judgment task, responding 'Bush' or 'Clinton'. A main effect of Viewing Condition was

Discussion
The present study used a novel dynamic aperture paradigm to compare the visual processing of upright and inverted faces. We examined observers' ability to categorize upright and inverted face stimuli viewed in their entirety, or viewed through an aperture that revealed the image incrementally.

Aperture judgments are impaired by face inversion
As expected, we found that judgments of faces presented in their entirety were impaired by orientation inversion; decision noise was greater when judging upside-down faces than upright faces. Holistic processing accounts argue that this decrement is caused by a switch from whole-face processing to piecemeal, parts-based processing (Farah et al., 1998; Maurer et al., 2002; McKone & Yovel, 2009; Rossion, 2008). When viewing faces through apertures, however, observers were also markedly better when the occluded face was upright than when it was inverted. Strikingly, the magnitude of the inversion effect seen in the aperture conditions was comparable to, and in some cases larger than, that seen in the whole-face condition. Insofar as aperture viewing forces observers to process visual information in a serial, parts-based manner, this finding is hard to reconcile with holistic processing accounts; one would expect smaller inversion effects where observers' ability to process upright faces holistically is restricted.
One possibility is that the mechanisms for encoding local regions are orientation sensitive, i.e., they are tuned to upright faces. If orientation inversion impairs the description of local regions, detrimental effects would be expected in both the whole-face and aperture viewing conditions. It has been argued that judgments about isolated features (i.e., shown in the absence of a facial context) are insensitive to orientation inversion (McKone & Yovel, 2009). However, some authors have reported that substantial inversion effects can be obtained with cropped features (Rakover & Teucher, 1997). Interestingly, when judging interocular distance, observers not only exhibit substantial inversion effects when the eye-region is presented in isolation, but the magnitude of the effect does not increase when the occluded areas of the face are made visible (Leder et al., 2001). Consistent with the present findings, these results suggest that orientation inversion may impair the perception of local regions, as well as whole-face processing. When faces are shown upside-down, noisy local descriptions may not only be less informative, but may also be difficult to combine in an optimal manner, contributing to greater decision noise (Gold et al., 2012).
A second possibility is that our aperture window permitted some holistic processing in the upright condition. In its strongest form, holistic processing theory argues that parallel whole-face processing is gated, engaged only in the presence of an intact whole face (e.g., Farah et al., 1998; McKone & Yovel, 2009; Tanaka & Farah, 1993; Tsao & Livingstone, 2008). Our aperture condition would be expected to block this type of processing completely. However, some authors have suggested that local regions viewed in isolation may still recruit some form of orientation-specific holistic (or 'configural') processing that improves observers' sensitivity to inter-feature spatial relations (e.g., Leder & Bruce, 2000; Leder et al., 2001; Rossion, 2008). It is possible that the aperture window used here enabled observers to see enough of the face (for example, both eyes could be viewed simultaneously) to utilize some residual configural processing when faces are presented upright. Viewing windows of smaller sizes or different shapes may block this residual configural ability (e.g., Van Belle et al., 2010). Understanding why different types of aperture window differentially affect the orientation sensitivity of observers' judgments may prove significant for theories of holistic face processing.
Recently it has been suggested that the mechanisms for feature encoding may exhibit retinotopic tuning; the perception of eyes and mouths may be better when presented in the upper and lower visual field, respectively (de Haas et al., 2016). According to this account, perceptual decrements associated with orientation inversion may reflect the presentation of mouths and eyes in the upper and lower visual field, respectively. While this interesting possibility warrants further investigation, it seems unlikely to account for the inversion effect seen in our aperture conditions. Because observers were required to fixate a portion of the image subtending less than 1° as it shifted across the image, exposed regions were likely foveated during each aperture trial.

Comparable whole-face advantage for upright and inverted faces
Consistent with the view that upright faces benefit from a whole-face advantage (Farah et al., 1998; Maurer et al., 2002; McKone & Yovel, 2009; Rossion, 2008), observers exhibited less decision noise when they viewed upright faces in their entirety than when viewed through the aperture. While the aperture manipulation blocks whole-face processing, local feature information is preserved. These results therefore provide clear, direct evidence for a whole-face advantage; the ability to process facial information in parallel improves perceptual decisions, relative to serial analysis. This was the case when observers judged a variety of structural (identity, gender, age) and transient (facial affect, smile sincerity) attributes, suggesting that whole-face presentation aids the encoding of face structure at a relatively early stage of face processing (Bruce & Young, 1986; Calder & Young, 2005; Duchaine & Yovel, 2015; Haxby et al., 2000).
Strikingly, aperture viewing also had large detrimental effects on the perception of inverted faces equal to, or greater than, those seen for upright faces. The fact that occluding contextual information in the inverted aperture condition resulted in further perceptual decrements contradicts the view that the perception of inverted faces depends on a serial piecemeal analysis of local features. Instead, this finding indicates that both upright and inverted face perception exhibit a whole-face advantage. This suggestion accords with current thinking about the role of context in human vision. It has been argued that stimulus ambiguity and contextual influence are closely linked; where the physical attributes of a target stimulus do not clearly distinguish between different perceptual hypotheses, contextual information may be used to resolve the ambiguity (Bar, 2004; Friston, 2005; Gregory, 1997; Lawson, Rees, & Friston, 2014). Where orientation inversion disrupts the encoding of local features, residual ability to use information from surrounding regions may be crucial. For example, the wider context contains valuable cues to lighting conditions and face shape: perceptual evidence that can help confirm or reject perceptual hypotheses.
The observation of a whole-face advantage for inverted face processing appears inconsistent with the orientation sensitivity of the composite face illusion (Rossion, 2013; Young et al., 1987). In the composite illusion, the top half of one face appears to fuse perceptually with the bottom half of another, when the two halves are aligned and presented upright (Hole, 1994; Young et al., 1987). However, this compelling perceptual bias is greatly reduced when arrangements are shown upside-down, suggestive of weaker integration (McKone et al., 2013; Susilo et al., 2013). How might this feature of the composite face effect be reconciled with our finding of a whole-face advantage for inverted face processing? Crucially, context may still be informative even where it fails to induce demonstrable illusory distortion. One possibility is that the scale of the integration window may be reduced by inversion; residual integration processes may be sufficient to combine information from proximal features of inverted faces, but not to bind distal regions (see also Perceptual Field Hypothesis; Rossion, 2009). Alternatively, perceptual predictions may be weaker, or integration processes slower and less automatic (see Richler, Wong et al., 2011).
Our finding of comparable aperture effects in both orientations accords with previous evidence that the visual processing of upright and inverted faces differs quantitatively, not qualitatively. Should upright and inverted faces engage qualitatively different modes of processing, one might expect a sudden 'jump' in perceptual ability as faces approach their canonical orientation. However, recognition ability varies linearly as a function of orientation rotation (Valentine & Bruce, 1988). Observers also base their judgments of upright and inverted faces on information sampled from similar regions, notably from around the eyes (Sekuler et al., 2004). Composite face effects can be observed for inverted faces when larger sample sizes are utilized (Susilo et al., 2013), or longer presentation times allowed (Richler, Mack, Palmeri, & Gauthier, 2011). At the neural level, upright and inverted faces both elicit strong signal changes in occipital and fusiform face areas (Yovel & Kanwisher, 2005). Similarly, the application of disruptive transcranial magnetic stimulation to the occipital face area impairs judgments of both upright and inverted faces (Pitcher, Duchaine, Walsh, Yovel, & Kanwisher, 2011).

Conclusion
In summary, when viewing faces through a dynamic aperture, observers continue to show striking inversion effects, suggesting that perceptual decrements induced by inversion may reflect problems encoding local regions, not the loss of whole-face processing. Furthermore, we find evidence of a comparable whole-face advantage for inverted faces, contrary to the view that holistic processing is recruited by upright faces only. It remains to be seen how access to the wider face context modulates the processing of upright and inverted faces. Nevertheless, it is clear that the contribution of whole-face processing to inverted face perception has been underestimated.

Author contributions
JM and RC contributed equally to the design of all experiments and drafted the manuscript for publication. RC constructed the stimuli and wrote the experimental program. JM collected and analyzed the data.

Conflict of interest
We have no competing interests.