The importance of conceptual knowledge when becoming familiar with faces during naturalistic viewing



Recognition
Natural viewing

Abstract
Although the ability to recognise familiar faces is a critical part of everyday life, the process by which a face becomes familiar in the real world is not fully understood. Previous research has focussed on the importance of perceptual experience. However, in natural viewing, perceptual experience with faces is accompanied by increased knowledge about the person and the context in which they are encountered. Although conceptual information is known to be crucial for the formation of new episodic memories, it requires a period of consolidation. It is unclear, however, whether a similar process occurs when we learn new faces. Using a natural viewing paradigm, we investigated how the context in which events are presented influences our understanding of those events and whether, after a period of consolidation, this has a subsequent effect on face recognition. The context was manipulated by presenting events in 1) the original sequence, or 2) a scrambled sequence. Although this manipulation was predicted to have a significant effect on conceptual understanding of events, it had no effect on overall visual experience with the faces. Our prediction was that this contextual manipulation would affect face recognition after the information has been consolidated into memory. We found that understanding of the narrative was greater for participants who viewed the movie in the original sequence compared to those who viewed the movie in a scrambled order. To determine if the context in which the movie was viewed had an effect on face recognition, we compared recognition in the original and scrambled condition. We found an overall effect of conceptual knowledge on face recognition. That is, participants who viewed the original sequence had higher face recognition compared to participants who viewed the scrambled sequence. However, our planned comparisons did not reveal a greater effect of conceptual knowledge on face recognition after consolidation. In an exploratory analysis, we found that overlap in conceptual knowledge between participants was significantly correlated with the overlap in face recognition. We also found that this relationship was greater after a period of

Introduction
Recognising the identity of a face is a straightforward process for most human observers if we are familiar with the person. However, the computational challenge of face recognition becomes apparent when we attempt to recognise people who are less familiar. While familiar face recognition is highly accurate across substantial changes in the image (Bruce, 1982; Burton et al., 1999; Burton, 2013), unfamiliar face recognition breaks down under small changes in viewing conditions (Bruce & Young, 1986; Burton et al., 2011; Kramer et al., 2018; Young & Burton, 2017). Cognitive models of face perception suggest that we become familiar with a face by generating image-invariant representations (Bruce & Young, 1986; Young & Burton, 2017). During familiarisation, the representation of a face must transition from an image-based representation based on specific encounters into an invariant representation that can be used to recognise the face across different visual environments.
The successful generation of image-invariant representations is thought to depend on perceptual experience whereby different encounters with a face are integrated to create an invariant representation of a facial identity (Burton et al., 2011; Kramer et al., 2018). Support for this idea comes from studies that show more visual exposure leads to better recognition of faces (Memon et al., 2003; Roark et al., 2006). A key feature of the familiarisation process appears to be the exposure to the variety of encounters with a person (Juncu et al., 2020; Ritchie & Burton, 2017). For example, averaged faces made from many different exemplars from the same person are recognised more accurately than faces made from fewer exemplars (Burton et al., 2005). These findings provide clear evidence for the importance of visual experience, particularly within-person variability, in becoming familiar with a face.
However, increased perceptual experience is also accompanied by an increase in information about a person (i.e. who they are, what they do, what they are like, where we usually see them) that is distinct from the visual properties of the face. A range of evidence suggests that this conceptual information may also play an important role in the generation of invariant representations necessary for familiar face recognition. For example, it has been shown that faces are difficult to recognise in contexts that are different to those in which they are typically encountered (Thomson, 1986; Young et al., 1985), whereas providing the context in which a face was learnt has been shown to improve face recognition (Hanczakowski et al., 2015; McCrackin et al., 2021; Reder et al., 2013; Schwartz & Yovel, 2016).
Despite these advances in understanding familiar face recognition, typical paradigms involve viewing a limited number of static images that are associated with arbitrary conceptual knowledge about the person, such as a name or occupation. So, it remains unclear how the recognition of faces unfolds in more naturalistic viewing conditions and over longer time periods. A recent study addressed this issue by measuring face recognition of actors in the TV series Game of Thrones (Devue et al., 2019). They found that the faces of the lead actors were recognised better than other actors and that recognition performance was generally better for faces viewed more recently. Although better recognition could reflect increased perceptual experience, it could also reflect increased knowledge about the person.
In natural viewing, we make new memories by integrating information in events or episodes that include what happened, who was present and where and when it happened (Tulving, 2002). A process of consolidation is then necessary if these episodes are to be integrated into longer term memory, which involves binding, reactivating, and strengthening connections between the hippocampus and distributed neocortical representations (Nadel & Moscovitch, 1997; Squire & Zola-Morgan, 1991; Yonelinas et al., 2019). Interestingly, this process of acquiring new memories is enhanced when new information is acquired in a coherent context (Lewis & Durrant, 2011; Van Kesteren et al., 2010). Studies of word learning, for example, show that the successful consolidation of information increases when the words are associated with meaning (Davis & Gaskell, 2009; Henderson et al., 2015; Williams & Horst, 2014). However, it is not clear whether similar processes are evident in face learning (Bird & Burgess, 2008; Mattarozzi et al., 2019; Olsen et al., 2015). If this is the case, our prediction is that learning faces in a coherent context should lead to more stable recognition over a longer time period compared to when faces are learnt in the absence of a coherent context.
We used a natural viewing paradigm to explore the effects of perceptual and conceptual information on the recognition of faces. Participants who were unfamiliar with the TV series Life on Mars viewed excerpts from the series in one of the following conditions: 1) Original sequence or 2) Scrambled sequence. A key feature of the design is that the overall visual input is the same for both conditions. However, scrambling the sequence will dramatically affect the ability to understand the context or narrative (Van Kesteren et al., 2010; Zacks et al., 2007). We then assessed whether conceptual knowledge has an effect on the recognition of faces. If face recognition is dependent only on visual information, we predicted that there should be no difference between the conditions. However, if conceptual knowledge is important, the recognition of faces will be greater in the Original condition. We tested face recognition immediately after watching the movie (short-term) and then 4 weeks later (long-term).
Our preregistered analyses assessed 4 specific hypotheses (each has been assessed in a pilot study).
Hypothesis 1: Manipulating the order of the events in the movie will affect understanding of the narrative or context. Our prediction was that there will be a greater understanding of the narrative of the stimulus when it is shown in the original sequence compared to a scrambled order.
Hypothesis 2: The recognition of faces after a delay will depend on the context in which they were originally presented. Our prediction was that the reduction in face recognition following a delay will be smaller in the Original condition compared to the Scrambled condition, because the greater conceptual information in the Original condition will help consolidate the faces in memory.
Hypothesis 3: Recognition of face images will be greater if they are consistent with the appearance at encoding. Our prediction was that face images that are visually similar to the faces at encoding will be recognised to a greater extent compared to images that are not consistent with the appearance at encoding.
Hypothesis 4: The effect of context on the recognition of faces, after consolidation, will be greater if the images are consistent with the appearance at encoding. Our prediction was that there will be a bigger difference in recognition scores for In Show compared to Out of Show images for the Original condition compared to the Scrambled condition at the delayed time point.

Cortex 177 (2024) 290–301

Methods
The accepted Stage 1 manuscript of this Registered Report was registered on the Open Science Framework (OSF) and can be found at: https://osf.io/8wp6f. We report how we determined our sample size, all data exclusions, all inclusion/exclusion criteria, whether inclusion/exclusion criteria were established prior to data analysis, all manipulations, and all measures in the study.

Participants
A total of 200 participants (176 female, 9 non-binary, age M = 19.24 years, SD = .86 years) were recruited who were native English-speaking and were unfamiliar with the TV show Life on Mars. All participants had either normal or corrected-to-normal vision (by self-report) and performed the Cambridge Face Memory Test (Duchaine & Nakayama, 2006) to determine that their face perception was within a normal range (>65%, i.e., no more than 2 SD below the mean). Participants were compensated with an Amazon voucher or course credit for their time. The study conformed with all relevant ethical regulations at the University of York and was approved by the University of York Department of Psychology Ethics Committee. Informed consent was obtained from all participants.

Sampling plan
We conducted a sensitivity analysis (see Fig. 1) for a one-sided independent t-test with a power of .9 and alpha level of .02. This showed a rapid initial decrease in the minimum effect size that could be detected, with improvements being relatively marginal beyond around 100 participants per group for our smallest theoretically important effect size (Hypothesis 2: see orange dashed line in Fig. 1). We chose this as our sample size, as it allowed us to detect effect sizes of a similar magnitude to that found in our pilot work (Supplementary Data) and also kept the experiment feasible from a practical perspective. This is a 'medium' effect size (see Cohen, 1988), and we consider that effect sizes smaller than this are unlikely to have practical relevance for everyday face recognition performance, so it also constitutes the smallest effect size of interest for this work.
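The shape of such a sensitivity curve can be illustrated with a short sketch. It uses the normal approximation d ≈ (z₁₋α + z_power)·√(2/n) rather than the exact noncentral t calculation, so the values are approximate; the function name `min_detectable_d` is hypothetical and this is not the authors' analysis script.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.02, power=0.9):
    """Approximate smallest Cohen's d detectable by a one-sided,
    independent-groups t-test with equal group sizes.

    Assumption: normal approximation to the t-test,
    d ~ (z_{1-alpha} + z_{power}) * sqrt(2 / n).
    An exact sensitivity analysis would use the noncentral t distribution.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha) + z.inv_cdf(power)) * sqrt(2 / n_per_group)


# Detectable effect size shrinks quickly at first, then flattens out.
for n in (25, 50, 100, 200):
    print(n, round(min_detectable_d(n), 3))
```

Under this approximation, 100 participants per group corresponds to a detectable effect of roughly d = .47, in line with the 'medium' effect size described above.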

Design
The whole experimental design was 2 × 2 × 2 with Condition (Original, Scrambled), Image type (In Show, Out of Show) and Timepoint (Immediate, Delayed) as the factors. Condition was a between-subjects factor. Image type and Timepoint were within-subjects factors.

Stimuli
Two 20-min (1170 s) movies constructed from audio-visual clips from the first episode of BBC TV series Life on Mars were used as stimuli. Timings are based on previous studies using experimentally familiarised faces (Hahn et al., 2016; Hahn & O'Toole, 2017) and on our pilot study. A key aspect of the design is that each movie contains the same visual input.
The first movie contains the clips in the original order (Original), so that the narrative is coherent. The second movie contains the same clips in a randomised order (Scrambled). An illustration of the different movie stimuli is shown in Fig. 2. A total of 14 clips are used in the stimuli, with a mean length of 84 s (range 39–228 s). The clips are assigned a random order for the Scrambled condition, with longer clips cut into shorter segments (mean clip length 39 s). Ten unique characters are present in the clips with varying screen time (34–1170 s).
Participants were instructed to fully watch and attend to the movie before completing any of the other tasks.
For the face recognition memory task, we used images from the 10 main actors from the episode of Life on Mars. Static images were taken directly from the TV series and are referred to as "In Show" images. However, these were not images that were seen in the movie. This is critical to avoid confounding face recognition with the visual memory of a specific image (Bruce & Young, 1986; Young & Burton, 2017). Each actor also had another image from outside of the Life on Mars TV series, which are referred to as "Out of Show" images. Critically, Out of Show images contain greater within-person variability, with significant changes in physical appearance (Burton et al., 2011; Kramer et al., 2018). Previous research has shown that the amount of within-person variability affects subsequent recognition (Juncu et al., 2020; Ritchie & Burton, 2017). For each In Show or Out of Show face for each face memory test, two foils of different identities were selected that matched the targets in terms of age, expression, hairstyle, lighting, and general appearance (Colloff et al., 2021). 19 target images (Out of Show image not available for one actor) and 40 foils were used in each face recognition memory test. Images were cropped to include the head. Example target and foil images are shown in Fig. 3. Different target and foil images were used at each test phase to avoid practice effects. So, a total of 30 In Show images and 29 Out of Show images were shown at the immediate timepoint, and a new set of 30 In Show images and 29 Out of Show images were shown at the delayed timepoint. The comparison between In Show and Out of Show face images is important to determine whether the effect of context on face memory is specific to the visual context in which the images were originally shown (Thomson, 1986; Young et al., 1985).

Fig. 1 – Sensitivity analysis showing the detectable effect size for a one-sided independent t-test with a power of .9 and alpha level of .02. The dashed lines represent the effect sizes found in the pilot data for each hypothesis.

Procedure
Participants were sent a link to a secure website hosting the online experiment. Participants were prevented from running the experiment on mobile devices. An information sheet was included with a description of the study, the data that would be collected and how it would be stored, and informed consent was given. Participants were randomly allocated to one of the 2 conditions: 1) Original condition, where movie clips are viewed in order, or 2) Scrambled condition, where movie clips are viewed in a random order. The visual exposure is the same in both conditions, but the order of presentation is different across conditions.
After being allocated to a condition, participants began the study. During the study phase, participants were asked to watch and fully attend to the movie stimulus. Immediately after the study phase, participants were tested on their conceptual understanding of the movie clips. They first completed a free recall test, where they were asked to provide a written description of the plot of the movie using as many details as possible. Participants then completed a face recognition memory task, with faces presented individually in a random order. In this test, participants pressed a button to indicate if the identity of the face corresponded to any of the actors in the movie. Stimuli remained on screen until the participant made a response. Finally, participants completed a second contextual understanding test (structured question test), containing a series of 8 questions about specific events in the movie accompanied by a static image of the relevant event in the movie. Task performance on the contextual understanding tasks was graded by two raters using a predefined marking scheme. Agreement between raters for the contextual understanding tests was calculated using intra-class correlation coefficient (ICC) with a two-way mixed model and Agreement definition.
A unique participant identifier was provided by email for participants to complete the face recognition memory task again 4 weeks after the study phase. A link to the face recognition memory task was sent at 4 weeks for the participant to access the experiment at the final time-point. For the 4-week time-point, the experiment had to be completed within 48 h of the link being sent. Following completion of the study, a debriefing sheet detailing the aims of the experiment was provided as well as full payment or course credit.

Data analysis
See Supplementary Table 1 for our study design table with a full list of hypotheses.
Hypothesis 1. Manipulating the order of the events in the movie will affect understanding of the narrative or context
Task performance on the conceptual understanding tests was graded by two raters using a predefined marking scheme. Raters (who were blind to the condition) marked the free recall test relative to 10 key events that occurred during the movie. Raters assigned a mark of 0, 1 or 2 for each point dependent on whether the text showed no, partial or a full description of each event, for a possible total of 20 marks. The 8 questions on the structured question test were also marked by raters on a scale from 0 to 2, based on whether they show no, partial or a full understanding, for a possible total of 16 marks. The analysis was based on the average scores across raters. Inter-rater reliability was assessed for both the free recall and structured question test aggregated across questions using intra-class correlation coefficients (ICC) in a two-way mixed model with agreement definition. ICC greater than .75 indicates good reliability between raters. While this value does not need to be achieved for the experiment to be deemed capable of testing the key hypotheses, an ICC greater than .75 validates the marking scheme as effective in consistently assessing the narrative score. The pilot data indicates that reliability should be higher than .75.
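As an illustration of the reliability measure, the sketch below computes a two-way, absolute-agreement ICC from a participants × raters table of scores. The text does not state whether the single-rater or average-rater form was used; the single-rater form is assumed here, and the function name and example scores are hypothetical.

```python
def icc_agreement_single(ratings):
    """Two-way, absolute-agreement, single-rater ICC.

    ratings: list of rows, one row per participant, one score per rater.
    Assumption: single-rater form; the source does not specify which form was used.
    """
    n = len(ratings)          # participants
    k = len(ratings[0])       # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for participants (rows), raters (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical free-recall totals (0-20) from two raters for five participants.
scores = [[14, 15], [6, 5], [18, 18], [9, 11], [12, 12]]
print(round(icc_agreement_single(scores), 3))
```

Because the agreement definition penalises systematic differences between raters as well as random disagreement, a rater who marks consistently higher than the other lowers this coefficient even when the rank order of participants is identical.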
To assess whether the movie manipulation leads to differences in conceptual understanding (Hypothesis 1), the free recall scores (Hypothesis 1.1) and structured question scores (Hypothesis 1.2) for each condition were entered into a one-tailed independent groups t-test, with an alpha criterion of .02 for determining significance. Support for Hypothesis 1.1 and 1.2 will be indicated by a significant effect, with lower scores for the Scrambled condition compared to the Original condition. Successful manipulation of movie context understanding will be shown if both Hypothesis 1.1 and 1.2 are confirmed.
Hypothesis 2. The recognition of faces after a delay will depend on the context in which they were originally presented
Performance on the face recognition memory test was calculated using the mean sensitivity (d′) for discriminating between faces of individuals present in the movie and faces of foils who were not present in the movie. d′ was calculated based on hit rates (i.e. correct recognition of the face as present in the movie) and false alarm rates (i.e. incorrectly responding that a foil was present in the movie) for each participant. d′ was calculated using the following equation:

$$d' = z(H) - z(FA)$$

where z(H) and z(FA) are the z transforms of the hit rate (number of hits/number of targets) and false alarm rate (number of false alarms/number of foils), respectively. Ceiling hit rates or false alarm rates (i.e. hit = 1) were replaced with .999 and floor hit rates or false alarm rates (i.e. false alarm = 0) were replaced with .001 to avoid infinite values of d′. d′ was calculated separately for each face recognition memory test time point (0 h, 4 weeks) and separately for In Show face images and Out of Show face images.
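The sensitivity calculation described above can be sketched in a few lines. The function name `d_prime` and the example counts are hypothetical; the clamping values (.999 and .001) follow the text.

```python
from statistics import NormalDist


def d_prime(hits, n_targets, false_alarms, n_foils):
    """Sensitivity d' = z(H) - z(FA) for one participant and one image set."""

    def clamp(rate):
        # Ceiling and floor rates are replaced so the z transform stays finite.
        if rate >= 1:
            return 0.999
        if rate <= 0:
            return 0.001
        return rate

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = clamp(hits / n_targets)
    fa_rate = clamp(false_alarms / n_foils)
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 8 of 10 targets recognised, 4 of 20 foils falsely accepted.
print(round(d_prime(8, 10, 4, 20), 3))
# Perfect performance is clamped rather than producing an infinite score.
print(round(d_prime(10, 10, 0, 20), 3))
```

With these replacement values the largest attainable score is about 6.18, so a participant at ceiling on both targets and foils receives a large but finite d′.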
To determine if contextual understanding has a role in recognition of faces after a delay in stimulus encoding (Hypothesis 2), the difference between the immediate and delayed (immediate − delayed) face recognition score (d′) for each condition (Original and Scrambled) will be calculated separately and then compared using a one-tailed independent groups t-test for the In Show images. Support for Hypothesis 2 will be shown if the difference scores are lower in the Original condition compared to the Scrambled condition at p < .02.
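A minimal sketch of this comparison is given below. It assumes Welch's unequal-variance form of the independent groups t-test, which the text does not state explicitly; the function name and the example difference scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent groups (assumption: unequal variances allowed)."""
    va = variance(a) / len(a)   # squared standard error of group a
    vb = variance(b) / len(b)   # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical forgetting scores (immediate d' minus delayed d') per participant.
original = [0.2, 0.5, 0.1, 0.4, 0.3]
scrambled = [0.6, 0.9, 0.4, 0.8, 0.7]

t, df = welch_t(original, scrambled)
# A negative t means less forgetting in the Original group, the predicted direction.
# The one-tailed p value is the lower tail of Student's t with df degrees of freedom.
print(round(t, 3), round(df, 2))
```

Computing the statistic as Original minus Scrambled keeps the sign aligned with the prediction: support for the hypothesis would appear as a negative t with a lower-tail p below .02.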
Hypothesis 3. Recognition of face images will be greater if they are consistent with the appearance at encoding
The average score (d′) was combined across timepoints for the In Show and Out of Show images in the Original condition. To determine whether the appearance of the images at encoding is important for subsequent recognition, a one-tailed independent groups t-test was performed on the difference between In Show and Out of Show face recognition. Support for Hypothesis 3 will be indicated by a greater face recognition score for In Show images than Out of Show images at p < .02.
Hypothesis 4. The effect of context on the recognition of faces, after consolidation, will be greater if the images are consistent with the appearance at encoding
To investigate whether the role of contextual understanding on face recognition after consolidation is influenced by the appearance of the faces at encoding, the difference between the recognition for In Show and Out of Show images (In Show − Out of Show) was calculated for each condition (Original, Scrambled) and compared using a one-tailed independent groups t-test at the delayed time point. Support for Hypothesis 4 will be indicated by a bigger difference in face recognition between In Show images compared to Out of Show images for the Original condition at p < .02.

Exclusion criteria
Participants who did not complete the face recognition test at all time points were excluded from all analyses; participants who did complete the delayed face recognition test but not within the specified time slot were also excluded from analysis.Participants who did not complete both the free recall and structured narrative questions were excluded from analysis.Participants were asked at each time point if they have seen the TV show Life on Mars; participants who had seen the show at any point were excluded from all analyses.
Participants were also screened for familiarity with other popular shows in which the actors have appeared, such as Ashes to Ashes (2008), which shares characters and actors with Life on Mars. They were also excluded if they had seen the TV show Spooks (2002), as foil images were gathered from this show.

Reliability analysis
Task performance was graded by two raters using a predefined marking scheme. Agreement between raters was calculated using intra-class correlation coefficient (ICC) with a two-way mixed model and Agreement definition. Excellent agreement was found between raters in the free recall test with an ICC of .90 and 95% confidence intervals of .87–.92 (F(201,201) = 18, p < .001) and in the structured question test with an ICC of .88 and 95% confidence intervals of .84–.90 (F(201,201) = 15, p < .001). For further analyses, the average of the free recall and structured questions from the two raters' scores was used to create a Total score.

Exploratory analysis
To investigate how conceptual understanding influences face recognition, we compared individual performance on narrative and face recognition tasks. We asked whether overlap in the content of the free-recall test was correlated with overlap in the face recognition performance across all pairs of participants. Overlap in the free recall test was assessed using Latent Semantic Analysis (LSA), a technique in natural language processing and information retrieval that helps to uncover the underlying structure in a collection of text by analysing the relationships between the words (Landauer et al., 1998). LSA uncovers relationships between text datasets by mapping words and documents into a continuous semantic space. In this space, similar words and documents are positioned closer together, reflecting their underlying semantic relationships. In this study, we compared the free-recall text summary of the narrative between different pairs of participants. The similarity between texts that is measured using LSA is taken as the overlap in semantic (or conceptual) understanding about the movie they have watched. The logic underlying this analysis is that participants may have picked up on different pieces of conceptual information from the movie and this may influence subsequent face recognition. This analysis provides a measure of the overlap in conceptual knowledge between participants. Overlap in face recognition was calculated by taking the total number of items that were accurately reported by both participants. Significance was assessed using a permutation test, where the rows and columns of the LSA matrix were randomly shuffled and then correlated with the face recognition overlap matrix. This shuffling was repeated 100,000 times to create a null distribution for the null hypothesis that there is no relationship between narrative overlap and face recognition overlap. A Bonferroni-Holm correction (Holm, 1979) was then applied to correct for familywise error. We were not aware of this approach at the time of pre-registration. However, it provides an alternative approach to explore how overlap in conceptual knowledge might be correlated with overlap in face recognition.
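The overlap measure and permutation test described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: "accurately reported" is taken to mean a correct response to a test image, the same random permutation of participants is applied to both the rows and the columns of the similarity matrix, and all function names and data are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def upper_triangle(m):
    """Values for every unordered pair of participants (i < j)."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]


def face_overlap(correct):
    """correct[p][i] is 1 if participant p responded correctly to item i.
    Returns a matrix counting the items both participants got right."""
    n = len(correct)
    return [[sum(int(a and b) for a, b in zip(correct[p], correct[q]))
             for q in range(n)] for p in range(n)]


def permutation_test(similarity, overlap, n_perm=10000, seed=0):
    """Correlate two pairwise matrices and build a null distribution by
    relabelling participants in the similarity matrix (assumed joint
    row-and-column shuffle)."""
    rng = random.Random(seed)
    n = len(similarity)
    target = upper_triangle(overlap)
    observed = pearson(upper_triangle(similarity), target)
    idx = list(range(n))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        shuffled = [[similarity[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(shuffled), target) >= observed:
            count += 1
    # One-sided p value; adding 1 counts the observed arrangement itself.
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical accuracy data: 5 participants x 4 test images.
correct = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
overlap = face_overlap(correct)
print(overlap[0][1])  # items that participants 0 and 1 both answered correctly
```

Shuffling participant labels, rather than individual cells, preserves the dependency between the pairwise values that share a participant, which is why the rows and columns are permuted together in this sketch.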

Results

Hypothesis 1: Manipulating the order of the events in the movie will affect understanding of the narrative or context
To determine whether manipulating the order of events in the movie had an impact on conceptual knowledge, narrative scores were compared between the Original and Scrambled conditions. Free recall scores (t(197.6) = 17.23, p < .001, d = 2.436) and structured question scores (t(190.64) = 9.37, p < .001, d = 1.325) were significantly higher for the Original condition compared to the Scrambled condition (Fig. 4), confirming Hypotheses 1.1 and 1.2 that conceptual understanding would be greater when viewing clips in their original order. This shows that the manipulation was successful in affecting the conceptual understanding of the movie.

Fig. 4 – Performance on the narrative understanding tasks for the Original and Scrambled conditions. Performance on the (left) free-recall and (right) structured question narrative tests was significantly greater for the Original compared to the Scrambled group, supporting Hypothesis 1.1 and 1.2. This shows that the conceptual understanding of the narrative was better when events were presented in the original sequence.

Hypothesis 2: The recognition of faces after a delay will depend on the context in which they were originally presented
Next, we asked whether recognition of faces depends on the context in which they were presented by comparing performance in the Original and Scrambled conditions. We calculated the difference in face recognition scores between the immediate and delayed timepoints for the In Show faces. This difference (immediate d′ − delayed d′) was then compared for participants in the Original and Scrambled groups (Fig. 5, left). A smaller difference between immediate and delayed recognition scores would show less forgetting of faces. However, the Original and Scrambled groups were not significantly different after a delay (t(196.8) = 1.18, p = .881, d = .168). This does not support Hypothesis 2 that conceptual knowledge has a greater effect on the recognition of faces after a delay.

Hypothesis 3: Recognition of face images will be greater if they are consistent with the appearance at encoding
Next, we determined whether face recognition would be greater if faces were consistent with the appearance at encoding. We collapsed face recognition scores in the Original condition across the immediate and delayed timepoints for the In Show images and then for the Out of Show images (Fig. 5, middle). We found significantly greater recognition for In Show compared to Out of Show images (t(191.6) = 7.30, p < .001, d = 1.03), confirming Hypothesis 3 that face images that are taken from a similar context to those at encoding would be recognised better than face images that are taken from other contexts.

Hypothesis 4: The effect of context on the recognition of faces, after consolidation, will be greater if the images are consistent with the appearance at encoding
We then asked whether the role of contextual understanding on face recognition after consolidation was influenced by the appearance of the faces at encoding. The difference in recognition scores for the In Show and Out of Show images for the Original and Scrambled condition were calculated at the delayed timepoint (In Show d′ − Out of Show d′). A greater difference in these scores would reflect greater recognition of faces with a similar visual appearance at encoding after consolidation. However, the difference between In Show − Out of Show scores was not significantly different between the Original and Scrambled conditions at the delayed timepoint (t(195.7) = −1.19, p = .118, d = −.168). This does not support Hypothesis 4 (Fig. 5, right).

Exploratory analyses
While our pre-registered analyses focus on the importance of consolidation, they do not consider an overall role of conceptual knowledge on face recognition. To determine whether there was an overall effect of conceptual knowledge on face recognition, a mixed ANOVA was run on Condition (Original, Scrambled), Image (In Show, Out of Show) and Timepoint (immediate, delayed). There was a significant main effect of Condition on face recognition (F(1,198) = 6.45, p = .015, η² = .032), with face recognition being greater in the Original compared to the Scrambled condition (Fig. 6).
We report all main effects and interactions from the omnibus ANOVA. There was a significant main effect of Timepoint (F(1,198) = 23.14, p < .001, η² = .105), with faces recognised better at the immediate timepoint. There was also a significant main effect of Image (F(1,198) = 184.02, p < .001, η² = .482), with In Show images being recognised better than Out of Show images. This is consistent with our support for Hypothesis 3 (see Fig. 6). There was no significant interaction between Condition × Timepoint × Image (F(1,198) = 2.35, p = .127, η² = .012). This is consistent with the absence of support for Hypothesis 2. There was a significant interaction between Image and Timepoint (F(1,198) = 73.92, p < .001, η² = .272). This reflects the smaller difference in face recognition between the In Show and Out of Show images at the delayed compared to the immediate timepoint. We found no significant interactions between Condition and Image (F(1,198) = .01, p = .918, η² < .001) or between Condition and Timepoint (F(1,198) = .16, p = .694, η² < .001).
Next, we asked whether the overlap of conceptual understanding between participants could predict overlap in face recognition and whether this effect was greater at the delayed timepoint. For each pair of participants, a similarity rating of the free-recall was calculated using a semantic similarity algorithm (LSA; Landauer et al., 1998). This generated a measure of conceptual overlap across all pairwise combinations of participants. Next, we compared this with the overlap in the faces that were accurately recognised across all pairwise combinations of participants. For each pair of participants, we calculated the total number of items that were accurately reported by both participants. We then correlated the conceptual overlap with the face recognition overlap at different timepoints for the In Show and Out of Show images (Fig. 7).
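To make the pairwise similarity step concrete, the sketch below builds a small LSA-style semantic space with a truncated singular value decomposition and returns the cosine similarity between every pair of free-recall texts. It assumes NumPy is available and uses raw word counts with a tiny number of dimensions; practical LSA pipelines usually apply term weighting and many more dimensions, and the specific implementation used in the study is not described here. The function name and example texts are hypothetical.

```python
import re

import numpy as np


def lsa_similarity(texts, n_components=2):
    """Pairwise cosine similarity of documents in a truncated-SVD semantic space.

    Assumptions: raw word counts (no term weighting) and a small n_components.
    """
    tokens = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    vocab = sorted({w for doc in tokens for w in doc})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-document count matrix.
    counts = np.zeros((len(vocab), len(texts)))
    for j, doc in enumerate(tokens):
        for w in doc:
            counts[index[w], j] += 1

    # Truncated SVD: keep the k largest singular values.
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(n_components, len(s))
    doc_vecs = vt[:k].T * s[:k]          # one row per document

    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0              # guard against empty documents
    unit = doc_vecs / norms
    return unit @ unit.T                 # cosine similarity matrix


# Hypothetical free-recall summaries from three participants.
recalls = [
    "a detective wakes up in the past",
    "a detective wakes up in the past",
    "police argue about an old case",
]
sim = lsa_similarity(recalls)
print(np.round(sim, 2))
```

The resulting participants × participants matrix plays the role of the conceptual overlap measure, which is then correlated with the matrix of shared correct face recognition responses.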
There was a significant positive correlation between the overlap in conceptual knowledge and face recognition for In Show images at the immediate (r = .17, p = .003, lower CI = .16, upper CI = .19) and delayed (r = .21, p < .001, lower CI = .20, upper CI = .22) timepoints. For the Out of Show images, there was no significant correlation at the immediate timepoint (r = .04, p = .511, lower CI = .02, upper CI = .05), but there was a significant positive correlation at the delayed timepoint (r = .18, p = .002, lower CI = .17, upper CI = .19). Finally, we determined whether the magnitude of the correlations changed between timepoints using a one-tailed back-transformed average Fisher's Z procedure (Diedenhofen & Musch, 2015; Hittner et al., 2003). We found higher correlations between the conceptual and recognition overlap at the delayed compared to the immediate timepoint for both the In Show images (z = 5.8, p < .001) and the Out of Show images (z = 20.0, p < .001). These findings are consistent with a greater effect of conceptual information after a period of consolidation.

Discussion
The aim of this study was to investigate the importance of non-visual, conceptual knowledge in face recognition during naturalistic viewing. To address this question, we compared the face recognition of characters in a movie taken from the TV series Life on Mars (LoM) in participants who had not watched it before.
The key manipulation was to present the movie in either its Original sequence or in a Scrambled sequence. Our first pre-registered hypothesis was that participants who viewed the Original sequence would have a better understanding of the narrative (conceptual knowledge) compared to participants who viewed the Scrambled sequence. Our results confirmed this hypothesis, showing that participants in the Original condition had more extensive and accurate recall of the events in both the free recall and structured question scores. Next, we asked whether this difference in conceptual knowledge would have an effect on the ability to recognise the faces. There was a significant main effect of conceptual knowledge on face recognition. That is, participants who viewed the Original sequence had overall higher face recognition than participants who viewed the Scrambled sequence. However, our pre-registered hypothesis was that there would be a bigger effect of conceptual knowledge on face recognition after memory consolidation. To test this, we compared face recognition immediately after viewing the movie and then 4 weeks later. We predicted that viewing the movie in the Original sequence would lead to a stronger and more robust representation of the faces. However, our results did not support our second pre-registered hypothesis of an overall effect of conceptual knowledge on face recognition following a period of consolidation.
We next performed an exploratory analysis to determine whether overlap in conceptual knowledge between participants could explain overlap in face recognition. To do this, we compared conceptual knowledge across participants using a semantic analysis of the text from the free recall task (Landauer et al., 1998). Rather than measuring overall levels of knowledge, this method measures the overlap in semantic content between participants. This approach has previously been shown to predict similarity in neural responses between individuals (Nguyen et al., 2019). Here, we asked if the overlap in conceptual knowledge across participants predicted which faces were remembered in the recognition test. We found that the pattern of recognition of In Show faces (that were similar to those seen during encoding) was significantly correlated with the overlap in conceptual knowledge at the immediate timepoint. We then asked whether the relationship between conceptual knowledge and face recognition was greater after a delay. Interestingly, we found a greater correlation at the delayed timepoint. This suggests that consolidation may have an effect on the interaction between conceptual knowledge and face recognition. We performed the same analysis with Out of Show faces (that were visually dissimilar to those seen at encoding). We found no significant correlation at the immediate timepoint, but there was a significant correlation at the delayed timepoint. Again, this suggests that consolidation may have an effect on the interaction between conceptual knowledge and face recognition. Further research using these similarity measures may provide a useful way to probe the role of conceptual knowledge in face recognition.
Cognitive models of face perception focus on visual experience and suggest that familiarisation with a face occurs through generation of an image-invariant representation (Bruce & Young, 1986; Burton et al., 2011; Young & Burton, 2017). When an invariant visual representation of the face has been established, conceptual information about the person can be accessed. However, in natural viewing when we are becoming familiar with a person, our perceptual experience with their face is typically accompanied by an increase in conceptual information, such as their name, what they are like, memories of key events and how we feel about them. Although models do not typically include conceptual knowledge as being important for the visual recognition of faces, it has been shown that better face recognition occurs when we associate a face with a name or occupation (Schwartz & Yovel, 2016) or if we make social judgements about the faces during learning (Mattarozzi et al., 2019; Schwartz & Yovel, 2018). However, the paradigms used in these studies involve viewing a limited number of static images that are associated with arbitrary conceptual knowledge about the person. So, it remains unclear how the recognition of faces unfolds in more naturalistic viewing conditions and over longer time periods. A recent study addressed this issue by measuring face recognition of actors in the TV series Game of Thrones (Devue et al., 2019). They found that the faces of the lead actors were recognised better than other actors and that recognition performance was generally better for faces viewed more recently. Although better recognition could reflect increased conceptual knowledge, it could just reflect increased perceptual experience with the person. A key feature of our paradigm is that the visual exposure to faces is the same in both the Original and Scrambled sequences. This means that the difference in face recognition between participants in the two groups reflects differences in conceptual knowledge.
Although our results show an effect of conceptual knowledge on face recognition, visual experience with faces plays a key role in becoming familiar with faces (Devue et al., 2019; Memon et al., 2003; Roark et al., 2006), with exposure to within-person variability being particularly important for learning new faces (Andrews et al., 2015; Burton et al., 2016; Murphy et al., 2015; Ritchie & Burton, 2017). Indeed, recognition of faces in natural viewing is better when the appearance is similar to the appearance at encoding (Devue et al., 2019). Consistent with our third pre-registered hypothesis, we found that faces were better recognised when the appearance was consistent with that at encoding. For example, we found that faces taken from actors in LoM were more recognisable if they were In Show than if they were Out of Show images. This effect of appearance at encoding could reflect the relative visual similarity of the faces presented during recognition. However, it might also reflect other information in the image. For example, reinstating the context in which a face was learnt improves recognition (Hanczakowski et al., 2015; Reder et al., 2013). We were not able to find support for our final pre-registered hypothesis that, after consolidation in memory, the effect of conceptual knowledge would be greater for images whose appearance was similar to encoding. To test this, we compared the difference in face recognition between the Original and Scrambled groups for In Show compared to Out of Show images at the delayed timepoint. We did not find that In Show images showed a greater effect of conceptual knowledge compared to Out of Show images after a delay.
A hallmark of familiar face recognition is the ability to recognise faces across substantial visual changes in the image (Hancock et al., 2000; Young & Burton, 2017). Computational models of face recognition have suggested that purely bottom-up image-based descriptions do not provide sufficient information for recognition and that top-down processes are necessary to learn within-person variability (Kramer et al., 2018). However, other studies using deep convolutional neural networks show that recognition can be based purely on visual information (Blauch et al., 2021). Nevertheless, it remains unclear how familiar face recognition is achieved in humans and whether non-visual information plays an important role (Rossion, 2018; Yovel & Abudarham, 2021). Our results suggest that the increased conceptual information that accompanies our experience with faces during natural viewing is important in linking visually dissimilar faces into a robust, long-term representation that can be used for recognition. This fits with neuroimaging studies which have shown that the neural response to familiar faces engages non-visual regions of the brain (Di Oleggio Castello et al., 2021; Gobbini & Haxby, 2007; Kovács, 2020; Noad et al., 2023; Visconti Di Oleggio Castello et al., 2017) and behavioural studies that show the perception of identity can be influenced by non-visual conceptual information (Oh et al., 2021). Future studies using neuroimaging methods should be able to reveal whether the changes that occur as we become familiar with faces during natural viewing become more evident in visual or non-visual regions of the brain.
In conclusion, we show that conceptual knowledge was greater in participants who viewed a movie in its original sequence compared to a scrambled sequence. Despite the fact that the overall perceptual experience was the same in all participants, higher face recognition was evident in participants who viewed the original sequence. This shows an effect of conceptual knowledge on face recognition. However, planned comparisons failed to show that this effect was more sustained over time in participants who viewed the movie in the original sequence. This study provides new insights into the role of conceptual knowledge in face recognition during natural viewing.

Fig. 3 – Examples of faces from the recognition test. (A) In Show target faces were actors as they appear in the show, whereas (B) Out of Show target faces were actors as they appear out of the show. (C) In Show foils were other actors taken from the same show and (D) Out of Show foils were other actors that matched the Out of Show faces.

Fig. 2 – Illustration of movie conditions. (A) The Original condition has the movie clips in the correct sequence, whereas (B) the Scrambled condition has the same movie clips, but they are not presented in the correct sequence. The visual exposure is the same in both conditions, but the order of presentation is different across conditions.

Fig. 5 – Face recognition difference scores for each pre-registered hypothesis. (Left panel) The difference in recognition of In Show images at the immediate and delayed timepoints for the Original and Scrambled groups. Hypothesis 2 predicted a smaller difference in the Original compared to the Scrambled group. However, there was no significant difference. (Middle panel) Recognition of In Show and Out of Show faces from the Original group, averaged across immediate and delayed timepoints. Higher recognition was evident for the In Show images, which supports Hypothesis 3. (Right panel) The difference in recognition of In Show and Out of Show images at the delayed timepoint. The difference in the Original group was not significantly greater than in the Scrambled group, which does not support Hypothesis 4. Error bars represent standard error.

Fig. 6 – Face recognition scores for the Original and Scrambled conditions on the immediate and delayed recognition tests. There was a main effect of Image, with higher recognition of In Show compared to Out of Show images. There was also a significant main effect of Condition, which reflected higher recognition in the Original compared to the Scrambled condition. Error bars represent standard error.

Fig. 7 – Correlations between the overlap in narrative understanding and overlap in face recognition across all combinations of participants. Overlap in narrative correlated with overlap in the recognition of In Show faces (immediate: r = .17, delayed: r = .21) and Out of Show faces (immediate: r = .04, delayed: r = .18). The correlations at the delayed timepoint were significantly greater than at the immediate timepoint. Regression lines are shown in pink. Each point denotes an individual pairing of participants.