Decoding category and familiarity information during visual imagery

Visual imagery relies on a widespread network of brain regions, partly engaged during the perception of external stimuli. Beyond the recruitment of category-selective areas (FFA, PPA), perception of familiar faces and places has been reported to engage brain areas associated with semantic information, comprising the precuneus, temporo-parietal junction (TPJ), medial prefrontal cortex (mPFC) and posterior cingulate cortex (PCC). Here we used multivariate pattern analyses (MVPA) to examine to what degree areas of the visual imagery network, category-selective areas and semantic areas contain information about the category and familiarity of imagined stimuli. Participants were instructed via auditory cues to imagine personally familiar and unfamiliar stimuli (i.e. faces and places). Using region-of-interest (ROI)-based MVPA, we were able to distinguish between imagined faces and places within nodes of the visual imagery network (V1, SPL, aIPS), within category-selective inferotemporal regions (FFA, PPA) and across all brain regions of the extended semantic network (i.e. precuneus, mPFC, IFG and TPJ). Moreover, we were able to decode the familiarity of imagined stimuli in the SPL and aIPS, and in some regions of the extended semantic network (in particular, right precuneus and right TPJ), but not in V1. Our results suggest that posterior visual areas, including V1, host categorical representations of imagined stimuli, and that stimulus familiarity might be an additional aspect that is shared between perception and visual imagery.


Introduction
If asked to locate a beloved person, for example a family member within a crowd, most of us would probably identify that person's face among many others in a considerably short amount of time. Due to their personal valence, personally familiar stimuli are easier to recognize than unfamiliar stimuli (Visconti Di Oleggio Castello et al., 2017). This is also true when the quality of the visual input (i.e. the picture) is degraded (Burton et al., 1999), or when attentional resources are reduced (Gobbini et al., 2013). This facilitation in recognizing familiar stimuli is likely enabled by the larger amount of associated features, i.e. subjective information and personal episodic knowledge, which are lacking in novel and unfamiliar stimuli (Gobbini et al., 2004; Gobbini and Haxby, 2007; Cloutier et al., 2011).
To identify the network of brain regions involved in the perception of personally familiar stimuli, previous studies mainly focused on two stimulus categories, i.e. faces and places. The network involved in processing personally familiar faces has been suggested to include a "core" network, comprising inferotemporal regions (e.g. the fusiform face area, FFA) responsible for extracting information about the visual appearance of faces, and an "extended" network involved in the retrieval of relevant semantic information, such as personal traits, intentions and attitudes. Regarding places, a single case study exploring recognition mechanisms in a patient suffering from topographic agnosia highlighted a key role of the anterior parahippocampal cortex in mediating the recognition of personally familiar places (Van Assche et al., 2016).
To what degree are areas sensitive to the familiarity of perceived stimuli also involved in generating a mental image of a person or a place in the absence of a visual stimulus? Visual mental imagery has been interpreted as the reactivation of sensory representations stored in our memory by means of top-down mechanisms (Dentico et al., 2014; Dijkstra et al., 2017a; Mechelli et al., 2004; Pearson, 2019). In line with this idea, a growing number of studies showed that visual imagery recruits a similar set of brain regions as those involved in perception (for a meta-analysis, see Winlove et al., 2018). For example, several studies showed that visual mental imagery recruits retinotopically organized early visual cortices (Kosslyn and Thompson, 2003; Kosslyn, 2005; Winlove et al., 2018), but also parietal areas known to be recruited during visuospatial attention (Andersson et al., 2019; Formisano et al., 2002; Sack and Schuhmann, 2012; Slotnick et al., 2005). Moreover, visual mental imagery of specific stimulus categories (i.e. faces, places and objects) has been shown to recruit category-selective regions within the inferior temporal cortex (i.e. FFA, PPA, lateral occipital complex, LOC; Ishai et al., 2000; O'Craven and Kanwisher, 2000). Using multivariate pattern analyses (MVPA), it has been demonstrated that it is possible to decode the identity of imagined stimuli in early visual cortices (Albers et al., 2013), parietal (Ragni et al., 2020) and inferior-temporal regions (Cichy et al., 2012; Reddy et al., 2011). Finally, several studies demonstrated that it is possible to predict imagined stimuli from activation patterns obtained during perception and vice versa (cross-decoding; Stokes et al., 2009; Albers et al., 2013; Ragni et al., 2020), indicating shared representations between perception and visual mental imagery.
Despite these similarities between imagery and perception, no study so far has explored to what degree it is possible to distinguish between visual mental imagery of personally familiar versus personally unfamiliar stimuli. To address this question, we instructed a group of participants to imagine different personally familiar and unfamiliar stimuli (i.e. faces and places) while measuring their brain activity using fMRI. Specifically, we asked in which brain regions it is possible to distinguish between (1) imagined stimulus categories (i.e. faces and places), and between (2) personally familiar and unfamiliar imagined faces and places. Using region-of-interest (ROI)-based MVPA, we focused on three sets of brain regions: (i) the visual imagery network, comprising key areas involved in the generation and maintenance of mental images (i.e. V1, SPL and aIPS); (ii) category-selective regions within inferior temporal cortex (i.e. FFA and PPA), which are recruited during visual imagery of specific stimulus categories (i.e. faces and places); and (iii) the extended semantic network for face perception, consisting of brain regions involved in the extraction of personal information from familiar stimuli (i.e. precuneus, mPFC, IFG and TPJ; Haxby et al., 2000; Visconti Di Oleggio Castello et al., 2017).
To anticipate our results, we were able to decode category information of imagined stimuli within occipital (V1) and parietal (SPL, aIPS) nodes of the visual imagery network. Similarly, we were able to distinguish between imagined stimulus categories in category-selective regions (FFA, PPA), and across the extended semantic network for face perception (i.e. precuneus, mPFC, IFG and TPJ). Familiarity of imagined stimuli, on the other hand, could be decoded (over and above vividness) in some regions of the visual imagery network (i.e. right SPL, right aIPS) and the extended semantic network for face perception (in particular, right precuneus and right TPJ), but not in V1 and category-selective areas. Our results suggest that posterior visual areas, including V1, contain category-specific information about imagined faces and places. Moreover, our results demonstrate that familiarity of imagined stimuli is encoded in higher-level semantic regions, extending previous findings on the perception of familiar and unfamiliar stimuli to visual mental imagery.

Participants
Twenty-five healthy volunteers participated in the study. All participants had normal or corrected-to-normal vision and had no history of neurological or psychiatric disease. Before taking part in the study, all participants gave their written informed consent. Due to excessive head movements during the scanning session (average relative mean displacement across all experimental runs > 1 mm), two participants had to be excluded from the study, leading to a final sample of twenty-three participants (13 males, 10 females, mean age: 26.6 ± 3.8 years). The study was approved by the Ethics Committee for research involving human participants at the University of Trento, Italy.

Setup
Visual stimuli were displayed on a liquid crystal monitor (Nordic NeuroLab, Norway; frame rate: 60 Hz; screen resolution: 1920 × 1200 pixels). Participants lay horizontally in the scanner and viewed the screen (31.2° × 17.5° of visual angle) binocularly via a rectangular mirror positioned on the head coil. The auditory cue was delivered by means of MR-compatible earplugs (Siemens pneumatic earplugs). Button presses were collected via MR-compatible response buttons (Nordic NeuroLab Box, Nordic NeuroLab, Norway). Stimulus presentation, response collection and synchronization with the scanner were controlled using MATLAB 2017 (MathWorks, Natick, MA, U.S.A.) and the Psychtoolbox-3 for Windows (Brainard, 1997). Experiment presentation code is available on the Open Science Framework (https://osf.io/ynp5z/).

Stimuli
Stimuli consisted of pictures of different faces and places (4 exemplars each). We selected these two stimulus categories as their perception is known to preferentially recruit portions of high-level visual cortex (i.e. FFA and PPA; Epstein et al., 1999; Kanwisher and Yovel, 2006). Within each stimulus category, half of the stimuli (i.e. 2 exemplars) consisted of personally familiar stimuli, whereas the remaining half consisted of unfamiliar stimuli. Personally familiar faces were close-up photographs of two male or two female individuals who had a close relationship with the participant (e.g. relatives, friends, partners, etc.). Personally familiar places, instead, were images of indoor spaces where participants felt comfortable (e.g. their own bedroom, living room, etc.). Participants provided images several days before the experimental session. A list of all personally familiar stimuli selected by participants is provided in the Supplementary Materials (Table S1).
Each familiar stimulus was matched with an unfamiliar picture selected from two standard databases (faces: Face Place Dataset, Tarr, 2008; places: MIT Places-205 Dataset, Zhou et al., 2014). Faces were also matched for age, sex and ethnicity, whereas places were matched for the type of room. Pictures were scaled to 155 × 155 pixels, corresponding approximately to 4° of visual angle. Each stimulus was then associated with a name: unfamiliar stimuli were tagged with standard labels (i.e. "face one", "face two", "room one", "room two"), whereas familiar stimuli were tagged with names selected by each participant. These labels were used as auditory cues during the experimental task (see 2.4. Experimental session and task for more details).

Experimental session and task
Each participant completed a single experimental session, consisting of 9 functional runs (~5 min each; see Fig. 1a). Each functional run started and ended with a rest period (12 s at the beginning and 20 s at the end) and consisted of 16 trials, for a total of 144 trials per participant (36 trials for each condition). The conditions were embedded in a 2 × 2 factorial design, with the factors stimulus category (faces, places) and familiarity (personally familiar, unfamiliar). To examine patterns of brain activation during visual imagery of personally familiar and unfamiliar stimuli, we adopted an event-related design. The stimulus to be imagined was randomized across trials. Before entering the scanner, participants were requested to memorize the familiar and unfamiliar experimental stimuli and practiced the task for one experimental run.

Fig. 1. Layout of an experimental session, a single run and the trial structure. (a) Experimental session: participants performed a single experimental session, consisting of 9 experimental runs and a structural scan halfway through the experiment. (b) Each run (top panel) consisted of 16 trials, using an event-related design. Each trial (bottom panel) was preceded by an ITI (10 s) consisting of the presentation of a central fixation cross and a superimposed placeholder. Next, an auditory cue (1 s) instructed participants which stimulus to imagine (e.g. a face, identity '1') in the center of the placeholder, for 4 s. At the end of each trial, participants rated the vividness of the mental image they were able to generate on a 4-step Likert scale, where 1 corresponded to low vividness and 4 to high vividness.
Each trial of the visual imagery task started with a 10 s inter-trial interval (ITI in Fig. 1b), consisting of the presentation of a central fixation cross and a superimposed placeholder (Fig. 1b). The placeholder was a square subtending 4° of visual angle and served as a reference for the position and size of the mental image to be generated during the subsequent imagery period (Fig. 1b). Then, a verbal instruction (Auditory cue in Fig. 1b, duration: 1 s) informed participants which stimulus to imagine in the current trial. After the cue, participants were instructed to imagine the corresponding stimulus as vividly as possible in the center of the placeholder (Imagery period in Fig. 1b, duration: 4 s). Next, a question appeared on the screen (Vividness rating in Fig. 1b, duration: 2 s), prompting participants to rate the vividness of the mental image they had previously generated on a 4-step Likert scale (answer scale: 1. Low vividness - 4. High vividness). Participants indicated their response by button press. Across the entire trial, the only visual information present on the screen was the central fixation cross, the superimposed placeholder and the question in the rating phase. At the end of the experimental session, participants filled out the Vividness of Visual Imagery Questionnaire (VVIQ; Marks, 1973). This questionnaire assesses individual differences in visual imagery abilities by asking the reader to imagine different scenarios (e.g. "Visualize a rising sun. Consider carefully the picture that comes before your mind's eye"), both with eyes open and eyes closed, and to rate their vividness on a 5-step Likert scale (answer alternatives: 1. Perfectly clear and as vivid as normal vision; 2. Clear and reasonably vivid; 3. Moderately clear and vivid; 4. Vague and dim; 5. No image at all).

Data acquisition
Magnetic resonance images were collected using a 3T Siemens MAGNETOM Prisma scanner equipped with a 64-channel head coil. Functional data were acquired using a multiband EPI sequence (multiband acceleration factor 3, TE/TR = 28/1000 ms, flip angle = 59°, matrix size = 64 × 64, 42 interleaved slices, in-plane resolution 3 × 3 mm). We acquired 294 volumes for each functional run, with axial slices slightly tilted to be approximately parallel to the calcarine sulcus in order to optimize brain coverage.

Behavioral data analyses
For each participant, we computed the average vividness rating for imagined faces and places, separately for familiar and unfamiliar stimuli. We performed a repeated-measures ANOVA with familiarity (2 levels) and stimulus category (2 levels) as factors. Post-hoc t-tests were used to compare specific conditions of interest.
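For a 2 × 2 within-subject design such as this one, each main effect and the interaction is equivalent to a paired t-test on a per-participant contrast (with F = t² at 1, N-1 degrees of freedom). The sketch below illustrates this logic on synthetic vividness ratings; the effect sizes, noise levels and variable names are illustrative assumptions, not the study's data.

```python
# Sketch of a 2x2 repeated-measures ANOVA (familiarity x category) on
# per-participant mean vividness ratings, via paired t-tests on contrasts.
# All data below are synthetic and illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 23  # participants, as in the study
# per-participant mean vividness (1-4 scale) in the four cells (synthetic)
fam_face = 3.0 + rng.normal(0, 0.3, n)
fam_place = 3.1 + rng.normal(0, 0.3, n)
unfam_face = 2.6 + rng.normal(0, 0.3, n)
unfam_place = 2.7 + rng.normal(0, 0.3, n)

# main effect of familiarity: familiar average vs unfamiliar average
t_fam, p_fam = stats.ttest_rel((fam_face + fam_place) / 2,
                               (unfam_face + unfam_place) / 2)
# main effect of category
t_cat, p_cat = stats.ttest_rel((fam_face + unfam_face) / 2,
                               (fam_place + unfam_place) / 2)
# interaction: difference of familiarity effects across categories
t_int, p_int = stats.ttest_rel(fam_face - unfam_face,
                               fam_place - unfam_place)
print(f"familiarity: F={t_fam**2:.2f}, p={p_fam:.4f}")
```

A dedicated repeated-measures ANOVA routine (e.g. in statsmodels) would return the same F values for a balanced 2 × 2 design; the t-test formulation simply makes the contrasts explicit.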

fMRI data analyses

Preprocessing
Data were preprocessed and analyzed using FSL 5.0.11 (FMRIB's Software Library, https://fsl.fmrib.ox.ac.uk/fsl ) in combination with custom software written in MATLAB (MathWorks, Natick, MA, U.S.A.). Preprocessing included rigid-body motion correction to the mean image, followed by slice timing correction and high-pass temporal filtering ( > 0.01 Hz). To prevent multivariate pattern (MVP) analyses from being affected by the removal of fine-scale information in activity patterns ( Kriegeskorte et al., 2006 ), we did not apply spatial smoothing to the functional data.
Each functional run was registered to the subject's high-resolution anatomical image with a rigid-body transformation (using the Boundary-Based Registration algorithm implemented in FSL 5.1; Greve and Fischl, 2009) and to the MNI152 2 mm standard brain using linear transformation (FLIRT, 12 degrees of freedom; Jenkinson and Smith, 2001; Jenkinson et al., 2002). Visual inspection of the results of this preprocessing pipeline was performed for each participant. Due to excessive head motion, three functional runs of two different participants were excluded from the analyses (average relative mean displacement > 1 mm).

Univariate analyses
To examine whether there are differences in univariate activation between familiar and unfamiliar stimuli, we performed a general linear model (GLM) analysis. The first- and second-level analyses were performed in each subject's individual space using FSL (fixed-effect model). To examine the amplitude of the blood-oxygen-level-dependent (BOLD) response during visual imagery of familiar and unfamiliar stimuli, we created regressors for each factorial combination of stimulus category and familiarity, resulting in a total of 4 regressors (2 categories × 2 familiarity levels) for each experimental run. Regressors were time-locked to the start of the imagery delay (duration: 4 s). Moreover, regressors for the presentation of the auditory cue (time-locked to the onset of the auditory instruction; duration: 1 s) and the response phase (time-locked to the appearance of the question on the screen; duration: 2 s) were added to the model as nuisance regressors. Each regressor was convolved with a canonical hemodynamic response function (HRF). In addition, motion parameters (3 rotational, 3 translational) were added to the model as regressors of no interest. Results from the second-level analyses were then entered into a third-level analysis across participants, performed using FSL's FLAME (mixed-effect model). For the resulting maps, we adopted a voxel-wise threshold of p < 0.001 (z = 3.1) and a corrected cluster-wise threshold (p = 0.001) following Gaussian Random Field (GRF) theory (Worsley et al., 1996), as implemented in FSL's cluster routine.
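The construction of a single event regressor of this kind (a boxcar for the 4 s imagery periods convolved with a canonical HRF, sampled at the study's TR of 1 s) can be sketched as follows; the double-gamma HRF parameters and the event onsets are illustrative assumptions, not values taken from the study.

```python
# Minimal sketch of one HRF-convolved regressor for a 294-volume run.
# Onset times and HRF parameters are illustrative assumptions.
import numpy as np
from scipy.stats import gamma

TR = 1.0                      # s (TR = 1000 ms in this study)
n_vols = 294                  # volumes per run, as acquired here
t = np.arange(0, 32, TR)      # HRF support in seconds

# canonical double-gamma HRF (SPM-style shape parameters, an assumption)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
hrf /= hrf.sum()

# boxcar for 4 s imagery events at hypothetical onset times (one condition)
onsets = np.arange(12, 280, 17)   # illustrative onsets in seconds
box = np.zeros(n_vols)
for onset in onsets:
    box[int(onset / TR):int((onset + 4) / TR)] = 1.0

regressor = np.convolve(box, hrf)[:n_vols]  # trim to run length
```

In practice FSL builds these regressors internally from onset files; the sketch only makes the boxcar-plus-convolution logic explicit.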

ROI definition for multivariate pattern (MVP) analyses
In this study, we aimed to investigate how neural representations of personally familiar stimuli are encoded within three different brain systems: the visual imagery network, category-selective areas of inferior temporal cortex and the extended semantic network for face perception. We extracted ROIs within these three networks by defining spheres (radius 9 mm) centered on a priori defined coordinates.
The visual imagery network is central for the internal generation and maintenance of mental images (Dijkstra et al., 2019a; Mechelli et al., 2004). This network encompasses the primary visual cortex (V1) and parietal areas (i.e. SPL and aIPS).
In addition, we identified ROIs in category-selective areas FFA and PPA in the inferior temporal cortex, which are known to be recruited when participants perceive or imagine specific stimulus categories (i.e. faces and places).
For ROIs of the extended semantic system for face perception, we included brain regions considered to be involved in the processing of personally familiar stimuli ( Gobbini and Haxby, 2007 ;Haxby et al., 2000 ), such as the precuneus, the temporo-parietal junction (TPJ), the inferior frontal gyrus (IFG) and the medial prefrontal cortex (mPFC).
Specifically, for V1, FFA, precuneus, TPJ, mPFC and IFG, we defined spherical ROIs centered on coordinates from a previous study on the perception of personally familiar face stimuli (Visconti Di Oleggio Castello et al., 2017). For SPL and aIPS, spherical ROIs were centered on the peak probabilistic voxel of each region as defined by the Juelich Histological Atlas (SPL: area 7A; aIPS: area hIP3; Eickhoff et al., 2007). Since the Juelich Histological Atlas does not contain standard ROIs for category-selective regions of inferior temporal cortex, the spherical ROI for PPA was centered on the peak values of a standard ROI selected using an automated meta-analytic database (Neurosynth; Yarkoni et al., 2011). Considering that a more balanced comparison of effects might be achieved if ROIs are defined using the same criteria, we ensured that the FFA coordinates provided by Visconti Di Oleggio Castello et al. (2017) fell within the range of those provided by the same meta-analytic database used to define PPA (i.e. Neurosynth). All ROIs were visually inspected using FSLView to ensure there was no overlap between them (see Table S2 for the coordinates of the peaks of all our ROIs).
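A spherical ROI of this kind can be built as a boolean mask over the voxel grid. In the hypothetical sketch below, the grid shape corresponds to the MNI152 2 mm template and the center voxel is an arbitrary example, not one of the study's actual peak coordinates.

```python
# Illustrative construction of a 9 mm spherical ROI mask on a 2 mm grid.
# The center voxel below is a made-up example, not a study coordinate.
import numpy as np

def sphere_mask(shape, center_vox, radius_mm, voxel_size_mm=2.0):
    """Boolean mask: True for voxels within radius_mm of center_vox."""
    grid = np.indices(shape)                        # (3, x, y, z) voxel indices
    center = np.asarray(center_vox).reshape(3, 1, 1, 1)
    dist = np.sqrt((((grid - center) * voxel_size_mm) ** 2).sum(axis=0))
    return dist <= radius_mm

# MNI152 2 mm grid is 91 x 109 x 91 voxels; center voxel is hypothetical
mask = sphere_mask((91, 109, 91), center_vox=(45, 54, 45), radius_mm=9)
print(mask.sum(), "voxels in ROI")   # close to (4/3)*pi*4.5^3 ~ 382 voxels
```

Atlas-based ROIs (SPL, aIPS, PPA) would use the same mask construction, only with the center taken from the atlas peak instead of published coordinates.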

MVP analyses
ROI-based MVP analyses were implemented by means of a 2-class regularized linear discriminant analysis (LDA) classifier using the CoSMoMVPA Toolbox (Oosterhof et al., 2016). Estimates for the classification analyses were generated using a general linear model (GLM) analysis, performed in each participant's individual anatomical space using FSL (fixed-effect model). We created regressors for each single trial within the 9 experimental runs. This led to a total of 144 estimates per participant (36 for each experimental condition). Regressors were time-locked to the start of the imagery delay (duration: 4 s). Moreover, regressors for the presentation of the auditory cue (time-locked to the onset of the auditory instruction; duration: 1 s) and the response phase (time-locked to the appearance of the question on the screen; duration: 2 s) were added to the model as nuisance regressors. Each regressor was convolved with a canonical hemodynamic response function (HRF). Regressors were orthogonal to each other (maximum correlation between regressors within runs and participants: R² = 0.028; average correlation: R² = 0.00016). In addition, motion parameters (3 rotational and 3 translational parameters) resulting from 3D motion correction were added to the model as regressors of no interest, for a total of 24 regressors for each experimental run. Since t-values are computed by dividing beta estimates by the estimate of their standard error, decoding on the basis of t-values has been shown to suppress the contribution of noisy voxels and is therefore considered better suited for decoding (Misaki et al., 2010). We thus performed MVPA on t-values instead of beta values.
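As an illustration of this decoding step, the sketch below trains a shrinkage-regularized LDA (scikit-learn's analogue of the regularized LDA classifier in CoSMoMVPA, which the study ran in MATLAB) on synthetic single-trial "t-value" patterns. The voxel count, signal strength and odd/even trial split are illustrative assumptions.

```python
# Sketch of 2-class regularized LDA decoding on synthetic t-value patterns.
# Not the study's implementation (CoSMoMVPA/MATLAB); an analogous sketch.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(42)
n_trials, n_voxels = 144, 200                # 144 single-trial estimates
X = rng.normal(0, 1, (n_trials, n_voxels))   # synthetic "t-value" patterns
y = np.repeat([0, 1], n_trials // 2)         # 0 = face, 1 = place
X[y == 1, :20] += 0.7                        # weak category signal in 20 voxels

# shrinkage-regularized LDA (Ledoit-Wolf shrinkage of the covariance)
clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
clf.fit(X[::2], y[::2])                      # illustrative odd/even trial split
accuracy = clf.score(X[1::2], y[1::2])
print(f"decoding accuracy: {accuracy:.2f}")
```

The study's actual splits followed the leave-one-run-out schemes described in the Cross-validation paragraph, not an odd/even split.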
We performed two different MVP classification analyses: (1) with the category classification analysis, we aimed to identify brain regions representing the category (i.e. face or place) of imagined stimuli; (2) with the familiarity classification analysis, we investigated which brain regions encode information relative to the familiarity of imagined stimuli, separately for faces and places.
Cross-validation. We adopted a leave-one-run-out cross-validation scheme. T-values were divided into training and test sets depending on the type of decoding. For category classification (i.e. faces vs places), we cross-validated across familiarity and runs (see Fig. 2a): the classifier was trained on t-values obtained during visual imagery of two familiar faces and two familiar places in n-1 runs and tested on t-values corresponding to imagery of the left-out unfamiliar stimuli in the remaining run, and vice versa (18 cross-validation splits). This allowed us to control for familiarity information. For familiarity classification, we cross-validated across pairs of identities and runs (see Fig. 2b). Separately for faces and places, we trained the classifier on t-values corresponding to one familiar and one unfamiliar stimulus in n-1 runs and tested on the t-values corresponding to the two left-out identities (one personally familiar, one unfamiliar) in the remaining run, and vice versa (18 cross-validation splits). This ensured that familiarity classification did not rely on identity information.
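The split logic for the category classification can be sketched as simple index bookkeeping: train on the familiar exemplars in n-1 runs, test on the unfamiliar exemplars in the left-out run, and vice versa, giving 9 runs × 2 directions = 18 splits. The trial layout assumed below (16 trials per run, half familiar) is illustrative.

```python
# Sketch of the leave-one-run-out splits for category decoding,
# cross-validated across familiarity. Trial layout is illustrative.
import numpy as np

n_runs = 9
runs = np.repeat(np.arange(n_runs), 16)                  # 16 trials per run
familiar = np.tile(np.repeat([True, False], 8), n_runs)  # half familiar per run

splits = []
for test_run in range(n_runs):
    for train_fam in (True, False):
        # train on one familiarity level in the other runs,
        # test on the opposite familiarity level in the left-out run
        train = (runs != test_run) & (familiar == train_fam)
        test = (runs == test_run) & (familiar != train_fam)
        splits.append((np.where(train)[0], np.where(test)[0]))

print(len(splits), "cross-validation splits")
```

The familiarity classification splits follow the same pattern, except that the generalization is across pairs of identities rather than across familiarity levels.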

ROI-based MVPA
We performed classification analyses across three separate networks: the visual imagery network (i.e. V1, SPL, aIPS), category-selective areas (i.e. FFA, PPA) and the extended semantic network for face processing (i.e. mPFC, precuneus, TPJ, IFG). For more information, see ROI definition for multivariate pattern (MVP) analyses. To assess the significance of decoding accuracy, we performed permutation-based testing (Stelzer et al., 2013). For each ROI and classification analysis, we created 100 random accuracy values for each participant by permuting labels and performing classification. We then randomly selected one permuted accuracy value for each participant, computed the group average and repeated this operation 10,000 times to create a null distribution of averaged accuracy values. Statistical significance of the observed decoding accuracy was computed with respect to this null distribution of group-averaged accuracy values. Results were corrected for multiple comparisons (number of ROIs × number of tests) using a false discovery rate (FDR; q-value < 0.05; Benjamini and Yekutieli, 2001).

Fig. 2. Cross-validation schemes. Visual representation of the different cross-validation schemes adopted for (a) category classification (across familiarity and runs) and (b) familiarity classification (across pairs of identities and runs, separately for faces and places). In panel (b), the labels ID1, ID2, ID3 and ID4 represent the 4 different identities of face or place stimuli selected individually for each participant.
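The group-level permutation scheme can be sketched as follows, with synthetic per-participant permuted accuracies standing in for the real ones; the observed group accuracy below is a hypothetical value.

```python
# Sketch of the Stelzer et al. (2013) group-level permutation scheme:
# draw one label-permuted accuracy per participant, average, repeat 10000
# times to build a null distribution of group means. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_perm_per_subj, n_boot = 23, 100, 10000

# synthetic permuted accuracies, centered on chance (0.5)
perm_acc = rng.normal(0.5, 0.05, (n_subj, n_perm_per_subj))
observed_group_acc = 0.58                      # hypothetical observed mean

# bootstrap null: pick one random permuted value per participant per draw
draws = rng.integers(0, n_perm_per_subj, (n_boot, n_subj))
null = perm_acc[np.arange(n_subj), draws].mean(axis=1)

p = (np.sum(null >= observed_group_acc) + 1) / (n_boot + 1)
print(f"permutation p = {p:.4f}")
```

The resulting p-values would then feed into the FDR correction across ROIs and tests.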
To replicate our ROI analyses and to identify additional brain regions representing the category and familiarity of imagined faces and places, we also performed a whole-brain searchlight-based MVPA (Kriegeskorte et al., 2006; Oosterhof et al., 2016). For details, see the Searchlight-based MVPA section in the Supplementary Materials.
Finally, we conducted three additional control analyses to test whether similarity between the auditory cues and/or vividness might partially explain the MVPA results.
First, to exclude a potential influence of the similarity between the auditory cues for the different categories on the results of the familiarity decoding, we performed a representational similarity analysis (RSA), comparing the similarity between neural estimates of familiar and unfamiliar stimuli (for more detailed information, see the section The role of the auditory cue: Representational similarity analysis (RSA) in the Supplementary Materials).
Second, to explore the potential impact of vividness on the decoding analyses, we attempted to decode the vividness ratings provided by participants from patterns of activation using (a) a support vector classification (SVC) and (b) a support vector regression (SVR) analysis (see The role of vividness: Support Vector Classification (SVC) and The role of vividness: Support Vector Regression (SVR) in the Supplementary Materials for more information).
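The SVR variant of this control can be sketched in scikit-learn on synthetic data; the injected signal, the scoring by prediction-rating correlation and the odd/even split are illustrative assumptions, not details of the study's implementation.

```python
# Sketch of decoding trial-wise vividness ratings (1-4) from activation
# patterns with a linear support vector regression. Data are synthetic.
import numpy as np
from scipy.stats import pearsonr
from sklearn.svm import SVR

rng = np.random.default_rng(1)
n_trials, n_voxels = 144, 200
X = rng.normal(0, 1, (n_trials, n_voxels))            # synthetic patterns
ratings = rng.integers(1, 5, n_trials).astype(float)  # 1-4 Likert ratings
X[:, :10] += 0.8 * ratings[:, None]                   # inject a vividness signal

svr = SVR(kernel="linear", C=1.0)
svr.fit(X[::2], ratings[::2])                         # illustrative split
pred = svr.predict(X[1::2])
r, p = pearsonr(pred, ratings[1::2])
print(f"prediction r = {r:.2f}")
```

A significant prediction of this kind in an ROI would flag vividness as a potential confound for the familiarity decoding there, which is exactly what the control analysis tests.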
Third, we tested the significance of the differences between the control analyses (i.e., decoding of vividness) and the original analyses (i.e., decoding of familiarity). To this aim, we directly contrasted the classification accuracies of the SVC control analysis (decoding vividness information) with the classification accuracies of the two original decoding analyses (i.e., decoding familiarity information, separately for faces and places). For more detailed information, please refer to the section Comparison between familiarity analyses and SVC analyses for vividness in the Supplementary Materials.

Data and code availability
Code and summary data for the ROI-based multivariate analyses are available on the Open Science Framework (https://osf.io/ynp5z/). The conditions of our ethics approval do not permit public archiving or peer-to-peer sharing of individual raw data. The raw functional and structural data of our study may be made available upon request after confirmation from the ethical committee of our institution.

Fig. 3. Boxplot representing the distribution of vividness ratings (N = 23) as a function of imagined stimulus category and familiarity. Data (black dots) represent average vividness ratings of individual participants, separately for each stimulus category. Each box encloses the data comprised between the first (top edge) and third (bottom edge) quartile. The horizontal line within each box denotes the median. Whiskers indicate maximum (top) and minimum (bottom) values. Significance levels: one black asterisk, p < 0.05; two black asterisks, p < 0.01; three black asterisks, p < 0.001. Color scheme same as category decoding in Fig. 1.

Univariate analyses
We focused the univariate analysis on the familiarity effect, testing whether there was increased activation for familiar compared to unfamiliar imagined stimuli (GLM contrast familiar > unfamiliar). Results from the mixed-effects analysis were projected on a segmented and inflated surface mesh of the two hemispheres in BrainVoyager QX 2.8.0 (Brain Innovation). As can be seen in Fig. 4, familiar stimuli elicited greater activation than unfamiliar stimuli within the bilateral mPFC, bilateral precuneus, left anterior temporal lobe and left angular gyrus. By contrast, we found no region where imagining unfamiliar stimuli was associated with increased activation compared to familiar stimuli.

Multivariate pattern analyses
We performed ROI-based multivariate pattern analyses to examine the representation of personally familiar stimuli (i.e. faces and places) within the visual imagery network, category-selective inferotemporal areas and the extended semantic network for face perception.
Regarding our first question, we were able to decode the category (faces, places) of imagined stimuli from patterns of activation in V1 and parietal (i.e. SPL, aIPS) regions of the visual imagery network (Fig. 5a), within category-selective areas (i.e. FFA and PPA; Fig. 5b) and across the extended semantic network for face perception (i.e. precuneus, TPJ, mPFC, IFG; Fig. 5c). The corresponding statistics are shown in Table 1.
Regarding our second question, we were able to decode the familiarity of imagined faces and places within parietal (i.e. SPL, aIPS) nodes of the visual imagery network, but not in V1 (Fig. 5a). Within category-selective areas, we were able to decode the familiarity of faces, but not places, in left FFA (Fig. 5b). By contrast, we were able to decode the familiarity of both places and faces in PPA. Finally, we were able to decode the familiarity of imagined faces and places within the extended semantic network (i.e. precuneus, TPJ, mPFC, IFG; Fig. 5c). The corresponding statistics are shown in Table 1. Moreover, the results of our ROI analyses are consistent with the results of the searchlight-based MVP analyses (see Supplementary Materials, Figs. S1-S3).
To assess whether the familiarity decoding might be explained by differences in similarity of the adopted auditory cues and/or by the different levels of vividness of the imagined stimuli, we performed three additional control analyses. In the following sections, we will briefly describe these analyses (summarized in Fig. 6 ), whereas a more detailed description is provided in the Supplementary Materials.
First, we explored a possible influence of the similarity of the adopted auditory cues on familiarity decoding using RSA (see Supplementary Materials, pages 5-9). The results of this analysis showed a possible effect of similarity between the auditory cues within several (i.e. left precuneus, left TPJ, left mPFC and right IFG), but not all, of the examined ROIs (see Table S1), which might partially explain the decoding of familiarity in these regions. The results of the RSA are summarized in Fig. 6: regions showing a significant effect in this analysis (and thus a potential role of the auditory cue) are highlighted with a dotted circle.

Fig. 4. Results of the univariate mixed-effects GLM contrast familiar > unfamiliar stimuli (N = 23 participants). The z-map of group-level activation was thresholded at p = 0.001 and cluster correction was applied using GRF theory (p = 0.001). The resulting z-map was projected on a segmented and inflated surface mesh of the two hemispheres. L: left hemisphere. R: right hemisphere.

Fig. 5. Results of the ROI-based classification analyses within the visual imagery network (panel a), category-selective areas (panel b) and the extended semantic network for face perception (i.e. precuneus, TPJ, mPFC and IFG; panel c). Statistical significance was assessed by means of permutation testing (10,000 iterations) coupled with FDR-based correction for multiple comparisons (Benjamini and Yekutieli, 2001). Yellow: category decoding; blue: familiarity decoding of places; red: familiarity decoding of faces. Significance levels: one black asterisk, p < 0.05; two black asterisks, p < 0.01; three black asterisks, p < 0.001; one red asterisk, q(FDR) < 0.05. Error bars: standard error of the mean (S.E.M.). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
The other two analyses, Support Vector Classification (SVC) and Support Vector Regression (SVR), aimed at assessing the possible role of vividness in driving the decoding of familiarity (see Supplementary Materials, pages 10-15). These two analyses revealed a possible effect of vividness in some of the examined ROIs (i.e. bilateral V1, left aIPS, bilateral PPA, left FFA, left IFG, right mPFC, right precuneus and right TPJ; see Figs. 6, S9 and S10). However, there were also regions in which we obtained an effect of vividness in the absence of a significant effect for the decoding of familiarity (e.g. V1). The results of the SVC and SVR analyses are summarized in Fig. 6: regions showing a significant effect of vividness for one or both analyses are marked with a dashed circle.
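A minimal sketch of how vividness could be decoded with SVC (categorical, e.g. high vs. low vividness) and SVR (continuous ratings), using scikit-learn with cross-validation. All data, the median-split labeling, and the trial counts are simulated assumptions for illustration; they are not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC, SVR
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(1)

# Simulated single-trial ROI patterns and trial-wise vividness ratings (1-4).
X = rng.normal(size=(80, 150))             # 80 trials x 150 voxels
vividness = rng.integers(1, 5, size=80)    # self-reported vividness per trial

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# SVC: classify high vs. low vividness (here via a median split of ratings).
y_binary = (vividness > np.median(vividness)).astype(int)
svc_acc = cross_val_score(SVC(kernel="linear"), X, y_binary,
                          cv=cv, scoring="accuracy")

# SVR: predict the continuous rating; the cross-validated fit would then be
# compared against a permutation null, as for the classification analyses.
svr_score = cross_val_score(SVR(kernel="linear"), X, vividness.astype(float),
                            cv=cv)  # default scoring: R^2

print(f"SVC mean accuracy: {svc_acc.mean():.2f}")
print(f"SVR mean CV score (R^2): {svr_score.mean():.2f}")
```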
Overall, the results of these control analyses suggest that similarity between auditory cues and differences in vividness between categories might only partly explain the familiarity decoding results, as these effects reached significance only within a subset of the examined ROIs.
To complete our control analyses regarding the potential role of vividness, we directly compared the decoding accuracies for familiarity and vividness. This comparison is possible only between the decoding accuracies for familiarity and the results of the SVC control analysis for vividness, but not for the SVR analysis for vividness (which provides normalized correlation coefficients as output). We thus compared the normalized decoding accuracies, separately for familiarity of faces and places, with the normalized classification accuracies for vividness (see Supplementary Materials, pages 16-17, for details). The result of this analysis is summarized in Fig. 6: regions showing significantly higher decoding of familiarity than of vividness are highlighted with a continuous circle.

Fig. 6. Visual representation of the results of the ROI-based classification analyses. Left panel: visual imagery network (i.e. V1, SPL and aIPS). Middle panel: category-selective areas (i.e. FFA and PPA). Right panel: extended semantic network for face perception (i.e. precuneus, TPJ, mPFC and IFG). Colored discs represent significant above-chance classification accuracy (color scheme similar to Fig. 5). Statistical significance was assessed by means of permutation testing (10,000 iterations) coupled with FDR-based correction for multiple comparisons (Benjamini and Yekutieli, 2001). Different line types around the discs represent a significant effect in one of the control analyses: dotted line (auditory cue effect, RSA analysis), dashed line (vividness effect, SVR and/or SVC analyses), continuous line (familiarity decoding for faces and/or places > vividness).
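The comparison of the two (normalized) accuracies can be illustrated with a paired sign-flip permutation test on per-participant accuracy differences. The accuracy values below are simulated and the exact normalization step is an assumption; the authors' procedure is described in the Supplementary Materials.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative per-participant (N = 23) decoding accuracies in one ROI,
# already normalized (chance-level subtracted), for familiarity and vividness.
acc_familiarity = rng.normal(0.08, 0.05, size=23)
acc_vividness = rng.normal(0.02, 0.05, size=23)

# Paired permutation test on the accuracy difference: randomly flip the sign
# of each participant's difference to build the null distribution of the mean.
diff = acc_familiarity - acc_vividness
observed = diff.mean()
n_perm = 10000
signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
null = (signs * diff).mean(axis=1)

# One-sided p: how often a sign-flipped mean matches or exceeds the observed one.
p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
print(f"mean difference = {observed:.3f}, p = {p_value:.4f}")
```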
We obtained a significant difference between the decoding accuracies for face familiarity and vividness within several ROIs (Fig. S11): two regions of the visual imagery network (right SPL, right aIPS) and five regions of the extended semantic network for face perception (left mPFC, left precuneus, right precuneus, left TPJ, right TPJ). Decoding accuracy for place familiarity was significantly higher than decoding of vividness only in one region of the extended semantic network for face perception, i.e. the right precuneus (Fig. S12). In sum, the right precuneus was the only region in which familiarity decoding for both places and faces was significantly higher than vividness decoding.
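The significance scheme used throughout (label-permutation testing of each ROI's accuracy against chance, followed by Benjamini-Yekutieli FDR correction across tests) can be sketched as follows. The ROI list and all accuracy values are simulated; in the actual analysis, the null distribution comes from re-running the classifier on shuffled labels 10,000 times.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)

# Simulated group-level decoding accuracies for several ROIs (chance = 0.5).
rois = ["V1", "SPL", "aIPS", "FFA", "PPA", "precuneus", "TPJ", "mPFC", "IFG"]
observed_acc = rng.uniform(0.48, 0.62, size=len(rois))

# Stand-in permutation null per ROI: accuracies scattered around chance
# (in practice, obtained by shuffling labels and re-training the classifier).
n_perm = 10000
null_acc = rng.normal(0.5, 0.03, size=(n_perm, len(rois)))

# Permutation p-values, then Benjamini-Yekutieli FDR correction across ROIs.
p_vals = (np.sum(null_acc >= observed_acc, axis=0) + 1) / (n_perm + 1)
rejected, q_vals, _, _ = multipletests(p_vals, alpha=0.05, method="fdr_by")

for roi, p, q, sig in zip(rois, p_vals, q_vals, rejected):
    print(f"{roi:10s} p={p:.4f} q={q:.4f} {'*' if sig else ''}")
```

The Benjamini-Yekutieli variant is more conservative than Benjamini-Hochberg but remains valid under arbitrary dependence between tests, which is appropriate for correlated ROI accuracies.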

Discussion
Here we used multivariate pattern analyses to investigate to what degree it is possible to decode (1) the category and (2) the familiarity of imagined stimuli. We found that category information was encoded in regions of the visual imagery network, such as V1 and parietal areas (SPL, aIPS), in category-selective areas of inferior temporal cortex (i.e. FFA and PPA) and in the extended semantic network (precuneus, TPJ, mPFC, IFG). Familiarity of imagined faces and places, on the other hand, could be decoded more accurately than vividness from patterns of activation in parietal regions (right SPL, right aIPS) of the visual imagery network and in regions of the extended semantic network for face processing (precuneus, TPJ, mPFC), but not in V1 or within category-selective areas (i.e. FFA and PPA). In the following sections, we will discuss these results in more detail.

Decoding category information of imagined stimuli
We were able to decode category (i.e. face vs. place) information about imagined stimuli from patterns of activation within V1. Previous studies reported encoding of simple imagined stimuli in V1, such as oriented gratings (Albers et al., 2013) or line drawings of individual letters, simple shapes and objects (Ragni et al., 2020). Regarding visual imagery of more complex stimuli (i.e. high-resolution pictures of different stimulus categories), Reddy et al. (2011) were able to decode the imagined stimulus category in category-selective areas (FFA, PPA), higher visual areas (LOC) and extrastriate visual areas (i.e. V2-V4), but not in V1. Thus, to our knowledge, the current study is the first to show decoding of the category of imagined stimuli within V1, expanding our knowledge about the level of complexity of the stimuli that can be encoded in early visual cortices during visual imagery.
Beyond visual areas, we were able to decode category information of imagined stimuli within parietal nodes of the visual imagery network (i.e. aIPS, SPL), and within category-selective areas of the inferior temporal cortex (i.e. FFA, PPA). Parietal regions have repeatedly been shown to be recruited during visual imagery (Knauff et al., 2000; Formisano et al., 2002; Slotnick et al., 2005; Winlove et al., 2018), and SPL and aIPS have been reported to represent information about specific imagined stimulus exemplars (Ragni et al., 2020). Similarly, ventral temporal cortex has been shown to respond differentially during visual imagery of different stimulus categories (faces, houses and chairs: Ishai et al., 2000; faces and places: O'Craven and Kanwisher, 2000; faces: Ishai, 2002). Moreover, PPA has been shown to encode the identity of specific imagined places (Boccia et al., 2017; Johnson and Johnson, 2014). Our results confirm these findings, indicating that both FFA and PPA can host representations of category information during visual imagery.
Within the extended semantic network for face processing, we were able to decode the category of the imagined stimulus within all examined ROIs (i.e. precuneus, TPJ, mPFC and IFG). In a recent MVPA study, Visconti Di Oleggio Castello et al. (2017) presented participants with pictures of personally familiar and unfamiliar faces. The authors were able to decode identity information of perceived faces, independently of their familiarity, within high-level semantic regions, such as the precuneus, TPJ, mPFC and IFG. Our results indicate that the extended semantic network for face perception can represent familiarity-independent category information about a stimulus also during visual mental imagery.

Decoding familiarity of imagined stimuli
Regarding familiarity decoding, given the univariate differences between imagery of familiar and unfamiliar stimuli in bilateral mPFC and precuneus, decoding results in these two areas are likely to be driven by these univariate differences rather than by differences in the underlying patterns of activation. That said, we were able to decode familiarity of imagined faces and places from patterns of activation in parietal regions (SPL, aIPS) within the visual imagery network. As these regions have frequently been reported to be recruited during spatial tasks (for a meta-analysis see Winlove et al., 2018), both SPL and aIPS have been considered part of a "dorsal" spatial network comprising parietal and premotor regions, involved in representing the spatial configurations of imagined stimuli (Sack and Schuhmann, 2012). In addition, recent MVPA studies revealed representations of stimuli in parietal cortices during perception and imagery (Dijkstra et al., 2017a; Ragni et al., 2020), suggesting their role in detecting salient parts of imagined and perceived stimuli to support the orientation of top-down attention (Bogler et al., 2011; Dijkstra et al., 2019a). Our results suggest that parietal regions also contribute to the distinction between imagined stimulus categories. Note that the control analyses showed that the effect of familiarity observed in parietal cortex might be partially (or fully) explained by vividness in the left hemisphere, but not in the right one (Figs. S9, S10): in the right SPL and aIPS, we were unable to decode vividness, and there was a significant difference between decoding of familiarity and vividness (Fig. S11).

Table 1. Results of the ROI-based MVP analyses and corresponding statistics. Statistical significance was assessed by means of permutation testing (10,000 iterations). Chance level: 50%. Red asterisks denote statistical tests surviving FDR-based correction for multiple comparisons (Benjamini and Yekutieli, 2001).
Note that this effect was evident only for faces, suggesting a possible dissociation between imagined stimulus categories and/or hemispheres. Further studies will be required to examine the precise contribution of parietal regions to visual imagery of familiar stimuli.

We were unable to decode familiarity of imagined faces and places within V1. Using personally familiar face stimuli, Visconti Di Oleggio Castello et al. (2017) found distributed representations of familiarity information in early visual cortices (i.e. V1-V3) during perception. Several fMRI studies suggested a feedback transfer of information to early visual cortex during the execution of different tasks: Monaco et al. (2020), for example, were able to decode different planned actions from patterns of activation within the primary visual cortex; Vetter et al. (2014, 2020) found encoding of real and imagined sounds within V1 in both normally sighted and congenitally blind individuals; Bannert and Bartels (2013) found representations of the prototypical colours of objects presented in black and white; and Muckli et al. (2015) were able to decode visual information in non-stimulated portions of V1, suggesting the existence of mechanisms of cortical feedback from higher-level visual areas (Muckli and Petro, 2013). Visual mental imagery has been argued to rely on a similar top-down mechanism, in which information about imagined stimuli is transferred from prefrontal and parietal areas to occipital nodes of the visual imagery network (Dijkstra et al., 2017a; Mechelli et al., 2004). Our decoding results suggest that, as opposed to what has been observed during perception of personally familiar stimuli (Visconti Di Oleggio Castello et al., 2017), only category-specific, but not familiarity-related, information may be transferred to early visual cortex during imagery. Moreover, the results of the control analyses (Figs. 6, S9 and S10) point towards a specific role of V1 in representing information related to the vividness of imagined stimuli, in line with previous research (Dijkstra et al., 2017a). In addition, V1 was the only region in which it was possible to decode vividness but not familiarity information, suggesting that vividness might be dissociated from familiarity in this region.
Regarding category-selective areas, we were able to decode the familiarity of imagined faces in left FFA and of imagined faces and places in PPA. The results in left FFA are in line with the observation that the familiarity of perceived faces can be decoded in bilateral FFA (Visconti Di Oleggio Castello et al., 2017). It is worth noting that we were able to decode familiarity of faces from the left but not from the right FFA. This asymmetry in the involvement of FFA during visual imagery of faces has been reported by previous studies, some indicating stronger recruitment within the right (O'Craven and Kanwisher, 2000) and others within the left hemisphere (Ishai, 2002). Note also that the left hemisphere has been reported to be more frequently activated by visual imagery tasks (Winlove et al., 2018).
Note that our control analyses revealed an effect of vividness in bilateral PPA and left FFA, and that the decoding of familiarity and vividness did not differ statistically in these regions (Figs. 6 and S9-S13). Consequently, decoding of familiarity of imagined stimuli in category-selective regions might be driven by vividness, or by a combination of vividness and familiarity. Further studies will be required to clarify this point.
A rich body of neuroimaging studies reported the involvement of regions of the extended semantic network of face perception in processing information about stimulus familiarity. Specifically, perceiving familiar faces has been reported to recruit, in addition to category-selective regions within ventral temporal cortex (e.g. FFA and OFA; Haxby et al., 2000; Natu and O'Toole, 2011), a widespread network of brain regions (Gobbini and Haxby, 2007; Haxby and Gobbini, 2012). This network includes different areas that allow us to recognize personally familiar individuals, such as mPFC and TPJ, possibly involved in the retrieval of personal knowledge (Cloutier et al., 2011), and anterior temporal cortices and precuneus, engaged during the retrieval of long-term episodic memories (Shah et al., 2001). A similar engagement of this network has been reported during the perception of personally familiar places (Sugiura et al., 2005): these authors observed the recruitment of the precuneus, medial posterior cingulate cortex and retrosplenial cortex, in addition to PPA. Moreover, Visconti Di Oleggio Castello et al. (2017) were able to decode familiarity of perceived faces both within category-selective temporal regions (e.g. OFA, FFA) and within semantic areas (i.e. mPFC, IFG, TPJ and STS).
Our control analyses further support the central role of the extended semantic network in processing the familiarity of imagined stimuli. Indeed, several regions within the network (left and right TPJ, left and right precuneus, and left mPFC) represented information about face familiarity over and above vividness information (Fig. S11). In addition, RSA showed that two regions of the network, right mPFC and right precuneus, represented familiarity of imagined stimuli irrespective of the information regarding the auditory cue. One of these two regions, the right precuneus, represented familiarity over and above vividness for both faces and places; this ROI, which is part of the extended semantic network, was the only region across all examined ROIs showing all of these properties. Overall, our findings further expand previous observations, indicating that visual imagery of personally familiar faces and places recruits a set of brain regions, comprising the extended semantic network, similar to those involved during perception.
The general functional overlap between bottom-up sensory stimulation and top-down mental imagery has been reported in several univariate (Winlove et al., 2018) and multivariate studies (Reddy et al., 2011; Lee et al., 2012; Albers et al., 2013; for a review see Dijkstra et al., 2019a), and across different domains (e.g. auditory: Zatorre and Halpern, 2005; motor: Monaco et al., 2020). Our results are in line with the existence of a shared neural substrate for perception and imagery. An open question remains whether visual imagery and perception of personally familiar stimuli rely on the same neural representations, especially in regions of the extended semantic network for face perception. Since a growing number of studies reported shared neural codes between visual imagery and perception in occipital (Albers et al., 2013), inferotemporal (Reddy et al., 2011; Stokes et al., 2011) and parietal (Dijkstra et al., 2017b; Ragni et al., 2020) nodes of the visual imagery network, a similar neural representation might underlie visual imagery and perception of personally familiar stimuli as well.

Conclusions
Our results revealed that, during visual imagery of personally familiar faces and places, information about the stimulus category is distributed across the visual imagery network, category-selective areas and the extended semantic network of face perception. Familiarity information, instead, was represented within parietal regions of the visual imagery network and within some regions of the extended semantic network of face perception, but not in V1, which seems to be mainly involved in processing vividness information. These results extend previous knowledge about the level of complexity of the information that can be represented in early visual cortex during visual mental imagery. Moreover, our results suggest that stimulus familiarity might be an additional element shared between visual perception and visual imagery, extending a growing body of studies investigating the neural basis of visual mental imagery.

Declaration of Competing Interest
The authors declare no competing financial interests.