Gradual change of cortical representations with growing visual expertise for synthetic shapes

Abstract Objective: Visual expertise for particular categories of objects (e.g., mushrooms, birds, flowers, minerals, and so on) is known to enhance cortical responses in parts of the ventral occipitotemporal cortex. How is such additional expertise integrated into the prior cortical representation of life-long visual experience? To address this question, we presented synthetic visual objects rotating in three dimensions and recorded multivariate BOLD responses as initially unfamiliar objects gradually became familiar. Main results: An analysis of pairwise distances between multivariate BOLD responses (“representational similarity analysis,” RSA) revealed that visual objects were linearly discriminable in large parts of the ventral occipital cortex, including the primary visual cortex, as well as in certain parts of the parietal and frontal cortex. These cortical representations were present from the start, when objects were still unfamiliar, and even though objects were shown from different sides. As shapes became familiar with repeated viewing, the distribution of responses expanded to fill more of the available space. In contrast, the distribution of responses to novel shapes (which appeared only once) contracted and shifted to the margins of the available space. Conclusion: Our results revealed cortical representations of object shape and gradual changes in these representations with learning and consolidation. The cortical representations of once-viewed shapes that remained novel diverged dramatically from repeatedly viewed shapes that became familiar. This disparity was evident in both the similarity and the diversity of multivariate BOLD responses.

Here, we map the cortical representation of synthetic visual objects and track gradual changes as initially unfamiliar objects become progressively familiar with learning. We wondered how pre-existing shape representations would accommodate and integrate novel synthetic objects. We further wondered whether representational changes would be specific to learned objects or extend also to other objects of the same kind. To explore these questions, we analyzed "representational similarity" of spatiotemporal BOLD patterns (Haxby, 2012; Kriegeskorte, Mur, Ruff, et al., 2008), which offers a potentially sensitive measure for the information encoded in neural activity and may also be related to similarity as perceived by human observers (Charest & Kriegeskorte, 2015; Collins & Behrmann, 2020; Nestor et al., 2016).
Most previous studies of visual expertise identified cortical sites associated with a particular object category by comparing BOLD activity either between novices and experts or before and after learning. We extend this work in three ways: firstly, by establishing representational distance at the level of object exemplars rather than object categories; secondly, by monitoring gradual changes as observers gain familiarity with object exemplars; and thirdly, by analyzing changes in the diversity of multivariate BOLD activity. Few previous studies have attempted to resolve shape representations in such detail (Brants et al., 2016; Duyck et al., 2021; Eger et al., 2008; Visconti di Oleggio Castello et al., 2021). To advance the fine-grained analysis of representational geometry, we developed synthetic shapes for which visual expertise is acquired comparatively slowly (Kakaei et al., 2021) and took advantage of a numerically tractable method for linear discriminant analysis in $O(10^3)$-dimensional multivariate activity (DLDA; Yu & Yang, 2001).
Our results showed view-invariant representations of shape over surprisingly extensive regions of the ventral occipitotemporal cortex, including the fusiform gyrus, lateral occipital areas, and primary visual cortex. Representational distances were high from the start, even before learning, suggesting that new visual expertise was accommodated and encoded within pre-existing representations. However, shapes that appeared repeatedly (and were memorized by observers) and shapes that appeared just once (and were ignored) diverged dramatically, in terms of their cortical representations, while visual expertise was being acquired and consolidated.

Observers and behavior
Eight healthy observers (4 female and 4 male; aged 25 to 32 years) took part in behavioral training ("sham experiment," one session per observer), the functional imaging experiment ("main experiment," six scanning sessions per observer), and a final behavioral assessment (two sessions). All observers were paid and gave informed consent. Ethical approval was granted under Chiffre 30/21 by the ethics committee of the Faculty of Medicine of the Otto-von-Guericke University, Magdeburg.
In both sham and main experiments, observers viewed sequences of 200 recurring and non-recurring objects (see below and Fig. 1A) and attempted to classify each object as "familiar" or "novel" (by pressing the appropriate button). Over the course of multiple sessions, observers gradually became familiar with recurring objects and thus became able to distinguish them from non-recurring objects. Objects of the sham experiment were two-dimensional shapes, whereas objects of the main experiment were rotating, three-dimensional shapes (see below and Fig. 1A).
The main experiment extended over 3 successive weeks, with three sessions on separate days of both the 1st and 3rd week (no sessions took place in the 2nd week). The experiments of the 1st and 3rd week differed in four aspects: sequence type (structured or unstructured), the set of recurring objects, object color (red or blue), and responding hand (left or right). All aspects were counterbalanced across observers.
After the three scanning sessions of a week, observers participated in an additional behavioral session to confirm that they had in fact become familiar with every recurring object. Specifically, they performed a spatial search task in which they pointed out recurring target objects among non-recurring distractor objects (Kakaei et al., 2021). In addition, observers were offered the opportunity to voice anything they might have noticed about the experiment.

Experimental paradigm
Complex three-dimensional objects were computer-generated and presented as described previously (Kakaei et al., 2021). A movie can be viewed under this LINK. All objects were highly characteristic and dissimilar from each other as confirmed computationally in terms of vector distances between depth maps (Kakaei et al., 2021). Objects were presented every 3 s, with 2.5 s viewing and 0.5 s transition time (Fig. 1A). Objects were shown from all sides and, after appearing at an arbitrary angle, revolved smoothly for one full turn (period 2.5 s, frequency 0.4 Hz, angular frequency 144°/s) about one of several axes in the frontal plane (−45°, 0°, 45°, clockwise or counterclockwise). Axes and directions were counterbalanced for each object, and initial viewing angles were chosen randomly (Fig. 1B). All stimuli were generated with MATLAB (The MathWorks, Inc.), presented with the Psychophysics Toolbox (Brainard, 1997), and viewed via a mirror mounted on the MR head coil (screen resolution 960 × 720 pixels, frame rate 60 Hz, subtending approximately 8° × 6° of visual angle, average luminance 50 cd/m², background luminance 5 cd/m²). Observers responded with the right or left index finger on an MR-safe response box.
Fifteen objects recurred many times during three sessions ("recurring" objects), whereas other objects appeared exactly once ("non-recurring" or "singular" objects). As mentioned, observers classified every object as either "familiar" or "unfamiliar" by pressing a button during its presentation. Over the course of three sessions, all observers gradually became familiar with the "recurring objects" (see below). The average time-course of learning, as established by a simplified signal detection and reaction-time (RT) analysis, is shown in Figure 1C and D.
Fig. 1. Experimental paradigm. (A) Complex objects were shown for 2.5 s each, separated by 0.5 s transition, in sequences of 200 presentations, with a total duration of 600 s. Over 1 week, observers participated in 3 sessions, viewing 6 sequences during each session (18 sequences in total). Fifteen objects appeared many times each ("recurring objects"), while other objects appeared exactly once ("non-recurring objects"). Observers were required to categorize each object as either "familiar" or "unfamiliar" (by button press). (B) Objects appeared randomly rotated and revolved for one full turn (clockwise or counter-clockwise about variable axes in the frontal plane, inclination of 0°, 45°, or −45°). (C) Over the course of the week, as observers became familiar with recurring objects, classification performance improved. Here, performance (average and S.E.M.) is shown as a function of the number of presentations for 15 recurring objects, 8 observers, and 2 conditions. The relation between presentations and sessions was probabilistic (indicated by gray shading). (D) Reaction time (average and S.E.M.) as a function of presentation number. With increasing familiarity, reaction times decrease by roughly 50% (from 1.7 s to 0.9 s) and become considerably shorter than the presentation time.
Presentation sequences started with a random recurring object and continued randomly to one of the possible next objects, with neither immediate repetitions (X → X) nor direct returns (X → Y → X) being allowed. Sequences comprised 200 objects, of which 180 were recurring and 20 were non-recurring, the latter being interspersed at random intervals. Object sequences were post-selected so as to counterbalance the number of appearances of every recurring object in every session.
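The sequence constraints described above can be illustrated with a minimal sketch. It assumes uniform random transitions among all permitted objects and does not implement the "structured" transition rules or the post-selection for counterbalancing; the function names and the convention of labeling non-recurring objects with negative integers are hypothetical.

```python
import random

def generate_recurring_sequence(n_objects=15, length=180, seed=0):
    """Random sequence of object indices without X->X or X->Y->X.

    Assumption: every permitted next object is equally likely.
    """
    rng = random.Random(seed)
    seq = [rng.randrange(n_objects)]
    while len(seq) < length:
        banned = set(seq[-2:])  # current object and the one shown before it
        candidates = [o for o in range(n_objects) if o not in banned]
        seq.append(rng.choice(candidates))
    return seq

def intersperse_singular(seq, n_singular=20, seed=0):
    """Insert non-recurring objects (labeled -1, -2, ...) at random positions.

    The sequence still starts with a recurring object.
    """
    rng = random.Random(seed)
    slots = set(rng.sample(range(1, len(seq) + 1), n_singular))
    out, next_id = [], -1
    for pos, obj in enumerate(seq, start=1):
        out.append(obj)
        if pos in slots:  # place one singular object after this item
            out.append(next_id)
            next_id -= 1
    return out

recurring = generate_recurring_sequence()
sequence = intersperse_singular(recurring)
print(len(sequence), sequence[:12])
```

In an actual experiment, candidate sequences of this kind would additionally be screened so that every recurring object appears equally often per session.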
All observers performed the experiment twice in the scanner, once during the 1st week and again during the 3rd week of the main experiment (so that 8 observers provided 16 data sets). As mentioned, the 2 weeks differed in terms of the recurring objects and the presentation sequence. "Structured" sequences exhibited predictive sequential dependencies (3 possible recurring next objects), whereas "unstructured" sequences did not (14 possible recurring next objects, see Kakaei et al., 2021 for details). As a result, the repetition latency (i.e., the latency of successive presentations of the same object) was 5.5 ± 15 (median and S.D.) for "structured" and 10.5 ± 11 for "unstructured" sequences. Further aspects and effects of sequence structure are reported and discussed in detail in a companion paper.
To verify that recurring objects had become familiar to observers, every observer performed 60 trials of a spatial search task with 3 recurring and 9 non-recurring objects. The 12 objects were positioned randomly in a 3 × 4 array and were presented for 30 s while rotating in three dimensions (as in the main experiment). After each presentation, observers indicated the recurring object positions with the computer mouse. Performance was consistently above 95% correct.

MRI acquisition
All magnetic-resonance images were acquired on a 3T Siemens Prisma scanner with a 64-channel head coil.

fMRI pre-processing
Our approach to fMRI analysis was influenced by recent advances in comparing uni- and multivariate responses of corresponding voxels between different observers (e.g., Kumar et al., 2022; Nastase et al., 2019). The local correlation structure of voxel responses, which is similar in different observers, provided the basis for our functional parcellation (Dornas & Braun, 2018). The parcellation obviated "searchlight" strategies by defining for all observers corresponding brain "parcels" with corresponding episodes of high-dimensional (O(1000)) multivariate activity.
The fMRI pre-processing procedure was similar to that published previously (Dornas & Braun, 2018). First, DICOM files were converted into NIFTI format using MRICRON (MRICRON Toolbox, Maryland, USA, NIH). Then, brain tissues were extracted and segmented using BET (Smith, 2002) and FAST (Zhang et al., 2001). Field map correction, head motion correction, spatial smoothing, high-pass temporal filtering, and registration to structural and standard images were performed with the MELODIC package of FSL (Beckmann & Smith, 2004).
Field map correction and registration to the structural image were carried out using Boundary-Based Registration (BBR; Greve & Fischl, 2009). MELODIC uses MCFLIRT (Jenkinson et al., 2002) to correct for head motion. Spatial smoothing was performed with SUSAN (Smith & Brady, 1997), with full width at half maximum set at FWHM = 5 mm. To remove low-frequency artifacts, we applied a high-pass filter with a cut-off frequency of f = 0.01 Hz, that is, oscillations/events with periods of more than 100 s were removed. To register the structural image to Montreal MNI152 standard space with isotropic 2 mm voxel size, we used FLIRT (FMRIB's Linear Image Registration Tool; Jenkinson & Smith, 2001; Jenkinson et al., 2002) with 12 degrees of freedom (DOF) and FNIRT (FMRIB's Nonlinear Image Registration Tool) to apply the non-linear registration. To further reduce artifacts arising from head motion, we applied despiking with a threshold of λ = 100 using the BrainWavelet toolbox (Patel et al., 2014). Later, we regressed out the mean CSF activity as well as 12 DOF translation and rotation factors predicted by a motion correction algorithm (MCFLIRT). Afterward, the time series of each voxel was detrended linearly and whitened (with MATLAB functions "detrend" and "zscore").
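The final two steps (linear detrending followed by z-scoring of each voxel time series) can be sketched as follows. This is an illustrative stand-in written with the Python standard library, not the MATLAB code used in the study; it assumes equally spaced samples and uses the sample standard deviation (normalization by n − 1).

```python
from statistics import mean, stdev

def detrend_linear(y):
    """Subtract the least-squares straight line from a time series."""
    t = range(len(y))
    t_mean, y_mean = mean(t), mean(y)
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    return [yi - (y_mean + slope * (ti - t_mean)) for ti, yi in zip(t, y)]

def zscore(y):
    """Scale a time series to zero mean and unit sample standard deviation."""
    m, s = mean(y), stdev(y)
    return [(yi - m) / s for yi in y]

# Hypothetical voxel time series: drift plus a small alternating signal.
voxel = [2.0 + 0.5 * t + (1.0 if t % 2 else -1.0) for t in range(20)]
clean = zscore(detrend_linear(voxel))
print(round(mean(clean), 6), round(stdev(clean), 6))
```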
Finally, the 160,099 voxels of MNI152 space were grouped into 758 functional parcels according to the MD758 atlas (Dornas & Braun, 2018). Each functional parcel is associated with an anatomically labeled region of the AAL atlas (Tzourio-Mazoyer et al., 2002) and comprises approximately 200 voxels or approximately 1.7 cm³ of gray matter volume (212 ± 70 voxels, range 45 to 462 voxels). Parcels were defined for a small population of observers so as to maximize signal covariance within and minimize covariance between parcels in the resting state. In contrast to other parcellation schemes, this was based exclusively on the (typically strong) functional correlations within each anatomical region and disregarded the (typically weak) correlations between different anatomical regions. The MD758 parcellation offers superior cluster quality, correlational structure, sparseness, and consistency with fiber tracking, compared to other parcellation schemes of similar resolution (Albers et al., 2021; Dornas & Braun, 2018).

fMRI data analysis
To study the neural representation of objects, we extracted the multivoxel activity pattern at $N_t = 9$ time points following object onset. In a functional parcel with $N_{\mathrm{vox}}$ voxels, this response pattern constituted a point (or vector) in an $N_{\mathrm{dim}}$-dimensional space, where $N_{\mathrm{dim}} = N_t \cdot N_{\mathrm{vox}}$ (Fig. 2A). To identify parcels with significant selectivity for individual recurring objects, we employed a representational similarity analysis (RSA; Kriegeskorte, Mur, & Bandettini, 2008) (Fig. 2B). This analysis uses the standardized Euclidean (Mahalanobis) distance between responses in a high-dimensional space to examine the separability of neural object representations as a function of learning, or object type (recurring or non-recurring), or both. Over all 758 parcels, response dimensionality was $N_{\mathrm{dim}} = 1{,}911 \pm 634$ (mean and standard deviation), with a range from 405 (Calcarine-L-329, with 45 voxels) to 4,113 (Postcentral-R-484, with 457 voxels).
Our approach to RSA differed from previous work in some respects. Firstly, we analyzed high-dimensional spatiotemporal patterns of BOLD activity (200 voxels × 9 s, or $O(10^3)$ dimensions) in non-overlapping gray matter volumes (758 functional subdivisions of 90 anatomical regions, averaging 1.7 cm³; Dornas & Braun, 2018). Other studies have used lower-dimensional spatial activity patterns in overlapping searchlight volumes ($O(10^2)$ voxels or dimensions, covering 0.25 to 1.0 cm³; Kriegeskorte et al., 2006). Secondly, we employed multi-class linear discriminant analysis ("direct linear discriminant analysis," DLDA; Yu & Yang, 2001), rather than pairwise discriminability or one-versus-all discriminability (e.g., Hung et al., 2005; Liu et al., 2009). With these modifications, RSA revealed representational geometry at the level of object exemplars, as well as gradual changes in this geometry over sessions and runs.

Linear discriminant analysis
To analyze the response variance that discriminates $\kappa = 15$ recurring objects, at most $(\kappa - 1)$ dimensions are required. Restricting the analysis to 14 principal components of the response could potentially have neglected smaller but more discriminating components. Accordingly, we performed a Linear Discriminant Analysis (LDA), which amounts to a "supervised" principal component analysis (PCA) and yields the $(\kappa - 1)$-dimensional orthonormal subspace $S$ that optimally discriminates the $\kappa$ response classes. Here, optimality is defined as simultaneously minimizing within-class variance and maximizing between-class variance of responses.
The results of LDA and PCA showed considerable commonality. Over the 758 parcels, the first 14 principal components captured 53 ± 7% (mean and S.D.) of the total response variance, whereas the 14-dimensional subspaces $S$ captured 33 ± 7% of the total variance (or 61 ± 6% of the principal component variance). Almost all of the subspace variance overlapped with the principal component variance (i.e., 88 ± 5% of subspace variance projected into the space of the first 14 principal components, while the remaining 12 ± 5% projected into the space of the remaining principal components).
Similar numbers were obtained for the 124 identity-selective parcels. The first 14 principal components captured 57 ± 6% (mean and S.D.) of the total response variance, and subspaces $S$ captured 38 ± 6% of the total variance (or 67 ± 4% of the principal component variance). Almost all of the subspace variance (91 ± 3%) overlapped with the first 14 principal components. In summary, Linear Discriminant Analysis captured the useful (discriminating) part of correlated variance and distributed this variance more uniformly over its 14 orthonormal dimensions (6 ± 3% per dimension) than principal component analysis could (4 ± 6% per dimension).
A numerically tractable procedure for identifying the optimal subspace $S$ is available in terms of "direct LDA" or DLDA (Ye et al., 2006; Yu & Yang, 2001). Briefly, this method first diagonalizes between-class variance to identify $\kappa - 1$ discriminative eigenvectors with non-zero eigenvalues, next diagonalizes within-class variance, and finally yields a rectangular matrix for projecting activity patterns from the original activity space (dimensionality $N_{\mathrm{dim}}$) to the maximally discriminative subspace $S$ and back. As this method is linear and relies on all available degrees of freedom, its results are deterministic. An important feature of this particular algorithm is that within-class variance is maintained near unity for all classes, by means of a suitable scaling of the subspace dimensions. A MATLAB implementation of DLDA is available at github.com/cognitive-biology/DLDA.
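The two diagonalization steps can be written out as follows. This is a sketch of the direct LDA scheme as commonly presented following Yu & Yang (2001), not a transcription of the implementation used here; the scatter-matrix normalization and the symbols $Y$, $Z$, $U$, $D_b$, $D_w$, and $A$ are assumptions introduced for illustration.

```latex
\begin{align*}
S_b &= \sum_{i=1}^{\kappa} n_i\,(\mu_i - \mu)(\mu_i - \mu)^{\top},
&
S_w &= \sum_{i=1}^{\kappa} \sum_{k=1}^{n_i} (x_{ki} - \mu_i)(x_{ki} - \mu_i)^{\top}
\\[4pt]
% Step 1: keep the (at most) kappa - 1 eigenvectors Y of S_b with non-zero eigenvalues
Y^{\top} S_b\, Y &= D_b,
&
Z &= Y D_b^{-1/2}
\quad\Rightarrow\quad Z^{\top} S_b\, Z = I
\\[4pt]
% Step 2: diagonalize the within-class scatter inside this subspace
U^{\top} \left( Z^{\top} S_w\, Z \right) U &= D_w
\\[4pt]
% Step 3: rectangular projection with within-class scatter scaled to unity
A &= D_w^{-1/2}\, U^{\top} Z^{\top},
&
A\, S_w A^{\top} &= I,
\qquad
A\, S_b A^{\top} = D_w^{-1}
\end{align*}
```

Under these assumptions, a pattern $x$ is mapped to $y = A x$, and the identity $A S_w A^{\top} = I$ expresses the property that within-class variance is scaled to unity in the subspace.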

Amplitudes, distances, and correlations
Activity patterns $x_{jk}$ associated with trials $k$ were analyzed in the maximally discriminative subspace $S$. The amplitudes of these patterns exhibited an average value of $\langle a \rangle = 0.99$. […] The patterns from successive trials exhibited a weak temporal correlation, with approximately 5% smaller distances at delays below 4 trials and approximately 2% larger distances at delays ranging from 6 to 15 trials (see Supplementary Fig. S1A, B). Comparing pairs of trials with different types of objects, we observed approximately 3% larger response distances $D$ (at all latencies) for the same recurring objects than for either different recurring or non-recurring objects (Supplementary Fig. S1C). Differential response amplitudes $A$ increased marginally with latency, because response amplitudes tended to increase slightly over the course of each run (Supplementary Fig. S1D). This trend was evident for all types of objects and with both "structured" and "unstructured" sequences. In other words, the effect of object type on multivariate hemodynamic responses was limited to response distances and did not extend to response amplitudes. Thus, our data provided no evidence for "repetition suppression."
For certain analyses (Sections 2.5.8 and 2.5.9), we established for each parcel $w$ the average delay-dependent distance $T_w(\Delta k) = \langle d_{w,u,r}(\Delta k) \rangle_{u,r}$ between patterns with a relative delay of $\Delta k$ trials, where the average was taken over subjects $u$ and runs $r$. The time-course $T_w$ allowed us to discount temporal correlations by computing $d^{\mathrm{corrected}}_{w,u,r}(\Delta k) = d_{w,u,r}(\Delta k) - T_w(\Delta k) + \langle T_w(\Delta k) \rangle_{\Delta k}$, where $\langle T_w(\Delta k) \rangle_{\Delta k}$ is the average value over delays $\Delta k$.
Fig. 2. Analysis of fMRI activity with direct linear discriminant analysis, or DLDA. For each functional parcel, DLDA identified the 14-dimensional space that optimally discriminated the 15 classes of activity patterns associated with 15 recurring objects. Other activity patterns, such as those associated with non-recurring objects, were also analyzed in this space. (A) For a given parcel with $N_{\mathrm{vox}}$ voxels (e.g., yellow region Frontal-Inf-R-8), activity was recorded over 9 s during and following object presentation (2 to 11 s after onset). Each such activity pattern corresponds to a point (or vector) in a $9 \cdot N_{\mathrm{vox}}$-dimensional space (right). Here, activity patterns associated with three object presentations are represented schematically (red, green, and blue spheres). (B) To cross-validate discriminability, recurrent object presentations were divided randomly into a training set (90%) and a test set (10%). From the training set, the DLDA subspace $S$ was established. Here, exemplars (solid spheres) and class centroids (crosses) are represented schematically. Next, the projections into this space of test set patterns were compared to class centroids. (C) Projection onto the line connecting class centroids $i$ and $j$ revealed the pairwise discriminability/dissimilarity $\delta_{i,j}$ of object classes $i$ and $j$ (top), and the distances to class centroids yielded the within-class and between-class variance of representations, $SS_W$ and $SS_B$, and the associated variance ratio $F = SS_B / SS_W$ (bottom). Additionally, a matrix of (mis-)classification probabilities $P(\text{reported } i \mid \text{true } j)$ (also known as the confusion matrix) could be obtained (not shown). (D) To assess object representation generally, test presentations were drawn randomly from the complete set of object presentations (left). To assess changes over the duration of the experiment, the set of presentations was divided into five successive "batches" and test presentations were drawn from one of these batches (bottom). In either case, the training set comprised all remaining presentations (i.e., the complement of the test set).
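The correction for temporal correlations, in which the delay-dependent average $T_w(\Delta k)$ is subtracted and its mean over delays is added back, can be sketched as below. The delay profile and the function name are hypothetical illustration values, not data from the study.

```python
from statistics import mean

def correct_distances(d_by_delay, t_by_delay):
    """Return d(dk) - T_w(dk) + <T_w>, for every delay dk.

    d_by_delay: distances of one parcel/subject/run, keyed by delay in trials.
    t_by_delay: average delay profile T_w of the parcel, keyed the same way.
    """
    t_mean = mean(t_by_delay.values())
    return {dk: d - t_by_delay[dk] + t_mean for dk, d in d_by_delay.items()}

# Hypothetical profile: smaller distances at short delays, larger at long ones.
T_w = {1: 0.95, 2: 0.97, 8: 1.02, 10: 1.02}
# A run that follows the average profile exactly becomes flat after correction.
corrected = correct_distances(dict(T_w), T_w)
print(corrected)
```

After correction, any remaining dependence of distance on delay reflects deviations from the average profile rather than the temporal autocorrelation itself.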

Representation of shape "identity" for recurring objects
Our observations comprised approximately 200 activity patterns for each of the 15 recurring object classes (per observer and condition). To allow for cross-validation, we randomly divided these patterns into a larger "training set" (90% or 190 ± 7.7 per object class) and a smaller "test set" (10% or 22 ± 0.9 per object class) (Fig. 2B). Note that the "training set" comprised exclusively activity patterns associated with recurring objects. To reduce the variability introduced by random test sets, this selection was repeated $N_r = 20$ times and all statistical measures described below represent the average over repetitions. As illustrated in Figure 2C, in the discriminative subspace $S$, we compared the $n_i$ test set exemplars $x_{ki}$ (where $k = 1, \ldots, n_i$) of class $i$ to the centroids $c_j^{\mathrm{train}}$ established for the training exemplars of class $j$. To compute Mahalanobis distances and variance ratios (see below), we compared test set exemplars $x_{ki}$ of class $i$ to the centroids $c_j^{\mathrm{test}}$ of test set exemplars of class $j$.
We used three measures for this comparison, all with comparable results. Firstly, for each pattern exemplar $x_{ki}$, the nearest class centroid $c_j^{\mathrm{train}}$ was identified to establish a matrix of classification probabilities $P(j \mid i)$ (probability that an exemplar of class $i$ is nearest to the centroid of class $j$), also known as "confusion matrix," as well as the "classification accuracy" $\alpha = \sum_i P(i \mid i)\, P(i)$, which is the probability that the nearest centroid is the correct one.
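A nearest-centroid classification of this kind, with the resulting confusion matrix and accuracy, might look as follows. The sketch assumes plain Euclidean distance in the discriminative subspace; the function names and the toy two-class data are hypothetical.

```python
import math
from collections import Counter

def nearest_centroid(x, centroids):
    """Label of the centroid closest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

def confusion_and_accuracy(test_items, centroids):
    """test_items: list of (true_label, vector) pairs.

    Returns P(j | i) as a dict keyed by (reported j, true i), and the
    accuracy alpha = sum_i P(i | i) P(i).
    """
    counts = Counter((true, nearest_centroid(x, centroids))
                     for true, x in test_items)
    n_true = Counter(true for true, _ in test_items)
    n_total = len(test_items)
    confusion = {(j, i): c / n_true[i] for (i, j), c in counts.items()}
    accuracy = sum(confusion.get((i, i), 0.0) * n_true[i] / n_total
                   for i in n_true)
    return confusion, accuracy

# Toy example with two classes in a two-dimensional subspace.
train_centroids = {"a": (0.0, 0.0), "b": (4.0, 0.0)}
test_items = [("a", (0.5, 0.0)), ("a", (3.0, 0.0)),
              ("b", (4.0, 1.0)), ("b", (5.0, 0.0))]
confusion, alpha = confusion_and_accuracy(test_items, train_centroids)
print(confusion, alpha)
```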
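Secondly, for each pair of object classes $(i, j)$, object exemplars $x_{ki}$ and $x_{kj}$ from the test set were projected onto the line connecting the two test set centroids, $c_i^{\mathrm{test}}$ and $c_j^{\mathrm{test}}$, and a pairwise discriminability (dissimilarity, or Mahalanobis distance) $\delta_{i,j}$ was computed from the means, $\mu_i$ and $\mu_j$, and variances, $\sigma_i^2$ and $\sigma_j^2$, of these projections. The average over all pairs of object classes was computed as $\delta = \frac{2}{\kappa(\kappa - 1)} \sum_{i<j} \delta_{i,j}$.
Thirdly, given class centroids $c_i^{\mathrm{test}}$ and overall centroid $c^{\mathrm{test}}$, we computed the Euclidean distances $d_{ki} = \lVert x_{ki} - c_i^{\mathrm{test}} \rVert$ between exemplars $x_{ki}$ and class centroid $c_i^{\mathrm{test}}$ and, for each object class $i$, the "sum of squares" as $SS_i = \sum_{k=1}^{n_i} d_{ki}^2$. The "within-class" variance of all classes was computed as $SS_W = \sum_{i=1}^{\kappa} SS_i$, where $N = \sum_i n_i$ is the total number of exemplars. Similarly, from the Euclidean distances $d_i = \lVert c_i^{\mathrm{test}} - c^{\mathrm{test}} \rVert$ between class centroids and overall centroid, we computed "between-class" variance $SS_B = \sum_i n_i\, d_i^2$. From the Euclidean distances $\lVert x_{ki}^{\mathrm{test}} - c^{\mathrm{test}} \rVert$ between exemplars and overall centroid, we computed "total" variance $SS_T = \sum_i \sum_k \lVert x_{ki}^{\mathrm{test}} - c^{\mathrm{test}} \rVert^2$. Variances $SS_W$, $SS_B$, and $SS_T$ are also denoted, respectively, $SS_{\mathrm{same}}$, $SS_{\mathrm{diff}}$, and $SS_{\mathrm{fam}}$ further below. To quantify the discriminability of classes, the variance ratio $F = SS_B / SS_W$ (Fig. 2C) provided a non-parametric multivariate statistic (PERMANOVA; Anderson, 2001). The average within-class and between-class dispersion per dimension could be estimated as $\sigma_W = SS_W / (N - \kappa)$ and $\sigma_B = SS_B / (\kappa - 1)$, respectively.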
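The third measure rests on sums of squared distances to class centroids and to the overall centroid. A minimal sketch of this decomposition is given below; it assumes that the between-class term weights each class by its number of exemplars (so that total variance equals the sum of within-class and between-class variance) and that the variance ratio is the plain quotient of the two, as in the caption of Fig. 2C. Function names and toy data are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def variance_decomposition(classes):
    """classes: dict mapping class label -> list of vectors.

    Returns (SS_W, SS_B, SS_T). Assumption: SS_B weights each class
    centroid by the number of exemplars in that class.
    """
    all_points = [p for pts in classes.values() for p in pts]
    c_all = centroid(all_points)
    ss_w = ss_b = 0.0
    for pts in classes.values():
        c_i = centroid(pts)
        ss_w += sum(sq_dist(p, c_i) for p in pts)
        ss_b += len(pts) * sq_dist(c_i, c_all)
    ss_t = sum(sq_dist(p, c_all) for p in all_points)
    return ss_w, ss_b, ss_t

toy = {"a": [(0.0, 0.0), (2.0, 0.0)], "b": [(10.0, 0.0), (12.0, 0.0)]}
ss_w, ss_b, ss_t = variance_decomposition(toy)
print(ss_w, ss_b, ss_t, ss_b / ss_w)
```

With equal weighting by class size, the identity $SS_T = SS_W + SS_B$ provides a convenient check on any implementation.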

Minimum statistic
To test for statistical significance, we computed average classification performance (in terms of both classification accuracy $\alpha_{\mathrm{obs}}$ and F-ratio $F_{\mathrm{obs}}$) over $N_r$ test sets, as well as over $10^3$ first-level permutations of object identities (in each of the $N_r$ test sets). In principle, we could have tested an "individual null" hypothesis for every parcel and every data set, namely, the probability of obtaining the observed performance $\alpha_{\mathrm{obs}}$ (or $F_{\mathrm{obs}}$) purely by chance. Instead, we computed the "minimum statistic" $m = \min_k \alpha_k$ (or $m = \min_k F_k$) over data sets $k$, as well as over $10^5$ second-level permutations (drawn randomly from the first-level permutations) and tested the "global null" hypothesis, namely, the probability $p_n(m)$ of obtaining the observed minimum performance over $n$ data sets purely by chance. This computation was performed separately for each of the 2 conditions (8 data sets from 8 observers per condition) as well as for the union of conditions (16 data sets from 8 observers). When the "global null" hypothesis could be rejected, we inferred statistically significant classification performance in at least some data sets. Our threshold for significance was $p_n^{*}(m) < 0.05$ after correction for multiple comparisons (758 parcels and 2 conditions) (Allefeld et al., 2016).
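The second-level permutation step can be sketched as follows. This is a simplified illustration that assumes one list of first-level permutation values per data set, counts the unpermuted combination as one of the permutations, and omits the correction for multiple comparisons; the function name and all numbers are hypothetical.

```python
import random

def minimum_statistic_p(observed, permuted, n_second=2000, seed=0):
    """Estimate the global-null probability of the observed minimum.

    observed: observed performance value for each data set.
    permuted: for each data set, a list of first-level permutation values.
    Each second-level permutation draws one first-level value per data set
    and takes the minimum over data sets.
    """
    rng = random.Random(seed)
    m_obs = min(observed)
    hits = 1  # the unpermuted combination counts as one permutation
    for _ in range(n_second):
        m_perm = min(rng.choice(values) for values in permuted)
        if m_perm >= m_obs:
            hits += 1
    return hits / (n_second + 1)

# Hypothetical accuracies for three data sets (chance level 1/15 = 0.067).
first_level = [[0.05, 0.07, 0.06, 0.08]] * 3
print(minimum_statistic_p([0.30, 0.25, 0.28], first_level))
print(minimum_statistic_p([0.06, 0.06, 0.06], first_level))
```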

Prevalence analysis
To summarize the results from all observers and conditions, we used a "prevalence analysis" (Allefeld et al., 2016). Prevalence $\gamma_{\mathrm{true}}$ is the fraction of the $n = 16$ data sets with significant performance. To test the "prevalence null" hypothesis that $\gamma_{\mathrm{true}}$ is below a threshold $\gamma_0 = 0.5$, an upper bound for $P(\gamma_{\mathrm{true}} < \gamma_0)$ was obtained from the probability $p_n^{*}(m)$ of the minimum statistic over $n = 16$ data sets, after correction for multiple comparisons. This was the criterion used to label parcels as "identity selective." Threshold prevalence $\gamma \geq 0.5$ corresponded to corrected probability $p_n^{*}(m) \leq 0.0012$ and minimal accuracy of 6.67% (i.e., near chance). Additionally, we computed $\gamma_{\mathrm{est}}$ as the largest value for which the "prevalence null" hypothesis could be rejected, given the number $n$ of data sets and the significance threshold $\alpha = 0.05$.
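The relation between the probability of the minimum statistic and the prevalence threshold can be made concrete with a short calculation. The sketch assumes the bound commonly given for the minimum-statistic prevalence test, $[(1 - \gamma_0)\, p^{1/n} + \gamma_0]^n \leq \alpha$; this form is an assumption here, not quoted from the text, and the function names are hypothetical. Under this assumption, $\gamma_0 = 0.5$, $n = 16$, and $\alpha = 0.05$ give a threshold probability of about 0.0012, in agreement with the value stated above.

```python
def prevalence_threshold_p(gamma0, n, alpha=0.05):
    """Largest minimum-statistic probability at which the prevalence null
    (prevalence <= gamma0) is rejected at level alpha.

    Assumed bound: ((1 - gamma0) * p**(1/n) + gamma0)**n <= alpha.
    """
    root = (alpha ** (1.0 / n) - gamma0) / (1.0 - gamma0)
    return max(root, 0.0) ** n

def prevalence_bound(p_min, n, alpha=0.05):
    """Largest gamma0 that can be rejected, given probability p_min."""
    root_p = p_min ** (1.0 / n)
    return (alpha ** (1.0 / n) - root_p) / (1.0 - root_p)

p_threshold = prevalence_threshold_p(gamma0=0.5, n=16)
print(p_threshold)                        # threshold probability for gamma0 = 0.5
print(prevalence_bound(p_threshold, 16))  # recovers gamma0 up to rounding
```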

Representation of shape "novelty" for non-recurring objects
Although recurring and non-recurring objects were comparable and generated in the same way, it seemed possible that neural representations might discriminate the class of 15 recurring objects from the class of 360 non-recurring objects. Indeed, the two classes became discriminable after observers had learned to classify recurring objects as "familiar" and non-recurring objects as "novel." Accordingly, we considered this discriminability a representation of "novelty." To assess the neural representation of "novelty," we divided non-recurring and recurring objects into two sets of unequal size (approximately $N = 216 \times 15$ recurrent or "familiar" exemplars vs. $M = 360$ non-recurrent or "novel" exemplars). From the Euclidean distances between exemplars and their respective class centroids, $c_{\mathrm{fam}}$ and $c_{\mathrm{nov}}$, we obtained within-class variance $SS_W = SS_{\mathrm{fam}} + SS_{\mathrm{nov}}$; from the distances between class centroids and overall centroid $c_{\mathrm{tot}}$, we obtained between-class variance $SS_B$; and from the distances $\lVert x_k - c_{\mathrm{tot}} \rVert$ between exemplars and overall centroid, we obtained total variance $SS_T$. To quantify the discriminability of non-recurring and recurring objects, we formed the variance ratio $F$ (Anderson, 2001). Average within-class and between-class dispersion per dimension was obtained from $\sigma_W = SS_W / (N + M - 2)$ and $\sigma_B = SS_B$, respectively.

Changes of representation analyzed in "batches"
To assess changes in neural representations over the course of the experiment, while also allowing for cross-validation, we divided all recurring object presentations into five successive "batches" $B_1, B_2, \ldots$, each with 20% of the presentations (Fig. 2D). In this way, we could select "test sets" for cross-validated DLDA from one particular batch, while retaining all other presentations as a "training set." As every recurrent object was presented 210 ± 9 times over all sessions, a batch would comprise 42 ± 1.8 presentations, a test set 21 ± 0.9, and a training set 189 ± 8.1 presentations. To reduce the variance deriving from test set selection, we repeated the random selection $N_r = 20$ times and averaged over repetitions.
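The batch-wise selection of test and training sets can be sketched for a single recurring object as follows. The sketch assumes that the test set amounts to 10% of all presentations, drawn at random from within the chosen batch, and that the training set is the complement; the function name and arguments are hypothetical.

```python
import random

def batch_split(n_presentations, batch_index, n_batches=5,
                test_fraction=0.1, seed=0):
    """Draw a test set from one temporal batch; train on everything else.

    Presentations are indexed 0..n-1 in temporal order.
    """
    rng = random.Random(seed)
    edges = [round(b * n_presentations / n_batches)
             for b in range(n_batches + 1)]
    batch = list(range(edges[batch_index], edges[batch_index + 1]))
    n_test = round(test_fraction * n_presentations)
    test = sorted(rng.sample(batch, n_test))
    held_out = set(test)
    train = [k for k in range(n_presentations) if k not in held_out]
    return train, test

train, test = batch_split(n_presentations=210, batch_index=2)
print(len(train), len(test), test[0], test[-1])
```

For 210 presentations, this yields batches of 42, test sets of 21, and training sets of 189 presentations, matching the averages given above. Repeating the call with different seeds corresponds to the repeated random selection.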
To quantify representational changes over the course of learning, we computed the variance ratios $F^{\mathrm{identity}}_{m,w,u}$ for each temporal window or batch $m$, identity-selective parcel $w$, and data set $u \in \{1, \ldots, 16\}$.
Additionally, we performed a regression analysis and quantified representational changes in terms of linear trends. Specifically, we determined a "rate" parameter $\beta_w^{\mathrm{identity}}$ by fitting a linear mixed model $F^{\mathrm{identity}}_{m,w,u} = \beta_{0,w} + \beta_w^{\mathrm{identity}}\, m + \xi_{0,w,u} + \xi_{1,w,u}\, m + \varepsilon_{m,w,u}$ with data sets $u$ as the grouping variable, where $\beta_{0,w}$ was a fixed-effect coefficient, $\xi_{0,w,u}$ and $\xi_{1,w,u}$ were random-effect coefficients, and $\varepsilon_{m,w,u}$ was residual error.
Similarly, to assess whether neural representations of non-recurring objects change with learning, we divided all object presentations (recurring and non-recurring) into five successive "batches" $B_1, B_2, \ldots$, each with 20% of the presentations (Fig. 2D), to obtain variance ratios $F^{\mathrm{novelty}}_{m,w,u}$. Additionally, we performed a regression analysis to establish linear trends. Changes in the representation of object "novelty" were assessed by fitting the "rate" parameter $\beta_w^{\mathrm{novelty}}$ in a linear mixed model $F^{\mathrm{novelty}}_{m,w,u} = \beta_{0,w} + \beta_w^{\mathrm{novelty}}\, m + \xi_{0,w,u} + \xi_{1,w,u}\, m + \varepsilon_{m,w,u}$, with data sets $u$ as the grouping variable, where $\beta_{0,w}$ was a fixed-effect coefficient, $\xi_{0,w,u}$ and $\xi_{1,w,u}$ were random-effect coefficients, and $\varepsilon_{m,w,u}$ was residual error.
To establish linear trends $F_m = \langle F_{m,w,u} \rangle_{w,u}$ (of either identity or novelty) that average over both parcels $w$ and data sets $u$, we obtained a rate parameter $\beta_1$ by fitting a linear mixed model $F_{m,w,u} = \beta_0 + \beta_1 m + \xi_{0,w,u} + \xi_{1,w,u}\, m + \varepsilon_{m,w,u}$ with both parcels and data sets as grouping variables.

Geometry of representations
In the cross-validated analyses described above, subspaces $S$ differed slightly between different batches (and training sets). To analyze the geometry of neural representations in a stable framework, we repeated some analyses in fixed subspaces $S$ that reflected all observations (i.e., all recurring activity patterns $x_k$). In the fixed subspace, we calculated the normalized amplitude of each activity pattern. For each parcel $w$, data set $u$, and run $r$, we obtained the average amplitude of recurring patterns and the average amplitude of non-recurring patterns. Similarly, we obtained the average distance between pairs comprising one recurring and one non-recurring pattern. For recurring patterns, we further obtained the average distance between pairs of recurring patterns in the same class and the average distance between pairs in different classes. All distances were corrected for the temporal auto-correlation by subtracting the time course of $T_w(i, j)$, as described above.
As described further above, the distances between individual activity patterns and different centroids (such as $c_{\mathrm{tot}}$, $c_{\mathrm{nov}}$, and $c_{\mathrm{fam}}$) yielded total variance $SS_T = SS_{\mathrm{tot}}$, within-class variance $SS_W = SS_{\mathrm{fam}} + SS_{\mathrm{nov}}$, and between-class variance $SS_B = SS_{\mathrm{novfam}}$. For recurring patterns, distances to individual class centroids $c_i$ and overall centroid $c_{\mathrm{fam}}$ yielded total variance $SS_T = SS_{\mathrm{fam}}$, within-class variance $SS_W = SS_{\mathrm{same}}$, and between-class variance $SS_B = SS_{\mathrm{diff}}$. These values were computed for each parcel $w$, observer $u$, and run $r$, in order to obtain variance fractions. Linear trends of each quantity $Y$ over the runs $r' \in s$ of a session $s$ were assessed by determining a "rate" parameter $\beta_s$ for identity-selective parcels $w$ and non-selective parcels $w'$. Each $\beta_s$ coefficient was acquired from a linear mixed model $Y_{r',w,u} = \beta_{0,s} + \beta_s\, r' + \xi_{0,w,u} + \xi_{1,w,u}\, r' + \varepsilon_{r',w,u}$ with observers and parcels as grouping variables, where $\beta_{0,s}$ was a fixed-effect coefficient, $\xi_{0,w,u}$ and $\xi_{1,w,u}$ were random-effect coefficients, and $\varepsilon$ was residual error. The same approach was used to assess gradual changes over runs in the centroid-to-centroid distances $D_{\mathrm{same}}(r)$, $\Delta D_{\mathrm{same}}(r)$, $D_{\mathrm{nov}}(r)$, and $\Delta D_{\mathrm{nov}}(r)$. This served to test the statistical significance of linear rates $\beta_s$ in each session. Sessions with significant rates are marked by stars in Figure 6.
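The decomposition into total, within-class, and between-class variance described above can be sketched as follows. This minimal example assumes the standard (unnormalized) sums of squared Euclidean distances to class centroids and to the overall centroid; the exact normalization used in the study is not restated here, and all function names and toy patterns are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of equally long vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def variance_decomposition(classes):
    """Return (ss_total, ss_within, ss_between).

    classes maps a class label to a list of activity patterns.
    """
    all_points = [p for pts in classes.values() for p in pts]
    c_tot = centroid(all_points)
    ss_total = sum(sq_dist(p, c_tot) for p in all_points)
    ss_within = 0.0
    ss_between = 0.0
    for pts in classes.values():
        c = centroid(pts)
        # scatter of patterns around their own class centroid
        ss_within += sum(sq_dist(p, c) for p in pts)
        # scatter of class centroids around the overall centroid
        ss_between += len(pts) * sq_dist(c, c_tot)
    return ss_total, ss_within, ss_between


# Toy two-class example ("familiar" vs. "novel") in two dimensions.
toy = {
    "fam": [[0.0, 0.0], [2.0, 0.0]],
    "nov": [[5.0, 5.0], [7.0, 5.0]],
}
ss_t, ss_w, ss_b = variance_decomposition(toy)
```

For the toy patterns, the within-class sum is 4, the between-class sum is 50, and the total is 54, illustrating the identity SS_T = SS_W + SS_B on which variance ratios rest.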

Stability of shape identity and novelty representations
We also assessed the stability of the representation of the 16 response classes (15 recurring and 1 non-recurring) over the course of the experiment. To this end, we compared the average representation in individual runs $r$ (centroids $C_r$ of responses to exemplars) to the average representation over all runs (centroids $C_{\mathrm{ave}}$). For observer $u$, identity-selective parcel $w$, and object class $i$, we calculated the Euclidean distance $D_{u,w,i,r}$ between the relevant $C_r$ and $C_{\mathrm{ave}}$, and also the difference $\Delta D_{u,w,i,r}$ between the relevant centroids from successive runs, $C_r$ and $C_{r+1}$. After averaging over observers $u$, identity-selective parcels $w$, and object classes $i$, we obtained $D_{\mathrm{same}}(r)$ and $\Delta D_{\mathrm{same}}(r)$ for recurring objects and $D_{\mathrm{nov}}(r)$ and $\Delta D_{\mathrm{nov}}(r)$ for non-recurring objects.
As a baseline for comparison, we also computed the distances $D_{u,w,i,r}$ and differences $\Delta D_{u,w,i,r}$ that may be expected purely on the basis of response variance. To this end, we permuted the sequence of all 3,600 trials, separately within each of the 16 response classes (15 recurring and 1 non-recurring), so as to obtain 18 "pseudo-runs" with 200 trials each. Expectation values were obtained by repeating this $N_r = 1{,}000$ times.
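The construction of pseudo-runs can be sketched as below. The sketch reflects one possible reading of the procedure, stated here as an assumption: trials are shuffled only among the positions occupied by their own class, so that each pseudo-run keeps the class composition of the corresponding true run while the temporal order within each class is scrambled. The function name, the seed handling, and the toy label sequence are hypothetical.

```python
import random


def make_pseudo_runs(trial_classes, n_runs=18, seed=0):
    """Permute trials within each class and regroup them into pseudo-runs.

    trial_classes lists the class label of every trial in presentation order.
    Returns n_runs lists of trial indices; trials beyond a whole multiple of
    n_runs (if any) are dropped.
    """
    rng = random.Random(seed)
    n_trials = len(trial_classes)

    # Positions occupied by each class in the original sequence.
    positions = {}
    for idx, label in enumerate(trial_classes):
        positions.setdefault(label, []).append(idx)

    # Reassign trials to positions, separately within each class.
    permuted = list(range(n_trials))
    for idxs in positions.values():
        shuffled = idxs[:]
        rng.shuffle(shuffled)
        for pos, src in zip(idxs, shuffled):
            permuted[pos] = src

    run_len = n_trials // n_runs
    return [permuted[r * run_len:(r + 1) * run_len] for r in range(n_runs)]


# Toy sequence: 3,600 trials cycling through 16 class labels.
labels = [i % 16 for i in range(3600)]
pseudo_runs = make_pseudo_runs(labels)
```

Repeating the call with different seeds (for example, 1,000 times) and recomputing centroid distances on each set of pseudo-runs would give the expectation values that serve as the sampling-noise baseline.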
We note that, in an $n$-dimensional hypersphere of unit radius, the average Euclidean distance between two random points is $d_{\mathrm{ave}} \approx 1.4017$ for $n = 14$.

Dimensional reduction
To visualize representational geometry in two dimensions, we randomly sampled 50 response patterns to each of the recurring and non-recurring objects within the first and the last sessions and calculated a 1,600 × 1,600 pairwise distance matrix ($D_{w,u}$) for each identity-selective parcel $w$ and subject $u$. We did not wish to average distance matrices over observers, as we did not expect the activity patterns of different observers to be comparable. To sidestep this difficulty, we permuted the order of recurring objects 100 times and for each subject obtained an average matrix $D$ over permutations, which was then averaged over subjects. To visualize the representational geometry of identity in the first and the last session, we used multidimensional scaling (Matlab function mdscale, metric stress) to map the distance matrices for recurring objects (50 exemplars from the first session and 50 exemplars from the last session) into a two-dimensional space. To visualize the representational geometry of novelty, we restricted the distance matrix to non-recurring objects (50 exemplars from the first session and 50 exemplars from the last session) and just 3 of the 15 recurring objects (20 exemplars from either session).

RESULTS
Observers viewed sequences of computer-generated objects, with each object shown for 2.5 s while rotating in three dimensions (Fig. 1A, B; a movie may be viewed HERE). Over three sessions, observers viewed 3,600 objects in total, of which 3,240 were presentations of recurring objects (15 different objects, each appearing approximately 216 times) and 360 were presentations of non-recurring objects (360 different objects, each appearing once). The display was intended to be sufficiently intriguing to remain interesting over 3 successive days. To this end, presentations never repeated exactly.
Observers were required to classify each object as "familiar" (recurring) or "novel" (non-recurring).The task performance improved as observers became increasingly familiar with recurring objects, as illustrated in Figure 1C.
Over the first 600 presentations, classification performance improved approximately from 50% correct (chance) to 85% correct, and reaction times decreased approximately from 1.65 s to 1.25 s. Over the remaining 3,000 presentations, performance improved further to approximately 90% correct and reaction times decreased further to approximately 0.95 s. After three sessions, all observers were "familiar" with all recurring objects and could pick them out from an array of distractor objects. All sessions were performed in an MRI scanner while whole-brain functional imaging data were being collected. In the following, we report the results of three types of analyses. First, we describe the cortical areas in which multivariate BOLD activity encodes information about the identity of recurring objects ("object identity"), as determined by cross-validated analyses of entire data sets (3 sessions per observer). Second, we describe changes in cortical representations over coarse time intervals, by means of cross-validated analyses of successive parts of the data sets (3 sessions divided into 5 batches). These changes pertain to the encoding of both recurring objects and the distinction between recurring and non-recurring objects ("object novelty"). Third, we describe changes in representations over finer time intervals (3 sessions divided into 18 runs), by foregoing cross-validation and adopting a fixed reference frame. These finer intervals confirm the results from coarse intervals but reveal more details about the geometry of neural representations and their development over time.

Cross-validated representation of object identity
To assess the extent to which multivariate neural responses to recurring objects encoded object identity, we relied on optimal linear classifiers combined with cross-validation ("direct linear discriminant analysis," DLDA, see Methods for details). Specifically, we quantified the "identity" information in multivariate responses of every parcel $w \in \{1, \ldots, 758\}$ and data set $u \in \{1, \ldots, 16\}$ in terms of classification accuracy $\alpha_{w,u}$, average pairwise dissimilarity $\delta_{w,u}$, and the ratio of between-class and within-class variance $F_{w,u}$. All three measures proved highly correlated and supported similar conclusions. For example, Figure 3B illustrates the correlation of classification accuracy $\alpha_{w,u}$ and variance ratio $F_{w,u}$ (ρ = 0.94, p < 0.001). The correlations of $\alpha_{w,u}$ and $\delta_{w,u}$ (ρ = 0.95, p < 0.001), and of $\delta_{w,u}$ and $F_{w,u}$ (ρ = 0.98, p < 0.001) were comparably strong. The results of individual observers from the two experimental conditions (structured and unstructured object sequences) were highly similar as well, demonstrating test-retest consistency (Supplementary Fig. S2).
For most parcels, the results from different observers showed considerable variability. Whereas a few parcels exhibited significant accuracy $\alpha_{w,u}$ and variance ratio $F_{w,u}$ in all data sets (e.g., Calcarine 331), in many parcels the representation of object identity was significant only in some data sets (e.g., Parahippocampus 325) (Fig. 3B). Global significance was assessed by comparing the minimal accuracy or variance ratio over the 8 data sets from one condition (structured or unstructured) to the minimal values obtained with shuffled data (red ellipse in Fig. 3C, see Methods for details).
Minimal classification accuracy $\alpha_w$ was significant in 17% of all parcels (128 of 758 parcels) in the structured sequence condition and in 19% of parcels (146 of 758) in the unstructured condition (p < 0.05, corrected for multiple comparisons), when compared to null distributions obtained from shuffled object identities. For minimal variance ratios $F_{w,u}$, the corresponding values were 18% and 17%, respectively (136 and 130 parcels). To combine the results from both conditions, we used a "prevalence" analysis to determine parcels in which "identity" was represented significantly in a majority of all 16 data sets (prevalence γ ≥ 0.5), once again comparing the observed minimal values to the minimal values obtained with shuffled data (red ellipse in Fig. 3C, see Methods for details).
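The comparison of observed minimal values with minimal values from shuffled data can be sketched as a simple permutation test on the minimum statistic. This is a minimal illustration under stated assumptions: it covers a single parcel, applies no correction for multiple comparisons, and does not implement the prevalence inference itself. The function name and the numbers are hypothetical.

```python
def min_statistic_p_value(observed, shuffled):
    """Permutation p-value for the minimum accuracy over data sets.

    observed: accuracies of one parcel, one value per data set.
    shuffled: list of permutations, each holding one shuffled-label
              accuracy per data set.
    """
    obs_min = min(observed)
    null_mins = [min(perm) for perm in shuffled]
    # Count permutations whose minimum is at least as large as the observed one.
    exceed = sum(1 for m in null_mins if m >= obs_min)
    # Add-one correction so that the p-value is never exactly zero.
    return (exceed + 1) / (len(null_mins) + 1)


# Hypothetical accuracies for three data sets and three shuffles
# (chance level for 15 classes is 1/15, i.e., about 0.067).
observed_acc = [0.12, 0.10, 0.15]
shuffled_acc = [
    [0.06, 0.07, 0.08],
    [0.07, 0.11, 0.12],
    [0.05, 0.06, 0.07],
]
p_value = min_statistic_p_value(observed_acc, shuffled_acc)
```

With only three shuffles, the smallest attainable p-value is 0.25; in practice, many more permutations would be needed to reach the significance levels reported in the text.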
Figure 3A illustrates the 124 parcels identified as significantly "identity-selective" by the prevalence criterion γ ≥ 0.5, and Supplementary Figure S3 shows the same information in terms of a sliced brain. Among these were 70 parcels in the occipital cortex, 29 in the parietal cortex, 18 in the fusiform or temporal cortex, and 7 in the frontal cortex. The average prevalence of identity-selectivity in these parcels was 0.663 ± 0.016 (mean and S.D.), and the minimal value was 0.58. As the prevalence criterion (based on 16 data sets) was marginally more conservative than the accuracy criterion (based on 8 data sets), 120 of the 124 parcels were significantly "identity-selective" in terms of both criteria. The four exceptions (identified only by prevalence, but not by accuracy) were Frontal-superior-R 56, Occipital-superior-R 393, Occipital-middle-L 403, and Parietal-superior-R 510. Appendix Table A1 lists the statistical significance of all three criteria for all "identity-selective" parcels.
Overall, there was a pronounced posterior-anterior gradient. Whereas many parcels at the posterior pole of the brain exhibited high classification accuracy, this tended to progressively decrease at more anterior locations (Fig. 3A; Supplementary Fig. S3; Appendix Table A1). To formalize this trend, we assigned 66 of the 124 identity-selective parcels to the 25 topographic visual areas defined by Wang et al. (2015) and, additionally, to the anterior inferior temporal cortex (AIT) and to the inferior frontal cortex (IFC). Supplementary Figure S6 provides an overview of all topographically assigned and non-assigned parcels selective for identity. As illustrated in Figure 8A, this assignment showed that accuracy was comparable in early visual areas (V1-hV4) and in the posterior-ventrolateral regions of the temporal lobe, whereas accuracy was lower in the anterior temporal cortex, the inferior frontal cortex, and in parietal cortical areas.

Cross-validated changes with learning
To assess changes with learning, we separately analyzed five successive and non-overlapping sets of trials ("batches") with linear classifiers and cross-validation (see Methods for details). Specifically, we established ratios of between- and within-class variance for both object identity (15 classes formed by responses to 15 recurring objects) and for object novelty (2 classes formed by responses to recurring and non-recurring objects, respectively). These two variance ratios measured the neural representation of "identity" and "novelty." Variance ratios were converted to z-score values (with respect to the mean and variance of the corresponding shuffle distribution) before being averaged over data sets and/or over parcels. Figure 4A summarizes the results in terms of a grand average over all identity-selective parcels. The average identity and novelty ratios were highly significant in all batches (p < 0.001). Over successive batches, the average identity ratio weakened slightly but significantly (p < 0.05), whereas the average novelty ratio strengthened considerably, especially between batches m = 1 and m = 2 (p < 0.001).

Figure 3 (caption fragment). [...] are colored according to $\alpha_w^{\mathrm{ave}}$ as in (A). A minimum above chance (6.67%) corresponds to a prevalence γ above 0.5 (dotted vertical line). The distributions obtained with shuffled identities are indicated as well (red cross and ellipse).
As expected, it was the between-class variances $SS_B^{\mathrm{identity}}$ and $SS_B^{\mathrm{novelty}}$ that changed significantly over successive batches $m$ (p < 0.05 and p < 0.001, respectively), whereas the within-class variances $SS_W^{\mathrm{identity}}$ and $SS_W^{\mathrm{novelty}}$ remained essentially the same (p = n.s.), as illustrated by Figure 4B. This was owing to the DLDA algorithm, which maintained within-class variance near unity. Nevertheless, over successive batches, the neural representations of recurring objects tended to become slightly more similar to each other, but more dissimilar to the representations of non-recurring objects.
To ascertain that these overall trends hold true also for individual parcels, we carried out more conventional regression analyses of variance ratios $F^{\mathrm{identity}}_{m,w,u}$ and $F^{\mathrm{novelty}}_{m,w,u}$ over batches $m$, parcels $w$, and data sets $u$. Specifically, we fitted linear mixed models in order to estimate "rate" parameters $\beta^{\mathrm{identity}}_{w}$ and $\beta^{\mathrm{novelty}}_{w}$ for each identity-selective parcel $w$. The results revealed negative rates $\beta^{\mathrm{identity}}_{w}$ and positive rates $\beta^{\mathrm{novelty}}_{w}$ for almost all parcels, confirming the overall trends in Figure 4C. The variability over parcels was numerically larger for $\beta^{\mathrm{novelty}}_{w}$ (0.15 ± 0.1, mean and S.D.) than for $\beta^{\mathrm{identity}}_{w}$ (0.022 ± 0.015), with both rates weakly correlated (ρ = 0.30, p < 0.001).
Classification accuracy $\alpha^{\mathrm{identity}}_{w}$ correlated negatively with $\beta^{\mathrm{novelty}}_{w}$ (ρ = −0.22, p < 0.05) and with $\beta^{\mathrm{identity}}_{w}$ (ρ = −0.74, p < 0.001). To take a closer look at the interaction between "novelty" and "identity," we divided the identity-selective parcels into "novelty terciles" (high, medium, and low, defined by $\beta^{\mathrm{novelty}}$) before comparing representations of novelty ($F^{\mathrm{novelty}}$) and identity (accuracy α) (Fig. 5B). The results differed substantially between batches and terciles. In early batches, $F^{\mathrm{novelty}}$ and α correlated for all terciles, suggesting that initially the representations of non-recurring and recurring objects were linked. However, in successively later batches, this correlation waned in the upper tercile. This may suggest that pronounced representations of non-recurring objects progressively detached from representations of recurring objects.
Figure 5A illustrates the degree to which identity-selective parcels express the overall novelty trend, as quantified by the fitted rate $\beta^{\mathrm{novelty}}_{w}$, and Supplementary Figure S4 shows the same information in terms of brain slices. An anterior-posterior gradient is evident, with a more pronounced representation of novelty at anterior than at posterior locations. This gradient is also apparent when parcels are assigned to topographic visual areas, as illustrated in Figure 8B. Appendix Table A1 lists the rates $\beta^{\mathrm{novelty}}_{w}$ for all identity-selective parcels.

Geometry of identity and novelty representations
Next, we present results from alternative analyses relying on fixed subspaces S for each data set (3,600 trials).
Fixed subspaces reveal a more detailed geometry of neural representations and allow any changes in this geometry to be tracked over successive runs (200 trials each). The disadvantage of this approach is that it precludes cross-validation. Our aim was to establish not just between- and within-class variances, but also the distances underlying the variances, and the response amplitudes underlying the distances. For the representation of object "identity," the within- and between-class geometry was defined by response pairs to same and to different recurring objects, respectively. For the representation of object "novelty," the within-class geometry reflected responses either to pairs of familiar (recurring) or to pairs of novel (non-recurring) objects, whereas the between-class geometry concerned responses to mixed pairs of objects (novel-familiar).
We analyzed multivariate responses in terms of variances, distances, and amplitudes and averaged the results over all data sets and all 124 identity-selective parcels, to obtain separate mean values (and standard errors) for each of the 18 successive runs. Additionally, we averaged the results over the remaining 634 (non-identity-selective) parcels of the brain. We hoped that this would help distinguish more general effects and trends (e.g., habituation, attention, alertness) from learning-related changes in shape representations. All distances in these analyses were residual distances, to minimize the influence of temporal auto-correlations (Supplementary Fig. S1; see Methods for details).
The analyzed quantities (response amplitudes $A$, response distances $D$, and variances $SS$) are illustrated schematically in Figure 6A, and the results are presented in Figure 6B-D in terms of the mean values and standard errors for every run. In identity-selective parcels, response amplitudes $A_{\mathrm{fam}}$ to recurring patterns decreased during the first session (runs 1 to 6, p < 0.05), but not in the second and third sessions (runs 7 to 12, runs 13 to 18, p > 0.5). Response amplitudes $A_{\mathrm{nov}}$ to non-recurring patterns showed no significant change (p = n.s.) in any session (Fig. 6B). In non-selective parcels, response amplitudes decreased in all sessions, consistent with general habituation. In identity-selective parcels, response distances $D_{\mathrm{diff}}$ between different recurring objects declined similarly during the first session (p < 0.05), but not during subsequent sessions (p > 0.6) (Fig. 6C). Also, response distances $D_{\mathrm{same}}$ between the same recurring objects did not change significantly during any session (p = n.s.). In contrast, response distances $D_{\mathrm{nov}}$ between non-recurring objects declined disproportionately during the first session (p < 0.05) but increased during the third session (p < 0.05). Response distances $D_{\mathrm{novfam}}$ between recurring and non-recurring objects, on the other hand, did not change significantly over sessions (p = n.s.).
A first conclusion is that response amplitudes and response distances are consistently larger for recurring objects (blue traces in Fig. 6B, C) than for non-recurring objects (red traces). Importantly, in the very first run, response distances are comparable between different recurring objects ($D_{\mathrm{diff}}$) and different non-recurring objects ($D_{\mathrm{nov}}$), demonstrating that both recurring and non-recurring objects were represented comparably well. Over subsequent runs, response distances decrease far more between different non-recurring objects ($D_{\mathrm{nov}}$) than between different recurring objects ($D_{\mathrm{diff}}$), demonstrating that a comparative advantage for recurring objects develops gradually (i.e., a kind of repetition enhancement). A second conclusion is that the observed development differs between identity-selective and non-selective parcels. Whereas amplitudes and distances stabilize in the former group of parcels, they habituate progressively in the latter group (both within and between sessions). Thus, the responsiveness of identity-selective parcels remains stable over sessions. A third conclusion is that response distances $D_{\mathrm{nov}}$ between different non-recurring objects become comparatively small (already during the first session), not only smaller than the distances $D_{\mathrm{diff}}$ between different recurring objects but even smaller than the distances $D_{\mathrm{same}}$ between the same recurring objects.
The results for response variances confirmed the trends observed earlier in the batch analysis of cross-validated variance ratios (Fig. 4A, B). Between-class variance $SS_{\mathrm{diff}}$ for recurring objects declined over the course of sessions (p < 0.005), whereas between-class variance $SS_{\mathrm{novfam}}$ for non-recurring objects increased over the first session (p < 0.005), only to decline again during the third session (p < 0.05). Within-class variances $SS_{\mathrm{same}}$ and $SS_{\mathrm{nov}}$ remained largely unchanged. The close correspondence between the trends observed over runs and over batches is illustrated also in Supplementary Figure S5. Surprisingly, non-identity-selective parcels mirrored the trends observed for identity-selective parcels in attenuated form. The fact that between- and within-class variances differ systematically suggests that even non-identity-selective parcels represent object identity to some degree.
It is natural to compare these results to the time course of behavioral performance (fraction correct and reaction time) in our observers (Fig. 1C, D). The changes in the representation of recurring objects (between-class distances $D_{\mathrm{diff}}$ and variances $SS_{\mathrm{diff}}$) show a gradual decrease in the quality of representation and thus do not correspond to improving performance in terms of fraction correct. However, the changes in the representation of non-recurring objects, including the decrease of within-class distances $D_{\mathrm{nov}}$ and variances $SS_{\mathrm{nov}}$ and the increase of between-class variances $SS_{\mathrm{novfam}}$ and variance ratio $R^{\mathrm{novelty}}$, do correspond to the rapid improvement in fraction correct over the first few runs. Thus, the neural changes over the course of learning point to diverging representations of "novel" (non-recurring) and "familiar" (recurring) objects.

Stability of identity and novelty representations
Relying on fixed subspaces $S$ to analyze each data set also permitted us to assess the stability of neural representations over successive runs. With this in mind, we established the centroids of response classes for each run and examined the displacement of centroids between successive runs. As this calculation concerned centroid-to-centroid distances (rather than exemplar-to-exemplar distances), we could not correct for temporal autocorrelations.
The computation of centroids for particular response classes is illustrated schematically in Figure 7A. Given the centroids $C_{r-1}$ and $C_r$ for successive runs $r - 1$ and $r$ and the average centroid $C_{\mathrm{ave}}$ over all runs, we computed absolute centroid-to-centroid distances $DC_r = \lVert C_r - C_{\mathrm{ave}} \rVert$ as well as relative centroid-to-centroid distances $\Delta DC_r = \lVert C_r - C_{r-1} \rVert$. The 16 response classes were formed by each recurring object (15 classes, $DC_{\mathrm{same}}$ and $\Delta DC_{\mathrm{same}}$) and by the non-recurring objects (1 class, $DC_{\mathrm{nov}}$ and $\Delta DC_{\mathrm{nov}}$). For comparison with the displacements expected from sampling noise, we also computed the centroid-to-centroid distances after permuting the responses in each class and regrouping them into 18 "pseudo-runs" (see Methods for details).
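The absolute and relative centroid-to-centroid distances defined above can be sketched for a single response class as follows. The sketch assumes that the average centroid is the mean of the per-run centroids (equivalent to the centroid of all patterns when runs contain equally many exemplars); function names and the toy patterns are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equally long vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def euclid(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_distances(runs):
    """Return (absolute, relative) centroid distances for one response class.

    runs: list over runs, each a list of activity patterns of that class.
    absolute[r] is the distance of run centroid r from the average centroid;
    relative[r] is the distance between centroids of runs r and r + 1.
    """
    c_runs = [centroid(pts) for pts in runs]
    c_ave = centroid(c_runs)  # mean of the run centroids (assumption)
    absolute = [euclid(c, c_ave) for c in c_runs]
    relative = [euclid(c_runs[r], c_runs[r - 1]) for r in range(1, len(c_runs))]
    return absolute, relative


# Toy class with three runs of two patterns each, in two dimensions.
toy_runs = [
    [[0.0, 0.0], [2.0, 0.0]],
    [[4.0, 0.0], [6.0, 0.0]],
    [[2.0, 3.0], [4.0, 3.0]],
]
dc_abs, dc_rel = centroid_distances(toy_runs)
```

Applying the same function to pseudo-runs (responses permuted within each class) would give the sampling-noise baseline against which the observed distances are compared.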
The results are shown in Figure 7B. For both recurring and non-recurring objects, average absolute distances $DC_{\mathrm{same}}(r)$ and $DC_{\mathrm{nov}}(r)$ diminished during the first session (runs 1 to 6, p < 0.005), but remained stable during the second and third sessions (runs 7 to 12, and 13 to 18, p > 0.2). Notably, absolute distances $DC_{\mathrm{nov}}(r)$ of novel objects decreased to a much lower average level. Relative distances $\Delta DC_{\mathrm{same}}(r)$ and $\Delta DC_{\mathrm{nov}}(r)$ between successive runs declined during the first session (runs 1 to 6, p < 0.05), remained stable during the second session (runs 7 to 12, p > 0.2), only to decline once again in the last session (runs 13 to 18, p < 0.005 for recurring and p < 0.05 for non-recurring objects). Absolute distances were far larger for recurring than for non-recurring classes, corroborating the substantial "response enhancement" already noted above. Both absolute and relative distances were slightly smaller than predicted by sampling noise (thin, pale lines, p < 0.001), demonstrating that responses of true runs were distributed slightly more compactly and consistently than those of pseudo-runs. Note also that relative distances approached the values expected for fully random displacements in a 14-dimensional hypersphere (specifically, relative distances $\Delta DC$ were approximately 1.4 times larger than absolute distances $DC$), again underlining the dominant influence of sampling noise.
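The factor of approximately 1.4 can be motivated by a short calculation. As an idealized sketch (an assumption made for illustration, not a claim about the data), suppose that each run centroid equals the average centroid plus independent, zero-mean noise $\eta_r$ of equal variance $\sigma^2$, and that $C_{\mathrm{ave}}$ can be treated as fixed:

```latex
\begin{aligned}
C_r &= C_{\mathrm{ave}} + \eta_r, \qquad
\mathbb{E}\,\lVert \eta_r \rVert^2 = \sigma^2, \qquad
\mathbb{E}\,[\eta_r \cdot \eta_{r-1}] = 0, \\
\mathbb{E}\,\lVert C_r - C_{\mathrm{ave}} \rVert^2 &= \sigma^2, \\
\mathbb{E}\,\lVert C_r - C_{r-1} \rVert^2
  &= \mathbb{E}\,\lVert \eta_r \rVert^2 + \mathbb{E}\,\lVert \eta_{r-1} \rVert^2
     - 2\,\mathbb{E}\,[\eta_r \cdot \eta_{r-1}] = 2\sigma^2, \\
\sqrt{\frac{\mathbb{E}\,\lVert C_r - C_{r-1} \rVert^2}
           {\mathbb{E}\,\lVert C_r - C_{\mathrm{ave}} \rVert^2}}
  &= \sqrt{2} \approx 1.414 .
\end{aligned}
```

This ratio concerns root-mean-square distances; ratios of mean distances may deviate slightly from $\sqrt{2}$, but the calculation shows why purely random displacements yield relative distances roughly 1.4 times the absolute ones.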

DISCUSSION
We studied the cortical representation of synthetic visual objects over multiple days of repeated viewing, while observers learned to classify initially unfamiliar objects as "familiar." Relying on "representational similarity analysis" (RSA), we established distances between spatiotemporal hemodynamic (BOLD) responses to exemplars of different recurring objects, as well as to exemplars of non-recurring objects. Response distances between the same and different recurring objects quantified the neural representation of object identity. Response distances between recurring and non-recurring objects measured the neural representation of object novelty. The results showed that object identity was neurally represented from the start, in the ventral occipitotemporal cortex and beyond. With growing familiarity, the quality of this neural representation remained high, but its geometry expanded to fill the available representational space. In contrast, the neural representation of non-recurring objects (which remained "novel" by definition) improved over time, but its geometry contracted and shifted to the margins of the representational space.

Cortical representation of object identity
To permit a fine-grained analysis of representational geometry, we generated complex and three-dimensional shapes that were highly characteristic and distinguishable and presented these shapes from various points of view and in various states of rotation (always for one complete turn) (Kakaei et al., 2021). Thus, observers had to recognize an object from all sides in order to classify it as "familiar." Within the category of our synthetic shapes, every recurring object constituted, strictly speaking, an "exemplar," with individual presentations providing different "instantiations." However, we chose to term objects "classes" and individual presentations "exemplars," as this terminology conforms better to RSA conventions. The selectivity of cortical parcels for object identity was assessed in optimized 14-dimensional subspaces $S$ of the much higher-dimensional space of multivariate responses ($O(10^3)$ dimensions). Specifically, we computed a cross-validated "classification accuracy" (Kriegeskorte, Mur, & Bandettini, 2008) and used a prevalence analysis to combine results from different conditions and observers (Allefeld et al., 2016). Essentially identical results were obtained with alternative measures such as "linear discriminability" and "variance ratio" (of between- and within-class variance; Anderson, 2001). When spatiotemporal responses to different objects are linearly discriminable, they form a neural representation of object identity. As exemplars of each object were presented from various sides, any such neural representation was by definition view-invariant. The obvious caveats are (i) that object rotation may have exposed the same characteristic features in many or most presentations and (ii) that multivariate hemodynamic responses over 9 s can only distantly reflect the neuronal activity evoked during each 2.5 s presentation. Nevertheless, hemodynamic signals exhibited significant invariance to the various modes of presentation of a given object (e.g., the initial perspective, the axis, and the sense of rotation).
In contrast to many other studies, we did not observe suppressed responses when objects were repeated (i.e., no "repetition suppression") but rather a small enhancement of responses both with longer delays and later trial numbers (Supplementary Fig. S1). This may simply reflect the fact that the object presentations were highly variable and never repeated exactly. Recall that we designed a highly variable display so as to retain the observers' interest over 3 successive days.
The 124 of 758 parcels that were identified as "identity-selective" on this basis were situated mostly in the ventral occipitotemporal cortex, but some parcels were also located in the parietal or frontal cortex, as illustrated in Figure 3A. The degree of selectivity exhibited a clear gradient, being stronger at the posterior pole and becoming progressively weaker in more anterior and more dorsal regions, as summarized in Figure 8A. These results are consistent with previous findings that multivariate activity distinguishing different exemplars of a particular class of objects (e.g., faces) is present in the ventral and lateral occipital cortex, on the fusiform gyrus, and in the ventral temporal cortex (Brants et al., 2016; Eger et al., 2008; Visconti di Oleggio Castello et al., 2021).
In general, it is thought that progressively "higher" levels of visual processing represent progressively "larger" visual sets, beginning with image features, and widening gradually to object features, object exemplars, object categories, and finally to supercategories such as animate or inanimate objects, or objects and landscapes (Grill-Spector & Weiner, 2014). Accordingly, the discriminability of exemplars within a category is expected to diminish at more anterior locations, which correspond to "higher" levels of visual processing (Eger et al., 2008; Grill-Spector & Weiner, 2014). Moreover, it has been hypothesized that the spatial scale of neural representations increases with the level of abstraction, in the sense that exemplars are represented at smaller scales than categories (Grill-Spector & Weiner, 2014). Thus, if this trend is exacerbated in the more anterior parts of the ventral pathway, exemplar representations may become progressively less discriminable at the spatial resolution of BOLD signals.
A previous study of visual expertise for synthetic shapes (Brants et al., 2016) reported a gradual enhancement of neural representations in object-selective areas, whereas we observed a moderate decline. This difference may have been due to task design. Brants and colleagues used barely discriminable shapes and emphasized perceptual load, whereas we used highly distinguishable shapes and emphasized memory load.
We also observed identity-selectivity in frontoparietal regions that are typically associated with the dorsal visual pathway and the right frontoparietal "attention network." This is consistent with previous findings on the presence of object- and/or face-selective representations in dorsal areas (Freud et al., 2017; Jeong & Xu, 2016; Konen & Kastner, 2008; Poirier et al., 2006; Visconti di Oleggio Castello et al., 2021). However, the interpretation of this selectivity is not straightforward. In particular, the clusters associated with the "attention network" are often found to express functional correlations with ventral visual areas in both resting and task states (Dornas & Braun, 2018; Mutlu et al., 2022; Smith et al., 2013). Thus, it seems possible that multivariate functional correlations could have propagated identity-selectivity feedforward throughout the "attention network" and beyond.
Finally, we observed pronounced identity-selectivity in the primary visual cortex (calcarine sulcus, left and right), where neuronal activity encodes basic visual features (orientation, spatial frequency, direction of movement, and so on) (Grill-Spector & Weiner, 2014; Haxby et al., 2001). It is possible that multivariate hemodynamic responses in the primary visual cortex could have reflected this visually evoked neuronal activity sufficiently well to have encoded object identity, especially as the rotation may have exposed the same low-level features in many or most presentations. Additionally, hemodynamic responses could have been driven by spatiotemporal patterns of feedback from higher areas of the visual cortex. There is some evidence to suggest that feedback can dominate the hemodynamics of the early visual cortex under continuous viewing conditions (as used here) (Blake & Braun, 2009).

Cortical representation of novel object shapes
We also investigated the representation of "novel" object shapes that were encountered only once (and never recurred). Note that "novelty" is here not meant to imply "surprise" for the observer in the sense of a violation of expectations (e.g., Uddin, 2015). Rather, it simply denotes the more heterogeneous class of non-recurring objects (with 360 exemplars, each from a different object), as distinct from the 15 more homogeneous classes of recurring objects (with approximately 200 exemplars each, all from the same object). As mentioned, "novelty" was measured in terms of the linear discriminability of hemodynamic responses to non-recurring and recurring objects in 14-dimensional subspaces $S$, more specifically, by comparing pairwise response distances between classes (recurring and non-recurring) and within classes (either recurring or non-recurring).
All 124 "identity-selective" parcels were also "noveltyselective," in the sense that hemodynamic responses discriminated non-recurring and recurring objects to some degree, as illustrated in Figure 5A.As discriminative subspaces were optimized for recurring objectsthat were generated in the same way as non-recurring objects-some degree of discriminability was to be expected.Moreover, as non-recurring objects were more numerous (360 objects) than recurring objects (15 objects), some discriminability was expected purely by chance, particularly in a 14-dimensional space.However, as discussed further below, the linear discriminability of non-recurring objects increased over successive runs and sessions, mirroring observers' improving ability to classify objects as "novel" or "familiar."Because of this dynamic aspect, we quantified the novelty-selectivity of cortical parcels in terms of an "improvement rate," β novelty (Fig. 4).Interestingly, there was an anterior-posterior gradient in that novelty-selectivity was more pronounced in more frontal, parietal, and anterior temporal areas than more posterior temporal and occipital areas, as summarized in Figure 8B.In other words, the representational disparity between familiar object shapes and novel objects shapes tended to be larger in the higher-level (more anterior) visual cortex than in the lower-level (more posterior) cortex, suggesting that learning effects were more pronounced.

Representational changes with learning
As representational changes with learning were the main objective of our study, we addressed this issue with several complementary approaches. First, we divided our observations from 18 runs into five successive "batches" and established the neural representation of both "identity" and "novelty" separately for each batch with cross-validated statistics, while aggregating over all identity-selective parcels (Fig. 4B). Second, to assess changes in individual parcels, we performed a regression analysis of the same cross-validated data and obtained "rates" of representational changes for every identity-selective parcel (Fig. 4C). Third, we adopted stable discriminative subspaces S and sacrificed cross-validation in order to analyze representational geometry over individual runs (Fig. 6). All three approaches yielded comparable results.
Already in the first run and the first batch, without time for plasticity or learning, the neural representations of identity were maximally differentiated (Figs. 4A and 6D; Supplementary Fig. S5). This initial identity representation was most pronounced in known object processing areas, including the ventral occipitotemporal cortex and early visual cortex. Apparently, pre-existing representations based on life-long experience were sufficient to immediately provide a view-independent representation of synthetic shapes, which we had designed to be highly characteristic and discriminable. In contrast, neural representations of novelty were minimally differentiated in the first run and the first batch. As there was no systematic difference between recurring and non-recurring objects (and without time for plasticity), any residual initial discriminability of novelty must be attributed to chance.
Over subsequent runs and batches, the neural representation of object identity remained pronounced, but its quality declined steadily over time (Figs. 4A and 6D; Supplementary Fig. S5). Some decline in BOLD activity is not atypical for learning studies over multiple days and is commonly ascribed to repetition suppression, sparsification of responses, and/or diminishing attention or effort (e.g., Poldrack, 2000). However, while our results are consistent with such a scenario in non-identity-selective parcels, they do not support a general decline of activity in identity-selective parcels, as the response amplitudes and distances in these parcels declined only initially and subsequently remained stable (Fig. 6B, C).
In contrast, the neural representation of object novelty improved substantially over subsequent runs and batches. The time course was similar in both analyses (batch-by-batch and run-by-run), with the steepest improvement occurring over the first few runs (Figs. 4A and 6D; Supplementary Fig. S5). However, the detailed results revealed that this "improvement" (in discriminating non-recurring and recurring objects) actually reflected a deterioration in the representation of non-recurring objects (i.e., diminishing response distances, Fig. 6C).
In absolute terms, response amplitudes and distances were already larger for recurring objects and smaller for non-recurring objects during the first run, and the difference increased over the next few runs (Fig. 6B, C). Apparently, recurring objects benefited from a "repetition enhancement," as the only immediate and systematic difference between recurring and non-recurring objects was the frequency of recurrence. Interestingly, this enhancement was comparable for "structured" and "unstructured" sequences, even though the repetition latencies were quite different (Supplementary Fig. S1B, C). Accordingly, we hypothesize that the enhancement was not merely a passive effect but rather a consequence of task relevance and cognitive engagement (Supplementary Fig. S1B, C).
As mentioned, the rates of change of identity and novelty representations differed systematically between cortical regions (Fig. 8C). Intriguingly, the rates of novelty gain and identity loss varied inversely over the cortical hierarchy: in early visual areas (V1, V2, V3, hV4), identity declined rapidly, whereas novelty grew slowly. At the opposite end, in the inferior frontal cortex (IFC) and anterior ventral temporal cortex (AIT), identity declined slowly, but novelty grew rapidly. In the higher visual cortex (VO, LO), both rates were intermediate.
It is informative to visualize the observed representational changes in two dimensions (Fig. 9), while approximately preserving the relative pairwise distances in the discriminative subspaces S. This visualization makes clear that the neural representation of recurring objects expands between the beginning and the end of the experiment, filling the available representational space (Fig. 9A). The expansion explains our observation that the linear discriminability of object classes degrades but remains high. In contrast, the neural representation of non-recurring objects contracts between the beginning and the end of the experiment while also shifting to the margins of representational space, which explains why the linear discriminability of non-recurring objects improved over time (Fig. 9B). These two opposite developments may reflect both cognitive engagement and repetition frequency: representations may expand for objects that observers attempt to memorize and/or that recur frequently, but contract for objects that observers learn to ignore and/or that are rare.
In addition to relative changes in representational geometry indexed by linear discriminability, we established absolute changes in representational geometry, indexed by distances between response centroids in successive runs (see Fig. 7). The results were dominated by sampling noise, and the displacement of centroids was comparable to random jumps in a hypersphere while maintaining a given distance from its center. However, both absolute and relative centroid distances were slightly (and significantly) smaller than predicted by sampling noise, indicating that the representations were slightly more consistent and compact. The most interesting result of this analysis was that centroid distances were approximately 30% smaller for non-recurring than for recurring objects, highlighting again the representational disparity noted above.
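The centroid distances described here can be sketched in a few lines. This is only a minimal illustration, not the analysis code of the study: it assumes that each run contributes the same number of responses (so that the average centroid equals the mean of the run centroids), and the function names and toy data are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length response vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def centroid_distances(runs):
    """runs: list of runs, each a list of response vectors for one object class.

    Returns (absolute, relative):
      absolute[r] = distance of the run-r centroid from the average centroid
      relative[r] = distance between the centroids of runs r+1 and r
    """
    cents = [centroid(run) for run in runs]
    c_ave = centroid(cents)  # assumption: equal number of responses per run
    absolute = [math.dist(c, c_ave) for c in cents]
    relative = [math.dist(cents[r], cents[r - 1]) for r in range(1, len(cents))]
    return absolute, relative

# Hypothetical toy data: three runs of two-dimensional responses.
runs = [
    [[0.0, 0.0], [2.0, 0.0]],  # centroid (1, 0)
    [[1.0, 2.0], [1.0, 4.0]],  # centroid (1, 3)
    [[4.0, 0.0], [4.0, 0.0]],  # centroid (4, 0)
]
absolute, relative = centroid_distances(runs)
```

Comparing such distances with those obtained from label-shuffled data would indicate whether centroids move less than expected from sampling noise alone.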

Behavioral and cognitive changes with learning
The behavioral changes over three sessions of viewing sequences of objects included both increased classification performance ("familiar" or "novel") and decreased reaction times. Both behavioral measures changed rapidly during the first three runs of the first session and more slowly during the second and third sessions (Fig. 1). As described elsewhere (Kakaei et al., 2021), the classification of a particular object typically changed from (mostly) "novel" to (mostly) "familiar" at one identifiable point in time during the sessions, which we termed "onset of familiarity." This objective observation was consistent with the subjective reports of observers that they memorized all three-dimensional shapes one by one, such that every object became recognizable from all sides. Some observers also mentioned having assigned linguistic labels to individual recurring objects. After the three sessions, all observers were "familiar" with all recurring objects and could pick them out from an array of distractor objects.
Only some of these behavioral changes have obvious counterparts in the neural changes discussed above. First, the decrease of reaction times from under 2 s to under 1 s implies that observers spend less time actively evaluating the stimulus and more time passively observing it. However, the neural response of identity-selective parcels does not mirror this trend, as both response amplitudes and response differences stabilize after the first few runs (Fig. 6B, C). In the rest of the brain (non-identity-selective parcels), the neural responses do show a progressive decrease, but any attribution would be speculative.
Second, the increase in objective performance and in subjective "familiarity" was not mirrored directly in neural responses to recurring objects, as multivariate responses were sufficiently rich to identify such objects from the very start. However, multivariate responses were dispersed over the three sessions such as to fill more of the available space (see above). This growing response diversity is a plausible correlate of memory consolidation, that is, the formation of stable long-term memories in visually responsive cortical areas. When such memories are consolidated, one would expect that increased connectivity would enhance pattern completion over additional levels of representation, rendering network activity more complex (e.g., Steinberg & Sompolinsky, 2022). It is worth noting that this development was observed for both types of presentation sequences ("structured" and "unstructured"), suggesting that neural consolidation was due to task relevance and not merely to repetition latency.
Third, the increase in objective performance was mirrored indirectly in neural responses to non-recurring objects. Whereas these responses were initially comparable to recurring responses, they contracted over three sessions into a smaller part of the available space, thus becoming more stereotypical. As this part was comparably distant from all recurring responses, it lay at the margins of the representational space. The time course of classification performance corresponded best to this particular development in neural representations. Accordingly, this development was a plausible indirect correlate of memory consolidation, in the sense that visually responsive areas grew less responsive to other objects that failed to match the newly formed long-term memories.

CONCLUSION
We analyzed the cortical representation of visual objects in the multivariate hemodynamic responses of 758 brain parcels. For each parcel, we used linear discriminant analysis to map the $O(10^3)$-dimensional responses into a lower-dimensional subspace that optimally discriminated the 15 stimulus classes (recurring objects). Optimal subspaces captured a part of the correlated variance and overlapped substantially with the principal components of the responses. Typically, 2/3 of the principal component variance discriminated between stimulus classes (and thus coincided with the optimal subspace), while the remaining 1/3 was shared between stimulus classes. Our analyses revealed where and how the cortical representations of visual objects changed as visual expertise was being acquired and consolidated by the observers.
Our results were broadly consistent with other recent studies of visual expertise, which have highlighted the roles of three pathways or networks (Kravitz et al., 2011, 2013): an occipitotemporal pathway ("ventral pathway"), an occipitoparietal pathway ("dorsal pathway"), and a right frontoparietal network ("attention system"). Several studies linked behavioral performance to enhanced activity and/or representation in the frontoparietal network (Duyck et al., 2021; Poirier et al., 2006; Visconti di Oleggio Castello et al., 2021), as well as in the more anterior parts of the occipitotemporal pathway and the more dorsal parts of the occipitoparietal pathway (Christophel et al., 2017).
Due to our focus on object shape, our results do not speak directly to the modulation of cortical responses by expectation, such as "expectation suppression" or "surprise signalling" (Barron et al., 2016; Bell et al., 2016; Mayrhauser et al., 2014; Vinken et al., 2018). Moreover, in our paradigm, object presentations were never repeated exactly and every object presentation contained elements of surprise, as neither the object, nor the point of view, nor the direction of rotation could be anticipated by observers.
The most robust representations of object shape for both recurring objects ("identity") and non-recurring objects ("novelty") were observed in the ventral occipitotemporal cortex, at the intermediate levels of the shape processing hierarchy (Grill-Spector & Weiner, 2014; Perry & Fallah, 2014). Additionally, we found representations of object shape in "dorsal stream" cortical areas, consistent with the view that these areas encode goal- and task-related object features (Perry & Fallah, 2014).
The most novel aspect of our findings was the change in the geometry of cortical representations as visual expertise for recurring objects was being acquired and consolidated. In relative terms, distances between response classes decreased, and/or distances within classes increased, while observers repeatedly viewed and became familiar with the corresponding stimulus classes. This modest decline in stimulus encoding was, however, associated with an expansion (or diversification) in the distribution of responses within classes, so that responses of all classes taken together scattered more uniformly over the available representational space. Changes in cortical representations were quite different for stimuli that appeared only once and that observers did not attempt to memorize (non-recurring objects). Here, again in relative terms, distances between classes (non-recurring and recurring) increased and/or distances within classes (non-recurring) decreased. This steep growth in class encoding was associated with a substantial contraction (or stereotypisation) in the distribution of responses, in the sense that responses to non-recurring objects shifted to the margin of the available representational space.
We conclude that hemodynamic responses to novel object shapes immediately represent the differences between these shapes, even prior to learning, presumably reflecting life-long prior experience. When object shapes grow familiar with learning, hemodynamic responses to the same shapes become more diverse, whereas responses to different shapes remain comparably dissimilar from each other. Responses to control objects that are always novel develop quite differently in that they become less diverse relative to each other, but also more dissimilar from responses to familiar objects.
trials $k$ and $l$ measured on average $\langle d \rangle = 1.40$, consistent with the distance expected between random patterns of this amplitude ($\sqrt{2}$). Averaging over trials $k$ produced normalized response amplitudes $A = \langle a_k \rangle_k$. Averaging over pairs of trials $k, l$ separated by a given latency $l - k$ produced normalized response distances $D(l-k) = \langle d_{kl} \rangle_{k,l}$.
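For patterns normalized to unit amplitude, the expected distance between two random patterns is close to $\sqrt{2} \approx 1.41$. The following minimal sketch illustrates this numerically. It assumes unit-norm Gaussian patterns in a 14-dimensional space; the pattern count, the seed, and the function names are hypothetical choices for illustration and are not taken from the study.

```python
import math
import random

def random_unit_pattern(dim, rng):
    """Draw a Gaussian pattern and scale it to unit amplitude (Euclidean norm 1)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_pairwise_distance(dim=14, n_patterns=200, seed=0):
    """Average Euclidean distance over all pairs of random unit patterns."""
    rng = random.Random(seed)
    patterns = [random_unit_pattern(dim, rng) for _ in range(n_patterns)]
    total, count = 0.0, 0
    for k in range(n_patterns):
        for l in range(k + 1, n_patterns):
            total += math.dist(patterns[k], patterns[l])
            count += 1
    return total / count

mean_d = mean_pairwise_distance()  # expected to lie close to sqrt(2)
```

Because the distance is a concave function of the pattern correlation, the average falls slightly below $\sqrt{2}$ in a space of finite dimension.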
$F^{novelty}_{m,w,u}$ for each temporal window or batch $m$, identity-selective parcel $w$, and data set $u \in \{1, \dots, 16\}$. After averaging over 16 data sets, $F^{novelty}_{m,w} = \langle F^{novelty}_{m,w,u} \rangle_u$, we assessed statistical significance by shuffling ($10^3$ permutations) the identity of recurring and non-recurring objects to obtain the distribution of variance ratios due to chance or data structure. The mean $\mu_{m,w}$ and variance $\sigma^2_{m,w}$ of this distribution were used to convert $F^{novelty}_{m,w}$ into z-score values $Z^{novelty}_{m,w} = (F^{novelty}_{m,w} - \mu_{m,w}) / \sigma_{m,w}$.
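The permutation-based z-scoring can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the study: responses are one-dimensional scalars rather than multivariate patterns, the variance ratio is taken as between-class over within-class sum of squares, and all function names and toy data are hypothetical.

```python
import random
import statistics

def variance_ratio(values, labels):
    """Between-class over within-class sum of squares for scalar responses."""
    grand = statistics.fmean(values)
    ss_between, ss_within = 0.0, 0.0
    for c in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == c]
        m = statistics.fmean(members)
        ss_between += len(members) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in members)
    return ss_between / ss_within

def permutation_z(values, labels, n_perm=1000, seed=0):
    """Z-score of the observed ratio relative to label-shuffled ratios."""
    rng = random.Random(seed)
    observed = variance_ratio(values, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between responses and labels
        null.append(variance_ratio(values, shuffled))
    mu = statistics.fmean(null)
    sigma = statistics.pstdev(null)
    return (observed - mu) / sigma

# Hypothetical toy data: two classes whose means differ by one standard deviation.
rng = random.Random(1)
labels = ["recurring"] * 60 + ["non-recurring"] * 60
values = [rng.gauss(0.0, 1.0) for _ in range(60)] + [rng.gauss(1.0, 1.0) for _ in range(60)]
z = permutation_z(values, labels)
```

Expressing the ratio in z-score units makes values comparable across parcels and batches whose chance-level ratios differ.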

Fig. 3. Neural representation of object identity. (A) Identity-selective parcels are shown in color (124 of 758 parcels) on an inflated standard brain and are found in the occipital (70 parcels), parietal (29), temporal/fusiform (18), and frontal cortex (7). Color indicates classification $\alpha^{ave}_w$ (average over 16 data sets), and ranges from chance to the largest observed value (6.67% to 17%). Parcels are identified by AAL region and number (in color), as detailed in Appendix Table A1. (B) Classification $\alpha_{w,u}$ and variance ratio $F_{w,u}$ for all 758 parcels $w$ and 16 data sets $u$. Both values differ highly significantly from the values obtained with shuffled object identities (red cross and ellipse, representing mean ± 3 S.D.). Two particular parcels are highlighted (Calcarine-L 331 in red, Parahippocampus-R 325 in blue, and magnified in the inset) to illustrate the variability of data sets. (C) Minimum values $\alpha^{min}_w$ and $F^{min}_w$ for all parcels over 16 data sets. Identity-selective parcels

Fig. 4. Changes in the representation of "identity" and "novelty" over successive "batches" of trials. (A) Ratio of within- and between-class variance for object "identity" ($\kappa = 15$ classes, inset top left) and object "novelty" (2 classes, inset bottom left). Average variance ratios $F^{identity}_m$ (blue, mean ± S.E.M.) and $F^{novelty}_m$ (red, mean ± S.E.M.), as a function of batch number $m$. While $F^{identity}_m$ decreases slightly over time (p < 0.05), $F^{novelty}_m$ increases considerably (p < 0.001), especially initially. All values are averages over data sets in z-score units. (B) Average within- and between-class variances (mean ± S.E.M.), as a function of batch number $m$. Whereas between-class variances decrease ($SS^{identity}_{B,m}$, p < 0.05) or increase ($SS^{novelty}_{B,m}$, p < 0.001), within-class variances remain unchanged. All values are averages over data sets, relative to shuffled averages. (C) Results of regression analysis for 124 identity-selective parcels $w$. Linear "rate" parameters $\beta^{identity}_w$ and $\beta^{novelty}_w$ compared to each other and to classification $\alpha_w$. Novelty and identity rates correlate weakly over parcels (left, ρ = 0.298, p < 0.001), as do novelty rate and classification accuracy $\alpha^{identity}_w$ (middle, ρ = −0.22, p < 0.05). Identity rates $\beta_w$ and accuracies $\alpha_w$ correlate strongly and negatively (right, ρ = −0.74, p < 0.001). Significance of linear trends is indicated by * for p < 0.05 and ** for p < 0.001.

Fig. 5. Neural representation of "novelty" in terms of the variance ratio $F^{novelty}$ and its development over successive batches. (A) Identity-selective parcels and their individual rate parameters $\beta^{novelty}$ (color scale), estimated by fitting linear mixed models to the $F^{novelty}$ values (from all batches and data sets). (B) Development of $F^{novelty}$ (mean ± S.E.M.) for different "novelty terciles" (upper, middle, and lower terciles of parcels defined by $\beta^{novelty}$). (C) Correlation between $F^{novelty}_w$ and accuracy $\alpha_w$ for different batches and novelty terciles. The parcels of each tercile are distinguished by color, with individual regression lines (dashed) and correlation coefficients ρ ( ! indicates p < 0.05).

Fig. 6. Geometry of identity and novelty representation over successive sessions and runs. (A) For each run with $N_{trials} = 200$ trials, we collected all individual response amplitudes $a$ and all pairwise response distances $d$ (triangular area with color scale) in the maximally discriminating space and computed average amplitudes $A_{fam}$ and $A_{nov}$ (for recurring and non-recurring objects, respectively) and average distances $D_{same}$ and $D_{diff}$ (for same and different recurring objects, respectively), as well as average distances $D_{nov}$ and $D_{novfam}$ (for non-recurring objects and between recurring and non-recurring objects, respectively). (B) Response amplitude $A_{nov}$ (red, mean, and S.E.M.) and $A_{fam}$ (blue, mean, and S.E.M.), over 18 runs grouped into three sessions, for identity-selective (left) and non-selective parcels (right). (C) Pairwise response distance $D_{same}$ (solid blue), $D_{diff}$ (dashed blue), $D_{nov}$ (solid red), and $D_{novfam}$ (dashed red), over runs and sessions, for both groups of parcels. (D) Variance of response distances $SS_{same}$ (solid blue), $SS_{diff}$ (dashed blue), $SS_{nov}$ (solid red), and $SS_{novfam}$ (dashed red), over runs and sessions, for identity-selective and non-selective parcels. Stars indicate a significant linear trend during a session (see text). All plots show mean (traces) and S.E.M. (shading).

Fig. 7. Stability of identity and novelty representations over successive sessions and runs. (A) For each recurring and all non-recurring objects, we calculated the response centroids $C_r$ for each run $r$ and the average centroid $C_{ave}$ for all runs and obtained both absolute distances $DC_r = \|C_r - C_{ave}\|$ and relative distances $\Delta DC_r = \|C_r - C_{r-1}\|$. (B) Centroid-to-centroid distances (mean ± S.E.M.) for all identity-selective parcels and all data sets. Distances $DC_{same}$ and $\Delta DC_{same}$ for the same recurring objects (blue) and distances $DC_{nov}$ and $\Delta DC_{nov}$ for non-recurring objects (red) are compared to the corresponding values obtained from shuffled data sets (thin, pale lines). Stars indicate a significant linear trend during a session (see text).

Fig. 9. Changes in the geometry of shape identity and novelty representations, visualized with multi-dimensional scaling. Symbols (colored circles) represent neural response patterns in a 14-dimensional space S. Symbols are positioned such that pairwise distances reflect pairwise distances in S. Response classes are distinguished by color and are represented by 50 randomly selected responses each. (A) Fifteen response classes to recurring objects in the first session (left, runs 1-6) and the third session (right, runs 13-18). Note that recurring response classes expand with learning to fill the available space. (The regions occupied by classes depend on the selected responses. "Inside" and "outside" classes can exchange positions.) (B) Three response classes to recurring objects and one response class to non-recurring objects (larger symbols), in the first session (left, runs 1-6) and the third session (right, runs 13-18). Note that the non-recurring response class contracts with learning and shifts to the margins of the available space. Class positions are similar for other triplets of recurring classes.
tions $F^{fam}_{w,u,r} = SS_{fam}/SS_{tot}$, $F^{nov}_{w,u,r} = SS_{nov}/SS_{tot}$, $F^{novfam}_{w,u,r} = SS_{novfam}/SS_{tot}$, $F^{same}_{w,u,r} = SS_{same}/SS_{fam}$, and $F^{diff}_{w,u,r} = SS_{diff}/SS_{fam}$, as well as variance ratios $R^{identity}_{w,u,r} = SS_{diff}(N - \kappa) / [SS_{same}(\kappa - 1)]$ and $R^{novelty}_{w,u,r} = SS_{novfam}(N + M - 2) / (SS_{nov} + SS_{fam})$.

Imaging Neuroscience, Volume 2, 2024

2.5.9. Changes with learning analyzed by "runs"
Fixed subspaces permitted us to assess representational changes between successive "runs." To this end, we computed average amplitudes $A_{w,u,r}$, distances $D_{w,u,r}$, variances $SS_{w,u,r}$, and variance ratios $F_{w,u,r}$, as described above, for each parcel $w$, data set $u \in \{1, \dots, 16\}$, and run $r$. Within each session $s$, we assessed the changes of these parameters $Y \in \{A, D, SS, F\}$
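The variance ratio $R^{identity}$ given in this section has the form of a between-class to within-class ratio scaled by degrees of freedom. The sketch below illustrates it under two assumptions that are not confirmed by the text: that $SS_{same}$ is the within-class and $SS_{diff}$ the between-class sum of squares around class centroids, and that $N$ is the number of responses and $\kappa$ the number of classes. Function names and toy data are hypothetical.

```python
def sums_of_squares(responses, labels):
    """Return (ss_within, ss_between) for multivariate responses."""
    dim = len(responses[0])
    n = len(responses)
    grand = [sum(r[i] for r in responses) / n for i in range(dim)]
    ss_within, ss_between = 0.0, 0.0
    for c in sorted(set(labels)):
        members = [r for r, lab in zip(responses, labels) if lab == c]
        cen = [sum(r[i] for r in members) / len(members) for i in range(dim)]
        # Between-class part: squared centroid offset, weighted by class size.
        ss_between += len(members) * sum((cen[i] - grand[i]) ** 2 for i in range(dim))
        # Within-class part: squared offsets of responses from their centroid.
        ss_within += sum((r[i] - cen[i]) ** 2 for r in members for i in range(dim))
    return ss_within, ss_between

def identity_ratio(responses, labels):
    """R = SS_diff (N - kappa) / (SS_same (kappa - 1)), under the assumptions above."""
    n = len(responses)
    kappa = len(set(labels))
    ss_same, ss_diff = sums_of_squares(responses, labels)
    return ss_diff * (n - kappa) / (ss_same * (kappa - 1))

# Hypothetical toy data: two classes of two responses each.
responses = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = ["a", "a", "b", "b"]
r_identity = identity_ratio(responses, labels)
```

In this toy example, the between-class sum of squares is 16 and the within-class sum is 4, so the ratio equals 16 · 2 / (4 · 1) = 8.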