Probing the neural dynamics of mnemonic representations after the initial consolidation

Memories are not stored as static engrams, but as dynamic representations affected by processes occurring after initial encoding. Previous studies revealed changes in activity and mnemonic representations in visual processing areas, parietal lobe, and hippocampus underlying repeated retrieval and suppression. However, these neural changes are usually induced by memory modulation immediately after memory formation. Here, we investigated 27 healthy participants with a two-day functional Magnetic Resonance Imaging study design to probe how established memories are dynamically modulated by retrieval and suppression 24 hours after learning. Behaviorally, we demonstrated that established memories can still be strengthened by repeated retrieval. By contrast, repeated suppression had a modest negative effect, and suppression-induced forgetting was associated with individual suppression efficacy. Neurally, we demonstrated item-specific pattern reinstatements in visual processing areas, parietal lobe, and hippocampus. Then, we showed that repeated retrieval reduced activity amplitude in the ventral visual cortex and hippocampus, but enhanced the distinctiveness of activity patterns in the ventral visual cortex and parietal lobe. Critically, reduced activity was associated with enhanced representation of idiosyncratic memory traces in the ventral visual cortex and precuneus. In contrast, repeated memory suppression was associated with reduced lateral prefrontal activity, but relatively intact mnemonic representations. Our results replicated most of the neural changes induced by memory retrieval and suppression immediately after learning and extended those findings to established memories after initial consolidation. Active retrieval seems to promote episode-unique mnemonic representations in the neocortex not only after initial encoding but also after consolidation.


Introduction
Historically, memories were seen as more or less stable traces or engrams. After initial formation, memory traces are affected by consolidation, which leads to stabilization, and by weakening, which leads to forgetting (Ebbinghaus, 1885; Lashley, 1950; Müller and Pilzecker, 1900). However, contemporary research has provided ample evidence that memories continue to be dynamically adapted after initial encoding and, thus, can be modified by external factors throughout their existence. For instance, retrieval practice can reinforce memory traces (Karpicke and Roediger, 2008), promote meaningful learning (Karpicke and Blunt, 2011), and protect memory retrieval against acute stress (Smith et al., 2016). In contrast, retrieval suppression can prevent unwanted memories from being retrieved (Anderson and Green, 2001) and reduce their emotional impact (Gagnepain et al., 2017).
Previous neuroimaging studies identified several neural changes that could explain the retrieval-mediated memory enhancement: after repeated retrieval, several studies reported decreased or increased univari-sion (Anderson, 2004; Anderson and Hanslmayr, 2014). However, only a few studies investigated neural changes in activity and/or activity patterns across repeated suppression. Depue and colleagues showed the time-specific involvement of the inferior frontal gyrus and medial frontal gyrus during the suppression of emotional memory (Depue et al., 2007). Gagnepain and colleagues demonstrated that the effect of suppression on visual memories may be achieved by targeted cortical inhibition of visual-related activity and activity patterns (Gagnepain et al., 2014).
Although these studies shed light upon neural changes underlying memory retrieval and suppression, all of them were based on memory modulation (i.e., retrieval and suppression) immediately after initial memory formation, except for one study that included repeated retrieval on two consecutive days (Ferreira et al., 2019). How the modulation of memory traces after initial consolidation is reflected in neural activity and mnemonic representations, as assessed by activation patterns during subsequent retrieval, is currently not well understood. Studying the neural changes underlying the modulation of initially consolidated memories can provide a complementary and critical understanding of the dynamic nature of human memory. Because newly acquired memories are usually more labile than consolidated ones (Frankland and Bontempi, 2005) and mnemonic representations shift from the hippocampus to distributed neocortical regions following overnight sleep (Takashima et al., 2006, 2009), the effectiveness of memory modulation could be decreased, and the underlying neural changes could be different. For example, one study showed that suppression of aversive memories after overnight consolidation is harder and involves reconfigured neural pathways during suppression (Liu et al., 2016). Also, modulation of consolidated memories may provide a clearer focus on changes in long-term memory representations, because previously reported immediate effects (i.e., changes in activity amplitude and activity patterns) can still be caused by short-term changes in related processes such as executive control or attention. Here, we used a two-day functional Magnetic Resonance Imaging (fMRI) design to characterize the neural dynamics of initially consolidated memories. After overnight consolidation, memories were reinforced by repeated memory retrieval in one condition and weakened by repeated memory suppression in the other.
We analyzed the neuroimaging data from both the modulation and the subsequent memory retrieval phase to examine neural changes at the moment when specific memory was modulated and in the final memory test in which the aftereffects of the modulation can be measured.
Based on neural findings of memory reinstatement (Chen et al., 2017; Kosslyn et al., 1997; Kuhl et al., 2010; Lee et al., 2019; O'Craven and Kanwisher, 2000; Polyn et al., 2005; Shohamy and Wagner, 2008; Wheeler et al., 2000; Wimber et al., 2015; Xue, 2018), we used both activity amplitude (i.e., univariate analysis) and activation patterns (i.e., multivariate pattern analysis) of visual areas, parietal lobe, and hippocampus to characterize memory traces during memory retrieval, and further examined the linear relationship between the two neural changes within the same regions. Furthermore, we adopted a novel design to disentangle perception-related neural activity associated with memory cues presented at test from retrieval-related neural reactivation associated with reactivated mental images. One method to separate these two processes is to use two perceptual modalities (e.g., sounds as memory cues and pictures as information to be retrieved) (Bosch et al., 2014). Here, we used highly similar visual memory cues across different memory associations. Thus, item-specific neural patterns (at least in visual areas) during retrieval are more likely to be caused by retrieval-related memory reactivation than by visual processing of memory cues.
To sum up, our primary goal was to reveal whether two behavioral techniques (i.e., retrieval and suppression) can modulate initially consolidated associative memories, and whether such modulation results in altered activity and/or activity patterns detected by fMRI. We first investigated the possibility that associative memories can still be modulated after 24 h. Behaviorally, we asked whether repeated retrieval and memory suppression would respectively strengthen and weaken original memory traces. Next, using fMRI, we examined whether retrieval and suppression would modify neural measures of memory reactivation (i.e., activity amplitude and activity pattern similarity) in opposite directions.

Participants
Thirty-two right-handed, healthy young participants aged 18-35 years, recruited from the Radboud Research Participation System, finished both sessions of our experiment. They all had corrected-to-normal or normal vision and reported no history of psychiatric or neurological disease. All of them were native Dutch speakers. Two participants were excluded from further analyses due to memory performance at the chance level. Three additional participants were excluded because of excessive head motion during scanning. We used the motion outlier detection program within FSL (i.e., FSLMotionOutliers) to detect timepoints with large motion (threshold = 0.9). At least 20 spikes were detected in these excluded participants, with the largest displacement ranging from 2.6 to 4.3, while included participants had fewer than ten spikes. Neuroimaging data of one additional participant were partly used: this participant was excluded from the analysis of the modulation phase (Think/No-Think paradigm) due to head motion (in total 53 spikes, largest displacement = 5.7) only during this task, while his/her data from the other tasks were included in the analyses. Thus, data of 27 participants (16 females, age = 19-30, mean = 23.41, SD = 3.30) were included in the analyses of the final test phase, and data of 26 participants (15 females, age = 19-30, mean = 23.51, SD = 3.30) were included in the analyses of the modulation phase. All participants scored within normal levels on the Dutch versions of the Beck Depression Inventory (BDI) (Roelofs et al., 2013) and the State-Trait Anxiety Inventory (STAI) (van der Bij et al., 2003). Furthermore, because of the two-session design (24 h interval), we used an adapted Dutch version of the Pittsburgh Sleep Quality Index (PSQI) (Buysse et al., 1989) to assess the quality of sleep between the two scanning sessions. Questions about last night's sleep were added to the original version.
We compared participants' sleep quality/duration for the last night and the average across the previous four weeks. No participants reported abnormal sleep-related behaviors during the night between two fMRI sessions (i.e., more than two hours of differences in sleep time, time to go to bed, or time to wake up between the last night and the previous four weeks). The experiment was approved by, and conducted in accordance with requirements of the local ethics committee (Commissie Mensgebonden Onderzoek region Arnhem-Nijmegen, The Netherlands) and the declaration of Helsinki, including the requirement of written informed consent from each participant before the beginning of the experiment.

Locations and maps
We used 48 distinctive locations (e.g., buildings, bridges) drawn on two cartoon maps as memory cues. The maps do not correspond to the layout of any real city, and participants had never been exposed to the maps before the experiment. During the task, the whole map was presented, and specific locations were sequentially highlighted by colored frames as memory cues. By doing this, we kept visual processes during memory tasks largely consistent.

Pictures
Forty-eight pictures (24 neutral and 24 negative pictures) from the International Affective Picture System (IAPS) ( Lang et al., 1997 ) were used in this study, and these pictures can be categorized into one of four groups: animal (e.g., cat), human (e.g., reading girl), object (e.g., clock) or scene (e.g., train station). Category information was used for the following memory-based category judgment test.
All images were converted to the same size and resolution for the experiment.

Picture-location associations
Each picture was paired with one of the 48 map locations to form specific picture-location associations. We (W.L and J.V) carefully screened all the associations to prevent explicit semantic relationships between picture and location (e.g., a lighter at the fire department). All 48 picture-location associations were divided into three groups for different types of modulation (see Modulation phase). For each map, the 24 locations were paired with 6 pictures from each category. One-third of the associations on that map (8 associations; 2 pictures from each category) were retrieval associations (i.e., "think" associations), one-third were suppression associations (i.e., "no-think" associations), and the remaining one-third were control associations.
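The balanced split described above can be sketched in a few lines. This is only an illustration under stated assumptions: the association identifiers, the fixed seed, and the rotation of conditions across participants are hypothetical, and the source does not specify how the counterbalancing was actually implemented.

```python
import random
from collections import Counter

CATEGORIES = ["animal", "human", "object", "scene"]
CONDITIONS = ["retrieval", "suppression", "control"]

# Hypothetical stimulus list: 2 maps x 4 categories x 6 pictures = 48 associations.
ASSOCIATIONS = [
    (f"map{m}_{cat}_{k}", m, cat)
    for m in (1, 2) for cat in CATEGORIES for k in range(6)
]

def assign_conditions(associations, participant_index, seed=0):
    """Return {association_id: condition} for one participant.

    Within every (map, category) cell the 6 associations are split 2/2/2.
    Rotating the conditions by participant index is an assumed
    counterbalancing scheme, not necessarily the one used in the study.
    """
    rng = random.Random(seed)  # fixed seed: same base split for every participant
    cells = {}
    for assoc_id, map_id, category in associations:
        cells.setdefault((map_id, category), []).append(assoc_id)
    assignment = {}
    for key in sorted(cells):
        ids = sorted(cells[key])
        rng.shuffle(ids)
        for position, assoc_id in enumerate(ids):
            base = position * len(CONDITIONS) // len(ids)  # 0, 0, 1, 1, 2, 2
            rotated = (base + participant_index) % len(CONDITIONS)
            assignment[assoc_id] = CONDITIONS[rotated]
    return assignment

assignment = assign_conditions(ASSOCIATIONS, participant_index=0)
counts = Counter(assignment.values())
```

With this rotation, any three consecutive participant indices place each association in each condition exactly once, which gives the group-level probability of about one-third per condition.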

Overview of the design
This study is a two-session fMRI experiment with a 24 h interval between the two sessions (Fig. 1 A). The Day1 session consisted of the familiarization phase (Fig. 1 B), the study phase (Fig. 1 C), and the immediate typing test. The Day2 session consisted of the second typing test, the modulation phase (Fig. 1 D), and the final memory test (Fig. 1 E). Among these phases, the familiarization, modulation, and final memory test phases were performed in the scanner, while the study phase and the two typing tests were performed in the behavioral lab. The trial structure and timing are depicted in Fig. S1. During scanning, stimuli were projected onto a translucent screen (diameter = 598 mm; maximum projection size = 369 × 277 mm) mounted at the end of the scanner's bore and viewed via a mirror mounted on the head coil; during behavioral sessions, stimuli were presented on a 24-inch LED monitor. During MRI scanning, the distance between the mirror and the projection screen was around 85.5 cm. Moreover, to keep the visual presentation as consistent as possible, we set the resolution to 1280 × 1024 for both set-ups.

Familiarization phase
To obtain picture-specific brain responses to all 48 pictures, we instructed participants to perform the familiarization phase while being scanned (Fig. 1 B). The second purpose of the task was to let participants become familiar with the pictures to be associated with locations later. Each picture (resolution = 400 × 400) was shown at the center of the screen with a visual angle of 7° for 3 s, four times in total, distributed over four functional runs. The order of presentation was pseudorandom and pre-generated by self-programmed Python code. The dependencies between the orders of different runs were minimized to prevent potential sequence-based memory encoding. To keep participants focused during the task, we instructed them to categorize the presented picture via a multiple-choice question with four options (animal, human, object, and scene). We used an exponential model (mean = 2 s, minimum = 1 s, maximum = 4 s) to generate the inter-trial intervals (ITIs) between trials. Participants' responses were recorded by an MRI-compatible response box.
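One way to obtain ITIs with these properties is to sample from an exponential distribution truncated to the allowed range. The sketch below is an assumption about how such a jitter could be generated, not the authors' code; the function names, the bisection bounds, and the seed are illustrative. It solves numerically for the scale that gives a mean of 2 s after truncation to 1-4 s and then draws samples by inverse-CDF sampling.

```python
import math
import random

def truncated_exp_mean(scale, width):
    """Mean of an exponential with the given scale, truncated to [0, width]."""
    return scale - width / math.expm1(width / scale)

def solve_scale(target_mean, width, lo=0.05, hi=100.0, tol=1e-10):
    """Bisection for the scale whose truncated mean equals target_mean.

    The truncated mean increases with scale, from about 0 up to width / 2,
    so target_mean must be smaller than width / 2.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_exp_mean(mid, width) < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_itis(n, minimum=1.0, maximum=4.0, mean=2.0, seed=0):
    """Draw n ITIs (in seconds) from a shifted, truncated exponential."""
    width = maximum - minimum
    scale = solve_scale(mean - minimum, width)
    mass = -math.expm1(-width / scale)  # probability mass inside [0, width]
    rng = random.Random(seed)
    # Inverse-CDF sampling restricted to the truncated range.
    return [minimum - scale * math.log1p(-rng.random() * mass) for _ in range(n)]

itis = sample_itis(20000)
```

Most intervals fall close to the 1 s minimum, with progressively fewer long intervals, which is the usual motivation for exponential jitter in event-related designs.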

Study phase
Each picture-location association was presented twice in two separate runs ( Fig. 1 C). During each study trial, the entire map (resolution = 1024 × 768) was first presented for 2.5 s, then a BLUE frame was added to a layer on the top of the entire map to highlight one of the 48 locations, for 3 s, and finally, the picture and its associated location were presented side-by-side together for 6 s. We pre-generated a pseudorandom order of the trials to minimize the similarity between the orders in familiarization and the study phase.

Typing test phase
Immediately after the study phase, participants performed a typing test (Day1) assessing picture-location association memory. Each location was presented again (4 s) in an order that differed from the study phase, and participants had a maximum of 60 s to describe the associated picture by typing its name or description on a standard keyboard. Twenty-four hours later (Day2), participants performed the typing test again in the same behavioral lab. The procedure was identical to the immediate typing test, but with a different trial order.

Modulation phase
The modulation phase was the first task participants performed during the Day2 MRI session. We used the think/no-think (TNT) paradigm with trial-by-trial self-report measures to modulate initially consolidated memories (Fig. 1 D). The same paradigm has been used in previous neuroimaging studies, and the self-report does not affect the underlying memory control process (Anderson, 2004; Levy and Anderson, 2012). The forty-eight picture-location associations were divided into three conditions. One-third of the associations (16 associations) were assigned to the retrieval condition ("Think"), one-third to the suppression condition ("No-Think"), and the remaining one-third to the control condition. The assignment was counterbalanced between participants. Therefore, at the group level, the probability that a given picture-location association belonged to each of the three modulation conditions was around 33.3%. Associations that belonged to different conditions underwent different types of modulation during this phase. Locations that belonged to the control condition were not presented during this phase. For a retrieval trial, the entire map was presented (visual angle = 18°) with one particular location highlighted with a GREEN frame for 3 s, and participants were instructed to recall the associated picture quickly and actively and to keep it in mind until the map disappeared from the screen. For a suppression trial, one location was highlighted with a RED frame for 3 s, and participants were instructed that "when you see a location, highlighted with a RED frame, you should NOT think about the associated picture. Instead, you should try to keep an empty mind during this stage. It is a difficult task, and it is totally fine that sometimes you still think about the associated picture. But please do NOT close your eyes, focus on something outside the screen, or think about something else in your life. These strategies, although useful, could negatively affect the brain activity that we are interested in ..." After each retrieval or suppression trial, participants had a maximum of 3 s to report their experience during the cue presentation. Specifically, they answered a multiple-choice question with four response options (Never, Sometimes, Often, and Always) by pressing a button on the response box to indicate whether, and how frequently, the associated picture entered their mind during that particular trial.
The modulation phase consisted of five functional runs (64 trials per run). In each run, 32 locations (half retrieval trials, and half suppression trials) were presented twice. Therefore, each memory cue that did not belong to the control condition was presented ten times during the entire modulation phase. Again, we pre-generated the presentation orders to prevent similar order sequences across five modulation runs. Between each trial, fixation was presented for 1-4 s (mean = 2 s, exponential model) as ITI.

The final memory test phase
After the modulation phase, participants performed the final memory test within the scanner (Fig. 1 E). All 48 locations (including both the retrieval/suppression associations and the control associations) were highlighted one-by-one with a BLUE frame while the entire map was shown again. During its presentation (4 s), participants were instructed to recall the associated picture covertly but as vividly as possible and keep the mental image in their mind. Critically, visual input during this phase was highly similar across trials because entire maps were always presented, just with different locations highlighted. Next, participants were asked to respond to two multiple-choice questions within 7 s (3.5 s for each question): (1) "How confident are you about the retrieval?" They responded with one of the four following response options: Cannot recall, low confidence, middle confidence, and high confidence. (2) "Please indicate the category of the picture you were recalling." They also had four options to choose from (Animal, Human, Object, and Scene).
Fig. 1. [...] The trial structure with exact timing was depicted in Fig. S1. (B) During the familiarization phase, all of the pictures of the to-be-remembered associations were randomly presented four times for the familiarization and estimation of picture-specific activation patterns. To keep participants focused, on each trial, they were instructed to categorize the picture shown as an animal, human, location, or object. (C) Study phase. Participants were trained to associate memory cues with presented pictures. (D) Modulation phase. After 24 h, we used the Think/No-Think paradigm to modulate consolidated associative memories. Participants were instructed to actively retrieve associated pictures in mind ("retrieval"), or suppress the tendency to recall them ("suppression") according to the colors of the frames (GREEN: retrieval; RED: suppression) around locations. (E) Final memory test phase. Participants performed the final memory test after the modulation. For each of the 48 location-picture associations, locations were presented again, and participants were instructed to report the memory confidence and categorize the picture that came to mind.

Familiarization phase
We did not calculate the accuracy of the category judgment during the familiarization phase because the categorization of a picture could be a rather subjective decision, and it is not relevant for the aim of this study. However, we used individual responses to control for subjective category categorization for the following memory performance evaluation. Specifically, if a participant consistently labeled a given picture across four repetitions as a different category compared to our predefined labels, we generated an individual-specific category label and used this category label for this picture to evaluate the responses in the final test. Otherwise, we used predefined labels to evaluate the responses.

Typing test
Participants' answers were evaluated by two native Dutch experimenters (S.M and J.V) independently. The general principle was that if the answer contained enough specific information (e.g., a little black cat) to allow the experimenter to identify the picture among the 48 pictures used, it was labeled as correct. In contrast, if the answer was not specific enough (e.g., a small animal), it was labeled as incorrect. We used Cohen's kappa coefficient (κ) to measure inter-rater reliability. In general, κ larger than 0.81 suggests almost perfect reliability. If the two assessors had different evaluations, a third assessor (W.L) determined the final result (i.e., correct or incorrect). After the immediate typing test, we only invited participants with at least 50% accuracy to the Day2 experiment. Three out of 35 recruited participants did not continue on Day2 due to low performance on Day1. For the typing test 24 h later, participants' responses were evaluated by the same experimenters again. Based on the participants' responses in this typing test, we identified picture-location associations that the given participant had not learned or had already forgotten. These associations were not considered in the following behavioral and neuroimaging analyses, because participants had no memory associations to be modulated. We calculated the average accuracies for the immediate typing test and the typing test 24 h later and investigated the delay-related decline in memory performance using a paired t-test.
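For reference, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies, κ = (p_o − p_e) / (1 − p_e). The following is a minimal sketch with hypothetical ratings (1 = correct, 0 = incorrect); the data and variable names are illustrative and not taken from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the marginal label frequencies of each rater.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical scores for ten typed answers.
rater_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
rater_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
kappa = cohens_kappa(rater_1, rater_2)

# Items on which the raters disagree would go to the third assessor.
disagreements = [i for i, (a, b) in enumerate(zip(rater_1, rater_2)) if a != b]
```

In this toy example the raters agree on 9 of 10 items (p_o = 0.9) and chance agreement is p_e = 0.5, so κ = 0.8.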

Modulation phase
Responses during the modulation phase were analyzed separately for retrieval trials and suppression trials. We first calculated the percentage of each option (never, sometimes, often, and always) chosen across the 160 retrieval trials and 160 suppression trials for each participant. Next, we quantified the dynamic changes in task performance across repetitions (runs). Before the following analyses, we coded the original categorical variable using numbers (Never-1; Sometimes-2; Often-3; Always-4). For all the established picture-location associations, we calculated their average retrieval frequency rating (based on retrieval trials) and intrusion frequency rating (based on suppression trials) on each repetition. We used a repeated-measures ANOVA to model changes in retrieval and intrusion frequency ratings across repetitions to test whether repeated attempts to retrieve or suppress a memory trace would strengthen or weaken the associations, respectively. Additionally, to quantify individual differences in memory suppression efficiency (Levy and Anderson, 2012), we calculated the intrusion slope score for each participant. Using all the intrusion ratings for suppression trials, we used linear regression to calculate the slope of intrusion ratings across the ten repetitions for each participant. Participants with more negative slope scores are better at downregulating memory intrusions than those with less negative slope scores.
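The intrusion slope score can be illustrated with an ordinary least-squares fit of the numerically coded ratings against repetition number. The sketch below uses hypothetical responses for a single participant (one rating per repetition, for brevity); in practice all suppression-trial ratings of a participant would enter the regression.

```python
RATING_CODES = {"Never": 1, "Sometimes": 2, "Often": 3, "Always": 4}

def ols_slope(x, y):
    """Slope of the least-squares regression line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical suppression-trial responses across repetitions 1..10.
responses = ["Often"] * 3 + ["Sometimes"] * 4 + ["Never"] * 3
repetitions = list(range(1, 11))
ratings = [RATING_CODES[r] for r in responses]

intrusion_slope = ols_slope(repetitions, ratings)
```

Here the slope is negative (about -0.25 rating points per repetition), which corresponds to intrusions becoming less frequent with repeated suppression.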

The final memory test phase
For each trial of the final memory test, we calculated both a subjective memory measure based on the confidence rating (1, 2, 3, 4) and an objective memory measure based on the category judgment (correct/incorrect). Also, we recorded the reaction times (RT) for category judgments to estimate the speed of memory retrieval. To investigate the effect of the type of modulation on subjective memory, objective memory, and retrieval speed, we performed a repeated-measures ANOVA to detect within-participant differences between RETRIEVAL ASSOCIATIONS, SUPPRESSION ASSOCIATIONS, and CONTROL ASSOCIATIONS. To assess individual differences in suppression-induced forgetting, we calculated the suppression score by subtracting the objective memory measure of suppression associations ("no-think" items) from that of the control associations. Participants who showed more forgetting as the result of suppression had more negative suppression scores.

Combinatory analysis of modulation and final test phase
To replicate the previously reported relationship between memory suppression efficiency during the TNT task and suppression-induced forgetting during the final memory test (Levy and Anderson, 2012), we correlated suppression scores with intrusion slope scores across all participants. Notably, the sample size (N = 26) of this cross-participant correlational analysis is modest, but the analysis serves only as a replication of the previous study and as a check of the memory suppression manipulation.
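As an illustration of this cross-participant analysis, the sketch below computes a Pearson correlation between the two scores. The choice of Pearson's r is an assumption (the text does not name the correlation measure), and the values are hypothetical rather than data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for five participants (one value per participant).
intrusion_slopes = [-0.25, -0.10, -0.05, -0.18, 0.02]
suppression_scores = [-0.20, -0.05, 0.00, -0.10, 0.05]

r = pearson_r(intrusion_slopes, suppression_scores)
```

With only a few dozen participants, such a coefficient has a wide confidence interval, which is why the analysis is best read as a manipulation check.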
During the Day1 session, an anatomical T1 image was acquired first, followed by the field map sequence. Before the four EPI-based pattern localization runs, 8 min of resting-state data were acquired from each participant using the same sequence parameters. The Day2 session began with the field map sequence. Thereafter, we acquired six EPI-based task-fMRI runs (five runs of the modulation phase and one run of the final test phase).

Preprocessing of neuroimaging data
All functional runs underwent the same preprocessing steps using FEAT (FMRI Expert Analysis Tool) Version 6.00, part of FSL (FMRIB's Software Library, www.fmrib.ox.ac.uk/fsl) (Jenkinson et al., 2012). In general, the pipeline was based on procedures suggested by Mumford and colleagues (http://mumfordbrainstats.tumblr.com) and the suggestions for ICA-based Automatic Removal of Motion Artifacts (ICA-AROMA) (Pruim et al., 2015). The first four volumes of each run were removed from the 4D sequences for scanner stabilization. The following preprocessing was applied: motion correction using MCFLIRT (Jenkinson et al., 2002); correction of field inhomogeneities using B0 unwarping in FEAT; non-brain removal using BET (Smith, 2002); and grand-mean intensity normalization of the entire 4D dataset by a single multiplicative factor. We used different spatial smoothing strategies based on the type of analysis. For data used in univariate analyses, we applied a 6 mm kernel. In contrast, for data used in multivariate pattern analyses, no spatial smoothing was performed, to preserve the voxel-wise pattern information. In addition to the default FSL motion correction algorithm, we used ICA-AROMA to further remove motion-related spurious noise and chose the results from the "non-aggressive denoising" algorithm for the following analyses. Prior to time-series statistical analyses, high-pass temporal filtering (Gaussian-weighted least-squares straight line fitting with sigma = 50.0 s) was applied.
Registration between all functional data, high-resolution structural data, and standard space was performed using the following steps. First, we used Boundary-Based Registration (BBR) (Greve and Fischl, 2009) to register functional data to the participant's high-resolution structural image. Next, registration of the high-resolution structural image to standard space was carried out using FLIRT (Jenkinson et al., 2002; Jenkinson and Smith, 2001) and was then further refined using FNIRT nonlinear registration (Andersson et al., 2007). The resulting parameters were used to align maps between native space and standard space and to back-project regions of interest into native space.

Anatomical region-of-interest (ROI) in fMRI analyses
Based on previous pattern reinstatement studies ( Jonker et  hypothesized that ventral visual cortex (VVC), parietal lobe and hippocampus might carry picture-specific and category-specific information of the memory contents during retrieval. Therefore, we chose them as the ROIs in our fMRI analyses. All ROIs were first defined in the common space and back-projected into the participant's native space for within-participant analyses using parameters obtained from FSL during registration.
We defined the anatomical VVC ROI based on the Automated Anatomical Labeling (AAL) human atlas, which is implemented in the WFU pickatlas software (http://fmri.wfubmc.edu/software/PickAtlas). The same procedure was used in a previous neural reactivation study by Wimber and colleagues (Wimber et al., 2015). Brain regions including the bilateral inferior occipital lobe, parahippocampal gyrus, fusiform gyrus, and lingual gyrus were extracted from the AAL atlas and combined into the VVC mask. The VVC mask was mainly used as the boundary to locate visual-related voxels in the following activity pattern analyses.
The ROIs of the hippocampus and parietal lobe (including angular gyrus (AG), supramarginal gyrus (SMG), and precuneus) were defined using a bilateral mask within the AAL provided by WFU pickatlas software. To yield better coverage of participants' anatomy, we extended the original mask by two voxels in each direction (i.e., dilated by a factor of 2 in the software).

Univariate generalized linear model (GLM) analyses of response amplitude

GLM analyses of neuroimaging data from the final test phase
To investigate how different modulations (retrieval/suppression) affect the subsequent univariate activation, we ran voxel-wise GLM analyses of the final test run. All time-series statistical analysis was carried out using FILM with local autocorrelation correction ( Woolrich et al., 2001 ) using FEAT. In total, six regressors were included in the model. We modeled the presentation of memory cues (locations) as three kinds of regressors (duration = 4 s) based on their modulation history (retrieval, suppression, or control). To account for the effect of unsuccessful memory retrieval, we separately modeled the location-picture associations that participants could not recall as a separate regressor. Lastly, button presses were modeled as two independent regressors (confidence and category judgment). All trials were convolved with the default hemodynamic response function (HRF) within the FSL.
We conducted two planned contrasts (retrieval vs. control and suppression vs. control) first in native space and then aligned the resulting statistical maps to MNI space using the parameters from the registration. These aligned maps were used for the group-level analyses and corrected for multiple comparisons using the default cluster-level correction within FEAT (voxelwise Z > 3.1, cluster-level p < 0.05 FWER corrected). All of the contrasts were first conducted at the whole-brain level. Then, for the ROI analyses, we extracted beta values of these ROIs from the final test and compared them for the same contrasts (retrieval vs. control and suppression vs. control).

GLM analyses of neuroimaging data from the modulation phase
We ran the voxel-wise GLM analyses for each modulation run separately. In total, three regressors were included in the model. We modeled the presentation of the memory cues (locations) as two kinds of regressors (duration = 3 s) according to their modulation instruction (retrieval or suppression). Button presses were modeled as one independent regressor. Also, if applicable, location-picture associations that our participants could not recall were modeled as an additional regressor. For ROI analyses, we extracted beta values of these ROIs from whole-brain maps of each modulation run separately. We investigated repetition-related changes in beta values using repeated-measures ANOVA for retrieval and suppression separately.

Multivariate pattern analyses of brain activation patterns

Activity pattern estimation
All preprocessed (unsmoothed) familiarization, modulation, and final test functional runs were modeled in separate GLMs in each participant's native space. For each trial within familiarization, we generated a separate regressor using the onset of picture presentation and 3 s as the duration. At the same time, we generated one regressor for all button presses of the category judgment to control for motor-related brain activity. In total, 49 regressors were included in the model. This procedure led to a separate statistical map ( t -values) for each trial. Similarly, for each modulation and final test run, we generated a separate regressor for each trial using the onset of the presentation of the location (memory cue) and 3 s as the duration. However, button presses were not included in these models because they may potentially carry ongoing memory-related information. This likewise yielded a separate t -map for each modulation or test trial.

Searchlight analysis of picture-sensitive voxels
For each participant, brain data from the familiarization phase (i.e., pattern localization phase) were analyzed using the searchlight method ( Kriegeskorte et al., 2006, 2008 ) across the entire brain. More specifically, for each searchlight (a sphere with a radius of 5 mm, centered at every voxel in the brain) of each participant, we trained a Support Vector Classification (SVC) classifier to differentiate the activity patterns elicited by each picture (or each category) and tested its predictive power using leave-one-run-out cross-validation. SVC was implemented using the C-Support Vector Machine within the scikit-learn package ( https://scikit-learn.org/stable/ ) ( Pedregosa et al., 2011 ). The multiclass classification was handled according to a one-vs.-one scheme. We used default parameters of the function (regularization (C) = 1, radial basis function kernel with degree = 3). The same settings were applied for all classifications described below. Specifically, for each trial, activity patterns within the searchlight were extracted. Since each picture was presented four times during four pattern localization runs, in total, we got four activity patterns within the searchlight for each picture. The within-participant classification was performed using leave-one-run-out cross-validation: activity patterns of one particular run were left out as the testing dataset, and the remaining three runs were used as the training dataset to train the SVC classifier. After all the training-testing procedures, our analyses resulted in one accuracy value to represent the overall predictive power of the activity patterns within this particular searchlight. The searchlight walked through the entire brain of each participant. After the searchlight procedure, each participant yielded a classification accuracy map, and each voxel within the map stored the classification accuracy of the searchlight sphere centered on it.
To allow group inferences about the brain regions, we performed one-sample t -tests on all of the classification accuracy maps and tested them against chance (chance level = 1/48, approximately 2%). Since we aimed to identify picture-sensitive voxels within the VVC, we overlapped the voxels identified by the searchlight ( p uncorrected < 0.001) with the anatomical VVC mask. Because choosing p uncorrected < 0.001 as the threshold is arbitrary, we also used other thresholds ( p uncorrected < 0.05 and p uncorrected < 0.01) to define the significant voxels and further validated our results using different threshold-dependent masks.
Having used the within-participant searchlight analysis to localize stimulus-sensitive voxels in visual areas, we validated these identified VVC voxels in a cross-participant procedure. By doing this, we explored whether visual perception-related activation patterns of these voxels are shared across participants. Specifically, instead of performing leave-one-run-out cross-validation within each participant, we used three-fold cross-validation within the entire sample. Firstly, t -maps for each picture and each run were transformed from native space to standard space to enable the cross-participant predictive model training and testing. Then, the identified voxels within the VVC were used as a mask to extract spatial patterns of activation. Finally, data from 2/3 of the participants were used to train the SVC model, and the remaining 1/3 of the participants were used to assess the model. Note that the cross-participant classification is only a confirmatory analysis of the searchlight classification and should not be regarded as an independent analysis. The cross-participant classification was also repeated in three clusters of VVC voxels under different thresholds ( p uncorrected < 0.05, p uncorrected < 0.01, and p uncorrected < 0.001).
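The within-participant leave-one-run-out scheme described above can be sketched for a single searchlight as follows. This is a minimal illustration on synthetic data, not the authors' script: the array sizes, noise levels, and variable names are assumptions, and only four pictures are simulated instead of 48. The classifier settings follow the text (scikit-learn `SVC` with C = 1 and a radial basis function kernel).

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_pictures, n_runs, n_voxels = 4, 4, 30  # toy sizes (the study used 48 pictures)

# Synthetic searchlight patterns: one prototype per picture plus trial noise.
prototypes = rng.normal(0.0, 3.0, size=(n_pictures, n_voxels))
X, y, runs = [], [], []
for run in range(n_runs):
    for picture in range(n_pictures):
        X.append(prototypes[picture] + rng.normal(0.0, 0.3, size=n_voxels))
        y.append(picture)
        runs.append(run)
X, y, runs = np.array(X), np.array(y), np.array(runs)

# SVC handles multiclass problems one-vs-one internally; the argument below
# only keeps the decision function in that one-vs-one shape.
clf = SVC(C=1.0, kernel="rbf", decision_function_shape="ovo")

# Each fold holds out all trials of one run and trains on the other three.
fold_acc = cross_val_score(clf, X, y, groups=runs, cv=LeaveOneGroupOut())
searchlight_accuracy = fold_acc.mean()
```

In a full searchlight, `searchlight_accuracy` would be written to the voxel at the centre of the sphere and the procedure repeated for every voxel.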

Pattern reinstatement analysis
The VVC voxels identified by the searchlight analysis and other anatomically defined masks (including hippocampus, AG, SMG, and precuneus) were used as masks in the cross-task classification of memory contents. Each trial's t -map estimated from the final test run was transformed from native space to standard space. ROI-based activity patterns from both the pattern localization and final memory test phases were extracted using the ROI masks. We performed cross-task three-fold cross-validation to reveal the shared neural representation of the perception and retrieval of the same visual stimulus. Activity patterns estimated from the pattern localization phase of 2/3 of the participants (i.e., training sample) were used to train the SVC predictive model. We used the activity patterns during the final memory test evoked by the corresponding location (memory cue) of the remaining 1/3 of the participants (i.e., testing sample), together with the trained SVC model, to predict the memory content on a trial-by-trial basis. Critically, the SVC model was trained solely on the localizer data (day1), and it was applied to the final memory test (day2) without further model fitting. Moreover, during the final memory test, visual input was highly similar across trials because we only highlighted each location on an identical map as the memory cue. Therefore, if a given classifier can significantly predict memory content, the classification is unlikely to be based on the neural responses to the memory cue alone. For each ROI, we first calculated the average decoding accuracy for each participant across all trials. A common way to evaluate the significance of classification accuracies is to compare them with the theoretical chance level (i.e., 1/number of categories). However, previous work has shown that this approach may overestimate the significance of classification ( Combrisson and Jerbi, 2015 ; Jamalabadi et al., 2016 ; Kowalczyk and Chapelle, 2005 ).
We used an alternative method to control for this potential bias. For each decoding analysis, we generated an empirical null distribution of accuracies by repeating our decoding analyses with classifiers trained on randomly shuffled labels ( N = 1000). Only accuracies larger than the 95th percentile of this null distribution were considered significant. Values that were larger than the maximum accuracy within this null distribution were assigned a p -value of < 0.001.
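The label-shuffling procedure can be sketched as below. To keep the example dependency-light, a nearest-centroid decoder is swapped in for the SVC used in the study; the synthetic data, sizes, and function names are hypothetical. The logic of the test is the same: retrain on shuffled training labels, collect the resulting accuracies, and compare the observed accuracy with the 95th percentile and the maximum of that null distribution.

```python
import numpy as np

def nearest_centroid_accuracy(train_X, train_y, test_X, test_y):
    """Stand-in decoder: assign each test pattern to the closest class centroid."""
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_X[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(classes[dists.argmin(axis=1)] == test_y))

rng = np.random.default_rng(1)
n_classes, n_train_per_class, n_test_per_class, n_voxels = 4, 6, 4, 20
prototypes = rng.normal(0.0, 2.0, size=(n_classes, n_voxels))

def simulate(n_per_class):
    """Draw noisy patterns around each class prototype (synthetic data)."""
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = prototypes[y] + rng.normal(0.0, 1.0, size=(y.size, n_voxels))
    return X, y

train_X, train_y = simulate(n_train_per_class)  # stands in for localizer data
test_X, test_y = simulate(n_test_per_class)     # stands in for final-test data

observed = nearest_centroid_accuracy(train_X, train_y, test_X, test_y)

# Null distribution: retrain on shuffled training labels, test on intact labels.
n_permutations = 1000
null = np.array([
    nearest_centroid_accuracy(train_X, rng.permutation(train_y), test_X, test_y)
    for _ in range(n_permutations)
])
threshold = np.percentile(null, 95)      # significance criterion from the text
significant = observed > threshold
exceeds_all = observed > null.max()      # the case reported as p < 0.001
```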

ROI-based trial-by-trial pattern similarity analysis on the modulation and final memory test data
Representational similarity analysis (RSA) ( Cohen et al., 2017 ) was used to calculate trial-by-trial pattern similarity within particular types of test trials (e.g., recall of associations belonging to the RETRIEVAL ASSOCIATIONS ). Given the nature of the within-participant analysis and to improve the pattern similarity estimation, we based all calculations on activity patterns in native space.
Firstly, we analyzed the multivariate activation patterns of the final test. The identified VVC voxels ( Fig. 2 A ) were transformed from standard space to native space and then used as a mask to extract single-trial activity patterns, which were converted from 3D volumes to 2D vectors and z -scored for the later correlational analysis. Activation patterns of the hippocampus ( Fig. 2 B ), angular gyrus ( Fig. 2 C ), supramarginal gyrus ( Fig. 2 D ), and precuneus ( Fig. 2 E ) were extracted in the same way. For each participant, after excluding all trials with incorrect memory-based category judgment, we divided the remaining trials into three conditions based on their modulation history (e.g., retrieval practice or retrieval suppression). Next, for activity patterns of trials within the same condition, we calculated neural pattern similarity using Pearson correlations between all possible pairs of trials within the condition ( Fig. 2 F ). These calculations yielded three separate correlation matrices for each participant, one for each type of test trial. Finally, we used the mean of all r -values located in the lower triangle of each participant-specific correlation matrix to represent the neural pattern similarity of that condition (the higher the r -value, the higher the pattern similarity). After repeating these steps for each participant separately, three kinds of pattern similarity values were generated for the statistical test. All mean r -values were Fisher r -to- z transformed before the following statistical analyses. To investigate whether different modulations have different effects on memory representation during the final test, we performed two planned within-participant comparisons: (1) RETRIEVAL ASSOCIATIONS vs. CONTROL ASSOCIATIONS; (2) SUPPRESSION ASSOCIATIONS vs. CONTROL ASSOCIATIONS.
Next, we used the same approach to analyze the modulation data. For each presented location, activity patterns were extracted using the same mask from the five modulation runs. Similarly, within-condition (retrieval or suppression) trial-by-trial pattern similarity was calculated for each condition and each run. The dynamic change was modeled as the condition-by-run interaction in an ANOVA.
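The within-condition similarity measure can be sketched as follows: correlate every pair of single-trial patterns, average the off-diagonal entries of one triangle of the correlation matrix, and apply the Fisher transform to that mean. The function name and the toy patterns are hypothetical; real inputs would be the z-scored voxel vectors of one condition in one ROI.

```python
import numpy as np

def mean_pattern_similarity(patterns: np.ndarray) -> float:
    """Mean pairwise Pearson r across trials (rows), Fisher r-to-z transformed."""
    r = np.corrcoef(patterns)                        # trials x trials matrix
    rows, cols = np.tril_indices(r.shape[0], k=-1)   # lower triangle, no diagonal
    mean_r = r[rows, cols].mean()
    return float(np.arctanh(mean_r))                 # Fisher transform of the mean

# Toy condition with three trials and four voxels (hypothetical values).
patterns = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 2.0, 3.0, 4.0],
])
similarity_z = mean_pattern_similarity(patterns)
```

Here the three pairwise correlations are -1, 1, and -1, so the mean r is -1/3 before the Fisher transform. Because Pearson correlation is invariant to per-trial z-scoring, the z-scoring step described in the text does not change the result.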

Statistical analysis
When comparing continuous variables (e.g., reaction time) between experimental conditions, we used repeated-measures analysis of variance (ANOVA) or paired t -tests. A significant main effect in an ANOVA was followed by post hoc tests, in which multiple comparisons were corrected by the Holm-Bonferroni method. Notably, classification accuracies were not normally distributed. Therefore, we used non-parametric methods (i.e., the Friedman test) to compare accuracies between experimental conditions. To evaluate the significance of classification accuracy, instead of comparing with theoretical chance levels, we compared real accuracies with an empirical null distribution of accuracies (see Pattern reinstatement analysis above). Accuracies were considered significant when they were higher than the 95th percentile of the corresponding null distribution. For ordinal responses (e.g., "never," "sometimes"), the percentage of each option was calculated, and then percentages were compared across repetitions. To account for the number of comparisons that come with multiple ROIs ( n = 9), we applied False Discovery Rate (FDR) correction based on the Benjamini-Hochberg procedure ( Thissen et al., 2002 ). For all statistical tests that involved multiple ROIs, FDR-corrected p values ( p FDR ) are reported along with raw p values ( p raw ) and effect sizes (e.g., Cohen's d , partial η²).
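For readers unfamiliar with the Benjamini-Hochberg procedure, the adjustment of a family of raw p values can be sketched in a few lines. The function name and the example p values are hypothetical and are not taken from the study; in the analyses above the family would contain one p value per ROI.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

p_raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
p_fdr = benjamini_hochberg(p_raw)
```

For this toy family the adjusted values are approximately 0.02, 0.04, 0.04, and 0.02, in the original order.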

Data and code availability
Custom scripts used in this study, intermediate data (i.e., preprocessed single-trial activation patterns used for reinstatement analyses), and raw data were uploaded to the Donders Repository ( https://data.donders.ru.nl/ ). The project is named "Tracking the involuntary retrieval of unwanted memory in the human brain with functional MRI" in the Repository ( https://doi.org/10.34973/5afg-7r41 ).

Pre-scan memory performance immediately after study and 24 h later
During the immediate typing test (day1), 88.01% of the associated pictures were described correctly (SD = 10.87%; range from 52% to 100%). Twenty-four hours later, participants still recalled 82.15% of all associations in the second typing test (SD = 13.87%; range from 50% to 100%). Although we observed less accurate memory 24 h later (t(26) = 4.73, p < 0.001, Cohen's d = 0.912) ( Fig. S2 ), participants could still remember most location-picture associations well.
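The effect sizes reported alongside the paired t -tests appear to be consistent, up to rounding, with the within-participant convention d = t / sqrt(n). This is an inference from the reported numbers rather than a formula stated in the text, so the short check below should be read as an assumption; the function name is hypothetical.

```python
import math

def cohens_d_from_paired_t(t_value: float, n_participants: int) -> float:
    """Within-participant effect size d_z = t / sqrt(n) for a paired t-test."""
    return t_value / math.sqrt(n_participants)

# Day1 vs. day2 typing test: t(26) = 4.73 with 27 participants.
d_recall_drop = cohens_d_from_paired_t(4.73, 27)
```

The result is close to the reported Cohen's d of 0.912; the small difference is what one would expect from rounding of the t value.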

Fig. 2. Regions-of-interest (ROI) and rationale of the pattern similarity analysis. (A) Functionally defined voxels within the ventral visual cortex (VVC). We identified voxels whose activity patterns can be used to differentiate pictures that were processed during the familiarization phase and were reactivated during successful memory retrieval during the final test. (B) Anatomically defined bilateral hippocampus ROI. (C) Anatomically defined bilateral angular gyrus ROI. (D) Anatomically defined bilateral supramarginal gyrus ROI. (E) Anatomically defined bilateral precuneus ROI. (F) During the final test, "mental images" were retrieved based on highly similar memory cues (different locations within maps were cued). We derived activation patterns for each memory retrieval trial based on fMRI data and then quantified the cross-item pattern similarity using Pearson's r . (G) Considering the highly similar perceptual processing, vivid "mental images" during memory retrieval should be reflected in lower activity pattern similarity.

Fig. 3. Behavioral performance during modulation and final memory test phase. (A) Percentage of the trial-by-trial introspective report during the retrieval trials. For most of the retrieval trials, associated pictures were successfully recalled (1-P never : mean = 84.05%, SD = 11.79%). (B) With repeated retrieval attempts, associated pictures were more likely to "always" stay in mind ( P always : F [9, 234] = 5.3, p < 0.001, η² = 0.02). (C) Percentage of the trial-by-trial introspective report during the suppression trials. During half of the suppression trials, participants successfully suppressed the tendency to recall the associated pictures ( P never : mean = 50.62%, SD = 25.35%).

For the analyses of suppression trials, we excluded all location-picture associations which the participant could not describe correctly immediately before the modulation phase (i.e., Typing Test Day2). This approach controlled for individual differences in memory that could interfere with the analysis of memory suppression. On suppression trials, participants reported that they successfully suppressed the tendency to recall the associated pictures in about half of the trials ( P never : mean = 50.62%, SD = 25.35%, range from 4% to 92.5%; Fig. 3 C ). As shown before in the think/no-think literature ( Levy and Anderson, 2012 ), the percentage of the four types of trial-by-trial intrusion reports changed differently from the first to the tenth repetition (Choice × Repetition: F [27, 702] = 3.4, p < 0.001, η² = 0.01; Fig. 3 D ). Specifically, the percentage of reporting "never" increased (F [9, 234] = 5.4, p < 0.001, η² = 0.04), while the percentage of reporting "sometimes" (F [9, 234] = 2.5, p = 0.008, η² = 0.02) decreased over repetitions. These results together suggest that participants were successful at retrieving or suppressing memory traces according to task instructions.

Memory performance during the final memory test
During the final test, participants selected, on average, the correct category (chance level = 1/4) for the associated picture on 91.82% (SD = 6.05%; range from 70.83% to 100%) of the successfully recalled associations of the typing test on day2 (mean = 39.43). We then examined how repeated retrieval and suppression affected memory performance. First, we compared recall accuracies between three kinds of associations (i.e., RETRIEVAL ASSOCIATIONS, SUPPRESSION ASSOCIATIONS, and CONTROL ASSOCIATIONS ). Analysis of objective recall accuracy after modulation showed no significant main effect of modulation (F [2,26] = 0.524, p = 0.595, η² = 0.013; Fig. 3 E ). Given the lack of a suppression-induced forgetting effect (lower accuracy for SUPPRESSION ASSOCIATIONS compared to CONTROL ASSOCIATIONS ) at the group level, we performed a correlational analysis to relate performance during memory suppression to the final memory test. We found that participants who were more effective in suppressing intrusions (higher intrusion slope score) during the modulation phase were the ones who showed larger suppression-induced forgetting effects ( r = 0.411, p = 0.03; Fig. 3 F ), suggesting that successful retrieval suppression was subsequently associated with suppression-induced forgetting. This correlation was also reported before in the think/no-think literature ( Levy and Anderson, 2012 ). Additionally, we investigated the effect of modulation on memory confidence and found a significant main effect (F [2,26] = 5.928, p = 0.005, η² = 0.07; Fig. 3 G). Post hoc analyses revealed higher recall confidence for RETRIEVAL ASSOCIATIONS compared to CONTROL ASSOCIATIONS (t(26) = 3.35, p holm = 0.007, Cohen's d = 0.64) and a trend towards higher confidence compared to SUPPRESSION ASSOCIATIONS that just failed to reach our threshold for statistical significance (t(26) = 2.172, p holm = 0.07, Cohen's d = 0.41).
Finally, we asked whether modulation affected retrieval speed, indexed by the RT during the final test. Even though we did not find a significant main effect of modulation (F [2,26] = 2.905, p = 0.06, η² = 0.03; Fig. 3 H), recall of RETRIEVAL ASSOCIATIONS was faster than the recall of CONTROL ASSOCIATIONS (t(26) = −2.486, p = 0.02, Cohen's d = −0.47).

Measuring the pattern reinstatement of individual memory during retrieval
Fig. 4. Identify picture-sensitive voxels and measure pattern reinstatement in the ventral visual cortex. (A) Using the searchlight method, we localized picture-sensitive voxels in brain regions including lateral occipital cortex, fusiform gyrus, lingual gyrus, calcarine cortex, postcentral and precentral gyrus, supplementary motor area, and small clusters within the medial and inferior prefrontal cortex. These voxels showed picture-specific activation patterns during perception (uncorrected p voxel < 0.001). (B) We restricted our following pattern analyses to these voxels within the ventral visual cortex (VVC) boundary by overlapping the searchlight accuracy map and the anatomically defined VVC. (C) fMRI activation patterns of these voxels during pattern localization were extracted to train a classifier. The activity patterns of these voxels during the final test were further extracted and used as inputs for the classifier for different pictures. (D) The classifier was first validated in a cross-participant, within-task procedure. We demonstrated that picture-sensitive voxels could enable the cross-participant picture classification during perception (mean accuracy = 61.88%, SD = 17.71%, p < 0.001). (E) The same classifier, without further model training, was used for the decoding of memory contents based on activity patterns during retrieval. Results showed that the classifier could decode the memory contents with an accuracy higher than shuffled decoding models (mean accuracy = 43.13%, SD = 16.52%, p < 0.001). (F) We observed significantly lower classification accuracies for cross-task classification compared to the within-task classification (t(26) = −3.97, p < 0.001). The red line represents the 95th percentile of the accuracies within the null distribution of 1000 label-shuffled classifications.

The Support Vector Classification (SVC)-based searchlight analysis revealed brain regions including the lateral occipital cortex, fusiform gyrus, lingual gyrus, and calcarine cortex, which showed picture-specific activation patterns during perception (uncorrected p voxel < 0.001, Fig. 4 A ). We restricted our following activation pattern analyses to these voxels within the anatomical VVC boundary ( Fig. 4 B ).
Next, we confirmed that activation patterns of these voxels could be used for cross-participant classification of the visual stimulus during perception. We trained the SVC based on activation patterns of two-thirds of all participants and tested the model using the remaining one-third. Results from the three-fold cross-validation confirmed that these VVC voxels do enable cross-participant picture classification (mean accuracy = 61.88%, SD = 17.71%, shuffled accuracy max = 3.2%, p < 0.001, Fig. 4 D ).
Having established that activity patterns of voxels within the VVC carry picture-specific information during perception, we next examined whether we could detect pattern reinstatement of memory traces within the same area during the final memory test. We trained the SVC model on the neuroimaging data from the pattern localization phase to classify the trial-by-trial memory content in the final test ( Fig. 4 C ). Results showed that the classifiers could decode memory content based on activity patterns during the final test with significant accuracy (mean accuracy = 43.13%, SD = 16.52%, shuffled accuracy max = 3.3%, p < 0.001, Fig. 4 E ), although the accuracy was significantly lower than that of the within-task classification of the perceived visual stimulus (t(26) = −3.97, p < 0.001, Cohen's d = −0.76, Fig. 4 F ).
We ran two control analyses to test the robustness of the observed pattern reinstatement in the VVC during retrieval. We first examined the effect of the arbitrary thresholds used in cluster formation on the subsequent classification of memory contents. Specifically, we used two additional thresholds (uncorrected p voxel = 0.01 and 0.05) to identify picture-sensitive voxels during the whole-brain searchlight analysis and confirmed that the classifications could also be performed based on the picture-sensitive voxels identified under these thresholds ( Fig. S3 ). In addition, beyond picture-specific classifications, we investigated the possibility of category-specific classifications based on brain activity patterns. All of the pictures to be associated can be categorized into one of the following four groups: animal, human, object, or location. Similarly, we localized category-sensitive voxels within the VVC ( Fig. S4D ) and confirmed that these voxels also carry category-specific information during perception (mean accuracy = 69.13%, SD = 9.67%, shuffled accuracy max = 29.6%, p < 0.001, Fig. S4E ). Also, activity patterns of these category-sensitive voxels during memory retrieval could enable cross-participant, cross-task classification of the category during the final memory test (mean accuracy = 44.29%, SD = 8.9%, shuffled accuracy max = 30.4%, p < 0.001, Fig. S4E ).
Based on the same decoding pipeline, we performed a control pattern reinstatement analysis on activation patterns within the premotor cortex ( Fig. S6A ), which, according to the reinstatement model, is not expected to represent memory content during retrieval (for details, see Supplemental Texts; Section 4 ). Even for the category-based decoding, which requires less information than the item-based decoding, activation patterns of this area during retrieval could not be used to classify memory contents (Fig. S6B).
Without considering the modulation of each association (i.e., retrieval, suppression, or control), we demonstrated pattern reinstatement of individual memories during retrieval after a 24 h delay. Based on the differences in RT and confidence, we tested whether different modulations have different effects on the evidence (i.e., decoding accuracy or decision value ( Linde-Domingo et al., 2019 )) of memory reactivation, for example, whether repeated retrieval increased the reactivation evidence while suppression decreased it. We performed these analyses based on classifier training in both a cross-participant and a within-participant manner. These analyses yielded no significant differences between modulations in any of the ROIs investigated (details in Supplemental Materials; Table S1-S4).
In sum, we identified picture-specific voxels within the VVC and demonstrated the pattern reinstatement of individual memory traces in these voxels during retrieval. The same pattern reinstatement was detected in the anatomically defined hippocampus, AG, SMG, and precuneus. These results are the foundation of our following multivariate pattern analyses: the pattern reinstatement 24 h after initial learning suggests that activity patterns of these regions during retrieval carry mnemonic representations.
Next, we confirmed that the observed activity reduction is related to a linear decrease in activity with repeated retrieval using the data from the modulation phase. Specifically, we extracted the beta coefficient from these clusters for each run of the modulation phase and tested for the change in activity amplitude across runs. We found reduced VVC activity over repeated retrieval attempts (F [4, 25] = 5.95, p < 0.001, η² = 0.174). Similarly, for the bilateral hippocampus, we observed a trend toward a gradual decrease of hippocampal signal across repetitions (left hippocampus: F [4, 25] = 2.39, p = 0.056, η² = 0.087; right hippocampus: F [4, 25] = 2.22, p = 0.072, η² = 0.082). Even though we found the retrieval-related activity reduction in right AG and precuneus during the final test, we did not find the corresponding gradual decrease during modulation (right AG: F [4, 25] = 0.734, p = 0.571, η² = 0.02; right precuneus: F [4, 25] = 1.88, p = 0.12, η² = 0.05).
Repeated retrieval dynamically enhances the distinctiveness of activity patterns in the visual cortex, but not hippocampus: focusing on the identified VVC voxels, parietal lobe, and hippocampus, we calculated the trial-by-trial activity pattern similarity for RETRIEVAL ASSOCIATIONS and CONTROL ASSOCIATIONS separately. Results show that retrieval-related activity patterns for RETRIEVAL ASSOCIATIONS have decreased similarity in the VVC compared to CONTROL ASSOCIATIONS (t(26) = −2.3, p raw = 0.029, p FDR = 0.08, Cohen's d = −0.44; Fig. 4 C ). To test the robustness of decreased pattern similarity for RETRIEVAL ASSOCIATIONS in the VVC, we performed the same contrast based on (1) all associations instead of only remembered associations, (2) VVC areas defined by different thresholds, and (3) category-sensitive voxels instead of picture-sensitive voxels. All control analyses yielded the same result as the reported main analysis ( Figs. S8-S10 ). However, we did not observe a similar effect in the hippocampus ( Fig. 6 H ), but failed to reach significance.
Our ROI analyses already found reduced activity amplitude, but more distinct activity patterns in VVC, right AG, and precuneus. Then we performed the correlational analysis to explore the relationship between changes in activity amplitude and changes in pattern similarity across participants. We found that participants who showed a larger reduction in VVC's activity amplitude were more likely to show a larger decrease in VVC pattern similarity ( r = 0.610, p < 0.001; Fig. 5 C ). This correlation is also significant for right precuneus ( r = 0.427, p = 0.026), but not for right AG ( r = − 0.051, p = 0.799).
To characterize the dynamic modulation of pattern similarity in the VVC, we further applied the same variability analysis to each run of the modulation phase and analyzed these pattern similarity values using a 2 × 5 ANOVA ( modulation; repetition ). We saw a significant main effect of run , reflecting that pattern similarity of the VVC decreased with repetitions (F [4, 100] = 10.55, p < 0.001, η² = 0.028). We also saw a main effect of modulation , reflecting that pattern similarity of the RETRIEVAL ASSOCIATIONS is consistently lower than the similarity of SUPPRESSION ASSOCIATIONS (F [1, 25] = 23.77, p < 0.001, η² = 0.028). The interaction between modulation and runs just failed to be significant (F [4, 100] = 2.427, p = 0.053, η² = 0.001; Fig. 5 D ). This pattern of results suggests that decreased pattern similarity is not only the result of repetition: even though memory cues of SUPPRESSION ASSOCIATIONS have also been presented ten times during the modulation, repeated retrieval more effectively enhanced pattern distinctiveness compared to suppression. We applied the same dynamic modulation analysis to the ROIs, which demonstrated lower cross-item pattern similarity for RETRIEVAL ASSOCIATIONS (i.e., right AG, left SMG, and bilateral precuneus) during the final memory test phase, but we found no evidence for an interaction between modulation and runs (right AG:

Retrieval suppression was associated with reduced lateral prefrontal activity
Weaker lateral prefrontal cortex (LPFC) activation as the result of retrieval suppression: the contrast between retrieval of SUPPRESSION ASSOCIATIONS and CONTROL ASSOCIATIONS during the final test revealed decreased activation in one cluster in the left LPFC ( x = −52, y = 38, z = 16, Z peak = 4.09, size = 1320 mm³; Fig. 7 A ). We did not find any significant effect of retrieval suppression on hippocampal activity amplitude in the whole-brain or the ROI analyses. To characterize dynamic activity changes in the left LPFC, we extracted beta values from the cluster for each modulation run and did not find decreased activity from the first to the fifth run during suppression (F [4, 25] = 2.03, p = 0.09, η² = 0.056; Fig. 7 B ). Subsequently, we performed an exploratory analysis restricted to the first four runs and found gradually decreasing activity in the left LPFC (F [3, 25] = 2.98, p = 0.036, η² = 0.078).
Intact neural representations after memory suppression: next, we examined if retrieval suppression modulated activity patterns in the VVC, hippocampus, or parietal lobe. Pattern similarity analysis revealed no significant difference between SUPPRESSION ASSOCIATIONS and  effect of memory suppression on final memory performance, but the strong correlation between the intrusion slope and suppression-induced forgetting, we further investigated suppression-induced changes in pattern similarity among participants who showed strong negative intrusion slopes and (by correlation) more suppression-induced forgetting. More specifically, we used the median split method to divide the data of all participants into two groups (strong suppression group vs. weak suppression group) according to their intrusion slope value and compared changes in pattern similarity between groups. Our results suggested that the two groups did not differ in suppression-induced changes in pattern similarity in any of the ROIs investigated ( Table S6 ).

Discussion
Active memory retrieval is known to be a powerful memory enhancer, while memory suppression tends to prevent unwanted memories from further retrieval. Previous neuroimaging investigations of the neural effect of repeated retrieval and suppression revealed corresponding neural changes in both univariate activity analysis and multivariate activity patterns analysis. Building on these findings, we tested whether similar neural changes can be detected when modulation is delayed by 24 h (i.e., newly acquired memories have undergone the initial consolidation). Also, because we collected fMRI data from both the modulation phase and the final memory test, this design allowed us to perform dynamic analysis on whether the neural changes seen in the final memory test are accompanied by gradual changes during the modulation phase. Similar to previous literature ( Ferreira et al., 2019 ), our results demonstrated that repeated retrieval of consolidated memories was associated with enhanced episode-unique mnemonic representations in the parietal lobe. Critically, our dynamic analysis provided converging evidence for the adaption of stronger mnemonic representations in visual processing areas, which were involved in the initial perception. Our results suggested that repeated retrieval of newly acquired memory and initially consolidated memory may be associated with similar neural changes.
Repeated retrieval strengthened consolidated memories. Behaviorally, our results demonstrate that, after an initial delay of 24 h, repeated retrieval strengthened memories further, indexed by higher recall confidence and shorter reaction times. The beneficial effect of retrieval practice on the subsequent retrieval is well established ( Karpicke and Blunt, 2011 ;Karpicke and Roediger, 2008 ;Karpicke and Roediger III, 2007 ;Smith et al., 2016 ). In our study, memory accuracy was already near the ceiling level, and thus we did not find higher recall accuracy of RETRIEVAL ASSOCIATIONS compared to CONTROL ASSOCIATIONS . Corroborating the behavioral effect during the final memory test, we also found that repeated retrieval of certain memories increased their tendency to remain stable in mind during the modulation phase.
Repeated retrieval is associated with subsequent decreasing activity amplitude. Our whole-brain univariate analysis revealed a set of brain regions, including frontal, parietal (mainly precuneus), and ventral visual areas, that showed decreasing activity amplitude with repeated retrieval. Activity changes in frontal and parietal areas have been reported frequently in the literature on retrieval-mediated learning/forgetting, but the directions of the reported changes are mixed. Some reports have found similar univariate decreases in frontal or parietal areas ( Kuhl et al., 2010 ;Wimber et al., 2011, 2008 ), but others reported activity increases in these areas ( Himmer et al., 2019 ;Nelson et al., 2013 ;van den Broek et al., 2016 ;Wirebring et al., 2015 ). In addition to the whole-brain analysis, our ROI analysis further showed decreased activity in the right angular gyrus. In sum, our study mainly found decreased activity in frontal and parietal areas after repeated retrieval of initially consolidated memories. Moreover, decreased activity in ventral visual areas is a novel finding. Previous studies usually used words as materials to be remembered ( Nelson et al., 2013 ;Wimber et al., 2011, 2008 ;Wirebring et al., 2015 ), while we used pictures. One other study also used pictures and the TNT paradigm but did not reveal reliable activity changes for retrieved pictures compared to control pictures ( Gagnepain et al., 2014 ). To test the fast-consolidation hypothesis of retrieval-mediated learning ( Antony et al., 2017 ), we further examined changes in hippocampal activity during modulation and final test. Similar to a recent report of slow hippocampal disengagement during repeated retrieval ( Ferreira et al., 2019 ), we found dynamically decreasing hippocampal activity across repeated retrieval for initially consolidated memories. Our results, together with the findings of Ferreira and colleagues, are consistent with decreasing retrieval-related hippocampal activity over the course of consolidation ( Takashima et al., 2009, 2006 ).
Repeated retrieval enhanced episode-unique cortical representations. Our multivariate pattern analysis showed that, compared to controls, repeated retrieval led to less similar activity patterns in ventral visual areas and almost all parietal ROIs, including AG, SMG, and precuneus. Using a conceptually similar method, Ferreira and colleagues also reported increased item-unique activity patterns in parietal regions across two days ( Ferreira et al., 2019 ). Ye and colleagues further showed how retrieval practice led to memory updating by differentiating activity patterns in the mPFC ( Ye et al., 2020 ). Together, these results may suggest that repeated retrieval interacts with episode-unique neural representations during the fast formation of cortical memories. A similar representational dissimilarity analysis has been used to analyze patterns of activity during retrieval suppression ( Gagnepain et al., 2014 ). However, after the modulation, participants in that study only performed a visual perception task, which measures repetition priming instead of providing a direct measure of memory. Therefore, their data do not allow a direct comparison of trial-by-trial pattern similarity during retrieval between RETRIEVAL and CONTROL associations.
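To make the across-item pattern similarity measure discussed above concrete, the following sketch computes the mean pairwise similarity among the trial-wise activity patterns of one condition within one ROI. It is an illustration under stated assumptions, not the pipeline used in the study: the function names, the use of Pearson correlation with a Fisher z-transform, and the clamping of extreme correlations are choices made here for illustration. Lower values indicate more distinct (episode-unique) patterns.

```python
from itertools import combinations
from math import atanh, sqrt


def pearson(a, b):
    """Pearson correlation between two equally long, non-constant patterns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def mean_pairwise_similarity(patterns):
    """Average Fisher-z correlation over all pairs of trial patterns.

    `patterns` is a list of voxel vectors, one per retrieved association.
    Correlations are clamped (an assumption of this sketch) so that the
    z-transform stays finite for perfectly correlated patterns.
    """
    z_values = []
    for a, b in combinations(patterns, 2):
        r = max(-0.999999, min(0.999999, pearson(a, b)))
        z_values.append(atanh(r))
    return sum(z_values) / len(z_values)
```

Computing this value separately for RETRIEVAL and CONTROL associations and contrasting the two across participants corresponds to the kind of comparison described in the text.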
One novel aspect of our findings is that, after repeated retrieval, decreased retrieval-related activity amplitude correlated with enhanced distinctiveness of activity patterns in ventral visual areas and precuneus. Our dynamic analysis of these two neural measures during modulation and the subsequent memory test further confirmed that the neural changes observed during the later test are associated with dynamic adaptation of activity amplitude and pattern similarity during modulation in the ventral visual areas. However, this was not the case for the precuneus. In general, this pattern of results is in line with our knowledge about how preexisting associative memory shapes brain responses. Prior information about upcoming stimuli is often associated with overall lower activity amplitude, a phenomenon termed "expectation suppression" ( Summerfield et al., 2008 ;Summerfield and de Lange, 2014 ). At the same time, the underlying activity patterns carry more visual information ( de Lange et al., 2018 ;Kok et al., 2012 ). By correlating these two neural changes in the same regions, our study reported a similar phenomenon during episodic memory retrieval. This finding suggests that the inverse relationship between overall activity amplitude and pattern-based information representation holds not only for low-level perceptual memory but also for episodic memory retrieval. Moreover, the correlation between activity amplitude and pattern similarity may also be understood from a "noise correlations" perspective in information processing ( Averbeck et al., 2006 ;Cohen and Kohn, 2011 ). A recent simultaneous EEG-fMRI study found that decreased alpha/beta power, as a potential marker of reduced noise correlations, was associated with increased stimulus-specific activation patterns measured by representational similarity analysis ( Griffiths et al., 2019 ). 
We speculate that retrieval practice might not directly enhance memory representations, but affect them by reducing their noise correlations. During retrieval of strengthened memories, redundant ongoing neuronal activity (i.e., noise) may be suppressed. Therefore, we observed lower overall activity amplitude and, at the same time, reduced "noise correlation," boosting the signal-to-noise ratio. Thus, stimulus-specific neural patterns are reinstated with more specificity, demonstrating lower pattern similarity across distinct trials.
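The logic of this speculation can be illustrated with a toy simulation. It is a deliberately simplified sketch and not a model fitted to our data: each simulated trial pattern is the sum of an item-specific component and an item-unrelated component shared by all trials, which stands in for redundant ongoing activity. Real noise correlations concern trial-to-trial covariation between units, so the fixed shared component, the function name `simulate`, and all parameter values are assumptions made for illustration.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equally long, non-constant patterns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def simulate(shared_weight, n_trials=10, n_voxels=200, seed=0):
    """Return (mean amplitude, mean across-item pattern correlation).

    `shared_weight` scales the item-unrelated component that is added
    to every trial; setting it to 0 leaves only item-specific signal.
    """
    rng = random.Random(seed)
    # Item-unrelated activity with a positive mean, common to all trials.
    shared = [rng.gauss(1.0, 1.0) for _ in range(n_voxels)]
    patterns = []
    for _ in range(n_trials):
        # Item-specific memory trace, independent across trials.
        item = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
        patterns.append([i + shared_weight * s for i, s in zip(item, shared)])
    amplitude = sum(sum(p) for p in patterns) / (n_trials * n_voxels)
    sims = [pearson(a, b) for a, b in combinations(patterns, 2)]
    return amplitude, sum(sims) / len(sims)
```

Under these assumptions, lowering `shared_weight` reduces the mean amplitude and the across-item similarity together, which is the qualitative relationship described in the text.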
Retrieval suppression inhibited lateral prefrontal activity during subsequent retrieval. For SUPPRESSION ASSOCIATIONS , we observed lower LPFC activity amplitude, but relatively intact activity patterns in visual areas, parietal lobe, and hippocampus during subsequent retrieval. Active memory suppression during retrieval is proposed to be partially supported by inhibitory control mechanisms mediated by the lateral prefrontal cortex ( Anderson and Hanslmayr, 2014 ;Guo et al., 2018 ). During retrieval suppression, LPFC is typically activated ( Anderson, 2004 ;Guo et al., 2018 ;Levy and Anderson, 2012 ), but it shows gradually decreasing activity amplitudes from early to later suppression attempts ( Depue et al., 2007 ). Consistent with this pattern, we found a similar decrease in LPFC activity amplitude across suppression attempts during the modulation phase and lower activity amplitude during the subsequent retrieval. Together with the trial-by-trial intrusion frequency ratings during modulation, this activity decrease across suppression attempts may suggest lower inhibitory control demands when suppressing increasingly weakened memories. The observed reduction in LPFC activity during the subsequent retrieval might be a long-lasting effect of this reduced activity amplitude and suggests that modulated cognitive control allocation hampers retrieval. Another interesting observation is that we found only weak evidence for suppression-induced changes in pattern reinstatement during the final memory test. Even though the involvement of the LPFC-hippocampal circuit in suppression has been examined ( Anderson and Hanslmayr, 2014 ;Guo et al., 2018 ), the changes in neural representations of individual memory traces underlying suppression-induced forgetting remain less well studied. 
One study measured the effect of retrieval suppression on newly acquired visual memories via cortical inhibition ( Gagnepain et al., 2014 ). That study found that retrieval suppression reduced activity amplitude in the fusiform gyrus compared to retrieval, whereas the lateral occipital complex showed the opposite pattern. Effective connectivity and pattern similarity analyses suggested that top-down control mediated by the middle frontal gyrus suppressed perceptual memory traces in the visual cortex. Our study found comparable suppression-induced changes in activity amplitude, but not in mnemonic representations, in the visual cortex. This may relate to the modest behavioral effects or to consolidated memory traces being less labile. Future studies with stronger suppression-induced forgetting effects can directly compare activity patterns between still-remembered and forgotten associations.
Limitations. Our study has a few limitations that should be mentioned. Firstly, given that we only found a modest effect of suppression-induced forgetting, it is difficult to interpret the fMRI results related to repeated suppression. There are at least two possible reasons for this modest effect. First, due to extensive training during encoding and/or the nature of our picture-location task, recall accuracy for all conditions was close to ceiling. Second, the suppression-induced forgetting effect is much smaller when memories have been consolidated ( Liu et al., 2016 ). Thus, in line with previous studies, suppression-induced forgetting may not have emerged at the group level ( Gagnepain et al., 2017 ;Liu et al., 2016 ). Nevertheless, we replicated two findings, indicating that our memory suppression modulation was still effective. First, when unwanted memories were suppressed repeatedly, their tendency to intrude was reduced during the TNT phase ( Benoit et al., 2015 ;Gagnepain et al., 2017 ;Hellerstedt et al., 2016 ;Levy and Anderson, 2012 ;van Schie and Anderson, 2017 ). Second, the extent of this reduction (i.e., intrusion slope) correlated with the subsequent suppression-induced forgetting effect across participants ( Levy and Anderson, 2012 ). Given this correlation, we further compared suppression-induced neural changes between a strong and a weak suppression group, but still did not find an effect of suppression on mnemonic representations. These results may suggest that even for participants who showed suppression-induced forgetting, the underlying mnemonic representations remain intact. A second potential limitation of our study is that we only found an effect of repeated retrieval on trial-by-trial pattern similarity rather than on more direct measures of memory reactivation, such as decoding accuracy or decision value ( Linde-Domingo et al., 2019 ). 
Therefore, the relationship between the reduction in univariate activity and enhanced multivariate representation can be interpreted from two different perspectives. On the one hand, it can be explained by enhanced unique cortical memory representations. On the other hand, the reduction in across-item pattern similarity could be due to other factors, for example, reduced memory-unrelated "noise correlations". It is noteworthy that our pattern reinstatement analysis demonstrated that, based on activity patterns in our ROIs, the individual picture can be decoded when the classifier is trained on the localizer data (day 1) and tested on the final memory test (day 2). This reinstatement laid the groundwork for our pattern similarity calculation because it provides evidence that the activity patterns used in the variability analysis carry item-specific mnemonic information during retrieval. However, when we divided all associations into three groups (i.e., retrieval, suppression, and control), we found no evidence that retrieval or suppression modulated decoding accuracies or d values; instead, all three kinds of associations showed comparable decodability during retrieval. This result ruled out the possibility that differences in decodability could fully explain the differences in our pattern similarity measure. These results may suggest that the decoding accuracies or d values used here were not sensitive enough after initial consolidation, because perceptual information might already be based on the transformed representation ( Xiao et al., 2017 ). In addition, decoding outcomes and pattern similarity may relate to different aspects of mnemonic representations. Sensitive decoding depends on the reinstatement of the original representation related to the perceptual input, while pattern similarity reflects episode-unique activity patterns across retrieved "mental images". 
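The cross-day decoding logic mentioned above (training on day-1 localizer patterns, testing on day-2 retrieval patterns) can be sketched as follows. The classifier used in the study is not restated in this section, so a nearest-centroid classifier with Euclidean distance is swapped in purely for illustration; the function names and the data layout are likewise assumptions of this sketch.

```python
from math import dist


def fit_centroids(train):
    """Average the day-1 localizer patterns of each picture.

    `train` maps a picture label to a list of voxel vectors; the result
    maps each label to its mean pattern (centroid).
    """
    return {
        label: [sum(voxel) / len(patterns) for voxel in zip(*patterns)]
        for label, patterns in train.items()
    }


def predict(centroids, pattern):
    """Return the label whose centroid is closest to `pattern`."""
    return min(centroids, key=lambda label: dist(centroids[label], pattern))


def decoding_accuracy(centroids, test):
    """Proportion of day-2 retrieval trials decoded correctly.

    `test` is a list of (true_label, voxel_vector) pairs.
    """
    hits = sum(predict(centroids, pattern) == label for label, pattern in test)
    return hits / len(test)
```

Running `decoding_accuracy` separately on retrieval, suppression, and control associations would give the kind of condition-wise decodability comparison described in the text.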
Enhanced episode-unique representations after repeated retrieval, particularly in the visual processing areas, support the following notion. Given that our memory cues (i.e., highlighted locations) are visually very similar, the changes in pattern similarity in visual areas are more likely to be the result of enhanced mnemonic reinstatements than of variability induced by visual features of the memory cues. Thirdly, when using a conservative correction for the number of ROIs tested, contrasts in parietal areas showed only trends toward significance, although the individual tests were significant. We believe that these trends could arise because our ROIs were defined from a coarse atlas at the group level. That is to say, for each participant, perhaps only part of each parietal ROI is involved in retrieval processing.
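The effect of correcting for the number of ROIs can be illustrated with a short sketch. The specific correction is not named in this section, so a Bonferroni adjustment is assumed here; the function name and any p-values passed to it are hypothetical and are not results from the study.

```python
def bonferroni(p_values):
    """Bonferroni-adjust p-values given as a dict: ROI name -> p.

    Each p-value is multiplied by the number of ROIs tested and capped
    at 1.0, so a test that is significant on its own can fall short of
    the threshold once all ROIs are taken into account.
    """
    n_tests = len(p_values)
    return {roi: min(1.0, p * n_tests) for roi, p in p_values.items()}
```

For example, with three ROIs an uncorrected p-value of 0.02 becomes 0.06 after adjustment, which is a trend rather than a significant effect at the conventional 0.05 level.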
Conclusion. Taken together, our study probed the effects of repeated retrieval and suppression on initially consolidated memories. We showed that repeated retrieval dynamically reduces activity amplitude in the visual cortex and hippocampus while enhancing the distinctiveness of activity patterns in the visual cortex and parietal lobe. Moreover, the reduction in activity amplitude correlated with the enhancement of episode-unique mnemonic representations in visual areas and precuneus. By contrast, repeated suppression, as done here, was associated with reduced lateral prefrontal activity, but intact mnemonic representations. These findings extend our understanding of neural changes underlying memory modulation from newly acquired memories to initially consolidated memories and suggest that active retrieval may strengthen episode-unique information in the neocortex not only after initial encoding but also after consolidation.

Declaration of Competing Interest
The authors declare no competing interests.