Hippocampus plays a role in speech feedback processing

There is increasing evidence that the hippocampus is involved in language production and verbal communication, although little is known about its possible role. According to one view, the hippocampus contributes semantic memory to spoken language. Alternatively, the hippocampus is involved in processing the (mis)match between the expected sensory consequences of speaking and the perceived speech feedback. In the current study, we re-analysed functional magnetic resonance imaging (fMRI) data of two overt picture-naming studies to test whether the hippocampus is involved in speech production and, if so, whether the results can distinguish between a “pure memory” and a “prediction” account of hippocampal involvement. In both studies, participants overtly named pictures during scanning while hearing their own speech feedback either unimpeded or impaired by a superimposed noise mask. Results showed decreased hippocampal activity when speech feedback was impaired, compared to when feedback was unimpeded. Further, we found increased functional coupling between auditory cortex and hippocampus during unimpeded speech feedback, compared to impaired feedback. Finally, we found significant functional coupling between a hippocampal/supplementary motor area (SMA) interaction term and auditory cortex, anterior cingulate cortex and cerebellum during overt picture naming, but not during listening to one’s own pre-recorded voice. These findings indicate that the hippocampus plays a role in speech production that is in accordance with a “prediction” view of hippocampal functioning.


Introduction
There is growing evidence that the medial temporal lobe (MTL), particularly the hippocampus, also contributes to speech production and verbal communication (Duff et al., 2008; Duff and Brown-Schmidt, 2012; MacKay and Johnson, 2013; Hamamé et al., 2014; Covington and Duff, 2016; Llorens et al., 2016; Piai et al., 2016; Kepinska et al., 2018). These findings stand in stark contrast to the traditional view that language is largely a fronto-temporal neocortical function that operates independently of the MTL. Early evidence against this view was provided by observations of language production deficits in amnesic patients with MTL damage (Corkin, 1984; MacKay et al., 1998; Duff et al., 2008), although these findings could also be accounted for by alternative explanations that did not include MTL damage. Further supporting evidence for a hippocampal role in language processing has come from functional magnetic resonance imaging (fMRI) studies of healthy volunteers (De Zubicaray et al., 2014; Blank et al., 2016; Llorens et al., 2016), as well as electrophysiological recordings in epilepsy patients (Hamamé et al., 2014; Piai et al., 2016), showing changes in hippocampal activity during language production or comprehension tasks. In addition, the hippocampus (and neighboring structures) may be functionally connected to cortical language areas. We monitor our speech production by comparing what we intended to say to the perception of our own speech output (speech feedback) (Levelt, 1983; Indefrey and Levelt, 2000; Schiller and de Ruiter, 2004). A mismatch between the actual and predicted sensory (auditory) consequences of speech acts can result in updating internal models of speech content or output in order to optimize future speech acts. Such a mismatch can result from internal causes, such as (incidental) poor organization or control of motor functions while stuttering, or external causes, in which loud environmental noise masks or alters speech feedback.
Adopting a "pure memory" view of hippocampal function, one would predict no alteration of hippocampal activity with mismatching speech feedback, as the hippocampus contributes semantic or associative memory content to speech formation, while other brain areas monitor how well the speech production matches the sensory consequences of that output (e.g., Tourville et al., 2008; Hickok, 2012). From the perspective of the "prediction" view of hippocampal function, impaired speech feedback will alter hippocampal activity, as the hippocampus is actively involved in processing the degree of mismatch between perceived feedback and sensory predictions derived from memory (Kumaran and Maguire, 2009; de Lange et al., 2018). In the current study, we tested the involvement of the hippocampus in speech monitoring by analysing whether hippocampal activity, as well as functional connectivity with areas of the speech monitoring network, changes with changing speech feedback quality.
Studies that investigated the neural correlates of monitoring speech production showed increased activity in auditory cortex and superior temporal gyri, lateral and medial prefrontal cortex, and premotor areas when speech feedback was impaired or interrupted, compared to unimpeded speech feedback (McGuire et al., 1996; Hirano et al., 1997; Hashimoto and Sakai, 2003; Heinks-Maldonado et al., 2005; Christoffels et al., 2007, 2011; Tourville et al., 2008; Zheng et al., 2010). This effect is not observed when participants listen to their own pre-recorded voice (Zheng et al., 2010; Christoffels et al., 2011). These findings imply that the motor system that plans and coordinates speech acts provides a forward model of the sensory consequences of those speech acts (Hickok, 2012; Skipper et al., 2017), which is then compared to sensory feedback representations in auditory cortex. Indeed, several components of the (sub)cortical motor system have been implicated in providing such a predictive feed-forward signal for sensory comparison in speaking as well as in other self-initiated actions (Wolpert et al., 1995; Blakemore et al., 1998; Heinks-Maldonado et al., 2005; McNamee and Wolpert, 2019). In particular, the supplementary motor area (SMA) may be important in this process. Neurophysiological recordings showed that SMA neurons code for future movements when they are part of a learned sequence of movements (Tanji and Shima, 1994). Further, SMA activity in fMRI correlates with the voluntary generation of covert auditory speech (speech imagery) (Lima et al., 2016) as well as the involuntary generation of auditory verbal hallucinations of voices in schizophrenia (van de Ven, 2012).
Human clinical studies showed that lesions of the SMA resulted in diminished speech control (Jonas, 1981; Alario et al., 2006; Hertrich et al., 2016), while transcranial magnetic stimulation (TMS) of (pre-)SMA resulted in decreased word production or control of oral gestures (Tremblay and Gracco, 2009), as well as impaired monitoring of self-generated actions (Haggard and Whitford, 2004; Moore et al., 2010). Several fMRI studies showed increased SMA activity (and of other frontal and temporal areas) when healthy participants made speech production errors, compared to correct speech acts (Abel et al., 2009; Gauvin et al., 2016). Finally, functional connectivity between SMA and speech perception areas is increased when sensory feedback during speaking is masked, compared to when it is unimpeded (van de Ven et al., 2009), but decreased when articulation errors in degenerative speech disorders increase (Botha et al., 2018), which provides further evidence that speech monitoring and control encompass a neural interaction between sensory and motor systems.
If the hippocampus contributes to speech monitoring, then it can be expected that hippocampal activity, or its connectivity with auditory-motor areas involved in speech monitoring, changes when speech feedback is impaired. To test this hypothesis, we re-analysed brain activity from two previously published speech monitoring studies in which speech feedback was masked by varying levels of superimposed acoustic noise. Importantly, both studies contained the same conditions of noise-masked and noise-free speech feedback, which allowed the data to be aggregated and analysed with the same statistical contrast. A further benefit was that combining the datasets doubled the sample size and consequently the statistical power. We analysed the combined dataset in three ways. First, we investigated whether hippocampal activity changed when speech feedback was masked, compared to when feedback was unimpeded. A change in hippocampal activity when speech feedback conditions changed can be considered as evidence for the "prediction" view of hippocampal functioning in speech production. Second, we investigated whether the hippocampus was functionally connected to auditory cortex and to the auditory-motor interaction during speech feedback conditions. Task-dependent functional coupling between auditory cortex and hippocampus would be evidence that the hippocampus contributes to the mismatch processing in auditory cortex during speech feedback. Finally, we explored the whole-brain functional network that was associated with the auditory-motor interaction, in order to investigate which areas would be sensitive to an associative mismatch signal represented by the functional coupling between SMA and hippocampus.

Methods
For this study, we re-analysed two fMRI datasets, Study 1 (Christoffels et al., 2007; N = 12) and Study 2 (Christoffels et al., 2011; N = 11), in both of which participants completed a similar speech monitoring task. Data of both studies were collected using comparable scanning parameters at equal magnetic field strengths (see Table 1). The aggregated data sample contained 23 participants, all of whom provided informed consent prior to the start of the respective experiment. The local ethics committees of Nijmegen University Medical Center (Study 1) and the Faculty of Psychology and Neuroscience (Study 2) approved the studies.
The experimental designs of both studies were comparable. Participants had to overtly pronounce the name of an object that was shown as a black-and-white line drawing (picture-naming conditions, PN), or heard their own pre-recorded voice speaking the name of the presented object (listening conditions, LIS). Fig. 1 provides a schematic overview of the task design for both studies. During the experiment, acoustic noise was superimposed during either overt picture naming or listening at various volume levels (dB). In Study 1, the acoustic noise volume was either 0 dB or at an individually tailored volume at which participants could not hear their own speech feedback while overtly speaking (max dB). In Study 2, acoustic noise volumes varied parametrically between 0 and max dB across four levels during picture naming (PN0, PN1, PN2 and PNMax) or listening (LIS0, LIS1, LIS2 and LISMax), which were again individually tailored. A further difference between the studies was the presentation of a covert naming condition in Study 1, which was absent from Study 2. This condition was modelled in the analyses but always excluded from statistical contrasts (see below).

Fig. 1. In both studies, participants saw pictures and either overtly named them (speaking conditions) or listened to their own pre-recorded naming of those pictures (listening conditions) while acoustic noise was superimposed at various volume levels. In Study 1, noise levels varied between 0 (no superimposed noise) and maximum level (black boxes), while in Study 2 noise levels varied parametrically across four levels (grey box). For the analysis, we chose the Voice only and Noise only conditions that overlapped in both studies.

Imaging parameters
MR images of both studies were collected on 3T scanners from the same manufacturer (Siemens Medical Systems; Study 1: Magnetom Trio whole-body at Nijmegen, Study 2: Magnetom Allegra head-only at Maastricht). In both studies, functional volumes were acquired using a T2*-weighted echo-planar imaging (EPI) sequence optimized for blood oxygenation level-dependent (BOLD) contrast (see Table 1 for parameter values of each study). The TR included a silent gap (sparse sampling method; Hall et al., 1999; see Table 1) to allow participants to overtly pronounce the name of the visually presented object, or hear a pre-recording thereof, without interference of scanner noise during EPI acquisition. High-resolution (1 × 1 × 1 mm³) anatomical scans were acquired using a T1-weighted 3D MP-RAGE sequence (Study 1: 192 sagittal slices, TR = 2.3 s, TE = 3.93 ms) or an MDEFT sequence (Study 2: 192 sagittal slices, TR = 7.92 s, TE = 2.4 ms).

Image analysis
All anatomical and functional images were preprocessed using BrainVoyager QX (Goebel et al., 2006), the NeuroElf toolbox (www.neuroelf.net) and custom-written scripts in Matlab (www.mathworks.com). Preprocessing of the functional images included slice scan time correction, three-dimensional (3D) head-movement assessment and correction using rigid-body transformations, linear trend removal and temporal high-pass filtering (cutoff at approximately 0.004 Hz). The estimated translation and rotation parameters for head movement were inspected and never exceeded 3 mm of translation or 3° of rotation. No spatial smoothing was applied. Preprocessed functional timeseries were coregistered to the within-session anatomical 3D dataset using position parameters from the scanner and manual adjustment, and were subsequently spatially normalized to Talairach space (Talairach and Tournoux, 1988) at an iso-voxel resolution of 3 × 3 × 3 mm³.
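The trend-removal and temporal high-pass filtering steps can be sketched in a few lines. The original pipeline used BrainVoyager QX; the numpy/scipy approximation below, with a second-order Butterworth filter, is our assumption for illustration rather than BrainVoyager's exact filter:

```python
import numpy as np
from scipy.signal import butter, filtfilt, detrend

def clean_timeseries(ts, tr, cutoff_hz=0.004):
    """Linear trend removal plus temporal high-pass filtering.

    ts        : (n_timepoints, n_voxels) array of BOLD signals
    tr        : repetition time in seconds
    cutoff_hz : high-pass cutoff (approximately 0.004 Hz in the paper)
    """
    ts = detrend(ts, axis=0, type="linear")           # remove linear drift
    b, a = butter(2, cutoff_hz, btype="highpass", fs=1.0 / tr)
    return filtfilt(b, a, ts, axis=0)                 # zero-phase filtering

# toy example: a slow drift superimposed on noise is attenuated
rng = np.random.default_rng(0)
n = 200
drift = np.linspace(0, 5, n)[:, None]                 # slow scanner drift
ts = rng.standard_normal((n, 4)) + drift
cleaned = clean_timeseries(ts, tr=3.0)
```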
The functional data were then further cleaned using a temporal principal component analysis-based correction procedure (CompCor; Behzadi et al., 2007). For each participant, we first estimated temporal noise sources (based on temporal signal-to-noise estimates) from the functional timeseries using pre-defined anatomical templates of the ventricles and white matter, as obtained by tissue segmentation procedures. The temporal noise sources were then removed from the functional imaging data using a least-squares solution. The normalized and cleaned data were then analysed using a region-of-interest (ROI) approach.
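A minimal numpy sketch of this CompCor-style cleanup, assuming the noise sources are the top principal components of the ventricle/white-matter timeseries (the component count here is illustrative, not the study's actual setting):

```python
import numpy as np

def compcor_clean(data, noise_roi_ts, n_components=5):
    """Remove CompCor-style nuisance components (cf. Behzadi et al., 2007).

    data         : (n_timepoints, n_voxels) functional timeseries
    noise_roi_ts : (n_timepoints, n_noise_voxels) timeseries sampled from
                   ventricle and white-matter templates
    """
    # principal components of the noise-ROI timeseries
    centered = noise_roi_ts - noise_roi_ts.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    nuisance = u[:, :n_components]
    # least-squares fit and removal of the nuisance components
    design = np.column_stack([nuisance, np.ones(len(data))])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - nuisance @ beta[:n_components]
```

In the study, the nuisance components were estimated per participant from segmentation-derived templates; here any (timepoints × noise voxels) array stands in for those template timeseries.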

Regions of interest (ROIs)
ROIs were defined for bilateral auditory cortex (AC), bilateral SMA and bilateral hippocampus as follows. To optimize localization of the speech monitoring effect in AC, we used the empirical result of the difference between the PN0 and PNMax conditions of Study 1. We note that this selection does not pose a non-independence threat to our current analysis, because 1) this ROI had not been used previously for the analysis of Study 2, and 2) the main target of the current study was to analyse hippocampal activity and functional connectivity. For the hippocampal and SMA ROIs, we used the respective maps of the probability atlas by Eickhoff and colleagues (Eickhoff et al., 2005). Specifically, we defined SMA by taking two spheres of 6 mm radius at the three-dimensional centers of left and right Brodmann Area 6 (BA6) (Geyer, 2004). We defined left and right hippocampus by thresholding the probability maps at 60%. The SMA and hippocampus maps were aligned to the Talairach spatial template using a 12-parameter affine transformation. The ensuing ROIs superimposed on an average template of all participants are shown as insets in Figure 2, and details are listed in Table 2.
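Spherical ROIs like the SMA spheres can be built by selecting grid voxels within a given radius of a centre coordinate. A sketch on a 3 mm iso-voxel grid follows; the centre coordinate is illustrative, not the actual atlas value:

```python
import numpy as np

def sphere_mask(coords_mm, center_mm, radius_mm=6.0):
    """Boolean mask of voxels within `radius_mm` of `center_mm`.

    coords_mm : (n_voxels, 3) array of voxel centre coordinates in mm
                (e.g., Talairach space)
    """
    d = np.linalg.norm(coords_mm - np.asarray(center_mm), axis=1)
    return d <= radius_mm

# small 3 mm grid around a hypothetical medial-frontal centre coordinate
grid = np.stack(np.meshgrid(np.arange(-12, 13, 3),
                            np.arange(-12, 13, 3),
                            np.arange(48, 73, 3), indexing="ij"), axis=-1)
coords = grid.reshape(-1, 3)
mask = sphere_mask(coords, center_mm=(-3, 0, 60), radius_mm=6)
```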

Functional data analysis
Functional timeseries were sampled from and averaged within each of the three ROIs and analysed for changes in signal amplitudes and functional connectivity across the relevant speech feedback conditions. For the analysis of signal amplitudes, a general linear model (GLM) was defined that modeled all block types of the respective experiment as separate regressors, which were subsequently delayed and smoothed using a canonical two-gamma hemodynamic response function (HRF) to account for the hemodynamic delay. The GLM was fitted to the region-of-interest data (ROI analysis, see below) of each participant separately (first-level analysis) using ordinary least-squares regression, and the ensuing coefficients were subsequently analysed for significant differences at the population level (second-level analysis). To assess the effect of speech feedback in a comparable way across the two studies, we estimated conditional contrasts in the following way. For Study 1, we contrasted overt picture naming without and with noise masking (i.e., PN0 - PNMax), setting all other conditions to 0 (that is, covert PN, LIS0 and LISMax; speech monitoring contrast = [-1 1 0 0 0]). For Study 2, we contrasted overt speaking without noise masking with the maximum noise masking level (i.e., PN0 - PNMax), setting all other conditions to 0, including the intermediate noise levels (i.e., PN1, PN2, LIS0 through LISMax; contrast = [-1 0 0 1 0 0 0 0]). We then pooled the data of both studies in order to calculate the statistical effect of the contrast PN0 > PNMax.
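The first-level GLM and contrast step can be sketched as follows. The two-gamma HRF parameters below are common defaults and an assumption on our part; BrainVoyager's exact canonical HRF may differ slightly:

```python
import numpy as np
from scipy.stats import gamma

def two_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Canonical two-gamma HRF sampled at times t (seconds)."""
    return gamma.pdf(t, peak) - ratio * gamma.pdf(t, undershoot)

def fit_glm(y, boxcars, tr):
    """OLS fit of HRF-convolved block regressors to one ROI timeseries.

    y       : (n_timepoints,) averaged ROI signal
    boxcars : (n_timepoints, n_conditions) 0/1 block predictors
    """
    hrf = two_gamma_hrf(np.arange(0, 32, tr))
    X = np.column_stack(
        [np.convolve(boxcars[:, c], hrf)[: len(y)] for c in range(boxcars.shape[1])]
    )
    X = np.column_stack([X, np.ones(len(y))])      # intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Study 1 speech-monitoring contrast over the five condition betas,
# copied from the paper: [-1 1 0 0 0]
contrast = np.array([-1, 1, 0, 0, 0])
```

The contrast value for one participant is then `contrast @ beta[:5]`, and these per-participant values feed the second-level test.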
Many studies have shown hemispheric lateralization of the neural correlates of speech production to the left hemisphere (Schirmer et al.; Behroozmand et al., 2015). To assess whether speech monitoring was lateralized in our study, we compared the PN0 - PNMax contrast between the left and right counterparts of the ROIs that showed a significant speech monitoring effect for the bilateral areas.

Fig. 2. ROIs are shown on anatomical brain slices as insets. Study 1 data are marked in grey diamonds, Study 2 data are marked in white circles. *P < 0.05, ***P < 0.001.
To control for any effect of the source of the data (i.e., Study 1 vs. Study 2), the pooled data were analysed using a mixed-effects analysis of covariance (ANCOVA) model, with Study as between-subject factor. Further, we calculated the Bayes Factor (BF10) for the between-subject difference between the two studies (Wagenmakers, 2007), in which BF10 < 1 indicates that the data are more likely under the null hypothesis (the data do not differ between the two studies) than under the alternative hypothesis (the data differ between studies). Bayes Factors were calculated using JASP version 0.11 (JASP Team, 2016).
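The study computed BF10 in JASP; purely as an illustration, the cited Wagenmakers (2007) paper describes a BIC-based approximation that can be computed directly (the Gaussian-model BIC formula is a standard textbook form; the grouped data below are synthetic):

```python
import numpy as np

def bic_gaussian(rss, n, k):
    """BIC of a Gaussian linear model: n*log(RSS/n) + k*log(n)."""
    return n * np.log(rss / n) + k * np.log(n)

def bf10_from_bic(bic_null, bic_alt):
    """Approximate BF10 = exp((BIC_null - BIC_alt) / 2) (Wagenmakers, 2007).

    BF10 < 1 favours the null (no Study difference); BF10 > 1 favours
    the alternative.
    """
    return float(np.exp((bic_null - bic_alt) / 2.0))

# illustrative comparison: 23 contrast values, 12 (Study 1) + 11 (Study 2)
y = np.tile([1.0, -1.0], 12)[:23]
g = np.r_[np.zeros(12), np.ones(11)]
rss_null = np.sum((y - y.mean()) ** 2)                    # grand-mean model
rss_alt = sum(np.sum((y[g == i] - y[g == i].mean()) ** 2) for i in (0, 1))
bf10 = bf10_from_bic(bic_gaussian(rss_null, 23, 1), bic_gaussian(rss_alt, 23, 2))
```

Because the two groups here are nearly identical, the extra Study parameter is penalized and the approximation favours the null.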
Functional connectivity between hippocampus and areas involved in speech monitoring was analysed using a psycho-physiological and a physio-physiological interaction model (Friston et al., 1997; O'Reilly et al., 2012). First, we tested whether functional coupling between hippocampus and auditory cortex changed as a function of speech feedback condition (i.e., PN0 > PNMax; psycho-physiological interaction, psycho-PI). Here, the contrast between the PN0 and PNMax conditions served as the psychological variable, the mean-centered hippocampal activity as the physiological variable, and the point-by-point product between the (mean-centered) psychological and physiological variables as the interaction term. Of note, the psychological and interaction terms contained non-zero values only for the timepoints that corresponded to the contrasted conditions, while all other timepoints were set to 0. Second, we tested whether the functional coupling between hippocampus and SMA was correlated with auditory cortex activity during speech feedback conditions (physio-physiological interaction, physio-PI; Menon and Levitin, 2005). Similar to the psycho-PI, the interaction term was obtained by a point-by-point product of the mean-centered physiological variables (i.e., hippocampal and SMA activity). Further, for all PPI variables we set the values of the other conditions to 0.
For both PPI models, the (mean-centered) psychological and physiological terms were de-convolved prior to calculating the interaction term, which was then convolved again using the canonical two-gamma HRF. Further, to prevent the other task conditions from serving as a common source of correlation between the auditory cortex and PPI variables, we appended the task regressors to each of the PPI models.
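A simplified psycho-PI sketch, following the regressor construction described above but omitting the deconvolution/reconvolution step for brevity:

```python
import numpy as np

def psycho_pi(target, seed, psych):
    """Estimate the psycho-physiological interaction coefficient with OLS.

    target : (n,) target-ROI timeseries (here: auditory cortex)
    seed   : (n,) seed-ROI timeseries (here: hippocampus)
    psych  : (n,) condition coding, e.g. +1 for PN0, -1 for PNMax and 0
             for all other timepoints, so that the interaction term is
             non-zero only during the contrasted conditions
    """
    seed_c = seed - seed.mean()                 # physiological variable
    ppi = seed_c * psych                        # point-by-point product
    X = np.column_stack([ppi, seed_c, psych, np.ones(len(target))])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta[0]                              # interaction coefficient
```

The physio-PI is analogous, with the mean-centered SMA timeseries taking the place of the psychological variable.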
Finally, the statistical significance of the ROI PPI models of the data pooled across the two studies was calculated using one-sample t-tests, and Bayes Factor values were estimated to verify that the distributions of PPI interaction terms did not differ between the two studies (BF 10 < 1).

Whole-brain physio-PI
In an exploratory analysis, we calculated the whole-brain functional network associated with the hippocampus × SMA connectivity at the voxel level. To this end, we applied the hippocampus × SMA physio-PI model (of conditions PN0 and PNMax) as described above using a mass-univariate two-level hypothesis test that included all brain voxels. The physio-PI model was appended with task regressors to remove the effects of the task as a functional connectivity source. At the first level, functional data were pre-cleaned using the CompCor nuisance terms that we previously estimated. For each participant, the coefficient map of the physio-PI interaction term was smoothed using a Gaussian kernel of 8 mm full-width at half-maximum (FWHM), which then served as entry to the second-level analysis (a mass-univariate voxel-by-voxel one-sample t-test of the subject-level coefficient distribution against 0, df = 22). The second-level statistical map was thresholded at a voxel-level p-value of 0.005 and a minimum cluster size of 1053 mm³, as estimated by a cluster threshold algorithm (1000 Monte Carlo simulations using the inherent spatial smoothness of the T-map, at a false-positive rate of 0.05, in BrainVoyager; Goebel et al., 2006). Voxel values that survived the statistical thresholds were superimposed on an anatomical template for visualization.

Results

Fig. 2 shows the changes in fMRI activity for each of the three ROIs (AC in blue, hippocampus in yellow, SMA in red) when speech feedback was not masked (labeled as Voice) and when it was masked with superimposed noise (labeled as Noise).

ROI activity
For bilateral auditory cortex, we found significant activity above resting baseline for both speech feedback conditions (PN0 - baseline: T(22) = 5.35, P < 0.001; PNMax - baseline: T(22) = 15.50, P < 0.001). Further, activity decreased when participants heard their own voice as unimpeded speech feedback, compared to when feedback was masked by acoustic noise (PN0 - PNMax, controlling for Study: F(1,21) = 72.15, P < 0.001, ηp² = 0.78). The between-subject factor Study was not significant (F(1,21) = 1.02, P = 0.32), and BF10 = 0.95 indicated no evidence that the data differed between the two studies. To investigate lateralization of the speech monitoring effect in the auditory cortex, we compared the PN0 - PNMax contrast in the left AC to the right, and found no significant effect (controlling for Study; F(1,21) = 0.18, P = 0.68). These findings replicate the previously published results.

Fig. 3. PPI results. Scatter plots show psycho-PI (A) and physio-PI (B) interaction terms for the speech and listening conditions. Grey (white) symbols indicate data of Study 1 (Study 2); error bars to the right of each distribution represent the 95% confidence interval of the pooled data. (A) ROI-based psycho-PI shows a significant change in hippocampal-auditory cortex connectivity as a function of speech conditions, but not during listening. (B) ROI-based physio-PI shows significant hippocampal-SMA-auditory functional connectivity during picture naming, but not during listening. (C) Whole-brain physio-PI shows significant functional connectivity in dorsal ACC and cerebellar cortex. *P < 0.05, **P < 0.01.
For bilateral hippocampus, we found that activity decreased below resting baseline when participants' speech feedback was masked (PNMax - baseline: T(22) = -4.14, P < 0.001), but not when feedback was unimpeded (PN0 - baseline: T(22) = -1.43, P = 0.17). This difference in activity between the two speech feedback conditions was significant (PN0 - PNMax, controlling for Study: F(1,21) = 5.51, P = 0.029, ηp² = 0.21). Further, the data did not differ between the two studies (between-subject factor Study, F(1,21) = 2.79, P = 0.11; BF10 = 0.58). We further tested for laterality of the speech monitoring contrast in hippocampus, and found no significant effect (F(1,21) = 1.73, P = 0.20). These effects were not previously reported for these datasets, which for Study 1 may have been the result of limited statistical power due to a small sample size in combination with voxel-level multiple comparison corrections (for Study 2, we previously only studied auditory cortical responses).
When comparing listening conditions with vs. without superimposed noise (LIS0 - LISMax), we found no significant differences between the conditions in any of the three ROIs (see Fig. 2(B); all Ps > 0.20), suggesting that the feedback effect in auditory cortex and hippocampus only occurred during overt speaking.

Psycho-physiological interaction (psycho-PI)
Next, we analysed whether the functional coupling between hippocampus and auditory cortex changed as a function of changing feedback condition during speech (i.e., PN0 > PNMax) or listening conditions (LIS0 > LISMax). Fig. 3(A) depicts the psycho-PI interaction term values for each participant in the speech and listening conditions. We found that the psycho-physiological interaction term significantly explained auditory cortex activity (controlled for task conditions and hippocampal activity as main effects; T(22) = 2.65, P = 0.015, Cohen's d = 0.54), in which hippocampal-auditory cortex connectivity decreased during noise-masked speech feedback, compared to unimpeded speech feedback. Bayes Factor analysis showed that the PPI interaction term did not differ between the two studies (BF10 = 0.38). When the psycho-PI was applied to the listening conditions with and without noise masking, the interaction term was not significant (T(22) = -0.95, P = 0.35), indicating that the functional coupling between hippocampus and auditory cortex was specific to overt speech production. Again, Bayes Factor analysis showed no effect of Study (BF10 = 0.41). Further, the psycho-PI terms were significantly different between the speech and listening conditions (mixed-effects ANCOVA controlling for Study: F(1,21) = 12.72, P = 0.002, ηp² = 0.38).

Physiological-physiological interaction (physio-PI)
We then analysed whether the functional coupling between SMA and hippocampus explained activity in auditory cortex. We found that the hippocampus × SMA interaction term significantly explained auditory cortex activity during the overt speech conditions (controlled for SMA and hippocampal activity as main effects; T(22) = 4.98, P < 0.001, Cohen's d = 1.04), independent of Study (BF10 = 0.41), suggesting that auditory cortex activity was correlated with the functional coupling between SMA and hippocampus during overt speech production (see Fig. 3(B)). When repeating the physio-PI model for the listening blocks with and without noise masking, we found no significant hippocampus × SMA interaction term (T(22) = -0.72, P = 0.48), again suggesting that the functional coupling between hippocampus and SMA was only correlated with auditory cortex activity when participants overtly named the visually presented objects. BF10 for Study was 0.44. A mixed-effects ANCOVA revealed a significant difference between the speech and listening physio-PI interaction terms (controlled for Study, F(1,21) = 5.38, P = 0.031, ηp² = 0.20). Thus, merely seeing the objects while not being required to name them did not result in this functional interaction.

Whole brain physio-PI analysis
To explore the whole-brain functional network of the hippocampus × SMA functional coupling during the PN0 and PNMax conditions, we regressed the voxel-level timeseries on the hippocampus × SMA physio-PI model (controlled for main effects of connectivity with hippocampus, SMA and task conditions). Areas that survived statistical thresholding included dorsal anterior cingulate cortex (dACC; x = -15, y = 20, z = 28; size = 1053 mm³) and cerebellar cortex (left: -15, -76, -38; 1188 mm³; right: 30, -61, 64; 1107 mm³; see Fig. 3(C)). These areas have previously been reported to be involved in speech feedback processing (see Discussion for further information). A physio-PI model that included only listening conditions revealed no significant effects in these areas.

Fig. 4. Control ROI results. (A) Analysis of fusiform gyrus (FG) activity showed no significant difference between overt speech conditions. Interaction terms for the psycho-PI (B) and physio-PI (C) models were not significant. Grey (white) symbols indicate data of Study 1 (Study 2). Error bars indicate 95% confidence intervals.

Posthoc ROI control analysis
To verify that the functional connectivity effects were regionally specific, we repeated the analyses for the fusiform gyrus (FG) of the extrastriate cortex as a control ROI. This region was selected because the area showed no significant speech monitoring effect in Study 1 (Christoffels et al., 2007). Also, the task stimuli did not include drawings of faces, which are known to activate a portion of the FG (Kanwisher et al., 1997). We therefore expected no significant statistical effects of activity or functional connectivity in the FG. The FG ROI was defined from the same atlas as the SMA ROI (Eickhoff et al., 2005), using the same procedures to obtain two spheres of 6 mm radius each (left: x = -31.3, y = -38.0, z = -23.3, k = 897 mm³; right: x = 33.5, y = -36.2, z = -23.5, k = 907 mm³). We restricted the control analyses to the picture naming conditions, and report Bayes Factors (BF10) as evidence for the null hypothesis of no effect of the speech monitoring conditions. Fig. 4 shows the FG activity and functional connectivity results. We found no significant difference between PN0 and PNMax (controlled for Study; F(1,21) = 0.10, P = 0.76; BF10 = 0.32). Analysis of whether the functional coupling between hippocampus and FG changed as a function of speech feedback conditions revealed no significant psycho-PI interaction term (T(22) = -1.81, P = 0.084; BF10 = 0.28). Likewise, analysis of whether the physio-PI model that represented the hippocampus × SMA functional coupling explained FG activity revealed no significant interaction term (T(22) = 0.59, P = 0.56; BF10 = 0.88). These results indicate that the functional interactions with the hippocampus are specific to regions whose functional specialization matches the requirements of the speech conditions.

Discussion
We re-analysed the combined fMRI data of two studies collected during overt speech production and found evidence that the human hippocampus is involved in the monitoring of speech feedback. Our finding contrasts with many older fMRI studies that reported virtually no evidence for hippocampal involvement in overt picture naming (see reviews by Indefrey and Levelt, 2000; Price, 2012), but is in line with more recent evidence for a role of the hippocampus in language processing and verbal communication (Duff et al., 2008; Duff and Brown-Schmidt, 2012; Blank et al., 2016; Piai et al., 2016; Kepinska et al., 2018). Interestingly, while to our knowledge we are the first to show hippocampal involvement in speech monitoring using fMRI, previous observations of language deficits in patient HM, whose bilateral MTL was surgically removed (Scoville and Milner, 1957), already hinted at this involvement. Across a series of studies, MacKay and colleagues found that patient HM detected and corrected fewer self-produced language production errors compared to healthy controls or patients with non-hippocampal lesions, consistent with a deficit in speech monitoring (MacKay et al., 1998; MacKay and Johnson, 2013). The authors further concluded that the type of errors pointed to a deficit in novelty processing, rather than a deficit in semantic memory, which resulted from the removal of the hippocampus and other MTL structures. Our findings support this postulation by showing the involvement of the hippocampus during online speech monitoring in healthy individuals.
Further, our observation of larger changes in hippocampal activity when monitoring of speech feedback becomes more challenging fits with other findings that the hippocampus contributes to mismatch processing, showing larger changes in hippocampal activity when encountering novel or unexpected events, compared to expected ones (Kumaran and Maguire, 2006; Long et al., 2016). The mismatch process requires that a prediction of the future sensory outcomes of speech acts is available. This postulation aligns well with recent theoretical propositions that hippocampal processing underlies associative prediction of future events or outcomes (Friston and Buzsáki, 2016; Stachenfeld et al., 2017). Further evidence for this suggestion comes from a neurophysiological recording study of hippocampal neurons in human epilepsy patients, which showed increased hippocampal oscillatory activity in the theta range (4-8 Hz) when patients read incrementally presented sentences that provided a strong contextual prediction of the final target word, compared to sentences that provided weak contextual predictions (Piai et al., 2016). Thus, we suggest that our results endorse a "prediction", rather than a "pure memory", account of hippocampal functioning in speech production. In turn, this view can also be applied to interpret previous findings of hippocampal involvement in learning new grammatical rules (Kepinska et al., 2018) or new associations between speech sounds and concepts (Breitenstein et al., 2005), where the novel rules or associations are represented by a mismatch to previously acquired knowledge.
Our finding of decreased hippocampal-auditory cortex connectivity when speech feedback is impaired, compared to unimpeded feedback, also fits with a "prediction" view of the hippocampus in speech production. Previous research has shown hippocampal-sensory functional coupling during the formation of sensory predictions (Buckner, 2010; Hindy et al., 2016; Kok and Turk-Browne, 2018). In speech production, the increase in auditory cortical activity when speech feedback is impeded or impaired may reflect an increase in the error signal resulting from the mismatch between actual and expected sensory feedback (Tourville et al., 2008), but could also reflect a release of inhibition or "neural cancellation", analogous to the inverse of the mismatch signal (Eliades and Wang, 2005; Christoffels et al., 2011). In both scenarios, the auditory representation during impaired feedback will deviate from the representation of the predicted feedback, which could weaken the functional integration with other brain areas that also contribute to the processing of that information (Obleser et al., 2007; Nath and Beauchamp, 2011). The decrease in functional auditory-hippocampal coupling that we observed could thus reflect the increased discordance between auditory and hippocampal representations of sensory expectations during speaking when feedback is impaired.
Further support for a hippocampal involvement comes from our finding that auditory cortical activity could be explained by a hippocampal-SMA interaction during speech production, but not during listening to one's own pre-recorded voice. Ample research has implicated the SMA in speech control (Abel et al.), and speech monitoring may arise from the functional coupling between SMA and auditory cortex. In particular, connectivity may increase when speech feedback is masked (van de Ven et al., 2009), while lower connectivity may be related to more articulatory errors in speech disorders (Botha et al., 2018). The SMA is a candidate region to provide a forward model of motor acts that is used to predict sensory consequences (Wolpert et al., 1995). Our finding suggests that the hippocampus may contribute to mismatch processing by providing an associative mismatch signal that incorporates information about the predicted and actual sensory outcomes in varying acoustic contexts. Changes in hippocampal activity or connectivity could result from integrating the sensory prediction derived from the forward model with the perceived sensory consequences, which in turn may be used to update or augment the speech production process.
The whole-brain connectivity results, finally, suggest that a mismatch signal coded by the hippocampal-SMA interaction may be further processed by the ACC and cerebellum, which have previously been reported to be involved in speech production tasks (Hirano et al., 1997; Christoffels et al., 2007; Abel et al., 2009). The ACC has been associated with error monitoring (Van Veen and Carter, 2002), which suggests that, during speaking, it may process the mismatch signal that is encoded within the hippocampal-SMA coupling. The role of the cerebellum in multisensory (Naumer et al., 2010) and sensorimotor convergence (Paulin, 1993; Blakemore et al., 1998; Brooks and Cullen, 2019) suggests that it contributes to the integration of speech feedback with the expected sensory consequences of motor acts. Finally, connectivity between hippocampus and ACC has been related to the mapping of context-dependent action-outcome representations (Rolls, 2019). In all, these findings suggest that the processing of the mismatch between expected and actual sensory consequences of speaking entails interactions between motor and associative representations of speech production and auditory sensory representations of perceived speech feedback, while the degree of mismatch may be further processed in higher-order areas.
It should be noted here that participants made very few errors (less than 1% of trials), which limits our inferences about the role of the hippocampus in speech monitoring. For example, it is possible that the hippocampal contribution to speech monitoring allowed task performance to remain very high, but experimental testing of this proposition would require a higher error rate. Further, while we endorse a "prediction" view of the hippocampus in speech production, we do not exclude a contribution of memory (Klooster and Duff, 2015).
The key issue in our argumentation is that a "pure memory" account falls short of explaining the changes in hippocampal activity when speech feedback is impaired. A further consideration is that the hippocampus has been associated with a growing collection of functions, some of which relate to the processing of episodic memory (Buckner, 2010; Peters and Büchel, 2010), while others concern statistical regularities that are learned across multiple contexts (Schapiro et al., 2012; van de Ven et al., 2020). In other words, the hippocampus may be involved in processing individual events as well as the contextual patterns that span multiple events. This apparent contradiction can be resolved by considering that both episodic and statistical learning processes support or facilitate predictive processing in the hippocampus (Schapiro et al., 2016), which is in line with our suggestion that the hippocampus supports speech monitoring through sensory predictive processes. Future studies are needed to further disentangle semantic memory from associative mismatch processing in speech production.
In conclusion, we showed that the hippocampus is involved in the online monitoring of the sensory consequences of our own speech acts. Our findings are in line with recent propositions that the hippocampus plays a role in language production and verbal communication (Duff et al., 2008; Duff and Brown-Schmidt, 2012; MacKay and Johnson, 2013; Friston and Buzsáki, 2016; Piai et al., 2016), and show that the hippocampus may be involved in the predictive processing of speech production, rather than, or in addition to, contributing semantic memory representations of what is said.

Credit author statement
IC conceived and developed the original studies. VV analysed the data. VV, LW and IC wrote the manuscript.