Brain activation during non-habitual speech production: Revisiting the effects of simulated disfluencies in fluent speakers

Over the past decades, brain imaging studies in fluently speaking participants have greatly advanced our knowledge of the brain areas involved in speech production. In addition, complementary information has been provided by investigations of brain activation patterns associated with disordered speech. In the present study we specifically aimed to revisit and expand an earlier study by De Nil and colleagues, by investigating the effects of simulating disfluencies on the brain activation patterns of fluent speakers during overt and covert speech production. In contrast to the De Nil et al. study, the current findings show that the production of voluntary, self-generated disfluencies by fluent speakers resulted in increased recruitment and activation of brain areas involved in speech production. These areas show substantial overlap with the neural networks involved in motor sequence learning in general, and learning of speech production, in particular. The implications of these findings for the interpretation of brain imaging studies on disordered and non-habitual speech production are discussed.


Introduction
Brain imaging studies of fluent speakers have greatly advanced our knowledge of the brain areas involved in speech production [1][2][3][4][5]. Combined, these studies have shown that speech production is supported by a network of brain areas that includes the pre- and postcentral gyri, posterior inferior frontal gyri, medial and lateral premotor cortex, anterior insula, superior temporal gyri, posterior planum temporale region, basal ganglia and cerebellum [4]. These findings have led to the formulation and refinement of theoretical network models of speech production such as the DIVA (Directions into Velocities of Articulators) Model [6][7][8] and the Integrated State Feedback Control Model of Speech Control [9]. In addition to studies on fluent speech production, investigations of brain activation patterns associated with disordered speech have provided complementary information on the neural correlates of speech production. However, interpretation of group differences in functional brain activation between control participants and those who experience speech disorders is often complicated because people with speech disorders frequently use spontaneous or treatment-induced speech modulations (e.g., speech rate changes) [10]. Therefore, activation patterns in people with speech disorders likely reflect not only the underlying dysfunction, but also any compensatory speech strategies or coping mechanisms used. Indeed, studies have shown significant interactions between observed brain activation patterns and speech task modulations, even in adults without speech disorders. In one such study, Riecker and colleagues [11] assessed the effects of speech rate changes on functional brain activation in 8 healthy controls. With this study, the authors aimed to increase our understanding of the neural basis of speech motor control and the neural mechanisms at play in different types of dysarthria. 
Spastic and ataxic dysarthria are associated with decreases in speech rate, while people with hypokinetic dysarthria may show increased speech rates. Speech rate changes are therefore often targeted in the treatment of dysarthric speech. The authors observed a linear change in BOLD response in speech-related brain regions including the supplementary motor area, the left anterior insula, bilateral thalamus, bilateral sensorimotor cortex, cerebellum and basal ganglia, when fluent-speaking adults were asked to speak at three different self-generated syllable rates. More recently, Marchina and colleagues [3] studied the effect of speech repetition rate on neural activation in 12 healthy controls to identify regions that can support recovery from speech disorders and to aid the development of adaptive treatment protocols for non-fluent aphasia, dysarthria, apraxia of speech and stuttering. They observed a significant linear increase in activation in the bilateral superior temporal areas with self-generated speech rate changes, suggesting that sensory feedback corresponds directly to task demands. Interestingly, bilateral activation changes in the speech motor regions were also identified, but these were less robust when a speaker was using a speech rate close to their habitual rate. They interpreted this as indicating that those close-to-habitual rate changes are likely highly practiced, and thus may require less additional regional motor support. In addition to the effects of speech rate changes in control participants, a number of studies have focused on the effects of fluency-inducing conditions on brain activation patterns in people who stutter. These studies, using fluency-inducing speech tasks such as choral speech, metronome-timed speech and altered auditory feedback, showed activation differences between people who stutter (PWS) and fluent speakers similar to those identified during habitual speech. 
This includes overactivation of the vermal region of the cerebellum, supplementary motor area and insula, as well as decreased activation in the left-sided precentral gyrus [12][13][14] (for a recent review, see [15]). At the same time, these tasks also resulted in increased activity in the bilateral superior temporal cortices in both PWS and fluent speakers. These findings contrast with the reduced activation in these areas typically seen during habitual speech production in PWS compared to their fluently speaking peers [16][17][18]. Given the role of the superior temporal cortices in auditory feedback during speech production [1] and the findings on speech rate changes discussed above, it is unclear to what extent these changes may be associated with speech task modulations rather than with increased fluency during these fluency-inducing conditions.
To dissociate the effects of task modulations from those related to increased fluency, De Nil and colleagues [19] investigated the effects of voluntary simulated disfluency (i.e., decreased fluency) on brain activation in PWS and fluent controls. Similar to previous studies comparing brain activation in PWS compared to fluent control participants [13,14,20], they reported that PWS showed less left superior temporal gyrus activation during habitual speech. Interestingly, a within-group comparison in PWS of simulated disfluency compared to habitual speech showed an increase in activation in the bilateral superior temporal, primary motor, premotor and inferior lateral prefrontal cortices, as well as in the left-sided insula and right-sided supramarginal gyrus during the simulated task. As this increase in activation implicated areas similar to those previously identified during habitual speech in PWS, the authors concluded that at least some of the functional overactivations reported previously when comparing stuttering and nonstuttering adults may have reflected between-group differences in the level of automaticity, effort and attention present during speech production. However, in contrast to the stuttering participants, no significant activation differences were observed for the within-group comparisons between the simulated disfluency and habitual speech tasks in nonstuttering participants in this study. This was somewhat surprising, because if the observed activation increases were indeed associated with the effort and attention required to produce atypical speech, similar patterns of increased activation were expected to occur in the control participants. This interpretation is further supported by the findings on speech rate changes and fluency-inducing conditions in fluent speakers discussed above [3,11,17,18]. One of the potential explanations for this unexpected finding discussed in the De Nil et al. 
[19] paper was that activation changes in the fluent speakers may have been less pronounced and remained sub-threshold because of methodological reasons (use of a lower field strength (1.5T) scanner, sparse scanning, and use of auditory stimuli).
Because a thorough understanding of the observed neural activation patterns in fluent speakers is crucial for the correct interpretation of imaging findings of typical and disordered speech production, we report here on an fMRI study aimed at further investigating the effects of simulating disfluencies on the brain activation patterns of fluently speaking participants. Based on the results from De Nil et al. [19] in PWS, and the results from studies on non-habitual speech production in fluent speakers [3,11,17,18], we hypothesized that asking speakers to simulate speech disfluencies would lead to increased demands placed on the neural network involved in speech production [1,21], resulting in overactivation in the bilateral superior temporal gyri and other sensorimotor areas involved in speech production, even in fluently speaking individuals. To overcome some of the limitations of the previous study investigating simulated disfluency [19], we designed a paradigm using visual instead of auditory stimuli and used a scanner with higher field strength (3T) to increase signal detection. In addition, we included a pseudoword reading task because such words have been found to increase disfluency in PWS [22] and nonword repetition tasks resulted in differences in speech motor dynamics in PWS [23,24]. As a result, we hypothesized that simulating disfluency during pseudoword versus word reading would further increase speech effort, resulting in an additional increase in activation in brain areas involved in speech production. In addition, participants were asked to produce speech overtly as well as covertly, since it has been shown that using overt compared to covert conditions leads to differences in brain responses [1,25].

Participants
Eleven volunteers (5 females, 6 males) without a history of speech, language or neurological problems participated in the study. Their ages ranged from 24 to 60 years, with an average of 40 years. All participants were right-handed, as assessed with the Dutch Handedness Inventory [26,27]. The study was conducted with ethical approval of the University Hospitals Leuven and all participants provided written informed consent in accordance with the Declaration of Helsinki prior to their participation.

Functional brain imaging
Magnetic resonance imaging data were acquired using a 3T MR system (INTERA, Philips Medical Systems, Best, The Netherlands) with an 8-channel phased-array head coil. For functional imaging, a T2*-weighted single-shot gradient-echo echo-planar imaging (GE-EPI) sequence was used with an echo time (TE) of 33 ms and a repetition time (TR) of 3000 ms. The image acquisition matrix was 80×80, with a field of view (FOV) of 230×230 mm². A sensitivity encoding (SENSE) reduction factor of 2 was used in the anterior-posterior direction. Thirty-five contiguous transverse slices of 4 mm thickness each were acquired with a flip angle of 90°, resulting in a voxel size of 2.9×2.9×4 mm³.
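As a quick check on the acquisition geometry, the reported voxel size can be recovered from the field of view, acquisition matrix and slice thickness given above. The following is a minimal sketch only; the variable names are illustrative, and the slab coverage figure is derived here under the assumption of contiguous slices without gaps rather than taken from the text.

```python
# Nominal in-plane voxel size = field of view / acquisition matrix.
# All input values are the ones reported in the acquisition paragraph;
# the variable names are illustrative.
fov_mm = (230.0, 230.0)      # field of view
matrix = (80, 80)            # acquisition matrix
slice_thickness_mm = 4.0     # slice thickness
n_slices = 35                # contiguous slices

in_plane_mm = tuple(f / m for f, m in zip(fov_mm, matrix))
voxel_mm = (*in_plane_mm, slice_thickness_mm)

# Assumption: contiguous slices with no inter-slice gap.
coverage_mm = n_slices * slice_thickness_mm

print("voxel size (mm):", tuple(round(v, 1) for v in voxel_mm))
print("slab coverage (mm):", coverage_mm)
```

The unrounded in-plane size is 2.875 mm, which rounds to the 2.9 mm reported in the text.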
Participants were scanned using a block design paradigm in which one run consisted of 9 epochs of 10 trials each. Epochs were presented in sets of three, with the first epoch consisting of 10 character strings (baseline condition), the second of 10 words and the third of 10 pseudowords. A new stimulus was randomly presented every 3 seconds, which resulted in a total duration of 30 seconds per epoch. Word stimuli were 88 high-frequency Dutch nouns [28], including 40 one-syllable, 36 two-syllable (e.g., vlin-der [butterfly]) and 12 three-syllable words (e.g., ka-bou-ter [gnome]). Pseudowords and character strings were custom created. Pseudowords were constructed by replacing consonants and vowels in the high-frequency words by other graphemes, thereby retaining the original consonant-vowel structure of the word, resulting in non-existent but readable Dutch words (e.g., vlinder → klimder, kabouter → fimouter). Strings of characters were constructed by replacing each letter in a word by a pre-defined character (e.g., ?-{.# (= ).
The stimuli were presented visually and participants were instructed either to look passively at strings of nonsense characters (baseline condition) or to read the (pseudo)words according to one of the following instructions: a. read silently (covert habitual speech production); b. read out loud (overt habitual speech production); c. read silently while repeating the first letter of the word multiple times (covert simulated disfluency); d. read out loud while repeating the first letter of the word multiple times (overt simulated disfluency).
No instructions were given regarding the number of letter repetitions that were required, and repetitions were self-paced. Participants practiced all tasks prior to the scanning session. Each condition was presented twice in randomized order, which resulted in a total of 8 runs presented to each participant.
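The timing of one run follows directly from the numbers above and can be laid out as a simple epoch schedule. This is a sketch under stated assumptions: epochs are taken to follow each other without rest gaps and without dummy volumes, neither of which is specified in the text, so the derived run duration and volume count are illustrative rather than reported values. All names are hypothetical.

```python
# Sketch of the block timing for one run, using the values from the text.
# Assumptions (not stated in the text): epochs are back to back, with no
# rest periods and no dummy volumes at the start of the run.
TR_S = 3.0                    # repetition time
STIMULUS_INTERVAL_S = 3.0     # a new stimulus every 3 seconds
TRIALS_PER_EPOCH = 10
EPOCH_TYPES = ("characters", "words", "pseudowords")  # order within a set
SETS_PER_RUN = 3              # 9 epochs presented in sets of three
INSTRUCTIONS = (
    "covert habitual",
    "overt habitual",
    "covert simulated disfluency",
    "overt simulated disfluency",
)
REPEATS_PER_INSTRUCTION = 2   # each condition was presented twice


def epoch_schedule():
    """Return (onset_s, duration_s, stimulus_type) for each epoch of a run."""
    epoch_s = TRIALS_PER_EPOCH * STIMULUS_INTERVAL_S
    n_epochs = SETS_PER_RUN * len(EPOCH_TYPES)
    return [
        (i * epoch_s, epoch_s, EPOCH_TYPES[i % len(EPOCH_TYPES)])
        for i in range(n_epochs)
    ]


schedule = epoch_schedule()
run_duration_s = schedule[-1][0] + schedule[-1][1]
volumes_per_run = int(run_duration_s / TR_S)
runs_per_participant = len(INSTRUCTIONS) * REPEATS_PER_INSTRUCTION

for onset, duration, kind in schedule:
    print(f"{onset:6.1f} s  {duration:4.1f} s  {kind}")
print("run duration (s):", run_duration_s)
print("volumes per run:", volumes_per_run)
print("runs per participant:", runs_per_participant)
```

Under these assumptions each 30-second epoch spans 10 volumes, and the four instructions presented twice each give the 8 runs mentioned above.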

Data analysis
CAT12 (r1278, http://www.neuro.uni-jena.de/cat/), a toolbox of SPM12 (v7219, http://www.fil.ion.ucl.ac.uk/spm/), running in Matlab 9.3 (R2017b), was used to process T1-weighted structural images. Images were bias corrected, spatially normalized via DARTEL (using the MNI-registered template provided within CAT12), modulated to compensate for the effect of spatial normalization, and classified into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF), all within the same generative model [29]. Using SPM12, the functional EPI images were realigned and the mean functional image was coregistered to the T1-weighted image. Next, all images were warped into MNI space using the MRI-derived deformation fields [30]. The normalized EPI images were spatially smoothed with a Gaussian kernel of 6 mm full-width at half-maximum. For display purposes, a study-specific mean T1-weighted image was created by averaging all normalized T1-weighted images; all functional results were overlaid on this image.
First-level single subject analysis was performed by modeling the different stimuli (characters, words and pseudowords) using a boxcar function convolved with the hemodynamic response function using the general linear model. Motion parameters were added in the general linear model as covariates of non-interest for the statistical analysis. Individual statistical parametric maps were generated for words and pseudowords, and for overt and covert reading, for the following contrasts: 'habitual speech production versus passively looking at characters', and 'simulated disfluency versus passively looking at characters'. These contrast images were subsequently entered in a second-level group analysis.
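To illustrate what a boxcar function convolved with the hemodynamic response function looks like, the sketch below builds one such regressor for the word epochs of a single run. It is not the SPM12 implementation: the double-gamma shape parameters (6 and 16, undershoot ratio 1/6) are commonly used defaults adopted here as an assumption, the response is sampled coarsely at the TR rather than at a finer time resolution, and the word-epoch onsets (30, 120 and 210 s in a 90-volume run) assume back-to-back epochs, which the text does not state.

```python
import math

TR_S = 3.0  # repetition time from the acquisition parameters


def gamma_pdf(t, shape):
    """Gamma probability density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(tr_s, length_s=32.0):
    """Double-gamma response sampled every tr_s seconds, normalised to sum 1.

    Shape parameters 6 and 16 with an undershoot ratio of 1/6 are assumed
    defaults, not values taken from the paper.
    """
    n = int(length_s / tr_s) + 1
    samples = [
        gamma_pdf(i * tr_s, 6.0) - gamma_pdf(i * tr_s, 16.0) / 6.0
        for i in range(n)
    ]
    total = sum(samples)
    return [s / total for s in samples]


def boxcar(n_scans, onsets_s, duration_s, tr_s):
    """Return a 0/1 vector that is 1 during each epoch."""
    box = [0.0] * n_scans
    for onset in onsets_s:
        first = int(round(onset / tr_s))
        last = int(round((onset + duration_s) / tr_s))
        for i in range(first, min(last, n_scans)):
            box[i] = 1.0
    return box


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of signal."""
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        for j, k in enumerate(kernel):
            if i - j >= 0:
                out[i] += signal[i - j] * k
    return out


# Hypothetical run layout: 90 volumes, word epochs at 30, 120 and 210 s.
n_scans = 90
word_onsets_s = [30.0, 120.0, 210.0]
hrf = double_gamma_hrf(TR_S)
word_regressor = convolve(boxcar(n_scans, word_onsets_s, 30.0, TR_S), hrf)

print([round(v, 2) for v in word_regressor[8:24]])
```

In a general linear model, one such regressor per stimulus type would sit alongside the motion parameters as columns of the design matrix; the contrasts against the character baseline are then differences between the fitted weights.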
The second-level group analysis was modelled as a 3-way ANOVA with the following factors: task (habitual speech production or simulated disfluency), mode (covert or overt), and stimulus (words or pseudowords). Next, t-tests were run to investigate the specific contrasts of interest. All statistical analyses were considered significant at an uncorrected voxel-level threshold of p < .001, with a cluster-level FWE-corrected threshold of p < .05. Anatomical localization of the peak activations was determined using Automated Anatomical Labeling (AAL) and the Harvard-Oxford cortical and sub-cortical atlases in FSLeyes (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FSLeyes). The processed data files are available from https://osf.io/zc9xk/.
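The contrasts of interest in a 2 × 2 × 2 design of this kind can be written as weight vectors over the eight cells. The sketch below shows how a main effect of task, the task × mode interaction and a simple effect of task within one mode could be coded. The cell ordering and the helper names are assumptions made for illustration; the actual ordering of the design matrix used in the study is not given in the text.

```python
from itertools import product

# Factor levels as described in the text; the first level of each factor is
# coded -1 and the second +1. The cell ordering is an assumption.
TASK = ("habitual", "simulated")
MODE = ("covert", "overt")
STIMULUS = ("words", "pseudowords")
CELLS = list(product(TASK, MODE, STIMULUS))  # 8 cells, task varies slowest


def sign(level, levels):
    """Return -1 for the first level of a factor and +1 for the second."""
    return -1 if level == levels[0] else 1


def main_effect_task():
    """Weights for 'simulated disfluency > habitual speech' over all cells."""
    return [sign(task, TASK) for task, _, _ in CELLS]


def task_by_mode():
    """Weights for the task x mode interaction."""
    return [sign(task, TASK) * sign(mode, MODE) for task, mode, _ in CELLS]


def simple_effect_task(mode):
    """Weights for 'simulated > habitual' restricted to one speech mode."""
    return [sign(task, TASK) if m == mode else 0 for task, m, _ in CELLS]


for cell, w_main, w_int, w_overt in zip(
    CELLS, main_effect_task(), task_by_mode(), simple_effect_task("overt")
):
    print(cell, w_main, w_int, w_overt)
```

Each vector sums to zero, and the main-effect and interaction vectors are orthogonal, which is what allows the interaction to be tested separately from the overall effect of task.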

Habitual versus non-habitual speech production
For reference purposes, habitual speech production was compared to the baseline task of passively looking at characters. This resulted in large clusters of bilateral activation in the supplementary motor area (SMA), pre- and postcentral gyri, superior temporal gyri, thalamus and cerebellum. In addition, left-sided areas of significant activation were present in the fusiform gyrus, amygdala, hippocampus, and inferior frontal gyrus (pars triangularis). See Fig 1 (red color) and S1 Table. Comparison of the simulated disfluencies with the habitual speech task ('simulated disfluency > habitual speech' (1)) showed an increase in activation in clusters including the bilateral SMA, pre- and postcentral gyri and inferior parietal cortex. Additional activation was left-lateralized in the superior temporal gyrus and right-lateralized in the superior cerebellum (see Fig 1 and Table 1).
The 'task x mode' interaction effect contrast showed one significant cluster of activation (FWE-corrected cluster p-value = 0.006, cluster extent = 186 voxels). This left-sided cluster had local maxima in Heschl's gyrus [-55, -12, 3], the superior temporal gyrus [-59, -2, -2] and the temporal pole [-57, 7, -8]. Due to the 'task × mode' interaction effect, the effect of non-habitual speech production was assessed separately during covert and overt speech production. For reference purposes, the results on covert habitual speech production compared to the baseline are presented in S2 Table and Table 2).
During overt habitual speech production compared to the baseline, a large network of areas was activated. These are shown in S3 Table and Fig 3 (red color). The overt simulated disfluency task resulted in significantly increased activation compared to overt habitual speech production. These increases were stronger than in the covert simulated disfluency task.

Figure caption: 'simulated disfluency > habitual speech' (1) contrast results. For visualization purposes, these areas are overlaid on the 'habitual speech > baseline' contrast. Areas of red color indicate habitual speech versus baseline, and green areas indicate simulated disfluencies compared to habitual speech. Yellow indicates areas of overlap. Results were corrected for multiple comparisons using clusterwise FWE (p < 0.05), displayed on axial slices of the study-specific normalized T1 template in MNI space (right of the image is left of brain). Slices displayed: z = -28, 9, 25, 41, 51, …

Table note: Height threshold of p < 0.001 uncorrected, and cluster-based FWE-corrected p < 0.05 across the whole brain (cluster-level threshold = 186 voxels). R = right; L = left.

Overt versus covert speech production
Comparison of overt to covert speech production resulted in large clusters of bilateral activation, including the pre- and postcentral and superior temporal gyri, cerebellum, thalamus, amygdala and hippocampus ('overt speech > covert speech' (2), Table 4).

Discussion
Speech task modulations, aimed at altering a person's habitual speech pattern, give us an insight into the neural functioning of the speech network in healthy control speakers. Such task modulations are also important tools used in the assessment and treatment of speech disorders such as dysarthria, apraxia of speech and stuttering. Brain imaging studies focusing on both fluent and disordered speech have shown that non-habitual task modulations resulted in increased activation in speech-related areas [3,11,17,18]. This effect was also seen in PWS when they were asked to speak in a disfluent manner (i.e., simulate disfluencies), but no difference in activation was present in their fluent control group [19]. Therefore, the present study aimed specifically to revisit and expand the De Nil et al. [19] study by investigating the effects of simulating disfluencies on the brain activation patterns of fluent speakers during overt and covert speech production. In contrast to that study, the current findings showed that the production of voluntary, self-generated disfluencies by fluent speakers did result in increased recruitment and activation of brain areas involved in speech production. In addition, the effect of the non-habitual compared to the habitual speech task interacted with whether speech was produced overtly or covertly. Main effects of overt compared to covert speech production and of the use of pseudowords compared to words were also identified.

Habitual versus non-habitual speech production
When habitual speech production was compared to the baseline condition across both speaking modes (covert and overt, see Fig 1 and S1 Table), bilateral increases in activation were present in well-established speech-related areas [1,4], with clusters including the supplementary motor area, precentral, postcentral and superior temporal gyri, cerebellum and thalamus. In addition, significant left-sided activation was present in the inferior frontal gyrus (pars triangularis), hippocampus, amygdala and fusiform gyrus. The latter, known as the visual word form area, is responsible for visual word recognition [31], and activation could be anticipated during the visual word reading task used in this study [32]. Compared to habitual speech production, using a non-habitual speech pattern by simulating disfluencies resulted in increased SMA activation, independent of whether speech was
produced overtly or covertly (see Fig 1 and Table 1). Given that participants were asked to voluntarily repeat the first letter of the word multiple times as part of uttering the word, this activation is consistent with the view that the SMA is involved in the planning and initiation of speech, providing a starting mechanism for speech production [11,33,34]. Because producing multiple repetitions of the first letter of a word prior to completing the rest of the word is a highly atypical speech pattern for fluent speakers, the observed increase may have resulted from greater effort, and additional needed resources, associated with the timing and repeated initiation of the upcoming verbal utterance, including the provision of a syllabic frame for the speech signal, functions typically attributed to the SMA [4,35,36]. This interpretation is supported by observations of an increase in effective connectivity between the pre-SMA and dorsal premotor cortex as a result of an increased load on sequencing of motor plans when novel (i.e., non-habitual) combinations of syllables needed to be produced [37]. In addition to increased SMA activation, analysis of the main effect of simulated disfluencies compared to habitual speech showed that extra neural resources associated with the production of novel motor sequences were recruited [38][39][40][41]. This suggests that producing a novel non-habitual motor sequence required additional control of ongoing movements and error correction, similar to the activations seen when participants are repeatedly producing a new motor task [38][39][40][41]. The cerebellum has an important role in this process. For instance, in a study focused on transfer learning of novel motor sequences, more successful learning was associated with increased activity in the superior cerebellum as well as in the left dorsal premotor cortex, extending into the pre-SMA [42]. These same areas also showed increased activation in the current task contrast. 
Furthermore, interactions between the cerebellum and primary motor area also appear to be crucial for motor sequence learning and are likely to influence the final representation of the sequence in the primary motor area [40]. Besides production of novel motor sequences in general, the findings of the current study concur with those specifically focusing on speech production. When producing speech, communication between areas involved in motor, auditory and somatosensory processing through feedback and feedforward mechanisms is crucial [6,21]. According to the DIVA model of speech production [6,7], the projections from premotor to primary motor cortex, as well as cerebellar projections, are involved in learning and maintaining feedforward commands for the overt production of syllables. Auditory and somatosensory information processed in the superior temporal and inferior parietal cortex plays an essential role in maintaining the speech feedback control system. Likewise, in the Dual-Stream Model of Speech Processing [21], connections through the Dorsal Auditory-Motor Stream have been regarded as important for speech development and producing new sequences, with the latter continuing to be important in adults. This dorsal stream was further detailed in the Integrated State Feedback Control Model of Speech Control [9], which suggests that the sensorimotor integration in speech relies on a neural network including the superior temporal sulcus and gyrus, posterior planum temporale, ventral and dorsolateral regions of the premotor cortex, and the cerebellum. Our current findings of non-habitual compared to habitual speech production map well onto these theoretical models. In addition, a study on increased sequence complexity during syllable production reported the same increases in SMA and parietal cortex bilaterally, while activation was strongly left-lateralized in the pre- and postcentral gyri and right-lateralized in the cerebellum during overt speech production [1]. 
Together, this shows that speaking in a non-habitual manner by simulating disfluencies increases the demands on the feedforward and feedback systems involved in speech production.
The production of non-habitual overt speech resulted in increased left-sided auditory activation. Although auditory feedback was not manipulated experimentally in the present study, in the DIVA model auditory feedback has an important function in the development and maintenance of motor plans necessary for speech production [8]. As one source of evidence for this role, studies inducing changes in auditory feedback showed increased activation bilaterally in the superior temporal gyri [8,43]. The increased auditory activation observed during the simulated disfluency task in the current study might therefore result from the increased importance of auditory feedback when producing overt speech in a new, non-habitual manner. Alternatively, the increased activation seen in the left-sided auditory cortex may reflect at least in part an increase in auditory input associated with the longer duration of speech production in the non-habitual speech condition. Future studies specifically designed to tease apart the effects of increased duration, repeated delivery of fast temporal information (i.e., repeated consonants at word onset) and increased reliance on feedback on auditory processing are necessary to allow a more in-depth interpretation of the superior temporal cortex results.
Increased activation in the dorsolateral premotor cortex as well as in the inferior parietal cortex suggests that the production of simulated sound repetitions, compared to habitual speech production, required additional resources involved in cognitive manipulations necessary to integrate somatosensory feedback with other information in the speech production system [44]. Furthermore, the parietal cortex is known to be connected to the SMA and dorsal premotor cortex through the superior longitudinal fasciculus (SLF I), which contributes to the regulation of higher aspects of motor behavior [45,46]. Interestingly, the parietal cortex also plays a role in speech motor programming, with lesions resulting in apraxia of speech [47]. Increased activation in the left parietal cortex has previously been found during both overt and covert comparisons of multisyllabic and monosyllabic words [25]. As the participants were required to produce a longer utterance in the simulated disfluency condition, a length effect may have contributed to increased demands on speech motor programming. These findings suggest that the production of simulated disfluencies places higher demands not only on systems that are purely related to primary motor and auditory processes in speech production but also on areas important for sensorimotor integration and programming of new speech motor sequences. Future studies could further disentangle the complex interactions between the different conditions and stimulus effects on neural activity patterns.
Overall, these findings show that asking participants to speak in a non-habitual manner by simulating disfluencies resulted in increased activation in the neural network of speech production. While no such differences between simulating disfluencies and habitual speech production were identified in a previous study in fluent speakers [19], the observed changes in brain activation in the current study are in line with those reported in other studies on speech modifications. Marchina et al. [3] reported activation changes in the sensorimotor cortex with speech rate changes, with a linear relation between brain activation and speech rate changes only identified in the auditory cortex. This has been interpreted to reflect a direct link between sensory feedback and task demands of non-habitual speech production. The linear relationship between self-paced syllable repetition rates and increased activation in the SMA, left anterior insula, bilateral thalamus, bilateral sensorimotor cortex and cerebellum in the study by Riecker et al. [11] provides further evidence for the role of this network in non-habitual speech production. Together, these results suggest a shift from the use of more efficient 'automated' networks to more elaborate 'novel production' networks when people are asked to produce speech in a non-habitual manner.

Simulated disfluencies versus stuttering
While the deliberate production of sound repetitions by fluent speakers in our study clearly was a simplification of the complex phenomenon of stuttered speech production in PWS, the activation patterns during the production of simulated disfluency in fluent participants showed interesting parallels to those associated with habitual speech production in PWS compared to controls [12][13][14][20]. This finding suggests that the increased activation found in the pre- and primary motor cortices, SMA and cerebellum in PWS may reflect, at least in part, differences in the level of automaticity, attention and effort required for speech production, as has previously been suggested by De Nil et al. [19].
In contrast, the increased activation in the auditory cortex in the present study with fluent speakers is opposite to the decrease in auditory cortex activation previously observed during habitual speech production in PWS [16,48,49]. However, when asked to simulate disfluencies, PWS did show an increase in activation in the bilateral superior temporal gyri [19]. Studies investigating the effect of fluency-enhancing conditions have also typically found an increase in activation in the superior temporal gyri in both PWS and controls. This effect, seen during choral speech, paced speech, automatic speech and singing, has typically been associated with an increase in fluency during such speech conditions [16,17,48,49,50]. However, this interpretation cannot explain the presence of similar increases in auditory cortex activation in fluent control participants [16,17,48,49,50]. Rather than being associated with an increase in speech fluency, it is possible that the increases in fMRI BOLD signal reported during fluency-enhancing conditions reflect the increased need for auditory monitoring when producing speech in a new, non-habitual manner. Thus, some of the activation differences found in the superior temporal gyri in previous studies examining the effects of fluency-enhancing speech tasks could have been associated with altered monitoring effort during speech production in addition to other possible influences such as the effect of presenting external auditory pacing cues [17,51]. In addition, the present results challenge the hypothesis that the absence of significant activation in the superior temporal gyri during habitual speech production in PWS occurs because the repetitions of sounds and syllables in PWS lead to the repeated delivery of the perceptual prediction of the speech sounds to the auditory system, functioning as an inhibitory signal attenuating the activation [13]. 
In the present study, the introduction of repetitions in the speech did not result in an attenuation of the activity in the superior temporal cortex, but rather an increase in activity in this brain region. This makes it unlikely that the absence of increased activation in the auditory cortices in PWS can be explained as the result of repeatedly receiving the same prediction of auditory input.

Overt versus covert production
In addition to the effect of the non-habitual simulated disfluency task, a main effect of the mode of speech production was present in the current study. During overt habitual speech production compared to the baseline (see Fig 3 and S3 Table), largely the same pattern of activation was found as in the combined habitual speech compared to baseline contrast (see Fig 1 and S1 Table). During the covert habitual speech task, however, activation was more restricted (see Fig 2 and S2 Table). A direct comparison between the overt and covert speech tasks revealed large differences between the two modes of production, overlapping with the well-established 'minimal network of speech production' (see Table 4). The increases in bilateral activation are consistent with the requirement to control the articulators and process sensory feedback when producing speech in an overt manner, as found in previous studies [1,25].
As discussed earlier, our findings on non-habitual speech production suggest that producing simulated disfluencies resulted in increased task demands. This is evidenced by an associated increase in medial premotor cortex activation during covert speech production. This increased reliance on the SMA is consistent with the demands that the non-habitual task places on the sequencing, timing and initiation of speech. The same task during overt speech production also resulted in increased activation in this area, while an additional larger network of motor and sensory areas also showed increased activation. Furthermore, a significant interaction effect between task and mode of production was present in the left auditory cortex. The interaction between the speech task and the covert and overt mode of production highlights two important issues. Firstly, covert speech cannot be used as a substitute for overt speech when investigating neural activation during speech production. Secondly, studies involving overt speech production tasks are needed to gain a comprehensive understanding of the neural basis underlying habitual and non-habitual speech production [1,25].

Word versus pseudoword production
While we also hypothesized that pseudoword versus word production would result in increased demands on the speech system, a significant increase in activation was only present in one right-sided occipital cluster during this comparison. Similarly, in a study specifically focusing on orthographic processing, Tagamets and colleagues [52] found more activation in the right-sided occipito-temporal area in their pseudoword versus word contrast, while they did not find a left-sided difference in this same region. Vigneau and colleagues [53], using a region of interest-based approach to study the function of the visual word form area, also found no difference between pseudowords and words in the left hemisphere, while there was a trend for the right homologue of the visual word form area to be recruited more by pseudowords than by words, and a significant increase in right-sided activation in the nearby inferior occipital gyrus region of interest [53]. However, our study did not identify the additional differences in activation in the left frontal operculum and right cerebellum as shown in a previous study on pseudoword reading [54]. As that study used a covert habitual pseudoword reading task, it is possible that any effect of a further increase in effort due to pseudoword reading in our study was masked by the high level of additional activation already required to overtly produce speech and simulate disfluencies during word reading.

Limitations
Although the sample size of the current study is consistent with that of previous studies on this topic [3,11], it is small, which is a clear limitation. Replication of our findings in future studies is needed, as is the collection of behavioral data during the MRI acquisition to allow for a more direct comparison of the patterns of brain activation with the behavioral responses.

Conclusion
The current study aimed to enhance our understanding of the effects of the production of non-habitual speech by asking fluent participants to simulate disfluencies. The results showed that the deliberate production of disfluencies, whether overtly or covertly, resulted in increased brain activation compared to habitual speech production. While increases during covert speech production were restricted to the supplementary motor area, those during overt speech production showed substantial overlap with networks involved in the production of novel motor sequences and of speech. As discussed, these findings have a number of implications for the interpretation of differences in brain functioning identified between people with speech disorders and control speakers, and for the interpretation of the effects of non-habitual speech tasks. An accurate and comprehensive understanding of the neural networks supporting habitual speech production, and those supporting speech modulations, helps to shed light on the effect of such modulations when used in the diagnosis and treatment of speech disorders.
Supporting information
S1 Table. Results on habitual speech compared to baseline. Height threshold of p < 0.001 uncorrected, and cluster-based FWE-corrected p < 0.05 across the whole brain (threshold = 149 voxels). R = right; L = left. (DOCX)
S2 Table. Results of covert habitual speech compared to baseline activation. Height threshold of p < 0.001 uncorrected, and cluster-based FWE-corrected p < 0.05 across the whole brain (threshold = 208 voxels). R = right; L = left. (DOCX)
S3 Table. Results of overt habitual speech compared to baseline activation. Height threshold of p < 0.001 uncorrected, and cluster-based FWE-corrected p < 0.05 across the whole brain (threshold = 138 voxels). R = right; L = left.