Neural responses to Finnish inflected forms during overt and covert production: The role of stem frequency and stem allomorphy

Abstract
Despite extensive behavioral research on complex word recognition, the neural mechanisms involved in producing inflected words in agglutinative languages, such as Finnish, are still poorly understood. Finnish inflected nouns typically involve morphophonological alternations of the stem (i.e., consonant gradation; CG), which are less common in other languages. Behavioral research on the recognition of inflected nouns containing consonant gradation has shown that a larger number of stem allomorphs is associated with faster recognition. To our knowledge, no functional magnetic resonance imaging (fMRI) studies have explicitly investigated consonant gradation in word production. In this study, participants performed covert and overt production tasks during event-related fMRI. Our stimuli comprised real word stems of high and medium frequency as well as pseudoword stems. The stems either underwent consonant gradation or were non-gradating. Our findings showed that producing inflected forms with high frequency stems, or with CG stems (irrespective of frequency and lexicality), yielded enhanced activation of the left inferior frontal and middle frontal gyri (LIFG and MFG, respectively). This suggests that CG stems, that is, stems with more than one allomorph, facilitate lexical lookup, and that the activation of multiple stem allomorphs is reflected in increased recruitment of frontal brain regions.


Introduction
In this study, we investigated the neural substrates of the online production of inflected words. We focused on whether lexical or grammatical (rule-based) operations are used when inflecting words that undergo stem alternation (consonant gradation; CG) and, more specifically, on how these lexical or grammatical processes manifest at the neural level. In addition to a whole-brain analysis, we performed a region-of-interest (ROI) analysis, which allowed for a theory-driven investigation of inflectional production.
In the field of neurolinguistics, the perception and production of morphologically complex words have been a topic of vigorous research. An example of morphological complexity is plural inflection in English (e.g., 'shark + s'). Previous behavioral studies […] 2011; Pliatsikas et al., 2014; Slioussar et al., 2014), left putamen (Desai et al., 2006), and left and bilateral thalamus (Desai et al., 2006; Tyler et al., 2004). While the caudate nucleus of the basal ganglia has been associated with lexical operations, the putamen has been tied to the application of grammatical rules (Teichmann et al., 2008).
Lexical and grammatical operations underlying morphological processing at the brain level have been explained by at least two main theoretical accounts. The Declarative-Procedural model (DP model; Ullman, 2001) proposes that the lexicon depends on declarative memory, whereas grammar is mostly governed by procedural memory. However, when unproductive (irregular) transformations are required, declarative memory is involved in such processing. According to the model, BA44, the basal ganglia (via the thalamus), and the cerebellum are putative neural correlates of the procedural memory system, while the declarative memory system is associated with activity in BA45, the hippocampus, and connected temporal and parietal neocortical regions (Pliatsikas et al., 2014; Ullman, 2001, 2004, 2016). The Core Decompositional Network model (CDM; Marslen-Wilson & Tyler, 2007) makes no explicit predictions about whether subcortical structures, that is, the basal ganglia, thalamus, or cerebellum, are involved in grammatical processing. The CDM postulates that all regularly inflected words undergo parsing, which engages a left fronto-temporal subsystem (anterior cingulate cortex, left inferior frontal gyrus [LIFG], superior temporal gyrus [STG], left inferior parietal lobule, and middle temporal gyrus). BA45 of the LIFG, in particular, is associated with morphological parsing, and the left STG and left anterior cingulate cortex subserve the application of grammatical rules, whereas bihemispheric IFG activation purportedly reflects holistic processing (Marslen-Wilson & Tyler, 2007). Although both models are largely based on English data, fMRI and PET findings on Finnish (Laine, Rinne, et al., 1999; Lehtonen et al., 2006) appear to be in line with the CDM.
Some evidence suggests, however, that speakers of other inflectionally rich languages, such as Polish, may store regularly inflected forms holistically in the mental lexicon (Reid & Marslen-Wilson, 2000; Bozic, Szlachta, & Marslen-Wilson, 2013).
To specifically account for Finnish inflection, the Stem Allomorph/Inflectional Decomposition (SAID) model was developed (Niemi et al., 1994). The model postulates that during recognition, inflected words are decomposed into a free stem + suffix, or a bound stem allomorph + suffix. For production, the model proposes that inflected forms are represented such that the nominative stem and any bound stem allomorph(s) are activated in unison (e.g., 'kauppa' and 'kaup-'). These stem allomorphs then serve as indexes into the mental lexicon prior to the semantic-syntactic integration of the morphological constituents (e.g., inessive case: 'kaupa-ssa' ['in the store']). However, the SAID model makes no explicit predictions about the neural correlates of either the production or the recognition of inflected forms.
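To make the stem alternation concrete, the following Python sketch derives inessive forms for a few vowel-final nouns, so that a CG stem such as 'kauppa' surfaces with its weak-grade allomorph while a NoCG stem such as 'sana' does not change. It is an illustration under stated assumptions, not part of the SAID model or of the study's materials: the gradation table covers only a handful of common patterns, ignores lexical exceptions and inverse gradation, and the helper names `weak_grade_stem` and `inessive` are hypothetical.

```python
# Simplified sketch of Finnish consonant gradation in the inessive singular.
# Assumptions: vowel-final nouns only, a small table of common patterns, no
# lexical exceptions (e.g. 'poika' -> 'pojassa') and no inverse gradation.

VOWELS = set("aeiouyäö")
BACK_VOWELS = set("aou")

# Strong grade -> weak grade. Clusters are tested before single consonants so
# that, e.g., 'nt' (ranta -> rannassa) is not treated as a plain 't'.
CLUSTER_PATTERNS = [
    ("kk", "k"), ("pp", "p"), ("tt", "t"),
    ("nk", "ng"), ("mp", "mm"), ("lt", "ll"), ("nt", "nn"), ("rt", "rr"), ("ht", "hd"),
]
SINGLE_PATTERNS = [("t", "d"), ("p", "v"), ("k", "")]


def weak_grade_stem(noun: str) -> str:
    """Return the weak-grade (bound) stem allomorph, or the noun itself if no CG applies."""
    if not noun or noun[-1] not in "aäoöuy":
        raise ValueError("this sketch only handles nouns ending in a, ä, o, ö, u or y")
    body, final_vowel = noun[:-1], noun[-1]
    for strong, weak in CLUSTER_PATTERNS:
        if body.endswith(strong):
            return body[: -len(strong)] + weak + final_vowel
    for strong, weak in SINGLE_PATTERNS:
        # A single p/t/k gradates only after a vowel ('sk', 'st', 'tk' do not).
        if body.endswith(strong) and len(body) >= 2 and body[-2] in VOWELS:
            return body[:-1] + weak + final_vowel
    return noun  # NoCG: the stem is unchanged


def inessive(noun: str) -> str:
    """Weak-grade stem + '-ssa'/'-ssä', chosen by vowel harmony."""
    suffix = "ssa" if BACK_VOWELS & set(noun) else "ssä"
    return weak_grade_stem(noun) + suffix
```

For example, `inessive("kauppa")` gives 'kaupassa' (CG) and `inessive("sana")` gives 'sanassa' (NoCG). Checking two-letter clusters first is a deliberate design choice: it keeps forms like 'lasku' (no gradation) and 'ranta' (nt to nn) from being mishandled by the single-consonant rules.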
Taken together, more neurocognitive evidence is needed on how inflected words, particularly those with CG stems, are produced in morphologically rich languages. To address this need, we investigated the brain mechanisms underlying covert and overt production of inflected words in Finnish.

The present study
In this study, we specifically examined (1) how CG affects the operational strategies (lexical or grammatical) involved in inflectional production and how this is reflected in the brain, and (2) how Word Status (i.e., combined word frequency and lexicality) modulates inflection and the associated brain mechanisms. To this end, we included three stimulus categories: high frequency (hf) real word stems, medium frequency (mf) real word stems, and pseudoword stems (p). All stems either required consonant gradation when inflected (CG) or did not (NoCG). Combining Word Status with CG thus resulted in six stem types (CG-hf, CG-mf, CG-p, NoCG-hf, NoCG-mf, NoCG-p). The participants' task was to inflect the visually presented monomorphemic stems either covertly (85% of trials) or overtly (15% of trials).
Based on prior experimental evidence as well as the DP and CDM models, we formulated several hypotheses regarding performance (error rates). Because high frequency stems by definition receive a high level of everyday exposure, they should have a more robust representation in the mental lexicon than middle frequency stems and pseudoword stems (the latter having no exposure at all). Hence, inflected forms with high frequency stems should be the least laborious to produce, resulting in the fewest errors during covert and overt production. In contrast, because pseudoword stems by definition are not represented in the mental lexicon, they should be the most laborious to inflect. Middle frequency stems might either yield error rates falling between these extremes or show error rates similar to those of high frequency stems (Lehtonen et al., 2006).
With respect to CG, we hypothesized that inflecting CG stems would be associated with more errors than inflecting NoCG stems, irrespective of Word Status. According to Nemeth et al. (2015), when inflecting words that require an extra processing step, that is, a stem change, one needs to retrieve the appropriate stem allomorph from the mental lexicon before an inflectional suffix can be added. This extra processing step should thus manifest as increased error rates in both covert and overt production.
At the brain level, we focused primarily on regions such as the middle frontal gyrus (MFG), given its importance in lexical word retrieval (for a review, see Price, 2012). In addition, we expected the following. (1) Word Status should influence activity in BA45, the MFG, the hippocampus, and connected temporal and parietal neocortical regions (Pliatsikas et al., 2014; Ullman, 2001, 2004, 2016). The DP model predicts that high frequency words should activate these brain regions more strongly than lower frequency words and pseudowords. CG should activate BA45 (reflecting retrieval of the correct allomorph), irrespective of Word Status, because CG words involve an extra processing step, namely the retrieval of the correct stem allomorph (Nemeth et al., 2015). (2) Inflecting CG pseudoword stems, compared to NoCG pseudoword stems, might depend more heavily on procedural memory, because these items have no lexical memory traces. Thus, according to the DP model, they should be associated with activation of BA44, the basal ganglia (via the thalamus), and the cerebellum.
In contrast, the CDM (Marslen-Wilson & Tyler, 2007) makes no explicit predictions about whether subcortical structures, such as the basal ganglia, thalamus, or cerebellum, are involved in morphological processing. This model also makes the opposite prediction about the effect of Word Status in BA45. According to the CDM, BA45 supports stem decomposition and stem lookup prior to suffix concatenation for lower frequency words and pseudowords. Thus, activation of BA45 should be greater for pseudoword stems than for high and medium frequency real word stems, because inflected forms of high and medium frequency items should mostly rely on full-form retrieval (Lehtonen et al., 2006; Lehtonen & Laine, 2003; Soveri et al., 2007; Vartiainen et al., 2009).

Ethical approval
This study was carried out in accordance with the regulations and guidelines set forth in the Ethical principles of research in the Humanities and Social and Behavioral Sciences, as conveyed by the University of Helsinki Ethical review board in Humanities and Social and Behavioral Sciences. All participants gave written informed consent in accordance with the Declaration of Helsinki.

Participants
Fifteen right-handed native speakers of Finnish (seven female; mean age 31.3 years, range 18–44) participated in the experiment. Handedness was assessed with a shortened Finnish version of the Edinburgh Handedness Inventory (Oldfield, 1971). No participant reported neurological or psychiatric disorders or alcohol/drug dependence. All participants had normal or corrected-to-normal vision and normal hearing. Each participant was given a consent form and an MRI safety checklist in their native language.

Stimuli
Stimuli consisted of six stem types, three of which required CG in the inessive case: CG-p (pseudoword), CG-mf (medium frequency), and CG-hf (high frequency); the other three did not: NoCG-p, NoCG-mf, and NoCG-hf. To obtain the final lists of CG and NoCG high and medium frequency stems (45 in each group), we applied the Match program (see Van Casteren & Davis, 2007) to an initial list of 270 words to balance for: 1) word length (in letters) of lemma and surface forms, 2) logarithmic corpus lemma frequencies within frequency categories, 3) gradation type within gradation categories, and 4) rated word frequency. We obtained lemma frequencies from the Language Bank of Finland's Korp [Corpora: Finnish Tree Bank, 1990-2000 Luvun Suomalaisia Aikakausi-ja Sanomalehtiä, Suomi 24 2016 H2, KLK Suomenkieliset Lehdet (1980-2000)], using 132 million tokens (Borin et al., 2012). In addition, we wanted to verify whether the corpus lemma frequencies reflect everyday word usage (see Shtyrov et al., 2011; Tryk, 1968). To this end, we also obtained a rated word frequency for each lemma, based on ratings from five Finnish-speaking raters who did not take part in the experiment. The rated word frequencies correlated significantly and positively with the corpus-based lemma frequencies (r = 0.533, n = 180, p < 0.001). We initially aimed to include stems with both high and low per-million frequencies, but other factors, such as lemma length, surface length (in letters), and matching across gradation categories, also had to be carefully considered. Thus, to obtain two statistically different sets of words, we had to set the frequencies of the lower frequency set slightly higher than is typically used (see, e.g., Lehtonen et al., 2006). Hence, these stems are better described as having a "medium" surface frequency (per million) than a low frequency.
Pseudoword stems were included in the stimulus design solely to investigate the online application of consonant gradation rules, free from the potential confounding variables associated with high and medium frequency real word stems.
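Two pieces of arithmetic in the stimulus description can be made explicit: converting raw corpus counts into per-million frequencies (the corpus size of 132 million tokens is taken from the text), and checking the reported correlation with the standard t test for Pearson's r, t = r√(n − 2)/√(1 − r²). The sketch below is illustrative; the function names are hypothetical and it is not the procedure the authors ran.

```python
import math

CORPUS_TOKENS = 132_000_000  # corpus size reported in the Stimuli section


def per_million(count: int, corpus_tokens: int = CORPUS_TOKENS) -> float:
    """Convert a raw lemma count into occurrences per million tokens."""
    return count * 1_000_000 / corpus_tokens


def correlation_t(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing H0: rho = 0 given Pearson's r."""
    if not -1.0 < r < 1.0 or n < 3:
        raise ValueError("need |r| < 1 and n >= 3")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
```

With r = 0.533 and n = 180, `correlation_t` gives t ≈ 8.4 on 178 degrees of freedom, far beyond the two-tailed critical value for p < 0.001 (roughly 3.3), which is consistent with the reported significance level.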

Procedure
In the experimental task (see Fig. 1), participants were presented with a randomized series of monomorphemic stems from each of the six stem categories. Participants were asked always to silently generate the inessive form of each stem. In 85% of the trials, silent generation was followed by an evaluation of an inflected form, in which either a correctly or an incorrectly inflected form appeared. Participants pressed a response-pad button with the right index finger if the displayed form was the same as the one they had silently produced, and with the middle finger if it differed. In 15% of the trials, participants were prompted to produce the inflected form overtly: a mouth icon appearing after the presentation of the stem (word or pseudoword) signaled overt production. Before the experiment, each participant was instructed to speak into a microphone (Optoacoustics two-way full duplex FOMRI, Or Yehuda, Israel) in a ventriloquist-like manner, i.e., as loudly and clearly as possible while limiting the use of facial muscles. Because the overt production prompts (15% of trials) appeared at random and their timing was unknown to the participants, this design was intended to ensure that participants always generated the inflected word silently within the given time period (for a similar method, see Hut & Leminen, 2017). Unlike a fully silent generation task, this paradigm also allowed us to verify the accuracy of the generated inflected forms. Furthermore, because only 15% of trials involved (minimal) facial movement, motion artifacts were not a significant factor, which allowed us to include the overt speech trials in the fMRI analysis.
Stems were displayed for 1.5 s via a Panasonic PT-DZ110XEJ projector (Osaka, Japan) on a back-projection screen viewed through a mirror attached to the head coil (black text on a grey background, Arial font, 24 pt). A blank grey screen was then displayed for an interval randomly chosen between 2 and 6 s, serving as the silent generation period. After this interval, in 85% of the trials, participants were presented with either the correct or an incorrect inflected form (2.5 s) to respond to (covert task). In 15% of the trials, a mouth icon (overt task) was displayed for 2.5 s, prompting participants to produce the inflected word aloud. Trials were separated by a randomly varying rest interval of 1 to 6 s (inter-trial interval; ITI; see Fig. 1). During each run, there were three 40-s breaks, denoted by a hollow circle, allowing the participants' BOLD signal to return to baseline. Between the two runs, anatomical scans were acquired over 5 minutes while the participant watched a silent, subtitled movie or kept their eyes closed.
Fig. 1. Experimental Task. The task started with a word stem being displayed, in this case the non-gradating high frequency stem 'sana' ('word'), after which the participant covertly inflected the word in the inessive case (2–6 s). Next, either the correct or an incorrectly inflected form was displayed (85% of trials) and the participant answered "correct/incorrect", or a mouth icon was displayed (15% of trials, overt task), after which the participant produced the inflected word aloud.
Stimulus presentation was controlled by a script written in Presentation 18.1 (Neurobehavioral Systems, Albany, CA, United States). Each participant spent approximately 75 min in the MRI scanner in total, split between two experimental task runs (~25 min each) and structural and diffusion tensor imaging (DTI; results reported elsewhere) scans.
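The trial structure described above can be summarized as a small schedule generator. This Python sketch is hypothetical: the actual experiment was scripted in Presentation, and the number of trials per condition, the use of continuous uniform jitter, and the exact 15% overt assignment by random sampling are assumptions. The 40-s breaks are not modeled.

```python
import random
from dataclasses import dataclass
from typing import List

# The six stem types from the 2 (CG, NoCG) x 3 (hf, mf, p) design.
CONDITIONS = [f"{g}-{w}" for g in ("CG", "NoCG") for w in ("hf", "mf", "p")]


@dataclass
class Trial:
    condition: str
    task: str            # "covert": judge a displayed form; "overt": say it aloud
    stem_s: float        # stem on screen
    generation_s: float  # blank screen for silent generation (jittered)
    response_s: float    # inflected form or mouth icon
    iti_s: float         # rest before the next trial (jittered)


def make_run(trials_per_condition: int, p_overt: float = 0.15, seed: int = 0) -> List[Trial]:
    """Build one randomized run; trials_per_condition is a hypothetical parameter."""
    rng = random.Random(seed)
    conditions = CONDITIONS * trials_per_condition
    rng.shuffle(conditions)
    n_overt = round(p_overt * len(conditions))
    overt = set(rng.sample(range(len(conditions)), n_overt))  # unpredictable prompts
    return [
        Trial(
            condition=c,
            task="overt" if i in overt else "covert",
            stem_s=1.5,
            generation_s=rng.uniform(2.0, 6.0),
            response_s=2.5,
            iti_s=rng.uniform(1.0, 6.0),
        )
        for i, c in enumerate(conditions)
    ]
```

Drawing the overt trials at random positions, rather than in a fixed pattern, mirrors the design rationale in the text: participants cannot predict which trials will require speaking aloud, so they must generate every form silently.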

Behavioral data acquisition and analysis
For the covert button-press responses, misses (no button press) and extremely fast responses (reaction times < 100 ms) were first removed. All statistical analyses were carried out on the remaining trials. Because the task was not designed to test how quickly participants responded, reaction times are not reliable measures of processing speed in this paradigm, and we therefore report only error rates.
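A minimal sketch of this exclusion step follows, assuming each covert trial is coded as a pair (judgement correct, reaction time in ms) with misses coded as (None, None). The coding scheme and the function name are hypothetical; only the 100-ms cutoff and the exclusion of misses come from the text.

```python
from typing import Optional, Sequence, Tuple

# One covert trial: (judgement_correct, reaction_time_ms). A miss (no button
# press) is coded as (None, None). This coding scheme is hypothetical.
CovertTrial = Tuple[Optional[bool], Optional[float]]


def covert_error_rate(trials: Sequence[CovertTrial], min_rt_ms: float = 100.0) -> float:
    """Percentage of incorrect judgements after dropping misses and RTs < min_rt_ms."""
    valid = [ok for ok, rt in trials if ok is not None and rt is not None and rt >= min_rt_ms]
    if not valid:
        raise ValueError("no valid trials left after exclusion")
    errors = sum(1 for ok in valid if not ok)
    return 100.0 * errors / len(valid)
```

For instance, with one miss, one anticipatory 80-ms response, and three valid trials of which one is incorrect, the function returns an error rate of about 33.3%.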
The accuracy of participants' overt speech was evaluated offline by two independent Finnish-speaking raters. Inter-rater agreement (McHugh, 2012) was 98.8%, with an intraclass correlation coefficient (Pearson) of 0.83. The final error rate for the overt spoken responses was determined after excluding unintelligible and missed trials. One participant's overt data had to be excluded because of an MRI noise-filter malfunction, which made the voice responses indiscernible from scanner noise; however, this participant successfully performed the button-press task, which comprised 85% of the experimental trials. The statistical analysis was carried out on the remaining trials.
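Percent agreement between two raters can be computed directly from their trial-by-trial codes. The sketch below also includes Cohen's kappa, a common chance-corrected alternative, purely for comparison; the study itself reports percent agreement and an intraclass correlation, not kappa, and the function names and example codes are hypothetical.

```python
from collections import Counter
from typing import Sequence


def _check(a: Sequence[str], b: Sequence[str]) -> None:
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of trials (in %) on which the two raters gave the same code."""
    _check(a, b)
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    _check(a, b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement if both raters assigned codes independently.
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)
```

When most responses are correct, as here, percent agreement can be high partly by chance; kappa discounts that chance agreement, which is why the two measures are often reported together.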

fMRI data acquisition and analysis
Whole-brain functional and anatomical images were acquired using a 3T Siemens MAGNETOM Skyra MRI scanner (Siemens Healthcare, Erlangen, Germany) with Syngo MR D13C software and a 32-channel head-neck coil. Functional images were acquired using a T2*-weighted gradient-echo echo-planar imaging (EPI) pulse sequence with 42 axial slices of 3.5 mm thickness and no inter-slice gap. Data analysis was carried out in FSL version 5.0 (Jenkinson et al., 2012). Motion correction was performed using MCFLIRT (Jenkinson et al., 2002). Data quality and possible motion artifacts from the overtly spoken trials were carefully checked, as such artifacts could have introduced noise into the data. Image distortion was corrected using EPI unwarping (unwarp direction: y; EPI TE: 35 ms; EPI dwell time: 0.7 ms; signal loss threshold: 10%). Each image was spatially smoothed with a Gaussian kernel with a full width at half maximum (FWHM) of 9 mm (Pajula & Tohka, 2014); grand-mean intensity normalization of the entire 4D dataset was performed using a single multiplicative factor; and high-pass temporal filtering was applied using Gaussian-weighted least-squares straight-line fitting, with sigma = 30.0 s.
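Smoothing kernels are usually specified by their FWHM, whereas some tools (for example, fslmaths with `-s`) take the Gaussian sigma instead. The conversion is σ = FWHM / (2√(2 ln 2)), so the 9-mm kernel used here corresponds to σ ≈ 3.82 mm. The short Python sketch below, with hypothetical function names, makes this explicit and checks that the profile falls to half its peak at ±FWHM/2.

```python
import math

FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # about 1 / 2.3548


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Standard deviation of a Gaussian kernel with the given FWHM."""
    return fwhm_mm * FWHM_TO_SIGMA


def gaussian_profile(x_mm: float, sigma_mm: float) -> float:
    """Unnormalized Gaussian, equal to 1.0 at the kernel centre."""
    return math.exp(-(x_mm ** 2) / (2.0 * sigma_mm ** 2))
```

Evaluating `gaussian_profile(4.5, fwhm_to_sigma(9.0))` gives 0.5, which is exactly the definition of a full width at half maximum of 9 mm.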
First-level analysis was carried out using FSL's fMRI Expert Analysis Tool (FEAT) version 6.0. The six stem types (CG-hf, CG-mf, CG-p; NoCG-hf, NoCG-mf, NoCG-p) were modeled in a general linear model (GLM) as separate explanatory variables (EVs) (see Supplementary Table 1). All participants' data were registered to MNI space using FLIRT (Greve et al., 2009; Jenkinson et al., 2001; Jenkinson et al., 2002). B0 unwarping and simultaneous registration were carried out with boundary-based registration (BBR) with 3 degrees of freedom to the initial structural image.
Based on the first-level statistics, we performed six 2 × 2 repeated-measures ANOVAs across subjects using FSL. The analyses were set up in this way because FSL does not allow the correction for sphericity that is necessary in repeated-measures ANOVAs with more than two levels per factor (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/GLM). Significance was determined from Z-transformed F-maps in FEAT (version 6.0; FMRIB's Software Library, www.fmrib.ox.ac.uk/fsl), using a cluster-forming threshold of Z > 2.3 and a cluster significance threshold of p < 0.05 (Worsley, 2001). The rendered images were made using FreeSurfer (version 6.0, freesurfer.net), and the coordinates of the significant cortical regions were retrieved using FreeSurfer's Freeview and automatic cortical parcellation (Destrieux et al., 2010). The coordinates and Z-scores of the subcortical and cerebellar areas were retrieved with FSL's (version 5.0) FSLeyes.
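The cluster-forming threshold Z > 2.3 corresponds to an uncorrected, one-sided voxelwise p of about 0.011. The sketch below shows only this conversion (function names are hypothetical); the subsequent cluster-level p < 0.05 test, which FSL bases on Gaussian random field theory, is a separate step that is not implemented here.

```python
import math
from statistics import NormalDist


def z_to_p_one_sided(z: float) -> float:
    """Upper-tail probability P(Z > z) of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_to_z_one_sided(p: float) -> float:
    """Z value whose upper-tail probability equals p."""
    return NormalDist().inv_cdf(1.0 - p)
```

Here `z_to_p_one_sided(2.3)` evaluates to roughly 0.0107, and `p_to_z_one_sided` inverts it back to 2.3.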

Region of interest (ROI) analysis
Based on our a priori theory-driven assumptions and the significant main effects in the whole-brain ANOVAs, we selected BA44, BA45, and the MFG as regions of interest (ROIs). For each ROI, the mean percentage signal change relative to silent rest was extracted for the six conditions: CG-hf, CG-mf, CG-p; NoCG-hf, NoCG-mf, NoCG-p. ROIs were defined using the Juelich histological atlas for BA44 and BA45 (Amunts et al., 1999); the MNI Cerebellar Atlas in MNI-152 Space After Normalization with FLIRT for the cerebellar regions Crus I, Crus II, VI, VIIb, and VIIIA (Diedrichsen et al., 2009); and the Harvard-Oxford Subcortical Structural Atlas for the thalamus, caudate, putamen, and pallidum.
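The text does not spell out how percentage signal change was computed. A common simplified form scales each voxel's GLM parameter estimate by the height of the convolved regressor and by the voxel's mean baseline signal, then averages across the ROI; tools such as FSL's Featquery apply a related scaling. The sketch below assumes that form, and its inputs and function name are hypothetical, so treat it as an approximation rather than the authors' exact procedure.

```python
from statistics import fmean
from typing import Sequence


def roi_percent_signal_change(
    betas: Sequence[float],
    baselines: Sequence[float],
    regressor_height: float = 1.0,
) -> float:
    """Mean % signal change across the voxels of one ROI for one condition.

    betas: per-voxel parameter estimates for the condition (vs. implicit rest)
    baselines: per-voxel mean signal, in the same voxel order
    regressor_height: peak height of the convolved regressor (assumed known)
    """
    if len(betas) != len(baselines) or not betas:
        raise ValueError("betas and baselines must be non-empty and aligned")
    if any(m <= 0 for m in baselines):
        raise ValueError("baseline signal must be positive")
    per_voxel = [100.0 * b * regressor_height / m for b, m in zip(betas, baselines)]
    return fmean(per_voxel)
```

Expressing effects as a percentage of each voxel's own baseline makes values comparable across regions with different raw intensities, which is why ROI bar charts are typically reported in these units.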

Behavioral data
Statistical analyses were performed in IBM SPSS Statistics, Version 25.0 (Armonk, NY: IBM Corp). Error rates in the covert production task were analyzed with a two-way repeated-measures ANOVA with the factors Gradation (NoCG, CG) and Word Status (pseudowords, high frequency real words, medium frequency real words). Where sphericity was violated, Greenhouse-Geisser adjustments were applied (corrected degrees of freedom are reported). The ANOVA showed a main effect of Gradation [F(1,14) = 57.24, p < 0.0001, η² = 0.804], because CG stems elicited higher error rates, and a main effect of Word Status [F(1.458,20.416) = 56.81, p < 0.0001, η² = 0.802], because pseudoword stems yielded higher error rates than high and medium frequency real word stems. There was also a Consonant Gradation × Word Status interaction [F(1.357,18.99) = 28.08, p < 0.0001, η² = 0.66], because the gradation effect was strongest for inflected forms with pseudoword stems. See Fig. 2 for post hoc pairwise comparisons (Bonferroni corrected).
For overt responses, errors occurred only in the CG-mf condition (1.5%, SEM: 1.03) and the CG-p condition (12.08%, SEM: 2.84). Performance was error-free in all other conditions.
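The effect sizes reported for these repeated-measures effects agree, to within about 0.01, with partial eta squared computed from each F ratio and its degrees of freedom, η²p = F·df1 / (F·df1 + df2). The sketch below (function name hypothetical) reproduces them; note that the Greenhouse-Geisser ε scales both degrees of freedom and therefore cancels, so corrected and uncorrected df give the same value.

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    if f < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    return f * df_effect / (f * df_effect + df_error)
```

For example, `partial_eta_squared(57.24, 1, 14)` is about 0.803 and `partial_eta_squared(56.81, 1.458, 20.416)` is about 0.802, matching the Gradation and Word Status effects up to rounding of the reported F values.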

fMRI results
For the whole-brain analysis, the roles of Word Status (frequency and lexicality) and gradation were investigated using six separate two-way ANOVAs (cluster-forming threshold Z > 2.3, cluster probability p < 0.05; Worsley, 2001). We observed significant main effects of Consonant Gradation in the following ANOVAs: NoCG-hf and NoCG-mf vs. CG-hf and CG-mf (Fig. 3a); and NoCG-hf and NoCG-p vs. CG-hf and CG-p (Fig. 3c). In both cases, CG stems, regardless of their frequency or lexicality, yielded a greater response than NoCG stems (see the rendered images in Fig. 3 and the complete list of significant brain areas in Supplementary Table 1). We also observed significant main effects of Word Status in the following ANOVAs: NoCG- and CG-hf vs. NoCG- and CG-p (Fig. 3b); and NoCG- and CG-mf vs. NoCG- and CG-p (Fig. 3d).

Regions of interest results
Three separate four-way ANOVAs were performed for the different ROI groups. The first group consisted of frontal regions (BA44, BA45, MFG; Fig. 4), the second of subcortical areas (caudate, pallidum, putamen, thalamus; Supplementary Fig. 1A), and the third of subcomponents of the cerebellum (Crus I, Crus II, VI, VIIB, VIIIA; Supplementary Fig. 1B). In each ANOVA, the factor Region was combined with the factors Hemisphere (left, right), Consonant Gradation (CG, NoCG), and Word Status (high, medium, pseudo). Greenhouse-Geisser corrections were applied where necessary.
[…] [F(1.449,20.289) = 13.167, p = 0.001, η² = 0.485]. The main effect of Consonant Gradation was due to higher activation for CG stems than for NoCG stems. The main effect of Word Status in the frontal ROIs was due to lower activation for pseudoword stems than for medium frequency stems (mean difference: 0.071; p = 0.009) and high frequency stems (mean difference: 0.072; p = 0.002). The interaction between Region and Consonant Gradation was also significant [F(2,28) = 8.4, p = 0.001, η² = 0.375]. A post hoc test revealed significantly higher BOLD activation for CG stems than for NoCG stems in all three frontal regions […].
In the ANOVAs for the cerebellar and subcortical regions, there were only significant main effects of Word Status [F(2,28) = 9.9, p = 0.001, η² = 0.415; F(1.304,18.257) = 6.7, p = 0.013, η² = 0.325]. In both cases, this was due to lower BOLD activation during covert and overt production of inflected pseudowords than of inflected forms with high frequency real stems (subcortical mean difference: 0.043, p = 0.003; cerebellar mean difference: 0.060, p = 0.007; see Supplementary Fig. 1).
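The Greenhouse-Geisser correction used in these ANOVAs rescales degrees of freedom by an estimated ε. The Python sketch below implements the standard textbook estimator for a single repeated-measures factor with k levels, ε = (tr D)² / ((k − 1) tr D²), where D is the double-centred covariance matrix of the conditions. It is not SPSS's code, and for interactions or multi-factor designs ε is computed on the relevant contrast space; the function names are hypothetical. As a worked check on the reported numbers: for the three-level Word Status factor with 15 participants, df1 = 1.458 implies ε ≈ 0.729, and 0.729 × 2 × 14 ≈ 20.41, consistent with the reported error df of 20.416.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def covariance_matrix(data: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance between conditions; data[i][j] = subject i, condition j."""
    n, k = len(data), len(data[0])
    if n < 2:
        raise ValueError("need at least two subjects")
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1) for b in range(k)]
        for a in range(k)
    ]


def greenhouse_geisser_epsilon(cov: Sequence[Sequence[float]]) -> float:
    """GG epsilon for a one-way repeated-measures factor with k levels."""
    k = len(cov)
    row_means = [sum(cov[i]) / k for i in range(k)]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    grand_mean = sum(row_means) / k
    # Double-centring removes the overall-mean direction, leaving the
    # covariance structure of the k - 1 orthogonal contrasts.
    d = [[cov[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(k)] for i in range(k)]
    trace = sum(d[i][i] for i in range(k))
    trace_of_square = sum(d[i][j] * d[j][i] for i in range(k) for j in range(k))
    if trace_of_square == 0:
        raise ValueError("no variance in the condition differences")
    return trace ** 2 / ((k - 1) * trace_of_square)


def corrected_df(epsilon: float, k: int, n: int) -> Tuple[float, float]:
    """Greenhouse-Geisser corrected (effect, error) degrees of freedom."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)
```

ε always lies between 1/(k − 1) and 1: it equals 1 when sphericity holds exactly and approaches the lower bound as the condition differences become dominated by a single direction of variance.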

Discussion
We investigated the brain mechanisms underlying covert and overt production of inflected words and pseudowords in Finnish. We included real word stems of medium or high frequency, as well as pseudoword stems. When inflected in the inessive case, the stems either underwent CG stem changes or did not. Our behavioral results on covert production showed significantly higher error rates for the production of inflected forms containing CG stems than for those with NoCG stems. Irrespective of CG, there were more errors when inflecting pseudoword stems than when inflecting real word stems of medium and high frequency. In overt production, errors occurred only in the CG-mf and CG-p conditions. The lower error rate in the overt production task, compared to covert production, suggests that phonological speech errors may be internally monitored before the inflected form is uttered aloud. One explanation for the higher error rates in covert production is offered by the covert repair hypothesis (Postma & Kolk, 1993). According to this hypothesis, the covert production task could trigger purely lexical access, since it does not require the internal phonological check for correctness that overt production does (Hartsuiker et al., 2005). This would also explain the increased error rates in covert production for CG stems, as this phenomenon is purely morphophonological in nature (Gehring & Heinzmann, 2011; Karlsson, 2008). Our ROI results showed significantly greater BOLD activation in BA44, BA45, and the MFG when inflecting medium and high frequency stems than when inflecting pseudoword stems. In the same three ROIs, BOLD activation was significantly greater for inflected CG stems than for NoCG stems. In addition, subcortical and cerebellar regions showed higher activation for forms with high and medium frequency stems than for those with pseudoword stems. Next, we discuss each result in more detail.
According to the Stem Allomorph/Inflectional Decomposition (SAID) model (Niemi et al., 1994), native speakers of Finnish use combinatorial processing for inflected words during both recognition and production, except for those of very high frequency. However, we found no significant differences in error rates between high and medium frequency stems during covert production. Our results on the production of inflected forms suggest that inflected forms of both high and medium frequency items are stored as whole forms in the output lexicon. This corroborates the findings of Lehtonen et al. (2006), who reported that the recognition of high and medium frequency inflected forms appeared to proceed holistically.
Moreover, the lower error rates for NoCG stems suggest that inflection of NoCG-mf and NoCG-hf stems might activate the full form, whereas the higher error rates for CG stems could signify errors during stem allomorph lookup. Additionally, we observed higher error rates for CG-p stems than for NoCG-p stems, suggesting the application of combinatorial processes when producing forms with a pseudoword stem. Our results are in line with the work of Laine and colleagues (1999), who observed increased error rates for inflected forms consisting of pseudoword stems and real suffixes compared with monomorphemic pseudowords. We also suggest that the higher error rates for inflections with CG stems, compared to NoCG stems, could be attributed to the need for online stem gradation, or to the increased effort of activating the correct allomorph prior to suffix concatenation (Niemi et al., 1994).
In line with our initial expectations, (c)overt production of inflections with high frequency stems elicited the highest BOLD activation in the LIFG. This is also consistent with previous findings on the recognition of real words and pseudowords. For instance, in a visual lexical decision paradigm, Binder et al. (2003) reported that although pseudoword processing was behaviorally more taxing, real words elicited stronger BOLD activation than nonwords. This pattern of activity has also been observed in neuromagnetic studies, where monomorphemic real words yielded stronger neural activation than pseudowords (MacGregor et al., 2012; Shtyrov et al., 2007). Thus, the lexicality effect observed in all three of our ROI groups (frontal, subcortical, and cerebellar) supports the assumption that these areas are involved in lexical access (Ullman, 2001, 2006), including access to the output lexicon. Moreover, the subcortical and cerebellar structures were sensitive only to Word Status, not to CG. Our word frequency finding is in line with electrophysiological studies (e.g., Alexandrov et al., 2011; Shtyrov et al., 2011), in which stronger neural activation was interpreted as reflecting stronger memory traces for high frequency words than for lower frequency words. In our study, pseudoword stems elicited the least BOLD activation in lobule VIIB for both the CG and NoCG conditions. VIIB has been commonly associated with working memory processes and the creation of motor memory traces (Durisko & Fiez, 2010; Stoodley & Schmahmann, 2009). Our finding suggests that VIIB might be involved in, for instance, the articulatory rehearsal needed for speech production.

Fig. 4. Region of Interest.
Bar charts represent mean % signal change (±SEM) relative to rest. The color-coded brain plots are for illustrative purposes only and accompany the bar graphs.
We could not fully align our neuroimaging results with any particular model. Recall that according to Ullman's (2016) DP model, BA45 is related to lexical retrieval, whereas BA44 is related to rule-governed grammatical operations. The CDM, in turn, proposes that BA45 is associated with morphological decomposition (Marslen-Wilson & Tyler, 2007; see also Schremm et al., 2018). However, we did not observe significant differences between BA44 and BA45 activation, even though our stimuli specifically targeted rule application (i.e., CG stems) and lexical retrieval (Word Status, i.e., word frequency and lexicality). Hence, we found no concrete evidence for a functional specificity of these areas, which requires closer scrutiny in future work.
Our prediction of a greater BOLD signal in the three frontal ROIs for CG stems than for NoCG stems was confirmed. Recent studies on the processing of Finnish allomorphy have examined stem frequency and productivity (inflectional regularity and irregularity; Nikolaev et al., 2014; Nikolaev et al., 2018). In addition to stem frequency, the number of allomorphs a stem has may affect processing. Namely, inflected words are recognized more quickly for unproductive stems that have at least three allomorphs (e.g., 'vesi' 'water': ves-, vet-, vete-, ved-) than for productive stems (e.g., 'muovi' 'plastic': muovi-, muove-). Nikolaev and colleagues therefore postulated that a higher number of allomorphs facilitates recognition, because during lexical access the allomorphs are unified, since no meaning is assigned until a suffix is attached. Thus, there might be a stronger or broader neuronal network for words with more stem allomorphs than for words with fewer or no stem variants. Although we did not specifically contrast productive and unproductive allomorphs, we nevertheless observed stronger BOLD activity for stems with more allomorphs (CG) than for those with fewer (NoCG). The significant effect of CG in the frontal ROIs suggests that the MFG, BA44, and BA45 are involved in allomorph retrieval.
Finally, according to the SAID model, morphological decomposition during recognition is applied to all but highly frequent inflected forms and takes place before the semantic integration of the morphological constituents. We observed an increased BOLD signal for real word CG stems with high frequency and lower BOLD activation for CG pseudoword stems. Hence, our findings suggest that the production of inflected forms is compositional even for highly frequent stems when inflection requires consonant gradation.

Summary and conclusions
Our findings suggest that the increased BOLD signal in the frontal, subcortical, and cerebellar ROIs may be related to lexical memory traces of stem or allomorph frequency. Our findings also suggest a facilitation effect when producing inflected forms with CG stems. More specifically, all stem allomorphs are initially activated, and a higher number of allomorph variants strengthens the overall memory trace of the stem. This initial activation is followed by the selection of the proper stem allomorph and its concatenation with the correct suffix. We therefore propose that inflected forms of CG stems are produced by activating all stem allomorphs, and that this process is tied to a network of frontal areas (BA44, BA45, and MFG). To further investigate the effect of stem allomorphs on the production of inflected forms, future research should also control for the number of allomorphs a stem possesses, as well as stem productivity.