Sensorimotor contributions to working memory differ between the discrimination of Same and Different syllable pairs

Sensorimotor activity during speech perception is both pervasive and highly variable, changing as a function of the cognitive demands imposed by the task. The purpose of the current study was to evaluate whether the discrimination of Same (matched) and Different (unmatched) syllable pairs elicits different patterns of sensorimotor activity as stimuli are processed in working memory. Raw EEG data recorded from 42 participants were decomposed with independent component analysis to identify bilateral sensorimotor mu rhythms in 36 subjects. Time-frequency decomposition of mu rhythms revealed concurrent event-related desynchronization (ERD) in alpha and beta frequency bands across the peri- and post-stimulus time periods, which was interpreted as evidence of sensorimotor contributions to working memory encoding and maintenance. Left hemisphere alpha/beta ERD was stronger in Different trials than Same trials during the post-stimulus period, while right hemisphere alpha/beta ERD was stronger in Same trials than Different trials. A between-hemispheres contrast revealed no differences during Same trials, while post-stimulus alpha/beta ERD was stronger in the left hemisphere than the right during Different trials. Results were interpreted to suggest that predictive coding mechanisms lead to repetition suppression effects in Same trials, whereas mismatches arising from predictive coding mechanisms in Different trials shift subsequent working memory processing to the speech-dominant left hemisphere. Findings clarify how sensorimotor activity differentially supports working memory encoding and maintenance during speech discrimination tasks and have the potential to inform sensorimotor models of speech perception and working memory.


Introduction
Anterior portions of the dorsal stream (i.e., motor/premotor cortices), responsible for linking sound to action (Hickok and Poeppel, 2000), reliably activate during speech perception (Callan et al., 2006, 2010; Skipper et al., 2007; Osnes et al., 2011; Bowers et al., 2014; Oliveira et al., 2021). Despite the preponderance of studies demonstrating this sensorimotor activity, its functional role remains somewhat unclear, though it appears to be related to the cognitive demands of the perceptual task (Deng et al., 2012; Peschke et al., 2012; Wostmann et al., 2017). For example, tasks requiring active discrimination of phonemes typically elicit stronger sensorimotor activity than those requiring only passive listening (Meister et al., 2007; Bowers et al., 2013; Alho et al., 2014). In these classic discrimination tasks, pairs of stimuli are held in working memory whilst a same/different discrimination is made. Time-sensitive neuroimaging studies (Bowers et al., 2013; Jenson et al., 2014a) have demonstrated that the greatest sensorimotor activity elicited by these tasks occurs immediately following offset of the second stimulus; these findings are consistent with a strong role for sensorimotor contributions to working memory (Burton et al., 2000; Sato et al., 2009; Hickok et al., 2011).
Within speech perception tasks that invoke working memory, a number of variables have been identified that influence the degree of sensorimotor contribution. These variables include signal clarity (Osnes et al., 2011;Jenson et al., 2019b), stimulus complexity (Hakonen et al., 2016), processing demands (Scharinger et al., 2017) and maintenance load (Pesonen et al., 2006;Wilsch and Obleser, 2016). The purpose of the current study is to probe the influence of another variable on sensorimotor contributions to working memory. Typically, in classic syllable discrimination studies, data from all correctly discriminated trials are aggregated for analysis without the consideration that matched (i.e., same) pairs of stimuli may be processed within working memory somewhat differently than unmatched (i.e., different) pairs. While there is evidence from the visual domain to suggest differential working memory processing of matched and unmatched stimulus pairs (Engel and Wang, 2011), to date this phenomenon remains unexplored in speech discrimination. To probe how sensorimotor contributions to working memory may differ on the basis of trial type (i.e., same vs. different), it is first necessary to consider how sensorimotor activity supports speech discrimination.
Working memory involvement in speech discrimination is comprised of distinct phases, operating in concert to allow stimuli to be retained and processed. The first stage involves the extraction of a phonological form (i.e., articulatory code) from the available sensory trace (Jacquemot and Scott, 2006), a process deemed essential as the original auditory signal is subject to rapid decay (Wilsch and Obleser, 2016). Sensorimotor activity during this stage is mediated by the cognitive effort required to extract a phonological form, reflected in processing load effects over fronto-central regions (Scharinger et al., 2017). Following extraction of a phonological representation, covert articulatory processes serve to refresh working memory contents (Wilson, 2001;Buchsbaum et al., 2005), akin to Baddeley (2003)'s phonological loop. Sensorimotor contributions to covert articulatory processes are demonstrated by load effects over anterior dorsal stream regions during the maintenance phase of working memory (Woodward et al., 2006;Perrachione et al., 2017). While sensorimotor activity across the anterior dorsal stream clearly supports multiple components of working memory, it is now essential to consider how these processes may be impacted by trial type.
Activation of anterior sensorimotor regions during speech perception demonstrates somatotopic specificity (Pulvermuller et al., 2006;Skipper et al., 2007;Bartoli et al., 2016), such that perception of different speech tokens activates distinct neuronal populations. An important consequence of this sensorimotor somatotopy is that unmatched trials activate two distinct patches of cortex in the anterior dorsal stream, while matched trials lead to the recurrent activation of a single cortical patch. This raises the possibility that anterior sensorimotor activity during matched trials may be influenced by repetition priming, in which recently activated items are processed more rapidly and efficiently than novel items (Schacter and Buckner, 1998;Henson, 2003). Priming effects are widespread across the brain (Grill-Spector et al., 2006;Eckers et al., 2013) and are characterized by repetition suppression (Auksztulewicz and Friston, 2016;Korzeniewska et al., 2020), the attenuation of neural responses to repeated stimuli (Schacter and Buckner, 1998;Wiggs and Martin, 1998;Henson and Rugg, 2003). Hemodynamic studies of priming effects during speech perception have identified repetition suppression over anterior sensorimotor regions (Wagner et al., 1997;Buckner and Koutstaal, 1998;Buckner et al., 2000;Van Turennout et al., 2003;Orfanidou et al., 2006) during working memory processing, suggesting that priming effects may mediate sensorimotor contributions to working memory. However, this interpretation is complicated by the lack of temporal precision inherent to hemodynamic studies. Specifically, it is possible neither to clearly attribute observed differences to the post-stimulus window when working memory processes are deployed, nor to disentangle the differential influence of priming on encoding and maintenance stages of working memory. 
To resolve these ambiguities, it is necessary to compare, with high temporal precision, the anterior sensorimotor activity associated with the discrimination of matched and unmatched stimulus pairs.
The precise temporal resolution of electroencephalography (EEG) makes it well-suited for identifying and characterizing anterior sensorimotor activity during speech discrimination. EEG can be used to capture the mu rhythm (Hari, 2006), an oscillatory marker of sensorimotor function typically recorded over anterior dorsal stream regions (Pineda, 2005;Tamura et al., 2012) and consisting of peaks in alpha (~10 Hz; sensory) and beta (~20 Hz; motor) frequency bands. Mu oscillations can be decomposed across time and frequency with event related spectral perturbations (ERSP) to reveal patterns of synchronization (ERS) and desynchronization (ERD), corresponding to cortical inhibition and excitation, respectively (Pfurtscheller and Lopes da Silva, 1999). Beta ERD emerges during diverse movement tasks including walking (Seeber et al., 2014), reaching (Frenkel-Toledo et al., 2013), and swallowing (Cuellar et al., 2016), and has been associated with motor execution (Zaepffel et al., 2013). However, beta ERD also emerges during action observation (Denis et al., 2017) and motor imagery (Brinkman et al., 2014), suggesting that beta oscillations also encode the motor to sensory transformations (i.e., forward models; Wolpert and Flanagan, 2001) supporting mental simulation of action. Similar to beta, alpha ERD emerges in movement tasks, and is thought to be associated with the processing of the primary sensory response to assist with sensory guidance of movement (Tamura et al., 2012;Pineda et al., 2013;Quandt et al., 2013;Babiloni et al., 2016). However, the presence of alpha ERD during passive limb movement (Kuhlman, 1978;Arroyo et al., 1993) and action observation (Muthukumaraswamy and Johnson, 2004;Cannon et al., 2014) suggests that the mu alpha band also encodes sensory to motor transformations (inverse models; Miall, 2003) which map sensory signals onto motor-based representations. 
While activity in alpha and beta bands is often correlated (Carlqvist et al., 2005), they encode distinct processes (Sebastiani et al., 2014) and hold potential for clarifying the dynamics of sensorimotor activity over the anterior dorsal stream (Jenson et al., 2014b;Saltuklaroglu et al., 2018;Jenson et al., 2019a,b). However, it is first essential to determine how these sensorimotor oscillations unfold during speech discrimination tasks.
To probe sensorimotor dynamics during speech discrimination, Jenson et al. (2014a) employed EEG to identify the mu rhythm during the accurate discrimination of /ba/ /da/ syllable pairs, using ERSP analyses to decompose mu oscillations into alpha and beta frequency bands. While activity was observed across the trial, the strongest patterns of activity emerged in the late trial epoch. Specifically, alpha and beta ERD emerged in the late peri-stimulus window, with the magnitude of ERD increasing during the post-stimulus window. This pattern has been observed in a growing corpus of speech discrimination studies (Jenson et al., 2014a, 2015; Saltuklaroglu et al., 2017; Thornton et al., 2017, 2019), suggesting that it may characterize neural responses to such tasks. As this pattern of concurrent alpha and beta ERD is observed both during overt speech production (Gunji et al., 2007; Tamura et al., 2012; Kittilstved et al., 2018) and in working memory tasks (Tsoneva et al., 2011; Behmer and Fournier, 2014), results were interpreted as evidence of covert rehearsal to facilitate working memory maintenance. This is consistent with notions of covert production being instantiated by paired forward and inverse models (Pickering and Garrod, 2013) and the association of beta and alpha frequency bands with these models, respectively (Sebastiani et al., 2014). However, based on the results of Jenson et al. (2014a), it is not possible to clarify the contributions of encoding and maintenance stages of working memory to mu oscillations.
In a follow up study, Jenson et al. (2019b) probed the mu rhythm during the discrimination of degraded and non-degraded syllable pairs. While results were similar to those reported in Jenson et al. (2014a), a delayed transition from weak to strong post-stimulus alpha/beta ERD was observed in degraded conditions. This was interpreted as evidence of a prolonged encoding stage (Jacquemot and Scott, 2006) prior to engagement of covert rehearsal to subserve working memory maintenance. Under this framework, both encoding and maintenance stages of working memory are encoded in mu oscillations, with the transition between stages marked by the dramatic increase in magnitude of ERD across alpha and beta frequency bands. As sensorimotor contributions to working memory processes can be probed over time through ERSP decomposition of mu oscillations, it is critical to consider how priming may influence mu activity. To date, no studies have examined repetition priming effects on the sensorimotor mu rhythm, though studies examining the oscillatory consequences of repetition priming over auditory regions have reported reduced ERD for primed compared to unprimed stimuli (Tavabi et al., 2011;Brennan et al., 2014). These results may be interpreted to suggest that a decrement in the magnitude of ERD across alpha and beta bands of the mu rhythm may be considered a marker of priming effects in the anterior dorsal stream.
The over-arching goal of the current study is to determine the influence of trial type (i.e., matched vs. unmatched) on sensorimotor activity during speech discrimination. More specifically, it is to determine the influence of priming effects on sensorimotor mu activity during the working memory phase of speech discrimination. In accord with the notion that matched trials will elicit repetition priming during both encoding and maintenance stages of working memory, it is hypothesized that the magnitude of alpha and beta ERD will be weaker for matched compared to unmatched trials across the late peri-stimulus and post-stimulus timeframe. In line with proposals that primed items are processed more quickly than unprimed items (Grill-Spector et al., 2006), that encoding is followed immediately by maintenance (Heinrichs-Graham and Wilson, 2015), and that a spike in the magnitude of mu ERD reflects the transition from encoding to maintenance stages of working memory (Jenson et al., 2019b), it is further hypothesized that the increase in the magnitude of mu ERD will occur earlier for matched than unmatched trials. Support for these hypotheses will shed light on the manner in which sensorimotor contributions to working memory are modulated by stimulus-specific features, further clarifying the dynamic contributions of anterior dorsal stream regions to perceptual processes.

Participants
The study cohort consisted of 42 female native English speakers (mean age = 24.1 years; 3 left-handed) with no history of hearing impairment, or communicative, cognitive, or attentional disorders. The Edinburgh Handedness Inventory (Oldfield, 1971) was used for the assessment of handedness. All subjects provided informed consent prior to participation in accordance with the ethical considerations of the Declaration of Helsinki. It was deemed necessary to restrict the subject pool to a single sex given reports that sensorimotor processing strategies may differ between males and females (Popovich et al., 2010; Kumari, 2011; Thornton et al., 2019), and females were chosen as a sample of convenience.

Stimuli
To generate the stimulus pairs for the active discrimination tasks in the current study, CV syllables comprised of a voiced consonant (i.e., /b/, /d/, /g/, /l/) paired with a vowel (/i/, /ɑ/, /ɛ/) were recorded by a male native English speaker on an AKG C520 microphone paired with a Mackie 402-VLZ3 pre-amp and a Krohn-Hite Model 3384 amplifier. Since the dorsal stream responds more robustly to voices of the opposite sex (Junger et al., 2013), pairing a male speaker with an exclusively female participant cohort was expected to increase the sensitivity of the analyses. Recordings were bandpass filtered from 20 Hz to 20 kHz and digitized at 44.1 kHz. In order to minimize the potential effects of lexicality (Pratt et al., 1994; Chiappe et al., 2001; Kotz et al., 2010; Ostrand et al., 2016), two syllables (/bi/, /li/) were excluded, leaving ten distinct syllables. Ten tokens of each syllable were recorded, with the best exemplar of each syllable chosen for use in the study on the basis of overall duration, vowel clarity, and consonant clarity. The selected speech tokens were filtered from 300 to 3400 Hz (Callan et al., 2010; Jenson et al., 2019b), then normalized for intensity (70 dB SPL) and duration (200 ms) in Audacity 2.0.6.
Stimulus pairs were generated from the normalized speech tokens, with 200 ms of silence inserted between the individual syllables and 1400 ms of silence following the offset of the second syllable. Thus, the total length of stimuli was 2 seconds. While segmentation is not necessary for successful syllable discrimination, it should be noted that segmentation is known to modulate activity in anterior aspects of the dorsal stream (Burton et al., 2000;Locasto et al., 2004;Sato et al., 2009;Thornton et al., 2017). To avoid any potential confounds of segmentation effects, syllable pairs within an individual trial were allowed to differ only by the initial consonant. Subject to this limitation, 36 distinct syllable pairs were generated. Auditory stimuli for the control condition consisted of white noise presented at 70 dB SPL.
In contrast to our previous work, which employed synthetic speech tokens (Jenson et al., 2014b; Saltuklaroglu et al., 2017; Bowers et al., 2019; Thornton et al., 2019), stimuli for the current study were generated from natural speech signals to increase the ecological validity of the tasks. It should therefore be considered whether this transition in stimuli has the potential to influence results. The mu rhythm robustly responds to both natural (Crawcour et al., 2009; Antognini and Daum, 2019) and synthetic (Ulloa and Pineda, 2007; Thornton et al., 2017) stimuli, and we are unaware of any published studies demonstrating differential sensorimotor responses to natural and synthetic speech signals. It is therefore deemed unlikely that this transition meaningfully influenced the results, and findings are interpreted within the framework of our existing body of work.

Design
The data for the current study comprise a subset of conditions from the larger experimental paradigm reported in Jenson et al. (2019b). Briefly, Jenson et al. (2019b) employed a 2 × 3 (set size × signal clarity) within-subjects design referenced to a control condition (7 total conditions), with subjects performing active discrimination of CV syllable pairs in all experimental conditions. The levels of set size were Small (discrimination of /ba/ and /da/; condition 2 below) and Large (discrimination of the full complement of CV syllable pairs; condition 3 below). The levels of signal clarity were Quiet, Noise-masked, and Narrow-band filtered. The data reported in the current manuscript were drawn from both levels of set size and a single level of signal clarity (i.e., Quiet).
The notion of "enough" data is difficult to quantify, as the non-Gaussian nature of EEG signals precludes the use of standard power analyses to determine sufficient sample size. However, a frequently employed heuristic in this type of research is that approximately 30 × (number of channels)² datapoints are necessary to achieve a stable and reliable ICA decomposition. With 66 channels, a sampling frequency of 256 Hz, and a 5-second trial epoch, approximately 102 trials are necessary. Thus, in order to achieve a reliable ICA decomposition in the current study, it was necessary to aggregate trials from the Small and Large set conditions, separating them into Same and Different. However, it was first necessary to determine whether set size had an impact on sensorimotor activity, potentially influencing the current analyses. The published results of Jenson et al. (2019b) demonstrate that there was no effect of set size on sensorimotor activity at any time-frequency point, considered both within and across levels of signal clarity. Additionally, there was no interaction between set size and signal clarity. Thus, in both Jenson et al. (2019b) and the analyses reported herein, data were collapsed across the levels of set size to increase statistical power [for a more detailed explanation of Jenson et al. (2019b), please see Supplemental Materials]. The conditions presented to subjects were thus:
1. Passive listening to white noise
2. Discrimination of /ba da/ syllable pairs (4 possible pairings)
3. Discrimination of the full complement of CV syllable pairs (36 possible pairings)
Condition 1 was used as a control condition, and conditions 2 and 3 were active discrimination conditions. In order to control for the presence of a movement task (i.e., button press) in the active discrimination conditions (2-3), a button press was included in the control condition.
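The trial requirement follows directly from the recording parameters. As a concrete illustration of the heuristic above (a Python sketch rather than the Matlab/EEGLAB environment used in the study):

```python
def min_trials_for_ica(n_channels, fs_hz, epoch_s, k=30):
    """Approximate number of trials needed so that
    trials * samples_per_trial >= k * n_channels**2 datapoints,
    the heuristic for a stable ICA decomposition."""
    required_points = k * n_channels ** 2   # 30 * 66^2 = 130,680 datapoints
    points_per_trial = fs_hz * epoch_s      # 256 Hz * 5 s = 1,280 samples/trial
    return required_points / points_per_trial

n = min_trials_for_ica(n_channels=66, fs_hz=256, epoch_s=5)
print(round(n))  # 102
```

This reproduces the approximately 102 trials cited in the text.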
To avoid any potential effects of response bias (Venezia et al., 2012; Smalle et al., 2015), stimuli were randomized in a block design and an equal number of Same and Different trials were included in each block. To enable hypothesis testing, data from all Same trials were extracted (regardless of which condition they were presented in) and aggregated, then the procedure was repeated for Different trials. The suitability of this approach is confirmed by the absence of differences between conditions 2 and 3 in direct statistical comparison (Jenson et al., 2019b). Thus, while stimuli were presented in the conditions listed above, comparisons for the purpose of hypothesis testing in the current study were based on the following classification:
a) Passive listening to white noise
b) Discrimination of Same trials
c) Discrimination of Different trials

Procedures
Participants reclined in a comfortable chair in an electromagnetically shielded, double-walled, sound-proof booth. Stimuli were presented binaurally at 70 dB SPL through Etymotic ER3-14A earphones, and button-press responses were captured by a computer running Compumedics NeuroScan Stim 2, version 4.3.3. The response cue consisted of a 100 ms 1 kHz pure tone presented 3000 ms following stimulus onset. As anticipatory motor planning can occur up to 2000 ms prior to movement onset (Graimann and Pfurtscheller, 2006), this timeline was chosen to minimize potential contamination of discrimination-related neural activity by preparation for the button press. The inclusion of a button-press response in the passive control condition served to control for anticipatory movement-related activity across conditions, as well as ensuring that subjects were attending to stimuli (Alho et al., 2012, 2015). In discrimination conditions, subjects were instructed to press one of two buttons upon hearing the response cue based on whether syllable pairs were judged to be same or different. In the control condition, subjects were told to press the button upon presentation of the response cue. Handedness of button-press responses was counterbalanced within subjects. Trial epochs were 5 seconds in length, ranging from −3000 to +2000 ms around time zero, which was defined as the onset of the syllable pair in discrimination conditions. Each epoch contained a baseline window consisting of 1000 ms of silence (i.e., −3000 to −2000 ms) which was used for subsequent time-frequency (ERSP) analysis. In the control condition, noise onset was temporally jittered to occur at either −2000 or −1500 ms, and the noise persisted throughout the trial epoch (i.e., to +2000 ms), with time zero representing an arbitrary point in the middle of the noise.
Each of the three conditions was presented in two blocks of 40 trials each, yielding six blocks (2 blocks × 3 conditions), with block presentation order randomized across participants. The timeline for trial epochs is shown in Fig. 1.

Neural data acquisition
EEG data were acquired from 64 neural channels supplemented by four surface channels monitoring the electrocardiogram, electro-oculogram, and peri-labial muscle movement. Two bipolar recording channels were used to measure the electro-oculogram, with electrodes placed above and below the orbit of the left eye (VEOU, VEOL) and on the lateral and medial canthi of the left eye (HEOL, HEOR) to measure vertical and horizontal eye movement, respectively. Peri-labial muscle activity was captured by two surface EMG electrodes placed over the medial and lateral portions of the orbicularis oris muscle. ECG recording electrodes were affixed over the left and right carotid complexes. An unlinked, sintered NeuroScan Quik Cap arranged according to the extended 10-20 montage (Jasper, 1958) was used for data acquisition. As the poor spatial resolution of EEG is further compounded by the use of standard head models, a Polhemus Patriot 3D digitizer was used to capture veridical channel locations for each subject. These individualized channel locations were recorded following cap placement but prior to data collection, then stored for use during subsequent data processing steps.
A computer running Compumedics NeuroScan Scan 4.3.3 software was paired with the Synamps 2 system to record all EEG data and button-press responses. Data were band-pass filtered (0.15-100 Hz) and digitized at 500 Hz by a 24-bit analog to digital converter. Data were time locked to the onset of the first syllable in each pair, with time zero corresponding to stimulus onset in discrimination conditions and to a point in the middle of white noise in the control condition.

Data processing
All neural analyses were performed in EEGLAB 13.5.4 (Brunner et al., 2013), an open-source Matlab toolbox for electrophysiologic data. Data were processed at the individual level to identify bilateral mu rhythms, with subsequent group analyses employed to identify time-frequency differences across conditions and hemispheres. An overview of the processing pipeline is outlined here and discussed in greater detail below:
Individual processing:
a) Pre-processing of all 6 data files for each subject (2 per condition);
b) ICA of each subject's concatenated data files to identify sources of neural activity (i.e., independent components) common across conditions; and
c) Localization of independent components for each subject.
Group processing:
d) Data files from all subjects and conditions submitted to the STUDY module in EEGLAB;
e) Principal Component Analysis (PCA) to identify patterns of activity common across participants;
f) Bilateral mu clusters identified from the results of PCA;
g) Time-frequency decomposition of mu clusters via ERSP analyses; and
h) ECD source localization of bilateral mu clusters.

Individual pre-processing
For each subject, all six raw data files were merged to generate a single dataset containing 240 trials. This aggregate data file was then downsampled to 256 Hz to reduce subsequent computational demands and re-referenced to the mastoids (M1/M2) to reduce common-mode noise. Channels deemed noisy upon visual inspection were discarded. Correlation coefficients were calculated for all channel pairs, with correlations in excess of 0.99 considered evidence of salt-bridging (Greischar et al., 2004; Alschuler et al., 2014). For channel pairs exceeding this threshold, the channel closer to midline was removed to minimize signal redundancy while retaining the overall distribution of contributing channels. Five-second epochs ranging from −3000 ms to +2000 ms around stimulus onset were extracted from the continuous EEG data, yielding a total of 240 epochs (80 per condition). The aggregate data file was then subdivided into three data files per subject, corresponding to the three conditions (Control, Same, Different). Individual trial epochs were discarded if they contained gross artifact, were not discriminated accurately, or if subjects failed to respond within 2000 ms of the response cue. All usable trials were then submitted to further processing.
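The correlation-based salt-bridge check described above can be sketched as follows. This is an illustrative numpy version, not the actual analysis code; the synthetic channel data are hypothetical, while the 0.99 threshold mirrors the text:

```python
import numpy as np

def find_bridged_pairs(data, threshold=0.99):
    """data: (n_channels, n_samples) array of channel time series.
    Returns index pairs (i, j), i < j, whose Pearson correlation
    exceeds the threshold -- candidate salt-bridged channels."""
    r = np.corrcoef(data)
    n = data.shape[0]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if r[i, j] > threshold:
                pairs.append((i, j))
    return pairs

# Synthetic demo: channel 2 is a near-copy of channel 0 (bridged).
rng = np.random.default_rng(0)
data = rng.standard_normal((4, 1000))
data[2] = data[0] + 1e-4 * rng.standard_normal(1000)
print(find_bridged_pairs(data))  # [(0, 2)]
```

In the study, the member of each flagged pair closer to midline would then be dropped.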

Independent component analysis
Pre-processed datasets for each subject were concatenated so that ICA decomposition would yield a single set of channel weights common across conditions. This uniformity of channel weights is critical to allow comparison of component activations across conditions. The extended Infomax algorithm (Lee et al., 1999) was used to decorrelate the concatenated datasets, which were then submitted to ICA training with the extended "runica" algorithm with an initial learning weight of 0.001 and a stopping threshold of 10⁻⁷. As the number of independent components (ICs) returned by ICA conforms to the number of channels submitted, a maximum of 66 components (68 recording channels − 2 reference channels) was returned for each subject. However, the actual number of components returned by ICA was variable, as the number of channels excluded during pre-processing was not uniform across subjects. Inverse weight matrices (W⁻¹) for each component were then projected onto the original channel montage to generate scalp topographies for each component, which represent coarse estimates of scalp distribution.
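The relationship between the unmixing weights and the scalp topographies can be illustrated with a toy example. This numpy sketch stands in for, and is far simpler than, the extended Infomax decomposition actually used; the sources and mixing matrix are synthetic:

```python
import numpy as np

# With as many components as channels, the unmixing matrix W is square,
# and the columns of its inverse give each component's projection onto
# the scalp channels (its topography).
rng = np.random.default_rng(1)
n_channels, n_samples = 4, 2000

sources = rng.laplace(size=(n_channels, n_samples))     # independent sources
mixing = rng.standard_normal((n_channels, n_channels))  # unknown forward model
eeg = mixing @ sources                                  # observed channel data

# Suppose an ICA algorithm recovered the true unmixing matrix:
W = np.linalg.inv(mixing)
activations = W @ eeg           # component time courses
scalp_maps = np.linalg.inv(W)   # column k = scalp topography of component k

print(scalp_maps.shape)  # (4, 4): one scalp map per component
```

Here the recovered scalp maps coincide with the columns of the true mixing matrix, which is exactly what the W⁻¹ projection step above exploits.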

Dipole localization
Source localization employed the DIPFIT toolbox (Oostenveld and Oostendorp, 2002) to generate equivalent current dipole (ECD) models (i.e., point source estimates) for each component resulting from ICA decomposition. Individualized channel locations were referenced to the extended 10-20 montage (Jasper, 1958), then warped to a spherical (i.e., BESA) head model. Channel warping reduces the mean distance between the digitized locations and the 10-20 montage while retaining the relative configuration of the recording channels on the scalp. Due to an equipment error, digitized locations were unavailable for four subjects, and the standard channel montage was substituted for these subjects. Automated coarse and fine fitting to the head model yielded dipole models for each of the 2205 components. These models constitute physiologically plausible solutions to the inverse problem, which were subsequently validated by projecting them onto the original channel configuration (Delorme et al., 2012). The mismatch between this projection to the scalp and the original scalp-recorded signal specifies the residual variance (RV%), which was used to evaluate the "goodness of fit" of each dipole model.
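Residual variance as a goodness-of-fit measure can be computed as follows. This is an illustrative sketch with hypothetical topography values, not DIPFIT's implementation:

```python
import numpy as np

def residual_variance(measured, modeled):
    """RV% between a scalp-recorded topography and the forward projection
    of its dipole model: the fraction of signal power left unexplained."""
    resid = measured - modeled
    return 100.0 * np.sum(resid ** 2) / np.sum(measured ** 2)

# Hypothetical 6-channel topography and a near-perfect model projection.
measured = np.array([1.0, 0.8, 0.2, -0.3, -0.7, -1.0])
modeled = measured * 0.95
rv = residual_variance(measured, modeled)
print(round(rv, 2))  # 0.25 -> well under the 20% inclusion threshold
```

A poorly fitting dipole would leave a large residual and thus a high RV%, triggering exclusion at the 20% threshold described below.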

Study module
Group-level analyses were implemented within the EEGLAB STUDY module, allowing the comparison of ICA data across subjects and conditions. The STUDY module was populated with processed datasets from all subjects and conditions. Components with RV exceeding 20%, or with an ECD model that could not be localized within the cortical volume, were excluded from group-level analyses. An RV of 20% was selected as the threshold for inclusion in the group analysis, as levels in excess of this likely represent artifact or noise.

PCA clustering
Principal component analysis employed the K-means statistical toolbox to cluster components across subjects on the basis of commonalities in spectra, dipole localization, and scalp maps. The initial allocation of components by PCA yielded 40 clusters, with these initial results supplemented by visual inspection to ensure that all components assigned to mu clusters met the inclusion criteria and that no component meeting the inclusion criteria had been omitted. Mu cluster inclusion criteria were a characteristic mu spectrum (i.e., peaks in alpha and beta frequency ranges), RV < 20%, and localization to an accepted mu rhythm generator site (i.e., Brodmann's areas 1-4 or 6). Any components mis-assigned in the initial results of PCA were re-allocated to the correct clusters following visual inspection.
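The clustering step can be sketched with a minimal two-cluster k-means on synthetic "component feature" vectors. This illustrates the principle only; EEGLAB's preclustering additionally weights and combines spectra, dipole coordinates, and scalp maps into the feature vectors:

```python
import numpy as np

def kmeans2(X, n_iter=25):
    """Minimal two-cluster k-means over component feature vectors.
    Seeds with the first row and the row farthest from it, then
    alternates assignment and centroid updates."""
    centers = np.stack([X[0], X[np.linalg.norm(X - X[0], axis=1).argmax()]])
    for _ in range(n_iter):
        # Assign each component to its nearest cluster center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers as the mean of assigned components.
        centers = np.stack([X[labels == j].mean(axis=0) for j in (0, 1)])
    return labels

# Two well-separated toy feature groups (e.g., left- vs right-lateralized).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 3)),
               rng.normal(5, 0.1, (10, 3))])
labels = kmeans2(X)
print(labels[:10].tolist(), labels[10:].tolist())  # all 0s, then all 1s
```

As in the study, automatic assignments like these would then be screened by visual inspection against the inclusion criteria.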

Source localization
Following final confirmation of mu cluster membership, bilateral mu clusters were localized through ECD methods. The ECD cluster localization is derived by taking the mean stereotactic coordinates referenced to the Talairach atlas of all contributing dipoles. These coordinates were converted to anatomic locations with the Talairach Client, corresponding to physiologically plausible group-level estimates of cortical generator sites.

ERSP
ERSP analysis was employed to measure changes in spectral power (in normalized dB units) from 7 to 30 Hz across the time course of perception events. Neural signals were decomposed by a Morlet family of wavelets with an initial width of 3 cycles at 7 Hz and an expansion factor of 0.5. A surrogate distribution comprising 200 randomly sampled time points extracted from the inter-trial interval (i.e., from −3000 ms to −2000 ms) was used as a baseline to calculate spectral fluctuations over time (Makeig et al., 2004). All single-trial data from the frequency range of interest were submitted to time-frequency decomposition, with individual changes over time computed with a bootstrap resampling method (p < .05, uncorrected).
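A baseline-normalized ERSP of this kind can be sketched as follows. The sampling rate, the linear cycle-expansion formula, and the toy signal are assumptions for illustration only, not the study's exact EEGLAB parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 250                                   # assumed sampling rate (Hz)
t = np.arange(-3.0, 2.0, 1.0 / fs)         # epoch spanning baseline and trial
# Toy signal: 10 Hz activity beginning at "stimulus onset" atop weak noise.
signal = np.sin(2 * np.pi * 10 * t) * (t > 0) + 0.01 * rng.normal(size=t.size)

def morlet_power(sig, freqs, fs, c0=3.0, expand=0.5):
    """Power over time via Morlet wavelets whose cycle count grows with
    frequency (a simple reading of the '3 cycles, 0.5 expansion' setting)."""
    out = np.empty((len(freqs), len(sig)))
    for i, f in enumerate(freqs):
        n_cyc = c0 + expand * (f - freqs[0])       # expanding cycle count
        sigma_t = n_cyc / (2 * np.pi * f)          # wavelet temporal width (s)
        half = int(np.ceil(3 * sigma_t * fs))
        tw = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * tw) * np.exp(-tw**2 / (2 * sigma_t**2))
        out[i] = np.abs(np.convolve(sig, wavelet, mode="same")) ** 2
    return out

freqs = np.arange(7, 31)                   # 7-30 Hz, as in the analysis
power = morlet_power(signal, freqs, fs)

# Normalize to dB relative to the pre-stimulus baseline (-3000 to -2000 ms).
baseline = power[:, (t >= -3.0) & (t < -2.0)].mean(axis=1, keepdims=True)
ersp_db = 10 * np.log10(power / baseline)
```

In this toy example, the post-stimulus 10 Hz burst appears as a positive dB change at 10 Hz relative to baseline; ERD would appear as negative dB values.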
Statistical comparisons to evaluate condition differences were implemented with permutation statistics (2000 permutations) and False Discovery Rate (FDR) corrections to control for multiple comparisons (Benjamini and Hochberg, 1995). A 1 × 3 repeated-measures ANOVA was performed to evaluate the presence of an omnibus effect, with paired t-tests used to decompose the omnibus effect in both hemispheres. To test for the presence of differential processing across hemispheres, a 2 × 2 mixed ANOVA (paired: Same/Different; unpaired: left/right) with FDR corrections for multiple comparisons was performed.
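The permutation-plus-FDR approach can be illustrated with a minimal paired permutation test and a Benjamini-Hochberg procedure. This is a from-scratch sketch of the general technique, not the toolbox's implementation:

```python
import numpy as np

def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided paired permutation test: randomly flip the sign of each
    within-subject difference and compare null means to the observed one."""
    rng = np.random.default_rng(seed)
    d = np.asarray(x) - np.asarray(y)
    observed = abs(d.mean())
    flips = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = np.abs((flips * d).mean(axis=1))
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where
    k is the largest rank with p_(k) <= k * q / m."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    thresh = q * np.arange(1, p.size + 1) / p.size
    below = p[order] <= thresh
    reject = np.zeros(p.size, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject

# Example: a clear, consistent condition difference yields a small p-value.
p_strong = permutation_p(np.ones(20), np.zeros(20))
significant = fdr_bh([0.001, 0.01, 0.02, 0.8])
```

In the actual analysis, one p-value per time-frequency bin would be fed to the FDR step, so only bins surviving correction are reported as condition differences.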

Results
One subject was excluded from the study for failure to follow instructions, as she indicated that she had used a single hand for all button-press responses. Consequently, data from only 41 subjects were included in the analysis.

Condition accuracy
Subjects discriminated the syllable pairs with a high degree of accuracy for both Same [mean = 98.47, sd = 1.89] and Different [mean = 97.8, sd = 4.77] trials. A generalized linear mixed model [fixed effects: trial type; random effects: subjects] with a gamma distribution and a log link function was implemented in SPSS (version 25.0) to evaluate differences in raw accuracy data for subjects contributing to mu clusters. The effect of trial type was non-significant [F (1,70) = 0.566; p = .45], suggesting that observed neural differences cannot be attributed to task difficulty. The raw accuracy data for subjects contributing to mu clusters are shown in Fig. 2.

Cluster characteristics
Figs. 3 and 4 show the distribution of components contributing to left and right mu clusters, respectively. Of the 41 subjects whose data were submitted to neural analysis, 36 contributed to mu clusters. Specifically, 31 subjects contributed a total of 41 components to the left cluster, while 32 subjects contributed 49 components to the right cluster. As all components meeting the inclusion criteria were allocated to mu clusters, it was possible for subjects to contribute more than one component to each cluster. The left mu cluster was localized to Talairach [−43, −10, 39] in the precentral gyrus (BA-6) with a residual variance of 4.21%. The right hemisphere mu cluster was localized to Talairach [45, −5, 38] in the precentral gyrus (BA-6) with a residual variance of 5.32%.

Omnibus
Left Mu: ERSP data from the left hemisphere mu cluster were characterized by robust alpha and beta ERD in all discrimination conditions, with no activity noted during the control condition. Alpha/beta ERD emerged ~300 ms following stimulus onset and persisted across the remainder of the trial epoch in both Same and Different trials. A 1 × 3 ANOVA employing permutation statistics with FDR corrections for multiple comparisons demonstrated significant differences compared to the control condition, emerging at ~250 ms in the alpha band and ~200 ms in the beta band and persisting through the remainder of the epoch. The omnibus ERSP results are displayed in Fig. 5.
Right Mu: ERSP data from the right hemisphere mu cluster were similar to, albeit weaker than, those found in the left hemisphere, consistent with the notion that sensorimotor transformations for speech are bilateral (Cogan et al., 2014) but left hemisphere dominant (Hickok and Poeppel, 2007; Specht, 2014). A 1 × 3 ANOVA employing permutation statistics with FDR corrections revealed significant differences compared to the control condition emerging at ~350 ms in the alpha band and ~150 ms in the beta band and persisting across the epoch. Given the presence of significant activations compared to the control condition across both hemispheres, it was possible to test experimental hypotheses regarding Same and Different trials.

Same/different comparison
A paired t-test employing permutation statistics with FDR corrections identified significant differences across alpha and beta bands in the left mu cluster between Same and Different trials. Differences emerged ~1000 ms following stimulus onset (i.e., 400 ms following stimulus offset) in both alpha and beta bands and remained throughout the rest of the trial, with stronger ERD present in Different trials. The time course of differences in the right hemisphere was similar, with alpha differences emerging ~850 ms following stimulus onset and beta differences emerging ~800 ms following stimulus onset. However, in contrast to the results found in the left hemisphere, late right hemisphere mu activity was stronger in Same trials.

Hemispheric comparison
To further probe the differential effects of trial type between hemispheres, a 2 × 2 mixed ANOVA with FDR corrections for multiple comparisons was performed to directly compare data from left and right hemispheres. No differences were observed between hemispheres during Same trials, nor was there a significant interaction term. However, alpha and beta hemispheric differences, characterized by stronger activity in the left hemisphere, were present in Different trials, emerging ~850 ms following stimulus onset and persisting across the remainder of the trial epoch.

Discussion
In the current study, ICA identified bilateral sensorimotor mu clusters from a cohort of typically developing female subjects during the accurate discrimination of Same and Different syllable pairs. Consistent with other studies employing comparable inclusion criteria for cluster membership (Jenson et al., 2014a; Saltuklaroglu et al., 2017), approximately 88% (36/41) of subjects contributed to mu clusters, with 76% (31/41) and 78% (32/41) contributing to left and right clusters, respectively. Clusters were localized with ECD models to the precentral gyri (BA-6) bilaterally, consistent with accepted generator sites for the mu rhythm (Pineda, 2005; Hari, 2006; Jones et al., 2009; Saltuklaroglu et al., 2018). Given the high degree of discrimination accuracy (exceeding 98% across conditions) and the large proportion of subjects contributing useable neural components, it was possible to test experimental hypotheses regarding the influence of trial type (Same vs. Different) on sensorimotor activity. When compared to the control condition, ERSP data from all discrimination conditions were characterized by alpha and beta ERD, which emerged ~250 ms following stimulus onset and persisted across the remainder of the trial epoch. This timeline parallels previous reports of sensorimotor activity during speech discrimination (Jenson et al., 2019a,b), and the presence of significant activity following stimulus onset only suggests that mu activity encodes sensorimotor contributions to working memory. Under this interpretation, peri-stimulus alpha and beta ERD reflect the mapping of acoustic stimuli onto phonological representations to enable working memory encoding (Jacquemot and Scott, 2006), while post-stimulus alpha and beta ERD reflect covert articulatory rehearsal to refresh working memory contents (Tsoneva et al., 2011; Herman et al., 2013; Behmer and Fournier, 2014).
This proposal is consistent with the notions of forward and inverse models instantiating covert rehearsal (Pickering and Garrod, 2013), and the association of beta and alpha frequency bands with those models, respectively (Sebastiani et al., 2014). Thus, consistent with the assertion of Hickok et al. (2011), the results of the current study support a working memory-based account of sensorimotor activity during speech discrimination. Consequently, it is possible to probe the influence of trial type on sensorimotor-based working memory processing.
In support of the first hypothesis that matched, but not unmatched, trials would elicit repetition suppression, weaker alpha and beta ERD were observed in the left hemisphere from ~950 ms following stimulus onset and persisting across the remainder of the epoch in Same trials. This temporal profile of observed differences aligns with the maintenance stage of working memory, during which covert articulatory rehearsal (Wilson, 2001; Buchsbaum et al., 2005) is instantiated by paired forward and inverse models (Pickering and Garrod, 2013), encoded in beta and alpha ERD, respectively. It may then be suggested that repetition suppression is reflected in weaker covert rehearsal, as less processing is required to reactivate the phonological representations underlying matched (i.e., reduplicated) syllable pairs. It should be noted that while the weaker alpha and beta ERD in Same trials in the current study is consistent with the reduced peak amplitudes seen in ERP priming studies, the timeline of observed differences differs. Specifically, ERP studies have found weaker responses to previously presented stimuli as early as 100 ms following stimulus onset (Holcomb et al., 2005; Huber et al., 2008; Grainger and Holcomb, 2015), a time frame consistent with the initial evoked response. This differential time course may indicate that it takes longer for priming differences to emerge at higher levels of the cortical processing hierarchy, an interpretation consistent with Tavabi et al. (2011), who found oscillatory priming differences over auditory regions ~350 ms following stimulus onset.
However, caution should be exercised in comparing the current oscillatory results to previously reported ERP evidence of repetition suppression, as the neural signals arise from different levels of the cortical processing hierarchy. Specifically, as ERPs arise from prescribed neuronal populations (David et al., 2006), their evoked responses are susceptible to classic repetition suppression effects (Auksztulewicz and Friston, 2016). However, time-frequency decomposed EEG data integrate both evoked (i.e., phase-locked) and induced (i.e., non-phase-locked) activity (Buzsaki, 2006). Consequently, it is not possible based on the current data to determine whether observed differences arise from spatial separation of neuronal populations, as would be expected with evoked activity, or from the influence of a top-down induced process. Nonetheless, while left hemisphere data appear consistent with notions of repetition suppression, it remains critical to consider bilateral patterns of activity to fully characterize sensorimotor responses to matched and unmatched syllable pairs.
Interpretations of mu activity in the current study based solely on repetition suppression are complicated by data from the right hemisphere. In previous investigations of mu oscillations, the right hemisphere exhibited similar, albeit weaker, patterns of activity to those observed in the left hemisphere (Jenson et al., 2014a, 2019b; Thornton et al., 2017, 2019; Saltuklaroglu et al., 2018), consistent with the notion that sensorimotor transformations for speech processing are bilateral (Cogan et al., 2014) but left hemisphere dominant (Hickok and Poeppel, 2000). However, in the current study, right hemisphere mu activity exhibited a characteristically different pattern than that observed in the left hemisphere, with stronger alpha and beta ERD present from ~800 ms through the remainder of the trial epoch in Same trials compared to Different trials. Initially, this may be considered evidence of repetition enhancement (Grill-Spector et al., 2000; James and Gauthier, 2006), in which elevated responses are elicited by previously presented stimuli. However, while concurrent repetition suppression and enhancement have previously been reported (Orfanidou et al., 2006; Korzeniewska et al., 2020), there exists no rationale for proposing a left/right dissociation of repetition suppression and enhancement, respectively. Consequently, further consideration is required to resolve the observed neural patterns.
When the results of the 2 × 2 (condition × hemisphere) contrast are considered, a different picture emerges (see Fig. 6). No differences are observed between left and right hemispheres during Same trials, though activity in the right hemisphere does appear visually weaker, in accordance with previous reports of left hemisphere dominance for speech and language (Hickok and Poeppel, 2000; Specht, 2014). In contrast to Same trials, robust alpha and beta differences were observed between left and right hemispheres from ~800 ms throughout the remainder of the trial epoch in Different trials. Considered in light of between-condition differences, results may be interpreted to suggest that late mu activity increases in the left hemisphere in Different trials, while it decreases in the right hemisphere. That is, the bulk of sensorimotor activity shifts to the left hemisphere following stimulus offset during Different trials. A clearer picture of how this hemispheric shift of late sensorimotor activity supports task demands emerges when results are considered within the framework of predictive coding (Rao and Ballard, 1999; Auksztulewicz and Friston, 2016).
Predictive coding is a component of Analysis by Synthesis (Stevens and Halle, 1967; Bever and Poeppel, 2010; Poeppel and Monahan, 2011), and proposes that motor-based predictions are generated in anterior motor regions on the basis of prior knowledge and relayed to posterior sensory regions via forward models (mu beta) for comparison with the incoming stimulus (Poeppel et al., 2008; Sedley et al., 2016). Following comparison between prediction and afference, any mismatch (i.e., prediction error) is propagated up the cortical hierarchy via an inverse model (mu alpha) for hypothesis revision (Sohoglu et al., 2012). Iterative hypothesis-test-refine loops continue until the mismatch is reconciled and the stimulus is identified. It may then be proposed that an articulatory representation of the first syllable serves as a predictive template against which the second syllable is compared. If the prediction is confirmed, no further processing is necessary and covert rehearsal mechanisms engage to retain stimuli in working memory. However, in the event of a mismatch, the majority of sensorimotor-based working memory processing shifts to the left hemisphere, as its specialization for speech processing (Hickok and Poeppel, 2000; Specht, 2014) makes it better suited for conflict resolution. Multiple lines of reasoning support this interpretation. First, predictive coding is metabolically economical, as matches can be detected with a cursory examination of the second syllable. Second, predictive coding has previously been associated with repetition priming effects (Auksztulewicz and Friston, 2016; Auksztulewicz et al., 2017, 2018). Third, right hemisphere differences in the current study precede left hemisphere differences, consistent with the right hemisphere relinquishing processing in favor of the more specialized left hemisphere.
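The hypothesis-test-refine loop described above can be caricatured in a few lines. This is purely a toy illustration with invented feature vectors and an arbitrary revision rate, not a model of the EEG data:

```python
import numpy as np

def refine_until_match(template, afference, rate=0.5, tol=1e-3):
    """Iteratively revise the prediction toward the afference until the
    prediction error (mismatch) is reconciled; returns the loop count."""
    prediction = np.array(template, dtype=float)
    afference = np.asarray(afference, dtype=float)
    iterations = 0
    while np.linalg.norm(afference - prediction) > tol:  # prediction error
        prediction += rate * (afference - prediction)    # hypothesis revision
        iterations += 1
    return iterations

# A matched pair confirms the prediction immediately, while a mismatched
# pair requires additional refinement loops (i.e., further processing).
same_loops = refine_until_match([1.0, 0.2, 0.5], [1.0, 0.2, 0.5])
different_loops = refine_until_match([1.0, 0.2, 0.5], [0.4, 0.9, 0.5])
```

The asymmetry in loop counts mirrors the proposal that matched trials terminate processing early while mismatches demand continued sensorimotor engagement.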
It is not proposed that predictive coding underlies all priming effects across perceptual tasks, but rather that it emerges in speech discrimination based on the phonological demands imposed by the task (Hickok et al., 2011).
The second hypothesis, that Same trials would transition from encoding to maintenance stages more quickly than Different trials, marked by a sharp increase in the magnitude of post-stimulus mu ERD, was not supported by the current findings. In contrast to previous reports of a dramatic increase in the magnitude of mu ERD following stimulus offset across speech discrimination tasks (Jenson et al., 2014a; Saltuklaroglu et al., 2017; Thornton et al., 2017, 2019), post-stimulus mu ERD in the current study increased in magnitude only in Different trials, and exclusively in the left hemisphere. This may be interpreted to suggest that previous findings were influenced by the aggregation of Same and Different trials during analysis, rather than elevated post-stimulus mu ERD constituting a ubiquitous phenomenon across trials. Specifically, when data from Same and Different trials are averaged, the appearance of stronger alpha and beta ERD following stimulus offset is driven principally by data from Different trials. This interpretation is consistent with the subset of mu studies reporting bilateral findings during speech discrimination (Bowers et al., 2013; Jenson et al., 2014a, 2019b), in which post-stimulus mu ERD appears to increase in left, but not right, mu clusters. This assertion is made tentatively, though, as these studies did not evaluate hemispheric differences with direct statistical comparisons. Alternatively, it may be proposed that the Dorsal Attention Network (DAN), which projects to ventral premotor cortex (Allan et al., 2019), may have influenced the results. Specifically, as its projections to PMC overlap with the generator site for mu oscillations (Jones et al., 2009), it is not possible to preclude an influence of the DAN on observed sensorimotor activity.
Under this interpretation, the increased left hemisphere ERD in Different trials may result from the increased salience of the second CV syllable, which must be processed further before a behavioral response can be selected (Corbetta et al., 2008). Alternatively, right hemisphere results may be interpreted through the same framework as reduced attention to stimulus characteristics in Same trials (Moore et al., 2017). However, it should be considered that the influence of the DAN on stimulus processing and response selection is typically considered within the framework of visuospatial processing/attention, and its contribution to the speech discrimination tasks employed in the current study is uncertain. Additionally, it remains unclear whether attentional modulation and the deeper stimulus processing posited by Analysis by Synthesis constitute truly distinct interpretations. Nonetheless, post-stimulus activity, characterized by concurrent alpha and beta ERD, may still be interpreted as evidence of covert articulatory rehearsal, though notions of increased magnitude representing the transition from encoding to maintenance stages of working memory are no longer tenable. Rather, it may be suggested that covert rehearsal processes elicit relatively weaker sensorimotor responses than those elicited by initial working memory encoding.

General discussion
In the current study, we employed ERSP decomposition of bilateral mu rhythms to characterize sensorimotor contributions to working memory during the discrimination of matched and unmatched syllable pairs. Findings expand upon previous reports of differential processing of Same and Different stimulus pairs in visual (Henson et al., 2004; Huber et al., 2008; Rodriguez Merzagora et al., 2014) and auditory (Tavabi et al., 2011; Woodward et al., 2013) regions, suggesting that sensorimotor-based working memory processes are also influenced by trial type. While data from the left hemisphere supported notions of repetition suppression (Friston, 2005; Orfanidou et al., 2006; Korzeniewska et al., 2020), with weaker alpha and beta ERD observed during the discrimination of matched pairs, the presence of the opposite pattern in the right hemisphere suggests a more nuanced interpretation. Results were interpreted to suggest that predictive coding mechanisms extract an articulatory template of the first syllable for use as an articulatory hypothesis of the second syllable. Following comparison in posterior sensory regions, confirmed hypotheses lead to engagement of covert rehearsal for working memory maintenance (Wilson, 2001; Buchsbaum et al., 2005; Herman et al., 2013), while sensory mismatches shift subsequent working memory processing to the speech- and language-dominant left hemisphere (Hickok and Poeppel, 2000; Specht, 2014). This interpretation is consistent with Skipper et al.'s (2017) assertion that speech-related networks dynamically reorganize in response to task demands.
While an equal number of Same and Different trials are typically employed in speech discrimination studies to ensure that results are not unequally weighted towards one trial type (Venezia et al., 2012), the results of the current study indicate that more fine-grained analyses have the potential to offer greater insight. Specifically, the separate analysis of Same and Different trials employed herein revealed previously unobserved differences in hemispheric patterns of sensorimotor activity across encoding and maintenance stages of working memory. Results have the potential to inform regarding how sensorimotor processes dynamically adapt to support changing task demands across cognitive and perceptual processes. Findings hold particular relevance for emerging sensorimotor models of speech perception (Liebenthal and Möttönen, 2018) and working memory (Buchsbaum and D'Esposito, 2019). Results also hold promise for clarifying how underlying neurophysiologic differences give rise to observed working memory deficits in sensorimotor-linked disorders such as stuttering (Jenson et al., 2019a; Saltuklaroglu et al., 2017) and autism (Wang et al., 2017; Habib et al., 2019).

Fig. 6. Hemisphere × Trial type contrast. The top and middle rows correspond to left and right hemispheres, respectively. Left and middle columns correspond to experimental conditions. The bottom row reflects the results of unpaired t-tests between hemispheres, while the right-most column shows the results of paired t-tests between conditions. The bottom pane in the right-most column displays the results of the interaction term (hemisphere × trial type). All differences are significant at p < .05 (corrected for multiple comparisons).

Limitations
While the results of the current study provide compelling evidence for differential working memory processing of Same and Different syllable pairs, there are several limitations that deserve to be addressed. First, while the majority of subjects (36/41) contributed to mu clusters, not all subjects produced useable sensorimotor components. Reduced subject contribution is commonly reported in EEG research (Nystrom, 2008;Bowers et al., 2013), and has been linked to the use of standard head models. While individualized channel locations were used in the current study, co-registration of these channel locations with a standard cortical template still resulted in a reduced proportion of contributing subjects. Second, as the subject pool was exclusively female, it remains unclear how well observed findings reflect the wider population. Such a consideration is critical, given reports that sensorimotor processing strategies may differ between males and females (Popovich et al., 2010;Kumari, 2011;Thornton et al., 2019). Third, the current study only evaluated activity in anterior sensorimotor regions, despite the fact that the covert rehearsal mechanisms proposed to underlie working memory processing influence both anterior and posterior aspects of the sensorimotor network (Jenson et al., 2015;Bowers et al., 2019). To more fully characterize the influence of trial type on sensorimotor processing, future speech discrimination studies should consider activity from both anterior motor and posterior sensory regions.

Conclusions and future directions
The current study leveraged the temporal specificity of EEG to probe the differential sensorimotor processing of Same and Different syllable pairs as they are encoded and processed in working memory. Stronger alpha and beta ERD in the left hemisphere was paired with weaker alpha and beta ERD in the right hemisphere following stimulus offset in Different trials compared to Same trials. These patterns were interpreted through the framework of predictive coding (Rao and Ballard, 1999;Barron et al., 2020) to suggest that an articulatory representation of the first syllable is used as a predictive template for the second. In matched trials, hypothesis confirmation leads to the engagement of covert rehearsal (encoded in concurrent alpha and beta ERD) to support working memory maintenance (Buchsbaum et al., 2005). In unmatched trials, the detection of a mismatch between prediction and afference shifts the bulk of sensorimotor activity to the speech-specialized left hemisphere (Specht, 2014) for further processing in working memory. Results highlight the dynamic interplay between sensorimotor-based working memory processes and task demands during speech discrimination, demonstrating a clear need to probe how trial type may influence neural activity across additional nodes of the sensorimotor network. The non-invasive and cost-effective nature of this methodology support its continued use in future investigations of sensorimotor processing in clinical and non-clinical populations.

Declaration of competing interest
The authors declare that they have no conflicts of interest.