Can Music Enhance Working Memory and Speech in Noise Perception in Cochlear Implant Users? Design Protocol for a Randomized Controlled Behavioral and Electrophysiological Study

Background: A cochlear implant (CI) enables deaf people to understand speech, but due to technical restrictions, users face great limitations in noisy conditions. Music training has been shown to augment the shared auditory and cognitive neural networks for processing speech and music and to improve auditory–motor coupling, which benefits speech perception in noisy listening conditions. These are promising prerequisites for studying multi-modal neurologic music training (NMT) for speech-in-noise (SIN) perception in adult CI users. Furthermore, a better understanding of the neurophysiological correlates of performing working memory (WM) and SIN tasks after multi-modal music training may provide clinicians with a better understanding of optimal rehabilitation. Methods: Within 3 months, 81 post-lingually deafened adult CI recipients will undergo electrophysiological recordings and a four-week neurologic music therapy multi-modal training, randomly assigned to one of three training focuses (pitch, rhythm, and timbre). Pre- and post-tests will analyze behavioral outcomes and apply a novel electrophysiological measurement approach that includes neural tracking of speech and alpha oscillation modulations during the sentence-final-word-identification-and-recall test (SWIR-EEG). Expected outcome: Short-term multi-modal music training will enhance WM and SIN performance in post-lingually deafened adult CI recipients and will be reflected in greater neural tracking and alpha oscillation modulations in prefrontal areas. Prospectively, the outcomes could contribute to understanding the relationship between cognitive functioning and SIN beyond the technical deficits of the CI. Targeted clinical application of music training for post-lingually deafened adult CI users to significantly improve SIN and positively impact quality of life could then be realized.


Introduction
People who suffer from hearing loss are restricted in their lifestyle, profession, and communication needs. If a hearing aid is no longer sufficient to understand speech, a cochlear implant (CI) can restore hearing and help with communication. This electronic hearing prosthesis has been used since the 1970s to significantly improve hearing, speech comprehension, and active participation in life for people of all ages who are severely hearing impaired or profoundly deaf. Although CI users can achieve high levels of speech understanding in quiet conditions, listening to speech in noisy conditions, which is typical of everyday listening and music appreciation, still poses great challenges.
In summary, CI users commonly show a decreased perception of pitch, melody, harmony, and timbre, whereas rhythm perception is usually well preserved [1][2][3]. The limited delivery of spectral cues, poor frequency resolution in signal processing, electrode position in the cochlea, and individual duration of deafness may explain the low sound quality. CI manufacturers are aware of the issue of inadequate technical sound transmission, and considerable effort is exerted on software optimization. Similarly, clinical researchers have been trying to reveal how specific musical training could support the development of new sound processing through a CI (e.g., [4][5][6]).

Music Training Improves Music Appreciation and Speech Understanding in CI Users
Listening to and creating music holds great cultural and social significance worldwide, bringing people together in communication and community. Despite the benefits of CIs, individuals with hearing impairments often refrain from participating in musical activities and their associated social environments.
For CI users, music processing may be significantly altered by auditory training that can promote brain plasticity in the auditory cortex despite the technological constraints of the CI [7]. Over the last 15 years, a variety of musical auditory training methods have been developed for CI users that have been shown to improve the recognition/processing of melodic lines, the identification of timbres, and the subjective perception of music (e.g., [8][9][10]). The degree of improvement has correlated with training intensity and regularity [11]. While these training programs range from six-week short-term auditory melodic training [8] to systematic active listening programs [12] and the use of online training resources [13], no systematic music rehabilitation strategy has been established. Additionally, Patient-Reported Outcomes (PROs), such as the Music-Related Quality of Life questionnaire [14], were utilized to evaluate individual music experiences across various real-life situations among different subgroups of CI patients [15][16][17]. While diverse music rehabilitation needs were identified, clear indications were not discernible to support the development of music rehabilitation programs for CI users [18][19][20]. Furthermore, training effects on music perception and appreciation, and effects on speech understanding in adult CI recipients, should be reviewed with caution. Most of the present literature consists of correlational quasi-experimental studies with small sample sizes, considerable inter-individual variability, and moderate treatment effects, which prevents the formulation of definitive causal statements [21]. There is a lack of sufficient randomized controlled trials with large samples of hearing-impaired participants randomly assigned to an experimental and a control group to accurately assess the effects of music training.
Therefore, further research is needed on the design of music training programs and their effects on both music appraisal and speech understanding in CI users.

Music Training for Better Speech Understanding in Noise (SIN) with CI
Perceiving speech in noisy environments (e.g., phone conversations and social gatherings) is consistently compromised in individuals with hearing impairments, thus significantly reducing their quality of life. CI manufacturers have developed technological strategies like directional microphones, noise reduction algorithms, or synchronization with accessories to improve SIN comprehension.
Neuroscience research has shown various transfer effects of musical training on speech processing. Shared neural networks of music and language suggest that the plastic changes induced by musical training influence language processing and acquisition [22,23]. Both music and language must separate categorized sounds perceived within a continuous and complex stream. Over time and with consistent exposure, the brain develops an internal temporal model to accurately anticipate forthcoming events [24]. Compared to speech, music places higher demands on pitch, temporal processing, and the analysis of auditory scenes. The latter is the leading process and refers to the capability to segregate different but similar-sounding sources (e.g., one speaker among other speakers; the violin and the viola in a string quartet). When playing music, the brain performs at a high degree of precision in temporal and spectral synchronization and prediction [25]. Numerous studies starting from the 1990s have compared musicians and non-musicians and have shown significantly better auditory discrimination of basic linguistic structures (phoneme categorization, semantic discrimination, syntax, and prosody) in musicians. As an example, Schön et al. [26] showed that musical training improved melody processing not only in music perception but also in speech processing in normal-hearing participants.
After short-term musical training, Shahin [27] was able to demonstrate transfer effects for improved SIN in normal-hearing nonmusicians as well as enhanced speech understanding for hearing-impaired adults. Kang et al. [28] tested not only speech perception in quiet conditions but also included speech-in-noise tests after auditory musical training. They found that lower pitch discrimination thresholds, higher melody recognition scores, and timbre identification were correlated with SIN thresholds. Other groups documented enhanced SIN perception and auditory scene analysis ability in lifelong musicians [29,30]. Fowler et al. [31] verified the degree of music skills as a significant predictor of SIN in both the hearing and the CI population.
However, the mentioned and other related studies face multiple methodological limitations, such as small sample sizes, not being randomized or compared to an (age- and condition-)matched control group, or not considering potential confounders and bias. Some have suggested that the benefits of musical training may be related to pre-existing factors like IQ and participants' musical affinity rather than neuronal plasticity [32]. More data and valid research designs are required to further support the hypothesis that "better performance on complex auditory tasks such as music perception could generalize to better performance in other difficult auditory domains, such as speech in noise" [31].
In addition to training through analytical music listening, previous research has shown that playing/learning musical pieces integrates multiple sensory and motor systems. Playing an instrument creates an "action-perception link" between the motor and auditory systems of the brain [33,34]. While playing, auditory information must be constantly paired with motor activities (e.g., finger positions, lip tension) and sensory feedback. Continuous practice promotes auditory-motor plasticity and establishes strong connections between specific motor activities and precise corresponding acoustic information [35]. Furthermore, neuroscientific studies demonstrated an automatic coupling between auditory and motor systems when listening to rhythm and music (e.g., [36][37][38]). Enhanced activation in premotor areas was seen in both expert musicians and nonmusicians when they were listening to a melody they had learned to play during short-term musical training [39]. Advanced rhythmic and musical skills seem to facilitate extracting temporal information from speech and thus improve understanding in the presence of background noise, as was shown in musicians [40].
Thus, auditory-motor (multi-modal) training may be more beneficial for CI users than purely auditory training [41]. Only sparse literature exists on the effects of multi-modal training on complex sound perception like vocal emotion recognition and pitch pattern recognition in CI rehabilitation. Chari et al. [42] tested the effect of a one-month computerized musical training on three groups of CI users (an auditory-motor group, an auditory-only group, and a no-training control group). Only the auditory-motor training group scored significantly better in the melodic contour identification task. These results indicate that short-term multi-modal music training significantly impacts pitch pattern recognition in CI users.

Cognitive Aspects of Speech Understanding
Besides fundamental auditory sensory processing skills, higher-level cognitive functions such as auditory working memory (WM) and selective attention are required to successfully understand spoken language and to communicate [43]. WM chunks extensive information into meaningful units, sets current information in context with previous information, and makes predictions. This cognitive process of remembering the beginning of a sentence to anticipate the end is fundamental to understanding language.
Characteristics of musical training (e.g., active music making, analytical music listening, composing) not only support neuroplastic changes in the auditory cortex but also shape executive functions [44][45][46] and successful learning [47]. Throughout the last decade, musicianship has been associated with improved auditory and enhanced higher-level cognitive functions like WM and selective attention [46]. WM plays a crucial role in both music perception and production when tracking chord progressions or memorizing scores to perform accurately and on time. As a result, musicians are superior at predicting acoustic events and understanding their statistical dependencies while listening or playing, thus demonstrating better verbal [48,49] and non-verbal WM [50,51].
The listed higher-level cognitive functions are required to successfully understand spoken language. WM, as the cortical temporal storage and processing system, even significantly predicts global speech comprehension (Pearson's ρ = 0.30 to 0.52) [52]. Rönnberg and colleagues developed the Ease of Language Understanding (ELU) framework to describe the interplay between speech recognition and working memory [53][54][55]. In easy listening conditions, the input signal is matched immediately and effortlessly with the phonological representation of the auditory information in long-term memory. Understanding SIN distorts this matching process and requires higher-level cognitive functions for compensation: reliable bottom-up sensory encoding of target speech in the auditory system [56], compensatory sensorimotor integration [57], and top-down functions like auditory WM and selective attention (e.g., [58][59][60][61]). This higher-level cognitive remedial processing is very effortful and lowers WM capacity.
The reduced spectrotemporal delivery of CIs may require additional cognitive resources for understanding speech in noisy environments. Consequently, CI users often exert more mental effort when listening to speech in noise, even when speech intelligibility is adjusted for equal performance compared to individuals with normal hearing [62,63]. For example, this increased listening effort has been linked to self-reported experiences of CI users during speech-in-noise tasks in Dimitrijevic et al. [64]. This heightened effort is associated with increased frustration, fatigue, and reduced concentration among CI users, impacting their performance at work or school and potentially leading to chronic stress and its negative effects on the mental quality of life (e.g., [65][66][67][68]).
Gray et al. [69] concluded in a mini-review that the characteristics of musical training, as an integration of multiple sensory modalities and higher-order cognitive functions, benefitted both WM performance and SIN perception in older adults. Recently, Giallini et al. [70] confirmed a significant link between attention, cognition, and WM capacity in CI users. WM compensates for the degraded auditory signals provided by amplification systems and/or CIs in noisy listening conditions but requires high cognitive resources [55,71]. Consequently, better speech performance was only reported in CI users with higher attentional resources [71]. Only a limited number of studies tested the effect of musical training on WM capacity with adult CI users. Most studies were conducted with CI children and showed particular benefits for WM. Long-term musical training improved auditory WM [72,73] or was superior to visual training for memory recall [74].
Considering the frequently described cognitive and auditory benefits due to the overlap in neural networks for processing speech and music, further evaluation of music-based training on WM performance in the speech perception of CI users is warranted.

Sentence Final Word Identification and Recall Test (SWIR)
As mentioned above, previous studies rarely included SIN tests to evaluate speech understanding in daily listening challenges and their neuronal components in hearing-impaired people. Ng and colleagues [75] developed the sentence final word identification and recall (SWIR) test, where participants are asked to immediately repeat sentences presented with background noise while simultaneously memorizing the last words for a memory recall after a unit of six sentences. The SWIR protocol enables the assessment of cognition when speech is simultaneously perceived and processed with hearing aids in background noise [75][76][77]. Importantly, SWIR sentences are understood by the participant since the signal-to-noise ratio (SNR) level is individually adjusted to 85% speech intelligibility, ensuring that performance in the subsequent recall reflects cognitive performance rather than audibility. Previous studies obtained average SNRs of +4.2 and +7.5 dB [77], which is representative of typical, real-life listening conditions. Additionally, the SWIR evaluates WM performance (memory recall of the sentences' last words) in realistic acoustic environments, where the background noise and target speech come from multiple spatially separated locations. The dynamics of the mental rehearsal of six words in WM can be interpreted with the ELU framework. Lunner et al. [78] referred to the SWIR test as "suitable for testing hearing and hearing devices under more realistic and demanding everyday conditions than traditional speech-in-noise tests". This study will apply a novel EEG approach to the SWIR test (SWIR-EEG).

Electrophysiological Measures Related to Speech Encoding, Working Memory, Attention, and Listening Effort
Most CI music perception studies emphasized CI behavioral performance on spectrotemporal acoustic features like pitch, timbre, melody perception, complex rhythm, and duration [1,79]. These spectrotemporal features are essential to parse speaker and background streams in SIN conditions. CIs distort auditory signals, which alters perception in SIN conditions and may affect auditory WM abilities in their users. The analysis of electroencephalogram (EEG) patterns is one of the most common tools for electrophysiological investigations in CI users. It has millisecond time resolution and is safe to use in CI users, as opposed to other neuroimaging modalities such as functional magnetic resonance imaging. The CI-EEG literature has traditionally focused on passive listening paradigms designed for the pediatric population, primarily to objectively relate brain activity to behavioral performance when behavioral feedback is unreliable. Active listening tasks requiring attention and working memory induce pronounced neural oscillations not seen during passive listening [80]. EEG has a long history of quantifying cognition-related brain potentials, including neural oscillatory rhythms classically defined as canonical bands: delta (1-2 Hz), theta (3-6 Hz), alpha (8-12 Hz), beta (15-30 Hz), and gamma (35-40 Hz) [81]. It is well suited to study higher-level cognitive processing such as attention and working memory during SIN tasks as well as low-level sensory encoding. Even though attentive listening is associated with changes in nearly all canonical brain rhythms [80], the focus in this proposal will be on alpha and theta oscillatory activity, since these are the most commonly reported brain rhythms in the speech perception, attention, WM, and listening effort literature. Increased alpha activity was seen in the prefrontal cortex (PFC) during various WM situations and when listening in noisy conditions (e.g., [82][83][84][85]). Alpha activity in the PFC is
generally associated with WM updating during sensory processing [86,87] and with simultaneously inhibiting non-involved brain regions [88]. Gray et al. [69] reported a corresponding decline in alpha-theta activity with declining WM abilities in older persons. WM-related modulations of alpha-theta activity could illuminate problems in SIN perception [89,90]. Similarly, lower resting-state alpha and lower alpha-theta power were associated with high WM-load tasks [91], and an alpha-power reduction was seen in conditions with heightened sensitivity to distractions [92]. Additional support for this phenomenon has been shown in studies where hearing difficulty and attention increased with the level of background noise or vocoded speech [80,93,94]. Alpha oscillations seem to be particularly sensitive in WM paradigms when tested in hearing loss conditions. Petersen et al. [95] observed elevated alpha activity with increasing memory load in individuals with mild hearing loss but a significant drop under severe hearing loss conditions. The researchers assumed that a "cognitive limit" was reached when hearing under difficult conditions. This finding is consistent with other reports indicating that when cognitive resources are strained by auditory perception, less capacity is available for cognitive processing [96][97][98]. At present, the neural correlates of profound hearing loss treated with a CI on WM and SIN conditions, and how they intersect, are still unclear. In addition, there is no documentation of possible changes in alpha-theta oscillatory activity associated with WM capacity and SIN perception after multi-modal musical training. Gray et al. [69] carried out a mini-review to investigate the relationship between musical training, WM, and SIN perception in healthy seniors (>65 years). They concluded that musical training may "preserve" WM-related alpha-theta activity and may benefit speech understanding in noisy conditions in older age. Dimitrijevic et al.
[80] observed that alpha event-related synchronization and desynchronization occur as two separate components during SIN in normal-hearing listeners. They also observed alpha oscillations in CI users during SIN screening tests and found that they are related to listening effort [64,99]. In addition to the cognitive evoked oscillatory changes described above, there is a large body of literature describing the sensory encoding of low-level sound features such as tone/speech onsets [100], change responses [100], and, more recently, speech-neural tracking or coherence measures including the temporal response function (TRF) [101,102]. The TRF is a class of neural entrainment methods used to relate speech sounds to brain activity (reviewed in [103,104]). It represents a spatial filter that models brain responses to given acoustic features. The TRF shows enhancement during attention [105] and has been successfully used in CI users to quantify active listening [106,107]. This project will focus on TRFs: the TRF response will be quantified relative to the speech envelopes of the SWIR sentence stimuli [80]. Detecting neural correlates of sensory and cognitive processing in CI users performing WM and SIN tasks after musical training may provide more insight into CI hearing processes than standard behavioral tests alone and may lead to more targeted clinical interventions in rehabilitation.
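To illustrate the TRF idea, the envelope-to-EEG mapping can be estimated with time-lagged ridge regression, the approach implemented in the mTRF toolbox [101]. The sketch below is a simplified stand-in for that toolbox, using synthetic data and arbitrary parameter values (sampling rate, lag window, regularization), not the project's actual pipeline.

```python
import numpy as np

def trf_ridge(envelope, eeg, fs, tmin, tmax, lam):
    """Estimate a forward TRF mapping a stimulus envelope to one EEG
    channel via time-lagged ridge regression (mTRF-style sketch)."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    n = len(envelope)
    # Design matrix: one column per time lag of the envelope
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = envelope[:n - lag]
        else:
            X[:n + lag, j] = envelope[-lag:]
    # Ridge solution: w = (X'X + lam*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, w

# Synthetic demo: "EEG" is a delayed, scaled copy of the envelope plus noise
rng = np.random.default_rng(0)
fs = 64
env = rng.random(fs * 60)
delay = int(0.1 * fs)                      # simulated neural latency: 100 ms
eeg = np.roll(env, delay) * 2.0 + rng.normal(0, 0.1, env.size)
times, weights = trf_ridge(env, eeg, fs, 0.0, 0.4, lam=1e-3)
print(times[np.argmax(weights)])           # peak weight near the 0.1 s latency
```

The peak of the estimated TRF recovers the simulated latency; in real data, attention modulates the amplitude of such peaks, which is the effect the project will quantify.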

The Rationale for the Study
Given the cognitive and auditory perception benefits due to the overlap in neural networks for processing speech and music, further evaluation of music-based training on SIN in CI users is warranted.
Although there is broad consensus that verbal WM functions are crucial for SIN in both the normal-hearing and the hearing-impaired population, only a limited number of studies have described the effects of cognition and attention on speech perception in CI users.
In addition, improved auditory-motor coupling after music training seems to play a critical role in speech perception, especially in noisy conditions. This relationship has not been thoroughly investigated in people with CIs.

Research Questions
This study aims to investigate how speech-in-noise and working memory performance can be altered through focused multi-modal music training in post-lingually deafened adult CI users, and to gain a neurophysiological understanding of the underlying sensory and cognitive processes using EEG measures.

Materials and Methods
This project is a collaboration between the Music and Health Research Collaboration (MaHRC) of the University of Toronto and the CI Program of Sunnybrook Health Sciences Centre in Toronto, ON, Canada.

Participants and Sample Size
Participants will be recruited from Canada's largest cochlear implant program at Sunnybrook Health Sciences Centre in Toronto. Included will be postlingually deafened persons aged from 18 to 80 years, both uni- and bilaterally implanted, with at least 1 year of CI experience and native or bilingual fluency in English. Exclusion criteria will be single-sided deafness (SSD), severe cognitive deficits, and neurologic (e.g., stroke) or psychiatric disease. Participants will be asked to sign consent forms for participation, approved by the Research Ethics Boards at Sunnybrook and the University of Toronto, at their first meeting and will be compensated for each session throughout the duration of the study.
Using the G*Power program [108], a sample size of 18 people for each of the three experimental groups was calculated (effect size of 0.7, based on the upper and lower ranges of alpha event-related desynchronization in the digits-in-noise task, to detect at least a 10% change from baseline). We aim for 27 participants per group to account for subject attrition.
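For context, a sample-size figure of this kind can be cross-checked by simulation. The sketch below estimates power for a one-way ANOVA with three groups by Monte Carlo, under assumed values (Cohen's f = 0.7, α = 0.05) that stand in for, but do not reproduce, the G*Power calculation.

```python
import numpy as np
from scipy import stats

def simulated_power(n_per_group, effect_size, n_sims=2000, alpha=0.05, seed=1):
    """Monte Carlo power estimate for a one-way ANOVA with three groups.
    effect_size is Cohen's f; group means are spread so their RMS
    deviation from the grand mean equals f (with sigma = 1)."""
    rng = np.random.default_rng(seed)
    means = np.array([-1.0, 0.0, 1.0])
    means *= effect_size / np.sqrt(np.mean(means**2))
    hits = 0
    for _ in range(n_sims):
        groups = [rng.normal(m, 1.0, n_per_group) for m in means]
        _, p = stats.f_oneway(*groups)
        hits += p < alpha
    return hits / n_sims

# With n = 18 per group and a large effect (f = 0.7), power is very high,
# so the planned oversampling to 27 mainly guards against attrition
power = simulated_power(18, 0.7)
print(round(power, 2))
```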

EEG Recording and Study Procedure
This project will apply a novel EEG approach that provides behavioral and electrophysiological measures of the SWIR test (SWIR-EEG). All EEG data will be recorded using a 64-channel actiCHamp Brain Products recording system (Brain Products GmbH, Munich, Germany). The EEG will be segmented by sentence length. The sentence envelope will be used as the reference for calculation of the temporal response function (TRF) using the mTRF toolbox [101]. Dynamic imaging of coherent sources (DICS) [109] will be used to determine speech-brain coherence, and the source of alpha power (8-12 Hz) will be determined [64].
For EEG analysis, CI-related artifacts will be reduced using independent component analysis (ICA). The team of the Sunnybrook CI brain lab developed a technique to identify and remove CI artifacts using cross-correlation (or TRFs) between the stimulus envelope and the ICA activations [110]. Time-frequency analysis of the preprocessed continuous data will be performed using either BESA (Brain Electrical Source Analysis) [111] or FieldTrip [112], in addition to applying custom scripts in MATLAB and FieldTrip for more advanced analyses (e.g., TRFs with speech envelopes, across-trial beamformer correlations).
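The envelope-correlation idea behind this artifact-rejection step [110] can be illustrated in a few lines: components whose ICA activations track the stimulus envelope closely are candidates for removal, since the CI electrical artifact follows the stimulus far more closely than brain activity does. The code below is a schematic numpy sketch with an arbitrary correlation threshold, not the lab's validated procedure.

```python
import numpy as np

def flag_artifact_components(ica_activations, stim_envelope, threshold=0.3):
    """Flag ICA components whose activation time course correlates
    strongly with the stimulus envelope -- a simple proxy for the CI
    artifact. threshold is an illustrative cutoff, not a validated one."""
    flagged = []
    for idx, component in enumerate(ica_activations):
        r = np.corrcoef(component, stim_envelope)[0, 1]
        if abs(r) > threshold:
            flagged.append(idx)
    return flagged

# Synthetic demo: component 0 tracks the envelope (artifact-like),
# component 1 is unrelated noise
rng = np.random.default_rng(0)
env = np.abs(rng.normal(size=5000))
acts = np.vstack([env * 5 + rng.normal(0, 0.5, 5000),
                  rng.normal(size=5000)])
print(flag_artifact_components(acts, env))   # flags component 0 only
```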
The overall study procedure will comprise two pre-training sessions (4 h each) at the Sunnybrook CI brain lab, eight music training sessions (50 min each) at MaHRC, and one post-training session (4 h) over a period of 3 months per participant. Prior to the first pre-training session, participants will complete online quality-of-life questionnaires (Speech, Spatial and Qualities of Hearing Scale (SSQ) [113], Cochlear Implant Quality of Life-10 (CI-QoL-10), Cochlear Implant Quality of Life-35 (CI-QoL-35) [114]). Behavioral measurements of clinical speech perception will be assessed through the Sentence Matrix speech test [115]. In addition, the AzBio Sentence Test [116] will be applied in a 3D listening setup in the first EEG session and be followed by the SWIR-EEG procedure (see Section 3.3). The SWIR-EEG procedure will be repeated in the second pre-training session 4 weeks later. Two pre-training sessions are required to account for learning effects and the stability of the EEG measures and to determine behavioral and EEG imaging differences in the post-session. After the third (post-training) EEG session, the participants will fill out a short online "musical background and CI-music training feedback questionnaire" designed by the first author (KM) (see Supplementary Materials).
The participants will be randomly assigned to three separate training groups by single-blinded permuted-block randomization (ABC) [117]. Groups A (pitch) and B (timbre) will be the experimental groups because these tasks rely mostly on pitch cues, with which CI users typically have problems. Group C (rhythm) will serve as a control, given that CI users typically perform well on rhythm tasks, and therefore the effects of training are expected to be minimal [2].
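Permuted-block randomization with three groups can be sketched as follows; the block size (equal to the number of groups) and the seed are illustrative choices, not details taken from the protocol.

```python
import random

def permuted_block_randomization(n_participants,
                                 groups=("A_pitch", "B_timbre", "C_rhythm"),
                                 seed=42):
    """Assign participants to groups in permuted blocks so that group
    sizes stay balanced throughout recruitment. Block size equals the
    number of groups; the seed is illustrative."""
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        block = list(groups)
        rng.shuffle(block)          # each block is a random permutation
        assignments.extend(block)
    return assignments[:n_participants]

schedule = permuted_block_randomization(81)
# After 81 assignments the three groups are exactly balanced (27 each)
print(schedule.count("A_pitch"), schedule.count("B_timbre"), schedule.count("C_rhythm"))
```

Because every block contains each group exactly once, the allocation never drifts more than one assignment out of balance, which matters when recruitment is staggered over months.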
The music training (4 weeks in total) will start in the week after the second pre-training session. The participants will take part in eight one-to-one practice sessions led by certified Neurologic Music Therapists, scheduled twice per week, with each session lasting 50 min. All three groups will follow a gradually more challenging music training paradigm for each of the three conditions. The exercise protocol was developed by the first author based on principles of the effect of multi-modal musical training on auditory and speech processing and will involve both active instrument playing and focused listening to recorded music.
One post-train session will be scheduled one week after the music training to assess the short-term effects of the training by applying the online Sentence Matrix speech test [115] and the SWIR-EEG (see Figure 1 for the complete outline of the experimental paradigm).

The Procedure of the Sentence Final Word Identification and Recall Test (SWIR)
At first, the speech recognition threshold (SRT) for 85% correct sentence identification using HINT sentences will be determined by adaptively varying the signal-to-noise ratio (SNR) of the sentences in speech-shaped noise. Each subject-specific SNR will then be used in all of that participant's repeated testing before and after music training. Participants will wear a 64-channel EEG cap and will be seated in a circular speaker ring array with a computer screen displaying instructions. The sentences will be presented by a speaker directly in front of the listener, while the other seven speakers will play speech babble. The test procedure will comprise twenty blocks of HINT sentences in noise (at the SNR for 85% correct), where each block will include five trials of a list of six sentences. The participant will repeat the final word of each sentence (identification task). After each list, the participant will be asked to verbally repeat the last six words in any order (free recall task). The procedure will take about 50 min, including 10 min for the free recall task.
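One way such an adaptive SRT track could be implemented is sketched below. The exact adaptive rule is not specified in the protocol; a 1-up/4-down transformed staircase, which converges near 84% correct (close to the 85% target), is assumed here purely for illustration, and Python is used although the study's analyses will be performed in R.

```python
def track_srt(respond, start_snr=10.0, step_db=2.0, n_reversals=8):
    """Estimate the SNR for ~84% correct sentence identification with a
    1-up/4-down transformed staircase.

    `respond(snr)` presents one sentence at the given SNR (dB) and
    returns True if the listener identifies it correctly.
    """
    snr = start_snr
    correct_streak = 0
    direction = None        # last step direction: 'up' or 'down'
    reversal_snrs = []
    while len(reversal_snrs) < n_reversals:
        if respond(snr):
            correct_streak += 1
            if correct_streak == 4:      # 4 correct in a row -> make it harder
                correct_streak = 0
                if direction == 'up':    # turning point: log a reversal
                    reversal_snrs.append(snr)
                direction = 'down'
                snr -= step_db
        else:
            correct_streak = 0           # any error -> make it easier
            if direction == 'down':      # turning point: log a reversal
                reversal_snrs.append(snr)
            direction = 'up'
            snr += step_db
    # SRT estimate: mean SNR at the reversal points
    return sum(reversal_snrs) / len(reversal_snrs)
```

For a hypothetical deterministic listener who identifies every sentence at or above 0 dB SNR, the track oscillates around the threshold and returns the mean of the reversal SNRs.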
As mentioned in Section 1.4, the SWIR test protocol enables the assessment of cognitive recall performance (WM), indexed by the recall of the six sentence-final words, as well as of subsequent speech understanding in noise.
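The two SWIR outcome measures described above could be scored roughly as follows. This is a minimal sketch with hypothetical word lists; the protocol's exact scoring rules (e.g., handling of homophones or plurals) are not specified here, and Python is used for illustration although the study's analyses will be performed in R.

```python
def score_swir(target_words, identified, recalled):
    """Score one SWIR list.

    identification_pct: percentage of the six sentence-final words
        correctly repeated trial by trial.
    recall_pct: percentage of those words later reproduced in any
        order in the free recall task.
    """
    targets = [w.lower() for w in target_words]
    # Identification is scored in presentation order, word by word
    id_correct = sum(t == r.lower() for t, r in zip(targets, identified))
    # Free recall is order-free, so score by set membership
    recall_correct = len(set(targets) & {w.lower() for w in recalled})
    return {"identification_pct": 100 * id_correct / len(targets),
            "recall_pct": 100 * recall_correct / len(targets)}
```

For example, with hypothetical targets ["cat", "dog", "sun", "map", "tea", "oak"], five correct identifications and three freely recalled words would score roughly 83% identification and 50% recall.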

Statistical Analysis
All data analysis, including descriptive statistics, will be performed using R [118]. Demographic data for each participant will include age, gender, etiology, handedness, years of education, and musical background. The grouping variable of interest will be "type of training", coded into three categories: "A_pitch", "B_timbre", and "C_rhythm". To test for differences after the training period within the three groups for each condition, a one-way repeated measures ANOVA will be conducted, with the 0.05 level of probability set as the level of statistical significance. The necessary assumptions for conducting a one-way repeated measures ANOVA should be met by the collected data. The participants in the three groups will be independent of each other, as they will be randomly assigned to the groups. The dependent variable "percentage of correctly identified words/sentences" is on a ratio measurement scale and will be tested for normality before conducting the analysis. Levene's test will be applied to test for homogeneity of variance among the groups. Should the one-way ANOVA reveal that at least one group mean differs significantly from the others, a post hoc test will be performed to detect which groups differ from each other [119]. Differences between the groups will be analyzed with a repeated measures ANOVA with three groups and four repeated measurements, powered at 80%, again with 0.05 as the level of statistical significance. Testing of assumptions will follow the same format as for the previous tests.
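The core of the planned ANOVA is the F statistic, the ratio of between-group to within-group mean squares. The actual analysis will be run in R as described above; the following is only a minimal Python illustration of the one-way (between-groups) F computation, not of the full repeated measures model.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    `groups` is a list of lists of observations, one inner list per
    group. Returns MS_between / MS_within.
    """
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    # Between-group sum of squares: group sizes times squared
    # deviations of group means from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: squared deviations from group means
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

A large F indicates that at least one group mean differs from the others, which would then be followed up with post hoc comparisons as described above.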

Instruments and Music Material
For pitch training, melody instruments like tone bars (range of two octaves), the glockenspiel (range of one octave), the metallophone (range of one octave), the piano, and the harp will be used.
For rhythm training, percussion instruments like hand drums, congas, triangles, double-row tambourines, and stand drums will be used. In addition, the piano will be used for presenting rhythm sequences.
For the timbre training, a variety of string, melodic, wind, percussion, and other instruments, as well as different kinds of mallets, will be employed. An established playlist of recorded music (source: youtube.com) featuring each instrument's timbre, solo or in an ensemble, will be presented using a laptop and a Bose loudspeaker system. An additional playlist of wind instruments (flute, trumpet, trombone, saxophone, and clarinet), as well as a training song composed by the principal author, was recorded to be presented solo, in various combinations, and with band accompaniment.

Expected Outcome
If short-term focused multi-modal music training does enhance WM capacity and SIN perception in post-lingual deafened adult CI recipients, this study will provide behavioral and EEG neural correlates of the improvements after targeted music training. The results of this study could contribute to illuminating sensory-cognitive integration supporting cognitive compensation during SIN perception beyond the technological constraints of the CI device.

Conclusions
The impact of auditory-motor training on speech understanding in noise and its neural foundations in CI users has not been extensively researched. Meanwhile, studies on the effects of music training have primarily focused on individuals with normal hearing and extensive musical experience. Moreover, hearing rehabilitation following CI surgery varies globally, with some programs offering minimal training while others provide intensive auditory rehabilitation, some including music training.
This project aims to assess both the neurophysiological and behavioral effects of multi-modal music training on SIN perception and WM in CI users. Expanding the understanding of sensory-cognitive music interventions and their direct neural effects in CI users is crucial for deriving meaningful clinical implications.
The findings of this study could advance the clinical use of neurologic music training as an effective treatment for post-lingual deafened CI adults, enhancing their ability to understand speech in noise and positively impacting their quality of life.

Figure 1. The overall outline of the experimental paradigm.
The research questions are as follows: (1) Does music training with a focus on pitch, timbre, or rhythm result in improvements in behavioral working memory performance in noise measures (greater percent of recall in the SWIR working memory task)? (2) Does pitch- and timbre-based training result in greater improvements in behavioral working memory performance in noise measures when compared to rhythm-based training?