Visual Mental Imagery and Neural Dynamics of Sensory Substitution in Blindfolded Subjects

Although one can recognize the environment through soundscapes that substitute auditory signals for vision, whether subjects perceive a soundscape as a visual or visual-like sensation has remained an open question. In this study, we investigated hierarchical processes to elucidate the mechanism by which soundscape stimuli recruit visual areas in blindfolded subjects. Twenty-two healthy subjects were repeatedly trained to recognize soundscape stimuli converted from the visual shape information of letters. An effective connectivity method, dynamic causal modeling (DCM), was employed to reveal how the brain was hierarchically organized to recognize the soundscape stimuli. The visual mental imagery model generated the cortical source signals of five regions of interest better than the auditory bottom-up, cross-modal perception, and mixed models. We then analyzed the spectral couplings between brain areas in the visual mental imagery model. While within-frequency coupling was apparent in bottom-up processing, where sensory information is transmitted, cross-frequency coupling was prominent in top-down processing, corresponding to the expectation and interpretation of information. Sensory substitution in the brains of blindfolded subjects thus derived visual mental imagery by combining bottom-up and top-down processing.


Introduction
Although sensory modalities such as sight, hearing, and touch originate from separate organs with independent sensing mechanisms, the central nervous system processes them in a complementary way to facilitate perception of the environment. This complementarity allows us to understand the environment comprehensively and in an integrated way. It also allows us to perceive sensory information without directly seeing, hearing, or touching. For example, we can picture a grandmother's smiling face, without seeing her, from her voice over the phone. Likewise, when parking a car, we can easily estimate the gap between the car and an obstacle from the warning sound of the rear sensor. Sensory substitution aims to complement and augment human sensory experiences under sensory-deprived conditions by exploiting this complementarity.
Sensory substitution techniques can transform sensory information of one modality into another (Bach-Y-Rita et al., 1969). For example, vOICe, one of the visual sensory substitution devices (SSDs), can transform visual shape information into a complex auditory signal called a 'soundscape' (Meijer, 1992; Ward and Meijer, 2010). After training to translate the soundscape stimuli, vOICe users can recognize visual shape information such as landscapes or letters through their ears. This technique can aid patients with sensory impairment and supplement sensation in sensory-deprived environments.
Along with the development of SSD techniques, behavioral studies have demonstrated that SSD users can perceive visual information, such as the visual shape and spatial arrangement of objects, through soundscape stimuli. In those studies, researchers trained blind and blindfolded subjects to recognize various kinds of visual information through soundscape stimuli. For example, with training, subjects could identify the shapes of objects (Amedi et al., 2002; Arbel et al., 2022), letters (Striem-Amit et al., 2012), and visual patterns (Graulty et al., 2018; Poirier et al., 2007a) by listening to soundscape stimuli without visual experience.
However, those behavioral experiments raised the question of whether subjects perceive the auditory stimuli as visual or visual-like sensations. For example, subjects might directly feel soundscape stimuli as visual sensations, as in visual perception. Conversely, they might recognize the information through visual imagery induced by the stimuli. Alternatively, they might perceive the stimuli as a particular type of auditory sensation, since the stimuli are, by nature, sounds. Answering this question would deepen our understanding of the complementary mechanisms of the human sensory processing system, which allow visual information to be perceived even under visual deprivation, whether from visual impairment (congenital or late blindness) or from blindfolded environments such as a dark cave or the smoke of a fire.
Neuroscientists have addressed this question by imaging brain activity while users perceived sensory substitution stimuli and by identifying the brain areas and functional networks engaged by soundscape stimuli. Early studies focused on the brain areas recruited to recognize soundscape stimuli under visually deprived conditions. In functional magnetic resonance imaging (fMRI) and positron emission tomography (PET) studies, the brains of blind and blindfolded subjects recruited early visual areas such as Brodmann areas (BA) 17/18 and 19 to recognize soundscape stimuli (Poirier et al., 2007a, 2007b; Renier et al., 2004; Striem-Amit et al., 2012). In addition, hierarchically higher visual areas, such as the ventral occipito-temporal and inferior temporal areas, were recruited alongside the early visual areas (Amedi et al., 2007; Arbel et al., 2022; Striem-Amit et al., 2011). These areas lie within the ventral visual stream, which processes the visual shape of objects (Amedi et al., 2002, 2007). Those studies showed that the recruitment of visual areas by soundscape stimuli could reflect participation in visual information processing, even though no visual information was given.
Which kind of sensory information processing best explains the underlying neural mechanism for perceiving visual information through sensory substitution stimuli remains an open question. The underlying mechanism could be described through hierarchical processes between brain areas: the hierarchical organization of brain areas and the functional couplings between them (Dijkstra et al., 2017a).
Although the neuroimaging literature showed visual area activation for soundscape or Braille stimuli, only several possible hypotheses could be drawn (Poirier et al., 2007b), since similar brain representations appear during the processing of sensory information even in different processes, such as top-down and bottom-up processing (Dijkstra et al., 2017; Kosslyn, 2005). Thus, it is challenging to elucidate the mechanism for processing sensory substitution stimuli with the neuroimaging approach alone. Rather, the underlying mechanism could be elucidated through hierarchical processes between brain areas, including the hierarchical organization of brain areas and the functional couplings between them (Dijkstra et al., 2017). This hierarchical process is expressed in the brain as causal interactions between higher- and lower-cognitive brain regions (Mumford, 1992). Therefore, a new approach is needed to derive possible underlying mechanisms from the neuroimaging literature (e.g., Poirier et al. (2007b)) and to test their suitability for explaining the hierarchical process, for example with effective connectivity.
Four possible mechanisms could be suggested. Two of them could be hypothesized to recruit visual areas: cross-modal perception and visual mental imagery (Poirier et al., 2007b). The first mechanism assumes that auditory and visual areas become directly interconnected through repeated training, which might induce cross-modal plasticity in the brain. A neuromodulation study that induced virtual lesions in the brain supports this hypothesis, as a virtual lesion in the early visual area disrupted the recognition performance of early blind subjects (Collignon et al., 2007); since the bottom-up process was disrupted at an early stage, subjects' performance was impaired. Through the auditory-visual connection, the visual information in soundscape stimuli could be processed through bottom-up (forward) projections, as in visual perception. In contrast, the second mechanism assumes that the stimuli induce visual mental imagery. The auditory area transfers the soundscape stimuli toward association areas within the parietal cortex to form a mental representation. Next, the association area transfers this mental representation toward the frontal cortex. The frontal cortex generates visual mental imagery from this representation, which is then sent back to the visual cortex through top-down projections (Kosslyn, 2005). Previous functional connectivity MRI (fcMRI) studies have reported contrasting connectivity results between auditory-visual areas (Kim and Zatorre, 2011) and frontal-visual cortices (Murphy et al., 2016) for recognizing soundscape stimuli.
In contrast to these visual processing mechanisms, a mechanism based purely on auditory processing cannot be ruled out, since the stimuli are by nature auditory. In the third mechanism, the auditory bottom-up pathway plays the dominant role: stimuli pass through auditory and parietal regions and then terminate in frontal regions without involvement of the visual cortex (Rauschecker and Scott, 2009). Although the sensory processing system generally weights a dominant process, other processes can participate; in other words, these mechanisms are not mutually exclusive (Poirier et al., 2007b). The fourth mechanism is that all the mechanisms described above (cross-modal plasticity, visual mental imagery, and auditory-only processing) participate simultaneously, with contributions that do not differ significantly from each other. All four mechanisms could, in principle, explain the underlying neural mechanism. Nevertheless, none of these possible models has been evaluated or compared against the others.
In this study, we investigate hierarchical processes to elucidate the underlying mechanism of the recruitment of visual areas by soundscape stimuli under a visually deprived condition. Based on findings in the neuroimaging literature (Amedi et al., 2007; Graulty et al., 2018; Hannagan et al., 2015; Poirier et al., 2007a, 2007b; Renier et al., 2004; Striem-Amit et al., 2012), we assumed four sensory processing models to explain the underlying mechanism: (1) an auditory bottom-up model (AUD), (2) a cross-modal perception model (CM), (3) a visual mental imagery model (VMI), and (4) a mixed model (MIX). Each model specifies effective connectivity between cortical sources using dynamic causal modeling (DCM) for electroencephalography (EEG) (Chen et al., 2008; Kiebel et al., 2009). This connectivity represents the hierarchical process between cortical sources. Among these four possible models, one could dominantly represent the underlying mechanism. In the study of Poirier et al. (2007a), two models presumed that visual processing could be dominant. We measured the explanatory power of these models by their ability to estimate the spectral dynamics induced through their hierarchical processes.

Subjects
Twenty-two healthy Korean subjects participated in this study. They had normal hearing and vision, and none reported psychiatric or neurological diseases. All subjects participated voluntarily. Experiments were approved by the Institutional Review Board (IRB) of Seoul National University Hospital (approval number: 1906-127-1042) and performed following the Declaration of Helsinki.

Sensory substitution stimuli
In the experiment, the visual shapes of 24 Korean consonants and 25 English letters were translated into auditory soundscape stimuli with the vOICe software (Meijer, 1992). The software scans a visual image along the horizontal axis, from left to right, and maps pixel positions along the vertical axis to frequency: lower- and higher-positioned pixels correspond to lower- and higher-frequency signals, respectively. The brightness of each pixel determines the loudness (amplitude) of the signal.
The algorithm combined this information to generate the soundscape stimuli. The frequency range for translation was 500 Hz to 5000 Hz, and the sampling frequency was 96,000 Hz. Soundscape stimuli were 2 s long. Detailed translation rules are described by Meijer (1992).
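The column-scan rule described above can be sketched in Python. This is a minimal illustration of the mapping, not the actual vOICe implementation; the image size and the equal-amplitude sinusoid model are assumptions, while the 2 s duration, 96 kHz sampling rate, and 500-5000 Hz range follow the parameters reported here.

```python
import numpy as np

def image_to_soundscape(image, duration=2.0, fs=96_000,
                        f_min=500.0, f_max=5000.0):
    """Sketch of a vOICe-style translation: columns are scanned left to
    right over `duration`; row position sets frequency (bottom = low,
    top = high); pixel brightness sets amplitude."""
    n_rows, n_cols = image.shape
    col_len = int(fs * duration / n_cols)      # samples per column
    # Row 0 is the top of the image, so it maps to the highest frequency.
    freqs = np.linspace(f_max, f_min, n_rows)
    sound = np.zeros(n_cols * col_len)
    t = np.arange(col_len) / fs
    for c in range(n_cols):
        seg = np.zeros(col_len)
        for r in range(n_rows):
            if image[r, c] > 0:                # brightness -> loudness
                seg += image[r, c] * np.sin(2 * np.pi * freqs[r] * t)
        sound[c * col_len:(c + 1) * col_len] = seg
    peak = np.abs(sound).max()
    return sound / peak if peak > 0 else sound

# A diagonal stroke: pitch rises as the scan moves left to right
img = np.eye(8)[::-1]
wave = image_to_soundscape(img)
```

A bright diagonal from bottom-left to top-right thus becomes a rising tone sweep, which is the kind of regularity subjects learn to decode during training.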

Sensory substitution training procedures
This study delivered the visual shape information of letters via soundscape stimuli. Subjects participated in repeated sensory substitution training to perceive visual information in soundscape stimuli. The experiment was therefore designed in two parts: a pre-training session followed by sensory substitution sessions (Fig. 1A).
Pre-training session: Before the training sessions, we measured the baseline state of the brain in response to soundscape stimuli. To this end, we designed a pre-training session in which subjects listened to the soundscape stimuli. Before listening, however, the experimenter told subjects that various types of noise would be played. Hence, they did not know that the stimuli they were listening to carried visual shape information. Soundscape stimuli were presented after cue onset (Fig. 1B). Subjects listened to the stimuli four times repeatedly, with the stimulus sequence randomized.
Sensory substitution sessions: After the pre-training session, the experimenter informed subjects that the sound came from sensory substitution stimuli. Subjects then participated in three days of sensory substitution sessions, blindfolded throughout. The experimenter trained subjects on the visual-shape-to-sound translation rules of the Meijer algorithm. After subjects were familiar with soundscape stimuli, visual patterns (diagonal or arc lines) and simple figures (circles, squares, and triangles) were delivered. Each sensory substitution session comprised two parts. First, we trained subjects to perceive the visual shape information of letters via soundscape stimuli. Fig. 1C illustrates the training part. In the training part, subjects attended to a fixation point on a black screen for one second. Soundscape stimuli were then presented for two seconds. We instructed subjects to listen to the stimuli carefully and to identify the shapes of the letters as well as possible. After the stimuli ended, subjects were given feedback with visual letter images. To prevent participants from memorizing the experimental stimuli, we used only six letters, randomly selected from the 49 stimuli, in the training part, and each stimulus was presented twenty times in random sequences. The remaining 43 stimuli were not used in the training part; they were used only in the testing part to prevent a memorization effect.
After the training part, subjects rested for ten minutes and then participated in the testing part, which followed a procedure similar to the training part (Fig. 1D). As in the training part, subjects attended to the fixation point and then listened to the soundscape stimuli. However, they then had to choose the letter shape they heard from four choices and press the corresponding button (keys 1 to 4 on the number pad) with their right hand as fast as possible. Furthermore, letters that subjects had not experienced in the training part were presented in the testing part. In every sensory substitution session, the stimulus presentation sequence and the arrangement of choices were randomized. These designs prevented subjects from memorizing the stimuli. Each session lasted about 1 hour and 30 min.

EEG acquisition and preprocessing
During the experiments, electroencephalography (EEG) signals were recorded with a SynAmp RT EEG system (Compumedics, Neuroscan) and Scan 4.5 recording software (Compumedics, Neuroscan). An electrode cap with 64 AgCl channels (10-20 standard layout) acquired the EEG signals. The sampling rate of the amplifier was set to 512 Hz. In addition, we applied a 0.1 Hz high-pass filter and removed DC trends during EEG recording. We recorded EEG signals in a magnetically shielded room to minimize external electrical noise.
Only EEG data recorded during the testing part were used in the analysis. During preprocessing, we removed 60 Hz power-line noise and its harmonics with a notch filter. Signals were then downsampled to 250 Hz to reduce computational cost. Next, EEG channels with bad signals were rejected by visual inspection. After rejection, common average re-referencing (CAR) was applied. Independent component analysis (ICA) was then performed, and components reflecting artifacts such as eye blinks, electrooculography (EOG), electrocardiography (ECG), and electromyography (EMG) were removed. After ICA, signals were segmented into epochs from − s to 2.5 s. We used only epochs in which subjects responded correctly. Lastly, epochs were filtered with a 1-50 Hz band-pass filter. Preprocessing was done using MNE (Gramfort et al., 2013), a Python library for M/EEG analysis.
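The filtering chain above can be sketched with numpy/scipy primitives. This is a simplified stand-in for the MNE pipeline used in the study: the data are random toy signals, the ICA step is omitted, and decimation is to 256 Hz rather than 250 Hz because `decimate` requires an integer factor.

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt, butter, decimate

fs = 512.0                                      # amplifier sampling rate (Hz)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((64, int(fs * 10)))   # 64 channels x 10 s of toy data

# 1) Notch out 60 Hz line noise (harmonics are handled the same way)
b, a = iirnotch(w0=60.0, Q=30.0, fs=fs)
eeg = filtfilt(b, a, eeg, axis=1)

# 2) Downsample by an integer factor (512 Hz -> 256 Hz in this sketch)
eeg = decimate(eeg, q=2, axis=1)
fs_new = fs / 2

# 3) Common average re-reference: subtract the mean across channels
eeg -= eeg.mean(axis=0, keepdims=True)

# 4) (ICA-based artifact removal would go here; omitted in this sketch)

# 5) Band-pass 1-50 Hz, zero-phase
b, a = butter(4, [1.0, 50.0], btype="bandpass", fs=fs_new)
eeg = filtfilt(b, a, eeg, axis=1)
```

Because the band-pass filter is linear and applied per channel, the common-average property (zero mean across channels at every sample) is preserved through the final step.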

Model specification and inference
Dynamic causal modeling (DCM) for induced responses (Chen et al., 2008) was employed to investigate the underlying mechanism of sensory substitution. Although the stimuli were auditory and presented in a time-locked manner, the perception of shape information is more likely induced by the sound than evoked: the stimuli placed perceptual demands on translation, and such demands can cause trial-by-trial variations in latency (Chen et al., 2012, 2008; Gilbert and Sigman, 2007).
We specified four different DCMs that could generate the spectral dynamics of each ROI as changed through repeated sensory substitution training. Fig. 3A graphically illustrates the causal couplings between ROIs within each model. Eq. (1) presents the bilinear state equation of DCM, which describes the dynamics of the spectral density g in the sources:

ġ = (A + vB)g + Cu    (1)

The equation contains three matrices (A, B, and C), each holding the coupling parameters of a model. Matrix A contains the baseline connections between sources. Matrix C specifies where the exogenous stimuli u enter. Matrix B encodes the changes in coupling induced by the experimental effect, v ∈ [0, 1]. We measured coupling changes after sensory substitution training (v = 1) relative to pre-training (the baseline, v = 0). The DCM generated the activity of cortical sources with a biologically plausible neural mass model containing subpopulations of pyramidal cells, inhibitory interneurons, and spiny stellate cells (Kiebel et al., 2009). Each model was fitted for each subject. We set the frequency range from 4 Hz to 50 Hz, with four frequency modes, and modeled only the peri-stimulus period, from 0 s (soundscape onset) to 2 s.
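As a concrete illustration of the bilinear form, the equation can be integrated numerically for a toy two-source, one-frequency-mode system. The matrices below are arbitrary stand-ins for the fitted DCM parameters, chosen only to show how v switches the training-induced coupling change on and off.

```python
import numpy as np

def simulate_dcm(A, B, C, v, u, g0, dt=0.01, steps=200):
    """Euler integration of the bilinear state equation
    dg/dt = (A + v*B) g + C u, where v in {0, 1} switches the
    training-induced coupling change on or off."""
    J = A + v * B                      # effective coupling matrix
    g = np.array(g0, dtype=float)
    traj = [g.copy()]
    for _ in range(steps):
        g = g + dt * (J @ g + C * u)
        traj.append(g.copy())
    return np.array(traj)

# Toy 2-source system: source 0 receives the stimulus and drives source 1
A = np.array([[-1.0,  0.0],
              [ 0.5, -1.0]])          # baseline (self-decay + forward link)
B = np.array([[ 0.0,  0.0],
              [ 0.5,  0.0]])          # training strengthens the 0 -> 1 link
C = np.array([1.0, 0.0])              # exogenous input enters source 0

pre  = simulate_dcm(A, B, C, v=0, u=1.0, g0=[0.0, 0.0])  # pre-training
post = simulate_dcm(A, B, C, v=1, u=1.0, g0=[0.0, 0.0])  # post-training
```

Because B only modifies the 0-to-1 connection, source 0 behaves identically in both runs, while source 1 settles at a higher level after "training" (v = 1).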
For these models, we tested which best explained the spectral dynamics of the five ROIs for processing soundscape stimuli. We employed two Bayesian model selection (BMS) methods. First, we compared the relative log evidence, or marginal likelihood, under the fixed-effect assumption (Penny et al., 2004), which assumes that all subjects use the same model to process soundscape stimuli. Second, we employed the random-effect assumption (Stephan et al., 2009), which accommodates the inter-subject variability of models that might produce subject-specific source activity for the task. Finally, we identified the best model that the two methods commonly indicated.
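Under the fixed-effect assumption, group-level comparison reduces to summing log evidences over subjects and normalizing. A minimal sketch, with made-up log-evidence values in place of the fitted ones:

```python
import numpy as np

def fixed_effect_bms(log_evidence):
    """log_evidence: (n_subjects, n_models). Under fixed effects the
    group log evidence is the sum over subjects; posterior model
    probabilities follow from a softmax over the summed evidence."""
    group_le = log_evidence.sum(axis=0)
    z = group_le - group_le.max()          # subtract max for stability
    p = np.exp(z)
    return group_le, p / p.sum()

# 22 subjects x 4 models (AUD, CM, VMI, MIX) - arbitrary toy values
rng = np.random.default_rng(1)
le = rng.normal(-100.0, 2.0, size=(22, 4))
le[:, 2] += 5.0                            # give the 'VMI' column an edge
group_le, post = fixed_effect_bms(le)
```

Even a modest per-subject advantage accumulates across subjects under fixed effects, which is why summed log evidence (as in Fig. 3B) can separate models decisively.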
BMS helps identify the best model among several candidates, but 'uncertainty' remains (Penny et al., 2010): the best model might be overfitted to the data, or a more suitable model might be underfitted. Thus, we conducted Bayesian model averaging (BMA) (Penny et al., 2010). BMA addresses uncertainty about the selected model by pooling parameters across all models (Stephan et al., 2010), computing a weighted average of each model's parameters, with the weighting given by the posterior densities of the coupling parameters. We then tested the statistical significance of the coupling parameters after the BMA analysis. Finally, we conducted paired t-tests to identify changes in coupling parameters through sensory substitution training compared to the pre-training session. Because multiple comparisons could produce type-1 errors, we corrected the probabilities with a false discovery rate (FDR) test. As a result, we could identify which spectral couplings between ROIs changed significantly over sensory substitution training.

Results
We recruited subjects and trained them repeatedly to recognize visual shape information through soundscape stimuli under a blindfolded condition (see Experimental Methods and Fig. 1 for details). To investigate the mechanism, we measured EEG and estimated cortical source activity for five regions of interest (ROIs) (Table 1). Neuroimaging studies have demonstrated that perceiving sensory substitution stimuli recruits these ROIs (Amedi et al., 2002; Arbel et al., 2022; Hannagan et al., 2015; Striem-Amit et al., 2012). We then investigated the hierarchical process in the brain through DCM, which generates cortical source activity using a biologically plausible model with presumed connections. This method can quantitatively measure the explanatory power of each possible hierarchical process model, and analyzing the parameters inside each model enabled us to find the features that generate the neural dynamics. Combined with DCM, EEG has superior temporal resolution to previous approaches such as fMRI and PET, which allowed us to track the temporal dynamics of sensory processing between brain areas.

Behavioral result
Fig. 2 illustrates the change in mean accuracy across subjects with repeated sensory substitution training. Subjects' accuracy gradually increased over the sensory substitution sessions (Supplementary Figure 1). At the end of the sessions, mean accuracy reached 76.1 % ± 3.0 % on the training stimuli and 67.6 % ± 2.8 % on the testing stimuli not used for training. These results show that subjects could recognize visual shape information through soundscape stimuli.
We also tested whether repeated sensory substitution training enhanced subjects' performance. In the first session, accuracy was 68.2 % ± 3.8 % for the training stimuli and 52.0 % ± 3.6 % for the testing stimuli. We compared the accuracies of the first and last sessions with a two-way ANOVA. The main effects of session (F(1, 68) = 12.08, p < 0.001) and stimulus type (F(1, 68) = 13.13, p < 0.001) were significant. However, the interaction between stimulus type (training and testing stimuli) and session (first and third) was not significant (F(1, 68) = 1.29, p = 0.26). Post-hoc analysis showed that accuracy for the testing stimuli increased with repeated sensory substitution training (p < 0.001), whereas the change in accuracy for the training stimuli was not significant (p = 0.58). Because training effects could be strongly individual, we also fitted a linear mixed model with random effect parameters. The subject effect was significant (t = 2.145, p < 0.05). Even accounting for the subject effect, the two fixed effect parameters, stimulus type (t = −7.115, p < 0.001) and session (t = 3.039, p < 0.01), remained significant. Moreover, the positive interaction between stimulus type and session became significant (t = 2.528, p < 0.05).

Identifying sensory processing models through Bayesian model selection (BMS)
Fig. 3A presents the hierarchical couplings between regions of interest (ROIs) in each model. DCMs were fitted for each subject. Fig. 3B shows the summed log evidence of each model over subjects. BMS identified the visual mental imagery (VMI) model as the best at explaining the time-frequency dynamics of the brain areas. We also employed the random-effect analysis to address inter-subject variability (Stephan et al., 2009) (Fig. 3C); this result likewise indicated that the VMI model was best. The BMS results therefore showed that visual mental imagery best explained the underlying mechanism. Furthermore, the Bayesian model averaging (BMA) result highlighted that sensory substitution training significantly modulated the couplings within the VMI model.

Distinct characteristics of hierarchical couplings within the visual mental imagery model
Within the VMI model, we identified two different types of couplings. The first type represented forward processing: in this bottom-up process, neural activity in a relatively lower hierarchical region causally influenced activity in a higher cognitive region. It included the BA41/42-to-BA39/40, BA39/40-to-BA44/45, and BA17/18-to-BA37 connections (Fig. 4). Among these, sensory substitution training significantly changed the BA39/40-to-BA44/45 and BA17/18-to-BA37 connections (p < 0.01, FDR-corrected). These connections were characterized by within-frequency coupling, meaning that the two causally coupled regions were bound to the same frequency band. In Figs. 4A and 4B, significant couplings in beta1 (12-16 Hz) appeared in both the BA39/40-to-BA44/45 and BA17/18-to-BA37 connections. In addition, theta (4-7 Hz) oscillations and oscillations around 30 Hz in the BA17/18 source significantly induced the same oscillations in the BA37 source (Fig. 4B).
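In DCM for induced responses, the coupling between two sources is a matrix over frequency modes: diagonal entries carry within-frequency coupling and off-diagonal entries carry cross-frequency coupling. A sketch of how the two contributions could be separated; the matrix values are illustrative, not fitted parameters.

```python
import numpy as np

def split_coupling(coupling):
    """Decompose a frequency-to-frequency coupling matrix into its
    within-frequency (diagonal) and cross-frequency (off-diagonal)
    parts, returning the mean absolute strength of each."""
    n = coupling.shape[0]
    diag = np.diag(coupling)
    cross = coupling - np.diag(diag)
    within_strength = np.abs(diag).mean()
    cross_strength = np.abs(cross).sum() / (n * n - n)
    return within_strength, cross_strength

# Toy 4-mode coupling matrix for a bottom-up link: a strong diagonal and
# weak off-diagonal terms, i.e. within-frequency coupling dominates.
bottom_up = np.array([[0.9, 0.1, 0.0, 0.0],
                      [0.0, 0.8, 0.1, 0.0],
                      [0.0, 0.1, 0.7, 0.0],
                      [0.0, 0.0, 0.0, 0.6]])
w, c = split_coupling(bottom_up)
```

A top-down link with the opposite profile, i.e. strong off-diagonal and weak diagonal entries, would yield the reverse ordering, matching the within- versus cross-frequency distinction drawn in this section.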

Discussion
This study aimed to identify the underlying neural mechanism by which blindfolded subjects perceive visual shape information through soundscape stimuli without seeing. To this end, we investigated the hierarchical process between the brain areas recruited to process soundscape stimuli under a blindfolded condition. The results showed that the visual mental imagery (VMI) model best generated the spectral dynamics of cortical sources after repeated sensory substitution training in blindfolded subjects. This implies that blindfolded subjects could recognize visual shape information via sensory substitution stimuli by building visual mental images in their minds.
Previous studies had two limitations in identifying the underlying neural mechanism. First, different underlying mechanisms can recruit overlapping brain regions. In the case of visual processing, although the underlying mechanisms of visual perception (bottom-up) and visual mental imagery (top-down) are different, they can recruit the same brain areas (Dijkstra et al., 2017a; Kosslyn, 2005). Rather than identifying brain areas, measuring the hierarchical process between brain areas is key to identifying a distinct underlying mechanism. This hierarchical process is expressed in the brain as causal interactions between higher- and lower-cognitive brain regions, such as the frontal cortex and early sensory areas (Mumford, 1992). The second limitation concerns functional connectivity methods such as fcMRI. fcMRI results showed changes in information flow between brain areas after sensory substitution training, but functional connectivity can only measure the statistical correlation between the signals of brain areas, without showing causal interactions, because of its limited temporal resolution (Razi et al., 2015). An alternative method that can reveal hierarchical processes is therefore necessary to elucidate the underlying mechanism of sensory substitution. Effective connectivity offers a way to determine the hierarchical process between brain areas, since it represents the causal influence that one neuronal system exerts over another, given a modality with sufficient temporal resolution (David et al., 2006; Friston et al., 2003).
We examined changes in the effective connectivity between EEG cortical sources to identify the neural mechanism model. Unlike functional connectivity, effective connectivity methods such as DCM can explain causal interactions between cortical sources and, through these interactions, the generation of cortical source activity: DCM generates neural dynamics with a biologically plausible neural mass model (Friston et al., 2003; Kiebel et al., 2009). We modeled induced responses (Chen et al., 2008), since the perception of sensory substitution stimuli is more likely to produce induced responses than evoked ones (Chen et al., 2012; Gilbert and Sigman, 2007). Via DCM, we can elucidate not only the hierarchical process between cortical sources but also the generation of the spectral dynamics that compose the visual mental images induced by sensory substitution stimuli. In addition, we identified two processes in the visual mental imagery model, bottom-up and top-down, which represent the hierarchical processes inducing visual mental imagery through soundscape stimuli.
There are benefits to estimating induced spectral dynamics rather than simple evoked responses (Graulty et al., 2018). Neural representations of cognitive functions appear in different oscillations (or rhythms) (Blinowska, 2011; Chen et al., 2008; Wróbel, 2000); thus, models that estimate spectral dynamics can reflect neural processes more accurately. Furthermore, sensory perception consists of computation in localized brain regions and integration across a large-scale processing network between them (Canolty and Knight, 2010; Fiebelkorn et al., 2013; Jensen and Colgin, 2007). Bottom-up processes in perception may deliver information through the same neural oscillations, whereas top-down processes may intervene in the bottom-up network and create complex coupling patterns between different frequency bands. Therefore, investigating these couplings within the hierarchical process model allowed us to identify and characterize the specific neural processing in the brain.
In the visual mental imagery model, we found two distinct characteristics of hierarchical couplings between the spectral dynamics of cortical sources, depending on the position in the sensory processing hierarchy. Bottom-up processes, in which neural information is transferred from lower to higher hierarchical regions, enhanced within-frequency coupling while suppressing cross-frequency coupling. Conversely, top-down processes, in which information is transferred from higher to lower cognitive regions, enhanced cross-frequency coupling while suppressing within-frequency coupling. Our results showed that the BA39/40-to-BA44/45 and BA17/18-to-BA37 couplings represented the former, while the BA44/45-to-BA17/18 and BA44/45-to-BA37 couplings represented the latter. The flowchart of the suggested hierarchical process is illustrated in Fig. 6.
In the first stage, the soundscape stimuli entered BA41/42, which then delivered the neural information to BA39/40 through the auditory bottom-up pathway (Fig. 4A). At this stage, BA39/40 might receive the auditory neural information because this region, especially the angular gyrus, serves as a cross-modal (or inter-modal) integrative hub in sensory perception (Keil and Senkowski, 2018; Seghier, 2013). This region can integrate sensory information from each modality and process it as associative mental representations (Kosslyn, 2005). These mental representations are multimodal; they carry objects' semantics or contexts (Seghier, 2013), and their spectral dynamics are characterized in the beta band (Keil and Senkowski, 2018). Thus, BA39/40 can translate objects' semantics or context from the auditory neural information via the sensory substitution rule that subjects learned in the training part of the sensory substitution sessions. In addition, we found that transferring the semantics through the BA39/40-to-BA44/45 coupling enhanced within-frequency coupling in beta1 (12-16 Hz) but suppressed cross-frequency coupling. This result implies that the neural dynamics of objects' semantics or context may not be modulated into other dynamics. Our results suggest that BA44/45 might synchronize with objects' semantics or context from BA39/40 and then initiate top-down processes to prime visual mental images.
In the next stage, top-down processes appeared in which the hierarchically higher region, BA44/45, causally modulated lower regions such as BA37 and BA17/18 (Fig. 5). These processes are known to characterize the mechanism for constructing or priming visual imagery (Dijkstra et al., 2017a). Kosslyn (2005) described them as 'frontal lobe processes', which search for and activate the visual properties of objects and prime depicted visual mental images. In contrast to the bottom-up process described above, these processes should involve the modulation of neural dynamics. According to Friston (2000, 2001), such modulation occurs when the population of afferents in one region exerts a modulatory influence on the population in another region, changing its intrinsic dynamics. Furthermore, cross-frequency couplings, that is, asynchronous couplings between different frequencies, characterize such modulation (Chen et al., 2008; Friston, 2001). Thus, cross-frequency couplings can illustrate the modulation of the neural dynamics of objects' semantics or contexts into depicted visual mental images.
Studies on cross-frequency coupling have found causal couplings between slow and fast (gamma) oscillations (Canolty and Knight, 2010; Chen et al., 2012; Hyafil et al., 2015; Jensen and Colgin, 2007). Likewise, our results showed that slow oscillations in the hierarchically higher region, BA44/45, significantly modulated fast oscillations in lower regions such as BA37 and BA17/18. The central role of such coupling is to integrate functional systems scattered throughout the brain across multiple spatiotemporal scales (Canolty and Knight, 2010). This integrative role operates by transferring information from large-scale networks to local cortical processing, facilitating neural modulation and effective computation (Canolty and Knight, 2010; Jensen and Colgin, 2007). Cross-frequency couplings have been investigated to understand the neural mechanisms of learning and memory (Tort et al., 2009; Vivekananda et al., 2021), motor control (Chen et al., 2012), and sensory perception (Chen et al., 2008; Fontolan et al., 2014). However, no evidence has yet been reported for the priming of visual imagination. Therefore, we tried to elucidate the top-down priming mechanism of visual mental images from semantics or context through cross-frequency couplings.
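As a minimal sketch, the kind of phase-amplitude coupling quantified in the literature cited above (e.g., the modulation index of Tort et al., 2009) can be illustrated as follows. The signals, sampling rate, and band edges here are hypothetical, chosen only to mirror the theta (4-7 Hz) and gamma (30-50 Hz) bands discussed in this study; this is a didactic estimator, not the DCM-based coupling analysis we performed.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def bandpass(x, lo, hi, fs, order=4):
    """Zero-phase band-pass filter (second-order sections for stability)."""
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def modulation_index(x, fs, phase_band=(4, 7), amp_band=(30, 50), n_bins=18):
    """Tort-style modulation index: bin gamma amplitude by theta phase and
    measure the normalized KL deviation of that distribution from uniform."""
    phase = np.angle(hilbert(bandpass(x, *phase_band, fs)))
    amp = np.abs(hilbert(bandpass(x, *amp_band, fs)))
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    mean_amp = np.array([amp[(phase >= edges[i]) & (phase < edges[i + 1])].mean()
                         for i in range(n_bins)])
    p = mean_amp / mean_amp.sum()
    return float(np.sum(p * np.log(p * n_bins)) / np.log(n_bins))

# Synthetic check: gamma whose amplitude follows theta phase vs. constant gamma.
fs = 500
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(0)
theta = np.sin(2 * np.pi * 6 * t)
gamma = np.sin(2 * np.pi * 40 * t)
coupled = theta + (1 + theta) * gamma + 0.1 * rng.standard_normal(t.size)
uncoupled = theta + gamma + 0.1 * rng.standard_normal(t.size)
```

The index is near zero when gamma amplitude is unrelated to theta phase and grows toward one as the amplitude becomes concentrated in a narrow range of phases.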
For objects' semantics or context information, BA44/45 could send neural dynamics to the object processing system, BA37 (Fig. 5A). This top-down process can activate an object's visual properties stored in the visual processing system (Kosslyn, 2005; Slotnick et al., 2005). For this process, BA44/45 could integrate bottom-up processes and then carry temporal predictions for target objects (Arnal and Giraud, 2012) toward the object processing system, BA37. Theta rhythm can characterize this integration and then modulate gamma oscillations in BA37 (Hyafil et al., 2015). Here, gamma oscillation represents local cortical processing in BA37, which possesses neural codes for the object's visual properties, such as the shape of letters (Hannagan et al., 2015; McCandliss et al., 2003). Through theta-gamma couplings between BA44/45 and BA37, our neural system could seek the most appropriate neural codes for the visual shapes corresponding to objects' semantics or contexts and filter out unexpected or irrelevant information.
What should not be overlooked in priming visual mental images is that our brain builds them up sequentially, part by part (Kosslyn, 2005). In addition, soundscape stimuli also encode and transmit shape information sequentially over time. Thus, our brain should shift its attention to other parts or characteristics after priming one part of the visual mental image. For a correct build-up, the brain should move attention to the expected or predicted part of an object, not a random part (Arnal and Giraud, 2012; Friston, 2005). With repeated priming and attentional shifting, depicted visual mental images could form in the visual buffer within early visual areas (V1/V2) (Ganis et al., 2004; Kosslyn, 2005).
The cross-frequency coupling in which beta2 (16 Hz-30 Hz) oscillations of BA44/45 modulated BA17/18's gamma (30 Hz-50 Hz) could illustrate the attentional shifting mechanism for priming visual mental images (Fig. 5B). The attentional mechanism of sensory processing is significantly related to beta oscillation (Arnal and Giraud, 2012; Wróbel, 2000). In predictive coding, our brain infers upcoming sensory events and sends a neural code of 'prediction' toward sensory areas to guide attention toward an appropriate location or part of the object (Friston, 2005). Beta oscillation represents the propagation of the prediction code via a cortico-thalamic descending pathway (Arnal and Giraud, 2012). Beta activity at 15 Hz-25 Hz represents cortico-geniculate feedback, which shifts the visual system to attention (Bourgeois et al., 2020; Lindström and Wróbel, 1990; Wróbel, 2000). This pathway may select relevant information about the object and guide attention (Bourgeois et al., 2020). Thus, BA44/45's beta2 may carry neural dynamics for attention through the cortico-thalamic projection and then modulate BA17/18's local cortical processes, presented as gamma. Modulation of BA17/18's gamma can characterize the local process that shifts the attentional window toward the expected part or characteristics of objects. Furthermore, it can represent forming complete visual mental images in the visual buffer within early visual areas. With attentional shifting, the priming of visual mental images is complete and depictive visual images are formed in the visual buffer.
The last process might be another forward process, the BA17/18-to-BA37 coupling (Fig. 4B). The neural dynamics of BA17/18 propagated to BA37 via gamma around 30 Hz. In this process, the neural dynamics did not change into other frequencies. Thus, BA17/18 appeared to propagate the visual mental images formed in the visual buffer toward BA37. This process may represent bottom-up propagation of visual mental images through the ventral stream.
Considering that participants might utilize individually different strategies, we additionally assumed a random effect in the BMS analysis. Even when individual differences were considered, the visual mental imagery model remained the best model. However, the current results can only explain the neural dynamics of blindfolded subjects. Results may differ across subject groups, especially blind subjects. As shown in a neuromodulation study (Collignon et al., 2007), the neural dynamics of early and congenitally blind subjects, who are sensitive to cross-modal neuroplasticity, could be explained by the CM model. On the other hand, late blind subjects' neural dynamics could be explained by the VMI model, as in blindfolded subjects, or by the MIX model.
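The fixed-effect BMS used alongside the random-effects analysis can be sketched in a few lines: the group log evidence of each model is the sum of per-subject log evidences, and posterior model probabilities follow by exponentiating and normalizing under a uniform model prior. The log-evidence values below are toy numbers for illustration only (none come from our data), and the random-effects variant additionally estimates how model frequencies vary across subjects, which this sketch does not attempt.

```python
import numpy as np

def fixed_effects_bms(log_evidence):
    """Fixed-effect BMS: sum log model evidence over subjects, then turn the
    group log evidences into posterior model probabilities (uniform prior)."""
    group = np.asarray(log_evidence, dtype=float).sum(axis=0)
    group -= group.max()               # subtract max for numerical stability
    w = np.exp(group)
    return w / w.sum()

# Hypothetical log evidences: 5 subjects x 4 models (AB, CM, VMI, MIX).
log_ev = np.array([
    [-120.0, -118.0, -110.0, -112.0],
    [-130.0, -131.0, -121.0, -124.0],
    [-115.0, -117.0, -108.0, -109.0],
    [-126.0, -124.0, -116.0, -119.0],
    [-119.0, -121.0, -112.0, -114.0],
])
posterior = fixed_effects_bms(log_ev)
best_model = int(np.argmax(posterior))  # column 2 (VMI) wins for this toy data
```

Because evidences are summed, a fixed-effect analysis can be dominated by a few subjects; the random-effects assumption guards against exactly that, which is why we report both.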
There might be a couple of concerns about EEG approaches. One is the accuracy of source localization. Due to volume conduction and the low signal-to-noise ratio (SNR) caused by skull conductivity, the accuracy of EEG source localization can be limited. However, our study did not focus on brain mapping by source localization; it focused on effective connectivity between regions. ROIs were selected from previous reports (see references in Table 1). Although the specific coordinates of the ROIs varied considerably across the neuroimaging literature, the Brodmann area criteria covered all those locations. Furthermore, the selection of ROIs in EEG source localization required enough distance between regions to prevent crosstalk (electrical coupling between two signals). If crosstalk had occurred, similar or redundant signals could have appeared in separate regions, and the frequency coupling results would have shown a significant increase in within-frequency coupling in a bidirectional manner throughout the whole frequency band. Therefore, the sparsely selected ROIs robustly estimated the source activity with only slight variance in source location. In Fig. 4 and Fig. 5, within-frequency coupling increased only in the EVC-to-IFG coupling, in a unidirectional manner, and only in the frequency bands around 4-7 Hz and 30 Hz. Moreover, although EEG has an excellent advantage in computing effective connectivity with high temporal resolution, high-frequency EEG activity above 60 Hz can be contaminated by environmental noise due to low SNR. Thus, we restricted the frequency band to less than 50 Hz. Statistical analysis of multiple trials identified a significant gamma connectivity component.
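The rationale for restricting analysis to below 50 Hz can be illustrated with a zero-phase low-pass filter: activity in the gamma band of interest passes nearly unattenuated, while line-noise-range activity around 60 Hz is strongly suppressed. The sampling rate, filter order, and test signal below are hypothetical choices for illustration, not the preprocessing parameters of this study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 500                                                # hypothetical sampling rate
sos = butter(8, 50, btype="low", fs=fs, output="sos")   # low-pass at 50 Hz

t = np.arange(0, 10, 1 / fs)
# Gamma-band component (40 Hz) plus a line-noise-like 60 Hz component.
x = np.sin(2 * np.pi * 40 * t) + np.sin(2 * np.pi * 60 * t)
y = sosfiltfilt(sos, x)                                 # zero-phase filtering

# Amplitude spectrum of the filtered signal at the two component frequencies.
spec = np.abs(np.fft.rfft(y)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
amp40 = spec[np.argmin(np.abs(freqs - 40))]
amp60 = spec[np.argmin(np.abs(freqs - 60))]
```

With these settings the 40 Hz component keeps most of its amplitude (about 0.5 before filtering) while the 60 Hz component is reduced severalfold, mirroring the band restriction applied before the connectivity analysis.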

Conclusions
This study reveals that visual mental imagery is the neural mechanism underlying blindfolded subjects' perception of visual information in sensory substitution. The within-frequency connectivity demonstrated a bottom-up perceptual information flow that transfers auditory information and object properties. Top-down processing is the dominant cognitive modulation mechanism that translates sensory substitution stimuli into visual mental images.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Fig. 1 .
Fig. 1. Experimental design. (A) Experimental procedure. (B) Example trial sequence of the pre-training session. (C) Example trial sequence of the training part in the sensory substitution sessions. (D) Example sequence of the testing part in the sensory substitution training.

Fig. 2 .
Fig. 2. Accuracy of the testing part in sensory substitution training sessions 1 and 3. Transparent dots indicate each subject's test accuracy. Opaque dots indicate the mean accuracy, and the error bars span two standard errors. The statistical significance of post-hoc analysis is indicated as follows: p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***).

Fig. 3 .
Fig. 3. Dynamic causal models (DCMs) and the result of Bayesian model selection (BMS). (A) Model specifications. In all models, external input (soundscape stimuli) enters via the early auditory cortex (BA41/42) (blue lines). The model at the top is the baseline model, representing the pre-training session. Below it, the four possible DCMs are illustrated. The dashed gray lines represent linear couplings. The experimental effect (sensory substitution training) exerts non-linear modulation (solid black lines) on the baseline couplings. (B) BMS result under the fixed-effect assumption. (C) BMS result under the random-effect assumption. Both assumptions indicate that VMI is the best model.

Fig. 4 .
Fig. 4. The bottom-up couplings within the visual mental imagery (VMI) model. Solid lines represent couplings that contain significant coupling parameters. The dashed line represents a bottom-up connection that is not significant. (A) Significant coupling parameters in the BA39/40-to-BA44/45 coupling. (B) Significant coupling parameters in the BA17/18-to-BA44/45 coupling (red: increased; blue: decreased compared to baseline).

Fig. 5 .
Fig. 5. The top-down couplings within the visual mental imagery (VMI) model. Solid lines represent couplings that contain significant top-down coupling parameters. The dashed line represents a top-down connection that is not significant. (A) Significant coupling parameters in the BA44/45-to-BA17/18 coupling. (B) Significant coupling parameters in the BA44/45-to-BA37 coupling (red: increased; blue: decreased compared to baseline).

Fig. 6 .
Fig. 6. A schematic of the hierarchical process for the sensory substitution stimuli. The base model was the visual mental imagery and perception model of Kosslyn (2005). Colored lines are the functional connections within the visual mental imagery (VMI) model. Solid lines are significant connections, and dotted lines are non-significant connections. The significant frequency couplings are denoted: a single letter (e.g., β1) denotes a within-frequency coupling, and a notation with two letters (e.g., ɣ | θ) denotes a cross-frequency coupling.

Table 1
Description of regions of interest.