Thalamocortical inhibitory dynamics support conscious perception

Whether thalamocortical interactions play a decisive role in conscious perception remains an open question. We presented rapid red/green color flickering stimuli, which induced the mental perception of either an illusory orange color or non-fused red and green colors. Using magnetoencephalography, we observed 6-Hz thalamic activity associated with thalamocortical inhibitory coupling only during the conscious perception of the illusory orange color. This sustained thalamic disinhibition was temporally coupled with higher visual cortical activation during the conscious perception of the orange color, providing neurophysiological evidence of the role of thalamocortical synchronization in conscious awareness of mental representation. Bayesian model comparison consistently supported the thalamocortical model in conscious perception. Taken together, experimental and theoretical evidence established the thalamocortical inhibitory network as a gateway to conscious mental representations.


Introduction
The experience of conscious perception is essential to everyday life, but the neural correlates underlying this process remain under debate. In particular, whether cortical or the subcortical brain regions are more decisive in conscious perception remains contentious. One view posits that conscious experience occurs primarily within the cortex (Crick and Koch, 2003;Dehaene and Changeux, 2011;Romijn, 2002). Consistent with this view, there are a multitude of anatomical cortico-cortical connections that are reciprocal to feedforward connections, forming extensive intra-cortical feedback loops (Nieuwenhuys et al., 2007). However, a second view postulates that subcortical structures, prominently the thalamus, are also essential for conscious perception (Penfield and Feindel, 1975;Ward, 2011). This controversy on the neural basis of conscious perception has recently brought increased attention to the role of thalamocortical networks in conscious perception (Edelman and Tononi, 2008;John, 2002;Llinas et al., 1998).
The thalamus, a key region located at the core of the brain, is believed to primarily act as a neurodynamically interactive relay station between different subcortical areas and the cortex. Beyond relaying information through the first-order nuclei, the thalamus is also believed to process information through higher-order (i.e., more complex in the processing hierarchy) thalamic nuclei, (Sherman, 2016). A prominent example is the pulvinar, the largest subdivision of the thalamus with strong connectivity with the visual cortex. The thalamus also plays a critical role in regulating arousal, sleep, and awareness, and is the primary candidate for the location of consciousness (Gent et al., 2018;Min, 2010). The thalamus has been referred to as the gateway (or hub) of nearly all sensory inputs (with the exception of the olfactory system) to the corresponding cortical areas through direct thalamocortical circuits (Hwang et al., 2017;Jones, 1985). Therefore, interactions between the thalamus and the cortex are essential for major brain functions in perceptual and cognitive processes. For example, the thalamus is a key station that relays motor and sensory signals to the cortex through direct thalamocortical connections (Reislev et al., 2017;Usrey and Sherman, 2019).
It is noteworthy that a major inhibitory control over the thalamic relay nuclei is performed by the thalamic reticular nucleus (TRN), a sheer neuronal laminar wrapping around the thalamus (Shu and McCormick, 2002). In an early stage of brain development, communicative innervations between the dorsal thalamus and telencephalon (comprised mainly of the cerebral cortex) must pass through the ventral thalamus, the major derivative of which is the TRN (Schambra et al., 1992). Accordingly, the TRN occupies a striking control position within the thalamocortical reciprocal connection, sending inhibitory axons back to the thalamic relay nuclei, roughly to the same region where they receive afferents (Shu and McCormick, 2002;Wang et al., 2001). The inhibitory TRN cells are densely innervated by collaterals from both thalamocortical and corticothalamic neurons, both of which generate strong excitatory projections (Steriade et al., 1997). Taken together, the TRN acts as an integrative junction and serves as inhibitory control over the interactive thalamocortical circuits (Min, 2010).
Given this TRN-mediated inhibitory regulation over thalamocortical interactions during brain function, in the present study we explored whether such a thalamic disinhibition (or gating) plays an essential role in conscious perception of mental representations. Specifically, we investigated whether the thalamocortical inhibitory network principally contributes to conscious perception of illusory color from physically flickered color stimuli using human magnetoencephalography (MEG) data.
Illusory color perception has been studied before with functional magnetic resonance imaging (fMRI), which resolves neural signals with higher spatial resolution than MEG (Knotts et al., 2018;Morita et al., 2004;Sasaki and Watanabe, 2004). For example, the anterior portion of the color-selective area in the ventral occipital cortex (presumably V4) has been reported in relation to conscious perception of illusory color by the McCollough effect (Morita et al., 2004). Using an illusory transparent surface with long-range color filling-in, fMRI activity for filling-in was observed in the primary visual cortex (Sasaki and Watanabe, 2004). However, the temporal dynamics as well as coupled neurodynamics between the thalamus and visual cortices remain unclear. Here, we aimed to characterize the temporal dynamics of thalamocortical interactions during conscious perception. Due to the thin anatomical structure of the TRN, the fitted source activity of the TRN could not be disentangled within the thalamus. Thus, the source activities of the TRN were included in those of the thalamus that were analyzed as a whole to represent the TRN-mediated thalamocortical inhibition in the present study. In this context, the interpretational (not anatomical) meaning of the word "thalamus" in this study implies "TRN-mediated thalamocortical relay networks".
Mental perception based on a physically nonexistent substrate is an ideal testbed to study the neurophysiological basis of conscious perception. We leveraged this in our experimental paradigm, which alternately presented to participants two individual colors, red and green. When these two colors were alternately flickered in a rapid manner, one could occasionally generate (or recognize) the perception of a new color, the fused color orange. Thus, the same stimulation condition would frequently elicit different conscious perception (fusion or non-fusion) in the same individual. To present stimuli, we used a 5 Â 5 array of bicolor (red and green) light emitting diode (LED) lamps ( Fig. 1A and B). Red and green LEDs were alternately lit for three types of presentation time each: 10 ms, 50 ms, and 100 ms. In order to avoid any possible afterimage in the retina, a 5-ms empty presentation gap was inserted between two consecutive red and green stimuli. Since only the 50-ms flickering condition resulted in occasionally different perceptual responses of either fusion or non-fusion condition, it was selected for the main analysis in the present study.

Results
Fifteen participants were instructed to report whether they perceived orange (the fused condition) or individual red/green by pressing buttons at the end of every 5 s trial. Five participants experienced both fusion and non-fusion percepts alternately, 6 participants perceived color fusion only, and 4 participants non-fusion only. The mean perception rates between orange and physically flickering red/green colors were not significantly different at the 50-ms condition (fusion 53.9%, non-fusion 42.1%; t(14) ¼ 0.542, p ¼ 0.596; paired sample t-test; Fig. 1C).
We recorded the brain signals of the participants with a 152-channel MEG system while performing the task. MEG analysis relied on N 1 ¼ 10 participants for fusion and N 2 ¼ 8 for non-fusion, because each participant responded subjectively whether conscious perception of fusion color occasionally occurred or not, and some participants experienced both percepts while others experienced only one percept. We mapped the MEG signals on cortical and subcortical sources (Attal et al., 2009;H€ am€ al€ ainen, 2009;Tadel et al., 2011) based on subject-specific anatomical data. Oscillatory analyses (Pantazis et al., 2018) focused on three main frequencies: 6 Hz for thalamic activity, which is the main thalamic oscillation implicated in thalamocortical networks (Lopes da Silva, 1991;Mariotti et al., 1989;Pinault et al., 2001), 9 Hz for the 50-ms flickering frequency when consecutive inputs of two colors were deemed a single perceptual unit (100 ms þ 10 ms gap), and 18 Hz when each individual color is deemed a separate unit (50 ms þ 5 ms gap). The following four regions of interest (ROIs) were selected to investigate which brain area contributed to the conscious perception of the mentally-fused color: thalamus, V1, V2, and V4. V1 and V2 were analyzed for striate and extrastriate visual processing, and V4 because it is considered crucial for color-perception (Bartels and Zeki, 2000). Since we were interested in thalamocortical interaction neurodynamics, with any potential hemispheric lateralization during the early phases of visual perception beyond the scope of this study, we focused on left hemisphere results for brevity.
Consistent with the literature on steady-state visually evoked potentials (Nunez and Srinivasan, 2006), we observed robust oscillatory responses in V1, V2, and reliable, but weaker, signals in V4, at both flickering frequencies, 9 Hz and 18 Hz ( Fig. 2; Fig. S1). These were true for both the fusion and non-fusion conditions, but when testing for differences, we found significant effects only at 9 Hz between the fusion and non-fusion conditions. Thus, the visual cortex appeared to be engaged in the color fusion process specifically with the 9-Hz neural response, which suggests the perception of two consecutive colors of red and green as a paired stimulus with a 110 ms periodic stimulation. Importantly, visual responses at 9 Hz were stronger for the fusion than the non-fusion condition, indicating that enhanced cortical excitation was linked to the fusion perceptual process. In order to evenly distribute light, a thin plastic diffuser (top) was placed as a cover over the LED array. (C) Response rates (%) to the fusion (orange) and nonfusion (red/green) perception. Error bars indicate standard errors of the mean; N ¼ 15; NS ¼ non-significant; paired sample t-test.

Thalamocortical interactions in human conscious perception
A plausible model for the enhanced cortical responses associated with the fusion condition is the thalamocortical loop model, which would explain the integration of two individual input signals into a fused percept via the thalamus. Indeed, in addition to cortical responses, we also observed a sustained suppression of thalamic activity at 6 Hz, which lasted the entire 5 s duration of the conscious perception of the fused orange color, when compared to the non-fusion condition. This thalamic suppression was temporally overlapped with enhanced visual responses in V1, V2, and V4 at the visual flickering frequency of 9 Hz.
Presumably, this thalamic disinhibition triggers cortical excitation in the form of enhanced 9 Hz activity in visual cortices that enhances perception of the fused orange color, but not the non-fusion percept. To investigate whether the 6-Hz thalamic disinhibition activity can facilitate general thalamocortical communication irrespective of the stimulus flickering frequency, we further presented the same red/green alternating stimuli at 33.3 Hz (i.e., 10-ms flickering condition), which almost always yielded a fusion response (fusion 98.5%, non-fusion 0%; t(14) ¼ 166.175, p < 0.001), and at 4.8 Hz (i.e., 100-ms flickering condition), which almost always yielded a non-fusion response (fusion 0%, nonfusion 98.4%; t(14) ¼ À140.096, p < 0.001). Thalamic activity at 6 Hz did not change by varying stimulation frequency, but was only associated with perceptual experience ( Fig. 3; Fig. S2). Although these two extreme conditions resulted in opposite behavioral responses and entrained cortical activity at different frequencies, we could still analyze the relation of the thalamic 6-Hz responses with respect to the conscious perception of fused color. Specifically, we observed that the thalamic 6-Hz responses were significantly suppressed during the fusion perception in the 10-ms flickering condition as compared to the non-fusion perception in the 100-ms flickering condition (Fig. 3A). Furthermore, the thalamic 6-Hz time course of the 10-ms fusion condition was not significantly different from that of the 50-ms fusion condition (Fig. 3B). Similarly, the thalamic 6-Hz time course of the 100-ms non-fusion condition was not significantly different from that of the 50-ms non-fusion condition (Fig. 3C). non-fusion (green lines) conditions. Visual cortical activities from V1, V2, and V4 were extracted at 9 Hz, while thalamic activity was extracted at 6 Hz. Error bands indicate standard errors of the mean; black lines below curves represent the stimulation period; asterisk lines above curves indicate statistically significant differences (N 1 ¼ 10 for fusion and N 2 ¼ 8 for non-fusion; twosided permutation test with independent (unpaired) samples, p < 0.05 cluster defining threshold; p < 0.05 cluster threshold). All the (sub)cortical activities were from the left hemisphere, with similar dynamics observed in the right hemisphere (not shown).

Bayesian modeling supports thalamocortical inhibitory coupling during conscious perception
Dynamic causal modeling (DCM) (Pinotsis et al., 2014 provided further support in favor of thalamic involvement in the conscious perception of fused color (Fig. 4). Visual responses from V1, V2, and V4 were fitted to two network models: one thalamocortical (Pinotsis et al., preprint) (Fig. S3) and one cortical-dominant (Fig. S4). Evidence of each model (how well it could explain the data) was computed using a Free Energy approximation and a Bayesian model comparison (BMC; see Methods and the study by Pinotsis et al. (2018) for a detailed discussion). Briefly, BMC fits competing models to MEG data and assesses the most likely model. The output of BMC is an odds ratio, called Bayes factor (BF) (Pinotsis et al., 2018). This is a Bayesian analogue of the usual odds ratio and has a similar meaning: it quantifies the probability that one out of many models has generated the MEG data. In other words, it provides the (relative) probability that this could be a true model of the brain compared to alternative models. Using BF we can pool together evidence of the most likely model across different subjects. Here, we compared two models: one cortical-dominant and one thalamocortical, across all subjects who were responsive to both the fusion and the non-fusion conditions (four subjects). Because we expect the activated brain network to depend on the condition (fusion vs. non-fusion) and to be the same across subjects, we performed a fixed effects analysis. We wanted to determine which model was the most likely for each condition in the same subject, and assess whether the most likely model differed depending on changes in conscious perception. Specifically, we hypothesized that the thalamus was actively involved in the fusion condition only. To confirm this hypothesis, the thalamocortical model should be more likely in the fusion condition, and the cortical-dominant model in the non-fusion condition. We did not consider subjects that responded to only one of the two conditions because we would not be able to find the most likely model in the other condition. This led to a relatively small sample (N ¼ 4 subjects). 1 The cortical-dominant model assumed that V1, V2 and V4 receive endogenous neuronal noise input (with white and pink components) from the rest of the brain. Thus, it included thalamic input, but did not distinguish it from input to V1, V2 and V4 from other areas of the brain. In other words, if this model was more likely, it would mean that V1, V2 and V4 receive input from the thalamus but this input would affect their dynamics in the same way as input from any other areas. The thalamocortical model, on the other hand, assumed that V1, V2 and V4 are preferentially coupled with the thalamus. In this case, their MEG responses would be determined by the anatomy of known thalamocortical connections. In particular, this model assumed that V1, V2 and V4 received input from pyramidal cells of other cortical areas, and from thalamocortical (TC) relay neurons. Also, TC neurons received backward connections from pyramidal neurons in V1, V2 and V4. Thus, TC output to the cortex was the result of local interactions with TRN neurons,and Fig. 3. Thalamic activity at 6 Hz. Time courses of thalamic evoked power at 6 Hz between fusion (purple lines) and non-fusion (green lines) conditions. Data are shown for the 10-ms fusion vs. 100-ms non-fusion conditions (A); 10-ms vs. 50-ms fusion conditions (B); and 100-ms vs. 50-ms non-fusion conditions (C). Error bands indicate standard errors of the mean; black lines below curves represent the stimulation period; asterisk lines above curves indicate statistically significant differences (N 1 ¼ 10 for fusion and N 2 ¼ 8 for non-fusion; two-sided permutation test with independent (unpaired) samples, p < 0.05 cluster defining threshold; p < 0.05 cluster threshold). Note that the 10-ms fusion and 100-ms non-fusion conditions were statistically different, but we found no statistical differences between the 10-ms and 50ms fusion conditions, or the 100-ms and 50-ms non-fusion conditions. All cortical activities were from the left hemisphere, with similar dynamics observed in the right hemisphere (not shown). Free Energy for the four subjects when they perceived fused color. The Bayesian factor (BF) shown in the bottom-left corner of the axes, indicates which of the two models is a more plausible generator of the MEG visual cortical signals, with values greater than 1 favoring the thalamocortical model and values less than 1 the cortical-dominant model. For the fusion condition, visual responses appeared to involve the thalamus (BF ¼ 7). Right column: Same as before, but for the non-fusion condition. In this case, visual responses appeared to favor the cortical-dominant model (BF ¼ ¼) with the exception of one subject. the input to TC neurons was from backward connections from the cortex. Also, input to V1, V2 and V4 was the result of specific thalamic dynamics based on the coupling between TRN and TC neurons and the delays due to thalamocortical connections.
Our hypothesis that the thalamus was involved in the fusion but not the non-fusion condition amounted to the thalamocortical model being the more likely model to explain the fusion data but less likely to explain the non-fusion data. This is what we found using BMC: the thalamocortical model was more likely to explain visual responses in the fusion condition (BF ¼ 7; greater than 1 favors the thalamocortical model; Fig. 4 middle panel). At the same time, the cortical-dominant model was more likely to explain visual responses in the non-fusion condition (BF ¼ 1/4; less than 1 favors the cortical-dominant model; Fig. 4 right panel). Taken together, these results support our findings based on the analysis of oscillatory responses that the thalamus is implicated in the conscious perception of fused color. 2

Discussion
Taken together, these observations provide corroborating evidence that the thalamus is essential for perception of fused color in the present paradigm, and that thalamic disinhibition plays a key role in conscious perception of a mental representation. Our findings are also in accordance with the notion that subjective awareness depends on neural networks in the brain supporting dynamic patterns of rhythmic activity, and that inhibitory processes can temporally and spatially structure the activity of excitatory cell assemblies to ensure that information flows to just the right place at just the right time (Buzsaki, 2007). Dynamic causal modeling provided further support in favor of the involvement of the thalamus in the conscious perception of fused color. Consistently, robust thalamic suppression was observed during the perception of a fused color, implicating the thalamus in the fusion process of two different incoming stimuli. Conscious perception might be accomplished by a sufficient number of signal iterations between individual thalamic modal-specific subdivisions and their corresponding modal cortices (e.g., pulvinar and visual cortices). This notion is in accordance with the previous report that conscious access seems characterized by the entry of the perceived stimulus into a series of additional brain processes (Salti et al., 2015). For example, visual information can be further elaborated in terms of its conscious awareness when that information moves iteratively back and forth between the thalamus and visual cortices. Here, signal iterations between cortices and the thalamus may have generated an illusory color perception of orange in the brain although there is no physical orange input to the eyes.
We found strong visual oscillatory responses in all three cortical ROIs, with comparable signals in V1 and V2 but much weaker signals in V4 (Figs. 2 and 5; Fig. S6). It has been proposed that the color center is neither isolated nor traceable to a single area in the visual cortex. As a result, different visual areas may play a different role in conscious perception of color given that they subserve different processes. For example, although V4 has been regarded as essential in color perception because of the strength of its color receptive fields (Bartels and Zeki, 2000), V2 is considered to play a crucial role in color processing by projecting signals from V1 to V4 (Jansen-Amorim et al., 2012). Notably, we observed significant effects only at 9 Hz between the fusion and non-fusion conditions in visual cortices, which suggests the perception of two consecutive colors of red and green as a paired stimulus with a 110 ms periodic stimulation. In general, alpha activity has been known to be essential for the functional gating of incoming stimulus information (Jensen and Mazaheri, 2010;Toscani et al., 2010) and associated with thalamocortical interactions (Bollimunta et al., 2011;Vijayan et al., 2013). However, rather than such alpha oscillations related to functional gating of input stimulus, brain oscillatory responses specifically at 9 Hz were dominantly detected because the present study employed the repetitive presentation of visual stimuli at a fixed flickering frequency; this is known as a steady-state visually evoked potential that is physically driven by external flickering stimuli and observed at the same flicker frequency (and harmonics) as the flickering stimulus that has been presented (Vialatte et al., 2010).
While in the present study we identified significantly higher activity in V1 and V2 compared to V4, they were all temporally overlapped with thalamic disinhibition in 6 Hz. Based on these coupled neurodynamics, the thalamus seems to play a regulatory role over the entire system of thalamocortical loops as a dynamic core in conscious experience. Thus, the cognitive thalamus works as a gateway to mental representations (Wolff and Vann, 2019).
As mentioned earlier, it is noteworthy that a major inhibitory input to thalamic relay nuclei is provided by the TRN (Shu and McCormick, 2002), which interacts synaptically with the thalamocortical neurons (Lopes da Silva, 1991). Based on its anatomical connections, the inhibitory TRN cells may play a key role in coordinating our conscious perception through the inhibitory feedback network across both the thalamus and the cortex (Min, 2010). Throughout the latticework structure of the TRN, conscious perception could be established and refined through accumulating intercommunicative processing across the first-order input signal and the higher-order signals through the relevant thalamocortical loops. TRN cells send their axons back to the thalamus, particularly two branches of a single axon connected to the first-order and the related higher-order nucleus (Pinault et al., 1995;Wang et al., 2001). The functional significance of such a gathering venue has been proposed to be most essential for the interactions among the first-order and the related higher-order circuits in the same modality (Sherman and Guillery, 2001). Since interactions between the TRN and thalamic relay cells are essential for thalamocortical synchronization (Sherman and Guillery, 1998), the TRN-mediated inhibitory control over the thalamocortical interactions may play an essential role in orchestrating all the relevant processes in conscious perception. Notably, the 6-Hz centered thalamic suppressed activity was strongly observed with the appearance of dominant visual cortical activity during the conscious perception of fusion color (Fig. 5). This could consistently support the idea that TRN-centered thalamocortical networking is involved in conscious perception, because the thalamic 6 Hz is mainly generated when TRNs contribute to the inhibitory feedback control of the thalamocortical neurons (Lopes da Silva, 1991;Mariotti et al., 1989;Pinault et al., 2001). Analysis based on DCM provided further evidence that thalamic reticular inhibitory control is important for conscious perception. Using DCM, we compared two models, one assuming endogenous, sensory driven, neuronal noise input to cortical areas (cortical-dominant model) and another that assumed structured input based on the anatomy of thalamocortical connections. Interestingly, evidence in favor of the thalamocortical model in the fusion condition was stronger than the corresponding evidence in favor of the cortical-dominant model in the non-fusion condition (Bayes factor; BF ¼ 7 vs BF ¼ 1/4, respectively). Of course, in both conditions, sensory inputs pass through the thalamus and are subject to its inhibitory control. However, stronger evidence in favor of the thalamocortical model in the fusion condition implies that temporally structured thalamic input from specific TC and TRN neurons (that reflected their dynamics) was present when fused color was perceived, compared to unstructured noise input from all cortical and subcortical areas in the non-fusion condition. This is consistent with our earlier results that structured thalamic input is essential in achieving conscious perception.
Throughout the present MEG study, we confirmed the neuroanatomical bases of illusory color perception, and provided their temporal 2 Note that a value BF ¼ 3 would suggest that the winning model is exp(3) ¼ 20 times more likely than the cortical model. This corresponds to the classical type I error with p-value, p ¼ 1/20 ¼ 0.05. Similarly, BF ¼ 7 that was found in the fusion condition corresponds to p ¼ 0.0009, and BF ¼ 1/4 that was found in the non-fusion condition corresponds to p ¼ 0.01.
neurodynamics showing the thalamocortical inhibitory relationship during conscious perception. Our present observations may provide neurophysiological evidence supporting the thalamic reticular inhibitory network model for consciousness in humans (Min, 2010). However, the present study has some implicational constraints that are worth mentioning. First, there are crucial limitations in current non-invasive neuroimaging, which have constrained the ways we can map both cortical and subcortical activation simultaneously in high temporal resolution (Min et al., 2020). MEG and its electric counterpart, electroencephalography (EEG), offer the necessary temporal resolution, but sources deep beneath the scalp easily escape detection. In addition to the increased distance from sources to the sensors, the targeted subcortical structures are small and surrounded by the cortical mantle; both of these factors lead to a poor signal-to-noise ratio. Nevertheless, recent advances in MEG and EEG source reconstruction offer compelling evidence that localizing brain signals is possible even from deep subcortical structures under favorable circumstances. For example, MEG/EEG fields from subcortical sources can be distinguished from those generated by the cortex when the underlying salient cortical activity is sparse (Krishnaswamy et al., 2017). A new technique linking MEG/EEG recordings to fMRI techniques using representational similarity analysis provides an alternate way to access whole-brain dynamics with combined high temporal and spatial resolution (Cichy et al., 2014). Specifically, a recent study exploited principled modeling and estimation paradigm for MEG source analysis tailored to extracting the cortical origin of electrophysiological responses to continuous stimuli (Das et al., 2020). Second, although a larger sample size would have improved the statistical power of our study, sample sizes of those who were responsive to both the fusion and the non-fusion conditions were limited by participants' occasionally subjective perception. Thus, despite the consistent observations by Bayesian modeling, the limited statistical power should be carefully considered when interpreting our BMC results between the fusion and non-fusion conditions. Third, in the fusion condition, it could be supposed that the orange flicker is expected to entail a longer perceived stimulus duration (i.e., the sum of the durations for the red and green stimuli) than the non-fusion condition. However, we have not experimentally measured the subjectively perceived stimulus duration for both orange fusion and red-green non-fusion conditions. Thus, it is hard to analyze and explicitly describe expected or possible differences in their durations simply based on our current observations. Fourth, although the randomization of trial types, practice session, and subjective reports are all helpful to control for attentional/alertness effects in task engagement, any possible attentional confound across the main contrast of interest (trials where perceptual fusion did occur versus did not occur) may still be of concern during the actual task performance. This potential for attention-perception interactions in visual-thalamic signaling should be addressed in a future line of research.

Participants
Fifteen healthy volunteers (7 female; mean age 31.3 y) participated in this study. All participants had normal or corrected-to-normal vision, and none was color-blind as determined by the Ishihara color test. All subjects gave a written informed consent and received payment for their participation. The study was conducted in accordance with the ethical guidelines established by the Institutional Review Board of Korea University (No. KUIRB-2018-0052-01) and the Declaration of Helsinki (World Medical Association, 2013).
Analysis of behavioral data relied on all N ¼ 15 participants, and analysis of MEG data relied on N 1 ¼ 10 subjects for the fusion condition and N 2 ¼ 8 subjects for the non-fusion condition. This is because one participant was excluded due to poor MEG data quality, and a subset of participants experienced only one percept (fusion or non-fusion). DCM analysis of MEG data relied on N ¼ 4 subjects that experienced both fusion and non-fusion percepts so we can identify the most likely model in each condition.

Materials and procedure
To present stimuli, a 5 Â 5 LED array of 3-mm round diffused bicolor (red and green) light emitting diode (LED) lamps (model number: 100F3W-YT-REGR-CA, Chanzon Technology, China) was used. In this array, five rows and columns of an LED grid were lit on a black panel ( Fig. 1A and B), and the distance between two adjacent rows or columns was 1 cm. The wavelengths for the red and green LEDs were 620-625 nm and 515-520 nm, respectively. These wavelengths are suitable for the stimulation of human retinal long-wave and middle-wave cone photoreceptors that show optimal responses to the colors red and green, respectively, as their mean absorbance wavelengths are 562.8 nm for red and 533.8 nm for green (Bowmaker and Dartnall, 1980). In addition, according to the CIE 1931 RGB color matching functions (CIE, 1931), the maximal peak of the monochromatic test is observed around 610 nm for red and 540 nm for green. In order to control any possible confounding luminance effects, the luminous intensity was balanced across these two colors as much as possible, not only in a physical (approximately 800 mcd) but also in a psychometric manner. In order to ensure the light was evenly distributed, a thin plastic diffuser was placed over the LED array.
Red and green LEDs were alternately lit for three types of presentation time each: 10 ms, 50 ms, and 100 ms. The first 10-ms presentation time was representative for the fastest alternation of two colors, and the last 100-ms presentation time was for the slowest color-alternation. In addition, the middle 50-ms presentation time provided an intermediate condition between these two extremes. The flickered color stimuli were presented for 5 s in each trial, with variable inter-trial intervals ranging from 1.32 s to 4.62 s (centered at 3.3 s). The entire experiment consisted of four runs, with each run presenting 75 trials, and a short break in- the thalamocortical loop model for conscious perception of fused orange color from alternately presented red and green colors. Thalamocortical 3D maps (leftmedial view) of thalamic (6 Hz) and visual cortical (9 Hz; V1, V2, and V4) evoked power between fusion (B) and non-fusion (C) conditions averaged from 0 to 5 s poststimulus. Suppressed thalamic activity (blue scale) is temporally overlapped with enhanced visual cortical activation (red scale), primarily during the fused-color perception. between. One hundred trials were assigned for each type of flickering condition, and they are randomly presented within the four runs. We processed all the runs separately and averaged between runs at the source level. In order to avoid any possible afterimage in the retina, a 5-ms empty presentation gap was inserted between two consecutive red and green stimuli. This gap might play a wash-out role for the previously presented color stimulation on the retina as the time courses of retinal afterimages are interrupted by the duration of the temporal gap (Di Lollo et al., 1988). Therefore, a complete cycle of a red-green flickering pair was 30 ms [(10 þ 5) Â 2 ms], 110 ms [(50 þ 5) Â 2 ms], and 210 ms [(100 þ 5) Â 2 ms], respectively. Accordingly, the central flickering frequencies were 33.33 Hz,9.09 Hz,and 4.76 Hz, respectively. In addition, the presentation frequencies of each single color corresponded to 66.67 Hz (10 þ 5 ms), 18.18 Hz (50 þ 5 ms), and 9.52 Hz (100 þ 5 ms), respectively.
The different flickering conditions induced distinct oscillatory signals in the visual cortices, so it is difficult to compare them directly in terms of cortical signals. Nevertheless, since we hypothesized that the thalamus might utilize its own signaling oscillations irrespective of the characteristics of input signals, we could still analyze their thalamic 6-Hz oscillations as an internal communication signal of the thalamus (i.e., language of thalamus) to compare fusion and non-fusion perception. Moreover, we believed that the fusion mechanism of two individual colors might involve thalamic inhibitory function generating about 6 Hz activity, the principal thalamic oscillation implicated in thalamocortical networks (Lopes da Silva, 1991;Mariotti et al., 1989;Pinault et al., 2001). Thus, the time courses of thalamic 6 Hz activity not only for the 50 ms fusion/non-fusion condition, but also for the 10 ms fusion and 100 ms non-fusion conditions were also computed from the same dataset to compare their responses depending on whether the fused color was consciously perceived or not.
The 50-ms flickering condition resulted in occasionally different perceptual responses of either fusion or non-fusion condition, and was thus optimal to investigate top-down aspects of conscious perception. That is, conscious perception of the fusion color was determined only by top-down processing relevant to subjective experience, and not by physically-driven bottom-up processing. In addition, the 50-ms flickering condition yielded comparable perception rates between fusion (53.9%) and non-fusion (42.1%) conditions (t(14) ¼ 0.542, p ¼ 0.596). For these reasons, the 50-ms presentation condition, with its associated 9.09 Hz and 18.18 Hz frequencies, was selected for the main analysis in the present study.
Participants were instructed to press a button with their right or left index finger depending on whether they perceived a fused color of orange or not, with response hands counterbalanced across participants. To make participants clearly understand the task instructions and perform the task reliably, a practice session and a subjective reporting session were placed before and after the main experiment, respectively. Throughout this practice session, further understanding of the task instructions was reached, and participants were instructed to pay equal attention to both fusion and non-fusion conditions. Furthermore, subjective reports after the experiment were individually collected to check whether the participants performed the task faithfully and whether their attention was unbiased to either fusion or non-fusion condition. To prevent movement artifacts, participants were instructed to press a button after the end of each 5-s flickering presentation.

MEG acquisition
Brain signals were measured using an MEG system (152 channels axial gradiometer, the Korea Research Institute of Standards and Science; see Fig. S7) (Kim et al., 2014;Lee et al., 2009), while participants performed the task. The MEG signals were recorded using a 234-Hz low-pass filter, and digitized at a sampling rate of 1,024 Hz. All experiments were conducted in a magnetically shielded room. Room temperature and humidity were maintained at 21.8 C-23.8 C and 63%-67%, respectively.
Participants were seated in a comfortable armchair and were instructed to keep their eyes open and gaze at the LED panel located outside of the MEG shielding room, with the LED light visible through an electromagnetic shielded window. The distance between the LED panel and participants was approximately 180 cm. This LED array was within a visual angle of 2.25 at a distance of 180 cm. In order to maximize color perception, darkness was maintained both inside and outside of the MEG shielding room during the LED flickering experiment.
Before the experiment, participants received a detailed explanation of the experimental procedure and were familiarized with the experimental surroundings and stimuli. Four head position indicator coils were placed on the participant's head to measure the head position in the MEG helmet both before and after each session. The anatomical fiducial points (i.e., nasion, and left and right preauricular), the center points of each head position indicator coil, and 45 head surface points were measured using a 3D digitizer (ISOTRAK, Polhemus Navigation, Colchester, USA). Participants then underwent the MEG experiment consisting of four successive sessions with 75 trials in each session. The magnetic resonance imaging (MRI) T1-images of all participants were also obtained for a subsequent coregistration of individual MEG source signals to an individual MRI volume. All participants underwent 3 T MRI scans (Achieva 3.0 T TX, Philips Medical System, The Netherlands) using a 32-channel head coil. The T1-weighted MR images were acquired through the 3D MPRAGE sequence with the following parameters: TR/TE ¼ 6.8/3.2 msec, inversion time ¼ 840 msec, FA ¼ 9 , matrix size ¼ 256 Â 240, voxel resolution ¼ 1.0 Â 1.0 Â 1.2 mm 3 , FOV ¼ 256 Â 240 mm 2 , turbo field echo (TFE) factor ¼ 246, slice orientation ¼ sagittal, scan time ¼ 5 min 34 s.

Data analysis
Participants perceived color fusion in almost all 10-ms flickering stimuli, and non-fusion in almost all 100-ms flickering stimuli, but had a mixed response for the 50-ms flickering stimuli. Five participants experienced both fusion and non-fusion conditions alternately, 6 participants experienced color fusion only, and 4 participants non-fusion only. One participant reporting both fusion and non-fusion responses was excluded from further analyses due to poor MEG data quality. Thus, MEG analysis for the 50-ms flickering condition relied on a total of N 1 ¼ 10 subjects for the fusion condition and N 2 ¼ 8 subjects for the non-fusion condition. The demographics (e.g., age, gender, handedness, education), as well as the counterbalancing of the responses, were not statistically different between the fusion and non-fusion groups. For preprocessing and source analysis of MEG data, we used Brainstorm software (Tadel et al., 2011), which is an open source toolbox based on Matlab (The Mathworks, Natick, MA, USA). Continuous MEG data was notch-filtered at 60 Hz (power line) and segmented from À3 to 8 s with respect to the stimulus onset. When MEG epochs were extracted, the baseline correction was conducted within the time window from À1.3 s to À0.3 s. Among the 152 channels, two channels (channels 110 and 131) were designated as bad channels, and seven channels (channels 35, 97, 98, 99, 116, 145 and 151) were not used due to technical reasons; thus to avoid any bias their symmetric counterparts (channels 51, 113, 114, 115, 100, 146, and 152) were also excluded from further analysis. To remove artifacts due to eye blinks or heartbeats, we applied a signal-space projection (SSP) method implemented in Brainstorm. Unlike many other noise-cancellation techniques, SSP does not require additional reference sensors to record the disturbance fields. Instead, SSP depends on the notion that the magnetic field distributions produced by the sources in the brain have spatial distributions sufficiently different from those generated by external noise sources. Moreover, it is assumed that the linear space spanned by the significant external noise patterns has a low dimension (H€ am€ al€ ainen, 2009). In addition, individual trials were discarded when the sensor peak-to-peak amplitude exceeded 10,000 fT within the trial. After artifact-rejection, there was no statistical difference between the number of trials used to represent the fusion versus the non-fusion conditions (t(3) ¼ À0.678, n.s.), reflecting no serious concerns regarding signal-to-noise bias across conditions. Individual 3D MEG topographies were coregistered to individual structural MR images using three craniometric landmark points: nasion, left, and right preauricular points. In addition to the fiducial points, we used the digitized head-points to improve fit accuracy in the coregistration of the MEG data to the structural MRI. Cortical reconstruction and volumetric segmentation were performed using Freesurfer software (http ://surfer.nmr.mgh.harvard.edu). We reconstructed neocortical and thalamus surfaces and segmented the cortical and subcortical volumes based on individual MRI Fischl et al., 1999Fischl et al., , 2002. For MEG source modeling in the neocortex, we placed dipoles with orientations normal to the cortical surface at each vertex. For MEG source modeling in the thalamus, we placed dipoles with no orientation constraint at each node of a volumetric grid (Attal and Schwartz, 2013). The head model was then computed using an overlapping spheres method (Huang et al., 1999). Finally, fitted dipole activations were mapped on cortical and subcortical sources (Attal et al., 2009) based on subject-specific anatomical data using a weighted minimum-norm estimate (Attal and Schwartz, 2013;H€ am€ al€ ainen, 2009). Since similar dynamics were observed in the right hemisphere, all the (sub)cortical activities were computed from the left hemisphere. The number of vertices used for the source analysis in the left hemisphere was 7,501. The mean numbers of sources (AESEM) in each region of interest (i.e., V1, V2, V4, and thalamus from the left hemisphere) are 208 (AE5), 505 (AE8), 70 (AE2), and 133 (AE3), respectively. Last, source activity from each region of interest was summarized into a single time course following Brainstorm procedures that automatically correct for opposite facing vertices. The mapping was based on a noise covariance calculated on the baseline time window from À1.3 s to À0.3 s using all the trials irrespective of the type of responses (i.e., fusion or non-fusion). This time window was chosen to avoid the temporal smearing of poststimulus activity into the prestimulus period.
MEG oscillatory activity was computed by convolving the source time courses with complex Morlet wavelets (Pantazis et al., 2018). The wavelet transform was applied to the averaged evoked potential to compute the evoked activity (phase-locked to the stimulus). Baseline correction was then performed on evoked power using the baseline time window from À0.1 s to stimulus onset. In the present study, we studied three frequencies: 6 Hz for the thalamic activity, since this frequency is the principal one implicated in thalamocortical networks (Lopes da Silva, 1991;Mariotti et al., 1989;Pinault et al., 2001), and 9 and 18 Hz associated with the 50 ms flickering stimulus presentation. In addition, the following four regions of interest (ROIs) were selected to investigate which brain area most highly contributed to the conscious perception of the mentally-fused color: thalamus, V1, V2, and V4. V1 and V2 were selected as they are the primary and secondary visual cortical processing areas, respectively, and V4 as it is a crucial area for color-perception (Bartels and Zeki, 2000). All of these anatomical areas were delineated based on the atlas embedded in Freesurfer software.
Statistical inference for random effects analysis relied on nonparametric permutation tests, because they make minimal assumptions on the distribution of the data. To detect significant time points in power time series, we used cluster-size permutation tests (Maris and Oostenveld, 2007) with p < 0.05 cluster defining threshold and p < 0.05 cluster threshold. The permutation tests were two-sided with independent (unpaired) samples. We used independent tests because only a subset of subjects experienced both fusion and non-fusion percepts, with the majority of them experiencing only one condition. While this is less sensitive than a test with paired samples, our approach is specific regardless of the underlying distribution of the data because it used a non-parametric permutation procedure. We computed 1,000 permutation samples (including the original data) to estimate the data null distribution.
To study how thalamocortical coupling changed between the fusion and non-fusion conditions, we used dynamic causal modeling (DCM) Pinotsis et al., 2012Pinotsis et al., , 2018Pinotsis et al., , 2019. We fitted a new biophysical network model that could describe information flow between cortical and thalamic sources to MEG source reconstructed data (Pinotsis et al., preprint). In this model, sources within the same cortical region were connected using intrinsic connections originating from both excitatory and inhibitory neurons. Different cortical and subcortical sources were then connected with excitatory connections. This resulted in a structurally and biologically constrained model that can predict MEG oscillations in terms of a few biophysical parameters, including synaptic efficacy, density, time constants associated with different neurotransmitters and axonal propagation delays. This new thalamocortical network model was a combination of known thalamic and cortical network models. The thalamic network model included thalamocortical relay (TC) and thalamic reticular nucleus (TRN) neurons (Jansen et al., 1993) and was originally developed by Lopes da Silva et al. (1974). TC neurons project to the cortex, while TRN neurons surround the thalamus and regulate TC neuron activity by sending inhibitory signals (Fig. S3). The LDS model has been previously used to study thalamocortical interactions and the generation of alpha rhythms similarly to our work here Freestone et al., 2013;Jo et al., 2019;Moran et al., 2009;Spiegler et al., 2011). It describes the essential part of thalamocortical connections generating these rhythms. These are bilateral connections between GABAergic TRN neurons (Hughes et al., 2011) and TC neurons as well as cortical layer V neurons (Constantinople and Bruno, 2013). The LDS model does not include the pulvinar that has also been shown to contribute to the generation of alpha rhythms related to attention (see also below) (Quax et al., 2017;Saalmann et al., 2012).
For describing cortical areas, we used another well-known model originally developed by Jansen and Rit (Jansen and Rit, 1995;Pinotsis et al., 2013) that included three populations of neurons: excitatory pyramidal cells, excitatory spiny stellate cells, and inhibitory interneurons (smooth stellate cells). These populations are connected with one another for excitatory (black) and inhibitory (red) connections, and also with populations in other areas (Fig. S4). The JR model and its variants have been used extensively to model visual hierarchy activity beyond evoked responses, including induced responses and epileptic dynamics Freestone et al., 2013;Jo et al., 2019;Spiegler et al., 2011) and thalamocortical interactions (Bhattacharya et al., 2011;Du and Jansen, 2011).
Thus the four areas (i.e., thalamus, V1, V2, and V4) are connected with one another with feedforward and feedback connections (see grey double-sided arrows in the inset of Fig. S5A, B). A cortical source receives input from pyramidal cells of other cortical areas, and from TC relay neurons. TC relay neurons send axonal projections to the cortex and receive backward connections from cortical pyramidal neurons. To sum up, the areas V1, V2, and V4 are described by the Jansen-Rit model (Jansen and Rit, 1995) (Fig. S4) and the thalamus is described by the Lopes Da Silva model (Fig. S3). All areas are then coupled together, and the resulting large-scale network model yields predictions about neural activity that can be compared with reconstructed MEG time series to address specific questions. We call this new model, the JR-LDS model of a thalamocortical network.
Here we focused on whether the thalamus was involved in the conscious perception of fused color. In such a case, the information flow in the thalamocortical network depicted in Fig. S5B would change between fusion and non-fusion conditions. To address this question, we used MEG source reconstructed data. Our hypothesis was that if the thalamus is involved in only one of the two task conditions, then activity in cortical regions would change as a result of thalamic input and also provide output for TC neurons. Thus we fitted the thalamocortical model to visual responses from V1, V2, and V4 (Fig. S5C). We also fitted a reduced version of the same network model, called the cortical-dominant model. This is the same as the thalamocortical (JR-LDS) model except for the thalamus (Fig. S5A). Since they involve different regions and neural populations, the two models make distinct predictions for oscillatory responses that would be recorded in visual areas. The two models include parameters that quantify the coupling (connectivity) between the thalamus and cortex. By scoring their fits to data from the fusion and nonfusion conditions, we found which of the two was more likely to have produced the data. Also, we assessed changes in thalamocortical coupling based on differences in the corresponding model parameters.
To find how thalamocortical coupling changed between the fusion and non-fusion conditions, we fitted cross-spectral densities (CSDs), which obviate the problem of predicting a high dimensional time series signal. This approach is known as DCM for cross-spectral densities and has been applied in a variety of tasks that assume neural activity is stationary over time (Pinotsis et al., 2018). It is an established Bayesian fitting approach for inferring changes in connectivity and synaptic plasticity in large cortical networks in healthy (Boly et al., 2012;Friston et al., 2015;Moran et al., 2013;Pinotsis et al., 2012) and clinical conditions, including schizophrenia (Adams et al., 2016;Dima et al., 2010Dima et al., , 2012Roiser et al., 2013), Parkinson's disease (Herz et al., 2012(Herz et al., , 2013(Herz et al., , 2014Marreiros et al., 2013;Moran et al., 2011), consciousness and drug effects (Boly et al., 2011(Boly et al., , 2012Muthukumaraswamy et al., 2013;Schmidt et al., 2012). Fits from the thalamocortical and the cortical-dominant models are shown in Fig. S5C (red spectra, the thalamocortical model; blue spectra, the cortical model). Auto-spectral and cross-spectral densities are shown in the main and off-diagonal parts of the matrix. Model predictions are shown with solid lines, and MEG source reconstructed data with dashed lines. Note the double peaks in the red sub-panels of Fig. S5C. These correspond to CSDs obtained from the fusion condition. Both models were used to fit the same data from both conditions and the corresponding fits were scored and compared using log Bayes factors (see the main text). Notice, since DCM analysis aimed to identify which model is more likely in each condition, analysis relied on N ¼ 4 subjects that experienced both fusion and non-fusion percepts.
To estimate auto-spectral and cross-spectral densities, the source reconstructed data were first pre-processed to remove artifacts, and then converted to time-frequency representations using a Morlet wavelet transformation. After averaging the time-frequency representations across epochs separately for each condition (fusion and non-fusion), a difference map was calculated between the averaged fusion and nonfusion spectral power. Using a 1-s sliding window with a 50% overlap on the difference map, the optimal time window was individually selected when it had the most pronounced differences between conditions for each subject that responded to both the flickering and fused colors. After determining the 1-s time window, we estimated and averaged across epochs the auto-spectral and cross-spectral densities over that time window. This time window was applied to the DCM analysis.
Fitting used Bayesian techniques (Expectation-Maximization Algorithm) (Friston et al., 2007). Our hypothesis was that changes in thalamocortical coupling are primarily expressed in the alpha, beta and gamma bands (Coppola et al., 2007;Llinas and Ribary, 2001). Thus we fitted spectral responses in the range of 5-70 Hz. We then performed a Bayesian model comparison (BMC) between the two variants for both conditions (for more details about this process, see the study by Pinotsis et al. (2018)). We asked which of the two models was more likely to explain the data for each response condition. Comparison was based on the relative Free Energy which quantifies the evidence of one model being more likely than the other for each dataset (i.e., the Free Energy is a score of models fits). If relative Free Energy was positive, the cortical-dominant model fitted the data better than the thalamocortical model. If it was negative, the reverse was true. Example fits of the two models are shown in Fig. S5C. The goodness of fits is mathematically determined by the analysis shown in Fig. 4. These also include the corresponding log Bayes factors (BF) that express the relative probability of one model being the true model that generated the observed MEG signals compared to the competing model.
The thalamocortical model was more complex than the corticaldominant model as it involved one extra brain area (the thalamus) and several additional parameters (thalamocortical connections, thalamic populations, etc.). Finding that it may be more likely was not due to overfitting, because we used a particular cost function, Free Energy, which prevents overfitting by penalizing for a higher overall parameter number and when parameters are unnecessary to fit the data. For further discussion of how this cost function prevents overfitting, see the study by Pinotsis et al. (2018). In brief, finding that the thalamocortical model was more likely would not be due to its complexity.

Data and code availability
Data are not shared in a public repository due to the privacy rights of human subjects. However, the data and analysis tools used in the current study are available from the corresponding author upon reasonable request.