Zooming-in on higher-level vision: High-resolution fMRI for understanding visual perception and awareness

One of the central questions in visual neuroscience is how the sparse retinal signals leaving our eyes are transformed into a rich subjective visual experience of the world. Invasive physiology studies, which offer the highest spatial resolution, have revealed many facts about the processing of simple visual features like contrast, color, and orientation, focusing on the early visual areas. At the same time, standard human fMRI studies with comparatively coarser spatial resolution have revealed more complex, functionally specialized, and category-selective responses in higher visual areas. Although the visual system is the best understood among the sensory modalities, these two areas of research remain largely segregated. High-resolution fMRI opens up a possibility for linking them. On the one hand, it allows studying how the higher-level visual functions affect the fine-scale activity in early visual areas. On the other hand, it allows discovering the fine-scale functional organization of higher visual areas and exploring their functional connectivity with visual areas lower in the hierarchy. In this review, I will discuss examples of successful work undertaken in these directions using high-resolution fMRI and discuss where this method could be applied in the future to advance our understanding of the complexity of higher-level visual processing.


Introduction
Our knowledge of visual processing probably exceeds that of any other sensory modality. On the one hand, vision is the most developed, and arguably the most important source of sensory information for humans and other primates, which explains the popularity of the visual system in neuroscience research. On the other hand, constraints imposed by the nature of most neuroimaging techniques, such as the presence of the magnetic field and acoustic noise of the gradients in MR, or immobilized position of the subject, make vision the easiest to study technically. Despite the extensive knowledge we have gained thus far and continuing research efforts, the main question in visual neuroscience remains unanswered: how is the imperfect information represented on the retina transformed into the rich and meaningful subjective experience of the world? This question is important far beyond visual neuroscience, as the fundamental principles of this transformation and representation can be potentially extrapolated to other sensory modalities. Importantly, due to the all too prominent role that vision plays in human neuroscience research, it has become the most popular modality for the studies of the neural correlates of consciousness (Koch et al., 2016). A better understanding of the visual processing in the brain is therefore essential not only for understanding the visual system itself but also for answering broader neuroscience questions, which may also have relevance for other disciplines, such as computer vision, philosophy, and medicine.
The research on visual processing appears to be split into two different fields. Within the first field of research, we study rather simple visual responses, such as ocular dominance, color sensitivity, orientation selectivity, and contrast response curves. These properties are mainly studied in early visual areas at a fine spatial scale of single cells, columns, and layers using invasive methods in animals. Within the second field of research, we study the processing of complex visual information in the higher-level areas: selectivity to objects, faces, and places in the temporal lobe (Courtney and Ungerleider, 1997), and the bottom-up and top-down attention networks Shulman, 2011, 2002). This knowledge about specialized areas and networks has been gathered primarily at a coarse spatial scale of conventional human PET, fMRI or lesion studies. This split into different research areas may have given us the impression of hierarchical neural processing, where information travels along the existing neuroanatomical hierarchy (Felleman and Van Essen, 1991;Markov et al., 2014), at each stage forming an increasingly complex neural representation of the stimulus, which progressively correlates more and more with subjective experience (Logothetis, 1998;Riesenhuber and Poggio, 1999).
However, there are aspects of visual activity that contradict the hierarchical view described above, aspects about which we know considerably less. For example, higher-level tasks may influence activity in early visual areas, and we need a better understanding of how these complex higher-level responses map onto their known fine-scale columnar and laminar organization. At the same time, higher-level areas also have some fine-scale functional organization, which has rarely been considered when studying their higher-level responses. We need to bridge the gap between the levels of the visual hierarchy, processing complexity, and the spatial scale of investigation in order to obtain a more complete picture of how the brain accomplishes complex visual functions and generates conscious experience.
So far, this goal has presented considerable challenges. Although MRI has by far the best spatial resolution among non-invasive neuroimaging techniques, its conventional voxel size of around 3 mm³ is still too large for investigating fine-scale neural activity at the level of cortical columns and layers. Physiology studies that investigate brain function at a fine spatial scale typically require the use of animal subjects, not infrequently anesthetized, which limits the range of visual tasks and their complexity relevant for studying higher-level visual processing. In particular, the investigation of neural mechanisms of conscious visual perception, due to the subjective nature of conscious experience, requires a subject's accurate introspective report, which can be acquired exclusively from humans.
Recent advances in MR technology, such as the increasing availability of ultrahigh-field scanners and development of fast acquisition protocols for fMRI, allow for the reduction of the voxel size to less than a millimeter, thereby bridging these gaps. High-resolution fMRI provides several quantitative and qualitative advantages over conventional fMRI. Quantitatively, increasing the resolution allows us to reduce the physiological noise contribution to the fMRI signal (Triantafyllou et al., 2005), to reduce partial volume effects in voxels at tissue boundaries (Blazejewska et al., 2019;Triantafyllou et al., 2006), and, at ultrahigh field, to reduce the contribution of unspecific signals from within the blood vessels (Uludag et al., 2009), thereby improving the spatial accuracy and the signal-to-noise ratio (SNR) of functional mapping experiments. Qualitatively, the possibility of reducing the voxel size in an fMRI experiment to less than a millimeter allows imaging signals from small subcortical structures or from within small intracortical structures such as cortical columns and layers (Cheng, 2018;Dumoulin et al., 2018). With all these advantages, high-resolution imaging of brain activity becomes possible non-invasively in human subjects. This possibility in turn allows researchers to link fine-scale functional organization of the brain with complex, subjective, and cognitive aspects of visual perception that may be unique to human subjects.
In this article, I will outline how high-resolution fMRI can be used to advance our understanding of higher-level visual functions. I will focus on three specific areas of visual neuroscience: perceptual grouping, visual awareness, and top-down visual attention. I will review several examples that show how high-resolution fMRI studies helped to fill existing gaps in knowledge and opened up new perspectives on visual processing. I will further emphasize gaps that still exist in our understanding of visual processing, where pushing the spatial resolution of fMRI has the potential for advancing our understanding of complex visual functions on the neural level. In my discussion, I endeavor to cover broader advantages of high-resolution fMRI in addition to the possibility of imaging columnar structures and cortical layers. Overall, this review aims at providing a cognitive neuroscientist's perspective on the topic. It should therefore complement the already existing reviews related to the advantages of ultrahigh field and high spatial resolution both in the current and in other dedicated special issues (Norris and Polimeni, 2019;Polimeni and Uludag, 2018;Yacoub and Wald, 2018).

Higher-level effects in the early visual areas
Early stages of the visual pathway at the subcortical and cortical levels are traditionally known for their rather simple visual responses. There is, however, accumulating evidence, derived primarily from human fMRI studies, that activity of these areas can be affected by complex higher-level processing, such as top-down attention (Ling et al., 2015;O'Connor et al., 2002;Watanabe et al., 2011), shape perception (Murray et al., 2002), imagery (Pearson et al., 2015), characteristic color of grayscale objects (Bannert and Bartels, 2013), awareness Wunderlich et al., 2005;Yuval-Greenberg and Heeger, 2013), and many others. The processing of simple visual features in these areas, which is relatively well understood, does not suffice to explain these responses. Therefore, they are typically attributed to feedback signals from higher levels of the visual hierarchy (Gilbert and Li, 2013;Muckli and Petro, 2013;Saalmann and Kastner, 2011). As most of these studies used conventional fMRI with standard spatial resolution, they required advanced analysis approaches that are sensitive to fine-scale changes in voxel activity patterns to reveal the effects of interest (Kriegeskorte et al., 2006). Findings from these studies clearly show that higher-level signal modulations take place at a fine spatial scale, but without revealing the exact mechanism. These findings sparked an interest in the more detailed investigation of these responses using high-resolution fMRI. In this section, I will discuss in detail how the application of high-resolution fMRI has been successful in uncovering the neural processes underlying completion and grouping and will follow up with a possibility of a similar approach for answering specific questions in the field of visual awareness and attention.

Completion and grouping effects
One of the most prominent examples of higher-level effects in early visual areas is the shape selectivity response to both real and illusory shapes. These responses have been well-known on a single-cell level in V1 and V2 for a long time (Grosof et al., 1993;Qiu and von der Heydt, 2005;von der Heydt et al., 1984;Zhang and von der Heydt, 2010), yet conventional human fMRI studies repeatedly attributed such responses to cortical areas higher up in the visual hierarchy (Fang et al., 2008;Kourtzi et al., 2003;Ostwald et al., 2008). If anything, a decrease of fMRI activity has been observed during shape perception in V1 (Fang et al., 2008;Grassi et al., 2016;Murray et al., 2002;Zaretskaya et al., 2013). The increased activity in higher areas and its decrease in early visual areas during shape perception is typically interpreted in the context of predictive coding theory of visual perception at the neural level (Friston and Kiebel, 2009;Murray et al., 2002;Rao and Ballard, 1999;Zaretskaya et al., 2013). Predictive coding views perception as a hierarchical prediction process, whereby higher areas form a prediction about the likely cause of the sensory input (in our example, a shape) and send it back to early areas via feedback connections. Early areas, in contrast, compare the prediction signal with the sensory input. If prediction matches the input (individual lines are arranged in a shape), activity in early areas is reduced. If, however, there is a mismatch between the two (for example, when lines are arranged randomly), early visual areas increase their activity, signaling the "prediction error". The information about the mismatch is sent to higher visual areas downstream in order to adjust the prediction.
The activity decrease during shape perception in early visual cortex was observed with standard fMRI techniques. Several methodological improvements in fMRI data processing can improve the effective resolution of fMRI even without nominally changing the voxel size. For example, the surface-based analysis, which utilizes anatomically informed smoothing and reduces partial volume effects (Dale et al., 1999), as well as individually-defined topographic locations of stimulus representations on the cortical surface, which may be too variable to produce consistent effects in group averaging, can help improve the spatial specificity of the observed effects. When such analysis methods were used, a completely different picture emerged. Topographic representations of the illusory shape appeared to increase their activity during shape perception, while the surrounding regions that generate the illusion (the so-called "inducers") showed a decrease in activity (Grassi et al., 2017;Kok and de Lange, 2014). This effect has been shown using both the classical Kanizsa shapes (Fig. 1A), which are physically manipulated to thwart the illusion in a control condition (Kok and de Lange, 2014), and also using a bistable stimulus which either does or does not induce an illusion of a square under the same sensory input (Grassi et al., 2017). This effect may not hold for all types of stimuli, as yet another study failed to find an increase inside the illusory shape despite careful analysis (de-Wit et al., 2012), and more studies are needed to determine which aspects of the stimulus are crucial. Nevertheless, these findings revealed that the predictive coding mechanism might operate at a fine spatial scale on the neural level. Specifically, regions inside the illusory shape increase activity because the top-down shape prediction does not match the sensory input, which, in the case of an illusory (compared to veridical) shape, is simply absent. 
Regions that contain the inducers, in contrast, reduce activity, because the top-down shape prediction signal and the sensory input match.
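The location-specific account above can be made concrete with a toy calculation. The following is only a minimal sketch under simplifying assumptions, not the analysis used in any of the cited studies: activity at each retinotopic location is modeled as the unsigned difference between a top-down prediction and the sensory input, and all values, location labels, and names (such as `prediction_error`) are hypothetical.

```python
# Toy predictive-coding calculation (illustrative only; all values are hypothetical).
# Each list element stands for one retinotopic location in an early visual area.

def prediction_error(sensory_input, prediction):
    """Unsigned mismatch between the feedback prediction and the input at each location."""
    return [abs(p - s) for s, p in zip(sensory_input, prediction)]

# Locations 0-1: inducers; locations 2-3: interior of the illusory shape.
kanizsa_input = [1.0, 1.0, 0.0, 0.0]      # inducers are drawn, the interior is blank
shape_prediction = [1.0, 1.0, 1.0, 1.0]   # higher areas predict a complete shape
no_prediction = [0.0, 0.0, 0.0, 0.0]      # control: no shape is inferred

illusion = prediction_error(kanizsa_input, shape_prediction)
control = prediction_error(kanizsa_input, no_prediction)

# Change in modeled activity when the shape is perceived, per location
change = [i - c for i, c in zip(illusion, control)]
print(change)
```

In this sketch the modeled error falls at the inducer locations and rises in the interior of the shape once a shape prediction is present, which is the qualitative pattern described above. The model deliberately ignores receptive field size, hemodynamics, and any gain or precision weighting of the error signal.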
In a subsequent study, Kok et al. (2016) used high-resolution fMRI at ultrahigh field to test further important implications of the predictive coding theory. Using submillimeter resolution, they were able to examine responses to the Kanizsa stimuli across the cortical depth. Their experiment showed that topographic regions of V1, which represent the illusory shape and do not receive any sensory input, increase activity selectively in the deep layers of V1. The deep layers, along with the superficial layers, are the primary target of the feedback projections from higher areas (Fig. 1C). At the same time, the topographic regions of V1 representing the "inducers" decrease their activity equally at all cortical depths. This study, therefore, provided empirical support for the hypothesis that predictive signals arrive into V1 from higher-level visual areas via feedback connections.
Another exciting example of the use of high-resolution fMRI is the investigation of how visual context affects responses in the early visual cortex at distant cortical locations. In their experiment, Smith and Muckli (2010) presented their participants with naturalistic scenes in which one of the quadrants was missing (Fig. 1B). Using the multivariate pattern analysis technique (MVPA), the authors were able to show that regions of the visual cortex that were representing the missing quadrant (i.e., that received no visual stimulation) nevertheless contained information about the picture identity. Although it was likely that a feedback signal generated this activity pattern, an alternative explanation of its origin, namely the horizontal connections within the early cortical areas V1/V2, could not be ruled out entirely. This question was addressed in a more recent follow-up study that used high-resolution fMRI at ultrahigh-field to measure information about the picture identity contained in the pattern of voxel responses across different cortical depths (Muckli et al., 2015). Using the MVPA approach separately for the superficial, middle, and deep layers, the authors could show that when the entire image is shown to the participants, most of the information about the image identity is contained in the middle layers, which are the primary target of feedforward inputs to V1. However, in the case where the corresponding quadrant is missing, information about the image identity is contained primarily in the superficial layers, which are, along with the deep layers, the primary targets of corticocortical feedback connections. This result cannot rule out the effect of horizontal connections entirely. Such connections are known to be present particularly in layers 3 and 5 in non-human primates, to extend over a distance of around 3 mm and to target regions with similar functional properties (Rockland, 2019). 
Hence, it is possible that an oriented edge at the stimulated location causes activity spread to unstimulated regions with similar orientation preference. Especially with GE-EPI acquisitions, which strongly bias the BOLD signal towards the superficial layers, neural effects at any cortical depth are expected to peak primarily at superficial locations De Martino et al., 2013;Koopmans et al., 2010;Moerel et al., 2018;Polimeni et al., 2010a;Zaretskaya et al., 2020). The observed effects can also stem from a direct bias of large surface vessels, whose contribution to MVPA is well-known Shmuel et al., 2010;Yacoub et al., 2007). Such surface vessels extend over larger tangential distances along the cortical sheet (Shmuel et al., 2010;Weber et al., 2008), and their effects may also be observable at locations distant from the site of neural activity. Nevertheless, this study represents an important attempt towards a mechanistic understanding of higher-level effects in early visual areas. Future studies can continue addressing the underlying mechanisms of this phenomenon by, for example, examining the spatial distribution of the most informative voxels, or mapping the cortical regions affected by large surface vessels.
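The logic of decoding picture identity separately at each cortical depth can be sketched on simulated data. This is a hedged illustration rather than the pipeline of the studies discussed above: a leave-one-trial-out nearest-centroid classifier stands in for whatever classifier a real analysis would use, the voxel patterns are random numbers, and the signal strengths assigned to each depth are assumptions chosen only to mimic the occluded-quadrant condition.

```python
import random

def make_trials(rng, n_trials, n_voxels, signal):
    """Simulate voxel patterns for two images; `signal` scales how strongly identity is encoded."""
    template = [rng.choice([-1.0, 1.0]) for _ in range(n_voxels)]
    trials = []
    for i in range(n_trials):
        label = i % 2
        sign = 1.0 if label == 1 else -1.0
        pattern = [sign * signal * t + rng.gauss(0.0, 1.0) for t in template]
        trials.append((pattern, label))
    return trials

def mean_pattern(patterns):
    """Voxel-wise mean across a list of patterns."""
    return [sum(col) / len(patterns) for col in zip(*patterns)]

def sq_dist(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_accuracy(trials):
    """Leave-one-trial-out nearest-centroid decoding accuracy."""
    correct = 0
    for i, (pattern, label) in enumerate(trials):
        train = trials[:i] + trials[i + 1:]
        centroids = {c: mean_pattern([p for p, lab in train if lab == c]) for c in (0, 1)}
        predicted = min(centroids, key=lambda c: sq_dist(pattern, centroids[c]))
        correct += int(predicted == label)
    return correct / len(trials)

rng = random.Random(0)
# Hypothetical signal strength per depth bin (assumed, not measured)
signal_by_depth = {"deep": 0.0, "middle": 0.0, "superficial": 3.0}
accuracy = {depth: loo_accuracy(make_trials(rng, 40, 30, s))
            for depth, s in signal_by_depth.items()}
print(accuracy)
```

Comparing accuracies across depth bins in this way is only meaningful if the bins are equally sensitive. As noted above, the superficial bias of GE-EPI and the contribution of large surface vessels mean that a real analysis would have to account for depth-dependent signal strength before interpreting such differences.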
The findings of the two lines of research discussed above apparently contradict one another (compare Fig. 1C and D). Both studies linked their effects to feedback processes in the same brain area (V1), but Kok et al. found signal changes related to illusory surface in the deep layers, while Muckli et al. found information related to the surrounding image content in the superficial layers. There are many reasons that could underlie this discrepancy. These reasons range from sensitivity biases of different MRI acquisition sequences (Moerel et al., 2018) and the aspects of the signal one is looking at (univariate signal modulation vs. multivariate response pattern) to the different perceptual processes engaged by the two types of stimuli used. Recent findings presented in 2019 at the Annual Meeting of the Society for Neuroscience suggest that the type of stimulus/task may play a role in which of the feedback targets, the superficial or the deep layers, will exhibit an effect (Bergmann et al., 2019). Using multivariate analysis, the authors of this study reported that the content of illusory perception, which occurs automatically, is best decoded from the superficial layers of V1, while the content of visual mental imagery, which is a voluntary act, is best decoded from the deep layers. Another recently published work also suggested that an illusion of motion drives univariate fMRI responses primarily in the superficial layers of V1 (Marquardt et al., 2020), which is consistent with the multivariate results of Bergmann et al. (but still contradicts the univariate results of Kok et al.). Clearly, there is a need to address the reasons for these discrepancies in future studies. For example, similar to Bergmann et al., one could directly compare static vs. dynamic illusions in one experimental paradigm. Alternatively, one could examine the sources of feedback signals in the superficial and deep layers by examining their layer-specific connectivity (see section 4).
Regardless of the exact reasons for the differences in the outcomes, the studies discussed here demonstrate how high-resolution fMRI can be used to study the origin of higher-level effects in early visual areas, uncover the hierarchical and recurrent mechanisms of visual processing and ultimately test the validity of existing theories of visual perception, such as the predictive coding theory (reviewed in detail in Stephan et al., 2019).
One of the central claims of the predictive coding theory is that the prediction process is hierarchical and that the spatial separation between areas where prediction error is high and where it is low should exist at multiple stages of the visual hierarchy. This prediction finds some support in animal physiology data (see Issa et al., 2018 for a recent example). So far, this prediction has been difficult to test using high-resolution fMRI, which rarely went beyond V1/V2. One reason why V1 (and partially V2) has been the focus of high-resolution studies is that the delineation of different subregions of the illusory figures for subsequent analysis is more challenging in other topographic regions. First, visual regions with retinotopic organization beyond V1/V2 have increasingly larger receptive fields, meaning that any two neighboring voxels are more likely to respond to overlapping parts of the visual field, making it more difficult to distinguish their signals (Dumoulin et al., 2010;Dumoulin and Wandell, 2008). Second, topographic regions beyond V3 are at least 50 % smaller than V1 in their surface area, which hinders the separability of different subregions even further. Reducing the voxel size in fMRI can, therefore, increase the separability of different subregions within the topographic map, which can help determine whether a pattern of activity across space and cortical depth, similar to that found in V1, can be found in smaller visual areas with topographic organization. High-resolution fMRI can thus help to verify hypotheses of the predictive coding theory beyond V1/V2. Importantly, increasing spatial resolution can help examine such higher-level effects not only in cortical areas but also in subcortical structures with topographic organization, such as the superior colliculus of the midbrain or the lateral geniculate nucleus (LGN) and the pulvinar nucleus of the thalamus.
Subcortical structures can profit from high resolution at ultrahigh field not only because they are smaller. Their location deeper inside the head, and therefore farther away from the receiver coils, also reduces fMRI sensitivity in those regions, which can be recovered at ultrahigh magnetic field. Poltoratski et al. (2019) have recently made an exciting attempt to study higher-level visual effects at the subcortical level. In this study, the researchers investigated brain responses to figure-ground segregation by presenting a patch with an oriented pattern that is orthogonal to the oriented pattern of the background. They could show that the figure-related increase in activity was present not only in V1 but also in the LGN. Importantly, it was present in the LGN when the figure and the background were shown through separate monocular channels. Since LGN neurons are strictly monocular, this suggests that the figure-ground related LGN signals are a result of feedback from a higher-level binocular processing stage. Although the neural mechanisms for figure-ground segregation may be different from the completion and grouping effects discussed above in V1, this study underscores the general potential of investigating top-down effects on visual processing at the subcortical level.
The study by Poltoratski et al. took advantage of the sensitivity of a 7 T scanner while only moderately increasing the resolution (2 mm isotropic voxels). Further increasing the resolution can also help determine, for example, whether the figure-related increase in LGN is topographically specific, and whether higher-level effects of completion and grouping reported for V1 can be detected at the level of LGN as well. A similar approach can be directed to other subcortical structures with known topographic organization, such as the superior colliculus or the pulvinar, in order to determine to what extent these structures are engaged in the processing of higher-level visual content. Studies show that the topographic organization of both structures can be detected using high resolution already at 3 T (DeSimone et al., 2015), demonstrating the potential for detecting even more complex fine-scale responses in these structures.

Attention and awareness
Another two higher-level visual effects that can be identified at the early stages of visual processing are visual awareness and top-down attention. The extent to which these two processes are linked is still highly debated, and although more recent evidence speaks for two separate mechanisms, I will discuss these two processes together, as they remain closely linked (Aru et al., 2012;Koch and Tsuchiya, 2012;Lamme, 2003;Pitts et al., 2018).
One of the popular experimental approaches for studying the neural mechanisms of visual awareness is the use of ambiguous (also called "bistable") stimuli (Long and Toppino, 2004), and in particular, the binocular rivalry paradigm (Blake and Logothetis, 2002;Tong et al., 2006). In binocular rivalry, participants are presented with two different images separately to each eye. As the images cannot be fused into one, the subjective impression of the observer alternates every few seconds between perceiving one or the other image. The subjective changes in binocular rivalry and other forms of bistable perception occur despite the constant and unchanging visual input, thus allowing researchers to dissociate sensory input and the subjective experience of it.
The level of the visual hierarchy involved in resolving binocular rivalry has been the focus of a long-standing debate (Blake and Logothetis, 2002;Logothetis, 1998;Tong et al., 2006). While single-cell electrophysiology recordings suggest that more neurons in higher-level processing stages reflect subjective changes in binocular rivalry (Gelbard-Sagiv et al., 2018;Logothetis, 1998;Panagiotaropoulos et al., 2012), fMRI results showed relatively strong signal modulations at early processing stages such as V1 Maier et al., 2008;Tong and Engel, 2001) and the LGN Wunderlich et al., 2005). Importantly, fMRI studies imply that signal modulations occur between the monocular representations of the two eyes Tong and Engel, 2001), and only to a lesser extent do they reflect the competition between the two image representations independently of the eye of origin. Using high-resolution fMRI at 3 T, Haynes et al. were able to detect voxels in V1 and in the LGN that show a response bias towards the left or the right eye, thereby isolating the signals related to V1 ocular dominance columns and monocular LGN layers. In a subsequent binocular rivalry experiment, they observed activity modulations in these eye-selective voxels that were dependent on perceptual dominance and suppression of the corresponding eye. This study thus supported the hypothesis that the competition between the two monocular channels mediates subjective changes during binocular rivalry. A more recent animal study also supports this notion. In this study, the activity of ocular dominance columns was imaged invasively in non-human primates, yielding essentially the same results (Xu et al., 2016).
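The two-step logic described above, first identifying eye-biased voxels and then testing whether their activity follows the reported percept, can be illustrated with a small worked example. This is a sketch under stated assumptions and not the procedure of the cited studies: the contrast index, the threshold of 0.2, and all response values are hypothetical and chosen only to keep the arithmetic transparent.

```python
def eye_preference(left_resp, right_resp):
    """Signed index per voxel: positive means left-eye bias, negative means right-eye bias."""
    return [(le - ri) / (le + ri) for le, ri in zip(left_resp, right_resp)]

def select_voxels(index, threshold):
    """Return indices of voxels biased towards the left eye and towards the right eye."""
    left = [i for i, v in enumerate(index) if v > threshold]
    right = [i for i, v in enumerate(index) if v < -threshold]
    return left, right

def mean_over(values, voxels):
    """Mean response across the selected voxels."""
    return sum(values[i] for i in voxels) / len(voxels)

# Step 1: hypothetical monocular localizer responses (arbitrary units), one value per voxel
left_eye_resp = [4.0, 3.0, 1.0, 2.0, 2.0]
right_eye_resp = [1.0, 1.0, 3.0, 2.0, 6.0]
index = eye_preference(left_eye_resp, right_eye_resp)
left_voxels, right_voxels = select_voxels(index, threshold=0.2)

# Step 2: hypothetical responses during rivalry, split by the reported percept
resp_left_dominant = [3.0, 3.0, 1.0, 2.0, 1.0]
resp_right_dominant = [2.0, 2.0, 2.0, 2.0, 3.0]

# Dominance minus suppression, computed within each eye-selective voxel group
modulation_left = (mean_over(resp_left_dominant, left_voxels)
                   - mean_over(resp_right_dominant, left_voxels))
modulation_right = (mean_over(resp_right_dominant, right_voxels)
                    - mean_over(resp_left_dominant, right_voxels))
print(left_voxels, right_voxels, modulation_left, modulation_right)
```

A positive modulation in both voxel groups would correspond to the pattern described above, in which eye-selective signals rise when the corresponding eye is perceptually dominant. In real data the localizer and the rivalry runs would need to be independent to avoid selection bias.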
Although the study of Haynes et al. makes a strong case for the low-level origin of binocular rivalry effects, it does not rule out the possibility of competition between more complex stimulus properties occurring at higher stages. A recent study by Schneider et al. (2019) demonstrated effects analogous to those of Haynes et al. in the human V5/hMT+ during ambiguous apparent motion perception. While in binocular rivalry ambiguity arises primarily due to dissimilarity of the two eyes' inputs, ambiguous apparent motion does not involve any interocular conflict. The stimulus consists of two dots that flicker on the screen in a way that can be interpreted as either vertical or horizontal motion (Fig. 2A). Using high-resolution fMRI at ultrahigh magnetic field, the authors first identified regions in area V5/hMT+ preferring visual motion along the vertical and horizontal dimensions, the so-called axis of motion regions, which are related to the motion direction columns in V5 (Zimmermann et al., 2011). Importantly, they were able to demonstrate that subjective experience of the stimulus (horizontal versus vertical) is directly reflected in the activation of either the horizontal or the vertical motion-preferring regions, respectively (Fig. 2B). Together with the study of Haynes et al., this study suggests that a similar mechanism operates in both types of stimuli, binocular rivalry and the ambiguous apparent motion, but in each case involves a different columnar system, depending on the type of conflict involved.
Interestingly, a similar columnar-level feedback mechanism may be engaged not only in bistable perception, but also in other higher-level processes such as top-down attention, prediction and perceptual decision-making. For example, in a study by Lawrence et al. (2019) that I discussed previously, attentional enhancement was observed not only at specific cortical depths, but also selectively in those voxels that prefer grating orientation that had to be attended. Similarly, in a 3 T study examining the neural signatures of expectation in the prestimulus activity and its impact on subsequent perception, expectation effect was found in voxels selective for a grating orientation that was expected (Pajani et al., 2015). Crucially, this prestimulus expectation effect was predictive of a subsequent illusory perception of expected orientation in visual noise. The observation of columnar-specific top-down effects in the early visual cortex is also a well-known fact in primate neurophysiology studies of perceptual decision-making. These studies indicate that decision-related activity is found in those early visual areas that contain a columnar map of the decision-relevant feature (Nienborg and Cumming, 2014). For example, V1 neurons are involved during an orientation-discrimination task and V2 neurons during a disparity-discrimination task, which matches with the presence of orientation and disparity columns in V1 and V2, respectively. Similar effects in humans are yet to be examined using high-resolution fMRI. Columnar-level influence may thus be a general mechanism by which different types of top-down modulations are exerted on early visual areas.
The fact that bistable perception engages columnar structures representing the ambiguous feature does not reveal the origin of this columnar-level signal modulation. In principle, these modulations could originate locally within the same area by the horizontally spreading mutual inhibition, as has been suggested for binocular rivalry (Seely and Chow, 2011), but they could also be a result of feedback from a downstream stage, as suggested by the decision-making studies (Nienborg and Cumming, 2009). High-resolution fMRI can help test these two alternatives by examining the columnar signal modulations across the cortical depth. Finding activity increase predominantly in the deep or the superficial layers would speak in favor of their feedback origin. Conducting such a study may be particularly challenging, as it requires both mapping the columnar-dependent activations and detecting layer-dependent signals simultaneously. Successful studies that have achieved either of these goals often used rectangular voxels, favoring the resolution either within the cortical sheet (Yacoub et al., 2007) or across the cortical depth (Huber et al., 2017) at the expense of the second dimension. Achieving sufficiently high resolution along both dimensions requires technological advances in the field of sequence development for high-resolution fMRI.
Awareness-related modulations in early visual areas have proven to be difficult to dissociate from the effect of top-down attention. Top-down attention can modulate responses at different levels of the visual hierarchy, including V1 and even the LGN (Ling et al., 2015; O'Connor et al., 2002; Schneider, 2011). Because the studies of visual awareness typically require the subjects to report their subjective experience, and because these reports require monitoring of one's perceptual changes by attending to them, the dissociation of the neural mechanisms of attention and awareness has been particularly challenging (Tsuchiya et al., 2015). Clever experimental paradigms specifically designed to distinguish the effects of top-down attention from the effects of awareness suggest that the modulations seen in V1 are primarily attention-driven and that awareness-related modulations are small or non-existent (Watanabe et al., 2011). On the other hand, experimental approaches that do not require an immediate perceptual report and use EEG to measure awareness-related activity (Siclari et al., 2017), or those that eliminate consciousness by sedation (Xu et al., 2016), nevertheless reveal signal modulations in the occipital brain areas. An alternative explanation for the absence of awareness-related fMRI effects in V1 is that this modulation is much more subtle compared to attentional modulation and may have been too weak to be detected. For example, if the awareness-related activity occurs primarily in the deep layers of V1, it can be easily overlooked by conventional GE-EPI acquisitions, which exhibit a strong bias towards detecting activity in the larger vessels located on the cortical surface (Moerel et al., 2018; Polimeni et al., 2010a). High-resolution fMRI can help dissociate the mechanisms of attention and awareness by examining their corresponding activity profiles across the cortical depth. Top-down attention effects have already been linked to increased activity in the superficial and deep layers of V1 (see Lawrence et al., 2019). Identifying a specific laminar signature of visual awareness that is similar to or distinct from that of attention would shed light on the similarities and differences between attention and awareness on the neural level, thereby testing some prominent hypotheses about the neural mechanisms of consciousness and the role of the primary visual cortex (Crick and Koch, 1995; Koch et al., 2016; Lamme, 2006; Odegaard et al., 2017; Silvanto, 2014; Storm et al., 2017).
Fig. 2. Example of mapping the awareness-related fMRI activity onto the columnar architecture (reproduced without changes from Schneider et al., 2019). A) Schematic illustration of the ambiguous apparent motion stimulus and the physical stimulus. B) fMRI activity during bistable viewing within the clusters preferring horizontal (red) and vertical (green) motion.

Higher-level influences on visual receptive fields
Apart from the ability to measure brain activity with higher spatial resolution, significant developments in fMRI analyses, in particular the population receptive field (pRF) mapping technique, have made it possible to further bring together knowledge from human and animal studies (Dumoulin et al., 2010; Dumoulin and Wandell, 2008). Several studies utilized the pRF technique to examine the changes in pRF location and shape in early visual areas that are driven by higher-level effects. For example, we know from physiology and fMRI studies that receptive field locations in V1 are affected by the illusory rather than the real size of the object (He et al., 2015; Ni et al., 2014) as well as by the spatial locus of top-down attention (Klein et al., 2014). As a pRF represents a "compound" receptive field of several hundred thousand neurons in a voxel (Logothetis, 2008), reducing the voxel size provides a measurement of receptive field size and shape that is closer to the "true" receptive fields of the underlying neurons. Most importantly, high-resolution fMRI allows for detecting differences in pRF sizes not only tangentially to the cortical surface, but also across the cortical depth, as has been done in V1 (Fracasso et al., 2016). This opens up the possibility of studying how various pRF properties are affected by higher-level effects in early visual areas across the cortical depth. Such a study has recently been published by Klein et al. (2018), finding the strongest attention-related pRF displacement in deep layers of V1. Interestingly, although this observation is in agreement with the target layer of the feedback projections to V1, this finding is opposite to the location of top-down attentional signal enhancement, which was found primarily in the superficial layers (Lawrence et al., 2019).
The findings from both studies taken together may indicate that signal enhancement and receptive field displacement effects are driven by different feedback mechanisms, targeting different sub-populations of neurons across the cortical depth.
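To make the pRF approach more concrete, the following is a minimal sketch in Python of the basic idea: a voxel's response is modeled as the overlap between the stimulus aperture and a 2D Gaussian in visual space, and the Gaussian's position and size are chosen to best match the measured time course. This is a simplified illustration under stated assumptions, not the published fitting procedure. It omits the hemodynamic response function and noise, uses a coarse grid search, and all function names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_prf(x0, y0, sigma, grid_x, grid_y):
    """Isotropic 2D Gaussian receptive field evaluated on a visual-field grid."""
    return np.exp(-((grid_x - x0) ** 2 + (grid_y - y0) ** 2) / (2.0 * sigma ** 2))

def predict_timecourse(prf, apertures):
    """Overlap of the pRF with each binary stimulus aperture (time x rows x cols)."""
    return np.tensordot(apertures, prf, axes=([1, 2], [0, 1]))

def fit_prf_grid(measured, apertures, grid_x, grid_y, centers, sigmas):
    """Grid search for the (x0, y0, sigma) whose prediction correlates best."""
    best_params, best_r = None, -np.inf
    for x0 in centers:
        for y0 in centers:
            for sigma in sigmas:
                prf = gaussian_prf(x0, y0, sigma, grid_x, grid_y)
                pred = predict_timecourse(prf, apertures)
                if pred.std() == 0:
                    continue  # a flat prediction cannot be correlated
                r = np.corrcoef(pred, measured)[0, 1]
                if r > best_r:
                    best_params, best_r = (x0, y0, sigma), r
    return best_params, best_r

# Toy stimulus: a vertical bar sweeping horizontally, then a horizontal bar
# sweeping vertically, on a 20 x 20 degree visual field.
axis = np.linspace(-10, 10, 41)
grid_x, grid_y = np.meshgrid(axis, axis)
bar_positions = np.linspace(-9, 9, 19)
apertures = np.array(
    [(np.abs(grid_x - p) < 1.5).astype(float) for p in bar_positions]
    + [(np.abs(grid_y - p) < 1.5).astype(float) for p in bar_positions]
)

# Simulated noiseless "voxel" with a known pRF, then recovered by the fit.
true_params = (3.0, -2.0, 1.5)
measured = predict_timecourse(gaussian_prf(*true_params, grid_x, grid_y), apertures)
centers = np.arange(-8.0, 9.0, 1.0)
sigmas = [0.5, 1.0, 1.5, 2.0, 3.0]
best_params, best_r = fit_prf_grid(measured, apertures, grid_x, grid_y, centers, sigmas)
print("recovered (x0, y0, sigma):", best_params)
```

An attention-related pRF displacement of the kind discussed above would then correspond to a shift in the fitted center between attention conditions, estimated separately for voxels at each cortical depth.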
Utilizing high-resolution fMRI allows extending this line of research to the small subcortical visual structures such as the superior colliculus and the LGN. pRF mapping in these structures has already been successfully performed using high-resolution fMRI at 3 T, which also led to the discovery of additional visual topographic maps in the thalamic reticular nucleus and the substantia nigra (DeSimone et al., 2015). Further increasing the resolution would not only allow mapping the topography of those structures more precisely and discovering more subtle fine-scale influences of higher-level effects on their layout, but could also reveal more subcortical topographic maps within small structures that are not typically associated with visual processing.

Fine-scale responses in higher-level visual areas
What we typically consider the mid-level and higher-level visual areas often show a remarkable degree of functional specialization. Perhaps the most prominent examples along the ventral stream involve a progression from the selectivity to simple shapes and color, to object, face, body, and scene-selective responses. Along with this specialization, there are also higher-level areas in the frontal and the parietal lobe that seem to be unspecifically responsive to a wide range of visual stimuli and tasks, which have been labeled "the task-positive", or "the dorsal attention" network (Corbetta and Shulman, 2002;Fox et al., 2005).
While the role of the specialized ventral stream areas is rather straightforward, the exact contribution of the frontoparietal areas is much harder to establish, precisely because nearly any task requiring cognitive resources, such as attention, working memory or reporting conscious perception, leads to their activation (Culham and Kanwisher, 2001;Naghavi and Nyberg, 2005).
Compared to low-level visual areas, where higher-level effects are already being studied in the context of their fine-scale functional organization, much less work has been done in the study of higher visual areas. Due to the complexity of their responses and a high degree of response selectivity to certain stimulus classes or tasks, their fine-scale organization is much more difficult to reveal. In this section, I will discuss several successful attempts to understand the fine-scale functional organization of higher-level areas, focusing particularly on their topographic organization. I will indicate where high-resolution fMRI has the potential to shed light on the underlying mechanisms of higher-level visual functions such as perceptual grouping, visual awareness, and top-down attention.

Higher-level responses and topographic maps
Although generally little is known about the fine-scale functional organization of higher-level visual areas in humans, the accessibility of conventional fMRI has made it possible to make one central discovery. Topographic maps representing the visual field could be identified far beyond V1-V3, not only in regions where topography is well known from primate studies, such as MT or V3A/V3B but also further anterior to those areas along the dorsal and the ventral processing streams (Silver and Kastner, 2009;Wandell et al., 2007) (Fig. 3A). Notably, topography has been found in brain regions where analogies with primate topographic maps may not be as straightforward (Arcaro et al., 2011;Orban, 2016;Van Essen et al., 2001), which constrains the study of these maps exclusively to humans. Since the discovery of these maps, much of the effort has been placed on understanding how their visual topography is related to other higher-level functions ascribed to these areas (Scolari et al., 2015).
Revealing the functions of these maps represents a significant challenge. First, most of these higher-level topographic maps are much smaller in their surface area compared to V1. Second, their signal is increasingly driven more by comparably weaker and noisier non-visual signals, such as overt attention or eye movements (Konen and Kastner, 2008; Silver et al., 2005). High-resolution fMRI, especially at ultrahigh magnetic field, with its advantages in both the sensitivity and resolution, is a promising tool in studying the fine-scale organization and function of these maps (Hoffmann et al., 2009), and perhaps even discovering new cortical topographic maps in regions where resolution and sensitivity limits of conventional fMRI do not allow for their detection.
One successful example of using high-resolution fMRI to reveal the functional organization of higher-level feature representations investigated numerosity processing in the parietal cortex. The authors took advantage of the sensitivity of an ultrahigh-field scanner with only a modest resolution increase (ca. 2 mm isotropic voxels) to identify a continuous map of quantity representation within the intraparietal sulcus, which is well known for its numerosity responses (Harvey et al., 2013). Subsequently, these initial findings were extended in two directions. First, by investigating larger parts of the brain and by further reducing the voxel size, the authors were able to detect numerosity maps in other higher-level brain areas, including additional parietal, frontal and middle temporal regions (Harvey and Dumoulin, 2017). Second, they were able to detect an object size map that is independent of, but largely overlapping with, both the numerosity map and the visual topographic maps (Harvey et al., 2015). The latter result is shown in Fig. 3B.
Crucially, these experiments and similar lines of research by other groups (e.g., Konen et al. (2013), who examined the overlap of visual topographic maps with reaching and grasping) show that multiple independent maps may coexist within the same patch of the cortex. This fact raises the question of the mechanism by which multiple maps co-exist and interact within the same area. For example, different maps may co-exist along a tangential cortical plane at neighboring, but non-overlapping cortical locations, analogous to a hypercolumn of the primary visual cortex (Hubel and Wiesel, 1977; Ts'o et al., 2009). Alternatively, different maps may be generated by the same population of neurons that represent a specific feature flexibly based on the task demands. Finally, distinct maps can co-exist in one area within distinct cortical layers, similar to the separation of the magnocellular and parvocellular pathways in V1 (Olman et al., 2012). Further studies that utilize higher spatial resolution may be able to answer these questions.
Although perceptual grouping and completion responses (already discussed in section 2.1. of this review) can be detected in early visual areas, they are more often associated with mid- and higher-level regions. Perceptual grouping responses classically engage the object-selective lateral occipital and temporal areas (Fang et al., 2008; Grill-Spector et al., 2001; Kourtzi et al., 2003), as well as the superior parietal areas (Grassi et al., 2018; Romei et al., 2011; Yokoi and Komatsu, 2009; Zaretskaya et al., 2013). Currently, it is not clear to what extent these grouping-related activations overlap with the higher-level topographic maps along the ventral and the dorsal streams (shown in Fig. 3A). If grouping-related activity spatially coincides with any of the topographic maps, it would be valuable to identify the fine-scale pattern of activity within each of the visual topographic maps. Specifically, it is critical to determine whether this fine-scale activity pattern reflects the topographic layout of the subjective Gestalt impression, as it has been shown for V1 and V2 (Grassi et al., 2017; Kok and de Lange, 2014). If so, it would be important to determine the corresponding contributions of the higher-level and lower-level topographic representations in generating the Gestalt impression.
Another important question that high-resolution fMRI could help clarify is the contribution of the frontoparietal network to visual awareness. This is one of the most controversial issues in consciousness research (Naber and Brascamp, 2015;Odegaard et al., 2017;Storm et al., 2017;Zaretskaya and Narinyan, 2014). Activation of the posterior parietal and frontal areas, predominantly in the right hemisphere, is known to accompany and causally contribute to subjective changes in binocular rivalry and other forms of bistable perception (Brascamp et al., 2018). The frontoparietal network is often regarded as the source of top-down influences to early and mid-level areas, which, by a feedback mechanism, form a conscious percept (Weilnhammer et al., 2013). Interestingly, at least some of the frontoparietal regions implicated in consciousness may largely overlap with the parietal and frontal topographic maps (compare Fig. 3A and C). No study until now has tested whether such an overlap exists and, if so, how awareness-related modulations map onto the existing visual topography. Zooming into the small topographic maps in the parietal and frontal areas and examining their fine-scale response to changes in visual awareness can shed more light on the nature of these awareness-related activations. For example, it is important to determine whether awareness-related modulations occur only in those regions of the visual topographic maps that represent the stimulus location in the visual field or whether there are separate topographic maps for awareness that neighbor the visual and attention-related ones.
The main challenge in research on the neural correlates of consciousness is the separation of the neural processes related to conscious perception itself from those related to the pre-conscious (stimulus analysis that precedes the occurrence of the conscious percept) and post-conscious processing (activity related to further operations on the conscious stimulus such as attention, metacognition, preparation of motor report) (Aru et al., 2012). Recent advances in experimental paradigms in consciousness studies have made it possible to dissociate conscious changes in bistable perception from the explicit report of the observer about them (Frassle et al., 2014), and even to eliminate changes in bistable perception from awareness (Zou et al., 2016). These paradigms showed that under no-report or no-awareness conditions, large portions of the original frontoparietal activity disappear, suggesting that it may have constituted an artifact of the post-conscious cognitive processing. Methodological differences between the studies and coarse spatial resolution of conventional fMRI make it challenging to determine which parts of the frontoparietal network stopped responding, which merely reduced their activity, and which responded to perceptual changes equally strongly in these new experimental paradigms as in the old ones. A simple mapping of coordinates from different studies suggests that some of the areas may not be affected by removing the explicit report from the experimental paradigm (Fig. 3C). The careful spatial separation of activity related to pre-conscious and post-conscious processing with high resolution and high sensitivity can help to more precisely delineate those parts of the frontoparietal network that are involved in producing conscious visual experience. The search for brain areas contributing to consciousness proper can thus be narrowed down to a more specific set of regions, or even specific functional units within a region, such as specific cortical layers and columns.
Fig. 3. A) Areas showing visual topographic organization in the human brain (reproduced from Wang et al. (2015) with permission from Oxford University Press). B) Overlap between visual (magenta), numerosity (white), and object size (black) maps at the same location within the parietal cortex (reproduced without changes from Harvey et al., 2015). C) Areas activated during subjective perceptual changes in binocular rivalry paradigm (green) and areas that reduce their activity if an active report of binocular rivalry is not required (red) (reproduced without changes from Zaretskaya and Narinyan, 2014).

Microarchitecture of the category-selective areas in the ventral stream
While we know that many higher-level visual areas in the human parietal, frontal and temporal lobe show topographic organization, other aspects of their fine-scale organization are far less studied. For example, it is not known whether category-selective responses in areas of the temporal lobe show some kind of columnar organization, similar to lower and mid-level areas, and what the features that constitute a column in those areas are. Instead of pinwheel-like organization of abstract features like orientation, motion axis, or color, some of which have already been detected using high-resolution fMRI in humans (see Cheng, 2018 for a review), areas specialized for more complex features could exhibit other types of microstructure. For example, it is well-known from invasive recordings in primates that responses in the object-selective inferior temporal area TE show a type of columnar organization which may support view-invariance (Tanaka, 2003). Neighboring locations in these areas prefer similar objects, but each location has the highest preference for a specific modification of that object (e.g., responses to different orientations of the same face). To my knowledge, no study so far has attempted to detect similar organizational principles in humans using high-resolution fMRI.
Some of the conventional fMRI studies at 3 T suggest that there may also be other, previously unknown, organizational principles within the category-selective areas of the temporal lobe. For example, a study by Henriksson et al. (2015) revealed that the occipital face area is comprised of small clusters representing individual face parts (e.g., eyes, mouth, nose-selective clusters). Importantly, the cortical distance between these clusters correlated with the physical distance of these elements on a real face. This finding indicates that this area, instead of containing a topographic representation of the visual space, contains a topographic representation of a face. It is not known to which extent other regions, such as body-selective or place/landscape-selective areas, may have a similar fine-scale organization. In another example, a series of studies using high-resolution fMRI at 3 T examined the relationship between face- and limb-selective clusters in the occipitotemporal cortex, revealing that the two cluster types are always located side-by-side and appear to be organized into two different processing streams (Weiner and Grill-Spector, 2013). These and other studies show that there is much to be learned about fine-scale organization and responses of category-selective areas within the temporal lobe by further increasing the spatial resolution.

High-resolution functional connectivity
High-resolution fMRI comes at the expense of the amount of brain coverage, forcing researchers to focus primarily on the local effects within one or a few neighbouring areas. Although high spatial resolution has great potential for studying visual responses within areas, no area works in isolation, and much progress can be made in investigating inter-areal interactions. High-resolution functional connectivity is perhaps the least explored method in fMRI studies of visual perception. Although functional connectivity does not imply the existence of physical connections between the two areas, it can reveal synchronization between signals in distinct areas that is nevertheless functionally relevant. Increasing the resolution of functional connectivity studies can shed light on the interactions between higher- and lower-level areas during complex perceptual tasks at a fine spatial scale. As functional connectivity measures are based on analyzing signal changes over time, this approach can also profit from high temporal resolution, which is addressed in several other articles of this special issue. In this final section of my review, I will focus specifically on how increasing the spatial resolution of functional connectivity studies can be beneficial for understanding higher-level visual functions.
High-resolution functional connectivity can be used to study hierarchical interactions between different visual areas during perceptual grouping. One potential avenue for future research is testing whether there is a topography-specific coupling between higher and lower-level areas that accompanies, for example, perceptual grouping. Given that there is a topography-specific pattern of responses in early visual areas (discussed in Part 2.1.), it is reasonable to assume that areas representing the foreground of the figure in the early visual areas are selectively coupled with the corresponding topographic regions in the higher visual areas. In our study of perceptual grouping effects in the early visual cortex, we observed an independent V1-V2 coupling of regions representing the background of an illusory figure and regions representing the foreground, suggesting that the foreground enhancement and the background suppression may be driven by distinct sources (Grassi et al., 2017). This idea is also supported by a recent primate neurophysiology study examining the temporal response profiles for the figure and the background representations in V1 (Self et al., 2019). In another relevant study, researchers examined the effect of surrounding context on contour integration (Qiu et al., 2016). They found that the presence of surrounding clutter leads not only to an activity increase within subregions representing the contour, but also to an increase in functional connectivity of the corresponding representations within V1 and V2. High-resolution fMRI can help reveal similar effects beyond areas V1 and V2 and determine whether there are similar topography-specific connectivity patterns between V1/V2 and higher-level topographic maps. Another research direction with high potential is the identification of the sources of presumable top-down effects in early visual areas during perceptual grouping, which are currently unknown.
For example, connectivity "seeds" can be placed within the feedback layers of the early visual cortex, and areas specifically coupled with the feedback seeds can be identified throughout the rest of the brain. First attempts at exploring the possibility of depth-dependent connectivity have been performed in the early visual cortex (Polimeni et al., 2010b, 2011), as well as in the sensorimotor (Huber et al., 2017) and language (Sharoh et al., 2019) networks, and can be extended to investigate higher-level visual functions.
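The seed-based logic described above can be sketched in a few lines of Python. This is a minimal illustration on simulated time courses, assuming that a set of voxels has already been assigned to a "feedback layer" seed region and that candidate target voxels elsewhere in the brain are available as a voxels-by-time array. The function name `seed_connectivity` and all numerical values are hypothetical, and preprocessing steps such as nuisance regression are omitted.

```python
import numpy as np

def seed_connectivity(seed_voxels, target_voxels):
    """Pearson correlation of the mean seed time course with each target voxel.

    seed_voxels:   array of shape (n_seed_voxels, n_timepoints)
    target_voxels: array of shape (n_target_voxels, n_timepoints)
    """
    seed = seed_voxels.mean(axis=0)
    seed_z = (seed - seed.mean()) / seed.std()
    target_mean = target_voxels.mean(axis=1, keepdims=True)
    target_std = target_voxels.std(axis=1, keepdims=True)
    target_z = (target_voxels - target_mean) / target_std
    # The mean product of z-scored signals equals the Pearson correlation
    return target_z @ seed_z / seed.size

# Simulated data: a shared fluctuation drives the seed and the first five
# target voxels; the remaining five targets contain independent noise only.
rng = np.random.default_rng(0)
n_timepoints = 300
shared = rng.normal(size=n_timepoints)
seed_voxels = shared + rng.normal(size=(20, n_timepoints))
coupled = shared + rng.normal(size=(5, n_timepoints))
uncoupled = rng.normal(size=(5, n_timepoints))
target_voxels = np.vstack([coupled, uncoupled])

r = seed_connectivity(seed_voxels, target_voxels)
print("coupled targets:  ", np.round(r[:5], 2))
print("uncoupled targets:", np.round(r[5:], 2))
```

In a depth-dependent analysis, the same computation would be repeated with seeds defined at different cortical depths, and the resulting connectivity maps compared between seeds.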
Changes in connectivity between different levels of processing are also an essential assumption in theories of consciousness (Lamme and Roelfsema, 2000). Using conventional functional connectivity analysis, several previous studies revealed changes in coupling between the mid-/lower-level areas and higher visual areas during bistable perception. For example, one study examined coupling between V5/MT+ and the parietal cortex using the method of dynamic causal modeling (DCM), which can dissociate bottom-up and top-down components of functional connectivity (Megumi et al., 2015). This study not only showed that resolving the ambiguity is accompanied by bidirectional interactions between parietal and mid-temporal areas, but also that the bottom-up component of the coupling predicted inter-individual differences in the subjective experience of a bistable stimulus. In another example, the authors examined functional connectivity between V5/MT+ and one of the frontal nodes of the frontoparietal network (Weilnhammer et al., 2013). This study was able to show an increased top-down coupling during bistable perception compared to a control condition. Directed functional connectivity measures, such as DCM, have long been used to determine the modulation of information flow in different contexts (Friston et al., 1997, 2003), but they have several significant limitations. First, they represent an indirect measure of the direction of influence. Second, they usually require detailed knowledge about the areas involved in the task at hand and their connections to be included in the model in the form of assumptions. Examining functional connectivity between different areas across the cortical depth represents a more data-driven approach to determining the direction of information flow between areas.
Future high-resolution studies of bistable perception and visual awareness can focus specifically on the interaction between higher-level and early/mid-level visual areas across the cortical depth in order to test the hypotheses about the role of feedback in visual awareness.

Conclusions and future directions
In this article, I focused on how high-resolution fMRI has already advanced and can further advance our understanding of higher-level visual functions. In early visual areas, some work has already been done to determine how higher-level visual functions map onto the fine-scale organization such as topographic maps, columns, and layers. At higher stages of the visual hierarchy, high-resolution fMRI revealed the co-existence and overlap of visual topographic maps and non-sensory feature maps as well as fine-scale organization of responses in areas specialized for specific stimulus categories. Future studies may be able to determine other types of non-sensory maps, elucidate how multiple maps can co-exist within the same area and how their arrangement serves perceptual and cognitive functions. Another central advantage of high-resolution fMRI application has been the ability to attribute functional activations to feedback or feedforward processing by examining functional activations across different cortical depths. This has been particularly successful in studies of completion and grouping but is much less explored in studies of attention and awareness. The majority of these depth-resolved fMRI studies focused on higher-level effects in early visual areas, and more work is needed to explore depth-dependent effects at higher processing stages. Finally, high-resolution functional connectivity studies represent a promising approach for studying fine-scale interactions between the visual areas at different levels of the hierarchy. Hence, high-resolution fMRI as a method has the potential not only for enhancing our knowledge about the processing at each level of the cortical hierarchy separately, but also for studying how different levels interact in the context of higher-level visual tasks.
In this article, I have mainly focused on what can be done with current fMRI technology alone. As high-resolution fMRI is conducted predominantly at ultrahigh magnetic fields, it is still quite limited with regard to the combination with other neuroimaging approaches, most of which are only available for 3 T. For example, high-resolution fMRI at 3 T combined with EEG allowed linking of activity at different cortical depths of V1 to neural oscillations in distinct frequency bands (Scheeringa et al., 2016), which paves the way for an exciting research direction examining similar relationships in other visual areas. It would be important to combine high-resolution fMRI with non-invasive brain stimulation, which allows us to causally manipulate activity in an area and observe its impact on other, remote brain regions in human subjects (Ruff et al., 2009). Examining remote TMS effects across the cortical depth might help infer the direction of connectivity between the stimulated and the remote areas, thereby revealing the hierarchical structure of the studied networks. As high-field MRI becomes more and more common, it is expected that the multimodal tools will catch up to provide neuroscience with more exciting possibilities for exploring the human brain.
Another area of studies where high-resolution fMRI has rarely been applied is the clinically-oriented work of understanding disorders of higher-level vision. For example, how are the multiple topographic maps altered in hemineglect? Or, what is the role of feedback to the visual cortex in producing visual hallucinations? Some of these questions and a more general potential of depth-resolved fMRI for computational psychiatry and predictive coding theories of psychiatric disorders have been addressed in a recent review (Stephan et al., 2019). High-resolution fMRI may also be particularly promising in studying blindsight-like phenomena, as they are thought to depend on the interaction between the small thalamic nuclei and the cortex (Schmid and Maier, 2015). Once this methodology becomes more established in exploring healthy visual functions, it can be extended to better understand disorders of higher-level visual functions.
This review has been intentionally optimistic. Like any method, high-resolution fMRI has its limitations. The majority of those limitations are related to the method of fMRI in general, such as the complex, indirect and poorly-understood relationship between the BOLD signal and neural activity, the complexity of intracortical interactions within and across different layers, or the relatively low temporal resolution compared to the time scale of perception-related neural events (Logothetis, 2008). The more specific limitations of high-resolution fMRI, both physiological and technical, have been discussed in detail elsewhere (see Lawrence et al. (2017) for laminar fMRI in cognitive neuroscience or Cheng (2018) for high-resolution fMRI in vision). It is ever more important to check the findings of high-resolution fMRI studies in humans against invasive neurophysiology studies, especially those conducting laminar recordings of neural activity (Self et al., 2017).
While physiological limits imposed by the nature of the BOLD signal may be more difficult to overcome, the rapid development of MR technology is expected to surmount the technical limitations imposed by the hardware. Neuroscientists are looking forward to these developments as high-resolution fMRI remains one of the most promising methodologies for studying higher-level visual functions.

Funding
This work was supported by the Young Researcher Group Programme of BioTechMed-Graz, Austria; and the Austrian Science Fund (FWF): P33322.

Acknowledgments
I would like to thank Christof Körner and Anja Ischebeck for their comments on the previous versions of the manuscript as well as the anonymous reviewers for their helpful suggestions.

Appendix A. The Peer Review Overview and Supplementary data
The Peer Review Overview and Supplementary data associated with this article can be found in the online version, at https://doi.org/10.1016/j.pneurobio.2021.101998.