Studying the visual brain in its natural rhythm

How the brain fluidly orchestrates visual behavior is a central question in cognitive neuroscience. Researchers studying neural responses in humans and nonhuman primates have mapped out visual response profiles and cognitive modulation in a large number of brain areas, most often using pared-down stimuli and highly controlled behavioral paradigms. The historical emphasis on reductionism has placed most studies at one pole of an inherent trade-off between strictly controlled experimental variables and open designs that monitor the brain during its natural modes of operation. This bias toward simplified experiments has strongly shaped the field of visual neuroscience, with little guarantee that the principles and concepts established within that framework will apply more generally. In recent years, a growing number of studies have begun to relax strict experimental control with the aim of understanding how the brain responds under more naturalistic conditions. In this article, we survey research that has explicitly embraced the complexity and rhythm of natural vision. We focus on those studies most pertinent to understanding high-level visual specializations in the brains of humans and nonhuman primates. We conclude that representationalist concepts borne from conventional visual experiments fall short in their ability to capture the real-life visual operations undertaken by the brain. More naturalistic approaches, though fraught with experimental and analytic challenges, provide fertile ground for neuroscientists seeking new inroads to investigate how the brain supports core aspects of our daily visual experience.

The inevitable question, "But how does the brain respond under natural conditions?" has begun to rouse the collective conscience of visual neuroscientists. This issue prods at a scientific field whose strength has always lain in carefully designed experiments, with elemental stimuli presented during reduced behavioral paradigms. Conventional experiments differ in important ways from real-world vision. In most experiments, stimuli are presented briefly onto a blank display, whereas in real-world vision we are continuously immersed in a complex, panoramic 3D scene. In most experiments, subjects maintain their gaze on a small point at the center of the display to allow for controlled stimulus presentation, whereas in real-world vision gaze shifts and bodily movements continually jostle the retinal input. Placing excessive focus on reductionistic methods runs the risk of missing key organizational features of the visual brain that have evolved to thrive within inherently complex and dynamic environments and in coordination with self-directed actions. These factors have shaped core aspects of the brain's functional organization that are broadly shared among species but are largely untapped by conventional visual neuroscience experiments (Miller et al., 2016). Thus, measuring and interpreting brain activity during more natural conditions is an important complement to traditional experiments, one that promises to enhance our understanding of how the brain approaches the difficult problem of vision. In this brief article, we review experimental paradigms that incorporate real-world visual stimuli, free behavior, and novel analyses to study the visual brain. We focus on electrophysiological and functional imaging experiments in monkeys and humans, though always keeping in mind that many core visual operations are shared broadly across mammals (Danjo et al., 2018; Glickfeld and Olsen, 2017; Omer et al., 2018; Vinken et al., 2016; Zoccolan, 2015).
A central theme of the article is that conceptual frameworks in visual neuroscience are strongly linked to a tradition of strict reductionism enforced by a relatively narrow range of experimental paradigms, and that concepts are destined to evolve as naturalistic modes of testing gain in popularity.

Our representationalist roots
In science, dominant experimental paradigms often spawn dominant conceptual frameworks. Visual neuroscience, informed largely by experiments in primates, has adhered closely to testing methods designed to reduce the inherent complexity of visual behavior. Most commonly, an experiment consists of recording brain activity in a human or nonhuman primate, whose head is immobilized and whose gaze is fixed upon a small point, while stimuli are flashed briefly onto a display (Fig. 1). Over the past decades, this sort of experiment has revealed visual feature selectivity across dozens of brain areas of humans and nonhuman primates (Allman and Kaas, 1971; Desimone et al., 1984; Engel et al., 1997; Epstein and Kanwisher, 1998; Hubel and Wiesel, 1998; Kanwisher et al., 1997; Kolster et al., 2014; Tootell et al., 1996; Van Essen and Zeki, 1978). This immense body of physiological findings, together with analysis of anatomical connectivity, retinotopic organization, neural response latencies, and lesion studies, has given rise to a widely adopted framework for understanding how the brain approaches the problem of vision (Felleman and Van Essen, 1991; Kravitz et al., 2011, 2013).
Superimposed upon this framework is the notion of neural representationalism, where neurons throughout the brain are seen to explicitly represent external visual objects through their selective responses. At its core, the idea of neural representation seldom goes beyond a restatement of the results, namely the responses of specific neurons or voxels to specific stimuli. In practice, however, it provides a linguistic shorthand and basis for analyses that help explore a wide range of cognitive phenomena. For example, the brain's internal representations can be quantified and compared (Kriegeskorte and Kievit, 2013), shaped through learning (Op De Beeck and Baker, 2010), infused with value information (Paton et al., 2006), enhanced through voluntary attention (Desimone and Duncan, 1995; Maunsell, 2015), or utilized in the service of motor actions (Murata et al., 1997). Higher brain centers, seldom specified, are considered to tap these internal representations to support recognition, memory, and subjective perception (Kriegeskorte et al., 2008; Ritchie et al., 2019; Yamins and DiCarlo, 2016). Representations can be manifest in the activity of single neurons, in the synchrony of interconnected neural networks, or in the population responses of a broad collection of fMRI voxels. When not explicitly invoked, the concept of representationalism usually lingers in the background, shaping how researchers think about their data and about the brain's core operations. In this article, we do not question whether representationalism is valid or has been useful, but do question whether this framework will continue to prove useful for scientists as they depart from conventional testing and measure brain activity during more natural modes of behavior.

Fig. 1. Fixed-gaze paradigms used in humans and nonhuman primates. The dominance of these paradigms, where the subject is immobilized, the head restrained, and the gaze maintained on a small point (black circle) during the presentation of visual stimuli (red cross), has strongly shaped theories of visual neuroscience.

D.A. Leopold, S.H. Park / NeuroImage 216 (2020) 116790

The visual brain in response to complex scenes and free viewing
The notion that the brain evolved to contend with real-world problems is hardly new. In fact, there has always been a contingent of visual neuroscientists that has sought to remind the community of the importance of this principle for understanding brain organization (Felsen and Dan, 2005). It is evident, for example, that neurons throughout the visual system are tuned for features of the natural world. Even at the earliest stages of vision, the receptive field properties of neurons appear evolutionarily tailored to match the statistics of natural images (Bell and Sejnowski, 1997; Dan et al., 1996; David et al., 2004; Liu et al., 2016; Movshon and Simoncelli, 2014; Niemeyer and Paradiso, 2017; Olshausen and Field, 1996; Ringach et al., 2002). At later stages, many neurons appear tuned for socially important stimuli such as faces and bodily actions (Nelissen et al., 2011). Some experiments place particular emphasis on the sensorimotor or "active sensing" components of natural vision (Schroeder et al., 2010). While this area is less explored, it is clear from many experiments that self-directed exploratory behaviors strongly shape the brain's responses through cognitive factors such as attention and target selection (Bichot and Desimone, 2006; Chelazzi et al., 1993; Mazer and Gallant, 2003; Moore et al., 1998), perceptual factors such as whether an object is actively noticed in a scene (Sheinberg and Logothetis, 2001), and social factors such as whether the subject is making direct eye contact with another individual (Minxha et al., 2017; Mosher et al., 2014).
Human neuroscientists have begun to chart the brain's responses during modes of vision that we can operationally describe as "natural" using innovative fMRI paradigms and free viewing. It is by now common to employ movies as visual stimuli, in some cases allowing subjects to freely direct their gaze as hemodynamic or electrophysiological responses are measured (Bartels and Zeki, 2004; Chen et al., 2017; Cohen et al., 2017; Conroy et al., 2013; Hanson et al., 2007; Hasson et al., 2004; Huth et al., 2016b; Wagner et al., 2016). One early and pioneering study used the temporal covariation of hemodynamic responses between voxels across the brain during video viewing to identify large-scale patterns of brain activity, or networks (Bartels and Zeki, 2004, 2005). Another series of studies analyzed activity based on a comparison across individuals, mapping the similarity of voxel time courses across corresponding brain regions of different subjects watching the same movies (Chen et al., 2015; Hasson et al., 2004, 2008, 2010, 2012). Prior to this work, few neuroscientists had even attempted to make sense of brain activity recorded during natural modes of visual behavior, such as the free viewing of visual movies.
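The core of the inter-subject correlation approach can be sketched in a few lines: each subject's voxel time course is correlated with the average time course of the remaining subjects who watched the same movie. This is a minimal illustration of the general logic, not any specific published pipeline; the array layout and the leave-one-out averaging are assumptions for the sketch.

```python
import numpy as np

def intersubject_correlation(data):
    """Leave-one-out inter-subject correlation.

    data : array of shape (n_subjects, n_voxels, n_timepoints), time
           courses from corresponding voxels/regions of subjects who
           watched the same movie.
    Returns an (n_subjects, n_voxels) array: each entry is the Pearson
    correlation between one subject's voxel time course and the average
    time course of all other subjects.
    """
    n_subj, n_vox, n_t = data.shape
    # z-score each time course so Pearson r reduces to a dot product
    z = (data - data.mean(axis=2, keepdims=True)) / data.std(axis=2, keepdims=True)
    isc = np.empty((n_subj, n_vox))
    for s in range(n_subj):
        others = np.delete(z, s, axis=0).mean(axis=0)   # leave-one-out average
        # re-normalize the group-average time course before correlating
        others = (others - others.mean(axis=1, keepdims=True)) / others.std(axis=1, keepdims=True)
        isc[s] = (z[s] * others).sum(axis=1) / n_t
    return isc
```

High values flag regions whose activity is driven by the shared movie rather than by idiosyncratic, subject-specific fluctuations.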
Studies involving movies and free viewing have spawned new analytic tools for summarizing brain activity and led to new concepts such as chronoarchitecture (Bartels and Zeki, 2004), a hierarchy of temporal receptive windows (Hasson et al., 2008), and continuous cortical spaces for analyzing structural or semantic content (Guntupalli et al., 2016; Huth et al., 2012). After meeting initially mixed reactions, these approaches have gained popularity and given rise to methods used in adjacent fields of neuroscience, such as the study of human speech and language (Honey et al., 2012; Huth et al., 2016a; Regev et al., 2013) and the assessment of functional connectivity (Vanderwal et al., 2017; Wang et al., 2017). Recently, the inter-subject correlation method was extended to study functional homology across species, and in particular between brain regions of monkeys and humans (Mantini et al., 2012a, 2012b, 2013). In general, the analyses borne from these naturalistic experimental paradigms deemphasize the representation of individual features and instead focus on temporal coordination across brain areas.

Another set of questions that arises during the free viewing of movies relates to the brain's distinction between real-world events and self-generated actions, both of which continually stimulate the retina during the course of natural vision. A recent macaque fMRI experiment compared the expression of these fundamentally different types of visual input across the brain (Russ et al., 2016; Russ and Leopold, 2015). Briefly, the researchers dissected the hemodynamic responses to the free viewing of movies into two separate components, one related to image movement in the movies and one related to retinal motion sweeps caused by the monkey's self-directed eye gaze changes. The two forms of retinal stimulation elicited markedly different patterns of activity across the brain.
Eye movement motion strongly modulated the early retinotopic areas such as V1, V2, and V3 but exerted relatively little influence upon regions of the superior temporal sulcus (STS), including area MT and its satellite regions as well as face- and body-selective patches. By contrast, stimulus motion strongly influenced the STS areas but had a proportionally smaller effect on the early retinotopic areas. This pronounced difference points to a potentially fundamental aspect of brain organization that warrants further investigation.
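The logic of separating these two sources of retinal stimulation can be illustrated as a simple regression of each voxel's time course onto two motion regressors, one for image motion in the movie and one for gaze-induced retinal sweeps. This is a deliberately stripped-down sketch, not the published analysis; the regressor construction and variable names are assumptions.

```python
import numpy as np

def fit_motion_regressors(voxel_ts, stim_motion, saccade_motion):
    """Least-squares fit of one voxel's time course to two motion regressors.

    voxel_ts       : (n_timepoints,) hemodynamic time course of one voxel
    stim_motion    : (n_timepoints,) regressor for image motion in the movie
    saccade_motion : (n_timepoints,) regressor for retinal sweeps caused by
                     the subject's own gaze shifts
    Returns the two beta weights (an intercept is fit but discarded),
    quantifying how strongly each motion source drives the voxel.
    """
    X = np.column_stack([stim_motion, saccade_motion, np.ones_like(voxel_ts)])
    betas, *_ = np.linalg.lstsq(X, voxel_ts, rcond=None)
    return betas[0], betas[1]
```

Comparing the fitted weights across voxels would reveal, in this toy setting, which regions are dominated by stimulus motion versus self-generated retinal motion.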

New insights about the composition of face patches
Given our knowledge of brain organization, a natural question is how category-selective neurons respond to the complexity of natural vision. An obvious starting point is the abundant face-selective cortical regions of the primate brain (Hung et al., 2015; Kanwisher et al., 1997; Tsao et al., 2003), which are known to be replete with face-selective neurons (Aparicio et al., 2016; Tsao et al., 2006; Ungerleider and Bell, 2011). Their activity might serve as an important point of reference for predicting the utility of feature representation concepts in natural paradigms. One study addressed this question by comparing responses from a population of densely packed face-selective neurons (within <1 mm³) in the macaque anterior fundus (AF) face patch during conventional versus naturalistic testing conditions (McMahon et al., 2015). In both conditions, neurons showed consistent responses to multiple presentations of the same stimulus, despite the unconstrained eye movements during the movie viewing (Fig. 2). However, in contrast to their shared category selectivity in response to flashed stimuli, neighboring neurons diverged markedly in their responses to movies and were often statistically uncorrelated in their time courses. One explanation for this difference is that many of the nominal "face cells" responded to dynamic and spatial dimensions of the movie that were untested during the flashed image presentation. In some ways, this decorrelation might be expected if stimuli are endowed with a richer set of stimulus features, which is the case for dynamic social videos. Nonetheless, it brings to light the potential changes in perspective that are apt to arise when strict feature- and image-based conceptions of brain organization are pushed to explain responses during more naturalistic paradigms.
The same researchers then turned to the whole-brain coverage afforded by fMRI in an attempt to understand the diversity of face-cell responses to natural movies (Park et al., 2017). They exploited the shared timeline imposed by an external 15-min movie stimulus, whose temporal structure was used to entrain both neural and fMRI visual activity. This common timeline allowed for a direct comparison of the time courses of fMRI responses with electrophysiological responses, despite the data being collected separately in different animals. They combined the imaging and neural data with a novel method, using the broad array of fMRI voxel time courses obtained throughout the brain as a transformation, or "read-out", for single-unit activity. Correlating the response time course of each neuron with this array resulted in a population of neuron-specific activity maps, each reflecting the extent to which a given neuron's movie-driven visual responses were shared with other brain areas. Analyzing these maps provided a new perspective on the functional diversity of responses among AF face patch neurons by identifying functional subpopulations within the small cortical volume (Fig. 3A). It also revealed substantial differences in how the visual activity of the different neural subpopulations is coordinated with the responses of face patches, motion-selective areas, early visual cortex, and particular subcortical structures. Applying the same approach to face-selective neurons in other face patches and in the pulvinar revealed similar levels of local diversity, with some overlap in the functional subpopulations observed at the different sites (Park et al., 2018).
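The correlational core of this mapping method can be sketched as follows: a neuron's spike-rate time course is convolved with a hemodynamic response function and then correlated with every voxel's movie-driven time course, yielding one whole-brain map per neuron. This is a simplified illustration of the logic rather than the published procedure; the HRF, array shapes, and normalization are assumptions.

```python
import numpy as np

def neuron_voxel_map(spike_ts, voxel_ts, hrf):
    """Correlate one neuron's movie-driven activity with every voxel.

    spike_ts : (n_timepoints,) spike-rate time course during the movie
    voxel_ts : (n_voxels, n_timepoints) fMRI time courses from the same
               movie timeline (recorded separately, aligned by the movie)
    hrf      : hemodynamic response function used to convolve the spikes
               so both signals live on a comparable timescale
    Returns an (n_voxels,) correlation map for this neuron.
    """
    pred = np.convolve(spike_ts, hrf)[: len(spike_ts)]   # HRF-convolved spiking
    pred = (pred - pred.mean()) / pred.std()
    z = (voxel_ts - voxel_ts.mean(axis=1, keepdims=True)) / voxel_ts.std(axis=1, keepdims=True)
    return z @ pred / len(pred)
```

Neighboring neurons with similar flashed-image selectivity can then be distinguished by the very different whole-brain maps their movie responses produce.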
Thus, the concept of "face cells", neurons whose straightforward devotion to faces is among the most reassuring of phenomena in visual neuroscience, appears on unsteady ground when confronted with experiments that diverge from the conventional presentation of flashed images. Similarly called into question is the notion of strict functional segregation, since the rich content of natural videos unveils multiple, intermingled neural subpopulations occupying the same voxel but allied with very different brain networks.

Active elements of social engagement
For primates, vision is a critical mediator of most social interactions, and this is frequently taken as the reason for the abundance of areas selectively responsive to faces (Tsao and Livingstone, 2008) and bodily actions (Nelissen et al., 2011; Rizzolatti and Craighero, 2004). The broadening of testing paradigms has allowed researchers to target aspects of higher-level vision that may have evolved specifically to serve primate social cognition, such as the neuronal responses in the macaque amygdala that signal occurrences of direct eye contact (Mosher et al., 2014). An abundance of human fMRI work has investigated cortical regions ostensibly adapted for aspects of yet higher-order social cognition, such as complex social judgements or the attribution of mental states to others (Richardson and Saxe, 2020; Saxe and Wexler, 2005). Recent fMRI experiments have revealed cortical regions in the macaque sensitive to social variables such as personal familiarity and interactive communication (Landi and Freiwald, 2017; Shepherd and Freiwald, 2018). Going a step further, paired-subject paradigms have begun to allow researchers to probe directly some of the more interactive elements of social cognition. For example, staged interactions in pairs of humans (Babiloni and Astolfi, 2014; Liu et al., 2018; Redcay et al., 2010) and macaques (Chang et al., 2015; Grabenhorst et al., 2019; Haroush and Williams, 2015; Yoshida et al., 2011) are tapping into how the brain draws upon visual information to solve specific problems related to social communication, decision-making, and joint action. These interactive approaches are still in their infancy and, while yet unproven in their capacity to shape our understanding of the brain, are beginning to break important ground for studying core aspects of primate visual behavior.

Fig. 2. A. Each row in the heat map corresponds to an individual neuron, and each column is the average spiking response to a particular image. The image categories are shown at the bottom, revealing a strong bias for responding to faces, as is typical among neurons in macaque face patches. On the right are the mean category responses of eight example neurons. B. Movie responses are shown for the same eight example neurons. In each panel, the action potentials for multiple presentations of the movie are shown in the faint raster plots, revealing a strong repeatability across trials. The spike density function is shown in the blue lines superimposed on the rasters. A striking feature of the movie response time courses is that they are largely uncorrelated among these nearby neurons, despite their similar categorical responses shown in A.

Fig. 3. Data-driven approaches to understand the functional specialization of single neurons in high-level visual cortex. A. Single-unit mapping of individual time courses of neurons onto voxels across the brain (Park et al., 2017). This method uses fMRI responses to video content as a way to interpret the response time courses of single neurons. Within macaque face patches, neighboring single neurons show highly divergent correlational maps with the rest of the brain, indicating a heterogeneity of long-range inputs and visual operations in the local neural population. B. Single-unit-based design of visual images using deep learning methods (Ponce et al., 2019). Optimized stimuli are determined through multiple iterations of neural recording and stimulus modification. Examples of final optimized patterns are shown (dashed lines are for illustration purposes and do not indicate recording sites in the temporal cortex).

Where natural and computational approaches meet
As new experiments reveal uncomfortable levels of complexity in the brain's activity during natural modes of behavior, researchers rely increasingly on computational methods to summarize their data and organize their thinking. One area of progress has been in the capacity to describe, extract, and predict complex patterns of neural responses, including under conditions that deviate from traditional testing. In fact, researchers studying the brain's responses to complex videos are increasingly coming to rely on computational modelling to infer the brain's encoding principles (Han et al., 2019; Mandelkow et al., 2016, 2017; Wen et al., 2018), functional organization (Cohen et al., 2017; Huth et al., 2012), and functional similarity across different subjects (Conroy et al., 2013). The capacity to capture, extrapolate, and predict neural responses to complex videos is one measure of understanding brain function, which then needs to be reconciled with many other known aspects of its organization and physiology.
Another exciting area in which computational methods have made an impact relates to the synthesis of stimuli. For example, advanced computer graphics methods have greatly enhanced the ability to create and parameterize photorealistic stimuli, most commonly faces (Murphy and Leopold, 2019; Paukner et al., 2014; Steckenfinger and Ghazanfar, 2009), bodily actions (Dayan et al., 2016; Giroux et al., 2019), and virtual reality scenes (Doucet et al., 2016; Mueller et al., 2012). As software toolboxes and hardware processing power continue to advance, the field of visual neuroscience will move increasingly toward the creation of synthetic virtual worlds that offer both parametric control and scene complexity for probing the brain's visual responses.
Stimulus synthesis is also taking center stage in a rather different context, where the brain activity-based synthesis of stimuli aims to discover the natural response preferences of visual neurons. In such experiments, neural responses are fed in real-time into a stimulus generation computer, which iteratively produces more effective generations of stimuli according to the response preferences of a neuron or group of neurons. In these so-called genetic algorithms, each iteration analyzes the neural responses to the previous generation to spawn the next stimuli. This approach has been used successfully to test the tuning of high-level visual neurons for volumetric structure (Hung et al., 2012) and visual scene geometry (Vaziri and Connor, 2016). Most recently, this approach was used to generate complex images, in this case using deep learning algorithms, to optimize the responses of neurons in face-selective regions of the macaque temporal cortex (Ponce et al., 2019). The iterative generation of stimuli converged on optimized images that, despite eliciting the strongest spiking responses from neurons, were not recognizable objects. However, the mosaics of colored contours and surfaces that characterized the optimized images (Fig. 3B) did often vaguely resemble animals and other natural stimuli. While this approach still bears hallmarks of representationalism, it is mentioned here because its data-driven and generative nature opens fundamentally new directions for thinking about high-level visual specializations in the brain.
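The closed loop at the heart of these adaptive experiments can be caricatured in a few lines, with a synthetic "neuron" standing in for recorded spiking activity. Everything here, including the toy response function, population size, and mutation scheme, is an illustrative assumption rather than the method of any cited study.

```python
import numpy as np

rng = np.random.default_rng(0)

def neuron_response(stimulus):
    """Stand-in for a recorded neuron: fires most for stimuli close to a
    hidden preferred pattern (a toy assumption; a real experiment would
    read this value out from spiking activity)."""
    template = np.linspace(-1.0, 1.0, stimulus.size)
    return -np.sum((stimulus - template) ** 2)

def optimize_stimulus(n_dims=8, pop_size=30, n_generations=100, noise=0.2):
    """Closed-loop genetic search: score a population of stimuli by the
    neuron's response, keep the top quartile, and mutate survivors to
    seed the next generation."""
    pop = rng.standard_normal((pop_size, n_dims))
    for _ in range(n_generations):
        scores = np.array([neuron_response(s) for s in pop])
        elite = pop[np.argsort(scores)[-(pop_size // 4):]]   # best stimuli survive
        parents = elite[rng.integers(len(elite), size=pop_size - len(elite))]
        offspring = parents + noise * rng.standard_normal(parents.shape)
        pop = np.vstack([elite, offspring])
    scores = np.array([neuron_response(s) for s in pop])
    return pop[np.argmax(scores)]
```

Because the loop never assumes what the neuron prefers, the final stimulus is discovered rather than hypothesized, which is the property that makes the approach attractive for high-level visual areas.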
The melding of natural vision and sophisticated computational methods has already begun and will likely accelerate in the coming years. This combination offers a powerful platform to probe the physiology and organization of the visual brain. It may also begin to shape the way we think about brain organization, function, and stimulus representation.

On the horizon: stepping out into the real world
While some investigators have cautiously incorporated naturalistic elements into their research designs, others are preparing to jump head-first into the complexity of everyday visual behavior by recording neural activity in freely moving and interacting subjects. Until recently, technological barriers made recording brain activity during free movement extremely difficult in the primate. However, an explosion of wireless and datalogging multichannel amplifiers (Luan et al., 2018), together with new video-based positional and behavioral tracking (Nath et al., 2019), has prompted laboratories around the world to begin tracking electrophysiological brain signals in freely moving, and sometimes freely interacting, primates. A handful of studies have demonstrated the feasibility and scientific promise of this mode of experimentation in macaques (Baraduc et al., 2019; Lei et al., 2004; Rajangam et al., 2016; Rolls, 1999; Tamura et al., 1992), marmosets (Courellis et al., 2019; Kondo et al., 2018), other mammals (Geva-Sagiv et al., 2015; Yartsev and Ulanovsky, 2013), and even humans (Hayhoe and Matthis, 2018; Podvalny et al., 2017). For primate neuroscience, the marmoset may be a particularly good model for studying interactive behaviors, as this species is notable for its rich social repertoire and its ease in handling and breeding (Courellis et al., 2019; Miller et al., 2016; Mitchell and Leopold, 2015).
Making sense of brain activity during free behaviors will benefit from the development of new testing paradigms that balance natural and spontaneous actions with some level of experimental constraint. As a model, one need only consider the discovery of place cells and grid cells in the rat (Moser et al., 2008; O'Keefe, 1976). These landmark physiological properties of the medial temporal lobe initially came to light only because scientists allowed their subjects to move freely, but they were then studied systematically across a large number of controlled experimental environments. In fact, recent experiments have broadened the scope of hippocampal function to include explicitly social stimuli, since "social place cells" were demonstrated in the hippocampus of bats (Omer et al., 2018) and rats (Danjo et al., 2018) to encode the location of observed individuals. Given the large social groups of primates, and the predominance of vision for their social monitoring, one might predict that the brains of monkeys and humans are particularly adapted to track the spatial layout of social interactions, a theme recently highlighted as an important area for the future (Miller et al., 2016). We end this article by briefly discussing a few other ways in which the adoption of unconstrained behavioral paradigms may gradually impact how neuroscientists come to think about vision.

Real-world scene geometry
Visual neuroscientists are accustomed to thinking about the size and position of a stimulus in degrees of visual angle, its movement in degrees per second, and its spatial frequency in cycles per degree. These conventions are inherited from studies of the early stages of visual processing, where physiological responses are closely linked to the retinal geometry and thus well described in angular units. At one level of description, a two-dimensional retinal image is the projection of any of an infinite number of combinations of object size and distance (Fig. 4A). However, as more experiments consist of subjects moving through real 3D space, it may become obvious that variation in neural responses with image size is best interpreted with respect to the real-world geometry of a scene rather than to the retinal subtense of a stimulus. To take one example, as the brain computes the layout of a scene, the ventral visual pathway may well employ the known absolute sizes of objects, for example the absolute sizes of conspecifics and their body parts, to calibrate spatial scale and distance. Thus, one prediction is that information about absolute size, and possibly absolute distance, is embedded among object representations in the brain. Indeed, some human fMRI studies have begun to emphasize the importance of absolute size and have found a spatial organization of this variable in the brain's coding of objects (Konkle and Oliva, 2011, 2012).
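The ambiguity of angular units, and the proposed calibrating role of known absolute size, reduce to elementary trigonometry. The sketch below is a minimal illustration; the function names and units are ours.

```python
import math

def visual_angle_deg(size_m, distance_m):
    """Visual angle subtended by an object of physical size `size_m`
    (meters) viewed at `distance_m` (meters)."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

def distance_from_known_size(size_m, angle_deg):
    """Invert the projection: if the object's absolute size is known
    (e.g., a conspecific's body), its retinal angle fixes its distance."""
    return size_m / (2 * math.tan(math.radians(angle_deg) / 2))
```

For example, a 0.2 m face at 2 m and a 2 m body at 20 m subtend exactly the same visual angle, yet knowing either object's true size immediately disambiguates the scene geometry.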

Panoramic immersion
Our usual visual experience consists of an immersion within a 3D scene, with a sizeable span of our surroundings visible at any point in time. This fundamental aspect of vision is usually excluded in object vision paradigms, where images are typically presented at or near the center of vision. While the central 10 degrees of vision is clearly important for primates, it corresponds to less than 0.5% of the visual field, and surprisingly few studies have explored responses to objects outside this range. Much of our knowledge of responses in eccentric cortical areas comes from a handful of physiological studies aiming to map and otherwise characterize the brain's representation of peripheral retinotopic space (Chaplin et al., 2013; Galletti et al., 1999; Mikellidou et al., 2017). Fascinating questions arise as visual scientists consider how the visual brain integrates information across the whole visual field. To offer one example, a recent study in unconstrained human patients linked neural activity observed in the visual cortex during natural orienting behavior to the vexing problem of perceptual stability (Podvalny et al., 2017), providing a new perspective on the long-standing puzzle of how the brain contends with the constant stream of retinal shifts arising from saccadic eye movements (Wurtz et al., 2011). The study (Podvalny et al., 2017) found that during large gaze shifts in the context of a social interaction, responses in the early retinotopic visual cortex were strongly affected by the specific saccade parameters, but those in the face-selective cortex were not, a finding that bears similarity to the monkey free-viewing fMRI study described above (Russ et al., 2016). The authors discussed this difference in the influence of saccades as a possible indicator of perceptual stability in the face-selective cortical areas.
Over time, experiments in whole-field visual environments will prompt primate neuroscientists to place more focus on peripheral vision, vision-based orienting actions, allocentric spatial geometry, open vs. closed scenes, navigation, perceptual stability, and the layout of social space.

Parallax through self-motion
Another visual operation likely to receive increased attention in freely moving animals is the computation of object structure and scene layout through the parallax afforded by self-directed motion (Fig. 4B). This most fundamental and understudied depth cue utilizes a progression of sequential views derived from an animal's passage through space. It is evolutionarily ancient and widely utilized among sighted animals. Even primates, whose high-resolution, stereoscopic vision offers a panoply of static depth cues, are apt to rely on motion parallax to guide them as they move quickly through cluttered arboreal environments. In the visual cortex, motion-derived parallax depth is processed by circuits similar to those that process binocular-derived depth (Bradley et al., 1998; DeAngelis and Newsome, 1999; Orban et al., 2003; Sereno et al., 2002), indicating the ancestral nature of motion-derived depth mechanisms. Ironically, the only way to completely eliminate the contribution of this most fundamental cue is to rigidly constrain the subject, which is the case for nearly all vision experiments conducted to date.
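In its simplest geometry, the depth information carried by motion parallax can be written down directly: for an observer translating sideways, a stationary point directly abeam drifts across the retina at an angular speed inversely proportional to its distance. The sketch below is an idealized illustration; recovering depth from real optic flow requires first separating the rotational and translational flow components.

```python
def depth_from_parallax(observer_speed_mps, angular_speed_rps):
    """Depth of a stationary point directly abeam of an observer
    translating laterally at `observer_speed_mps` (m/s), given the
    point's retinal drift `angular_speed_rps` (rad/s): d = v / omega.
    Near points sweep across the retina quickly; far points slowly."""
    return observer_speed_mps / angular_speed_rps
```

Walking at 1 m/s, a point drifting at 0.5 rad/s lies 2 m away, while one drifting at 2 rad/s lies only 0.5 m away, which is exactly the ordering of near and far surfaces an animal experiences while moving through clutter.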

Temporal continuity
Finally, free-behaving experiments in primates may prompt visual neuroscientists to change their perspective on the natural timeline for understanding neural responses. In conventional experiments, time is often treated as a convenient dimension for designing tasks and sequentially presenting stimuli. In most visual fMRI experiments, the time domain is manipulated to impose temporal structure on the stimulus design, which is then used as a signature of voxels responsive to a particular stimulus. In conventional neurophysiology experiments, t = 0 is taken as the moment a stimulus is presented onto a monitor, with the subsequent rollout of neural activity into early and late activity interpreted as a temporal code (Richmond et al., 1987; Sugase et al., 1999). Most researchers have put aside the difficult problem of understanding the temporal dynamics of natural visual behavior and the corresponding neural responses, which are continuous and expressed over multiple time scales. However, as paradigms become unmoored from a strict representationalist framework and researchers develop ways to analyze the temporal dynamics of neurons or voxels, such as those in response to a movie, this perspective on the brain's utilization of the time domain may change. In animals engaged in unconstrained and interactive behaviors, it will be of great interest to track activity in areas involved in social interaction, visual analysis of scene layouts, emotional responses, and visuomotor actions. Recent work in the rat has emphasized that time itself is represented explicitly in some brain areas, including over time scales of several seconds and longer (Howard and Eichenbaum, 2015). The temporal dynamics of many natural cognitive behaviors are unexplored in primates, and discoveries in these areas are likely to generate new ideas about how the brain contends with the natural rhythm of life in the visual world.

Fig. 4. Real-world geometrical considerations that come to the fore when visual experiments involve unrestrained animals. A. Constraints of real object size on the brain's interpretation of visual images. Three example combinations of object size and distance that generate the same two-dimensional retinal image are depicted. The practice of describing the size of visual objects in degrees of visual angle, which corresponds to the retinal geometry, is inherently ambiguous and may not match the brain's encoding of objects. High-level visual specializations may instead be more directly related to objects' physical sizes and positions within the scene, including the distance to the observer. B. Deducing 3D volumetric structure and spatial relations through self-motion. The example shows the experience of an observer across time induced by its self-motion through a three-dimensional scene. The important role of self-induced movement in understanding the three-dimensional structure of a scene is generally omitted from visual experiments. However, it may be the most fundamental cue by which animals understand both near and far three-dimensional structure, and it may be deeply embedded within cortical circuits concerned with visual perception and action.

Conclusions
Studying the visual brain in its natural rhythm and real-world spatial layout represents a critical next step for a field long adherent to a consensus framework closely linked to the concept of stimulus representation. Scientific progress follows a course that is sometimes systematic and other times opportunistic, or even capricious. At any moment, researchers face decisions about whether to press forward with a proven construct or paradigm, or to instead veer into new experimental territory whose contours are not yet defined. Major discoveries ride unpredictably atop the slow churn of evolving paradigms, new technologies, and changing ideas. Naturalistic paradigms offer exciting new inroads to the study of vision and higher brain function. Those who embrace them take on some risk but place themselves in a position to appreciate fundamentally new aspects of how the brain helps us see, think, and act.