On the potential role of lateral connectivity in retinal anticipation

We analyse the potential effects of lateral connectivity (amacrine cells and gap junctions) on motion anticipation in the retina. Our main result is that lateral connectivity can—under conditions analysed in the paper—trigger a wave of activity enhancing the anticipation mechanism provided by local gain control (Berry et al. in Nature 398(6725):334–338, 1999; Chen et al. in J. Neurosci. 33(1):120–132, 2013). We illustrate these predictions by two examples studied in the experimental literature: differential motion sensitive cells (Baccus and Meister in Neuron 36(5):909–919, 2002) and direction sensitive cells where direction sensitivity is inherited from asymmetry in gap junctions connectivity (Trenholm et al. in Nat. Neurosci. 16:154–156, 2013). We finally present reconstructions of retinal responses to 2D visual inputs to assess the ability of our model to anticipate motion in the case of three different 2D stimuli.


Introduction
Our visual system has to constantly handle moving objects. Static images do not exist for it, as the environment, our body, our head, our eyes are constantly moving. A "computational", contemporary view, likens the retina to an "encoder", converting the light photons coming from a visual scene into spike trains sent-via the axons of ganglion cells (GCells) that constitute the optic nerve-to the thalamus, and then to the visual cortex acting as a "decoder". In this view, comparing the size and the number of neurons in the retina, i.e. about 1 million GCells (humans), to the size, structure and number of neurons in the visual cortex (around 538 million per hemisphere in the human visual cortex [22]), the "encoder" has to be quite smart to efficiently compress the visual information coming from a world made of moving objects. Although it has long been thought that the retina was not more than a simple camera, there is more and more evidence that this organ is "smarter than neuroscientists believed" [41]. It is indeed able to perform complex tasks and general motion feature extractions, such as approaching motion, differential motion and motion anticipation, allowing the visual cortex to process visual stimuli with more efficiency.
Computations occurring in neuronal networks downstream photoreceptors are crucial to make sense out of the "motion-agnostic" signal delivered by these retinal sensors [11]. There exists a wide range of theoretical and biological approaches to studying retinal processing of motion. Moving stimuli are generally considered as a spatiotemporal pattern of light intensity projected on the retina, from which it extracts relevant information, such as the direction of image motion. Detecting motion requires then neural networks to be able to process, in a nonlinear fashion, moving stimuli, asymmetrically in time. [6,10] [37,88] The first motion detector model was proposed by Hassenstein and Reichardt [42], based on the optomotor behaviour of insects. The model relies on changes in contrast of two spatially distinct locations inside the receptive field of a motion sensitive neuron. The neuron will only produce a response if contrast changes are temporally delayed, and is thus not only selective to direction, but also to motion velocity. Several models have then been developed based on Reichardt detectors, attempting to explain motion processing in vertebrate species as well, but they all share the common feature of integrating spatiotemporal variations of the contrast. To perceive motion as coherent and uninterrupted, an additional integration over motion detectors is hypothesised to take place. This integration usually takes the form of a pooling mechanism over visual space and time.
However, before performing these high-level motion detection computations, the photons received by the retina need first to be converted into electrical signals that will be transmitted to the visual cortex, a process known as phototransduction. This process takes about 30-100 milliseconds. Though this might look fast, it is actually too slow. A tennis ball moving at 30 m/s-108 km/h (the maximum measured speed is about 250 km/h) covers between 0.9 and 3 m during this time. So, without a mechanism compensating this delay, it would not be possible to play tennis (not to speak of survival, a necessary condition for a species to reach the level where playing tennis becomes possible). The visual system is indeed able to extrapolate the trajectory of a moving object to perceive it at its actual location. This corresponds to anticipation mechanisms taking place in the visual cortex and in the retina with different modalities [4,56,57,84].
In the early visual cortex an object moving across the visual field triggers a wave of activity ahead of motion thanks to the cortical lateral connectivity [8,46,77]. Jancke et al. [46] first demonstrated the existence of anticipatory mechanisms in the cat primary visual cortex. They recorded cells in the central visual field of area 17 (corresponding to the primary visual cortex) of anaesthetised cats responding to small squares of light either flashed or moving in different directions and with different speeds. When presented with the moving stimulus, cells show a reduction of neural latencies, as compared to the flashed stimulus. Subramaniyan et al. [77] reported the existence of similar anticipatory effects in the macaque primary visual cortex, showing that a moving bar is processed faster than a flashed bar. They give two possible explanations to this phenomenon: either a shift in the cells receptive fields induced by motion, or a faster propagation of motion signals as compared to the flash signal.
In the retina, anticipation takes a different form. One observes a peak in the firing rate response of GCells to a moving object occurring before the peak response to the same object when flashed. This effect can be explained by purely local mechanisms at individual cells level [9,20]. To our best knowledge, collective effects similar to the cortical ones, that is, a rise in the cell's activity before the object enters in its receptive field due to a wave of activity ahead of the moving object, have not been reported yet.
In a classical Hubel-Wiezel-Barlow [5,44,60] view of vision, each retinal ganglion cell carries a flow of information with an efficient coding strategy maximising the available channel capacity by minimising the redundancy between GCells. From this point of view, the most efficient coding is provided when GCells are independent encoders (parallel Figure 1 Synthetic view of the retina model. A stimulus is perceived by the retina, triggering different pathways. Pathway I (blue) corresponds to a feed-forward response where, from top to bottom: The stimulus is first convolved with a spatio-temporal receptive field that mimics the outer plexiform layer (OPL) ("Bipolar receptive field response"). This response is rectified by low voltage threshold (blue squares). Bipolar cell responses are then pooled (blue circles with blue arrows) and input ganglion cells. The firing rate response of a ganglion cell is a sigmoidal function of the voltage (blue square). Gain control can be applied at the bipolar and ganglion cell level (pink circles) triggering anticipation. This corresponds to the label II (pink) in the figure. Lateral connectivity is featured by pathway III (brown) through ACells and pathway IV (green) through gap-junctions at the level of GCells streaming identified by a "I" in Fig. 1). In this setting one can propose a simple and satisfactory mechanism explaining anticipation in the retina based on gain control at the level of bipolar cells (BCells) and GCells (label "II" in Fig. 1) [9,20].
Yet, some GCells are connected. Either directly, by electric synapses-gap junctions (pathway IV in Fig. 1), or indirectly via specific amacrine cells (ACells, pathway III in Fig. 1). It is known that these pathways are involved in motion processing by the retina. AII ACells play a fundamental role in the interaction between the ON and OFF cone pathway [54]. There are GCells able to detect the differential motion of an object onto a moving background [1] thanks to ACell lateral connectivity. Some GCells are direction sensitive because they are connected via a specific asymmetric gap junctions connectivity [80]. Could lateral connectivity play a role in motion anticipation, inducing a wave of activity ahead of the motion similar to the cortical anticipation mechanism? While some studies hypothesise that local gain control mechanisms can be explained by the prevalence of inhibition in the retinal connectome [47], the mechanistic aspects of the role of lateral connectivity on motion anticipation has not, to the best of our knowledge, been addressed yet on either experimental or computational grounds.
In this paper, we address this question from a modeler, computational neuroscientist, point of view. We propose here a simplified description of pathways I, II, III, IV of Fig. 1, grounded on biology, but not sticking at it, to numerically study the potential effects of gain control combined with lateral connectivity-gap junctions or ACells-on motion anticipation. The goal here is not to be biologically realistic but, instead, to propose from biological observations potential mechanisms enhancing the retina's capacity to anticipate motion and compensate the delay introduced by photo-transduction and feed-forward processing in the cortical response. We want the mechanisms to be as generic as possible, so that the detailed biological implementation is not essential. This has the advantage of making the model more prone to mathematical analysis.
The first contribution of our work lies in the development of a model of retinal anticipation where GCells have gain control, orientation selectivity and are laterally connected. It is based on a model introduced by Chen et al. in [20] (itself based on [9]) reproducing several motion processing features: anticipation, alert response to motion onset and motion reversal. The original model handles one-dimensional motions and its cells are not laterally connected (only pathways I and II were considered). The extension proposed here features cells with oriented receptive field, although our numerical simulations do not consider this case (see discussion). Lateral connectivity is based on biophysical modelling and existing literature [1,28,43,78,80]. In this framework, we study different types of motion. We start with a bar moving with constant speed and study the effect of contrast, bar size and speed on anticipation, generalising previous studies by Berry et al. [9] and Chen et al. [20]. We then extend the analysis to two-dimensional motions, investigating, e.g. angular motion and curved trajectories. Far from making an exhaustive study of anticipation in complex stimuli, the goal here is to calibrate anticipation without lateral connectivity so as to compare the effect when connectivity is switched on.
The second contribution emphasises a potential role of lateral connectivity (gap junctions and ACells) on anticipation. For this, we first make a general mathematical analysis concluding that lateral connectivity can induce a wave triggered by the stimulus which under specific conditions can improve anticipation. The effect depends on the connectivity graph and is nonlinearly tuned by gain control. In the case of gap junctions, the wave propagation depends on whether connectivity is symmetric (the standard case) or asymmetric, as proposed by Trenholm et al. in [80] for a specific type of direction sensitive GCells. In the case of ACells, the connectivity graph is involved in the spectrum of a propagation operator controlling the time evolution of the network response to a moving stimulus. We instantiate this general analysis by studying differential motion sensitive cells [1] with two types of connectivity: nearest neighbours, and a random connectivity, inspired from biology [78], where only numerical results are shown. In general, the anticipation effect depends on the connectivity graph structure and the intensity of coupling between cells as well as on the respective characteristic times of response of cells in a way that we analyse mathematically and illustrate numerically.
We actually observe two forms of anticipation. The first one, discussed in the beginning of this introduction and already observed in [9,20], is a shift in the peak of a retinal GCell response occurring before the object reaches the centre of its receptive field. In our case, lateral connectivity can enhance the shift improving the mere effect of gain control. The second anticipation effect we observe is a raise in GCells activity before the bar reaches the receptive field of the cell, similarly to what is observed in the cortex [8]. To the best of our knowledge, this effect has not been studied in the retina and constitutes therefore a prediction of our model. The paper is organised as follows. Section 2 introduces the model of retinal organisation and cell types dynamics, ending up with a system of nonlinear differential equations driven by a time-dependent stimulus. Section 3 is divided into four parts. The first part analyses mathematically the potential anticipation effects in a general setting before considering the role of ACells and lateral inhibition on anticipation (Sect. 3.2) and gap junctions (Sect. 3.3). Both sections contain general mathematical results as well as numerical simulations for one-dimensional motion. The fourth part investigates examples of twodimensional motions. The last section is devoted to discussion and conclusion. In Appendix A, we have added the values of parameters used in simulations and in Appendix B the receptive fields mathematical form used in the paper as well as the numerical method to compute efficiently the response of oriented two-dimensional receptive fields to spatiotemporal stimuli. Appendix C presents a model of random connectivity from amacrine to bipolar cells inspired from biological data [78]. Finally, Appendix D contains mathematical results which constitute the skeleton of the work, but whose proof would be too long to integrate in the core of the paper. This work is based on Selma Souihel's PhD thesis where more extensive results can be found [74]. In particular, there is an analysis of the conjugated effects of retinal and cortical anticipation, subject of a forthcoming paper and briefly discussed in the conclusion.
In all the following simulations, we use the CImg Library, an open-source C++ tool kit for image processing in order to load the stimuli and reconstruct the retina activity. The source code is available on demand.

Retinal organisation
In the retinal processing light photons coming from a visual scene are converted into voltage variations by photoreceptors (cones and rods). The complex hierarchical and layered structure of the retina allows to convert these variations into spike trains, produced by ganglion cells (GCells) and conveyed to the thalamus via their axons. We considerably simplify this process here. Light response induces a voltage variations of bipolar cells (BCells), laterally connected via amacrine cells (ACells), and feeding GCells, as depicted in Fig. 1. We describe this structure in details here. Note that neither BCells nor ACells are spiking. They act synaptically on each other by graded variations of their potential.
We assimilate the retina to a flat, two-dimensional square of edge length L mm. Therefore, we do not integrate the three-dimensional structure of the retina in the model, merely for mathematical convenience. Spatial coordinates are noted x, y (see Fig. 2 for the whole structure).
In the model, each cell population tiles the retina with a regular square lattice. The density of cells is therefore uniform for convenience but the extension to nonuniform density can be afforded. For the population p, we denote by δ p the lattice spacing in mm and by N p the total number of cells. Without loss of generality we assume that L, the retina's edge size, is a multiple of δ p . We denote by L p = L δ p the number of cells p per row or column so that N p = L 2 p . Each cell in the population p has thus Cartesian coordinates (x, y) = (i x δ p , i y δ p ), (i x , i y ) ∈ {1, . . . , L p } 2 . To avoid multiples indices, we associate with each pair (i x , i y ) a unique index i = i x + (i y -1)L p . The cell of population p located at coordinates (i x δ p , i y δ p ) is then denoted by p i . We denote by d[p i , p j ] the Euclidean distance between p i and p j . We use the notation V p i for the membrane potential of cell p i . Cells are coupled. The synaptic weight from cell p j to cell q i reads W p j q i . Thus, the pre-synaptic neuron is expressed in the upper index; the post-synaptic, in the lower index. Dynamics of cells is voltage-based. This is because our model is constructed from Chen et al. model [20], itself derived from Berry et al. [9], where a voltage-based description is used. Implicitly, voltage is measured with respect to the rest state of the cell (V p i = 0 when the cell receives no input).

Bipolar cell layer
The model consists first of a set of N B BCells, regularly spaced by a distance δ B , with spatial coordinates x i , y i , i = 1, . . . , N B . Their voltage, a function of the stimulus, is computed as follows.

Stimulus response and receptive field
The projection of the visual scene on the retina ("stimulus") is a function S(x, y, t) where t is the time coordinate. As we do not consider colour sensitivity here, S characterises a black and white scene with a control on the level of contrast ∈ [0, 1]. A receptive field (RF) is a region of the visual field (the physical space) in which stimulation alters the voltage of a cell. Thus, BCell i has a spatio-temporal receptive field K B i , featuring the biophysical processes occurring at the level of the outer plexiform layer (OPL), that is, photo-receptors (rod-cones) response modulated by horizontal cells (HCells). As a consequence, in our model, the voltage of BCell i is stimulus-driven by the term where x,y,t * means space-time convolution. We consider only one family of BCells so that the kernel K is the same for all BCells. What changes is the centre of the RF, located at x i , y i , which also corresponds to the coordinates of the BCell i. We consider in the paper separable kernel K(x, y, t) = K S (x, y)K T (t) where K S is the spatial part and K T is the temporal part. The detailed form of K is given in Appendix B.
We have resulting from the condition K B i (x, y, 0) = 0 (see Appendix B). Note that the exponential decay of the spatial and temporal part at infinity ensures the existence of the space-time integral. The spatial integral R 2 K S (x, y)S(x, y, u) dx dy is numerically computed using error function in the case of circular RF and a computer vision method from Geusenroek et al. [39] in the case of anisotropic RF, allowing to integrate generalised Gaussians with an efficient computational time. This method is described in Appendix B.
For explanations purposes, we will often use the approximation of V i drive by a Gaussian pulse, with width σ , propagating at constant speed v along the direction e x : where x = iδ B is the horizontal coordinate of BCell i and where σ is in mm, A 0 is in mV.mm (and is proportional to stimulus contrast), V 0 is in mV.

BCells voltage and gain control
In our model, the BCell voltage is the sum of external drive (1) received by the BCell and of a post-synaptic potential P B i induced by connected ACells: The form of P B i is given by Eq. (11) in Sect. 2.3.1. P B i (t) = 0 when no ACells are considered. BCells have voltage threshold [9]: Values of parameters are given in Appendix A. BCells have gain control, a desensitisation when activated by a steady illumination [92]. This desensitisation is mediated by a rise in intracellular calcium Ca 2+ at the origin of a feedback inhibition preventing thus prolonged signalling of the ON BCell [20,73]. Following Chen et al., we introduce the dimensionless activity variable A B i obeying the differential equation Assuming an initial condition A B i (t 0 ) = 0 at initial time t 0 , the solution is The bipolar output to ACells and GCells is then characterised by a nonlinear response to its voltage variation given by where Note that R B i has the physical dimension of a voltage, whereas, from Eq. (9), the activity A B i is dimensionless. As a consequence, the parameter h B in Eq. (6) must be expressed in ms -1 mV -1 . Form (9) and its 6th power are based on experimental fits made by Chen et al. Its form is shown in Fig. 3.
In the course of the paper we will use the following piecewise linear approximation also represented in Fig. 3: 2 3 ], maximal gain; 4 3 ], fast decay.
Thanks to this approximation, we roughly distinguish three regions for the gain function G B (A). This shape is useful to understand the mechanism of anticipation (Sect. 3.1).

Figure 3
Gain control (9) as a function of activity A. The function l(A), in dashed line, is a piecewise linear approximation of G B (A) from which three regions are roughly defined. In the region "Silent" the gain vanishes so the cell does not respond to stimuli; in the region "Max", the gain is maximal so that cell behaviour does not show any difference with a not gain-controlled cell; the region "Fast decay" is the one which contributes to anticipation by shifting the peak in the cell's activity (see Sect. 3.1). The value A c = 2 3 corresponds to the value of activity where gain control, in the piecewise linear approximation, becomes effective

Amacrine cell layer
There is a wide variety of ACells (about 30-40 different types for humans) [64]. Some specific types are well studied such as starburst amacrine cells, which are involved in direction sensitivity [33,34,81], as well as contrast impression and suppression of GCells response [58], or AII, a central element of the vertebrate rod-cone pathway [54].
Here, we do not want to consider specific types of ACells with a detailed biophysical description. Instead, we want to point out the potential role they can play in motion anticipation thanks to the inhibitory lateral connectivity they induce. We focus on a specific circuitry involved in differential motion: an object with a different motion from its background induces more salient activity. The mechanism, observed in mice and rabbit retinas [41,61], is featured in Fig. 1, pathway III. When the left pathway receives a different illumination from the right pathway (corresponding, e.g. to a moving object), this asymmetry is amplified by the ACells' mutual inhibition, enhancing the response of the left pathway in a "push-pull" effect. We want to propose that such a mutual inhibition circuit, deployed in a lattice through the whole retina, can generate-under specific conditions mathematically analysed-a wave of activity propagation triggered by the moving object.
In the model, ACells tile the retina with a lattice spacing δ A . We index them with j = 1, . . . , N A .

Synaptic connections between ACells and BCells
We consider here a simple model of ACells. We assimilate them to passive cells (no active ionic channels) acting as a simple relay between BCells. This aspect is further discussed later in the paper. The ACell A j , connected to the BCell B i , induces on the latter the postsynaptic potential: where the Heaviside function H ensures causality. Thus, the post synaptic potential is the mere convolution of the pre synaptic ACell voltage, with an exponential α-profile [28]. In addition, we assume the propagation to be instantaneous.
Here, the synaptic weight W A j B i < 0 mimics the inhibitory connection from ACell to BCell (glycine or GABA) with the convention that W In general, several ACells input the BCell B i giving a total PSP: Conversely, the BCell B i connected to A j induces, on this cell, a synaptic response characterised by a post-synaptic potential (PSP) P A j (t). As ACells are passive elements, their voltage V A j (t) is equal to this PSP. We have thus Here, W B i A j > 0 corresponding to the excitatory effect of BCells on ACells through a glutamate release. Note that the voltage of the BCell is rectified and gaincontrolled.

Dynamics
The coupled dynamics of bipolar and amacrine cells can be described by a dynamical system that we derive now.
Bipolar voltage By differentiating (11), (4) and introducing we end up with the following equation for the bipolar voltage: where we have used (2). This is a differential equation driven by the time-dependent term F B i containing the stimulus and its time derivative.
To illustrate the role of F B i , let us consider an object moving with a speed v depending on time, thus with a nonzero acceleration γ = d v dt . This stimulus has the form S(x, where ∇ denotes the gradient. Therefore, thanks to Eq. (14), BCells are sensitive to changes in directions, thereby justifying a study of two-dimensional stimuli (Sect. 3.4). Note that this property is inherited from the simple, differential structure of the dynamics, the term dV i drive dt resulting from the differentiation of V B i . This term does not appear in the classical formulation (1) of the bipolar response without amacrine connectivity. It appears here because synaptic response involves an implicit time derivative via convolution (12).
Coupled dynamics Likewise, differentiating (12) gives Equation (6) (activity), (14) and (15) define a set of 2N B +N A differential equations, ruling the behaviour of coupled BCells and ACells, under the drive of the stimulus, appearing in the term F B i (t). We summarise the differential system here: We have used the classical dynamical systems convention where time appears explicitly only in the driving term F B i (t) to emphasise that (16) is non-autonomous. Note that BCells act on ACells via a rectified voltage (gain control and piecewise linear rectification), in agreement with Fig. 1, pathway III. We analyse this dynamics in Sect. 3.2.1.

Connectivity graph
The way ACells connect to BCells and reciprocally have a deep impact on dynamics (16).
In this paper, we want to point out the role of relative excitation (from BCells to ACells) and inhibition (from ACells to BCells) as well as the role of the network topology. For mathematical convenience when dealing with square matrices, we assume from now on that there are as many BCell as ACells, and we set N ≡ N A = N B . At the core of our mathematical studies is a matrix L, defined in Sect. 3.2.1, whose spectrum conditions the evolution of the BCells-ACells network under the influence of a stimulus. It is interesting and relevant to relate the spectrum of L to the spectrum of the connectivity matrices ACells to BCells and BCells to ACells. There is not such a general relation for arbitrary matrices of connectivity. A simple case holds when the two connectivity matrices commute. Here, we choose an even simpler situation based on the fact that we compare the role of the direct feedforward pathway on anticipation in the presence of ACell lateral connectivity. We feature the direct pathway by assuming that a BCell connects only one ACell with a weight w + uniform for all BCell, so that W B A = w + I N,N , w + > 0, where I N,N is the N -dimensional identity matrix. In contrast, we assume that ACells connect to BCells with a connectivity matrix W, not necessarily symmetric, with a uniform weight -w -, w -> 0, so that W A B = -w -W. We consider then two types of network topology for W: 1. Nearest neighbours. An ACell connects its 2d nearest BCell neighbours where d = 1, 2 is the lattice dimension. 2. Random ACell connectivity. This model is inspired from the paper [78] on the shape and arrangement of starburst ACells in the rabbit retina. Each cell (ACell and BCell) has a random number of branches (dendritic tree), each of which has a random length and a random angle with respect to the horizontal axis. The length of branches L follows an exponential distribution with spatial scale ξ . The number of branches n is also a random variable, Gaussian with meann and variance σ n . The angle distribution is taken to be isotropic in the plane, i.e. uniform on [0, 2π[. When a branch of an ACell A intersects a branch of a BCell B, there is a chemical synapse from A to B. The probability that two branches intersect follows a nearly exponential probability distribution that can be analytically computed (see Appendix C).

Ganglion cells
There are many different types of GCells in the retina, with different physiologies and functions [3,70]. In the present computational study, we focus on specific subtypes associated with pathways I-II (fast OFF cells with gain control), III (differential motion sensitive cells) and IV (direction selective cells) in Fig. 1. All these have common features: BCells pooling and gain control.

BCells pooling
In the retina, GCells of the same type cover the surface of the retina, forming a mosaic. The degree of overlap between GCells indicates the extent to which their dendritic arbours are entangled in one another. This overlap remains, however, very limited between cells of the same type [68]. We denote by k the index of the GCells, k = 1, . . . , N G , and by δ G the spacing between two consecutive GCells lying on the grid (Fig. 2).
In the model, GCell k pools over the output of BCells in its neighbourhood [20]. Its voltage reads as follows: where the superscript "P" stands for "pool". We use this notation to differentiate this voltage from the total GCell voltage V G k when they are different. This happens in the case when GCells are directly coupled by gap junctions (Sects. 2.4.4, 3.3). When there is no ambiguity, we will drop the superscript "P". In Eq. (17), the weights W B i G k are Gaussian: where σ p has the dimension of a distance and a p is dimensionless.

Ganglion cell response
The voltage V G k is processed through a gain control loop similar to the BCell layer [20]. As GCells are spiking cells, a nonlinearity is fixed so as to impose an upper limit over the firing rate. Here, it is modelled by a sigmoid function, e.g.
This function corresponds to a probability of firing in a time interval. Thus, it is expressed in Hz. Consequently, α G is expressed in Hz mV -1 and N max G in Hz. Parameter values can be found in Appendix A.
Gain control is implemented with an activation function A G k , solving the following differential equation: and a gain function Note that the origin of this gain control is different from the BCell gain control (9). Indeed, Chen et al. hypothesise that the biophysical mechanisms that could lie behind ganglion gain control are spike-dependent inactivation of Na + and K + channels, while the study by Jacoby et al. [45] hypothesises that GCells gain control is mediated by feedforward inhibition that they receive from ACells. The specific forms of the nonlinearity and the gain control function used in this paper match, however, the first hypothesis, namely the suppression of the Na + current [20].
Finally, the response function of this GCell type is In contrast to BCell response R B (8), which is a voltage, here R G is a firing rate. Gain control has been reported for OFF GCells only [9,20]. Therefore, we restrict our study to OFF cells, i.e with a negative centre of the spatial RF kernel. However, on mathematical grounds, it is easier to carry our explanation when the RF centre is positive. Thus, for convenience, we have adopted a change in convention in terms of contrast measurement. We take the reference value 0 of the stimulus to be white rather than black, black corresponding then to 1. The spatial RF kernel is also inverted, with a positive centre and a negative surround. The problem is therefore mathematically equivalent to an ON cell submitted to positive stimulus.

Differential motion sensitive cells
We consider here a class of GCells connected to ACells according to pathways III in Fig. 1, acting as differential motion detectors. They are able to respond saliently to an object moving over a stationary surround while being strongly inhibited by global motion. Here, stationary is meant in a general, probabilistic sense. This can be a uniform background or a noisy background where the probability distribution of the noise is time-translation invariant. These cells are hence able to filter head and eye movements. Baccus et al. [1] emphasised a pathway accountable for this type of response involving polyaxonal ACells which selectively suppress GCells response to global motion and enhance their response to differential motion, as shown in Fig. 1, pathway III. The GCell receives an excitatory input from the BCells lying in its receptive field which respond to the central object motion and an indirect inhibitory input from ACells that are connected to BCells which respond to the background motion. When motion is global, the excitatory signal is equivalent to the inhibitory one, resulting in an overall suppression. However, when the object in the centre moves distinctively from the surrounding background, the cell in the centre responds strongly.
There are here two concomitant effects. When a moving object (say, from left to right) enters the BCell pool connected to a central GCell k D , the BCells in the periphery of the pool respond first, with no significant change on the GCell response, because of the Gaussian shape (18) of the pooling: weights are small in the periphery. Those BCells excite, however, the ACells they are connected to, with the effect of inhibiting the BCells of neighbouring GCells pools. This has the effect of decreasing the voltage of these BCells which in turn excite less ACells which, in turn, inhibit less the BCells of the pool k D . Thus, the response of the GCell k D is enhanced, while the cells on the background are inhibited. We call this effect "push-pull" effect. Note that propagation delays ought to play an important role here, although we are not going to consider them in this paper.

Direction selective GCells and gap junction connectivity
These cells correspond to pathway IV in Fig. 1. They are only coupled via electric synapses (gap junctions). In several animals, like the mouse, this enables the corresponding GCells to be direction sensitive. Note that other mechanisms, involving lateral inhibition via starburst amacrine cells have also been widely reported [33,34,71,72,81,85,88]. Here we focus on gap junctions direction sensitive cells (DSGCs). There exist four major types of these DSGCs, each responding to edges moving in one of the four cardinal directions. Trenholm et al. [80] emphasised the role of these cells coupling in lag normalisation: uncoupled cells begin responding when a bar enters their receptive field, i.e. their dendritic field extension, whereas coupled cells start responding before the bar reaches their dendritic field. This anticipated response is due to the effective propagation of activity from neighbouring cells through gap junctions and is particularly interesting when comparing the responses for different velocities of the bar. Trenholm et al. showed that the uncoupled DSGCs detect the bar at a position which is further shifted as the velocity grows, while coupled cells respond at an almost constant position regardless of the velocity. In our work, we analyse this effect in terms of a propagating wave driven by the stimulus and show that temporally this spatial lag normalisation induces a motion extrapolation that confers to the retina more than just the ability to compensate for processing delays, but to anticipate motion.
Classical, symmetric bidirectional gap junctions coupling between neighbouring cells would involve a current of the form -g where g is the gap junction conductance. In contrast, here, the current takes the form -g(V G k -V G k-1 ). This is due to the specific asymmetric structure of the direction selective GCell dendritic tree [80]. The experimental results of these authors suggest that the effect of the possible gap junction input from downstream cells, in the direction of motion, can be neglected due to offset inhibition and gain control suppression. This, along with the asymmetry of the dendritic arbour, justifies the approximation whereby the cell k+1 does not influence the current in the cell k. This induces a strong difference in the propagation of a perturbation. Indeed, consider the case In the symmetric form the total current vanishes, whereas in the asymmetric form the current is -gδ. Still, the current can have both directions depending on the sign of δ. This has a strong consequence on the way GCells connected by gap junctions respond to a propagating stimulus, as shown in Sect. 3.3.
The total GCell voltage is the sum of the pooled BCell voltage V (P) G k and of the effect of neighbours GCells connected to k by gap junctions: where C is the membrane capacitance. Deriving the previous equation with respect to time, we obtain the following differential equation governing the GCell voltage: where Gain control is then applied on V G k as in (22). An alternative is to consider that gain control occurs before gap junctions effect. We investigated this effect as well (not shown, see [74]). Mainly, the anticipatory effect is enhanced when the gain control is applied after the gap junction coupling; thus, from now, we focus on the formulation (23) in the paper.
Note that our voltage-based model of gap junctions takes a different from as Trenholm et al. (expressed in terms of currents), because we had to adapt it so as to deal with the pooling voltage form (17). Still, our model reproduces the lag normalisation as in the original model as we checked (not shown, see [74]).

The mechanism of motion anticipation and the role of gain control
The (smooth) trajectory of a moving object can be extrapolated from its past position and velocity to obtain an estimate of its current location [4,56,57]. When human subjects are shown a moving bar travelling at constant velocity, while a second bar is briefly flashed in alignment with the moving bar, the subjects report seeing the flashed bar trailing behind the moving bar. This led Berry et al. [9] to investigate the potential role of the retina in anticipation mechanisms. Under constraints on the bar speed and contrast they were able to exhibit a positive anticipation time, defined as the time lag between the peak in the retinal GCell response to a flashed bar and the corresponding peak when the stimulus is a moving bar.
In this paper we adopt a slightly different definition although inspired by it. Indeed, the goal of this modelling paper is to dissect the various potential stages of retinal anticipation as developed in the next subsections.
Several layers and mechanisms are involved in the model, each one defining a response time and potentially contributing to anticipation under conditions that we now analyse.

Anticipation at the level of a single isolated BCell; the local effect of gain control
We consider first a single BCell without lateral connectivity so that V B i = V i drive . The very mechanism of anticipation at this stage is illustrated in Fig. 4. The peak response time of the convolution of the stimulus with the RF of one BCell occurs at a time t B (dashed line in Fig. 4(a)). The increase in V i drive leads to an increase in activity ( Fig. 4(c)) and an increase of R B (Fig. 4(e)). When activity becomes large enough, gain control switches on ( Fig. 4(d)) leading to a sharp decrease of the response R B (Fig. 4(e)) and a peak in R B occurring at time t B A (dashed line in Fig. 4(e)) before t B . The bipolar anticipation time, B = t Bt B A , is therefore positive.
Mathematically, B > 0 results from the intermediate value theorem using that

and that t B A is defined by
where the right-hand side is positive provided that the parameters h B , τ a are tuned a such that ]. An important consequence is that the amplitude of the response at the peak is smaller in the presence of gain control (compare the amplitude of the voltage in Fig. 4(a) to 4(e)).
The anticipation time at the BCell level depends on parameters such as h B , τ a . It depends as well on characteristics of the stimulus such as contrast, size and speed. An easy way to figure this out is to consider that the peak in BCell response ( Fig. 4(d), (e)) arises when the gain control function G B (A B i ) starts to drop off ( Fig. 4(e)), which from the piecewise linear approximation (10) of BCell arises when A = 2 3 . When V i drive has the form (3). this gives, using N (V i drive ) = V i drive (7) and letting the initial time t 0 → -∞ (which corresponds to assuming that the initial state was taken in a distant past, quite longer than the time scales in the model): where (x) is the cumulative distribution function of the standard Gaussian probability (see definition, Eq. (60) in Appendix B). This establishes an explicit equation for the time t B A as a function of contrast (A 0 ), size (σ ) and speed (v) as well as the parameters h B and τ a . We do not show the corresponding curves here (see [74] for a detailed study) preferring to illustrate the global anticipation at the level of GCells, illustrated in Fig. 5 below.

Anticipation time of the BCells pooled voltage
The main effects we want to illustrate in the paper (impact of lateral connectivity on GCell anticipation) are evidenced by the shift of the peak in activity of the BCells pooled voltage occurring at time t G . We focus on this time here, postponing to Sect. 3.1.3 the subsequent effect of GCells gain control. We assume therefore here that h G = 0 so that A G k = 0 and G G (A G k ) = 1 in (19). Thus, the firing rate of GCell k is N G (V G k ). For mathematical sim-plicity, we will consider that the firing rate function (5) of G is a smooth, monotonously increasing sigmoid function such that N G (V G k ) > 0. We define t G as the time when V G k is maximum, after the stimulus is switched on. This corresponds to dV G k dt = 0 and where this equation holds at time t = t G (we have not written explicitly t G to alleviate notation). This is the most general equation for the anticipation time at the level of BCells pooling.
In the sum i , there are two types of BCells. The inactive ones where For the moment we assume that, at time t G , there is no BCell switching from one state (active/inactive) to the other, postponing this case to the end of the section. Then, Eq. (26) reduces to This general equation emphasises the respective role of (I), stimulus (term F B i (t)); (II), gain . Note that we could as well consider a symmetric gap junctions connectivity where we would have a term The equation terms have been arranged this way for reasons that become clear in the next lines. It is not possible to solve this equation in full generality, but it can be used to understand the respective role of each component.
In the absence of gain control and lateral connectivity (W This generalises the definition of t B , time of peak of a single BCell, to a set of pooled BCells, and we will proceed along the same lines as in Sect. 3.1.1. We fix as reference time 0 the time when the pooled voltage becomes positive. It increases then until the time t G when is positive on [0, t G [ and vanishes at t G . We now show that, in the presence of gain control, the peak occurs at time t G < t G leading to anticipation induced by gain control and generalising the effect observed for one BCell in Sect. 3.1.1. Indeed, Eq. (27) reads now as follows: so that the left-hand side in (29) reaches 0 at a time t G ≤ t G . The right-hand side is positive for the same reasons as in Sect. 3.1.1. The same mathematical argument holds as well, using the intermediate value theorem to show that t G < t G .
We now investigate Eq. (27) with the two terms of lateral connectivity: (III) ACells and (IV) gap junctions. The effect of gap junctions is straightforward. A positive term w gap [V G k -V G k-1 ] increases the right-hand side of Eq. (27). As developed in Sect. 3.3, this arises when the stimulus propagates in the preferred direction of the cell inducing a wave of activity propagating ahead of the stimulus. In view of the qualitative argument developed above using the intermediate value theorem, this can enhance the anticipation time. This deserves, however, a deeper study developed in Sect. 3.3.
The effect of ACells cells is less evident, as the term (-1 can have any sign, so that network effect can either anticipate or delay the ganglion response, as illustrated in several examples in the next section. As we show, this term is in general related to a wave of activity, enhancing or weakening the anticipation effect as shown in Sect. 3.2. Let us finally discuss what happens when some BCell switches from one state (active/inactive) to the other (i.e. V B i = B ). In this case, taking into account the definition (5), the derivative N B (V B i ) = 1 2 . Thus, when a BCell reaches the lower threshold, there is a big variation in N B (V B i ) thereby leading to a positive contribution in (26) and an additional term 1 (27), where the sum holds on switching state cells. As we see in section (3.2), this can have an important impact on the anticipation time.

Anticipation time at the GCell level
We now show that the firing rate of the GCell k, given by (22), reaches its maximum at a time t G A < t G . From (22), at time t G A : V G k starts from 0 and increases on the time interval [0, t G ], thus The right-hand side of (30) starts from 0 at t = 0 and stays strictly positive until either V G k vanishes, which occurs for t > t G , or until dA G k dt vanishes. We choose the characteristic time τ G and the intensity h G in (20) so that dt increases from 0. From the intermediate value theorem, these two curves have to intersect at a time t G A < t G . We finally define the total anticipation time of a GCell as follows: where t B c is the peak of the BCell at the centre of the BCells pooling to that GCell.

Anticipation variability: stimulus characteristics
In general, depends on gain control, lateral connectivity as well as characteristics of the stimulus such as speed and contrast. This has been shown mathematically in Eq. (25) for a single BCell. Here, we investigate numerically the dependence of the total anticipation time of a GCell when the stimulus is a bar of infinite height, width σ mm, travelling in one dimension at speed v mm/s with contrast C ∈ [0, 1]. Results are shown in Fig. 5. This figure is a calibration later used to compare to the effects induced by lateral connectivity. We first observe that anticipation increases with contrast, as it has experimentally been observed [9]. Indeed, increasing the contrast increases V i drive (t) thereby accelerating the growth of A i so that gain control takes place earlier ( Fig. 5(a)). We also notice that anticipation increases with the width of the object until a maximum ( Fig. 5(b)). Finally, the model shows a decrease in anticipation as a function of velocity, as it was evidenced experimentally [9,47] (Fig. 4(c)). Indeed, when the velocity increases, V drive varies faster than the characteristic activation time τ a , and the adaptation peak value is lower. Consequently, gain control has a weaker effect and the peak activity is less shifted than when the bar is slow.
A large part of these effects can be understood from Eq. (25). Note, however, here that simulation of Fig. 5 takes into account the convolution of a moving bar with the receptive field, the pooling effect and gain control at the stage of GCells.
In Fig. 5 we also show the evolution of GCells maximum firing rate as a function of the moving bar velocity, contrast and size. We observe that it increases with these parameters, an expected result.

The potential role of ACell lateral inhibition on anticipation
In this section we study the potential effect of ACells (pathway III of Fig. 1) on motion anticipation. We restrict to the case where there are as many BCells as ACells (N B = N A ≡ N ) so that the matrices W A B and W B A are square matrices. We first derive general mathematical results (for the full derivation, see Appendix D) before considering the two types of connectivity described in Sect. 2.3.3.

Mathematical study Dynamical system
We study mathematically dynamical system (16) that we write in a more convenient form. We use Greek indices α, β, γ = 1, . . . , 3N and define the state vector X as follows: Likewise, we define the stimulus vector F α = F B i if α = 1, . . . , N and F α = 0 otherwise. Then dynamical system (16) has the general form where H( X ) is a nonlinear function, via the function (8), featuring gain control and low voltage threshold. The nonlinear problem can be simplified using the piecewise linear approximation (10).
where (16) is linear and can be written in the form where I N,N is the N × N identity matrix and 0 N,N is the N × N zero matrix. This corresponds to intermediate activity, where neither BCells gain control (9) nor low threshold (5) are active. We first study this case and describe then what happens when trajectories of (32) get out of this domain, activating low voltage threshold or gain control. The idea of using such a phase space decomposition with piecewise linear approximations has been used in a different context by Coombes et al. [23] and in [14,15,18].
We consider the evolution of the state vector X (t) from an initial time t 0 . Typically, t 0 is a reference time where the network is at rest before the stimulus is applied. So, the initial condition X (t 0 ) will be set to 0 without loss of generality.
Linear analysis The general solution of (34) is The behaviour of solution (36) depends on the eigenvalues λ β , β = 1, . . . , 3N , of L and its eigenvectors P β with entries P αβ . The matrix P transforms L in Jordan form (L is not diagonalizable when h B = 0, see Appendix D.1). Whatever the form of the connectivity matrices W B A , W A B , the N last eigenvalues are always λ β = -1 τ a , β = 2N + 1, . . . , 3N . In Appendix D.1 we show the following general result (not depending on the specific form of W B A , W A B , they just need to be square matrices and to be diagonalizable): where drive term (1) is The other terms have the following definition and meaning: corresponds to the indirect effect, via the ACell connectivity, of the BCells drive on BCells voltages (i.e. the drive excites BCell i, which acts on BCell j via the ACells network); corresponds to the effect of BCell drive on ACell voltages, and corresponds to the effect of the BCells drive on the dynamics of BCell activity variables. The first term of (40) corresponds to the action of BCells and ACells on the activity of BCells via lateral connectivity. In the second term corresponds to the direct effect of the BCell voltage with index α -2N on its activity (see Eq. (7)).
To sum up, Eq. (37) describes the direct effect of a time-dependent stimulus (first term) and the indirect lateral network effects it induces. The term E B a,α (t) is what activates the gain control. In the piecewise linear approximation (10), the BCell i triggers its gain control when its activity This relation extends the computation made in Sect. 3.1.1 for isolated BCells to the case of a BCell under the influence of ACells. On this basis, let us now discuss how the network effect influences the activation of gain control and, thereby, anticipation. The structure of terms (38), (39) (40) is interpreted as follows. The drive (index γ = 1, . . . , N ) excites the eigenmodes β = 1, . . . , 3N of L with a weight proportional to P -1 βγ . The mode β in turn excites the variable α = 1, . . . , 3N with a weight proportional to P αβ . The time dependence and the effect of the drive are controlled by the integral t t 0 e λ β (t-s) V γ drive (s) ds. For example, when the stimulus has the Gaussian form (3) and cells are spaced with a distance δ so that cell γ is located at x = γ δ, we have, taking where (x) is the cumulative distribution function of the standard Gaussian probability (see definition, Eq. (60) in Appendix B). This is actually the same computation as (25) with λ β = -1 τ a . Equation (43)  Here, the sign of the real part of λ β , λ β,r is important. If λ β,r < 0, the front has the shape depicted in Fig. 6(top). It decays exponentially fast as t → +∞ with a time scale 1 |λ β,r | . On the opposite, it increases exponentially fast with a time scale 1 λ β,r as t → +∞ when λ β,r > 0, thereby enhancing the network effect and accelerating the activation of nonlinear effect (low threshold or gain control) leading the trajectory out of . Remark that the peak of the drive occurs at γ δvt = 0. The inflexion point of the function (x) is at x = 0. Thus, when λ β < 0, the front is a bit behind the drive, whereas it is a bit ahead when λ β > 0.
Having unstable eigenvalues is not the only way to get out of . Indeed, even if all eigenvalues are stable, the drive itself can lead some cells to get out of this set. When the trajectory of dynamical system (34) gets out of , two cases are then possible: for L, controlling the exponential instability observed in Fig. 6(bottom). Thus, too low BCell voltages trigger a re-stabilisation of the dynamical system. (ii) There are BCells such that condition (42) holds, then gain control is activated and system (32) becomes nonlinear. Here, we get out of the linear analysis, and we have not been able to solve the problem mathematically. There is, however, a simple qualitative argument. If the cell i enters the gain control region, then the , which rapidly decays to 0 (see, e.g. Fig. 4(e)). From the same argument as in (i), this generates a stable eigenvalue ∼-1 τ A controlling as well the exponential instability. Equation (37) features therefore the direct effect of the stimulus as well as the indirect effect via the amacrine network, corresponding to a weighted sum of propagating fronts, generated by the stimulus and influencing a given cell through the connectivity pathways. These fronts interfere either constructively, inducing a wave of activity enhancing the effect of the stimulus and, thereby, anticipation, or destructively somewhat lowering the stimulus effect. The fine tuning between "constructive" and "destructive" interferences depends on the connectivity matrix via the spectrum of L and its projection vectors P β . For example, complex eigenvalues introduce time oscillations which are likely to generate destructive interferences, unless some specific resonance conditions exist between the imaginary parts of the eigenvalues λ β . Such resonances are known to exist, e.g. in neural network models exhibiting a Ruelle-Takens transition to chaos [65], and they are closely related to the spectrum of the connectivity matrix [16]. Although we are not in this situation here, our linear analysis clearly shows the influence of the spectrum of L, itself constrained by W, on the network response to stimuli and anticipation.
Spectrum of L This argumentation invites us to consider different situations where one can figure out how connectivity impacts the spectrum of L and thereby anticipation. We therefore provide some general results about the spectrum of L and potential linear instabilities before considering specific examples. These results are proved in Appendix D.2. As stated in Sect. 2.3.3, to go further in the analysis, we now assume that a BCell connects only one ACell, with a weight w + uniform for all BCells, so that W B A = w + I N,N , w + > 0. We also assume that ACells connect to BCells with a connectivity matrix W, not necessarily symmetric, with a uniform weight -w -, w -> 0, so that W A B = -w -W. We denote by κ n , n = 1, . . . , N , the eigenvalues of W ordered as |κ 1 | ≤ |κ 2 | ≤ · · · ≤ |κ n |, and ψ n is the corresponding eigenvector. We normalise ψ n so that ψ † n . ψ n = 1 where † is the adjoint. (Note that, as W is not symmetric in general, eigenvectors are complex). From the eigenvalues and eigenvectors of W, one can compute the eigenvalues and eigenvectors of L (see Appendix D.2),and infer stability conditions for the linear system. The main conclusions are the following: 1. The stability of the linear system is controlled by the reduced, a-dimensional parameter: where: with a degenerate case when τ A = τ B considered in the Appendix. 2. If W is symmetric, its eigenvalues κ n are real, but the eigenvalues of L can be real or complex. Each κ n corresponds actually to eigenvalues λ ± n of L (see Eq. (71)). (a) If κ n < 0, the two corresponding eigenvalues of L are real and one of the two corresponding eigenmodes of L becomes unstable when (b) If κ n > 0 and if 1 τ = 0, the corresponding eigenvalues of L are complex conjugate if The corresponding eigenmodes are always stable. 3. If W is asymmetric, eigenvalues κ n are complex, κ n = κ n,r + iκ n,i . The eigenvalues of L have the form √ a n + u n ; where a n = 1 -4μκ n,r and u n = (1 -4μκ n,r ) 2 + 16μ 2 κ 2 n,i = 1 -8μκ 2 n,r + 16μ 2 |κ n | 2 . Note that we recover the real case when κ n,i = 0 by setting u n = a n .
Instability occurs if λ β,r > 0 for some β. This gives a n + u n > 2 a condition on μ depending on κ n,r and κ n,i .

Remarks
The introduction of a dimensional parameter μ allows us to simplify the study of the joint influence of w -, w + , τ on dynamics because stability is controlled by μ only.
In other words, a bifurcation condition of the form μ = μ c signifies that this bifurcation holds when the parameters w -, w + , τ lay on the manifold defined by ww + τ 2 = μ c .
We now show this in two examples of connectivity and afferent instabilities.

Nearest neighbours connectivity
Eigenmodes of the linear regime We consider the case where the matrix W, connecting ACells to BCells, is a matrix of nearest neighbours symmetric connections. In this case, W can be written in terms of the discrete Laplacian on a d dimensional regular lattice, d = 1, 2, with lattice spacing δ A = δ B set here equal to 1 without loss of generality: Because of this relation, we will often use the terminology Laplacian connectivity for the nearest-neighbours connectivity. We also assume that dynamics holds on a square lattice with null boundary conditions. That is, ACell and BCells are located on d-dimensional grid with indices i x , i y = 0, . . . , L + 1 where the voltage and activity of cells with indices i x = 0, i x = L + 1, i y = 0 or i y = L + 1 vanish. The eigenvalues and eigenvectors are explicitly known in this case. They are parametrized by a quantum number n = n x ∈ {1, . . . , L = N} in one dimension and by two quantum numbers (n x , n y ) ∈ {1, . . . , L = N} 2 in two dimensions. They define a wave vector k n = ( n x π L+1 , n y π L+1 ) corresponding to wave lengths ( L+1 n x , L+1 n y ). Hence, the first eigenmode (n x = 1, n y = 1) corresponds to the largest space scale (scale of the whole retina) with the smallest eigenvalue (in absolute value) s (1,1) = 2(cos( π L+1 ) + cos( π L+1 ) -2). To each of these eigenmodes is related a characteristic time τ n = 1 λ n . The slowest mode is the mode (1, 1). In contrast, the fastest mode is the mode (n x = L, n y = L) corresponding to the smallest space scale, the scale of the lattice spacing δ = 1.
Eigenvalues κ n can be positive or negative. Consider for example the one-dimensional case, where κ n = 2 cos( nπ L+1 ). We choose L even to avoid having a zero eigenvalue κ L 2 .
Eigenvalues κ n , n = 1, . . . , L 2 , are positive, thus the corresponding eigenvalues λ ± n of L are complex and stable. The modes with the largest space scale L n are therefore stable for the linear dynamical system with oscillations. Eigenvalues κ n , n = L 2 + 1, . . . , L, are negative, thus the corresponding eigenvalues λ ± n of L are real. From (46) the mode n becomes unstable when Therefore, the first mode to become unstable is the mode L with the smallest space scale 1 (lattice spacing). For large L, this happens for ww + ∼ 1 2 1 τ A τ B . This instability induces spatial oscillations at the scale of the lattice spacing. When ww + further increases, the next modes become unstable. This instability results in a wave packet following the drive (as shown in Fig. 6). The width of this wave packet is controlled by the unstable modes and by nonlinear effects. We now illustrate the relations of these spectral properties with the mechanism of anticipation.
Numerical results In all the following 1D simulations, we consider a bar with a width 150 μm, moving in one dimension at constant speed v = 3 mm/s. We simulate 100 BCells, 100 ACells and 100 GCells placed on a 1D horizontal grid with a uniform spacing of δ b = δ a = δ g = 30 μm between to consecutive cells. At time t = 0, the first cell lies at 100 μm to the right of the leading edge of the moving bar. We set τ B = 300 ms, τ a = 50 ms, τ A = 100 ms, corresponding to τ = 150 ms (Eq. (45)). We vary the value of weights w + , w -. For the sake of simplicity, we also choose w + = -w -= w to have only one control parameter. We investigate how the bipolar anticipation time B and the maximum in the response R B depend on w. This is summarised in Fig. 7(top), where we have shown the effect of gain control alone (blue horizontal line, independent of w), the effect of ACell lateral connectivity alone (red triangles) and the compound effect (white squares). Anticipation time is averaged over all cells. On the same figure (bottom) we see the responses of two neighbour cells lying at the centre of the lattice.
As w increases, we observe three areas of interest: the first (A) corresponds to a regime where ACell connectivity has a negative effect on anticipation, competing with gain con- trol. As w is small, the anticipation is controlled by the direct pathway I, II of Fig. 1, from BCells to GCells, with a small inhibition coming from ACells, thereby decreasing the voltage of BCells and impairing the effect of gain control. This explains why the anticipation time in the case of lateral connectivity + gain control is smaller than the anticipation time of gain control alone. The network effect (red triangles) on anticipation time increases with w though. This corresponds to the "push-pull" effect already evoked in Sect. 2.3. When a BCell B i feels the stimulus, its activity increases favoured by the stimulus, it increases the voltage of the connected ACell, inhibiting the next BCell B i+1 , thereby inducing a feedback loop, the push-pull effect, enhancing the voltage of B i .
In zone (B) the push-pull effect becomes more efficient than gain control alone. In this region, the voltage of the BCell feeling the bar increases fast, while the voltage of its neighbours becomes more and more negative, enhancing the feedback loop. This holds until the voltage rectification (5) takes place. This is the time when the dynamical system gets out of . The push-pull effect then saturates and V B i reaches a maximum, corresponding to a peak in activity. This peak is reached faster than the peak in the function G B (A). Thus, the peak of R B i (t) occurs at the same time as the peak of N B (V B i (t)) and, thus, before the reference peak (time t B for isolated BCells defined in Sect. 3.1.1). In other words, the ACell lateral connectivity allows the BCell to outperform the gain control mechanism for anticipation. As w increases in zone B the push-pull effect (averaged over BCells) reaches a maximum, then decreases. This is because the increase in w makes the inhibitory effect of ACells stronger and stronger on silent BCells which then remain silent longer and longer because the ACell voltage increases with w, and it takes longer for it to decrease and deinhibit the neighbours. The silent cells are less and less sensitive to the stimulus, being strongly and durably inhibited.
In region C, the anticipation is again dominated by gain control. In this case, the effect on cells depends on the parity of their index. The response of BCells is either completely suppressed or identical to the response of the reference case (with gain control alone). This is why the average anticipation time with gain control is about half of the gain control without network effect. Cells that are inhibited do no participate to anticipation, and the others anticipate in the same way than with gain control alone. Note that this "parity" effect is due to the nearest neighbours connectivity and the symmetry of interactions.
We now interpret and complete these results from the point of view of the spectrum of L and associated dynamics. The fastest mode to destabilise corresponds to the smallest space scale, i.e. the lattice spacing. This is a mode with alternate sign at the scale of the lattice. We call it the "push-pull" mode, as it is precisely what makes the push-pull effect. When the push-pull mode becomes unstable, the excited BCell becomes more and more excited and the next BCell more and more inhibited. However, the time it takes τ L has to be compared to the time where the bar stays in the RF, τ bar (and more generally the time it takes to RF kernel to respond to the bar). In the case of the simulation σ center = 90 μm (see Appendix A, Table 1) and v = 3 μm/ms giving a characteristic time τ bar = 270ms, whereas, as we observed, τ L < 100 ms. The push-pull mode is therefore quite faster than τ bar , so the push-pull effect takes place fast and leads to a fast exponential increase of the front depicted in Fig. 6(right). This explains the rapid increase of network anticipation effect observed in regions A, B of Fig. 7.

Random connectivity
In this section, we study the behaviour of the model using the more realistic, probabilistic type of connectivity presented in Sect. 2.3.3 and more thoroughly studied in Appendix C. Within this framework, a given ACell A i receives the upstream activity from the BCell lying at the same position B i with a constant weight w. The same ACell inhibits BCells with which it is coupled through the random adjacency matrix W, generated by the probabilistic model of connectivity, and the weight matrix W B A = -wW. We recall that the connectivity depends on a scale parameter ξ for the branch length) and the mean and variancē n, σ for the distribution of the number of branches. These parameters can be found in Table 1 in Appendix A.
Eigenmodes of the linear regime Similarly to Sect. 3.2.2 we now analyse the spectrum of L when W is a random connectivity matrix. Although a couple of results can be established (using the Perron-Frobenius theorem), we have not been able to find general mathematical results on the spectrum or eigenvectors of this family of random matrices. We thus performed numerical simulations.
The spectrum of L is deduced from the spectrum of W as exposed above. The spectrum of W depends onn, σ and ξ . In Fig. 8 we have plotted, on the left, an example of such a spectrum. This is the spectral density (distributions of eigenvalues in the complex plane) obtained from the diagonalization of 10,000 matrices 100 × 100 (so the statistics is made over 10 6 eigenvalues). We note that the largest eigenvalues are always real positive, a straightforward consequence of the Perron-Frobenius theorem [38,69]. More generally, we observe an over-density of real eigenvalues. The same holds for random Gaussian matrices with independent entries N (0, 1 N ) [32] whose asymptotic density converges to the circular law [40]. The shape of the spectral density in our model differs from the circular law though, and it depends on the parametersn, σ and ξ .
On the same figure we show the corresponding spectral density of L obtained from Eq. (48) for w = 0.05, 0.1, 015. We have taken here τ A = 30, τ B = 10 ms to see better the transitions with w (level lines in Fig. 9). There is an evident symmetry with respect to 1 τ AB = -0.066 expected from the mathematical analysis. We see that the largest eigenvalue is real (although it is not necessarily related to the largest eigenvalue of W). We also see that, as w increases, a large number of (complex) eigenvalues become unstable. There is actually a frontier of instability that we have plotted in the plane w, ξ for different values ofn. This is shown in Fig. 9 (dashed line). The level line 0 is the frontier of instability of the linear dynamical system. This frontier has the (empirical) form (ξξ 0 ).(w -w0) = c, where c has the dimension of a characteristic speed.
What matters here is that there are complex unstable eigenvalues with no specific resonance relations between them. They are therefore prone to generate destructive interferences in (37).
Numerical results In Fig. 10 we consider, similarly to Fig. 7 for Laplacian connectivity, the effect of random connectivity on anticipation, compared to pure gain control mechanism. In contrast to the Laplacian case, we have here more parameters to handle: ξ , which controls the characteristic length of branches andn, σ n which control the number of branches distribution. We present here a few results where ξ varies, whereas the average number of branchesn = 2 (σ n = 1). A more systematic study is done in [74]. The interest of varying ξ is to start from a situation which is close to the Laplacian case (characteristic distance ξ = 1) and to increase ξ to see how the size of the dendritic tree of ACells may impact anticipation. This is a preliminary step toward considering different physiological ACells type (e.g. narrow-, medium-, or wide-field [29]). Note, however, that the probability of connection given the distance of cells (fixed byn, σ n ) implicitly impacts w and the anticipation effects.
The main difference with the Laplacian case is the asymmetry of connections. Here, symmetry means that if ACell j connects the BCell i, then the ACell i connects the BCell j too. This does not necessarily hold for random connectivity, and this has a strong impact on the push-pull effect and anticipation. So, even if the connectivity is short-range when ξ is small, mainly connecting nearest neighbours, we observe already a big difference with the Laplacian case. This is shown in Fig. 10, where ξ = 1. Similarly to the Laplacian case, we observe three main regions depending on w. To have the same representation as Fig. 7, we present V A , N B , R B for two connected cells (here, ACell 51 and BCell 52). However, in this case, connection is not symmetric: ACell 51 inhibits BCell 52, but ACell 52 does not inhibit BCell 51.
We observe three regimes, as in the Laplacian case. In the first region (A) ACell random connectivity has a negative effect on anticipation, as compared to gain control alone. However, since in this case the "push-pull" effect is not evoked, this decay simply comes from the fact that BCell 52 receives an inhibition for ACell 51 that reduces the effect of gain control. This inhibition, however, is not strong enough to significantly shift the peak response as in region (B).
Indeed, in region (B), the inhibition of BCell 52 is strong enough to outperform the effect of gain control. In this case, and similarly to the Laplacian case, the peak of R B i (t) occurs at the same time as the peak of N B (V B i (t)) and before the reference peak. However, this effect is not consistent over all cells and only occurs for BCells that receive active inhibition. This explains why the performance of the Laplacian connectivity is better, on average, in this region.
Finally, as w grows higher, the inhibition grows stronger, completely inhibiting BCell 51. Cells that do not receive any inhibition, as BCell 49 in this example, keep a response that is identical to the response without ACell connectivity. The fraction of cells receiving inhibition in this case being quite small (about 15), this explains why the stationary value of anticipation is fairly close to the value with gain control alone.
The role of the characteristic distance In Fig. 11, we analyse the effect of the characteristic length ξ on anticipation. On the top of the figure we represent the joint effect of the random ACell connectivity and gain control on anticipation for three values of ξ . At the bottom we represent the only effect of the random ACell connectivity for the same values of ξ . We observe that performance in anticipation decreases with ξ . More precisely, we observe an anticipatory effect in this case, as shown in Fig. 11(bottom), but this effect is not able to compete with gain control alone. Even worse, the compound effect shown in 11(top) is disastrous since increasing w renders the anticipation time smaller and smaller.
This spurious effect can be interpreted through the analysis made in Sect. 3.2.1, Eq. (37). From the spectrum of L, we see that there are unstable complex eigenvalues whose number increases with w. These eigenvalues are prone to generate destructive interferences, especially when their number becomes large as w increases, explaining the small peak in region B. The consequence on cells activity and gain control can be dramatic as seen in the red trace of Fig. 10(B bottom), line R B . This depends on the precise connectivity pattern when long range connections from ACells to BCells induce a desensitisation of BCells, which is not counterbalanced by the push-pull effect as in the Laplacian connectivity case.

Conclusion
The two numerical examples considered in this section emphasise the role of symmetry in the synapses and, more generally, the role of complex versus real eigenvalues in the spectrum of L. Recall that, from Sect. 3.2.1, if W is symmetric, complex eigenvalues are always stable, so, for the type of architecture considered here, unstable destructive interferences only occur when W is asymmetric. This leads to several questions, potential subjects for further studies.

How much does anticipation depend on the degree of asymmetry in the matrix W?
The way we generate the random connectivity in the model does not allow us to tune the degree of asymmetry (i.e. the probability that a connection A j → B i exists simultaneously with a connection A i → B j ). Therefore, one has to find a different way to generate the connectivity. From the mathematical analysis made in Appendix C, a distribution depending exponentially on the distance, with a tunable probability to have a symmetric connection, could be appropriate. We do not know about any experimental results characterising this degree of symmetry of the connections in the retina. On mathematical grounds, and from the analogy of the spectrum of W with a circular law, one could expect the spectrum of W to become more and more elongated on the real axis as the degree of symmetry increases in an elliptic like law [55]. 2. Nonlinear effects. The destructive interference effect in our model is partly due to the linear nature of the ACell dynamics. In nonlinear dynamics, eigenvalues of the evolution operator can display resonance conditions favouring constructive interferences. On biological grounds, it is for example known that starburst amacrine cells display periodic bursting activity during development, disappearing a few days after birth [94]. Bursting and its disappearance can be understood in the framework of bifurcation theory of a nonlinear dynamical system featuring these cells [50]. In this setting, even if they are not bursting in the mature stage, SACs remain sensitive to specific stimulation that can temporally synchronise them, thereby enhancing the network effect with a potential effect on anticipation.

The potential role of gap junctions on anticipation
In this section, we study the network ability to improve anticipation in the presence of gap junctions coupling, as in Eq. (23), and gain control at the level of GCells. We start first with mathematical results and show then simulation results.

Mathematical study
We use a continuous space limit for a one-dimensional lattice. The extension to two dimensions is straightforward. Here, x corresponds to the preferred direction of the direction sensitive cells. We consider a continuous spatio- t). We assume likewise that V (P) G k ≡ V (P) (kδ G , t) for some continuous function V (P) (x, t) corresponding to the GCells bipolar pooling input (17), and we take the limit δ G → 0. In this limit Eq. (23) becomes where v gap ≡ w gap δ G has the dimension of a speed and ∂V (P) (x,t) Solution Neglecting terms of order δ 2 G , the general solution of (52) is Equation (52) is a transport equation of ballistic type [74]. For example, if we consider a stimulation of the form V (P) (x, t) = h(xvt), where h is a Gaussian pulse of the form (3), propagating with speed v, and an initial profile C(x) = h(xvt 0 ), the voltage of GCells obeys When v gap = 0, the GCell voltage follows the stimulation, i.e. V G (x, t) = h(xvt). In the presence of gap junctions, there are two pulses: the first one π stim with amplitude v v-v gap propagating at speed v and following the stimulation; the second one π gap with amplitude v gap v-v gap propagating at speed v gap . We have the following cases (we take t 0 = 0 for simplicity). An illustration is given in Fig. 12. 1. If v and v gap have the same sign: (A) If v gap < v, the front π stim is amplified by a factor v v-v gap , whereas there is a refractory front π gap , proportional to v gap , behind the excitatory pulse.
when t → ∞ and x → +∞. This divergence is a consequence of the limit δ G → 0 in (52). If v and v gap have the opposite sign, we set v = -αv gap with α > 0. Then the front π stim follows the stimulus but is attenuated by a factor α 1+α . The front π gap propagates in the opposite direction with an attenuated amplitude 1 1+α . This shows that these gap junctions favour the response to motion in the preferred direction and attenuate the motion in the opposite direction although the attenuation is weak. The effect is reinforced by gain control [74]. The most interesting case is 1 c where these gap junctions can induce a wave of activation ahead of the stimulation.
Effect of gain control When the low voltage threshold N G (19) and the gain control G G (A) (21) are applied to V G (x, t), there are two effects: (i) the hyperpolarised front is cut by N G ; (ii) the positive pulse induces a raise in activity, which in turn triggers the ganglion gain control G G (A) inducing an anticipated peak in the response of the GCell, similar to what happens with BCells, with a different form for the GCell gain control though. Moreover, in contrast to pathway II of Fig. 1, where only gain control generates anticipation, in pathway IV the wave of activity generated by gap junctions increases anticipation by two distinct effects. If v gap < v, the cell's response propagates at the same speed as the stimulus, but its amplitude is larger than the case with no gap junction (term π stim ). From Eq. (53) this results in an increase of h B to an effective value h B v v-v gap inducing an improvement in the anticipation time (with a saturation of the effect, though, as v gap → v). If v gap > v, the cell's response propagates at a larger speed than the stimulus (term π gap ), so that the cell responds before the time of response without gaps. This induces as well an increase in the anticipation time.

Numerical illustrations
We consider a bar with a width 200 μm moving in one dimension at constant speed v = 3 mm/s. We simulate here 100 GCells, placed on a 1D horizontal grid, with a spacing of 30 μm between two consecutive cells. At time t = 0, the first cell lies at 100 μm from the leading edge of the moving bar.
We investigate how the GCell anticipation time and GCell firing rate depend on v gap in Fig. 12. The top shows the effect of gain control alone (blue horizontal line independent of v gap ), the effect of the asymmetric gap junction connectivity alone (red triangles) and the compound effect (white squares). Anticipation time is averaged over all GCells. In the bottom part of the figure, we show the responses of two GCells of indices 30 and 60, spaced by 900 μm.
As explained in Sect. 3.3.1, we observe the three regimes A, B, C mathematically anticipated above. Note that, for these parameter values, the negative trailing front predicted in A is not visible.

Symmetric gap junctions
The asymmetry observed by Trenholm et al. is due to the specific structure of the direction selective GCell dendritic tree [80]. However, in general gap junctions connectivity is expected to be symmetric. So, to be complete, we consider here the effect of symmetric gap junctions on anticipation. It is not difficult to derive the equivalent of Eq. (52) in this case too. This is a diffusion equation of the form where D gap = w gap δG 2 is the diffusion coefficient and is the Laplacian operator. The response to a Gaussian stimulus of the form (3) reads as follows: where H(x, y, t) = e -x 2 +y 2 4Dgapt 4πD gap t (55) which is the heat equation diffusion kernel.
where h is a Gaussian pulse of the form (3) propagating with speed v, f is a bimodal function of the form v σ 2 h(xvt) × (xvt), the shape of which can be seen in Fig. 13(bottom), second row. The convolution with the heat kernel leads to a front propagating at the same rate as the stimulus, with a diffusive spreading whose rate is controlled by D gap . In particular, there is positive bump ahead of the motion, which can induce anticipation, as shown in Fig. 13(top). The effect is weak Although this positive front, for small D gap , increases a bit the anticipation time by accelerating the gain control triggering, rapidly the peak in the response R B is led by the voltage peak corresponding to the positive bump with a low voltage. The position of this peak is, roughly, at a distance σ = σ 2 center + σ 2 B from the peak of the Gaussian pool, where σ center is the width of the centre RF and σ B the width of the bar. This corresponds to a time σ v ahead of the peak in the drive, fixing a maximal value to the anticipation time (see the saturation of the anticipation time curve in Fig. 13(top, left)). In our case, σ ∼ 134 given a saturation peak at 134 μm 3 μm/ms = 44.84 ms. A consequence of the voltage decay is the corresponding power law ( 1 √ D gap for large D gap ) decay of the firing rate ( Fig. 13(top, right)).
To conclude, the situation with symmetric gap junctions is in high contrast with direction selective gap junctions where the response to stimuli was ballistic and was not decreasing with time. On this basis we consider that, for symmetric gap junctions, the anticipation effect is irrelevant, especially taking into account the smallness of the voltage response in case C.

Numerical results
We investigate in this section how the GCell anticipation time and GCell firing rate depend on the gap junction conductance in the case of symmetric gap junctions. In Fig. 13(top), we use the same representation as in Fig. 12. For consistency with the direction sensitive case, we choose v gap = D gap δG as a control parameter. We also take (A) v gap = 0.6 mm/s, (B) v gap = 3 mm/s, (C) v gap = 12 mm/s in Fig. 13(bottom). This corresponds to a diffusion coefficient (A) D gap = 18 × 10 -3 mm 2 /s, (B) D gap = 90 × 10 -3 mm 2 /s, (C) D gap = 360 × 10 -3 mm 2 /s.

Conclusion
In this section we have shown how gap junctions direction sensitive cells can display anticipation due to the propagation of a wave of activity ahead of the stimulus. This effect is negligible for symmetric gap junctions. Note that symmetric gap junctions are known to favour waves propagation, for example in the early development (stage I, see [49] and the reference therein for a recent numerical investigation). Here, gap junctions are considered in a different context due to the presence of a nonstationary stimulus triggering the wave.
Let us now comment our computational result. How does it fit to biological reality? Depending on the gap-junction conductance value, the propagation patterns we predict are quite different.
What is the typical value of v gap in biology? It is difficult to make an estimate from the expression v gap = g gap δ G C . The membrane capacity C and gap junctions conductance can be obtained from the literature (for connexins Cx36, g gap ∼ 10-15 pS [75]), but the distance δ G is more difficult to evaluate. In the model, this is the average distance between GCells' soma which corresponds to ∼200-300 μm. But in the computation with gap junctions what matters is the length of a connexin channel which is quite smaller. Taking δ G as the distance between GCells assumes a propagation speed between somas at the speed of a connexin, which is wrong because most of the speed is constrained by the propagation of action potential along the dendritic tree. So we used a phenomenological argument (we thank O. Marre for pointing it out to us). The correlation of spiking activity between GCell neighbours is about 2-5 ms for cells separated by ∼200-300 μm [86]. This gives a speed v gap in the interval [40,150] mm/s, which is quite fast compared to the bar speed in experiments.
So we are in case 1 b, and should one observe an experimental effect? To the best of our knowledge, an effect of DSGC gap junctions on motion anticipation has not been observed. But we do not know about experiments targeting precisely this effect. It would be interesting to block gap junctions and address Berry et al. [9] or Chen et al. [20] experiments in this case. The difficulty is that blocking gap junctions blocks many essential retinal pathways. We do not pursue this discussion further concluding that our model proposes a computational prediction that could be interesting to be experimentally investigated.

Response to two-dimensional stimuli
In this section, we present some examples of retinal responses and anticipation to trajectories more complex than a bar moving in one dimension with a uniform speed. The aim here is not to do an exhaustive study but, instead, to assess qualitatively some anticipatory effects not considered in the previous sections.

Flash lag effect
The flash lag effect is an optical illusion where a bar moving along a smooth trajectory and a flashed bar are presented to the subject and are perceived with a spatial displacement, while they are actually aligned. A variation of this illusion consists of a bar moving in rotation, a bar flashed in angular alignment, giving rise to a perceived angular discrepancy. We have investigated this effect in our model in the presence of the different anticipatory effects considered in the paper. Figure 14 shows the response to a bar moving with a smooth motion, while a second bar is flashed in alignment with the first bar at one time frame. The first line shows the stimuli, consisting of 130 frames, of a bar moving at 2.7 mm/s, with a refreshment rate of 100 Hz. The second line shows the GCell response with gain control, and the third line presents the effect of lateral amacrine connectivity in the case of a Laplacian graph. Keeping the same values of parameters as in the 1D case, we set w = 0.3 ms -1 corresponding to case B in Fig. 7. Finally, the last line shows the effect of asymmetric gap junctions, having a preferred orientation in the direction of motion, with v gap = 9 mm/s.
In the case of the gain control response, the peak of response to the moving bar is shifted by about 10 ms in the direction of motion, as compared to the static bar. The flashed bar elicits a lower response, given its very short appearance in comparison with the characteristic time of adaptation. We choose this time short enough to avoid gain control triggering, explaining the difference with the strong response observed by Chen et al. [20] in the presence of a still bar.
In the case of amacrine connectivity, the moving bar representation is shrunk as compared to the gain control case given the prevalence of inhibition, while the level of activity for the flashed bar remains roughly the same. In this case, cells responding to the moving bar reach their peak activity slightly earlier (about 19 ms for these parameters value) than in the gain control case (Fig. 14(B, top)).
Finally, asymmetric gap junction connectivity displays a wave propagating ahead of the bar, increasing the central blob, which is much larger than the size of the bar in the stimulus, while the flashed bar activity remains similar to the previous cases.

Parabolic trajectory
In this subsection, we assess the effect of the three anticipatory mechanisms on a parabolic trajectory. The interest is to have a trajectory with a change in direction and speed, thus an acceleration. The stimulus consists of 20 frames displayed at 10 Hz. The simulations parameters and connectivity weights are the same as the ones used in the previous section. Figure 15 shows the response to a dot moving along a parabolic trajectory. In the case of gain control, GCell response is more elongated, which has a distortion effect on the dot representation near the turning point of the trajectory (1400-1600 ms). Cells responding near the trajectory turning point are still anticipating motion, as the peak response of the gain control curve is slightly shifted to the left, compared to the RF response ( Fig. 14(B)).
In the case of amacrine connectivity, the elicited response is also more localised, as compared to the gain control response, and the flow of activity follows more accurately the stimulus. This is a direct consequence of the sensitivity of the ACell connectivity model to the stimulus acceleration. In this case, the peak response is also more shifted as compared to the gain control case.
Finally, the gap junction connectivity model performs worse in this case, giving rise to a propagating wave that does not follow the trajectory, since the latter is not parallel to the direction to which GCells are sensitive. Cells responding near the trajectory turning point have a higher level of activity and an increased latency, while the peak response roughly corresponds to the gain control case.

Angular anticipation
We investigate in this subsection a two-dimensional example of motion where angular anticipation takes place. The stimulus consists here of 72 frames, displayed at 100 Hz, of a bar moving at a constant angular speed of 4.25 rad/ms. Figure 16 shows the retina response to a rotating bar with the angular orientation of activity as a function of time for the different models. We used Matlab to estimate the bar orientation from the displayed activity, fitting the set of activated points by an ellipse whose principal axis determines the response orientation.
In the three cases, one can see that the response around the centre of the bar is suppressed due to gain control adaptation. While the gain control activity orientation roughly follows the linear response ( Fig. 16(B)), the ACell response shows a slight angular shift (frames: 250-300 ms), which is also visible on the response orientation time course. The ACell angular anticipation is, however, only observed during the first period of the bar. Interestingly, this effect vanishes during the second rotation due to a persistent effect of the activation function generating a sort of a suppressive effect erasing the second occurrence of the bar (frame: 450 ms). We shall point out that the ACell connectivity weight in this simulation has been reduced to w = 0.2 ms -1 , since with a value of w = 0.3 ms -1 used in the previous simulations, the response to the second rotation of the bar is completely suppressed.
Finally, similarly to the parabolic trajectory, the gap junction connectivity model performs worse due to the wave propagating from left to right, distorting once more the bar shape. Consequently, the bar activity orientation in this case has been discarded in Fig. 16(B), the orientation estimate giving poor results.

Conclusion
This section shows how lateral connectivity can play a role in motion anticipation of 2D stimuli, both in the case of the classical flash lag effect and more complex trajectories. Indeed, for a given network setting, ACell connectivity can noticeably improve anticipation with respect to gain control in all three stimuli and has also the advantage of being sensitive to trajectory shifts (Sect. 3.4.2).
While gap junction connectivity improves anticipation when the trajectory of the bar is parallel to the preferred GCells direction, it also induces more blur around the bar and shape distortion in the case of parabolic motion and rotation, suggesting a trade-off between anticipation and object recognition for this specific model.

Discussion
Using a simplified model, mathematically analysed with numerical simulations examples, we have been able to give strong evidences that lateral connectivity-inhibition with ACells, gap junctions-could participate to motion anticipation in the retina. The main argument is that a moving stimulus can, under specific conditions mathematically controlled, induce a wave of activity which propagates ahead of the stimulus thanks to lateral connectivity. This suggests that, in addition to local gain control mechanism inducing an anticipated peak of GCells activity, lateral connectivity could induce a mechanism of neural latencies reduction similar to what is observed in the cortex [8,46,77]. This is visible in particular in Fig. 14, where the gap junction coupling induces a wave which increases the GCell level activity before the bar reaches its RF.
Yet, these studies raise several questions and remarks. The first one is, of course, the biological plausibility. At the core of the model, what makes the mathematical analysis tractable is the fact that we can reduce dynamics, in some region of the phase space, to a linear dynamical system. This structure is afforded by two facts: (i) Synapses are characterised by a simple convolution; (ii) Cells, especially ACells, have a simple passive dynamics, where nonlinear effects induced e.g. by ionic channels are neglected, as well as propagation delays. As stated in the introduction, the goal here is not to be biologically realistic, but instead to illustrate potential general spatio-temporal response mechanisms taking into account specificities of the retina, as compared to e.g. the cortex. Essentially, most neurons are not spiking (except GCells and some type of BCells or ACells, not considered here [2]). Yet, synapses follow the same biophysics as their cortical counterpart. As it is standard to model the whole chain of biophysical machinery triggering a post-synaptic potential upon a sharp increase of the pre-synaptic voltage by a convolution kernel [27], we adopted here the same approach. Note that it is absolutely not required, in this convolutional approach, for the pre-synaptic increase in voltage to be a spike; it can be a smooth variation of the voltage. Note also that higher order convolution kernels can be considered, integrating more details of the biological machinery. These higher order kernels are represented by higher order linear differential equations [35]. Concerning point (ii), nonlinear effects are neglected, especially for ACells (BCells have gain control), there are not so many available models of ACells. A linear model for predictive coding using linear ACells has been used by Hosoya et al. [43]. We discuss it in more detail below. The nonlinear models of ACells we know have been developed to study the retina in its early stage (retinal waves) and feature either AII ACells [21] or starburst ACells [50]. In Sect. 3.2.4 we have briefly commented how nonlinear mechanisms could enhance resonance effects in the network and, thereby, favour the propagation of a lateral wave of activity induced by a moving stimulus. This would of course deserve more detailed study. Another potentially interesting nonlinear mechanism is short term plasticity discussed below.
The second question one may ask about the model is about the robustness of this mechanism with respect to parameters. The model contains many parameters, some of them (BCells, GCells and gain control) coming from the previous paper from Berry et al. [9] and Chen et al. [20]. Although they did not perform a structural stability analysis of their model (i.e. stability of the model with respect to small variations of parameters), we believe that they are tuned away from bifurcation points so that slight changes in their (isolated cells) model parameters would not induce big changes. As we have shown, the situation changes dramatically when cells are connected via lateral connectivity. Here, many types of dynamical behaviour can be expected simply by changing the connectivity patterns in the case of ACells. A more detailed analysis would require a closer investigation of ACell to BCell connectivity and an estimation of synaptic coupling, implying to define more specifically the type of ACell (AII, starburst, A17, wide field, medium field, narrow field, etc.) and the type of functional circuit one wants to consider. Note that ACells are difficult to access experimentally due to their location inside the retina. Even more difficult is a measurement of ACell connectivity, especially the degree of symmetry discussed in our paper. Such studies can be performed at the computational level, though, where the mathematical framework proposed here can be applied and extended. Computational results do not tell us what is the reality, but they shed light on what it could be.
We would like now to address several possible extensions of this work.
The retino-thalamico-cortical pathway The retina is only the early stage of the visual system. Visual responses are then processed via the thalamus and the cortex. As exposed in the introduction, anticipation is also observed in V1 with a different modality than in the retina. In this paper, our main focus was on the shift of the peak response, while when studying anticipation in the cortex, the main focus lies in the increase of the response latency, i.e. the delay between the time the bar reaches the receptive field of a cortical column and the effective time its activity starts rising. How do these two effects combine? How does retinal anticipation impact cortical anticipation? To answer these questions at a computational level, one would need to propose a model of the retino-thalamico-cortical pathway which, to the best of our knowledge, has never been done. Yet, we have developed a retino-cortical (V1) model-thus, short-cutting the thalamus-based on a mean-field model of the V1 cortex, developed earlier by the groups of F. Chavane and A. Destexhe [19,93], able to reproduce V1 anticipation as observed in VSDI imaging. The aim of this work is to understand, computationally, the effect of retinal anticipation on the cortical one and more generally the combined effects of motion extrapolation in the retina and V1. This is the object of a forthcoming paper (S. Souihel, M. di Volo, S. Chemla, A. Destexhe, F. Chavane and B. Cessac., in preparation). See [74] for preliminary results.
Retinal circuits BCells, ACells and GCells are organised into multiple, local, functional circuits with specific connectivity patterns and dynamics in response to stimuli. Each circuit is related to a specific task, such as light intensity or contrast adaptation, motion detection, orientation, motion direction and so on. Here, we have considered a circuit allowing the retina to detect a moving object on a moving background, where motion sensitive retinal cells remain silent under global motion of the visual scene, but fire when the image patch in their receptive field moves differently from the background. From our study we have emitted the hypothesis that this circuit, spread over the retina, could improve motion anticipation thanks to what we have called the "push-pull" effect. Yet, other circuits could be studied in their role to process motion and anticipation. We especially think of the ON-OFF cone and rod-cone pathways responsible for the separation of highlights and shadows, allowing to provide information to the GCells concerning brighter than background stimuli (ON-centre) or darker than background stimuli (OFF-centre) [54]. This circuit involves both gap junction and ACell (AII) connectivity, and our model could allow to study its dynamics in the presence of a moving object.
Adaptation effects In a paper from 2005, Hosoya et al. [43] studied dynamic predictive coding in the retina and showed how spatio-temporal receptive fields of retinal GCells change after a few seconds in a new environment, allowing the retina to adjust its processing dynamically when encountering changes in its visual environment. They showed that an amacrine network model with plastic synapses can account for the large variety of observed adaptations. They feature a linear network model of ACells, similar to ours, with, in addition, anti-Hebbian plasticity. Their mathematical analysis, based on linear algebra, allows to determine the behaviour of the model in terms of eigenvalues and eigenvectors. However, their analysis does not carry out to the gain control introduced by Berry et al., which, as we show, renders the spectral analysis quite more complex. It would therefore be interesting to explore how plasticity in the ACell synaptic network, conjugated with local gain control, contributes to anticipation.
Correlations The trajectory of a moving object-which is, in general, quite more complex than a moving bar with constant speed-involves long-range correlations in space and in time. Local information about this motion is encoded by retinal GCells. Decoders based on the firing rates of these cells can extract some of the motion features [25,43,51,62,66,67,76]. Yet lateral connectivity plays a central role in motion processing (see, e.g. [41]). One may expect it to induce spatial and temporal correlations in spiking activity, as an echo, a trace, of the object's trajectory. These correlations cannot be read in the variations of firing rate; they also cannot be read in synchronous pairwise correlations, as the propagation of information due to lateral connectivity necessarily involves delays. This example raises the question about what information can be extracted from spatio-temporal correlations in a network of connected neurons submitted to a transient stimulus. What is the effect of the stimulus on these correlations? How can one handle this information from data where one has to measure transient correlations? This question has been addressed in [17]. The potential impact of these spatio-temporal correlations on decoding and anticipating a trajectory will be the object of further studies.
Orientation selective cells Our model affords the possibility to consider BCells with orientation sensitive RF. The potential role of such BCells for predictive coding has been outlined by Johnston et al. [48]. In their model individual GCells receive excitatory BCell inputs tuned to different orientations, generating a dynamic predictive code, while feedforward inhibition generates a high-pass filter that only transmits the initial activation of these inputs, removing redundancy. Should such circuits play a role in motion anticipation? We did not elaborate on this in the present paper, leaving it to a potential forthcoming work. Another important question is "how to model a retinal network with cells having different orientation selectivity?" A V1 cortical model has been proposed by Baspinar et al. [7] for the generation of orientation preference maps, considering both orientation and scale features. Each point (cortical column) is characterised by intrinsic variables, orientation and scale, and the corresponding RF is a rotated Gabor function. The visual stimulus is lifted in a four-dimensional space, characterised by coordinate variables, position, orientation and scale. The authors infer from the V1 connectivity a "natural" geometry from which they can apply methods from differential geometry. This type of mathematical construction could be interesting to investigate in the case of the retina with families of orientation selective cells, although the retinal connectivity between these cells is not the same as the V1 orientation preference map structure [12,13,82,91].
Biologically inspired vision systems When the retina receives a visual stimulus, it determines which component of it is significant and needs to be further transmitted to the brain. This efficient coding heuristic has inspired many recent studies in developing biologically inspired systems, both for static image and motion representations [59,87,90]. Two major applications of biologically inspired vision systems are retinal prostheses [53,63,79] and navigational robotics [24,52]. Focusing on the second field of application, the ability of a mobile device to navigate in its environment is of utmost interest, especially in order to avoid dangerous situations such as collisions. To be able to move, the robot requires a mapped representation of its environment, but also the ability to interpret and process this representation. Motion processing mechanisms such as anticipation can be thus implemented to assess the efficiency of bio-inspired vision in obstacle avoidance.
Psychophysical study of motion anticipation We briefly come back to the flash lag effect introduced in 3.4.1. Neuroscience has explored several explanations for this illusion. The first explanation is that the visual system processes moving objects with a smaller latency than flashed objects [89]. The second explanation suggests that the flash lag effect is due to postdiction: the perception of the flash is conditioned by events happening after its appearance [30,31]. This hypothesis is inspired by the colour phi illusion, where two dots of different colours appearing at two discrete yet close positions, with a small latency, will be perceived as a single moving dot whose colour has changed. Another explanation includes motion extrapolation. The visual system being predictive, it processes differently a bar in smooth motion, whose motion can be extrapolated, and a flashed bar, which cannot be predicted by the system. The study conducted in this article falls under this third explanation, suggesting that the retina could possibly be assisting the cortex in the motion extrapolation task.

Appendix A: Parameters of the model
Hz B.1 Receptive fields The spatial kernel of the BCell i is modelled with a difference of Gaussians (DOG): where X i = x-x i y-y i , denotes the transpose, x i and y i are the coordinates of the receptive field centre which coincide with the coordinates of the cell, C 1 , C 2 are positive definite matrix whose main principal axis represents the preferred orientation. For circular DOGs(no preferred orientation) C 1 ≡ σ 2 1 I, C 2 ≡ σ 2 2 I where I is the identity matrix in two dimensions. The two Gaussians of the DOG are thus concentric. They have the same principal axes. X i has the physical dimension of a length (mm), thus the entries of C a , a = 1, 2, are expressed in mm 2 . A a , a = 1, . . . , 2, have the dimension of mV so that convolution (1) has the dimension of a voltage.
We model the temporal part of the RF with a difference of non-concentric Gaussians whose integral on the time domain is zero. This kernel well fits the shape of the temporal projection of the bipolar RF observed in experiments [74].
where H(t) is the Heaviside function. The parameters μ b , σ b , b = 1, 2, have the dimension of a time (s), whereas K b are dimensionless. The following condition must hold to ensure the continuity of K T (t) at zero: Thus, K B i (x, y, 0) = 0. In addition, we require that the integral of a constant stimulus converges to zero, so that the cell is only reactive to changes. This reads as follows: where is the cumulative distribution function of the standard Gaussian probability.

B.2 Numerical convolution
Here we describe the method used to numerically integrate convolution (1) of the receptive field K B i ,S (56) in the spatial domain with a stimulus S. For the sake of clarity, we restrict the computation to one Gaussian in the DOG. The extension to a difference of Gaussian is straightforward. In the following, we consider a spatially discretized stimulus. When dealing with a 2D stimulus, we have to integrate over two axes. In the case where the eigenvectors of the 2D of Gaussians are the axes of integration, the spatial filter is separable in the stimulus coordinate system. Considering the stimulus as a grid of pixels, we can integrate using the following discretization: let L x be the size of the stimulus along the x axis in pixels, L y be its size along the y axis, and δ be the pixel length. We set S ij (t) ≡ S(iδ, jδ, u), with i = 0, . . . , L x δ and j = 0, . . . , L y δ . The spatial convolution becomes then In the case where the eigenvectors of the 2D of Gaussians are not the axes of integration, the spatial filter is not separable in the stimulus coordinates system. There exist methods that perform the computation by making a linear combination of basis filters [36], others that use Fourier based deconvolution techniques [83] and others using recursive filtering techniques [26]. However, these methods are of high computational complexity. We choose instead to use a computer vision method from Geusenroek et al. [39].
It is based on a projection in a non-orthogonal basis, where the first axis is x and the second is parametrized by an angle φ (see Fig. 17). The new standard deviations read as follows: x cos θ 2 + σ 2 y sin θ 2 , σ φ = σ 2 y cos θ 2 + σ 2 x sin θ 2 sin φ Figure 17 Filter transformation description [39]. The original system of axes is represented by x and y, and the ellipse system of axes by u and v. φ represents the angle of the second axis of the non-orthogonal basis. The integration domain of a pixel is limited by four lines of equations: x = iδ, x = (i + 1)δ, y = jδ and y = (j + 1)δ. Rewriting these fours equations in the new system of axes through a coordinate change enables us to write Eq. (61) with tan(φ) = σ 2 y cos θ 2 + σ 2 x sin θ 2 (σ 2 xσ 2 y ) cos θ sin θ and σ x = σ y (in the orientation sensitive case). We adapt the implementation to the spatially discretized stimulus using an integration scheme similar to the one introduced in the separable case. The spatial convolution reads now as follows: where s x (resp. s y ) denotes the size of the stimulus in pixels along the x axis (resp. y axis), and C ij is the contrast at the pixel located at (i, j). The integral is then computed numerically. The advantage of this formulation is to replace a two-dimensional integration by a one-dimensional one.

Appendix C: Random connectivity
Here, we define the random connectivity matrix from ACell to BCells considered in Sect. 2.3.3. Each cell (ACell and BCell) has a random number of branches (dendritic tree), each of which has a random length and a random angle with respect to the horizontal axis. The length of branches L follows an exponential distribution with spatial scale ξ . The number of branches n is also a random variable, Gaussian with meann and variance σ n . The angle distribution is taken to be isotropic in the plane, i.e. uniform on [0, 2π[. When a branch of an ACell A intersects a branch of a BCell B, there is a chemical synapse from A to B.
Here, we assume that both cell types have the same probability distributions for branches, thus neglecting the actual shape of ACell and BCell dendritic trees. On biological grounds, this assumption is relevant if we consider the shape of BCell dendritic tree in the inner plexiform layer (IPL) (see e.g https://webvision.med.utah.edu/book/part-iiiretinal-circuits/roles-of-ACell-cells/ Figs. 5,16,17). While out of the IPL, BCells have the form of a dipole, in the IPL their dendrites have a form well approximated by our twodimensional model. A potential refinement would consist of considering different set of parameters in the probability laws respectively defining BCell and ACell dendritic tree.
We show in Fig. 18 an example of connectivity matrix produced this way, as well as the probability that two branches intersect as a function of the distance of the two cells.
We now compute this probability. We use the standard notation in probability theory where the random variable is written in capitals and its realisation in a small letter. Thus, Figure 18 Random connectivity. Left. Example of a random connectivity matrix from ACells A j to BCells cells B i . White points correspond to connection from A j to B i . Right. Probability P(d) that two branches intersect as a function of the distance between two cells. 'Exp' corresponds to numerical estimation and 'Th' corresponds to the theoretical prediction. Here, ξ = 2 F X (x) = P[X < x] is the cumulative distribution function of the random variable X and f X (x) = dF X dx is its density. We consider the oriented connection between the cell A (ACell) of coordinates (x A , y A ) to a cell B (BCell) of coordinates (x B , y B ), so that the distance between the two cells is The vector AB makes an oriented angle η = ( AB, Ax) with the positive horizontal axis, where Here, we neglect the effects of boundaries, taking e.g. an infinite lattice or periodic boundary conditions, so that the probability to connect A to B is invariant by rotation. Thus, we compute this probability in the first quadrant x B > x A , y B > y A . In this case η = arctan y B -y A x B -x A . Each cell has a random number of branches (dendritic tree), each of which has a random length and a random angle with respect to the horizontal axis (Fig. 19). The length of branches L follows the exponential distribution (62): f L (l) = 1 ξ e -l ξ , l ≥ 0, with repartition function The spatial scale ξ favours short range connections. The number of branches N distribution follows a normal distribution with meann and variance σ n . The angle distribution is taken to be isotropic in the plane, i.e. uniform on [0, 2π[. We compute the probability that a branch of ACell A of length L A intersects at point C a branch of BCell B of length L B . We denote by α the oriented angle ( Ax, AC); by β the oriented angle ( Bx, BC); by θ the oriented angle ( AB, AC). In the first quadrant, α = θ + η.
Note that the condition to be in the first quadrant constraints η but not α.
From the sin rule we have sin(β-α+θ) . This holds, however, if A, B, C is a triangle, that is, if the two branches are long enough to intersect at C, which reads: 0 ≤ d AC = sin(β-α+θ) sin(β-α) d AB ≤ L A and 0 ≤ d BC = sin θ sin(β-α) d AB ≤ L B . These are necessary and sufficient conditions for the branches to intersect.

Figure 19
Geometry of connection between two neurons. α (β) is the angle of the neuron A's branch with length L A (neuron B's branch with length L B ) with respect to the horizontal axis. θ is the angle between the segment connecting AB and the branch A. C represents the virtual point that lies at the intersection of the branches of length L A and L B . d AB (resp. d AC , d BC ) denotes the distance between A and B (resp. A, C and B, C).
Note that the positivity of these quantities imposes conditions linking the angles α, β, η with θ = αη.
2. If sin(βα) < 0 ⇔ -π < βα < 0 ⇔ -π + α < β < α, we must have sin θ < 0 ⇔ -π < θ = αη < 0 so that -π + η < α < η and Modulo these conditions, the conditional probability ρ c|(α,β) to have intersection given the angles α, β is Using the cumulative distribution function (64) of the exponential distribution and the independence of L A , L B gives The behaviour of this integral depends on the spectrum of L. The difficulty is that L is not diagonalisable (because of the activity term h B I N,N ). We write it in the following form: We assume that the matrix M is diagonalisable. Even in this case, L is not diagonalisable because of the Jordan matrix J . We denote by φ β the normalised eigenvectors of M and by λ β the corresponding eigenvalues with β = 1, . . . , 2N . The eigenvalues of D are then the 2N eigenvalues of M plus N eigenvalues -1 τ a . We denote them by λ β too, with λ β = -1 τ a , β = 2N + 1, . . . , 3N . The eigenvectors of D have the form where 0 N is the N -dimensional vector with entries 0 and e β is the canonical basis vector in direction β. The matrix P made by the columns P β is the matrix which diagonalises D.

D.2 Spectrum of L and stability of dynamical system (34)
Here, we assume that a BCell connects only one ACell, with a weight w + uniform for all BCells, so that W B A = w + I N,N , w + > 0. We also assume that ACells connect to BCells with a connectivity matrix W, not necessarily symmetric, with a uniform weight -w -, w -> 0, so that W A B = -w -W. We have shown in the previous section that the 2N first eigenvalues and eigenvectors of L are given by the 2N eigenvalues and eigenvectors of M, which reads now as follows: We now show that this specific structure allows us to compute the spectrum of M in terms of the spectrum of W.
As a consequence, in addition to the N last eigenvalues -1 τ A , L admits 2N eigenvalues given by (71), while the 2N first columns of the matrix P (eigenvectors of L) are as follows: For the N last eigenvectors P β = e β , β = 2N + 1, . . . , 3N .
Remark The structure of these eigenvectors is quite instructive. Indeed, the factors ρ ± n control the projection of the eigenvectors P β on the space of ACells, thereby tuning the influence of ACells via lateral connectivity.
Remark When μ = 0, M is diagonal: the N first eigenvalues are -1 τ B , the N next eigenvalues are -1 τ A . We have in this case λ + n = -1 τ B and λn = -1 τ A . Therefore, in order to be coherent with this diagonal form of L when μ = 0, we order eigenvalues and eigenvectors of M such that the N first eigenvalues are λ β = λ + n , β = 1, . . . , N , and the N next are λ β = λn , β = N + 1, . . . , 2N .

D.2.2 Stability of eigenmodes
Stability of eigenmodes when W is symmetric If W is symmetric, its eigenvalues κ n are real, but λ β , β = 1, . . . , 2N , can be real or complex, depending on κ n , as μ is positive.
Proof There are two cases.
Only + is possible (because τ and τ AB are positive). This gives which is possible because κ n < 0. Thus, λ β is unstable if Only -is possible (because τ < 0). This gives 1 -4μκ n > τ 2 τ 2 AB , the same condition as in the previous item.
In this case the real part is -1 2τ AB , the imaginary part is ± 1 2τ √ 1 -4μκ n , and all eigenvalues are stable. If μ ≤ μ n,c , eigenvalues λ β are real and all modes are stable as well. Indeed: -If 1 τ = 0, all eigenvalues are equal to -1 2τ AB , hence are stable.
Instability occurs if a n + u n > 2 τ 2 τ 2 AB , a condition depending on κ n,r and κ n,i .