Shallow neural networks trained to 1 detect collisions recover features of 2 visual loom-selective neurons

Animals have evolved sophisticated visual circuits to solve a vital inference problem: 13 detecting whether or not a visual signal corresponds to an object on a collision course. Such 14 events are detected by speciﬁc circuits sensitive to visual looming, or objects increasing in size. 15 Various computational models have been developed for these circuits, but how the 16 collision-detection inference problem itself shapes the computational structures of these circuits 17 remains unknown. Here, inspired by the distinctive structures of LPLC2 neurons in the visual 18 system of Drosophila , we build anatomically-constrained shallow neural network models and 19 train them to identify visual signals that correspond to impending collisions. Surprisingly, the 20 optimization arrives at two distinct, opposing solutions, only one of which matches the actual 21 dendritic weighting of LPLC2 neurons. Both solutions can solve the inference problem with high 22 accuracy when the population size is large enough. The LPLC2-like solutions reproduces 23 experimentally observed LPLC2 neuron responses for many stimuli, and reproduces canonical 24 tuning of loom sensitive neurons, even though the models are never trained on neural data. 25 Thus, LPLC2 neuron properties and tuning are predicted by optimizing an 26 anatomically-constrained neural network to detect impending collisions. More generally, these 27 results illustrate how optimizing inference tasks that are important for an animal’s perceptual 28 goals can reveal and explain computational properties of speciﬁc sensory neurons.


Introduction
For animals living in dynamic visual environments, it is important to detect the approach of predators or other dangerous objects.Many species, from insects to humans, rely on a range of visual cues to identify approaching, or looming, objects (Regan and Beverley, 1978;Sun and Frost, 1998;Gabbiani et al., 1999;Card and Dickinson, 2008;Münch et al., 2009;Temizer et al., 2015).Looming objects create characteristic visual flow fields.When an object is on a straight-line collision course with an animal, its edges will appear to the observer to expand radially outward, gradually occupying a larger and larger portion of the visual field (Video 1).An object heading towards the animal, but which will not collide with it, also expands to occupy an increasing portion of the visual field, but its edges do not expand radially outwards with respect to the observer.Instead, they expand with (D) The axons of a population of more than 200 LPLC2 neurons converge to the giant fibers, descending neurons that mediate escape behaviors (Ache et al., 2019) respect to the object's center so that opposite edges can move in the same direction across the retina (Video 2).A collision detector must distinguish between these two cases, while also avoiding predicting collisions in response to a myriad of other visual flow fields created by the animal's own motion (Video 4).Thus, loom detection can be framed as a visual inference problem.
Many sighted animals solve this inference problem with high precision, thanks to robust loomselective neural circuits evolved over hundreds of millions of years.The neuronal mechanisms for response to looming stimuli have been studied in a wide range of vertebrates, from cats and mice to zebrafish, as well as in humans (King et al., 1992;Hervais-Adelman et al., 2015;Ball and Tronick, 1971;Liu et al., 2011;Salay et al., 2018;Liu et al., 2011;Shang et al., 2015;Wu et al., 2005;Temizer et al., 2015;Dunn et al., 2016;Bhattacharyya et al., 2017).In invertebrates, detailed anatomical, neurophysiological, behavioral, and modeling studies have investigated loom detection, especially in locusts and flies (Oliva and Tomsic, 2014;Sato and Yamawaki, 2014;Santer et al., 2005;Rind and Bramwell, 1996;Card and Dickinson, 2008;De Vries and Clandinin, 2012;Muijres et al., 2014;Klapoetke et al., 2017;Von Reyn et al., 2017;Ache et al., 2019).An influential mathematical model of loom detection was derived by studying the responses of the giant descending neurons of locusts (Gabbiani et al., 1999).This model established a relationship between the timing of the neurons' peak responses and an angular size threshold for the looming object.Similar models have been applied to analyze neuronal responses to looming signals in flies, where genetic tools make it possible to precisely dissect neural circuits, revealing various neuron types that are sensitive to looming signals (Von Reyn et al., 2017;Ache et al., 2019;Morimoto et al., 2020).
However, these computational studies did not directly investigate the relationship between the structure of the loom-sensitive neural circuits and the inference problem they appear to solve.On the one hand, the properties of many sensory circuits appear specifically tuned to the tasks that they are executing (Turner et al., 2019).In particular, by taking into account relevant behaviors mediated by specific sensory neurons, experiments can provide insight into their tuning properties (Krapp and Hengstenberg, 1996;Sabbah et al., 2017).On the other hand, computational studies that have trained artificial neural networks to solve specific visual and cognitive tasks, such as object recognition or motion estimation, have revealed response patterns similar to the corresponding biological circuits (Yamins et al., 2014;Yamins and DiCarlo, 2016;Richards et al., 2019) or even individual neurons (Mano et al., 2021).Thus, here we ask whether we can reproduce the properties associated with neural loom detection simply by optimizing shallow neural networks for collision detection.
The starting point for our computational model of loom detection is the known neuroanatomy of the visual system of the fly.In particular, the loom-sensitive neuron LPLC2 (lobula plate/lobula columnar, type 2) (Wu et al., 2016) has been studied in detail.These neurons tile visual space, sending their axons to descending neurons called the giant fibers (GFs), which trigger the fly's jumping and take-off behaviors (Tanouye and Wyman, 1980;Card and Dickinson, 2008;Von Reyn et al., 2017;Ache et al., 2019).Each LPLC2 neuron has four dendritic branches that receive inputs at the four layers of the lobula plate (LP) (Figure 1A) (Maisak et al., 2013;Klapoetke et al., 2017).The retinotopic LP layers host the axon terminals of motion detection neurons, and each layer uniquely receives motion information in one of the four cardinal directions (Maisak et al., 2013).Moreover, the physical extensions of the LPLC2 dendrites align with the preferred motion directions in the corresponding LP layers (Figure 1B) (Klapoetke et al., 2017).These dendrites form an outward radial structure, which matches the moving edges of a looming object that expands from the receptive field center (Figure 1C).Common stimuli such as the wide-field motion generated by movement of the insect only match part of the radial structure, and strong inhibition for inward-directed motion suppresses responses to such stimuli.Thus, the structure of the LPLC2 dendrites favors responses to visual stimuli with edges moving radially outwards, corresponding to objects approaching the receptive field center.
The focus of this paper is to investigate how loom detection in LPLC2 can be seen as the solution to a computational inference problem.Can the structure of the LPLC2 neurons be explained in terms of optimization-carried out during the course of evolution-for the task of predicting which trajectories will result in collisions?How does coordination among the population of more than 200 LPLC2 neurons tiling a fly's visual system affect this optimization?To answer these questions, we built simple anatomically-constrained neural network models, which receive motion signals in the four cardinal directions.We trained the model using artificial stimuli to detect visual objects on a collision course with the observer.Surprisingly, optimization finds two distinct types of solutions, with one resembling the LPLC2 neurons and the other having a very different configuration.
We analyzed how each of these solutions detects looming events and where they show distinct individual and population behaviors.When tested on visual stimuli not in the training data, the optimized solutions with filters that resemble LPLC2 neurons exhibit response curves that are similar to those of LPLC2 neurons measured experimentally Klapoetke et al. (2017).Importantly, although it only receives motion signals, the optimized model shows characteristics of an angular size encoder, which is consistent with many biological loom detectors, including LPLC2 (Gabbiani et al., 1999;Von Reyn et al., 2017;Ache et al., 2019).Our results show that optimizing a neural network to detect looming events can give rise to the properties and tuning of LPLC2 neurons.

A set of artificial visual stimuli is designed for training models
Our goal is to compare computational models trained to perform loom-detection with the biological computations in LPLC2 neurons.We first created a set of stimuli to act as training data for the inference task (Methods and Materials).We considered the following four types of motion stimuli: loom-and-hit (abbreviated as hit), loom-and-miss (miss), retreat, and rotation (Figure 2).The hit stimuli consist of a sphere that moves in a straight line towards the origin on a collision course (Figure 3, Video 1).The miss stimuli consist of a sphere that moves in a straight line towards the origin but misses it (Figure 3, Video 2).The retreat stimuli consist of a sphere moving in a straight line away from the origin (Figure 3, Video 3).The rotation stimuli consist of objects rotating about an axis that goes through the origin (Figure 3, Video 4).All stimuli were designed to be isotropic; the first three stimuli could have any orientation in space, while the rotation could be about any axis through the origin (Figure 2).All trajectories were simulated in the frame of reference of the fly at the origin, with distances measured with respect to the origin.For simplicity, the fly is assumed to be a point particle with no volume (Red dots in Figure 2 and the apexes of the cones in Figure 3).
For hit, miss, and retreat stimuli, the spherical object has unit radius, and for the case of rotation, there were 100 objects of various radii scattered isotropically around the fly (Figure 3).

An anatomically-constrained mathematical model
We designed and trained simple, anatomically-constrained neural networks (Figure 4) to infer whether or not a moving object will collide with the fly.The features of these networks were designed to mirror anatomical features of the fly's LPLC2 neurons (Figure 1).We will consider two types of single units in our models (Figure 4 the spacing of the ommatidia and processing columns in the fly eye (Stavenga, 2003).
For the LRF model unit (Figure 4A), there is a single set of four real-valued filters, the elements of which can be positive (excitatory, red) or negative (inhibitory, blue).The four filters integrate motion signals from the four cardinal directions, respectively, and their outputs are summed and rectified to generate the output of a single model unit (Figure 4A).These spatial filters effectively represent excitatory inputs to LPLC2 directly from T4 and T5 in the LP (positive elements), and inhibitory inputs mediated by local interneurons (negative elements) (Mauss et al., 2015;Klapoetke et al., 2017).All filters act on the 60 degree receptive field of a unit.A 90-degree rotational symmetry is imposed on the filters, so that the filters in each layer are identical.Moreover, each filter is symmetric about the axis of motion (Methods and Materials).Although these symmetry assumptions are not necessary and may be learned from training (Figure 5  about the structure of the filters.Note that the opposing spatial patterns of the excitatory and inhibitory components in Figure 4 are only for illustration purposes, and are not imposed on the models.The LRF model unit is equivalent to a linear-nonlinear model and constitutes one of the simplest possible models for the LPLC2 neurons.In this work, we will focus most of our analysis on this simplest form of the model. In addition, we will also consider a more complex model unit, which we call the rectified inhibition (RI) model unit (Figure 4B).In this unit, all the same filter symmetries are enforced, but there are two sets of non-negative filters: a set of excitatory filters and a set of inhibitory filters.The RI unit incorporates a fundamental difference between the excitatory and inhibitory filters: while the integrated signals from each excitatory filter are sent directly to the downstream computations, the integrated signals from each inhibitory filter are rectified before being sent downstream.The outputs of the eight filters are summed and rectified to generate the output of a single model unit in response to a given stimulus (Figure 4B).If one removes the rectification of the inhibitory filters (as is possible with an appropriate choice of parameters), the RI unit becomes equivalent to the LRF unit (Methods and Materials).This difference between the two model units reflects different potential constraints on the inhibitory inputs to an actual LPLC2 neuron.While the excitatory inputs to LPLC2 are direct connections, the inhibitory inputs are mediated by inhibitory interneurons (LPi) between LP layers (Mauss et al., 2015;Klapoetke et al., 2017).The LRF model unit assumes that LPi interneurons are linear and do not strongly rectify signals, while the RI model unit approximates rectified transmission by LPi interneurons.
In the fly brain, a population of LPLC2 neurons converges onto the GFs (Figure 1D).Accordingly, in our model there are  replicates of model units, with orientations that are spread uniformly over the 4 steradians of the unit sphere (Figure 4C and Materials).In this way, the receptive fields of the  units roughly tile the whole angular space, with or without overlap, depending on the value of .The sum of the responses of the  model units is fed into a sigmoid function to generate the predicted probability of collision for a given trajectory (Methods and Materials).The loss function then is defined as the cross entropy between the predicted probabilities and the stimulus labels (hits are labeled 1 and all others are labeled 0).

Optimization finds two distinct solutions to the loom-inference problem
The objective of this study is to investigate how the binary classification task shapes the structure of the filters, and how the number of units  affects the results.We begin with the simplest LRF model, which possesses only a single unit,  = 1.After training with 200 random initializations of the filters, we find that the converged solutions fall into three broad categories (Figure 5 As the number of units  increases, the population of units covers more angular space, and when  is large enough ( ≥ 16), the receptive fields of the units begin to overlap with one another (Figure 6A).In the fly visual system there are over 200 LPLC2 neurons across both eyes (Ache et al., 2019), which corresponds to a dense distribution of units.This is illustrated by the third row in Figure 6A, where  = 256.When  is large, objects approaching from any direction are detectable, and such object signals can be detected simultaneously by many neighboring units.

Σ
Output: probability of hit  1).Each filter has real-valued elements, and if the element is positive (negative), it is excitatory (inhibitory), represented by the color red (blue).The opposing spatial arrangement of the excitatory and inhibitory filters are illustrative, and do not represent constraints on the model.Each filter receives a field of motion signals from the corresponding layer of the model LP (fourth to seventh rows in Figure 3), indicated by the four black arrows (Figure 1).The four filtered signals are summed together before a rectifier is applied to produce the output, which is the response of a single unit.(B) Single RI model unit.There are two sets of nonnegative filters: excitatory (red) and inhibitory (blue).Each set has four filters, and each filter receives the same motion signals as the corresponding one in the LRF unit.The weighted signals from the excitatory filters and the inhibitory filters (rectified) are pooled together before a rectifier is applied to produce the output, which is the response of a single unit.When the inhibitory filters are not rectified, this model effectively reduces to the LRF model in (A) (Methods and Materials).(C) The outputs from  units are summed and fed into a sigmoid function to estimate the probability of hit.(D) The  units have their orientations almost evenly distributed in angular space.Red dots represent the centers of the receptive fields and the grey lines represent the boundaries of the receptive fields on unit sphere.The red lines are drawn from the origin to the center of each receptive field.to extend across the entire receptive field (Figure 6B).This broad inhibition is consistent with the large receptive field of LPi neurons suggested by experiments (Mauss et al., 2015;Klapoetke et al., 2017).The outward, inward, and zero solutions all also appear in trained solutions of the RI models.

Units with outward and inward filters respond to hits originating in distinct regions
To understand the differences between the two types of solutions and why the inward ones can predict collisions, we investigated how units respond to hit stimuli originating at different angles  (Figure 7A).When there is no signal, the baseline activity of outward units is zero; however, the baseline activity of inward units is above zero (grey dashed lines in Figure 7B, C).This is because the trained intercepts are negative (positive) in the outward (inward) case and when the input is zero (no signal), the unit activity cannot (can) get through the rectifier (Methods and Materials).(The training did not impose any requirements on these intercepts.)The outward units respond strongly to stimuli originating near the center of the receptive field, but do not respond to stimuli originating at angles larger than approximately 30 • (Figure 7B, C).In contrast, inward units respond below baseline to hit stimuli approaching from the center and above baseline to stimuli approaching from the periphery of the receptive field, with  between roughly 30 • and 90 • (Figure 7B, C).This helps explain why the inward units can act as loom detectors: they are sensitive to hit stimuli coming from the edges of the receptive field rather than from the center.The hit stimuli are isotropic (Figure 2A), so the number of stimuli with angles between 30 • and 90 • is much larger than the number of stimuli with angles below 30 • (Figure 7D).Thus, the inward units are sensitive to more hit cases than the outward ones.One may visualize these responses as heat maps of the mean response of the units in terms of object distance to the fly and the incoming angle (Figure 7E).
For the hit cases, the response patterns are consistent with the intuition about trajectory angles (Figure 7C).Both outward and inward units respond less strongly to miss signals than to hit signals.
As expected, while the outward units respond at most weakly to retreating signals, the inward ones respond to these signals with angles near 180 • , since the motion of edges in such cases is radially inward.The RI model units have similar response patterns to the LRF model units (Figure 7

Outward solutions have sparse codings and populations of units accurately predict hit probabilities
Individual units of the two solutions are very different from one another in both their filter structure and their response patterns to different stimuli.In populations of units, the outward and inward solutions also exhibit very different response patterns for a given hit stimulus (Figure 8A, B, Video 5 and Video 6).In particular, active outward units usually respond more strongly than inward units, but more inward units will be activated by a hit stimulus.This is consistent with the findings above, in which inward filter shapes responded to hits arriving from a wider distribution of angles (Figure 7).For all the four types of stimuli, the outward solutions generally show relatively sparse activity among units, especially for models with larger numbers of units , while the inward solutions show broader activity among units (Figure 8B When a population of units encodes stimuli, at each time point, the sum of the activities of the units is used to infer the probability of hit.In our trained models, the outward and inward solutions predict similar trajectories of probabilities of hit (Figure 8A).The outward solutions suppress the miss and retreat signals better, while the inward solutions better suppress rotation signals (Fig- ure 8C).Both 32 unit and 256 unit models have units covering the entire visual field (Figure 6), but the models with 256 units can more accurately detect hit stimuli (Figure 8C).In some cases, misses can appear very similar to hits if the object passes near the origin.Both inward and outward solutions reflect this in their predictions in response to near misses, which have higher hit probabilities than far misses (Figure 8D).
The inward and outward solutions of RI models have similar behaviors to the LRF models (Fig-

Large populations of units improve performance
Since a larger number of units will cover a larger region of the visual field, a larger population of units can in principle provide more information about the incoming signals.In general, the models perform better as the number of units  increases (Figure 9A).When  is above 32, both the ROC-AUC and PR-AUC scores are almost 1 (Methods and Materials), which indicates that the model is very accurate on the binary classification task presented by the four types of synthetic stimuli.As  increases, the outward solutions become closer to the inward solutions in terms of both AUC score and cross entropy loss (Figure 9A, B).This is also true for the RI models (Figure 9 The current binary prediction task is relatively easy for our loom detection models, as can be seen by the saturated AUC scores when  is large (Figure 9A, Figure 9-Figure Supplement 1A).
Thus, we engineered a new set of stimuli, where for the hit, miss, and retreat cases, we added a rotational background to increase the difficulty of the task.The object of interest, which is moving toward or away from the observer, also rotates with the background.This arrangement mimics self-rotation of the fly while observing a looming or retreating object in a cluttered background.
To train an LRF model, we added such rotational distraction to half of the hit, miss, and retreat cases.The outward and inward solutions both persist (Figure 9

Activation patterns of computational solutions resemble biological responses
The outward solutions have a receptive field structure that is similar to LPLC2 neurons, based on anatomical and functional studies.However, it is not clear whether these models possess the functional properties of LPLC2 neurons, which have been studied systematically (Klapoetke et al., 2017;Ache et al., 2019).To see how trained units compare to LPLC2 neuron properties, we presented stimuli to the trained outward model solution to compare its responses to those measured in LPLC2 (Figure 10).nomenon is successfully predicted by the outward unit (Figure 10E, F) due to its broad inhibitory filters (Figure 10A).The unit also correctly predicts response patterns of the LPLC2 neurons for expanding bars with different orientations (Figure 10G, H).
The ratio of object size to approach velocity, or ∕, is an important parameter for looming There are two ways that this angular size tuning likely arises.First, in hit stimuli, the angular size and angular velocity are strongly correlated (Gabbiani et al., 1999), which means the angular size affects the magnitude of the motion signals.Second, in hit stimuli, the angular size is proportional to the path length of the outward-moving edges.This angular circumference of the hit stimulus determines how many motion detectors are activated, so that integrated motion signal strength is related to the size.Both of these effects influence the response patterns of the model units (and the LPLC2 neurons).Beyond the tuning to stimulus size, the outward model also reproduces a canonical linear relationship between the peak response time relative to the collision and the ∕ ratio (Figure 10L) (Gabbiani et al., 1999;Ache et al., 2019).
Not surprisingly, the inward solution cannot reproduce the neural data (Figure 10 , and so on.This shows that, for the RI models, even within the family of learned outward solutions, there is variability in the learned response properties.Though solving the inference problem with the RI model obtains many of the neural response properties, additional constraints could be required.These additional constraints may be built into the LRF model, causing its outward solutions to more closely match neural responses.

Discussion
In this study, we have shown that training a simple network to detect collisions gives rise to a computation that closely resembles neurons that are sensitive to looming signals.Specifically, we optimized a neural network model to detect whether an object is on a collision course based on visual motion signals (Figure 3), and found that one class of optimized solution matched the anatomy of motion inputs to LPLC2 neurons (Figure 1, Figure 5, Figure 6).Importantly, this solution reproduces a wide range of experimental observations of LPLC2 neuron responses (Figure 10) (Klapoetke et al.,

2017; Von Reyn et al., 2017; Ache et al., 2019).]
The radially structured dendrites of the LPLC2 neuron in the LP can account for its response to motion radiating outward from the receptive field center (Klapoetke et al., 2017).Our results show that the logic of this computation can be understood in terms of inferential loom detection by the population of units.In particular, for an individual detector unit, an inward structure can make a better loom detector than an outward structure, since it is sensitive to colliding objects originating from a wider array of incoming angles (Figure 7).As the number of units across visual space increases, the performance of the outward-sensitive receptive field structure comes to match the performance of the inward solutions (Figure 9 between the peak time relative to collision and the ∕ ratio, which should be followed by loom sensitive neurons that encode angular size (Peek and Card, 2016).In both cases, these properties appear to be the simple result of training the constrained model to reliably detect looming stimuli.
The units of the outward solution exhibit sparsity in their responses to looming stimuli, in contrast to the denser representations in the inward solution (Figure 8).During a looming event, in an outward solution, most of the units are quiet and only a few adjacent units have very large activities, reminiscent of sparse codes that seem to be favored, for instance, in cortical encoding of visual scenes (Olshausen andField, 1996, 1997).Since the readout of our model is a summation of the activities of the units, sparsity does not directly affect the performance of the model, but is an attribute of the favored solution.For a model with a different loss function or with noise, the degree of sparsity might be crucial.For instance, the sparse code of the outward model might make  (B-H) Comparisons of the responses of the unit with the trained filter in (A) and LPLC2 neurons to a variety of stimuli (Methods and Materials).Black lines: data (Klapoetke et al., 2017); magenta lines: LRF unit.Compared with the original plots (Klapoetke et al., 2017), all the stimulus icons here except the ones in (B) have been rotated 45 degrees to match the cardinal directions of LP layers as described in this study.All response curves are normalized by the peak value of the left most panel in (B).(I) Top: temporal trajectories of the angular sizes for different ∕ ratios (color labels apply throughout (I-L)) (Methods and Materials).Middle: response as a function of time for the sum of all 256 units.Bottom: response as a function of time for one of the 256 units.(J-L) Top: experimental data (LPLC2/non-LC4 components of GF activity.Data from (Von Reyn et al., 2017;Ache et al., 2019)).Middle: sum of all 256 units.Bottom: response of one of the 256 units.Responses as function of angular size (J), response as function of angular velocity (K), relationship between peak time relative to collision and ∕ ratios (L).We considered the first peak when there were two peaks in the response, such as in the grey curves in the middle panel of (I).   it easier to localize a hit stimulus (Morimoto et al., 2020), or might make the population response more robust to noise (Field, 1994).
Experiments have shown that inhibitory circuits play an important role for the selectivity of LPLC2 neurons.For example, motion signals at the periphery of the receptive field of an LPLC2 neuron inhibit its activity.This peripheral inhibition causes various interesting response patterns of the LPLC2 neurons to different types of stimuli (Figure 10E, F) (Klapoetke et al., 2017).However, the structure of this inhibitory field is not fully understood, and our model provides a tool to investigate how the inhibitory inputs to LPLC2 neurons affect circuit performance on loom detection tasks.The strong inhibition on the periphery of the receptive field arises naturally in the outward solutions after optimization.The extent of the inhibitory components increases as more units are added to models (Figure 6).The broad inhibition appears in our model to suppress responses to the nonhit stimuli, and as in the data, the inhibition is broader than one might expect if the neuron were simply being inhibited by inward motion.These larger inhibitory fields are also consistent with the larger spatial pooling likely to be supplied by inhibitory LPi inputs (Klapoetke et al., 2017).
The synthetic stimuli used to train models in this study were unnatural in two ways.The first way was in the proportion of hits and non-hits.We trained with 25% of the training data representing hits.The true fraction of hits among all stimuli encountered by a fly is undoubtedly much less, and this affects how the loss function weights different types of errors.It is also clear that a falsepositive hit (in which a fly might jump to escape an object not on collision course) is much less penalized during evolution than a false-negative (in which a fly doesn't jump and an object collides, presumably to the detriment of the fly).It remains unclear how to choose these weights in the training data or in the loss function, but they affect the receptive field weights optimized by the model.
The second issue with the stimuli is that they were caricatures of stimulus types, but did not incorporate the richness of natural stimuli.This richness could include natural textures and spatial statistics (Ruderman and Bialek, 1994), which seem to impact motion detection algorithms (Fitzgerald and Clark, 2015;Leonhardt et al., 2016;Chen et al., 2019).This richness could also include more natural trajectories for approaching objects.Another way to enrich the stimuli would be to add noise, either in inputs to the model or in the model's units themselves.We explored this briefly by adding self-rotation-generated background motion; under those conditions, both solutions were present but optimized outward solutions performed better than the inward solutions (Figure 9-Figure Supplement 3, Methods and Materials).This indicates that the statistics of the stimuli may play an important role in selecting solutions for loom detection.However, it remains less clear what the true performance limits of loom detection are, since most experiments use substantially impoverished looming stimuli.Moreover, it is challenging to characterize the properties of natural looming events.An interesting future direction will be to investigate the effects of more complex and naturalistic stimuli on the model's filters and performance, as well as on LPLC2 neuron responses themselves.
For simplicity, our models did not impose the hexagonal geometry of the compound eye ommatidia.Instead, we assumed that the visual field is separated into a Cartesian lattice with 5 • spacing, each representing a local motion detector with two spatially separated inputs (Figure 3).This simplification alters slightly the geometry of the motion signals compared to the real motion detector receptive fields (Shinomiya et al., 2019).This could potentially affect the learned spatial weightings and reproduction of the LPLC2 responses to various stimuli, since the specific shapes of the filters matter (Figure 10).Thus, the hexagonal ommatidial structure and the full extent of inputs to T4 and T5 might be crucial if one wants to make comparisons with the dynamics and detailed responses of LPLC2 neurons.However, this geometric distinction seems unlikely to affect the main results of how to infer the presence of hit stimuli.
Our model requires a field of estimates of the local motion.Here, we used the simplest model -the Hassenstein-Reichardt correlator model Equation 3 (Methods and Materials) (Hassenstein and Reichardt, 1956) -but the model could be extended by replacing it with a more sophisticated model for motion estimation.Some biophysically realistic ones might take into account synaptic conductances (Gruntman et al., 2018(Gruntman et al., , 2019;;Badwan et al., 2019;Zavatone-Veth et al., 2020) and could respond to static features of visual scenes (Agrochao et al., 2020).Alternatively, in natural environments, contrasts fluctuate in time and space.Thus, if one includes more naturalistic spatial and temporal patterns, one might consider a motion detection model that could adapt to changing contrasts in time and space (Drews et al., 2020;Matulis et al., 2020).
Although the outward filter of the unit emerges naturally from our gradient descent training protocol, that does not mean that the structure is learned by LPLC2 neurons in the fly.There may be some experience dependent plasticity in the fly eye (Kikuchi et al., 2012), but these visual computations are likely to be primarily genetically determined.Thus, one may think of the computation of the LPLC2 neuron as being shaped through millions of years of evolutionary optimization.Optimization algorithms at play in evolution may be able to avoid getting stuck in local optima (Stanley et al., 2019), and thus work well with the sort of shallow neural network found in the fly eye.
In this study, we focused on the motion signal inputs to LPLC2 neurons, and we neglected other inputs to LPLC2 neurons, such as those coming from the lobula that likely report non-motion visual features.It would be interesting to investigate how this additional non-motion information affects the performance and optimized solutions of the inference units.For instance, another lobula columnar neurons, LC4, is loom sensitive and receives inputs in the lobula (Von Reyn et al., 2017).The LPLC2 and LC4 neurons are the primary excitatory inputs to the GF, which mediates escape behaviors (Von Reyn et al., 2014;Ache et al., 2019).The inference framework set out here would allow one to incorporate parallel non-motion intensity channels, either by adding them into the inputs to the LPLC2-like units, or by adding in a parallel population of LC4-like units.This would require a reformulation of the probabilistic model in Equation 6. Notably, one of the most studied loom detecting neurons, the lobula giant movement detector (LGMD) in locusts, does not appear to receive direction-selective inputs, as LPLC2 does (Rind and Bramwell, 1996;Gabbiani et al., 1999).
Thus, the inference framework set out here could be flexibly modified to investigate loom detection under a wide variety of constraints and inputs, which allow it to be applied to other neurons, beyond LPLC2.
Each motion detector is assumed to be a Hassenstein Reichardt Correlator (HRC) and calculates local flow fields from (  ,   , ) (Hassenstein and Reichardt, 1956) (Figure 3

-Figure Supplement 1).
The HRC used here has two inputs, separated by 5 • in angular space.Each input applies first a spatial filter on the contrast (  ,   , ) and then temporal filters: where   ( ∈ 1, 2) is a temporal filter and  is a discrete 2d Gaussian kernel with mean 0 • and standard deviation of 2.5 • to approximate the acceptance angle of the fly photoreceptors (Stavenga, 2003).The temporal filter  1 was chosen to be an exponential function  1 ( as the local flow field at time  between two inputs located at ( 1 ,  1 ) and ( 2 ,  2 ).
Four types of T4 and T5 neurons have been found that project to layers 1, 2, 3, and 4 of the LP.Each type is sensitive to one of the cardinal directions: down, up, left, right (Maisak et al., 2013).Thus, in our model, there are four non-negative, local flow fields that serve as the only inputs to the model:  − () (downward, corresponding LP layer 4),  + () (upward, LP layer 3),  − () (leftward, LP layer 1) and  + () (rightward, LP layer 2), each of which is a -by- matrix.To calculate these matrices, two sets of motion detectors are needed, one for the vertical directions and one for the horizontal directions.The HRC model in Equation 3 is direction sensitive and is opponent, meaning that for motion in the preferred (null) direction, the output of the HRC model is positive (negative).Thus, assuming that upward (rightward) is the preferred vertical (horizontal) direction, we obtain the non-negative elements of the four flow fields as where  1 ,  2 ∈ {1, 2, … , }.In the above expressions, for [ − ()]  1  2 and [ + ()]  1  2 , the vertical motion detector at ( 1 ,  2 ) has its two inputs located at ( 1 ,   ) and ( 2 ,   ), respectively.Similarly, for for [ − ()]  1  2 and [ + ()]  1  2 , the horizontal motion detector at ( 1 ,  2 ) has its two inputs located at (  ,  1 ) and (  ,  2 ).Using the opponent HRC output as the motion signals for each layer is reasonable because the motion detectors T4 and T5 are highly direction-selective over a large range of inputs (Maisak et al., 2013;Creamer et al., 2018) and synaptic, 3-input models for T4 are approximately equivalent to opponent HRC models (Zavatone-Veth et al., 2020).
We simulated the trajectories () of the object in the frame of reference Σ at a time resolution of 0.01 seconds, which is also be the time step of the training and testing stimuli.For hit, miss, and retreat cases, the trajectories of the object are always straight lines, and the velocities of the object were randomly sampled from a range [2, 10]( −1 ) with the trajectories confined to be within a sphere of 5 centered at the fly head.The radius of the object, , is always set to be 1 except in the rotational stimuli.To generate rotational stimuli, we placed 100 objects with various radii selected uniformly from [0, 1] at random distances ([5, 15]) and positions around the fly, and rotated them all around a randomly chosen axis.The rotational speed was chosen from a Gaussian distribution with mean 0 • ∕ and standard deviation 200 • ∕, a reasonable rotational velocity for walking flies (DeAngelis et al., 2019).In one case, training data included both an object moving and global rotation due to self-rotation (Figure 9-Figure Supplement 3).These stimuli were simply combinations of the rotational stimuli with the other three cases (hit, miss, and retreat), so that the object that moves in the depth dimension also rotates together with the background.
We reproduced a range of stimuli used in a previous study (Klapoetke et al., 2017) and tested them on our trained model (Figure 10B-H).To match the cardinal directions of LP layers (Figure 1), we have rotated the stimuli (except in Figure 10B) 45 degrees compared with the ones displayed in the figures in (Klapoetke et al., 2017).The disc (Figure 10B (all the rest).All the responses of the models (except in Figure 10B) have been normalized by the peak of the response to the expanding disc (Figure 10B).
We created a range of hit stimuli with various ∕ ratios: 0.01 , 0.02 , 0.04 , 0.08 , 0.10 , 0.12 , 0.14 , 0.16 , 0.18 , 0.20 .The radius  of the spherical object is fixed to be 1, and the velocity is changed accordingly to achieve different ∕ ratios.

Models
LPLC2 neurons have four dendritic structures in the four LP layers, and they receive direct excitatory inputs from T4/T5 motion detection neurons (Maisak et al., 2013;Klapoetke et al., 2017).It has been proposed that each dendritic structure also receives inhibitory inputs mediated by lobulate plate intrinsic interneurons, such as LPi4-3 (Klapoetke et al., 2017).We built two model units to approximate this anatomy.
LRF models A linear receptive field (LRF) model is characterized by a real-valued filter, represented by a 12-by-12 matrix  r (Figure 4A).The elements of the filter combine the effects of the excitatory and inhibitory inputs and can take both positive (stronger excitation) and negative (stronger inhibition) values.We rotate  r counterclockwise by multiples of 90 • to obtain the filters that are used to integrate the four motion signals:  − (),  + (),  − (),  + ().Specifically, we define the corresponding four filters as:  r = rotate( r , 0 • ).In addition, we impose mirror symmetry on the filters, and with the above definitions of the rotated filters, the upper half of  r is a mirror image of the lower half of  r .Thus, there are in total 72 parameters in the filters.In fact, since only the elements within a 60 degree cone contribute to the filter for the units, the corners are excluded, resulting in only 56 trainable parameters.
In computer simulations, the filters or weights and flow fields are flattened to be one-dimensional column vectors.The response of a single LRF unit  is: where (⋅) = max(⋅, 0) is the rectified linear unit (ReLU), and  r ∈ ℝ is the intercept (Figure 4A).
The ReLU is used as the activation function in all the figures, except in one panel (Figure 9-Figure Supplement 2), where an exponential linear unit is used: For  = 1, the LRF model is very close to a generalized linear model, except that it includes an additional activation function .This activation function changes the convexity of the model to make it a non-convex optimization problem, in general.

RI models
The rectified inhibition (RI) models have two types of nonnegative filters, one excitatory and one inhibitory, represented by  e and  i , respectively (Figure 4B).Each filter is a 12-by-12 matrix.The same rotational and mirror symmetries are imposed as in the LRF models, which leads to four excitatory filters as:  e = rotate( i , 0 • ).Thus, there are in total 112 parameters in the two sets of filters, excluding the elements in the corners.
The responses of the inhibitory units are: where (⋅) = max(⋅, 0) is the ReLU, and  i ∈ ℝ is the intercept.In the RI model, the rectification of each inhibitory layer is motivated by the LPi neurons, which mediate the inhibition within each layer, and could themselves rectify their inhibitory input into LPLC2.)   − ()− where  e ∈ ℝ is the intercept (Figure 4B).Interestingly, the RI models become equivalent to the LRF models if we remove the ReLU in the inhibitions and define  r and  r =  e − 4 i .
For both the LRF and RI models, the inferred probability of hit for a specific trajectory is where  is the total number of time steps in the trajectory and (⋅) is the sigmoid function.Since we are adding two intercepts ( r , and ) to the LRF models, and three intercepts ( i ,  e , and ) to the RI models, there are 58 and 115 parameters to train the two models, respectively.

Training and testing
We created a synthetic data set containing four types of motion: loom-and-hit, loom-and-miss, retreat, and rotation.The proportions of these types were 0.25, 0.125, 0.125, and 0.5, respectively.In total, there were 5,200 trajectories, with 4,000 for training and 1,200 for testing.Trajectories with motion type loom-and-hit are labeled as hit or   = 1 (probability of hit is 1), while trajectories of other motion types are labeled as non-hit or   = 0 (probability of hit is 0), where  is the index of each specific sample.Models with smaller  have fewer trajectories in the receptive field of any unit.For stability of training, we therefore increased the number of trajectories by factors of eight, four, and two for  = 1, 2, 4, respectively.
The loss function to be minimized in our training was the cross entropy between the label   and the inferred probability of hit Phit , and averaged across all samples, together with a regularization term: where Phit () is the inferred probability of hit for sample ,  is the strength of the  2 regularization, and  represents all the effective parameters in the two excitatory and inhibitory filters.
The strength of the regularization  was set to be 10 −4 , which was obtained by gradually in-creasing  until the performance of the model on test data started to drop.The regularization sped up convergence of solutions, but the regularization strength did not strongly influence the main results in the paper.
To speed up training, rather than taking a temporal average as shown in Equation 6, a snapshot was sampled randomly from each trajectory, and the probability of hit of this snapshot was used to represent the whole trajectory, i.e., Phit =  (∑    () +  ) , where  is a random sample from {1, 2, … ,  }.Mini-batch gradient descent was used in training, and the learning rate was 0.001.
After training, the models were tested on the entire trajectories with the probability of hit defined in Equation 6. Models trained only on snapshots performed well on the test data.During testing, the performance of the model was evaluated by the area under the curve (AUC) of the receiver operating characteristic (ROC) and precision-recall (PR) curves (Hanley and McNeil, 1982;Davis and Goadrich, 2006).TensorFlow (Abadi et al., 2016) was used to train all models.

Clustering the solutions
We used the following procedure to cluster the solutions.For the LRF models, the filter of each solution was simply flattened to form a vector.But, for the RI mdoels, each solution had an excitatory and an inhibitory filter.We flattened these two filters, and concatenated them into a single vector.(The elements at the corners were deleted since they are outside of the receptive field.)Thus, each solution was represented by a vector, from which we calculated the cosine distance for each pair of solutions.The obtained distance matrix was then fed into a hierarchical clustering algorithm (Virtanen et al., 2020).After obtaining the hierarchical clustering, the outward and inward filters were identified by their shape.We counted the positive filter elements corresponding to flow fields with components radiating outward and subtracted the number of positive filter elements corresponding to flow fields with components directed inward.If the resulting value was positive, the filters were labeled as outward; otherwise, the filters were labeled as inward.We could also sum the elements rather than count them, but we found that the latter was more robust.If the elements in the concatenated vector were all close to zero, then the corresponding filters were labeled as zero solutions.

Statistics
To calculate the fraction of active units for the model with  = 256 (Figure 8B), we examined the response curves of each unit to all trajectories of a specific type of stimuli.If a unit response is above baseline (dotted lines in Figure 7B), then the unit is counted as active.For each trajectory/stimulus, we obtained the number of active units.We used this number to calculate the mean and standard deviation of active units across all the trajectories within each type of stimulus (hit, miss, retreat, rotation).
From this, we calculate the error bars as the propagated standard deviation (Morgan et al., 1990): Σ  ( = 1, 2, … , ) are associated with each local model unit, the center of which is represented by a red dot.The origin of the Σ  coincides with the origin of Σ, but for better illustration, we have translated its origin radially along   so that all the Σ  's sit on a sphere.For Σ  , only   and   are shown, and   axis should be chosen such that Σ  is right-handed.For the unit coordinate systems,   is chosen to be normal to the unit sphere while   points north tangent to the sphere.This system does not impose the left-right mirror symmetry of the fly eyes.Responses are plotted in this way because the dynamic range of responses of this model to these stimuli does not often exceed the trained threshold.

Figure 1 .
Figure 1.Sketches of the anatomy of LPLC2 neurons (Klapoetke et al., 2017).(A) An LPLC2 neuron has dendrites in lobula and the four layers of the lobula plate (LP): LP1, LP2, LP3 and LP4.(B) Schematic of the four branches of the LPLC2 dendrites in the four layers of the LP.The arrows indicate the preferred direction of motion sensing neurons with axons in each LP layer (Maisak et al., 2013).(C) The outward dendritic structure of an LPLC2 neuron is selective for the outwardly expanding edges of a looming object (black circle).(D)The axons of a population of more than 200 LPLC2 neurons converge to the giant fibers, descending neurons that mediate escape behaviors(Ache et al., 2019)

Video 1 .Figure 3 .Figure 2 .
Figure 3. Since we combined left (down) and right (up) flow fields in one panel, we used blue and red colors to indicate left (down) and right (up) directions, respectively.The movie has been slowed down by a factor of 5.All the movies shown in this paper can be found here: https://github.com/ClarkLabCode/LoomDetectionANN/tree/main/results/movies_exp.
): a linear receptive field (LRF) unit and a rectified inhibition(RI)      unit.Both types of model units receive input from a 60 degree diameter cone of visual space, represented by white cones and grey circles in Figure3, approximately the same size as the receptive fields measured in LPLC2(Klapoetke et al., 2017).The four stimulus sets were projected into this receptive field for training and evaluating the models.The inputs to the units are local directional signals computed in the four cardinal directions at each point of the visual space: downward, upward, rightward, and leftward (Figure3).These represent the combined motion signals from T4 and T5 neurons in the four layers of the lobula plate(Maisak et al., 2013).They are computed as the non-negative components of a Hassenstein-Reichardt correlator model(Hassenstein and Reichardt, 1956) in both horizontal and vertical directions (Figure3-Figure Supplement 1A, Methods and Materials).The motion signals are computed with a spacing of 5 degrees, roughly matching

Figure 3 .
Figure 3. Snapshots of optical flows and flow fields calculated by a Hassenstein Reichardt correlator (HRC) model (Figure 3-Figure Supplement 1A, Methods and Materials) for the 4 types of stimuli (Figure 2).First row: 3d rendering of the spherical objects and the LPLC2 receptive field (represented by a cone) at a specific time in the trajectory.The orange arrows indicate the motion direction of each object.Second row: 2d projections of the objects (black shading) within the LPLC2 receptive field (the grey circle).Third row: the thin black arrows indicate flow fields generated by the edges of the moving objects.Forth to seventh rows: decomposition of the flow fields in the four cardinal directions with respect to the LPLC2 neuron under consideration: downward, upward, rightward, and leftward, as indicated by the thick black arrows.These act as models of the motion signal fields in each layer of the LP.Figure 3-Figure supplement 1. Tuning curve of HRC motion estimator and distributions of the estimated flow fields.

Figure 3 -
Figure 3. Snapshots of optical flows and flow fields calculated by a Hassenstein Reichardt correlator (HRC) model (Figure 3-Figure Supplement 1A, Methods and Materials) for the 4 types of stimuli (Figure 2).First row: 3d rendering of the spherical objects and the LPLC2 receptive field (represented by a cone) at a specific time in the trajectory.The orange arrows indicate the motion direction of each object.Second row: 2d projections of the objects (black shading) within the LPLC2 receptive field (the grey circle).Third row: the thin black arrows indicate flow fields generated by the edges of the moving objects.Forth to seventh rows: decomposition of the flow fields in the four cardinal directions with respect to the LPLC2 neuron under consideration: downward, upward, rightward, and leftward, as indicated by the thick black arrows.These act as models of the motion signal fields in each layer of the LP.Figure 3-Figure supplement 1. Tuning curve of HRC motion estimator and distributions of the estimated flow fields.

, Figure 5 -Figure
Figure Supplement 1).Two solutions have spatial structures that are-surprisingly-roughly opposite from one another (magenta and green).Based on the configurations of the positive-valued elements (stronger excitation) of the filters (Methods and Materials), we call one solution type outward solutions (magenta) and the other type inward solutions (green) (Figure5Cand Figure 5-Figure Supplement 1).In this single-unit model, the inward solutions have higher area under the curve (AUC) scores for both receiver operating characteristic (ROC) and precision recall curves, and thus perform better than the outward solutions on the discrimination task (Figure 5D).A third category of solution has all the elements in the filters very close to zero (zero solutions in black squares, Figure 5-Figure Supplement 1).This solution appears roughly 5-15% of the time, and appears to be a local minimum of the optimization, dependent on the random initialization.This uninteresting category of solutions is ignored in subsequent analyses.

Figure 4 -
Figure 4. Schematic of the models (Methods and Materials).(A) Single LRF model unit.There are four linear spatial filters, labeled LP4, LP3, LP2, and LP1, which correspond to the four LP layers (Figure1).Each filter has real-valued elements, and if the element is positive (negative), it is excitatory (inhibitory), represented by the color red (blue).The opposing spatial arrangement of the excitatory and inhibitory filters are illustrative, and do not represent constraints on the model.Each filter receives a field of motion signals from the corresponding layer of the model LP (fourth to seventh rows in Figure3), indicated by the four black arrows (Figure1).The four filtered signals are summed together before a rectifier is applied to produce the output, which is the response of a single unit.(B) Single RI model unit.There are two sets of nonnegative filters: excitatory (red) and inhibitory (blue).Each set has four filters, and each filter receives the same motion signals as the corresponding one in the LRF unit.The weighted signals from the excitatory filters and the inhibitory filters (rectified) are pooled together before a rectifier is applied to produce the output, which is the response of a single unit.When the inhibitory filters are not rectified, this model effectively reduces to the LRF model in (A) (Methods and Materials).(C) The outputs from  units are summed and fed into a sigmoid function to estimate the probability of hit.(D) The  units have their orientations almost evenly distributed in angular space.Red dots represent the centers of the receptive fields and the grey lines represent the boundaries of the receptive fields on unit sphere.The red lines are drawn from the origin to the center of each receptive field.Figure 4-Figure supplement 1. Coordinate system for model and stimuli.

(Figure 5 -
Figure Supplement 2, Figure 6-Figure Supplement 3).Units with outward-oriented filters are activated by motion radiating outwards from the center of the receptive field, such as the hit event illustrated in Figure 3.These excitatory components resemble the dendritic structures of the actual LPLC2 neurons observed in experiments, where for example, the rightward motion-sensitive component (LP2) occupies mainly the right side of the receptive field.In the outward solutions of the LRF models, the rightward motion-sensitive inhibitory components mainly occupy the left side of the receptive field (Figure 6B, Figure 6-Figure Supplement 2).This is also consistent with the properties of the lobula plate intrinsic (LPi) interneurons, which project inhibitory signals roughly retinotopically from one LP layer to the adjacent layer with opposite directional tuning (Mauss et al., 2015; Klapoetke et al., 2017).The unexpected inward-oriented filters have the opposite structure.In the inward solutions, the rightward sensitive excitatory component occupies the left side of the receptive field, and the inhibitory component occupies the right side.Such weightings make the model selective for motion converging towards the receptive field center, such as the retreat event shown in Figure3.This is a puzzling structure for a loom detector, and warrants a more detailed exploration of the response properties of the inward and outward solutions.

Figure 5 .
Figure 5. Three distinct types of solutions appear from training a single unit on the binary classification task (LRF model).(A) Clustering of the trained filters/weights shown as a dendrogram (Methods and Materials).Different colors indicate different clusters, which are preserved for the rest of the paper: outward, inward, and zero solutions are magenta, green, and black, respectively.(B) The trajectories of the loss functions during training.More than one example are shown for each type of solution, but lines fall on top of one another.(C) Two distinct types of solutions are represented by two types of filters that have roughly opposing structures: an outward solution (magenta boxes) and an inward solution (green boxes).For each solution type, two examples of the trained filters from different initializations are shown; they are almost identical.The third type of solution (color black in (A) and (B)) has filter elements all close to zero.We call these zero solutions (Figure 5-Figure Supplement 1).(D) Performance of the three solution types (Methods and Materials).TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; PR: precision recall; AUC: area under the curve.More than one example is shown for each type of solution, but lines and dots with the same color fall on top of one another.Figure 5-Figure supplement 1.More examples of the trained filters for the three types of solutions.Figure 5-Figure supplement 2. As in the main figure but for the RI model.Figure 5-Figure supplement 3. Examples of the trained outward and inward filters without imposed symmetries.

Figure 5 -
Figure 5. Three distinct types of solutions appear from training a single unit on the binary classification task (LRF model).(A) Clustering of the trained filters/weights shown as a dendrogram (Methods and Materials).Different colors indicate different clusters, which are preserved for the rest of the paper: outward, inward, and zero solutions are magenta, green, and black, respectively.(B) The trajectories of the loss functions during training.More than one example are shown for each type of solution, but lines fall on top of one another.(C) Two distinct types of solutions are represented by two types of filters that have roughly opposing structures: an outward solution (magenta boxes) and an inward solution (green boxes).For each solution type, two examples of the trained filters from different initializations are shown; they are almost identical.The third type of solution (color black in (A) and (B)) has filter elements all close to zero.We call these zero solutions (Figure 5-Figure Supplement 1).(D) Performance of the three solution types (Methods and Materials).TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; PR: precision recall; AUC: area under the curve.More than one example is shown for each type of solution, but lines and dots with the same color fall on top of one another.Figure 5-Figure supplement 1.More examples of the trained filters for the three types of solutions.Figure 5-Figure supplement 2. As in the main figure but for the RI model.Figure 5-Figure supplement 3. Examples of the trained outward and inward filters without imposed symmetries.

Figure 5 -
Figure 5. Three distinct types of solutions appear from training a single unit on the binary classification task (LRF model).(A) Clustering of the trained filters/weights shown as a dendrogram (Methods and Materials).Different colors indicate different clusters, which are preserved for the rest of the paper: outward, inward, and zero solutions are magenta, green, and black, respectively.(B) The trajectories of the loss functions during training.More than one example are shown for each type of solution, but lines fall on top of one another.(C) Two distinct types of solutions are represented by two types of filters that have roughly opposing structures: an outward solution (magenta boxes) and an inward solution (green boxes).For each solution type, two examples of the trained filters from different initializations are shown; they are almost identical.The third type of solution (color black in (A) and (B)) has filter elements all close to zero.We call these zero solutions (Figure 5-Figure Supplement 1).(D) Performance of the three solution types (Methods and Materials).TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; PR: precision recall; AUC: area under the curve.More than one example is shown for each type of solution, but lines and dots with the same color fall on top of one another.Figure 5-Figure supplement 1.More examples of the trained filters for the three types of solutions.Figure 5-Figure supplement 2. As in the main figure but for the RI model.Figure 5-Figure supplement 3. Examples of the trained outward and inward filters without imposed symmetries.

Figure 5 -Figure 6 .
Figure 5. Three distinct types of solutions appear from training a single unit on the binary classification task (LRF model).(A) Clustering of the trained filters/weights shown as a dendrogram (Methods and Materials).Different colors indicate different clusters, which are preserved for the rest of the paper: outward, inward, and zero solutions are magenta, green, and black, respectively.(B) The trajectories of the loss functions during training.More than one example are shown for each type of solution, but lines fall on top of one another.(C) Two distinct types of solutions are represented by two types of filters that have roughly opposing structures: an outward solution (magenta boxes) and an inward solution (green boxes).For each solution type, two examples of the trained filters from different initializations are shown; they are almost identical.The third type of solution (color black in (A) and (B)) has filter elements all close to zero.We call these zero solutions (Figure 5-Figure Supplement 1).(D) Performance of the three solution types (Methods and Materials).TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; PR: precision recall; AUC: area under the curve.More than one example is shown for each type of solution, but lines and dots with the same color fall on top of one another.Figure 5-Figure supplement 1.More examples of the trained filters for the three types of solutions.Figure 5-Figure supplement 2. As in the main figure but for the RI model.Figure 5-Figure supplement 3. Examples of the trained outward and inward filters without imposed symmetries.

Figure 6 -
Figure 6.The outward and inward solutions also arise for models with multiple units (LRF models).(A) Left column: angular distribution of the units, where red dots are centers of the receptive fields, the grey circles are the boundaries of the receptive fields, and the black star indicates the top of the fly head.Middle column: 2d map of the units with the same symbols as in the left column, with one unit highlighted in black.Right column: clustering results shown as dendrogams with color codes as in Figure 5. (B) Examples of the trained filters for outward and inward solutions with different numbers of units.Figure 6-Figure supplement 1. Performance of the different solutions (LRF models).Figure 6-Figure supplement 2.More examples of the outward and inward filters (LRF models).Figure 6-Figure supplement 3. Examples of the outward and inward filters for RI models.

Figure 6 -
Figure 6.The outward and inward solutions also arise for models with multiple units (LRF models).(A) Left column: angular distribution of the units, where red dots are centers of the receptive fields, the grey circles are the boundaries of the receptive fields, and the black star indicates the top of the fly head.Middle column: 2d map of the units with the same symbols as in the left column, with one unit highlighted in black.Right column: clustering results shown as dendrogams with color codes as in Figure 5. (B) Examples of the trained filters for outward and inward solutions with different numbers of units.Figure 6-Figure supplement 1. Performance of the different solutions (LRF models).Figure 6-Figure supplement 2.More examples of the outward and inward filters (LRF models).Figure 6-Figure supplement 3. Examples of the outward and inward filters for RI models.

Figure 6 -Figure 7 .
Figure 6.The outward and inward solutions also arise for models with multiple units (LRF models).(A) Left column: angular distribution of the units, where red dots are centers of the receptive fields, the grey circles are the boundaries of the receptive fields, and the black star indicates the top of the fly head.Middle column: 2d map of the units with the same symbols as in the left column, with one unit highlighted in black.Right column: clustering results shown as dendrogams with color codes as in Figure 5. (B) Examples of the trained filters for outward and inward solutions with different numbers of units.Figure 6-Figure supplement 1. Performance of the different solutions (LRF models).Figure 6-Figure supplement 2.More examples of the outward and inward filters (LRF models).Figure 6-Figure supplement 3. Examples of the outward and inward filters for RI models.

ure 8 -Figure 8 .
Figure 8. Population coding of stimuli (LRF models).(A) Top row: snapshots of the unit responses of outward solutions (magenta dots) and inward solutions (green dots) for a hit stimulus.The size of the dots represents the strength of the response.The gray shading represents the looming object in the snapshot.See also Video 5 and Video 6. Symbols and colors are as in Figure 6A.Middle row: time traces of the responses for the same hit stimulus as in the top row.Time proceeds to the right in each trace.Bottom row: time trace of the probability of hit for the same hit stimulus as in the top row (Methods and Materials).Black dots in the middle and bottom rows indicate the time of the snapshot in the top row.The dotted gray line represents the basal model response.(B) Fractions of the units that are activated above the baseline by different types of stimuli (hit, miss, retreat, rotation) as a function of the number of units  in the model.The lines represent the mean values averaged across stimuli, and the shaded areas show one standard deviation (Methods and Materials).(C)Histograms of the probability of hit inferred by models with 32 or 256 units for the four types of synthetic stimuli (Methods and Materials).(D) The inferred probability of hit as a function of the minimum distance of the object to the fly eye for the miss cases.For comparison, the hit distribution is represented by a box plot (the center line in the box: the median; the upper and lower boundaries of the box: 25% and 75% percentiles; the upper and lower whiskers: the minimum and maximum of non-outlier data points; the circles: outliers).Figure 8-Figure supplement 1. Geometry of responses (as in Fig. Figure 8A), but for miss and retreat stimuli (LRF models).Figure 8-Figure supplement 2. Sample individual unit response curves (LRF models with  = 256).Figure 8-Figure supplement 3.As in the main figure but for the RI models.

Figure 8 -
Figure 8. Population coding of stimuli (LRF models).(A) Top row: snapshots of the unit responses of outward solutions (magenta dots) and inward solutions (green dots) for a hit stimulus.The size of the dots represents the strength of the response.The gray shading represents the looming object in the snapshot.See also Video 5 and Video 6. Symbols and colors are as in Figure 6A.Middle row: time traces of the responses for the same hit stimulus as in the top row.Time proceeds to the right in each trace.Bottom row: time trace of the probability of hit for the same hit stimulus as in the top row (Methods and Materials).Black dots in the middle and bottom rows indicate the time of the snapshot in the top row.The dotted gray line represents the basal model response.(B) Fractions of the units that are activated above the baseline by different types of stimuli (hit, miss, retreat, rotation) as a function of the number of units  in the model.The lines represent the mean values averaged across stimuli, and the shaded areas show one standard deviation (Methods and Materials).(C)Histograms of the probability of hit inferred by models with 32 or 256 units for the four types of synthetic stimuli (Methods and Materials).(D) The inferred probability of hit as a function of the minimum distance of the object to the fly eye for the miss cases.For comparison, the hit distribution is represented by a box plot (the center line in the box: the median; the upper and lower boundaries of the box: 25% and 75% percentiles; the upper and lower whiskers: the minimum and maximum of non-outlier data points; the circles: outliers).Figure 8-Figure supplement 1. Geometry of responses (as in Fig. Figure 8A), but for miss and retreat stimuli (LRF models).Figure 8-Figure supplement 2. Sample individual unit response curves (LRF models with  = 256).Figure 8-Figure supplement 3.As in the main figure but for the RI models.

Figure 8 -
Figure 8. Population coding of stimuli (LRF models).(A) Top row: snapshots of the unit responses of outward solutions (magenta dots) and inward solutions (green dots) for a hit stimulus.The size of the dots represents the strength of the response.The gray shading represents the looming object in the snapshot.See also Video 5 and Video 6. Symbols and colors are as in Figure 6A.Middle row: time traces of the responses for the same hit stimulus as in the top row.Time proceeds to the right in each trace.Bottom row: time trace of the probability of hit for the same hit stimulus as in the top row (Methods and Materials).Black dots in the middle and bottom rows indicate the time of the snapshot in the top row.The dotted gray line represents the basal model response.(B) Fractions of the units that are activated above the baseline by different types of stimuli (hit, miss, retreat, rotation) as a function of the number of units  in the model.The lines represent the mean values averaged across stimuli, and the shaded areas show one standard deviation (Methods and Materials).(C)Histograms of the probability of hit inferred by models with 32 or 256 units for the four types of synthetic stimuli (Methods and Materials).(D) The inferred probability of hit as a function of the minimum distance of the object to the fly eye for the miss cases.For comparison, the hit distribution is represented by a box plot (the center line in the box: the median; the upper and lower boundaries of the box: 25% and 75% percentiles; the upper and lower whiskers: the minimum and maximum of non-outlier data points; the circles: outliers).Figure 8-Figure supplement 1. Geometry of responses (as in Fig. Figure 8A), but for miss and retreat stimuli (LRF models).Figure 8-Figure supplement 2. Sample individual unit response curves (LRF models with  = 256).Figure 8-Figure supplement 3.As in the main figure but for the RI models.

Figure 8 -
Figure 8. Population coding of stimuli (LRF models).(A) Top row: snapshots of the unit responses of outward solutions (magenta dots) and inward solutions (green dots) for a hit stimulus.The size of the dots represents the strength of the response.The gray shading represents the looming object in the snapshot.See also Video 5 and Video 6. Symbols and colors are as in Figure 6A.Middle row: time traces of the responses for the same hit stimulus as in the top row.Time proceeds to the right in each trace.Bottom row: time trace of the probability of hit for the same hit stimulus as in the top row (Methods and Materials).Black dots in the middle and bottom rows indicate the time of the snapshot in the top row.The dotted gray line represents the basal model response.(B) Fractions of the units that are activated above the baseline by different types of stimuli (hit, miss, retreat, rotation) as a function of the number of units  in the model.The lines represent the mean values averaged across stimuli, and the shaded areas show one standard deviation (Methods and Materials).(C)Histograms of the probability of hit inferred by models with 32 or 256 units for the four types of synthetic stimuli (Methods and Materials).(D) The inferred probability of hit as a function of the minimum distance of the object to the fly eye for the miss cases.For comparison, the hit distribution is represented by a box plot (the center line in the box: the median; the upper and lower boundaries of the box: 25% and 75% percentiles; the upper and lower whiskers: the minimum and maximum of non-outlier data points; the circles: outliers).Figure 8-Figure supplement 1. Geometry of responses (as in Fig. Figure 8A), but for miss and retreat stimuli (LRF models).Figure 8-Figure supplement 2. Sample individual unit response curves (LRF models with  = 256).Figure 8-Figure supplement 3.As in the main figure but for the RI models.
the two solution types, we also calculated the ratio of the number of outward to inward solutions in 200 random initializations of the models with  units.For the LRF models, as the number of units increases, the ratio remained relatively constant, fluctuating around 0.5 (Figure9-Figure Supplement 2).For the RI models, on the other hand, as the number of units increases, an increasing proportion of solutions have outward filters (Figure 9-Figure Supplement 2).For RI models with 256 units, the chance that an outward filter appears as a solution is almost 90% compared with roughly 50% when  = 1.Because of this qualitative difference between the LRF and RI models, we next asked whether the form of the nonlinearity in the LRF model could influence the solutions found through optimization.When we replaced the rectified linear unit (ReLU) in LRF models with the exponential linear unit (ELU) (Methods and Materials), only inward solutions exist for models with  < 32, and the outward solutions emerge more often as  increases (Figure9-FigureSupplement 2).Combined, these results with LRF and RI models indicate that the form and position of the nonlinearity in the circuit play a role in selecting between different optimized solutions.This suggests that further studies of the nonlinearities in LPLC2 processing will lead to additional insight into how a population of LPLC2s encodes looming stimuli.
-Figure Supplement 3), though in this case, outward solutions outperform inward ones (Figure 9-Figure Supplement 3).It remains unclear whether this added complexity brings our artificial stimuli closer to actual detection tasks performed by flies, but this result makes clear that identifying the natural statistics of loom will be important to understanding loom inferences.

Figure 9 .
Figure 9. Large populations of units improve performance (LRF models) (Methods and Materials).(A) Both ROC and PR AUC scores increase as the number of units increases.Colored lines and dots: average scores; shading: one standard deviation of the scores over the trained models.Magenta: outward solutions; green: inward solutions.The dotted horizontal gray lines indicate the value of 1. (B) As the population of units increases, cross entropy losses of the outward solutions approach the losses of the inward solutions.Figure 9-Figure supplement 1.As in the main figure but for RI models.Figure 9-Figure supplement 2. The ratio of the number of the two types of solutions.Figure 9-Figure supplement 3.As in the main figure but for LRF models trained using stimuli that include self-rotation during hits, misses, and retreats.

Figure 9 -
Figure 9. Large populations of units improve performance (LRF models) (Methods and Materials).(A) Both ROC and PR AUC scores increase as the number of units increases.Colored lines and dots: average scores; shading: one standard deviation of the scores over the trained models.Magenta: outward solutions; green: inward solutions.The dotted horizontal gray lines indicate the value of 1. (B) As the population of units increases, cross entropy losses of the outward solutions approach the losses of the inward solutions.Figure 9-Figure supplement 1.As in the main figure but for RI models.Figure 9-Figure supplement 2. The ratio of the number of the two types of solutions.Figure 9-Figure supplement 3.As in the main figure but for LRF models trained using stimuli that include self-rotation during hits, misses, and retreats.

Figure 9 -
Figure 9. Large populations of units improve performance (LRF models) (Methods and Materials).(A) Both ROC and PR AUC scores increase as the number of units increases.Colored lines and dots: average scores; shading: one standard deviation of the scores over the trained models.Magenta: outward solutions; green: inward solutions.The dotted horizontal gray lines indicate the value of 1. (B) As the population of units increases, cross entropy losses of the outward solutions approach the losses of the inward solutions.Figure 9-Figure supplement 1.As in the main figure but for RI models.Figure 9-Figure supplement 2. The ratio of the number of the two types of solutions.Figure 9-Figure supplement 3.As in the main figure but for LRF models trained using stimuli that include self-rotation during hits, misses, and retreats.

Figure 9 -
Figure 9. Large populations of units improve performance (LRF models) (Methods and Materials).(A) Both ROC and PR AUC scores increase as the number of units increases.Colored lines and dots: average scores; shading: one standard deviation of the scores over the trained models.Magenta: outward solutions; green: inward solutions.The dotted horizontal gray lines indicate the value of 1. (B) As the population of units increases, cross entropy losses of the outward solutions approach the losses of the inward solutions.Figure 9-Figure supplement 1.As in the main figure but for RI models.Figure 9-Figure supplement 2. The ratio of the number of the two types of solutions.Figure 9-Figure supplement 3.As in the main figure but for LRF models trained using stimuli that include self-rotation during hits, misses, and retreats.
stimuli, and many studies have investigated how the response patterns of loom-sensitive neurons depend on this ratio (Top panels in Figure10I, J, K, L)(Gabbiani et al., 1999;Von Reyn et al., 2017;Ache et al., 2019;De Vries and Clandinin, 2012).Here, we presented the trained model (Figure10A) with hit stimuli with different ∕ ratios, and compared its response with experimental measurements (Figure10I-L)).Surprisingly, although our model only has angular velocities as inputs (Fig-ure 3), it reliably encodes the angular size of the stimulus rather than its angular velocity (Figure10J).This is indicated by the collapsed response curves with different ∕ ratios (up to different scales) when plotted against the angular sizes (Von Reyn et al., 2017).When the curves are plotted against angular velocity, they shift for different ∕ ratios, which means the response depends on the velocity  of the object, since  is fixed to be 1.The relative shifts in these curves are consistent with properties of LPLC2.
On the other hand, some forms of the the RI model outward solutions can closely reproduce the neural data (Figure 10-Figure Supplement 2), while other outward solutions fail to do so (Figure 10-Figure Supplement 3, Figure 10-Figure Supplement 4).For example, some RI outward solutions predict the patterns in the wide expanding bars differently and out of phase from the biological data (Figure 10-Figure Supplement 3H), do a poor job predicting the response curves of the LPLC2 neurons to looming signals with different ∕ ratios (Figure 10-Figure Supplement 3J, K), respond strongly to the moving gratings (Figure 10-Figure Supplement 4B), cannot show the peripheral inhibition (Figure 10-Figure Supplement 4E, F) Figure Supplement 2).As the number of units increases, the inhibitory component of the outward solutions also becomes broader as the population size becomes larger, which is crucial for reproducing key experimental observations, such as peripheral inhibition (Figure10)(Klapoetke et al., 2017).The optimized solutions depend on the number of detectors, and this is likely related to the increasing overlap in receptive fields as the population grows (Figure6).This result is consistent with prior work showing that populations of neurons often exhibit different and improved coding strategies compared to individual neurons(Pasupathy and Connor, 2002;Georgopoulos et al., 1986;Vogels, 1990;Franke et al., 2016;Zylberberg et al., 2016;Cafaro et al., 2020).Thus, understanding anatomical, physiological, and algorithmic properties of individual neurons can require considering the population response.The solutions we found to the loom inference problem suggest that individual LPLC2 responses should be interpreted in light of the population of LPLC2 responses.Our results shed light on discussions of -like (encoding angular size) and -like (encoding angular velocity) looming sensitive neurons in the literature(Gabbiani et al., 1999;Wu et al., 2005;Liu et al., 2011;Shang et al., 2015;Temizer et al., 2015;Dunn et al., 2016;Von Reyn et al., 2017;Ache et al., 2019).In particular, these optimized models clarify an interesting but puzzling fact: LPLC2 neurons transform their inputs of direction-selective motion signals to computations of angular size(Ache et al., 2019).Consistent with this tuning, our model also shows a linear relationship

Figure 10 .
Figure 10.Units of models trained on binary classification tasks exhibit similar responses to LPLC2 neuron experimental measurements (outward solution of the LRF model with 256 units).(A) The trained filter.(B-H) Comparisons of the responses of the unit with the trained filter in (A) and LPLC2 neurons to a variety of stimuli (Methods and Materials).Black lines: data (Klapoetke et al., 2017); magenta lines: LRF unit.Compared with the original plots (Klapoetke et al., 2017), all the stimulus icons here except the ones in (B) have been rotated 45 degrees to match the cardinal directions of LP layers as described in this study.All response curves are normalized by the peak value of the left most panel in (B).(I) Top: temporal trajectories of the angular sizes for different ∕ ratios (color labels apply throughout (I-L)) (Methods and Materials).Middle: response as a function of time for the sum of all 256 units.Bottom: response as a function of time for one of the 256 units.(J-L) Top: experimental data (LPLC2/non-LC4 components of GF activity.Data from (Von Reyn et al., 2017; Ache et al., 2019)).Middle: sum of all 256 units.Bottom: response of one of the 256 units.Responses as function of angular size (J), response as function of angular velocity (K), relationship between peak time relative to collision and ∕ ratios (L).We considered the first peak when there were two peaks in the response, such as in the grey curves in the middle panel of (I). Figure 10-Figure supplement 1.As in the main figure but for an inward solution of the LRF model obtained from the same training procedure.Figure 10-Figure supplement 2. As in the main figure but for an outward solution of the RI model with 256 units.Figure 10-Figure supplement 3.As in the main figure but for a second outward solution of the RI model with 256 units.Figure 10-Figure supplement 4. As in the main figure but for a third outward solution of the RI model with 256 units.

Figure 10 -
Figure 10.Units of models trained on binary classification tasks exhibit similar responses to LPLC2 neuron experimental measurements (outward solution of the LRF model with 256 units).(A) The trained filter.(B-H) Comparisons of the responses of the unit with the trained filter in (A) and LPLC2 neurons to a variety of stimuli (Methods and Materials).Black lines: data (Klapoetke et al., 2017); magenta lines: LRF unit.Compared with the original plots (Klapoetke et al., 2017), all the stimulus icons here except the ones in (B) have been rotated 45 degrees to match the cardinal directions of LP layers as described in this study.All response curves are normalized by the peak value of the left most panel in (B).(I) Top: temporal trajectories of the angular sizes for different ∕ ratios (color labels apply throughout (I-L)) (Methods and Materials).Middle: response as a function of time for the sum of all 256 units.Bottom: response as a function of time for one of the 256 units.(J-L) Top: experimental data (LPLC2/non-LC4 components of GF activity.Data from (Von Reyn et al., 2017; Ache et al., 2019)).Middle: sum of all 256 units.Bottom: response of one of the 256 units.Responses as function of angular size (J), response as function of angular velocity (K), relationship between peak time relative to collision and ∕ ratios (L).We considered the first peak when there were two peaks in the response, such as in the grey curves in the middle panel of (I). Figure 10-Figure supplement 1.As in the main figure but for an inward solution of the LRF model obtained from the same training procedure.Figure 10-Figure supplement 2. As in the main figure but for an outward solution of the RI model with 256 units.Figure 10-Figure supplement 3.As in the main figure but for a second outward solution of the RI model with 256 units.Figure 10-Figure supplement 4. As in the main figure but for a third outward solution of the RI model with 256 units.

Figure 10 -
Figure 10.Units of models trained on binary classification tasks exhibit similar responses to LPLC2 neuron experimental measurements (outward solution of the LRF model with 256 units).(A) The trained filter.(B-H) Comparisons of the responses of the unit with the trained filter in (A) and LPLC2 neurons to a variety of stimuli (Methods and Materials).Black lines: data (Klapoetke et al., 2017); magenta lines: LRF unit.Compared with the original plots (Klapoetke et al., 2017), all the stimulus icons here except the ones in (B) have been rotated 45 degrees to match the cardinal directions of LP layers as described in this study.All response curves are normalized by the peak value of the left most panel in (B).(I) Top: temporal trajectories of the angular sizes for different ∕ ratios (color labels apply throughout (I-L)) (Methods and Materials).Middle: response as a function of time for the sum of all 256 units.Bottom: response as a function of time for one of the 256 units.(J-L) Top: experimental data (LPLC2/non-LC4 components of GF activity.Data from (Von Reyn et al., 2017; Ache et al., 2019)).Middle: sum of all 256 units.Bottom: response of one of the 256 units.Responses as function of angular size (J), response as function of angular velocity (K), relationship between peak time relative to collision and ∕ ratios (L).We considered the first peak when there were two peaks in the response, such as in the grey curves in the middle panel of (I). Figure 10-Figure supplement 1.As in the main figure but for an inward solution of the LRF model obtained from the same training procedure.Figure 10-Figure supplement 2. As in the main figure but for an outward solution of the RI model with 256 units.Figure 10-Figure supplement 3.As in the main figure but for a second outward solution of the RI model with 256 units.Figure 10-Figure supplement 4. As in the main figure but for a third outward solution of the RI model with 256 units.

Figure 10 -
Figure 10.Units of models trained on binary classification tasks exhibit similar responses to LPLC2 neuron experimental measurements (outward solution of the LRF model with 256 units).(A) The trained filter.(B-H) Comparisons of the responses of the unit with the trained filter in (A) and LPLC2 neurons to a variety of stimuli (Methods and Materials).Black lines: data (Klapoetke et al., 2017); magenta lines: LRF unit.Compared with the original plots (Klapoetke et al., 2017), all the stimulus icons here except the ones in (B) have been rotated 45 degrees to match the cardinal directions of LP layers as described in this study.All response curves are normalized by the peak value of the left most panel in (B).(I) Top: temporal trajectories of the angular sizes for different ∕ ratios (color labels apply throughout (I-L)) (Methods and Materials).Middle: response as a function of time for the sum of all 256 units.Bottom: response as a function of time for one of the 256 units.(J-L) Top: experimental data (LPLC2/non-LC4 components of GF activity.Data from (Von Reyn et al., 2017; Ache et al., 2019)).Middle: sum of all 256 units.Bottom: response of one of the 256 units.Responses as function of angular size (J), response as function of angular velocity (K), relationship between peak time relative to collision and ∕ ratios (L).We considered the first peak when there were two peaks in the response, such as in the grey curves in the middle panel of (I). Figure 10-Figure supplement 1.As in the main figure but for an inward solution of the LRF model obtained from the same training procedure.Figure 10-Figure supplement 2. As in the main figure but for an outward solution of the RI model with 256 units.Figure 10-Figure supplement 3.As in the main figure but for a second outward solution of the RI model with 256 units.Figure 10-Figure supplement 4. As in the main figure but for a third outward solution of the RI model with 256 units.
, C) expands from 20 • to 60 • with an edge speed of 10 • ∕.All the bar and edge motions have an edge speed of 20 • ∕.The width of the bars are 60 • (right panel of Figure 10E, and Figure 10H), 20 • (middle panel of Figure 10E), and 10 • − = rotate( e , 270 • ),  e  + = rotate( e , 90 • ),  e  − = rotate( e , 180 • ),  e  + = rotate( e , 0 • ), and four inhibitory filters as: The average score curves and points in Figure 9A, Figure 9-Figure Supplement 1A and Figure 9-Figure Supplement 2A were obtained by taking the average among each type of solution, with the shading indicating the standard deviations.The curves and point in Figure 9-Figure Supplement 1C are the ratio of the number of outward solutions to the number of inward solutions.To obtain error bars (grey shading), we considered the training results as a binomial distribution, with the probability of obtaining an outward solution being  out ∕( out +  in ), and with the probability of obtaining an inward solution being  in ∕( out +  in ).Thus, the standard deviation of this binomial distribution is  b

Figure 3 -Figure 4 -
Figure 3-Figure supplement 1. (A) Diagram of a simple HRC motion estimator (Methods and Materials), where  indicates temporal delay, the two crosses indicate multiplication, and the minus sign indicates subtraction.(B) Tuning curves of the HRC motion estimator for different stimuli.Grey region indicates the velocity range used in simulations (Methods and Materials).(C) The distributions of the magnitude of all the estimated flow fields in the four cardinal directions for different types of stimuli.

Figure 9 -Figure 9 -Figure 10 -
Figure 9-Figure supplement 2. The black line and dots show the ratio of the numbers of the two types of solutions in the set of randomly initialized, trained models.The grey shading is one standard deviation, assuming that the distribution is binomial (Methods and Materials).The dotted horizontal gray lines indicate a ratio of 1. From left to right: LRF models with ReLU activation functions, LRF models with ELU activation functions, RI models ReLU activation functions.

Figure 10 -
Figure 10-Figure supplement 2. As in the main figure but for an outward solution of the RI model with 256 units, the trained filters of which are shown in (A).

Figure 10 -Figure 10 -
Figure 10-Figure supplement 3.As in the main figure but for a second outward solution of the RI model with 256 units, the trained filters of which are shown in (A).The response curves in (B-H) are produced by setting the threshold of the single unit to be zero (Figure 4B, Methods and Materials).Responses are plotted in this way because the dynamic range of responses of this model to these stimuli does not often exceed the trained threshold.
random initializations were used to train it.For the LRF models, within these 200 training runs, the number of outward solutions  out were (starting from smaller values of )45, 59, 60, 65, 52, 63, 52, 57, 58,     49, and the number of inward solutions  in were 142