Optical flow estimation using the Fisher–Rao metric

The optical flow in an event camera is estimated using measurements in the address event representation (AER). Each measurement consists of a pixel address and the time at which a change in the pixel value equalled a given fixed threshold. The measurements in a small region of the pixel array and within a given window in time are approximated by a probability distribution defined on a finite set. The distributions obtained in this way form a three dimensional family parameterized by the pixel addresses and by time. Each parameter value has an associated Fisher–Rao matrix obtained from the Fisher–Rao metric for the parameterized family of distributions. The optical flow vector at a given pixel and at a given time is obtained from the eigenvector of the associated Fisher–Rao matrix with the least eigenvalue. The Fisher–Rao algorithm for estimating optical flow is tested on eight datasets, of which six have ground truth optical flow. It is shown that the Fisher–Rao algorithm performs well in comparison with two state of the art algorithms for estimating optical flow from AER measurements.


Introduction
The address event representation (AER) [9,27,29] is a new paradigm in computer vision. Each pixel in an AER camera emits a signal when the change in the pixel value equals a fixed threshold. If the change in value is less than the threshold, then no signal is emitted. The pixels emit their signals asynchronously, i.e. without coordination. There is no concept of an image frame of pixel values all of which are obtained at the same instant in time [4]. Cameras which supply data in the AER format are referred to as event cameras or as silicon retinas. The latter term arises from an analogy between the AER and the human retina. The advantages of event cameras over conventional cameras are a low power consumption, a very rapid response to changes in the image and a high dynamic range [6]. Event based vision is surveyed in [17].

Notation for the AER
Each pixel emits a measurement when the value recorded by the pixel changes by an amount ±d, where d is a fixed positive threshold. If the change in value is less than d in absolute value, then the pixel in question does not emit any measurement. The measurements emitted by a pixel q form a list

(q, s(i), Δ(i)), i = 1, 2, 3, . . . (1)

where s(1), s(2), . . . is an increasing sequence of times and Δ(i) ∈ {0, 1} is the polarity of event i. The component Δ(i) of the measurement specifies the sign of the change in the pixel value. If Δ(i) = 0, then the value decreases by d and if Δ(i) = 1, then the value increases by d.
The full list of measurements obtained from an event camera consists of the union of the lists of measurements (1) over all pixels of the sensor array. Event cameras have lower data rates and lower power consumption than conventional cameras because there is no wasteful output from pixels for which there is only a small or zero change in value.
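As a concrete illustration of the AER data structure (the names Event and merge_streams are ours, not part of the sensor specification), the per-pixel lists (1) and their merger into a single time-ordered stream can be sketched in Python:

```python
from typing import NamedTuple

class Event(NamedTuple):
    """One AER measurement: pixel address, timestamp and polarity."""
    x: int  # pixel column
    y: int  # pixel row
    t: int  # time (microseconds) at which the change reached the threshold d
    p: int  # polarity: 1 if the value increased by d, 0 if it decreased

def merge_streams(per_pixel_lists):
    """Merge the per-pixel event lists (1) into one time-ordered stream."""
    events = [e for lst in per_pixel_lists for e in lst]
    return sorted(events, key=lambda e: e.t)

# Two pixels emitting asynchronously; the merged stream interleaves them by time.
pix_a = [Event(3, 5, 100, 1), Event(3, 5, 400, 0)]
pix_b = [Event(7, 2, 250, 1)]
stream = merge_streams([pix_a, pix_b])
```

Note that no frame structure is imposed at any point: the merged list is ordered only by the event timestamps.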

Related work
Research on processing techniques suitable for AER data has been prolific in the past few years. AER datasets with ground truth are described by Barranco et al [5] and by Zhu et al [38][39][40]. The latter very extensive dataset is used in the experiments described in section 7. Direct conversions of state of the art computer vision algorithms to AER based algorithms are usually achieved by using the intensity information estimated by the local integration of the events (1). This approach is adopted for event correlation applied to stereo matching [24], for photoconsistency based estimation of optical flow [6] and for machine learning using convolution networks [31]. However, local integration of events does not preserve the event camera's temporal accuracy. Akolkar et al [1] show that the high temporal accuracy of event camera measurements yields up to 70% more information, compared with conventional frame based methods. This is a motivation for focussing on truly event-based techniques. For example, Benosman et al [7] reformulate optical flow estimation as a robust local plane fitting problem. The fitted planes are updated as new events arrive. The same technique is generalized by Ieng et al [22] to 3D scenes by fitting planes to ruled surfaces generated by point clouds. The point clouds can be reconstructed either synchronously or asynchronously. The technique can handle different forms of data such as 3D events [13,33], Lidar measurements and 3D points obtained from image frames by classical triangulation. The plane fitting algorithm in [7] for estimating optical flow is compared with the Fisher-Rao method in section 7.2 below.
Rueckauer and Delbruck [34] evaluate nine algorithms for estimating the components of the optical flow normal to moving edges. The first algorithm searches for moving edges using the time differences between nearby events. The motions of the edges are measured. The next four algorithms are based on the Lucas-Kanade approach in which the optical flow is assumed to be locally constant and linear constraints on the optical flow are obtained using the motion constraint equation together with estimates of the intensity gradient. The remaining four algorithms are based on planes fitted to the measurements. The nine algorithms are evaluated on computer generated data obtained firstly from a translating square and secondly from a rotating bar. The algorithms are then evaluated on three experimental datasets obtained using a rotating camera. In contrast with [34], our Fisher-Rao method is a new way of estimating optical flow that does not assume that the optical flow is locally constant and that does not require estimates of the intensity gradient. In section 6, the Fisher-Rao method is applied to two of the datasets in [34].
Bardow et al [3] estimate the optical flow and the full intensity image from event camera data by minimizing a complicated objective function that penalizes high optical flow gradients, high intensity gradients and large deviations from the motion constraint equation for optical flow. The objective function also includes terms to take into account the fact that only differences in image intensities can be measured. The objective function is minimized using iteration, to yield estimates of optical flow and of the full intensity image. In contrast, our Fisher-Rao method for estimating optical flow does not require the full intensity image. In any case, the task of obtaining the full intensity image is ill-posed in that regularisation is required in order to obtain a unique solution. Our Fisher-Rao matrix is obtained by a standard least squares calculation which has a unique solution without the necessity for regularisation.
Brosch et al [11] and Brosch et al [12] construct filters for event camera data by analogy with the filters found in biological vision. The filtered data are used to estimate the components of the optical flow normal to moving edges. In contrast, our Fisher-Rao method does not assume that the event measurements originate from moving edges and it does not require any filtering, except for a single Gaussian smoothing of a spike count array.
Barranco et al [4] estimate normal velocities along a moving contour using event camera measurements. Motion boundaries are located by finding connected groups of pixels such that each pixel emits at least one event during a specified time interval. The components of the normal velocity are estimated separately by considering first the horizontal motion and then the vertical motion. Experiments are carried out using data from a dynamic and active vision based sensor, DAVIS, which is referred to as the ApsDVS sensor by Berner et al [8]. The DAVIS sensor provides both AER measurements and complete frames of intensity values. The data from the frames is used to improve the accuracy of the estimates of contour motion. The estimates of the optical flow and the moving contours are checked using AER measurements synthesized from conventional image sequences. In contrast, the Fisher-Rao method does not require frames of intensity values and it does not attempt to identify any contours in the image. The Fisher-Rao method estimates full optical flow vectors, rather than particular components of the flow vectors.
Zhu et al [40] discretize event camera measurements in the time domain. The discrete measurements are input to a neural network to estimate optical flow. Discrete measurements from a stereo pair are input to a second neural network to estimate ego motion and scene depths. Gallego et al [16] specify a constant value for the optical flow in a small 3D neighbourhood. Each measurement in the neighbourhood defines a trajectory parameterised by time. The point on the trajectory at a fixed reference time is obtained. An objective function is defined using the resulting set of points. The objective function is maximised iteratively over the space of possible values for the optical flow. In contrast, our Fisher-Rao method does not assume that the optical flow is locally constant and it does not require the iterative maximisation of an objective function.
Gehrig et al [18] describe a general framework for obtaining a grid based representation for event camera measurements. The measurements are initially represented by a weighted sum of Dirac functions. The Dirac functions are convolved with a kernel and the convolved measurements are sampled in space and time to produce a fourth order array, taking the event polarities in (1) into account. A range of different arrays can be produced by varying the weighting of the Dirac functions, varying the kernel or by projection from the fourth order array. Applications to object recognition and optical flow estimation are described. The optical flow is estimated using EV-FlowNet [39].
Liu and Delbruck [28] record events in three time slice memories. The first memory simply accumulates events. The optical flow associated with an incoming event is estimated by matching blocks of data in the second time slice memory with blocks of data in the third time slice memory. The three memories are updated periodically: the previous first time slice becomes the new second time slice, and the previous second time slice becomes the new third time slice. Three different methods for choosing the times to make the updates are evaluated experimentally. In contrast, our Fisher-Rao method does not rely on block matching. Instead, it estimates optical flow by matching probability distributions using the Fisher-Rao metric. The results of the matching are invariant under the choice of parameterisation of the image.
Ghosh et al [19] use slow feature analysis to extract features from event camera measurements. The features remain stable when events are missed. A convolutional neural network is used to classify actions given the filter responses. In [20] Ghosh et al summarise the information in sets of events using neighbourhood spike count arrays. Features are extracted from the spike count arrays using principal components analysis and slow feature analysis. The features are applied to the tracking of cars in a traffic dataset. Algorithms for high speed tracking using event camera measurements are described by Lagorce et al [26].
The use of local histograms for matching in conventional images is well established [21,23,35]. In this paper the local histograms are normalised to produce probability distributions. Once these distributions are obtained, the optical flow is estimated using powerful methods taken from probability theory, in particular, methods based on the Fisher-Rao metric. The Fisher-Rao metric is described by Amari [2] and by Cover and Thomas [14]. As far as the authors are aware, there is no previous application of the Fisher-Rao metric to the estimation of optical flow using AER measurements.

Optical flow estimation using the Fisher-Rao metric
The relevant properties of the Fisher-Rao metric are described in section 3.1. The metric is applied to a family of discrete probability distributions obtained in section 3.2 by dividing the AER measurement space into rectangular cuboids and counting the number of events in each cuboid. Brightness constancy is discussed in section 3.3. The details of the Fisher-Rao method for estimating optical flow are given in section 3.4.

Overview
The event camera measurements in a small spatiotemporal volume are summarized by a probability distribution defined on a three dimensional grid centred at the mid point (q, t) of the spatiotemporal volume, where q is a pixel and t is a time. In this way, a three parameter family of probability distributions is obtained. These distributions have the role of spatiotemporal features. The Fisher-Rao metric is a Riemannian metric defined on the parameter space for the probability distributions. In this case the parameter space is a subset of R³. The metric is specified at each point (q, t) of the parameter space by a 3 × 3 symmetric non-negative matrix J(q, t). Further information is given by Amari [2], Cover and Thomas [14] and Kullback [25]. The squared distance between the probability distribution with parameters (q, t) and the probability distribution with parameters (q + Δq, t + Δt) is given to leading order by

(Δq, Δt) J(q, t) (Δq, Δt)ᵀ. (2)

The squared distance (2) is estimated directly from the two probability distributions using the Kullback-Leibler divergence [14,25]. The matrix J(q, t) is estimated in turn using the squared distances obtained for a range of different values of (Δq, Δt). It is assumed that brightness constancy holds to a good approximation over a short time interval. With this assumption a moving object gives rise to a sequence of distributions that are close together in the Fisher-Rao metric. If (q, t) and (q + Δq, t + Δt) are the parameter values for two distributions in this sequence, then the squared distance (2) is small, and (Δq, Δt) is an estimate of the eigenvector (Δqe, Δte) of J(q, t) with the least eigenvalue. The optical flow vector at (q, t) is given by Δqe/Δte. If two of the eigenvalues of J(q, t) are small, then the full optical flow cannot be estimated. Instead, only one component of the optical flow can be estimated. This is the well known aperture problem. Further details are included at the end of section 3.4, below.
The advantages of the Fisher-Rao algorithm for estimating optical flow are as follows.
• The Fisher-Rao algorithm has a small number of parameters. There is no learning stage and no requirement for application specific features. It is not necessary to estimate the pixel grey levels.
• The use of the Fisher-Rao metric ensures that the squared distances in (2), on which the Fisher-Rao algorithm depends, are fundamental quantities, in that they are unaffected by the choice of the parameterisation of the family of probability distributions.
• The aperture problem can be described cleanly, using the eigenvalues of the Fisher-Rao matrix.

Implementation details
Each point (q, t) of the parameter space has a box neighbourhood, as noted in [20]. To be specific, let m, n be odd positive integers and let τ > 0 be a time interval. An event (r, s), r ≡ (r1, r2), is in the box neighbourhood of (q, t) if

|r1 − q1| ≤ (m − 1)/2 and |r2 − q2| ≤ (m − 1)/2,

and

|s − t| ≤ nτ/2,

where |.| is the absolute value.
The events in the box neighbourhood of (q, t) are used to define a neighbourhood spike array, L(q, t), as in [20]. The array L(q, t) has dimensions m × m × n. Each element of L(q, t) corresponds to a voxel in R 3 .
Let ⌊.⌋ be the floor function. A point (r, s) in the box neighbourhood of (q, t) is in the voxel corresponding to the array indices i, j, k defined by

i = r1 − q1 + (m + 1)/2, j = r2 − q2 + (m + 1)/2, k = ⌊(s − t)/τ + (n + 1)/2⌋.

The array element Lijk(q, t), 1 ≤ i, j ≤ m, 1 ≤ k ≤ n, is equal to the number of events in the voxel corresponding to (i, j, k). The array L(q, t) is scaled to produce a discrete probability distribution, g(q, t), defined on the set

{1, . . . , m} × {1, . . . , m} × {1, . . . , n}.

The sum of the elements gijk(q, t) over i, j and k is equal to one.
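The construction of the neighbourhood spike count array and its normalisation can be sketched as follows. This is a minimal illustration under our own conventions (0-indexed arrays, unit-pixel spatial voxels and temporal voxels of width τ); the function name spike_count_distribution is ours:

```python
import numpy as np

def spike_count_distribution(events, q, t, m, n, tau):
    """Count the events in the box neighbourhood of (q, t) into an
    m x m x n voxel array L and normalise L to a probability
    distribution g(q, t). `events` holds (x, y, s) tuples for one
    fixed polarity Delta(i)."""
    L = np.zeros((m, m, n))
    half_m, half_n = (m - 1) // 2, (n - 1) // 2
    for x, y, s in events:
        i = x - q[0] + half_m                      # spatial voxel indices
        j = y - q[1] + half_m
        k = int(np.floor((s - t) / tau)) + half_n  # temporal voxel index
        if 0 <= i < m and 0 <= j < m and 0 <= k < n:
            L[i, j, k] += 1
    total = L.sum()
    return L / total if total > 0 else L

# A small cluster of events around q = (10, 10) at t = 0, with tau = 1000 us.
evts = [(10, 10, 0), (10, 10, 0), (11, 10, 1500), (9, 9, -1200)]
g = spike_count_distribution(evts, (10, 10), 0, m=5, n=5, tau=1000.0)
```

The returned array g sums to one, as required of the distribution g(q, t).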
It is convenient to choose coordinates in R³ such that q = (0, 0) and t = 0. With this choice, the probability distribution g(q, t) is denoted by g0. Let a = (a1, a2, a3) be a vector in R³. Then ga is defined to be the probability distribution obtained from the neighbourhood spike array L(r, s) centred at the point r = (a1, a2), s = a3τ. The probability distributions ga for a in {−1, 0, 1}³ are used in section 3.4 to estimate the 3 × 3 matrix J(0) that specifies the Fisher-Rao metric at q = 0, t = 0.

Optical flow
Suppose that a moving object is observed by a camera for a short period of time. It is assumed that brightness constancy holds, in that the appearance of a point on the object does not change significantly as the point moves through a short distance in the field of view. If a point is observed at the pixel (i0, j0) at time t0 and if the same point is observed at the pixel (i1, j1) at a later time t1 near to t0, then the value of the pixel (i0, j0) at time t0 is approximately equal to the value of the pixel (i1, j1) at time t1. This brightness constancy is the basis of many methods for estimating optical flow [15]. The optical flow (u, v) at (i0, j0) at time t0 is estimated by

(u, v) = ((i1 − i0)/(t1 − t0), (j1 − j0)/(t1 − t0)). (3)

Let τ = t1 − t0. It follows from (3) that

(i1, j1, t1) = (i0 + uτ, j0 + vτ, t0 + τ). (4)

The optical flow (u, v) and the time interval τ together define a translation (uτ, vτ, τ) in the measurement space R³. The magnitude of this translation is proportional to τ. In some cases it is not possible to establish a unique match between points (i0, j0, t0) and (i1, j1, t1). For example, if the optical flow is due to a moving straight edge and if (i0, j0, t0) matches (i1, j1, t1), then (i0, j0, t0) also matches any point (i2, j2, t1) for which (i2 − i1, j2 − j1) is parallel to the edge. The component of the optical flow parallel to the edge cannot be measured. This ambiguity is known as the aperture problem.
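The translation of an event under brightness constancy can be written as a one-line function; this is an illustrative sketch with our own naming:

```python
def translate_event(point, flow, tau):
    """Under brightness constancy, an event at (i0, j0, t0) generated by a
    point moving with optical flow (u, v) reappears at the translated
    position (i0 + u*tau, j0 + v*tau, t0 + tau) in the measurement
    space R^3."""
    (i0, j0, t0), (u, v) = point, flow
    return (i0 + u * tau, j0 + v * tau, t0 + tau)

# A point moving with flow (2, -1) pixels per unit time, observed tau = 3 later.
p1 = translate_event((5.0, 5.0, 0.0), (2.0, -1.0), 3.0)
```

The magnitude of the returned displacement grows linearly with tau, as stated above.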

Estimation of the optical flow
The optical flow is estimated at a point (q, t), where q is a pixel and t is a time. The estimate is obtained using the Fisher-Rao matrix J(q, t). As noted at the end of section 3.2, it is convenient to choose coordinates in R 3 such that q = (0, 0) and t = 0. In this context, the Fisher-Rao matrix is written as J(0), in place of the notation J(q, t) used in section 3.1. Let a be a vector in R 3 and let g a be the associated probability distribution, as defined in section 3.2.
A particular value for Δ(i) in (1) is chosen, for example Δ(i) = 1; the Fisher-Rao matrix obtained from these measurements is denoted by J′(0). The matrix is obtained using the fact that a scaled version of the Fisher-Rao matrix is a leading order approximation to the Kullback-Leibler divergence [2,25], as shown in (6) below. The Kullback-Leibler divergence D(0‖a) of ga from g0 is defined by

D(0‖a) = Σijk g0,ijk ln(g0,ijk/ga,ijk). (5)

If D(0‖a) is sufficiently smooth as a function of a, then the leading order term in a Taylor expansion of D(0‖a) at a = 0 is quadratic in a [25], in that

D(0‖a) = (1/2) a J′(0) aᵀ + O(‖a‖³), (6)

where J′(0) is a symmetric 3 × 3 non-negative matrix. The matrix J′(0) is estimated using the 26 values of D(0‖a) for a in {−1, 0, 1}³, a ≠ 0, together with the approximation D(0‖a) ≈ (1/2) a J′(0) aᵀ. In fact it is only necessary to estimate accurately the eigenvector of J′(0) associated with the least eigenvalue.
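The least squares estimation of the Fisher-Rao matrix from the 26 divergences can be sketched as follows. The sketch is ours: it assumes flattened distributions, uses the quadratic approximation in (6) for the right hand side, and checks the fit on synthetic divergences generated by a known matrix:

```python
import itertools
import numpy as np

def kl_divergence(g0, ga, eps=1e-12):
    """Kullback-Leibler divergence D(0||a) of g_a from g_0 (flat arrays).
    A tiny eps guards against zero entries."""
    g0 = np.asarray(g0, dtype=float) + eps
    ga = np.asarray(ga, dtype=float) + eps
    return float(np.sum(g0 * np.log(g0 / ga)))

def fit_fisher_rao_matrix(divergences):
    """Least squares fit of the symmetric 3x3 matrix J(0) from the 26
    values D(0||a), a in {-1,0,1}^3 with a != 0, using the quadratic
    approximation 2 D(0||a) ~ a J(0) a^T. `divergences` maps each
    offset tuple a to its divergence."""
    rows, rhs = [], []
    for a in itertools.product((-1, 0, 1), repeat=3):
        if a == (0, 0, 0):
            continue
        a1, a2, a3 = a
        # coefficients of the six unknowns (J11, J12, J13, J22, J23, J33)
        rows.append([a1 * a1, 2 * a1 * a2, 2 * a1 * a3,
                     a2 * a2, 2 * a2 * a3, a3 * a3])
        rhs.append(2.0 * divergences[a])
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    J11, J12, J13, J22, J23, J33 = x
    return np.array([[J11, J12, J13], [J12, J22, J23], [J13, J23, J33]])

# Synthetic check: divergences generated exactly by a known J are recovered.
J_true = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 0.5]])
div = {a: 0.5 * np.array(a) @ J_true @ np.array(a)
       for a in itertools.product((-1, 0, 1), repeat=3) if a != (0, 0, 0)}
J_est = fit_fisher_rao_matrix(div)
```

The 26 equations in six unknowns are overdetermined but consistent on the synthetic data, so the least squares solution recovers the matrix exactly.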
The relevant values of D(0‖a) are those near to zero. Let a = (uτ, vτ, τ), where (uτ, vτ, τ) is the translation defined in (4). The square of the Fisher-Rao distance between the distributions g0 and ga is estimated by

2D(0‖a). (7)

It follows from brightness constancy, as described in section 3.3, that the measurements used to estimate ga are translates in R³ of the measurements used to estimate g0. It follows that ga is equal to g0, thus the Fisher-Rao distance between ga and g0 is zero and (uτ, vτ, τ) is an eigenvector of J′(0) with eigenvalue 0.
In the above calculations the terms Δ(i) in (1) have the value 1. The measurements for which Δ(i) = 0 are also used to obtain a set of probability distributions and an associated Fisher-Rao matrix J″(0). Let J(0) be defined by

J(0) = J′(0) + J″(0), (8)

where J′(0) is the matrix obtained from the measurements with Δ(i) = 1. The optical flow at the point (q, t) corresponding to the point 0 ≡ (0, 0, 0) is estimated using the eigenvector of J(0) with the least eigenvalue. The eigenvalues of J(0) can be used to detect the aperture problem described in section 3.3. If two of the eigenvalues of J(0) are near to zero and the third eigenvalue is significantly different from zero, then only one component of the optical flow can be measured accurately, i.e. the aperture problem appears. In detail, let (u, v) be the optical flow. If J(0) has two eigenvalues equal to zero, then it has the form J(0) = eᵀe, where e is a row vector with coordinates e = (e1, e2, e3). It follows from the definition of (u, v) that for a short time interval τ,

e1uτ + e2vτ + e3τ = 0. (9)

The magnitude of the component of the optical flow parallel to (e1, e2) is obtained by taking the scalar product of (u, v) with the unit vector in the direction of (e1, e2), namely

(e1u + e2v)/√(e1² + e2²),

which, by (9), is equal to

−e3/√(e1² + e2²).

The component of the optical flow normal to (e1, e2) cannot be measured.
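The aperture-limited case can be checked numerically. In the sketch below (our own naming), a rank one matrix eᵀe is built from a vector e orthogonal to (uτ, vτ, τ), so that e1u + e2v + e3 = 0, and the measurable flow component along (e1, e2), namely −e3/√(e1² + e2²), is compared with the projection of the true flow:

```python
import numpy as np

def normal_flow_component(e):
    """For an aperture-limited Fisher-Rao matrix of the form e^T e
    (two zero eigenvalues), only the flow component parallel to
    (e1, e2) is measurable; its magnitude is -e3 / sqrt(e1^2 + e2^2)."""
    e1, e2, e3 = e
    return -e3 / np.hypot(e1, e2)

# A moving edge with true flow (u, v) = (1.0, 0.5), observed over tau = 1.
u, v = 1.0, 0.5
# Choose e with unit (e1, e2) so that e . (u*tau, v*tau, tau) = 0.
e = np.array([0.6, 0.8, -(0.6 * u + 0.8 * v)])
J = np.outer(e, e)                       # rank one: two zero eigenvalues
component = normal_flow_component(e)     # measurable component of (u, v)
```

The recovered component agrees with the scalar product of (u, v) with the unit vector (0.6, 0.8), while the component normal to (e1, e2) is left undetermined, exactly as in the aperture problem.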

Implementation
The algorithm described in section 3.4 for estimating the optical flow requires some modifications and choices of parameters in order to obtain accurate results in practice. In the following description it is assumed that the quantity Δ(i) in (1) has the value 1. Let At be the array of dimensions xmax × ymax × (n + 2) in which each element records the number of events emitted in the corresponding voxel of the pixel array during a time interval of length (n + 2)τ centred at t. The distribution ga at a pixel r is obtained from the m × m × n subarray of At centred spatially at the point

r + (a1, a2), (10)

and offset in time by a3, such that r is a pixel and (a1, a2, a3) ≡ a is in {−1, 0, 1}³. Let ãt(q) be the (m + 2) × (m + 2) × (n + 2) subarray of At defined by

ãt(q) = At(q1 − (m + 1)/2 : q1 + (m + 1)/2, q2 − (m + 1)/2 : q2 + (m + 1)/2, :), (11)

where : is the MATLAB notation for a range of array entries. The sub-array ãt(q) is referred to as a block centred at (q, t). It contains the 27 sub-arrays obtained by setting r = q in (10). Let nq be the number of non-zero entries in ãt(q). A list Ct is made of the pixels q for which

nq ≥ f(m + 2)²(n + 2), (12)

where f is a fixed parameter taking a value in [0, 1]. The optical flow is estimated only for the pixels contained in Ct. A small strictly positive quantity ε is added to each element of At to ensure that the elements are all strictly larger than zero. This is to avoid numerical instabilities in the calculation (5) of the Kullback-Leibler divergence. The resulting array is smoothed with a mask that approximates to a Gaussian function with covariance σ²I, where I is the 3 × 3 identity matrix. Let Bt be the smoothed array.
Let b̃t(q) be the block obtained by replacing At in (11) with Bt. For each pixel q in Ct, let ga for a in {−1, 0, 1}³ be the set of 27 probability distributions obtained from b̃t(q). At this point, it is convenient to choose coordinates such that q = (0, 0) and t = 0. The Fisher-Rao matrix J′(0) is estimated using (6). There are six parameters to be estimated, namely J′11(0), J′12(0), J′13(0), J′22(0), J′23(0) and J′33(0). They satisfy the system of linear equations

a J′(0) aᵀ = 2D(0‖a), a in {−1, 0, 1}³, a ≠ 0. (13)

A solution J′(0) to (13) is estimated using least squares. Similar calculations are carried out using the measurements with Δ(i) = 0, to obtain a Fisher-Rao matrix J″(0). If J′(0) and J″(0) are both defined, in that the corresponding sub-arrays each have a sufficient number of non-zero entries, then they are added, as in (8), to yield a matrix J(0). If J′(0) or J″(0) is not defined, then the calculation is abandoned.
Let λ1 ≥ λ2 ≥ λ3 be the eigenvalues of J(0). If λ3 is comparable in magnitude to λ1 and λ2, then there is no match between nearby probability distributions and the optical flow is not defined. Let w ≡ (w1, w2, w3) be the eigenvector corresponding to the least eigenvalue of an accepted matrix J(0). The optical flow at the corresponding pixel (x, y) is estimated by

(u, v) = (w1/w3, w2/w3).

The units for the components u, v of the optical flow are pixels τ⁻¹. The estimate (u, v) of the optical flow is accepted only if ‖(u, v)‖ ≤ maxFlow, where maxFlow is a physically plausible threshold and ‖.‖ is the Euclidean norm.
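The extraction of the flow vector from an accepted Fisher-Rao matrix can be sketched as follows. The function name and the synthetic test matrix are ours; the flow is read off the least-eigenvalue eigenvector w, consistent with the ratio Δqe/Δte of section 3.1, and the maxFlow plausibility check is applied:

```python
import numpy as np

def flow_from_matrix(J, max_flow):
    """Estimate the optical flow at a pixel from an accepted Fisher-Rao
    matrix J(0): take the eigenvector w with the least eigenvalue and
    return (w1/w3, w2/w3) in pixels per tau, or None if the estimate
    fails the plausibility threshold max_flow."""
    eigvals, eigvecs = np.linalg.eigh(J)  # eigenvalues in ascending order
    w = eigvecs[:, 0]                     # least-eigenvalue eigenvector
    if w[2] == 0.0:
        return None                       # no temporal component: flow undefined
    u, v = w[0] / w[2], w[1] / w[2]
    return (u, v) if np.hypot(u, v) <= max_flow else None

# Synthetic J whose zero-eigenvalue eigenvector is proportional to (2, -1, 1),
# built as I - w w^T so the other two eigenvalues equal 1.
w_true = np.array([2.0, -1.0, 1.0])
w_true /= np.linalg.norm(w_true)
J = np.eye(3) - np.outer(w_true, w_true)
flow = flow_from_matrix(J, max_flow=5.0)
```

The ratios w1/w3 and w2/w3 are unaffected by the arbitrary sign of the computed eigenvector, so the recovered flow is (2, −1) pixels per unit time.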
The time complexity for computing the array At is linear in the number of events. The time complexity for smoothing the array At and obtaining the list Ct of pixels is O(xmax ymax (n + 2)). The time complexity for estimating each Fisher-Rao matrix is the sum of the O(m²n) cost of calculating the Kullback-Leibler divergences (5) and the O(1) cost of the least squares estimate of J(0). The time complexity also depends on the parameter f in (12). If f is large, then few flow vectors are obtained.
A summary of the algorithm for estimating optical flow is included in appendix B.

Experiments with five datasets
This section describes experiments to test the Fisher-Rao method for estimating optical flow using five new datasets, namely data 1, data 2, data 3, data 4 and data 5. The datasets data 1, data 2, data 3 and data 4 were obtained using the asynchronous time-based image sensor (ATIS) [32] made by Prophesee. Data 5 was obtained using the next generation sensor, H-VGA. Ground truth optical flow was obtained for data 3, data 4 and data 5 using the OptiTrack motion capture system [30]. Further information about OptiTrack is given in appendix A. The relevant properties of the proposed datasets are summarized in table 2. The sixth dataset, MVSEC, is described in section 7.1 below. The parameter values used to test the Fisher-Rao algorithm on these datasets are summarised in table 3. Changes in the parameter values from one dataset to the next were avoided, as far as possible. For example, five of the six datasets in table 3 use the same 11 × 11 × 11 arrays to construct the box neighbourhoods, as described in section 3.2. This indicates that the tuning of the parameter values is stable.

Data 1
The data were obtained in a laboratory using an ATIS event camera which was rotated about a fixed axis while viewing two flat pages. The camera was initially at rest, then it was rotated about the axis and finally brought to rest at the end of the motion. The first measurement was obtained at time tmin = 8 513 875 μs and the last measurement was obtained at time tmax = 11 791 554 μs. The estimated optical flow is shown in figure 1 for four consecutive time intervals, each one of length 400 000 μs, with the parameter τ in sections 3.3 and 3.4 given by τ = 400 000/(n + 2) μs. The size of the pixel array was 240 × 304.
A pixel in figure 1 is blue if the events with Δ(i) = 0 predominate. A pixel is yellow if the events with Δ(i) = 1 predominate. If the number of events with Δ(i) = 0 is equal to the number of events with Δ(i) = 1, then the pixel is white. The estimated optical flow vectors are shown superposed in red. Each flow vector is scaled up by a factor of 10 in order to make it more visible. If the number of flow vectors is large, then some vectors are removed in order to improve the visibility of the remaining vectors. The parameters used to obtain the results in figure 1 are m = 11 pixels, n = 11 pixels, f = 1/20, σ = 2 pixels, β1 = 10, β2 = 4, maxFlow = 5 pixels τ⁻¹ and ε = 0.01, using the notation in section 4. The parameter values are chosen empirically. The parameters m, n and the time interval 400 000 μs are chosen large enough to ensure that the probability distributions are stable. The threshold maxFlow is necessary in order to remove outliers from the optical flow vectors.

Data 2
The data consist of measurements obtained by an ATIS event camera placed in a city street. The camera was mounted on a tripod and directed towards the road traffic. The pixel array is of size 240 × 304. The first measurement was obtained at tmin = 13 μs and the last was obtained at tmax = 5.06 × 10⁷ μs. The four images shown in figure 2 were obtained from consecutive time intervals, each one of length 10⁵ μs. The parameter τ was given by τ = 10⁵/(n + 2) μs. As in figure 1, each optical flow vector is scaled up by a factor of 10 and some flow vectors are not shown in order to improve the visibility of the remaining vectors. The parameters m, n, f, β1, β2, σ, maxFlow and ε have the same values as in section 5.1 for data 1.
Grey level image frames are available for data 2. See for example figure 3. However, these frames were not used in the estimation of the optical flow.

Comparison with ground truth for data 3 and data 4
The ATIS datasets data 3 and data 4 are used to test the Fisher-Rao method by comparing the estimated optical flow with a ground truth optical flow provided by a motion capture system. The position and orientation of a moving camera relative to a planar set of points are measured over time. The ground truth optical flow is obtained by projecting the points into the camera. The motions of the camera and the motions of the planar set of points are measured using the motion capture system OptiTrack [30], which consists of 8 Flex13 cameras. The motion capture system has a sub-millimetre spatial resolution and an acquisition frequency of 120 Hz. All technical specifications, including information about accuracy, can be found on the web page [30]. Further information about the calculations used in OptiTrack is given in appendix A.
In order to track the event-based camera, markers are put on its casing so that when the camera moves the trajectory of the casing can be updated in real time. The camera observes a planar printed pattern which contains a set of points that provide the data for the Fisher-Rao algorithm. The planar pattern also contains markers from which the ground truth optical flow is calculated.
The Fisher-Rao algorithm estimates the optical flow using time slices of 400 ms (2.5 Hz) in data 3 and slices of 100 ms (10 Hz) in data 4. The ground truth image velocities are computed from the sequences of positions of the tracked points obtained at times between two consecutive Fisher-Rao estimates of the optical flow.
The camera motion to obtain data 3 is similar to that used for data 1: the camera is placed on a tripod and rotated about a vertical axis, first clockwise and then counterclockwise, while the planar pattern is held stationary. An example of the optical flow obtained by the Fisher-Rao algorithm is shown in figure 4. The flow vectors are scaled up by a factor of 15 and some flow vectors are not shown in order to improve the visibility of the flow field. Histograms of the direction errors, i.e. the differences between the orientations of the ground truth optical flow vectors and the orientations of the estimated optical flow vectors, are shown in the upper part of figure 5 for the two sweeps of the camera. The mean amplitude error shown for the two sweeps in figure 5 is the mean of the Euclidean norms of the differences between the empirical flow vectors and the corresponding ground truth flow vectors. The parameters are m = 11 pixels, n = 11 pixels, f = 1/20, σ = 2 pixels, β1 = 10, β2 = 4, maxFlow = 5 pixels τ⁻¹ and ε = 0.025. The value of τ is τ = 400 000/(n + 2) μs.
A normal distribution is fitted to the scaled histogram of the direction errors. The mean value of the distribution is −1.5 × 10⁻³ rad and the standard deviation is 5 × 10⁻³ rad. The estimated amplitudes are close to the ground truth, with a mean Euclidean error of 7 pixels s⁻¹. The last sample in sweep 1 and the first sample in sweep 2 produce large errors because the pattern leaves the field of view and there are only a few pixels for which the optical flow can be estimated. Data 4 differs from data 3 in that the camera is static while the pattern moves in the world reference frame. An example of the optical flow obtained by the Fisher-Rao algorithm is shown in figure 4(b). The flow vectors are scaled up by a factor of 10 and some flow vectors are not shown in order to improve the visibility of the flow field. The parameters are m = 11 pixels, n = 11 pixels, f = 1/30, σ = 4 pixels, β1 = 10, β2 = 4, maxFlow = 5 pixels τ⁻¹ and ε = 0.025. The value of τ is τ = 100 000/(n + 2) μs. Figure 6 shows the distribution of the direction errors and the mean amplitude errors, calculated as for data 3. The errors are higher than for data 3 because the pattern was moved manually with a velocity varying in amplitude and direction, as required to retain the pattern within the field of view. The estimated mean and standard deviation of the direction errors are respectively −7.5 × 10⁻² rad and 4 × 10⁻² rad. The lower performance of the Fisher-Rao flow in this experiment is likely to be due to the varying velocity of the pattern. The matrices for the Fisher-Rao metric are estimated over spatiotemporal volumes whose dimensions have to fit the motion. If the volume is too small, then not enough events are available to estimate the flow, and if the volume is too big, then the velocity estimate is an averaged value.

Data 5
Data 5 was obtained using the H-VGA sensor, a new generation of the ATIS sensor. H-VGA and ATIS are compared in table 1. The H-VGA sensor has an increased spatial resolution, a higher signal to noise ratio, sharper images and better performance in low light, as compared with the ATIS sensor. As a result, more accurate estimates of the optical flow can be obtained from H-VGA measurements.
An example of the optical flow obtained from data 5 is shown in figure 7. Some flow vectors are omitted in order to make the flow field clearer. The flow vectors are also scaled up by a factor of 10. The ground truth optical flow is obtained from OptiTrack using markers attached to a plane surface held by the experimenter, as shown in figure 8. The results of the Fisher-Rao algorithm are compared with ground truth only for the optical flow arising from the moving plane.
The increase in the quality of the estimated optical flow can be seen by comparing the error curves for data 4, given in figure 6, with those for data 5, given in figure 8. In both cases the flow arises from a plane moved with an unconstrained velocity while the camera is kept static. The direction errors for data 5 have a more uniform and narrower distribution than the direction errors for data 4. The mean amplitude error is also lower: a few pixels s⁻¹ for data 5, compared with up to 20 pixels s⁻¹ for data 4. The parameters for data 5 are m = 5 pixels, n = 5 pixels, f = 1/160, σ = 2 pixels, β₁ = 10, β₂ = 4, maxFlow = 2 pixels τ⁻¹ and ε = 0.025. The value of τ is τ = 62 500/(n + 2) μs. The errors in estimating optical flow with the H-VGA sensor are less than the errors obtained using the ATIS sensor, even though the parameters m, n, which control the size of the window for each probability distribution, are reduced from m = n = 11 pixels to m = n = 5 pixels.

Experiments with the Rueckauer-Delbruck data
The Fisher-Rao algorithm was applied to two of the datasets in [34], namely the translating square (translSquare) and the rotating disk (rotDisk). These datasets were chosen because the ground truth is known. The results for translSquare and rotDisk are discussed in sections 6.1 and 6.2 respectively.

Translating square
The dataset translSquare is computer generated. It shows a textureless square of size 40 × 40 pixels² translating with a constant velocity of (20, 20) pixels s⁻¹. The first measurement is obtained at time t_min = 50 000 μs and the last measurement is obtained at time t_max = 5 000 000 μs. Nineteen consecutive subintervals of width Δs = 247 500 μs are chosen from [t_min, t_max]; the central (i.e. 10th) subinterval is used for the results reported below. It is not possible to obtain the full optical flow vectors along the sides of the square because of the aperture problem. The normal components of the optical flow vectors are obtained using (9). The parameters used by the Fisher-Rao algorithm are m = 11 pixels, n = 11 pixels, f = 1/100, β₁ = 5, σ = 2 pixels, maxFlow = 2 pixels Δs⁻¹. The condition λ₂ ⩾ β₂λ₃ in (14) is discarded because the eigenvalues λ₂, λ₃ of the Fisher-Rao matrix both tend to be small compared with λ₁. It was found that very few spatiotemporal volumes contain enough measurements with Δ(i) = 0 and with Δ(i) = 1 to enable the calculation of both of the matrices J₀(0), J₁(0) in (8). If only one of J₀(0), J₁(0) is available, then J(0) is set equal to that matrix. The optical flow estimated by the Fisher-Rao algorithm for the 10th subinterval is shown in figure 9.
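The core step of the algorithm, reading a flow estimate off the eigenvector of the 3 × 3 Fisher-Rao matrix with the least eigenvalue, can be illustrated with the sketch below. The reliability test on the eigenvalue ratio, the sign convention and the rejection thresholds are assumptions standing in for the conditions in (14), which are not reproduced here.

```python
import numpy as np

def flow_from_fisher_rao(G, beta1=5.0, max_flow=2.0):
    """Sketch: optical flow from a 3x3 Fisher-Rao matrix G.

    The flow direction is taken from the eigenvector of G with the
    least eigenvalue.  The eigenvalue-ratio test and the sign and
    scaling conventions are assumptions, not the paper's exact (14).
    """
    w, v = np.linalg.eigh(G)          # eigenvalues in ascending order
    lam3, lam2, lam1 = w              # so lam1 >= lam2 >= lam3
    if lam1 < beta1 * lam2:           # assumed reliability test
        return None
    e = v[:, 0]                       # eigenvector of least eigenvalue
    if abs(e[2]) < 1e-12:             # flow undefined if time component ~ 0
        return None
    flow = -e[:2] / e[2]              # (vx, vy) in pixels per time step
    if np.linalg.norm(flow) > max_flow:
        return None                   # discard implausibly large flow
    return flow
```

Note that the result is invariant to the arbitrary sign of the eigenvector, since both the numerator and the denominator flip together.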

Rotating disk
The dataset rotDisk [34] is obtained from a 240 × 180 pixels² dynamic and active-pixel vision sensor (DAVIS). The camera observes a disk divided into eight sectors with varying grey levels. The disk is kept stationary while the camera rotates about a fixed axis. The disk appears to rotate in a clockwise direction about a centre c = (115, 86). The first measurement is obtained at time t_min = 341 678 μs and the last measurement is obtained at time t_max = 3 508 437 μs. Nineteen consecutive subintervals of width Δs = 158 338 μs are defined following the example in section 6.1.
It is not possible to obtain the full optical flow vectors because of the aperture problem. The boundaries between the different sectors of the disk are straight lines with uniform grey levels on either side. As in section 6.1, the normal component of the optical flow is obtained using (9). In this particular case the normal optical flow coincides with the full optical flow. The parameters used by the Fisher-Rao algorithm are the same as for translSquare, and the condition λ₂ ⩾ β₂λ₃ in (14) is again discarded. It is apparent by visual inspection that the upper vertical bar in figure 9(b) moves clockwise to the corner of the image in figure 9(c). The change in orientation of the bar is estimated to be 0.69 radians. The time interval is 9Δs, thus the angular velocity is estimated to be 0.48 radians s⁻¹, in agreement with the value obtained by the Fisher-Rao algorithm.
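The angular velocity quoted above follows directly from the numbers in the text:

```python
# Consistency check for the rotDisk estimate: the upper bar rotates by
# 0.69 radians over a time interval of 9 subintervals, each of width
# 158 338 microseconds.
delta_s = 158_338e-6            # subinterval width in seconds
omega = 0.69 / (9 * delta_s)    # angular velocity in radians per second
print(round(omega, 2))          # -> 0.48
```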

Comparison with the state of the art
In this section the Fisher-Rao algorithm is compared with two state of the art algorithms for estimating optical flow from event camera measurements. The first algorithm has two forms, namely EV-FlowNet2R and EV-FlowNet4R (Zhu et al [39]). The results are described in section 7.1. The second algorithm fits planes to sets of events [6, 7]. The results are described in section 7.2.

Comparison with EV-FlowNet2R and EV-FlowNet4R
The data for the comparison between the Fisher-Rao algorithm and EV-FlowNet2R were obtained from the multi vehicle stereo event camera (MVSEC) dataset [38, 39, 40]. The dataset contains stereo event camera measurements from a car, a motorcycle, a hexacopter and a handheld camera. The data were obtained from both indoor and outdoor environments. The Fisher-Rao algorithm was applied to the indoor hexacopter measurements. Zhu et al [39] obtained the ground truth optical flow using a Vicon motion capture system with 20 cameras to observe the hexacopter. The camera was a DAVIS m346B with a resolution of 346 × 260 pixels².
The list of events was obtained from the file indoor_flying1_data.hdf5. The ground truth optical flow and a list of time stamps were obtained from the MVSEC file indoor_flying1_gt_flow_dist.npz. Let ts be the list of time stamps and let t be a time interval. Nine time slices were chosen, namely [ts(i), ts(i) + t] for 681 ⩽ i ⩽ 689. The time step τ is given by τ = t/(n + 2) = t/13. The remaining parameter values are m = 11 pixels, n = 11 pixels, f = 1/240, σ = 2 pixels, β₁ = 5, β₂ = 2, maxFlow = 3/2 pixels τ⁻¹ and ε = 10⁻⁵. Figure 10 shows the results. The errors for EV-FlowNet2R and EV-FlowNet4R are taken from table 1 in [39], where they are referred to as average end point errors. The mean amplitude errors for the Fisher-Rao algorithm, for example as shown in figure 10(d), are scaled to give the values obtained after 1 s of the flow. In order to make the comparison with EV-FlowNet2R and EV-FlowNet4R in table 2, it is necessary to scale the errors to give the values obtained after t seconds of the flow. It is apparent from tables 2 and 4 that the errors for the Fisher-Rao algorithm are close to the errors for EV-FlowNet2R and EV-FlowNet4R. The advantage of the Fisher-Rao algorithm is that it is much simpler to initialize. Grey level images and ground truth optical flow are not required, and it is not necessary to train a complicated deep neural network. It is only necessary to tune the values of the nine parameters listed in table 3. The experimental results show that the Fisher-Rao algorithm is stable, in that only minor changes in the values of the parameters are required if the dataset is changed (table 4).
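On the assumption that each time slice runs from ts(i) to ts(i) + t (the exact construction in the paper may differ), the slicing, the quantization step τ and the error rescaling used for the comparison can be sketched as:

```python
import numpy as np

def mvsec_slices(ts, t, first=681, last=689, n=11):
    """Sketch: time slices and quantization step for the MVSEC run.

    ts    -- array of ground truth time stamps (seconds)
    t     -- duration of each slice (seconds)
    Each slice is assumed to run from ts[i] to ts[i] + t; the time
    step tau = t/(n + 2) quantizes the events within a slice."""
    tau = t / (n + 2)                                  # here t/13
    slices = [(ts[i], ts[i] + t) for i in range(first, last + 1)]
    return tau, slices

def scale_error(err_per_second, dt):
    """Scale a mean amplitude error in pixels per second to the error
    accumulated over dt seconds, for comparison with the average end
    point errors reported in [39]."""
    return err_per_second * dt
```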

Comparison with a plane fitting algorithm
The Fisher-Rao algorithm is compared with a state of the art event based algorithm for estimating optical flow, which fits planes to sets of measurements in R³ [6, 7]. The algorithm is applied to data 3 and its results are compared with those obtained from the Fisher-Rao algorithm. The parameters of the two algorithms are made as similar as possible: the size of the spatial neighbourhood is set to 11 × 11 pixels² and only the latest events within this spatial neighbourhood are used for estimating the flow. A local plane fit is carried out for each incoming event and the optical flow is estimated using the orientation of the plane. The plane based algorithm only estimates the components of the flow normal to the moving edges that generate the events, as shown in figure 11.
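The plane based normal flow can be illustrated with a minimal least squares sketch. This is an illustration in the spirit of [6, 7], not the authors' implementation; robust refitting and event selection are omitted.

```python
import numpy as np

def normal_flow_from_plane(events):
    """Sketch of plane-based normal flow in the spirit of [6, 7].

    events: array of shape (N, 3) with rows (x, y, t).  A plane
    t = a*x + b*y + c is fitted by least squares, and the normal flow
    is read off from the plane's spatial gradient (a, b)."""
    A = np.c_[events[:, 0], events[:, 1], np.ones(len(events))]
    (a, b, c), *_ = np.linalg.lstsq(A, events[:, 2], rcond=None)
    g2 = a * a + b * b                # |grad t|^2
    if g2 < 1e-12:
        return None                  # plane nearly parallel to the image
    return np.array([a, b]) / g2     # normal flow in pixels per time unit
```

For an edge moving along x at speed v, the fitted plane is t = x/v, so the sketch returns (v, 0), i.e. the component of the motion normal to the edge.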
As the results of the Fisher-Rao algorithm have already been compared with the ground truth in section 5.3, only the directions of the estimated flow vectors are compared. The histograms of the directions computed by the two algorithms are shown in figure 12. The most interesting result that can be obtained from the histograms is a comparison of the robustness of each algorithm to the aperture problem. The estimated flow directions for the Fisher-Rao algorithm agree with the ground truth. The estimated flow directions for the plane based algorithm have a larger spread around the ground truth direction.

Figure 11. Comparison of the Fisher-Rao algorithm with the plane based algorithm in [7] using zooms of the flows. The red arrows show the normal flow obtained in [7]. The green arrows show the Fisher-Rao flow.

Conclusion
A new algorithm for estimating optical flow from event camera measurements has been described. The algorithm is based on the Fisher-Rao metric, which is defined on the parameter spaces of families of probability distributions. In this application, the distributions are obtained from the measurements in small spatiotemporal volumes referred to as box neighbourhoods. The time component of each measurement is quantized, to ensure that each distribution is defined on a three dimensional neighbourhood spike array. The parameter space for the family of distributions is also three dimensional. The parameter value corresponding to a given distribution is, by definition, the centre of the spatiotemporal volume from which the distribution is obtained.
The Fisher-Rao algorithm requires a sufficient number of measurements to establish the probability distributions from which the optical flow is obtained. In the experiments reported in sections 5, 6 and 7, time slices of several hundred milliseconds are required to accumulate the measurements. The event camera's native time precision is of the order of a microsecond. An optical flow algorithm that meets this time precision might be obtained using a sliding window technique, but this is a topic for future research.