Turbulence mitigation in imagery including moving objects from a static event camera

Abstract. Long-range horizontal-path imaging through atmospheric turbulence is hampered by random, spatiotemporally varying shifts and blurs of scene points in the recorded imagery. Although existing software-based mitigation strategies can produce sharp and stable imagery of static scenes, it remains highly challenging to mitigate turbulence in scenes with moving objects such that they remain visible as moving objects in the output. In our work, we investigate if and how event (also called neuromorphic) cameras can be used for this challenge. We explore how the high temporal resolution of the event stream can be used to distinguish between the apparent motion due to turbulence and the actual motion of physical objects in the scene. We use this to propose an algorithm to reconstruct output image sequences in which the static background of the scene is mitigated for turbulence, while the moving objects in the scene are preserved. The algorithm is demonstrated on indoor experimental recordings of moving objects imaged through artificially generated turbulence.


Introduction
Light traveling through the atmosphere encounters turbulent regions that modify the optical path length.1 As the light propagates, the effects of turbulent regions accumulate, leading to a random phase distortion of the wavefront, which causes time-varying blurs and shifts in the image recorded by a camera. Atmospheric turbulence therefore limits the effective resolution of optical imaging in many long-range observation applications such as surveillance or astronomy.
The effect of turbulence can be mitigated using hardware capable of measuring and correcting for the wavefront distortion while recording (adaptive optics) and/or using software such as image processing techniques.2 For astronomy, adaptive optics often performs well at correcting the wavefront distortion for the observation of point sources such as stars. For observation of extended areas (in surveillance applications), this technique is often unsuitable, as the wavefront distortion varies across the field of view. This implies that a correction that removes the distortion for one point in the scene does not remove the distortion for regions in other parts of the scene. In such a case, mitigation through image (post)processing is preferred.5-14

Motion compensation is often accomplished by computing a static reference frame and warping all frames in a sequence to that reference. Typically, the reference frame is obtained by temporal filtering of the intensity values per pixel, such as in the methods of Fishbain et al.3 and Zhu and Milanfar,7 or by filtering the estimated pixel motion from frame to frame to estimate the true pixel location, such as in the method of Halder et al.14 Alternatively, a dynamic reference frame can be computed by tracking the frame-to-frame motion, such as in the method of Nieuwenhuizen et al.8 Sharp image regions are identified using spatial sharpness measures, often based on the local gradients, such as in Aubailly et al.11 Finally, many proposals have been made for the multi-frame data fusion, often based on temporal low-pass filtering and subsequent sharpening or deconvolution, such as in Zhu and Milanfar.7 Notable alternative approaches include that of Anantrasirichai et al.,13 who proposed a recursive image fusion scheme using the dual-tree complex wavelet transform, and Oreifej et al.,6 who proposed the use of a three-term low-rank matrix decomposition of the spatiotemporal data cube to extract the background estimate.
A limitation of image processing approaches is that the frame rate of classical cameras is typically too low to capture all of the dynamics of the turbulence-induced changes in the images. As a result, local regions in single frames combine instances that are temporarily sharp with instances that are less sharp, and tip/tilt aberrations are averaged to further blur these regions. Moreover, distinguishing moving objects from motion due to turbulence is often problematic at these frame rates. At these frame rates, the frame-to-frame shifts can be substantially larger than a single pixel. Due to this, and due to the small-scale turbulent image distortions, it is difficult to accurately estimate the shifts from the images, which severely complicates the identification of moving object pixels, as shown in Ref. 15.
Event cameras, also known as neuromorphic cameras, do not record an entire frame with a shutter but instead output an asynchronous stream of intensity changes, so-called events. Contrary to classical frame recordings, which contain a lot of redundant data for image regions that stay constant, this recording technique captures only the local changes between frames, so the bandwidth and the recording resources are best used to record the local dynamics of the scene or of the camera. When combined with computer vision algorithms such as optical flow (Ref. 16), visual odometry (Refs. 17-19), and 3D reconstruction (Ref. 20), this new paradigm shows advantages for dynamic scenes; see Ref. 21 for an extensive overview. With its low latency and high temporal sampling, the event camera is therefore expected to be well suited to record the temporal variations of the atmospheric turbulence otherwise unseen by a conventional camera. This additional information may be used to improve the quality of the restored image as shown in Ref. 22. In this paper, we present our results combining image processing on an intensity image recorded by a camera and event processing to show the enhancement brought by the additional event stream over classical frame-based mitigation.
Because this is a first exploration of this possibility, the scope of the investigation is limited here to applications in which the camera itself is static and moving objects exhibit rigid body motion. This means that different parts of the moving object do not exhibit significant motion relative to each other in the imagery. Section 2 describes the principle of operation of the event camera. Section 3 details how the rapid sampling of the event camera is used to reconstruct a fixed background, separate moving objects from turbulence motion, and reconstruct the appearance of a fast-moving object. These algorithmic building blocks are used to construct the turbulence mitigation pipeline described in Sec. 3. The experimental setup to validate this approach is described in Sec. 4. Section 5 explains the results of the experiment designed to assess the benefit of the event camera and presents results of the turbulence mitigation pipeline compared with state-of-the-art image processing methods. Finally, the conclusions are summarized in Sec. 6.

Event Camera
Unlike conventional cameras that record entire frames synchronously, event cameras only record logarithmic intensity changes. The event camera encodes the changes as an asynchronous series of spikes called events,

$$e_k = (u_k, t_k, p_k), \quad (1)$$

where the k'th event e_k is a quadruplet that consists of a location u_k = (x_k, y_k)^T, a polarity p_k (positive or negative direction), and a time stamp t_k. As shown in Fig. 1, an event is produced when the difference between the memorized log intensity and the current log intensity exceeds a preset threshold S (controlled by the user). With the intensity I and using the notation L = log10(I), this is written as

$$|L(u_k, t_k) - L_{\mathrm{mem}}(u_k)| \geq S. \quad (2)$$

When the threshold is passed, the time stamp, the pixel position, and the polarity are emitted, and the current log intensity value is memorized for the monitoring.
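To make this triggering rule concrete, the following minimal sketch simulates the per-pixel behavior of Eq. (2) on a sampled log-intensity signal. The function name, the threshold value, and the assumption of a regularly sampled signal are ours for illustration; they do not describe the DAVIS346 circuitry.

```python
import numpy as np

def simulate_pixel_events(intensity, timestamps, S=0.15):
    """Emit (t, polarity) tuples when the log intensity drifts more than S away
    from the memorized value, mimicking the per-pixel rule of Fig. 1 and Eq. (2).

    intensity  : 1D array of linear intensity samples for one pixel
    timestamps : 1D array of sample times (s)
    S          : contrast threshold set by the user (illustrative value)
    """
    L = np.log10(np.maximum(intensity, 1e-9))  # log intensity
    L_mem = L[0]                               # memorized log intensity level
    events = []
    for t, l in zip(timestamps[1:], L[1:]):
        while abs(l - L_mem) >= S:             # several events may fire per sample
            p = 1 if l > L_mem else -1
            events.append((t, p))
            L_mem += p * S                     # re-memorize after each event
    return events
```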

Event-Based Turbulence Mitigation
Most software-based turbulence mitigation approaches rely on recording multiple frames of the same scene. The turbulence changes over time, such that frames suffer from varying distortions and blur. Turbulence mitigation attempts to distinguish between the temporal consistency of the actual scene and the random distortions. A particularly popular technique relies on the identification and selection of "lucky" space-time regions: regions that locally show the best image quality during a short period of time. With a standard camera, increasing the recording frame rate will then increase the chances of catching the instants at which the image quality is at its best. With its new sensing principle, the event camera has the ability to record short lucky instants without being limited by the integration time.
Here, we investigate how the above-mentioned concept of luckiness, which is defined on a local neighborhood of intensities recorded synchronously, can be exploited when using a camera that records only intensity variations asynchronously. The first step consists of estimating the continuous intensity image from the set of sparsely recorded intensity images and the continuous event stream. Using an iterative backprojection algorithm, we evaluate if the event stream carries information that can indicate whether portions of the continuous intensity image stream are lucky.
We now detail our proposed method for how frames are reconstructed from events to be used in an image reconstruction algorithm.

Image Reconstruction from Events
The main purpose of image reconstruction from events is to transform the asynchronous event stream into classical intensity frames. The created frames benefit from the extended dynamic range of event-based recording and the ability to precisely choose the time at which the frame is reconstructed. This allows for recording information in highly dynamic scenes or information present for a short time, which would not be accessible with classical frame imaging as it has a limited dynamic range, blind time, and integration time. However, relying purely on events to recreate an intensity frame also has shortcomings. Indeed, in the presence of flat regions or little motion, very few events may be triggered, resulting in a lack of information for those zones. Also, integrating incremental changes to recreate an intensity image will lead to the unavoidable integration of error and a drift of the estimated intensity. To overcome these limitations, recent event sensors such as the DAVIS346 from iniVation have an architecture providing two readouts: an asynchronous event stream and an intensity frame resulting from the photocurrent integration during the exposure time.
The combination of regular intensity frames with the asynchronous stream of events has been researched in different publications. Early approaches for image reconstruction relied on the integration of the contribution of each event.23 More recently, joint estimation of optical flow and intensities with manifold regularization24 was proposed to address the noise issues of the early approaches while offering real-time processing. A continuous intensity estimation based on a complementary filter was proposed in Ref. 25. The most recent techniques rely on deep learning with small and large architectures,26,27 providing state-of-the-art performance. The recent high-speed, high dynamic range dataset28 shows the growing interest in this subject, which is important for automotive applications.
Figure 2 shows the method used to recreate an intensity image I(t) at a point in time t located between the moments at which the camera recorded intensity frames I_j and I_{j+1}. Figure 2(a) shows what the camera recorded: the intensity frames (with their integration marked in blue) and the event stream. For clarity, only a subset of the events is shown in the figure. As events are emitted when the camera records an intensity change, it should be possible to recreate the intensity at a time t by incrementally integrating the intensity changes corresponding to each event up to a specified point in time [Fig. 2(b)]. To work with an event camera, this integration needs to be adapted in two ways:

a. First, events are emitted for log intensity changes, and therefore the intensity frames I_j and I_{j+1} recorded by the camera need to be transformed to the log domain before integrating. To distinguish, we denote with I the frames containing linear intensities and with L the frames containing log intensities (also called log luminance). Frames recorded by the camera are noted with an index such as I_n, and frames reconstructed from events are considered a function of time I(t).

b. The log intensity change corresponding to each event is not known precisely, and the relation with the threshold S [Eq. (2)] set by the user when operating the camera is unknown a priori. We therefore need to estimate the log intensity change corresponding to one event, a quantity that we call a contribution and denote c(p_k). The contribution is independent of the event position in the image and only depends on its polarity:

$$c(p_k) = \begin{cases} c^{+} & \text{if } p_k = +1, \\ c^{-} & \text{if } p_k = -1. \end{cases} \quad (3)$$

The estimate for the intensity image reconstructed from events I(t) is given by Eq. (4). It consists of the log intensity of the previous frame updated with the sum of the contributions of the events occurring between frames I_j (time t_j) and I_{j+1} (time t_{j+1}) and transformed back to linear intensities using the exponential:
$$I(u, t) = 10^{\,L_j(u) + \sum_{k:\, t_j < t_k \le t,\ u_k = u} c(p_k)}. \quad (4)$$

To estimate the optimal contribution of each event, we use the method proposed by Ref. 23. This method starts from the assumption that in each pixel the difference in log intensity

$$\Delta L(u) = L_{j+1}(u) - L_j(u) \quad (5)$$

should correspond to the integrated contribution of all events between the two frames:
$$\Delta L(u) = \sum_{k:\, t_j < t_k < t_{j+1},\ u_k = u} c(p_k). \quad (6)$$

Assuming independent and normally distributed errors with zero mean, one can estimate a global spatially invariant event contribution (per polarity) by solving

$$\min_x \|Ax - b\|_2^2, \quad (7)$$

where x = (c^+, c^-)^T contains the two polarity contributions and the rows of matrix A denote the M pixels in the image for which events occurred between time t_j and time t_{j+1}. With the i'th location i ∈ [0, M], u_i = (x_i, y_i)^T, and for the N events {e_k | t_j < t_k < t_{j+1}} with respective positions u_k, the entries of A count the positive and negative events per pixel,

$$A_{i,1} = \#\{k \mid u_k = u_i,\ p_k = +1\}, \qquad A_{i,2} = \#\{k \mid u_k = u_i,\ p_k = -1\}, \quad (9)$$

and the right-hand side collects the observed log intensity differences,

$$b_i = \Delta L(u_i). \quad (10)$$
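As an illustration of Eqs. (4)-(10), the sketch below estimates the two global per-polarity contributions by least squares and integrates them onto the previous log frame. It assumes that the events between two frames are available as NumPy arrays of pixel coordinates, time stamps, and polarities; all function and variable names are ours.

```python
import numpy as np

def estimate_contributions(L_prev, L_next, xs, ys, ps):
    """Solve min_x ||Ax - b||^2 of Eq. (7) for the global contributions (c+, c-).

    L_prev, L_next : log-intensity frames at times t_j and t_{j+1}
    xs, ys, ps     : coordinates and polarities of the events between the frames
    """
    H, W = L_prev.shape
    pos = np.zeros((H, W)); neg = np.zeros((H, W))
    np.add.at(pos, (ys[ps > 0], xs[ps > 0]), 1)   # positive event count per pixel
    np.add.at(neg, (ys[ps < 0], xs[ps < 0]), 1)   # negative event count per pixel
    mask = (pos + neg) > 0                        # the M pixels that fired events
    A = np.stack([pos[mask], neg[mask]], axis=1)  # M x 2 matrix of Eq. (9)
    b = (L_next - L_prev)[mask]                   # observed log changes, Eq. (10)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x                                      # (c_plus, c_minus)

def reconstruct_frame(L_prev, xs, ys, ts, ps, c, t):
    """Eq. (4): integrate contributions of the events up to time t onto L_j and
    return linear intensities."""
    L = L_prev.copy()
    sel = ts <= t
    contrib = np.where(ps[sel] > 0, c[0], c[1])
    np.add.at(L, (ys[sel], xs[sel]), contrib)
    return 10.0 ** L
```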

Background Reconstruction
The background reconstruction is shown in Fig. 3, and it implements (in software) an iterative back projection (IBP) scheme. The IBP aims at iteratively updating the estimate of the background B_{n-1} with the residue R_n of each new frame I_n. To compute the residue between the previous background estimate B_{n-1} and a new frame I_n, one needs to correct for the local motion and transform the previous estimate B_{n-1} to the current recorded frame I_n using the warp W_n. The warp is a grid transformation derived from the dense optical flow map Φ_n(B_{n-1}, I_n), which is updated with each new frame I_n based on Farnebäck.29 The residue is then projected back to the background reference space using the inverse warp to update the estimate according to the weighting factor α:

$$B_n = B_{n-1} + \alpha\, W_n^{-1}\big(I_n - W_n(B_{n-1})\big). \quad (12)$$

To start the process, one needs to pick an initial state. For our test setup, we pick the initial background image as the average of the first eight recorded intensity frames. The frame rate used to record the intensity frames depends on the exposure time chosen by the camera auto exposure function, as this provides the best image quality with the given scene illumination (see Table 2 for the intensity frame integration time for each dataset).
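A minimal sketch of one IBP step of Eq. (12) is given below, assuming OpenCV's Farnebäck flow and a remap-based warp. For brevity, the warp to the current frame and the inverse warp of the residue are folded into a single resampling of the current frame onto the background grid, and the parameter values are illustrative rather than the settings used in our experiments.

```python
import cv2
import numpy as np

def ibp_update(B_prev, I_n, alpha=0.2):
    """One back-projection step: estimate the flow between the background
    estimate and the new frame, express the residue in the background
    geometry, and blend it in with weight alpha. Inputs are grayscale
    float arrays in [0, 255]."""
    prev8 = np.uint8(np.clip(B_prev, 0, 255))
    next8 = np.uint8(np.clip(I_n, 0, 255))
    # dense flow such that B_prev(y, x) ~ I_n(y + v, x + u)
    flow = cv2.calcOpticalFlowFarneback(prev8, next8, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = B_prev.shape
    gx, gy = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # resample the current frame onto the background grid
    I_on_B = cv2.remap(I_n.astype(np.float32),
                       gx + flow[..., 0], gy + flow[..., 1], cv2.INTER_LINEAR)
    return B_prev + alpha * (I_on_B - B_prev)
```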
As shown in Fig. 4, this setup allows for a comparison of different options to inject events in the process:

i. Using events to generate frames at a higher temporal sampling than the native recorded frames. The fast temporal variations contained in the event stream, which for classical frames are integrated (if happening during the integration time) or lost (if happening during blind time), can improve the final image. We generate I*_n from events using a fixed period (1 ms) such that multiple IBP loops are run until the next intensity frame:

$$B_n = B_{n-1} + \alpha\, W_n^{-1}\big(I^{*}_n - W_n(B_{n-1})\big). \quad (13)$$
ii. Using events to directly find lucky zones in an intensity frame. We use the event stream to quantify the luckiness of each pixel in each intensity frame. Assuming no camera motion, a given image location produces events if it contains an edge and if intensity variations induced by turbulence occur. For such image locations, the pixel value in an intensity frame during whose integration no event occurred is considered to have a higher chance of being lucky (sharp and without motion) than blurred.
Fig. 3 Overview of the background reconstruction.
Fig. 4 Two options to integrate the events in the reconstruction process. Top left in orange, using frames recreated from events instead of recorded intensity frames; right in green, using events to restrict background updates to zones that did not change (no events were emitted) during the frame integration time.

We expect that a lucky patch has a temporarily flat (or stationary) wavefront distortion, with a temporal first derivative that is also small. To remove the updates of pixels with a likelihood of being disturbed by varying blur or motion, we filter out the updates at the corresponding locations [Eq. (15)]. Therefore the update equation becomes

$$B_n = B_{n-1} + \alpha\, F_n \cdot W_n^{-1}\big(I_n - W_n(B_{n-1})\big), \quad (14)$$

with the event filter

$$F_n(u) = \begin{cases} 0 & \text{if an event occurred at } u \text{ during the integration of } I_n, \\ 1 & \text{otherwise.} \end{cases} \quad (15)$$
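A sketch of the event filter of Eq. (15) and of the masked update of Eq. (14) follows, assuming the coordinates of the events recorded during the integration of I_n are available as arrays; the back-projected residue is the quantity produced by the IBP step sketched above, and all names are ours.

```python
import numpy as np

def event_filter(xs, ys, shape):
    """Eq. (15): 1 where no event fired during the frame integration,
    0 where at least one event fired (pixel likely blurred or moving)."""
    F = np.ones(shape, dtype=np.float32)
    F[ys, xs] = 0.0
    return F

def ibp_update_filtered(B_prev, residue_on_background, F, alpha=0.2):
    """Eq. (14): only pixels that stayed quiet during integration are updated."""
    return B_prev + alpha * F * residue_on_background
```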

Features for Moving Object Detection
The events produced by the camera are created by the intensity changes caused by image motion of contrasted edges. The field of action recognition from the event stream offers a selection of features that aim at capturing the spatiotemporal behavior of an edge; see Ref. 30. The features are derived from the image of the time tag T(u_k, t_k) (also called image of the time surface) of the last event e_k at location u_k = (x_k, y_k)^T with time stamp t_k (and independently from its polarity):

$$T(u_k, t_k) \leftarrow t_k. \quad (16)$$

As shown in Fig. 5(a), for a rigid body motion due to a moving object (or the camera's own motion), the edge travels through the field of view producing events at the same rate along the entire edge. The edge contrast does not change over time or space, and we expect few variations in the event production rate spatially (along the edge) and temporally.
In the case of atmospheric turbulence, in Fig. 5(b), a static edge exhibits motion that is centered around the actual edge position and has a locally random direction. The edge contrast varies randomly in space and time due to the variation of the refractive index. We therefore expect larger variations in the event production rate.
To distinguish between turbulence and rigid body motion, we update for each new event the image of the time tag T(u_k, t_k), and we evaluate two simple features:

• The time difference between the new event and the time tag of the previous event at that location,
$$dt_k = t_k - T(u_k). \quad (17)$$

The time difference depends on the speed (component in the edge gradient direction) and the actual contrast of the moving edge, and therefore the distribution of dt_k produced by rigid body motion is expected to be compact and centered around the dt characterized by the moving object's average speed.
• To take the spatial variations of the motion into account, we also analyze the gradient of the time surface,
$$\nabla T_k = \nabla T(u)\big|_{u = u_k}. \quad (18)$$

Due to the random local motion orientation in turbulence, the distribution of ∇T_k is expected to be broader for turbulent motion than for rigid body motion.
The ability of these two features to distinguish between turbulence and a moving object is investigated in Sec. 6.
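A per-event sketch of both features is given below, assuming the time surface T is kept as a dense array of last-event time stamps; the centered finite difference used for the gradient is an illustrative choice, as the discretization is not specified here.

```python
import numpy as np

def update_time_surface_features(T, x, y, t):
    """Update the time surface with event (x, y, t) and return the features of
    Eqs. (17) and (18): the inter-event time at this pixel and the local
    gradient magnitude of the time surface."""
    dt = t - T[y, x]                                   # Eq. (17)
    T[y, x] = t                                        # store the new time tag
    y0, y1 = max(y - 1, 0), min(y + 1, T.shape[0] - 1)
    x0, x1 = max(x - 1, 0), min(x + 1, T.shape[1] - 1)
    gy = (T[y1, x] - T[y0, x]) / max(y1 - y0, 1)       # finite differences
    gx = (T[y, x1] - T[y, x0]) / max(x1 - x0, 1)
    grad_mag = np.hypot(gx, gy)                        # Eq. (18)
    return dt, grad_mag
```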

Moving Object Classification
As shown in Fig. 6, the distinction between background and a moving object is implemented using a binary mask that indicates the pixels corresponding to the moving object. In our algorithm, a coarse mask is computed by splitting the intensity frame into subblocks (8 × 8 pixels). For each subblock, the algorithm counts the proportion of pixels in that frame for which the event-based feature [Eq. (17) or Eq. (18)] is below a predefined threshold. However, creating a mask based solely on event statistics would only give information about the edges of the moving object. As no events are triggered by flat surfaces of the moving object, one needs to propagate for each subblock the belief of being part of the moving object. To do that, we also compare the last background estimate with the current frame and compute the block-wise mean-squared error (MSE) between the two images. The deviation is compared against the distribution of the error between the previous pairs of background and corresponding intensity frames that were affected by turbulent motion only. The algorithm thresholds the block-wise MSE image at N standard deviations (usually between 3 and 5) to create the map of candidate blocks. The mask derived from event statistics and the mask derived from the error between the background and the current frame are combined using a region growing algorithm. Starting from a seed (the event-derived mask), the algorithm iteratively integrates neighboring blocks if they are marked in the error mask. This strategy makes it possible to minimize the number of false positives (MSE outliers due to strong turbulence incorrectly classified as the moving object) and false negatives (flat zones of the moving object incorrectly classified as background).
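The block-wise test and the region growing can be sketched as follows; the block size, the number of standard deviations, and the 4-connected growth follow the description above, while the array layout and names are assumptions.

```python
import numpy as np

def candidate_blocks(background, frame, mse_mean, mse_std, block=8, n_sigma=4.0):
    """Block-wise MSE between the background estimate and the current frame,
    thresholded at n_sigma standard deviations of the turbulence-only statistics."""
    h, w = frame.shape
    hb, wb = h // block, w // block
    err = ((frame - background) ** 2)[:hb * block, :wb * block]
    mse = err.reshape(hb, block, wb, block).mean(axis=(1, 3))
    return mse > (mse_mean + n_sigma * mse_std)

def grow_object_mask(seed_mask, candidate_mask, max_iter=100):
    """Region growing: keep the event-derived seed blocks and iteratively absorb
    4-connected neighbors that are flagged in the MSE candidate mask."""
    mask = seed_mask.copy()
    for _ in range(max_iter):
        grown = np.zeros_like(mask)
        grown[1:, :] |= mask[:-1, :]
        grown[:-1, :] |= mask[1:, :]
        grown[:, 1:] |= mask[:, :-1]
        grown[:, :-1] |= mask[:, 1:]
        new_mask = mask | (grown & candidate_mask)
        if np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return mask
```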

Moving Object Reconstruction
As shown in Fig. 7, to reconstruct the appearance, we first estimate the object velocity and subsequently use this to remove the motion blur on the object. The contrast maximization framework is a global approach that was developed for the estimation of the camera's own motion. It relies, for a given set of events, on the maximization of contrast of an image of warped events (see Ref. 31 for details). The warping here consists of transforming a set of N events {e_k} that were recorded during the frame integration time, having time stamps t_k and positions u_k in the image, into the set of warped events {e'_k} corresponding to a reference time t_ref such that u'_k(θ) = H(u_k, t_k - t_ref; θ), where H is the warping operator and θ is the velocity parameter. An image of the warped events L(θ) is created by integrating the polarity p_k of each event at its warped position:

$$L(\theta)(u) = \sum_{k=1}^{N} p_k\, \delta\big(u - u'_k(\theta)\big). \quad (19)$$

Iterative nonlinear optimization algorithms are then used to solve the contrast maximization problem

$$\max_{\theta} \operatorname{Var}\big(L(\theta)\big) \quad (20)$$

and find the optimal velocity parameter θ. Here, we apply this algorithm to the set of events {e_k} that were classified as being produced by a moving object. The camera is considered to be static and the object motion to be linear in a direction different from the camera optical axis. This reduces the set of motion parameters to a two-dimensional velocity vector θ = (v_x, v_y)^T that we estimate by solving Eq. (20).
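A compact sketch of this contrast maximization step for a constant 2D velocity follows, assuming the object events are given as NumPy arrays; the nearest-pixel accumulation and the Nelder-Mead optimizer are simplifications chosen for brevity rather than the exact solver used in Ref. 31.

```python
import numpy as np
from scipy.optimize import minimize

def image_of_warped_events(theta, xs, ys, ts, ps, shape, t_ref):
    """Eq. (19): accumulate event polarities at positions warped back to t_ref
    with the constant velocity theta = (vx, vy)."""
    vx, vy = theta
    xw = np.round(xs - vx * (ts - t_ref)).astype(int)
    yw = np.round(ys - vy * (ts - t_ref)).astype(int)
    ok = (xw >= 0) & (xw < shape[1]) & (yw >= 0) & (yw < shape[0])
    L = np.zeros(shape)
    np.add.at(L, (yw[ok], xw[ok]), ps[ok])
    return L

def estimate_velocity(xs, ys, ts, ps, shape, t_ref, theta0=(0.0, 0.0)):
    """Eq. (20): maximize the variance (contrast) of the image of warped events."""
    cost = lambda th: -np.var(image_of_warped_events(th, xs, ys, ts, ps, shape, t_ref))
    res = minimize(cost, np.asarray(theta0, dtype=float), method="Nelder-Mead")
    return res.x  # estimated (v_x, v_y) in pixels per unit time
```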
To create the moving object image O_n, first, a filtered image J_n is created by subtracting each event contribution from the log intensity frame using the event's original location [Eq. (21)]:
$$J_n(u) = L_n(u) - \sum_{k} c(p_k)\, \delta(u - u_k). \quad (21)$$

Then, with the estimated object velocity θ̂, all events of the moving object are warped to the new location u'_k(θ̂) that corresponds to a reference time chosen during the frame integration (usually the frame mid-exposure time). The moving object image O_n is created by adding each event contribution to the base image J_n using that warped location u'_k(θ̂) and by exponentiating the result to transform back to linear intensities [Eqs. (22) and (23)]:

$$L^{O}_n(u) = J_n(u) + \sum_{k} c(p_k)\, \delta\big(u - u'_k(\hat{\theta})\big), \quad (22)$$

$$O_n = 10^{\,L^{O}_n}. \quad (23)$$

In a final step, the turbulence-corrected background and the motion-corrected object appearance are then combined using the coarse (binary) mask. The final image replaces the background with the corrected moving object appearance in the zones marked in the mask.
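A sketch of Eqs. (21)-(23) is given below, reusing the per-polarity contributions c and the estimated velocity from the steps above; the nearest-pixel warp and the names are assumptions made for illustration.

```python
import numpy as np

def deblur_moving_object(L_n, xs, ys, ts, ps, c, theta_hat, t_ref):
    """Remove each event contribution at its original position [Eq. (21)],
    re-add it at the position warped to the reference time [Eq. (22)], and
    return linear intensities [Eq. (23)]."""
    vx, vy = theta_hat
    contrib = np.where(ps > 0, c[0], c[1])
    J = L_n.copy()
    np.add.at(J, (ys, xs), -contrib)                   # Eq. (21)
    xw = np.clip(np.round(xs - vx * (ts - t_ref)).astype(int), 0, L_n.shape[1] - 1)
    yw = np.clip(np.round(ys - vy * (ts - t_ref)).astype(int), 0, L_n.shape[0] - 1)
    np.add.at(J, (yw, xw), contrib)                    # Eq. (22)
    return 10.0 ** J                                   # Eq. (23)
```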

Experimental Setup
For the research reported here, we used the DAVIS346 dynamic vision sensor from iniVation.32 Table 1 shows the specifications of this camera.
To collect test material, we recorded video sequences through man-made, indoor turbulence. The experimental setup for these recordings is shown in Fig. 8. We produced turbulent air flows with a hot plate placed close to the camera and placed a flat test chart with enough contrasted content at a distance of about 4 m. To create footage with moving objects, we used a toy friction bus and a self-propelled train to pass next to the chart and generate motion. The footage was recorded with an Avenir 16 to 160 mm f/2 lens at 100 and 160 mm. The lens was focused using a Siemens star target placed in the object plane before recording. The turbulent air-flow pattern produced by the hot plate provides a reasonable approximation of operationally relevant turbulence. The resulting magnitudes of shifts and blurs in the intensity images as well as their correlation length scales appear consistent with previously recorded data in a field trial.33 We measured the average and standard deviation of the optical flow magnitude computed between the average intensity frame and each single intensity frame; Table 2 summarizes the results.
Even though we notice some variations in the turbulence magnitude, the variation of the event rate mainly depends on the contrast threshold used for the experiment. To record enough events, we set the contrast threshold as low as possible such that, for the sequences "Escher short," "Escher train," and "Siemens star," the event rate was above 100 kevt/s while minimizing the amount of noise (events triggered on flat surfaces). The latter could only be assessed visually using the DV Viewer preview and not quantitatively. In a future study, we will investigate the tradeoff between event stream sensitivity and SNR. In the case of a moving object being used in the footage, the event rate and background motion are measured on the frames that do not contain the moving object.

Background Reconstruction
In our first set of experiments, we evaluate the quality of the reconstructed static background image. Using the setup described in Sec. 5, we imaged a fixed Siemens star, and no moving objects were used during the recording of the sequence. By switching off the hot plate, we could record the same target without turbulence, and a ground truth image was created from the average of all intensity frames recorded without turbulence. The footage with turbulence was processed using the background reconstruction algorithm variants described in Sec. 4. We used the PSNR between the ground truth and the output of each reconstruction method. We compared four variants:

a. "mean," simply averaging the intensity frames;
b. "ibp_frame," IBP using the recorded intensity frames [Eq. (12)];
c. "ibp_evt," IBP using intensity frames reconstructed from events at 1 kHz [Eq. (13)];
d. "ibp_noevt," IBP using the recorded intensity frames, while canceling updates for locations with events during each frame integration [Eq. (14)].
For each method, we logged the PSNR over time, and Fig. 9 shows its evolution. The green dashed line shows the PSNR for each intensity frame. It is worth noting the random quality variation of the raw frames over time. Averaging the raw frames (turquoise cross line) leads to a convergence of the quality after 10 to 12 frames. It prevents suffering from the temporarily significant quality drops but also prevents benefiting from the lucky frames. The reconstruction using IBP (pink dotted, blue diamond, and yellow continuous lines) shows a significant performance improvement for most frames. Surprisingly, as the pink dotted line shows, removing updates (α = 0) for locations for which at least one event occurred during integration (and which are thus supposed to be degraded) does not show a significant benefit over not using this residue selection (blue diamond line). Finally, creating virtual high-speed frames from the event stream at a high frame rate of 1000 fps (yellow continuous line) exhibits the best performance in all frames.
Figure 10 shows a side-by-side comparison of a central 100 × 100 pixel crop along with an intensity profile of the resulting image after 16 frames, and Fig. 11 plots the corresponding radial relative contrast result for each method. As expected, a simple averaging produces an image blurred by the accumulation of local shifts. The evolution of the relative contrast also shows the strong blur for the finer details. Using IBP with an event filter (not updating zones with events) tends to be counterproductive for a resolution gain. Indeed, regions with finer details produce events more frequently such that few or no updates can be done in the center region and the estimate remains at the initial value.
We also compared the result of the background reconstruction on a textured chart. Figure 12 shows the results for the Escher footage after 16 frames. When comparing the mean [Fig. 12(a)] with ibp_frame [Fig. 12(b)] and ibp_evt [Fig. 12(c)], we notice again the amount of blur corrected by the registration step. When comparing ibp_frame with ibp_evt, we also notice a gain in the local contrast (visible on the windows) and in the resolving power (visible on the field texture).
Using IBP on frames recreated from events instead of using the recorded intensity frames shows the highest performance. Even with a simple approach like the direct event integration based on a global contribution, we were able to transform the event stream into high-speed frames containing valuable information for the reconstruction.

When measuring on the Siemens star, the IBP output shows a better PSNR and a higher resolving power when using the frames created from events. This gain is also confirmed on a target with texture variations. It shows that the information on short time scale variations contained in the event stream outweighs the disadvantage of the imperfectly localized and noisy contributions.

Moving Object Segmentation
Next, we evaluated the moving object segmentation approach. In particular, we investigate

a. whether the event stream allows us to distinguish between the moving object and turbulence using event-based temporal features only [Eqs. (17) and (18)];
b. the dependence of the moving object detection accuracy on object velocity.
Using our experimental setup, we recorded different moving objects (Fig. 13) passing in front of the test chart while having the heater produce turbulence. Over the entire footage, we measured for each pixel location u the minimum value of the time interval between events dt_k(u) and the minimum (non-zero) magnitude of the time surface gradient ‖∇T_k(u)‖. Figure 14 compares the histograms between the bottom part of the image, where a moving object generated events, and the top part, where motion originates only from turbulence.
For the first feature min_k(dt_k(u)), we see (left column) that the fast-moving object (top) generates a compact distribution around 1 ms, whereas turbulence generates a relatively wide distribution between 10 and 100 ms. The corresponding min_k(‖∇T_k(u)‖) (top right) has a wider distribution for the moving object, centered around 4 ms, whereas the turbulence generates nearly no events with min_k(‖∇T_k(u)‖) below 10 ms. For a fast-moving object, both features appear highly discriminative. On the bottom row, events produced by a slow-moving object may not be distinguishable from events produced by the turbulence when using only min_k(dt_k(u)). One can notice that this recording has a much higher event rate (partially due to a lower contrast threshold, which tends to be more sensitive and generate events more frequently). As visible in the bottom right part of the figure, min_k(‖∇T_k(u)‖) represents a more reliable metric to distinguish between a moving object and turbulence. Although the turbulence and moving object distributions overlap significantly, the distribution for the moving object is more compact and centered around 12 ms, whereas the turbulence spreads over a wider range and has a peak near 18 ms.
This experiment shows that both features allow for distinguishing between a moving object and turbulence when the object is moving fast relative to the turbulence-induced motion. Nevertheless, min_k(‖∇T_k(u)‖) performs better than min_k(dt_k(u)) when the object motion has a similar magnitude as the turbulence motion, and it confirms that the local random motion orientation is captured by the events and can be used to distinguish between the two types of motion. Therefore, we decided to use a moving object classifier based solely on min_k(‖∇T_k(u)‖). The experiment also shows that the latter is still prone to misclassification when the object speed differs less from the turbulent motion. This misclassification can only be avoided by integrating appearance-based features.

Comparison with State-of-the-Art Methods
Finally, we compared the output of the pipeline with the state-of-the-art methods described in Nieuwenhuizen et al.,8 Oreifej et al.,6 Anantrasirichai et al.,5 and Halder et al.14 We processed the intensity frames from our recorded sequences with the frame-based state-of-the-art methods (using the 2× super resolution mode for the approach from Nieuwenhuizen et al.8). We then compared the results with the ones produced with the proposed event-based mitigation using the same frames and the event stream. Figure 15 shows the comparison of the outputs on three different sequences with crops (on the bottom of each frame) made at various locations in each image (white box).
Fig. 15 Comparison with state-of-the-art methods on three recorded image sequences. From left to right: Nieuwenhuizen et al.,8 Oreifej et al.,6 Anantrasirichai et al.,5 Halder et al.,14 and event-based mitigation (proposed). For each recording, the white squares mark the position of the detail crops shown at the bottom.
The comparison above shows that the approach from Anantrasirichai et al.5 provides the perceptually sharpest but also noisiest reconstruction of the static background. The proposed event-based processing ranks second in terms of sharpness, but resolves similar levels of detail. This can be observed on the stripe textures on the fields in the background of the first two sequences or on the preservation of the square window shapes on the control tower in the bottom left cutout of the bottom sequence. In most sequences, the described approach thus manages to use the lucky information contained in the event stream to produce an output comparable to the state-of-the-art methods in the static parts of the scene.
For slow-moving objects with little texture such as the Escher train sequence, the incomplete moving object segmentation from the event-based mitigation performs worse than Nieuwenhuizen et al.,8 Oreifej et al.,6 and Anantrasirichai et al.5 However, for the fast-moving bus, the event-based mitigation provides a similarly complete segmentation of the bus as Oreifej et al.6 and Anantrasirichai et al.,5 whereas the approach from Nieuwenhuizen et al.8 exhibits mixing of foreground and background on the leading edge of the bus and background deformation above the bus in some cases. When comparing corrected frames, one can also observe the benefit of using events for the moving object reconstruction. By warping the events to a reference time, we were able to correct for the motion blur, to refine the object boundaries, and to enhance the edges of the moving object, which is visible, for instance, on the top bus sequence.
To quantify the performance of each method, we generated a ground truth image of the background using the footage recorded without turbulence, with the hot plate turned off. The ground truth image is used to compute the peak signal-to-noise ratio (PSNR)34 and the structural similarity index measure (SSIM)35 of the registered corrected frames produced by each method. The results are summarized in Table 3, and they show that the proposed method provides a similar quality as Anantrasirichai et al.5 and Halder et al.14 To assess the improvement in resolving power that the turbulence mitigation algorithm attempts to deliver, we also computed the ratio, expressed here as a gain (in dB), between the power spectral density (PSD) of the frames produced by each method and the PSD of the ground truth image. Figure 16 shows for each sequence the comparison of the frequency-dependent gain of each method when compared with the ground truth image. Table 4 summarizes these results across frames by reporting the gain at the Nyquist frequency (0.5 cycles per pixel), which can be seen as a measure sensitive to changes in resolving power. Figure 16 and Table 4 provide confirmation for our qualitative observation that our method has a higher resolving power than Halder et al.14 and Oreifej et al.6 The approaches of Nieuwenhuizen et al.8 and Anantrasirichai et al.5 achieve higher PSDs. However, their positive gain indicates that the PSD is higher than that of the ground truth. This implies that they apply excess sharpening and therefore amplify the noise, without necessarily increasing the resolving power.
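One way to compute such a frequency-dependent gain is sketched below; the radial averaging of the spectrum and the dB convention are our assumptions, as the exact procedure used to produce Fig. 16 is not reproduced here.

```python
import numpy as np

def radial_psd(img):
    """Radially averaged power spectral density of a grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    psd = np.abs(f) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2).astype(int)
    counts = np.bincount(r.ravel())
    radial = np.bincount(r.ravel(), psd.ravel()) / np.maximum(counts, 1)
    freqs = np.arange(radial.size) / max(h, w)   # cycles per pixel (square-ish images)
    return freqs, radial

def psd_gain_db(result, ground_truth):
    """Frequency-dependent gain (dB) of a mitigated frame w.r.t. the ground truth."""
    f, p_res = radial_psd(result)
    _, p_gt = radial_psd(ground_truth)
    return f, 10.0 * np.log10(p_res / np.maximum(p_gt, 1e-12))
```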
Even though the experiment shows that the event stream is useful for static background reconstruction, the main advantage resides in the moving object reconstruction. First, it helps in the segmentation of the moving object without being limited by the comparison of two integrated frames such as in an optical flow-based algorithm. Second, it improves the reconstruction of the moving object appearance and the refinement of its boundaries, which can provide important information in an operational situation.
Conclusions, Discussion, and Future Work

In this paper, we explored some of the advantages of the event camera for turbulence mitigation. First, we showed that the event stream contains information that may be used to reconstruct a fixed background disturbed by turbulence. When compared with integrated intensity frames, the event stream encodes high-frequency variations that allow for a faster convergence of the image reconstruction toward a best estimate with a quality that is comparable to the state-of-the-art methods on static scene elements. Then, we used the high temporal resolution of the sensor to build a motion signature and distinguish the fixed background disturbed by turbulence from an object moving in the same scene. The event stream carries a finer description of motion than integrated intensity frames. This allowed us to build accurate moving object masks without computing the optical flow between frames, being limited only by the presence of contrasted moving edges. Finally, we used the event stream to reconstruct the appearance of a moving object from a motion-blurred frame. The three different aspects were combined into a processing pipeline. With an indoor experiment, the processing showed improved image reconstruction of moving objects through turbulence when compared with state-of-the-art methods and competitive performance on the static background.
This first study shows the strong potential of event cameras for turbulence mitigation. In future work, we will focus on collecting data with real turbulence and on improving the robustness of the method for variable scenarios. Unlike conventional cameras, which have automatic controls for setting the proper exposure, focus, and white balance, event cameras still lack algorithms to automatically select the best set of settings to record a given scene. We will also search for the best tradeoffs between event collection (and processing) and the final output quality to provide real-time processing. While this paper focused on application scenarios for which the camera was static, in the future we will aim at assessing the potential of the camera for scenarios with the camera moving, possibly under strong motion (vibration and high speed).
Finally, recent research shows increasing interest in deep learning for processing data from event cameras. By learning a richer imaging model, these new methods outperform classical approaches at recreating high-quality video from event streams. This axis of research will be a main topic for future improvements to turbulence mitigation.

Fig. 1
Fig. 1 Pixel operation of an event camera. From top to bottom: the log intensity received by a pixel over time, the corresponding log intensity variations measured by the pixel, and the resulting events emitted by the pixel at instances in which the log intensity variations exceed the threshold values.

Fig. 2
Fig. 2 Image reconstruction from events: (a) the intensity frames and the events and (b) the integration of each event contribution to reconstruct the image at time t.

Fig. 5
Fig. 5 (a) Rigid body versus (b) turbulent motion; motion in black, blur level in orange circle.

Fig. 6
Fig. 6 Overview of the moving object mask computation.

Fig. 7
Fig. 7 Overview of the moving object reconstruction.

Fig. 9
Fig. 9 Evolution of the PSNR between ground truth and different background reconstruction approaches.

Fig. 12
Fig. 12 Background reconstruction result after 16 frames on a textured chart [(a)-(c): mean, ibp_frame, ibp_evt]. For each figure, the white squares mark the position of the detail crops shown at the bottom.

Fig. 13
Fig. 13 Intensity frames from two sequences of a (a) fast bus and (b) slow train imaged through turbulence.

Fig. 16
Fig. 16 Frequency gain comparison for the state-of-the-art methods on three recorded image sequences: (a) Escher bus, (b) Airport bus, and (c) Escher train.

Table 1
Specifications of the DAVIS346 dynamic vision sensor of iniVation.

Table 2
Summary of the recorded dataset.
Note: The bold values are the best performing methods for these images.

Table 4
Gain at Nyquist frequency comparison between the methods of Nieuwenhuizen et al.8 (N), Oreifej et al.6 (O), Anantrasirichai et al.5 (A), Halder et al.14 (H), and our proposed event-based mitigation. The bold values are the best performing methods for these images.